
MCD2 and MCD3 specific data processing? #7

Closed

parasj opened this issue Jan 27, 2021 · 8 comments

parasj commented Jan 27, 2021

Hi authors, @SivilTaram

I see there is some specialized logic for processing the MCD2 and MCD3 splits of the CFQ dataset. We are confused about why this special path is present. Why did you add this special logic, and what would the behavior be if you preprocessed MCD2 and MCD3 with the MCD1 preprocessing code path?

if query.startswith("Did M") or query.startswith("Was M") or query.startswith("Were M") or query.startswith("Was a"):
if type in ['mcd2', 'mcd3']:
nl_pattern = query.split()[0] +" " + query.split()[1]
terms.append((nl_pattern, [f'?x0#is#{query.split()[1]}'], (0, 1)))
else:
nl_pattern = query.split()[0] +" M"
terms.append((nl_pattern, ['?x0#is#M'], (0, 1)))

if candidate_term.count("M") == 1:
if candidate_term.startswith("?x0 is M") and split in ['mcd2', 'mcd3']:
candidate_triplets[candidate_skeleton] += [candidate_term]
else:
candidate_triplets[candidate_skeleton] += [''.join(candidate_term.replace("M", entity[0][0])) for entity in entities]

Thanks,
Paras

@linzeqipku

Hi @parasj,

Thanks for your attention!

Let's start with a simple example.

Natural language question: Did M0 read M1 ?

The original logical form is:

M0 READ M1

We transform it into an equivalent logical form in which each path starts with "?x0":

?x0 is_M M0
?x0 READ M1

As you can see, we introduce a new predicate is_M to this dataset.
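In pseudocode, the rewrite looks roughly like this (a simplified sketch rather than the exact code in the repo, assuming the logical form is given as a list of (subject, predicate, object) triples):

```python
def root_at_x0(triples):
    """Rewrite triples such as ('M0', 'READ', 'M1') so every path starts from ?x0."""
    rewritten = []
    for subj, pred, obj in triples:
        if subj.startswith("M"):
            # Bind ?x0 to the subject entity via the auxiliary "is" predicate ...
            is_triple = ("?x0", "is", subj)
            if is_triple not in rewritten:
                rewritten.append(is_triple)
            # ... and re-root the original triple at ?x0.
            rewritten.append(("?x0", pred, obj))
        else:
            rewritten.append((subj, pred, obj))
    return rewritten

print(root_at_x0([("M0", "READ", "M1")]))
# [('?x0', 'is', 'M0'), ('?x0', 'READ', 'M1')]
```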

For MCD1:

Here we explicitly listed 4 lexicons ("Did M", "Was M", "Were M", and "Was a") for this predicate.
This is because the first 3 lexicons can be found by data/generate_phrase_table.py, but the last one (i.e., "Was a") was missed, so we add it manually here.

For MCD2/3:

We found that performance improves significantly if we introduce more fine-grained predicates is_M0, is_M1, ..., is_M6, rather than just the single coarse predicate is_M.

For example, in this setup, the logical form will be:

?x0 is_M0
?x0 READ M1

This setup helps reduce the search space.

It brought a good performance gain on MCD2/3 while hardly affecting the performance on MCD1.
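Concretely, the effect on the search space can be illustrated like this (a simplified sketch in the spirit of the second snippet you quoted, not the repo's code):

```python
entities = ["M0", "M1", "M2", "M3", "M4", "M5", "M6"]

# Coarse predicate (MCD1-style): the sketch term "?x0 is M" has to be expanded
# against every entity, so every entity becomes a grounding candidate.
coarse_candidates = [f"?x0 is {e}" for e in entities]
print(len(coarse_candidates))  # 7 candidate groundings

# Fine-grained predicate (MCD2/3-style): the sketch term already names the
# entity, so it is kept as the single grounding candidate.
fine_candidates = ["?x0 is M0"]
print(len(fine_candidates))    # 1 candidate grounding
```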

Feel free to ping me if you have any more questions~

parasj commented Jan 27, 2021

Thank you @linzeqipku!

Another note -- we are struggling to replicate the results on the splits.

For example, in the paper, the sketch prediction module achieves 73% accuracy on the MCD3 split, but we struggle to consistently obtain above 50% when retraining the model with the default hyperparameters and number of training epochs in the code. The model appears to be very sensitive to the number of training epochs, and there is also a wide variance across initializations.

What hyperparameters did you use for each split, and how did you select them? We are currently using the default hparams for 'mcd1' in the code.

This work is very exciting, and we (@parasj and @GaiYu0) are interested in extending it. We would really appreciate your tips on how we can fix the training procedure.

@linzeqipku

@parasj
We used the default parameters.
It seems that the output of sketch_prediction/evaluate.sh is not what we want.
I've checked my logs: the output of sketch_prediction/evaluate.sh is ~0.3, while the overall accuracy is ~0.65 ...

I'll double-check with Yinuo (the first author) and reply to you later.

gyn0806 commented Jan 28, 2021

@parasj
Hi parasj, I'm Yinuo, thanks for your attention!
There's a bug in the code for calculating sketch accuracy in sketch_prediction/evaluate.sh, like this
?x0 P M . ?x0 a M
?x0 a M ### ?x0 P M

and results reported in our paper calculated by another script which is not included in this repo.
We will fix the bug as soon as possible, thanks a lot.
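For reference, an order-insensitive comparison along the following lines avoids the issue (only a sketch of the idea, assuming the problem is that equivalent sketches written with a different term order or separator are counted as wrong; this is not the script used for the paper's numbers):

```python
def sketch_terms(sketch: str) -> frozenset:
    """Split a sketch into its terms, ignoring term order and separator style."""
    normalized = sketch.replace("###", ".")          # treat " ### " like " . "
    return frozenset(t.strip() for t in normalized.split(".") if t.strip())

pred = "?x0 P M . ?x0 a M"
gold = "?x0 a M ### ?x0 P M"
print(sketch_terms(pred) == sketch_terms(gold))      # True: counted as a match
```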

parasj commented Feb 2, 2021

@gyn0806 @linzeqipku Thank you for your reply!

Another question -- I am trying to replicate the HPD test accuracy results on MCD1, MCD2, and MCD3, and I see that the results depend on which epoch's checkpoint is used for testing. Which epoch's checkpoint did you use for the test set evaluation on each split?

gyn0806 commented Feb 2, 2021

Hi parasj,
we select the model for inference with the code below, using the checkpoint that performs best on the dev set:
https://github.com/microsoft/ContextualSP/blob/36dc4abfa9e525a04328a17615665aa23dc0bb31/poset_decoding/sketch_prediction/main.py#L70-L74
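Schematically, the selection works like this (a minimal sketch, with the training/eval steps passed in as callables; not the exact code at the link above):

```python
def select_best_checkpoint(num_epochs, train_one_epoch, evaluate_dev, save_checkpoint):
    """Train for num_epochs and return the checkpoint that scores best on the dev set."""
    best_dev_acc, best_ckpt = 0.0, None
    for epoch in range(num_epochs):
        train_one_epoch(epoch)                 # one pass over the training data
        dev_acc = evaluate_dev()               # accuracy on the dev set
        if dev_acc > best_dev_acc:             # keep only the best-on-dev checkpoint
            best_dev_acc = dev_acc
            best_ckpt = save_checkpoint(epoch)
    return best_ckpt, best_dev_acc             # best_ckpt is then used for test inference
```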

@SivilTaram

@parasj Has Yinuo solved your problem?

parasj commented Mar 12, 2021

@SivilTaram Closing this issue -- we got the information we needed! Thank you :)

parasj closed this as completed Mar 12, 2021