
MCD2 and MCD3 specific data processing? #7

Closed

parasj opened this issue Jan 27, 2021 · 8 comments

parasj commented Jan 27, 2021

Hi authors, @SivilTaram

I see there is some specialized logic for processing the MCD2 and MCD3 splits of the CFQ dataset. We are confused about why this special path is present. Why did you add this special logic, and what would the behavior be if you preprocessed MCD2 and MCD3 with the MCD1 preprocessing code path?

if query.startswith("Did M") or query.startswith("Was M") or query.startswith("Were M") or query.startswith("Was a"):
if type in ['mcd2', 'mcd3']:
nl_pattern = query.split()[0] +" " + query.split()[1]
terms.append((nl_pattern, [f'?x0#is#{query.split()[1]}'], (0, 1)))
else:
nl_pattern = query.split()[0] +" M"
terms.append((nl_pattern, ['?x0#is#M'], (0, 1)))

if candidate_term.count("M") == 1:
if candidate_term.startswith("?x0 is M") and split in ['mcd2', 'mcd3']:
candidate_triplets[candidate_skeleton] += [candidate_term]
else:
candidate_triplets[candidate_skeleton] += [''.join(candidate_term.replace("M", entity[0][0])) for entity in entities]

Thanks,
Paras

@linzeqipku

Hi @parasj,

Thanks for your attention!

Let's start with a simple example.

Natural language question: Did M0 read M1 ?

The original logical form is:

M0 READ M1

We transform it into an equivalent logical form in which each path starts with "?x0":

?x0 is_M M0
?x0 READ M1

As you can see, we introduce a new predicate is_M to this dataset.
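In pseudocode, the rewrite looks roughly like this (a simplified sketch rather than the exact code in the repo, assuming the logical form is given as a list of (subject, predicate, object) triples):

```python
def root_at_x0(triples):
    """Rewrite triples such as ('M0', 'READ', 'M1') so every path starts from ?x0."""
    rewritten = []
    for subj, pred, obj in triples:
        if subj.startswith("M"):
            # Bind ?x0 to the subject entity via the auxiliary "is" predicate ...
            is_triple = ("?x0", "is", subj)
            if is_triple not in rewritten:
                rewritten.append(is_triple)
            # ... and re-root the original triple at ?x0.
            rewritten.append(("?x0", pred, obj))
        else:
            rewritten.append((subj, pred, obj))
    return rewritten

print(root_at_x0([("M0", "READ", "M1")]))
# [('?x0', 'is', 'M0'), ('?x0', 'READ', 'M1')]
```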

For MCD1:

Here we explicitly listed 4 lexicons ("Did M", "Was M", "Were M", and "Was a") for this predicate.
This is because the first 3 lexicons can be found by data/generate_phrase_table.py, but the last one (i.e., "Was a") was missed, so we add it manually here.

For MCD2/3:

We found that performance improves significantly if we introduce more fine-grained predicates is_M0, is_M1, ..., is_M6, rather than just the single coarse predicate is_M.

For example, in this setup, the logical form will be:

?x0 is_M0
?x0 READ M1

This setup helps reduce the search space.

It brought a good performance gain on MCD2/3 while hardly affecting the performance on MCD1.
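Concretely, the effect on the search space can be illustrated like this (a simplified sketch in the spirit of the second snippet you quoted, not the repo's code):

```python
entities = ["M0", "M1", "M2", "M3", "M4", "M5", "M6"]

# Coarse predicate (MCD1-style): the sketch term "?x0 is M" has to be expanded
# against every entity, so every entity becomes a grounding candidate.
coarse_candidates = [f"?x0 is {e}" for e in entities]
print(len(coarse_candidates))  # 7 candidate groundings

# Fine-grained predicate (MCD2/3-style): the sketch term already names the
# entity, so it is kept as the single grounding candidate.
fine_candidates = ["?x0 is M0"]
print(len(fine_candidates))    # 1 candidate grounding
```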

Feel free to ping me if you have any more questions~

parasj commented Jan 27, 2021

Thank you @linzeqipku!

Another note -- we are struggling to replicate the results on the splits.

For example, in the paper, the sketch prediction module achieves 73% accuracy on the MCD3 split, but we struggle to consistently obtain above 50% when retraining the model with the default hyperparameters and number of training epochs in the code. The model appears to be very sensitive to the number of training epochs, and there is also a wide variance across initializations.

What hyperparameters did you use for each split, and how did you select them? We are currently using the default hparams for 'mcd1' in the code.

This work is very exciting, and we (@parasj and @GaiYu0) are interested in extending it. We would really appreciate your tips on how we can fix the training procedure.

@linzeqipku

@parasj
We used the default parameters.
It seems that the output of sketch_prediction/evaluate.sh is not what we want.
I've checked my logs: the output of sketch_prediction/evaluate.sh is ~0.3, while the overall accuracy is ~0.65 ...

I'll double-check with Yinuo (the first author) and reply to you later.

gyn0806 commented Jan 28, 2021

@parasj
Hi parasj, I'm Yinuo, thanks for your attention!
There's a bug in the code for calculating sketch accuracy in sketch_prediction/evaluate.sh, like this
?x0 P M . ?x0 a M
?x0 a M ### ?x0 P M

and results reported in our paper calculated by another script which is not included in this repo.
We will fix the bug as soon as possible, thanks a lot.
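For reference, an order-insensitive comparison along the following lines avoids the issue (only a sketch of the idea, assuming the problem is that equivalent sketches written with a different term order or separator are counted as wrong; this is not the script used for the paper's numbers):

```python
def sketch_terms(sketch: str) -> frozenset:
    """Split a sketch into its terms, ignoring term order and separator style."""
    normalized = sketch.replace("###", ".")          # treat " ### " like " . "
    return frozenset(t.strip() for t in normalized.split(".") if t.strip())

pred = "?x0 P M . ?x0 a M"
gold = "?x0 a M ### ?x0 P M"
print(sketch_terms(pred) == sketch_terms(gold))      # True: counted as a match
```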

parasj commented Feb 2, 2021

@gyn0806 @linzeqipku Thank you for your reply!

Another question -- I am trying to replicate the HPD test accuracy results on MCD1, MCD2, and MCD3, and I see that the results depend on which epoch's checkpoint is used for testing. Which epoch's checkpoint did you use for the test set evaluation on each split?

gyn0806 commented Feb 2, 2021

Hi parasj,
we select the model for inference with the code below, using the checkpoint that performs best on the dev set:
https://github.com/microsoft/ContextualSP/blob/36dc4abfa9e525a04328a17615665aa23dc0bb31/poset_decoding/sketch_prediction/main.py#L70-L74
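Schematically, the selection works like this (a minimal sketch, with the training/eval steps passed in as callables; not the exact code at the link above):

```python
def select_best_checkpoint(num_epochs, train_one_epoch, evaluate_dev, save_checkpoint):
    """Train for num_epochs and return the checkpoint that scores best on the dev set."""
    best_dev_acc, best_ckpt = 0.0, None
    for epoch in range(num_epochs):
        train_one_epoch(epoch)                 # one pass over the training data
        dev_acc = evaluate_dev()               # accuracy on the dev set
        if dev_acc > best_dev_acc:             # keep only the best-on-dev checkpoint
            best_dev_acc = dev_acc
            best_ckpt = save_checkpoint(epoch)
    return best_ckpt, best_dev_acc             # best_ckpt is then used for test inference
```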

@SivilTaram

@parasj Has Yinuo solved your problem?

parasj commented Mar 12, 2021

@SivilTaram Closing this issue -- we got the information we needed! Thank you :)

parasj closed this as completed Mar 12, 2021