Wip/synthetic data sampler #871
3c12623 to ba2a5de
fix fix
ba2a5de to decbbe7
I left some comments; let's get those addressed before moving on.
0e88633 to cfedd42
Overall, it looks good. There are some minor issues. Also, I just realized we should add support for input parameters for queries as well, but let's not worry about that in this PR. Let's get this merged first.
tool/sample-synthetic-data.ts
    return examples;
}

function generateActionExamplesByPOS(query : Ast.FunctionDef,
Rename `query` to `action` in this function?
fix comment out log and update default output path update generateActionExamplesByPOS fix generateActionAst
cfedd42 to 20a7d81
Overall, it looks good.
One minor issue: you still have a lot of `query` and `queryCanonical` inside the action functions, which will be confusing in the future. Please fix those.
Make sure you have eyeballed the output and that it looks good (or, even better, add some unit tests).
After that, you can merge this branch.
Background
This is a simple synthetic data sampler intended to help developers debug canonical annotations and make sure they fit well in the templates. Its main function is to synthesize a small set of utterance-ThingTalk pairs and showcase the synthetic sentences generated from the canonical annotations.
The output file is in TSV format, where each row contains the id, utterance, and ThingTalk representation for one example.
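To make the row layout concrete, here is a minimal sketch of how such a file could be read back for inspection. It assumes exactly three tab-separated columns per row, in the order described above, and no header row. The `SampledExample` interface and the `parseRow` and `parseTsv` helpers are hypothetical names for illustration; they are not part of this PR.

```typescript
// Hypothetical row shape, assumed from the description above:
// one id, one utterance, and one ThingTalk program per row.
interface SampledExample {
    id: string;
    utterance: string;
    thingtalk: string;
}

// Parse a single tab-separated line; reject rows that do not have
// exactly three fields so malformed output is noticed early.
function parseRow(line: string): SampledExample {
    const fields = line.split('\t');
    if (fields.length !== 3)
        throw new Error(`expected 3 tab-separated fields, got ${fields.length}`);
    const [id, utterance, thingtalk] = fields;
    return { id, utterance, thingtalk };
}

// Parse the whole file content, skipping blank lines
// (assumption: the file has no header row).
function parseTsv(content: string): SampledExample[] {
    return content
        .split(/\r?\n/)
        .filter((line) => line.trim().length > 0)
        .map(parseRow);
}
```

A reader like this could also serve as the basis for the unit test suggested in the review, for instance by checking that every sampled row parses and has a non-empty utterance.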
Output file example:
TODO
Known issues