Wip/synthetic data sampler #871

Merged: 6 commits, Mar 1, 2022

@jmhw0123 (Contributor) commented on Jan 31, 2022

Background

This is a simple synthetic data sampler that helps developers debug canonical annotations and check that they fit the templates. It synthesizes a small set of utterance-ThingTalk pairs so that the sentences generated from the canonical annotations can be inspected.
The output is a TSV file in which each row contains the id, utterance, and ThingTalk representation of one example.

Output file example:

yelp-000	What is the name of the restaurant?	[ id ] of @com.yelp . restaurant ( ) ;
yelp-001	What is the restaurant 's name?	[ id ] of @com.yelp . restaurant ( ) ;
yelp-002	What name does the restaurant have?	[ id ] of @com.yelp . restaurant ( ) ;
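
Because each row is three tab-separated fields, the output can be loaded for inspection with a few lines of TypeScript. The following is a minimal sketch, not code from this PR: the SyntheticExample type and the parseSyntheticTsv function are hypothetical names chosen for illustration, and the sketch assumes that no field contains a tab character.

```typescript
// Hypothetical shape of one row in the sampler's TSV output (not defined in this PR).
interface SyntheticExample {
    id : string;
    utterance : string;
    thingtalk : string;
}

// Parse TSV text into examples. Blank lines are skipped; a row that does not
// have exactly three tab-separated fields raises an error.
function parseSyntheticTsv(text : string) : SyntheticExample[] {
    const examples : SyntheticExample[] = [];
    for (const line of text.split(/\r?\n/)) {
        if (line.trim().length === 0)
            continue;
        const fields = line.split('\t');
        if (fields.length !== 3)
            throw new Error(`Expected 3 tab-separated fields, got ${fields.length}: ${line}`);
        examples.push({ id: fields[0], utterance: fields[1], thingtalk: fields[2] });
    }
    return examples;
}

// Two rows copied from the output file example above.
const sample = [
    "yelp-000\tWhat is the name of the restaurant?\t[ id ] of @com.yelp . restaurant ( ) ;",
    "yelp-001\tWhat is the restaurant 's name?\t[ id ] of @com.yelp . restaurant ( ) ;",
].join('\n');

const parsed = parseSyntheticTsv(sample);
for (const example of parsed)
    console.log(`${example.id}: ${example.utterance}`);
```

Throwing on a malformed row, rather than skipping it, makes a formatting problem in the sampler visible as soon as the file is read.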

TODO

  • Add support for generating synthetic data for specific input params

Known issues

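  • Some utterances are not generated correctly. In the examples below, the first two are ungrammatical ("with cheap", "has cheap") and the third still contains an unexpanded template placeholder: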
{
  query: 'restaurant',
  queryCanonical: 'restaurant',
  argument: 'price',
  utterance: 'Show me a restaurant with cheap.',
  value: 'cheap',
  paraphrases: []
},
{
  query: 'restaurant',
  queryCanonical: 'restaurant',
  argument: 'price',
  utterance: 'which restaurant has cheap?',
  value: 'cheap',
  paraphrases: []
},
{
  query: 'restaurant',
  queryCanonical: 'restaurant',
  argument: 'price',
  utterance: 'which restaurant is ${value:select: moderate{moderately priced} _{${value}}} luxury?',
  value: 'luxury',
  paraphrases: []
}
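
One of the failure modes above, the leftover ${...} placeholder, can be detected mechanically. The sketch below is an illustration under stated assumptions, not part of this PR: the SampledUtterance type mirrors the fields printed in the debug output, and hasUnexpandedPlaceholder is a hypothetical helper. It does not catch ungrammatical output such as "with cheap", which still needs manual review.

```typescript
// Hypothetical record shape, mirroring the fields in the debug output above.
interface SampledUtterance {
    query : string;
    queryCanonical : string;
    argument : string;
    utterance : string;
    value : string;
    paraphrases : string[];
}

// Matches the start of a template placeholder ("${") that survived generation.
const UNEXPANDED_PLACEHOLDER = /\$\{/;

// Returns true if the utterance still contains raw template syntax.
function hasUnexpandedPlaceholder(example : SampledUtterance) : boolean {
    return UNEXPANDED_PLACEHOLDER.test(example.utterance);
}

// Two of the records from the known issues above.
const sampled : SampledUtterance[] = [
    {
        query: 'restaurant',
        queryCanonical: 'restaurant',
        argument: 'price',
        utterance: 'which restaurant has cheap?',
        value: 'cheap',
        paraphrases: []
    },
    {
        query: 'restaurant',
        queryCanonical: 'restaurant',
        argument: 'price',
        utterance: 'which restaurant is ${value:select: moderate{moderately priced} _{${value}}} luxury?',
        value: 'luxury',
        paraphrases: []
    }
];

// Keep only the records whose utterance still contains template syntax.
const broken = sampled.filter(hasUnexpandedPlaceholder);
for (const example of broken)
    console.log(`Unexpanded placeholder for value "${example.value}": ${example.utterance}`);
```

A check like this could run over the sampler output before it is written, so that template expansion failures are flagged instead of ending up in the TSV file.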

@stanford-oval deleted a comment from lgtm-com bot on Feb 15, 2022
fix

fix
@sileix (Member) left a comment

I left some comments. Let's get those addressed before moving on.

Resolved review threads (outdated): tool/genie.ts (1), tool/synthetic-data-sampler.ts (7)
@jmhw0123 force-pushed the wip/synthetic-data-sampler branch 3 times, most recently from 0e88633 to cfedd42, on February 21, 2022 16:43
@jmhw0123 requested a review from sileix on February 22, 2022 01:07
@sileix (Member) left a comment

Overall it looks good. There are some minor issues. Also, I just realized we should add support for input params for queries as well, but let's not worry about that in this PR. Let's get this merged first.

Resolved review threads (outdated): tool/sample-synthetic-data.ts (2)
return examples;
}

function generateActionExamplesByPOS(query : Ast.FunctionDef,

Rename the query parameter to action in this function?

Resolved review thread: tool/sample-synthetic-data.ts
fix

comment out log and update default output path

update generateActionExamplesByPOS

fix generateActionAst
@sileix (Member) left a comment

Overall, it looks good.
One minor issue: there are still many uses of query and queryCanonical inside the action functions, which will be confusing in the future. Please fix those.
Make sure you have eyeballed the output and that it looks good (or, even better, add some unit tests).
After that, you can merge this branch.

@jmhw0123 merged commit 0846db3 into master on Mar 1, 2022
@jmhw0123 added a commit that referenced this pull request on Mar 30, 2022