Merge all the work we did for PLDI 19 #28
Merged
This command is useful for three reasons:
1. it evaluates exactly what the user types, and tests that everything actually works
2. it evaluates all beams returned by the server (usually 5)
3. it evaluates from a file rather than a fixed test set, so you can pass arbitrary tests
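As a rough illustration of the second and third points, the sketch below reads test cases from tab-separated text and scores every beam rather than only the first. The names `load_tests`, `evaluate_beams`, and the `parse` callback are hypothetical and are not taken from this pull request; the two-column layout (sentence, then expected program) is also an assumption.

```python
from typing import Callable, Dict, List, Tuple


def load_tests(tsv_text: str) -> List[Tuple[str, str]]:
    """Read (sentence, expected program) pairs from tab-separated text.

    Assumed layout: one test per line, sentence in column 1,
    expected program in column 2. Malformed lines are skipped.
    """
    tests = []
    for line in tsv_text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            tests.append((parts[0], parts[1]))
    return tests


def evaluate_beams(
    tests: List[Tuple[str, str]],
    parse: Callable[[str], List[str]],
) -> Dict[str, float]:
    """Score both the first beam and the full beam list.

    `parse` is a stand-in for the server call: it takes a sentence and
    returns the candidate programs, best first.
    """
    top1 = 0
    any_beam = 0
    for sentence, expected in tests:
        beams = parse(sentence)
        if beams and beams[0] == expected:
            top1 += 1
        if expected in beams:
            any_beam += 1
    total = len(tests)
    return {
        "total": total,
        "top1": top1 / total if total else 0.0,
        "any_beam": any_beam / total if total else 0.0,
    }
```

Reporting both numbers separates ranking errors (the right program is in the beam but not first) from outright parsing failures.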
"notify me when I have used more than a certain size on dropbox" "send an email to someone" "search something on bing" etc.
Make it usable only from policies, because get predicates are dangerously confusing when used from programs
Defaults to off, pass `--flag-set policies` on the command-line to enable.
All primitives go into the training set
So we are comparable with the old dataset
A hashtag is a single word
Stuff should be typechecked by now
If the user says "--quoted-probability 0.0" there should be no quoted data left
Only allow exact-type parameter passing, no weird casting
It happens often if you use an old Thingpedia snapshot
Otherwise with the old snapshot we don't generate enough
Type annotations are on the chopping block, and we want to be able to evaluate programs without type annotations
To the point that it starts
And add progress/metric extraction
We are not doing any training in this script; it's just for annotating sentences
decanlp can now read from tsv files directly
With some probability (configurable on the command line, defaults to 1%) use the fallback lists (tt:word, tt:short_free_text, tt:long_free_text) when replacing parameters. This increases the variance of the training set when specialized parameter lists are too short.
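A minimal sketch of this sampling rule, assuming the fallback lists are available as plain lists of strings. The function name `pick_parameter_value` and its arguments are hypothetical, and using the fallback list when the specialized list is empty is an extra assumption rather than something stated in the commit.

```python
import random
from typing import List, Optional


def pick_parameter_value(
    specialized: List[str],
    fallback: List[str],
    fallback_probability: float = 0.01,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick a replacement value for a parameter.

    With probability `fallback_probability` (1% by default) the value is
    drawn from the generic fallback list (for example tt:word) instead
    of the specialized list for that parameter.
    """
    rng = rng or random.Random()
    # Assumption: an empty specialized list always falls back.
    if not specialized or rng.random() < fallback_probability:
        return rng.choice(fallback)
    return rng.choice(specialized)
```

Because `rng.random()` returns a value in [0, 1), a probability of 0.0 never uses the fallback list and a probability of 1.0 always does.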
It returns a list of devices and a list of examples. almond-cloud/util/dataset.js takes the two lists, generates the final device list, and passes it to gen_cheatsheet.js
Also sort the list by device name
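The merge-and-sort step might look roughly like the sketch below. It is written in Python for illustration only and is not the code in dataset.js; the record shapes (`name` and `kind` on devices, `kind` and `utterance` on examples) and the function name `build_device_list` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def build_device_list(devices: List[Dict], examples: List[Dict]) -> List[Dict]:
    """Attach examples to their device and sort the result by device name.

    Assumed shapes: each device has "name" and "kind"; each example has
    "kind" (the device it belongs to) and "utterance".
    """
    by_kind = defaultdict(list)
    for example in examples:
        by_kind[example["kind"]].append(example["utterance"])

    merged = [
        {
            "name": device["name"],
            "kind": device["kind"],
            "examples": by_kind.get(device["kind"], []),
        }
        for device in devices
    ]
    # Case-insensitive sort so "bing" and "Bing" end up in the same place.
    return sorted(merged, key=lambda device: device["name"].lower())
```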
- It takes a list of picture URLs (for the different cheatsheets generated)
- The cheatsheet will be hidden once the turker starts answering questions
Some more options to control
To account for ambiguity
So we can run the spotify experiments, with a thingpedia that only includes spotify
Makes no difference, but it's good to do anyway
It must be under the if statement
Wait until tomorrow. I have some messy code for preparing the spotify/tacl/aggregation cheatsheet. I will clean it up and add it here.
We should fix the tests, I suspect 😂
Untyped strings don't help, and if anything they cause confusion in evaluation
gcampax added a commit that referenced this pull request on Jun 23, 2020
Builtin manifests should always use the embedded copy, to avoid mismatches and version skew between the client and the server. Fixes #28
A lot of work.
We'll need to update the documentation on reproducibility as well.
@rayslxu I am assuming you will not want to go through it fully, but if you want I'll wait for you before merging.