
Merge all the work we did for PLDI 19 #28

Merged 66 commits into master from wip/evaluate-server on Apr 3, 2019

Conversation

@gcampax (Contributor) commented on Apr 3, 2019

A lot of work.

We'll need to update the documentation on reproducibility as well.

@rayslxu I am assuming you will not want to go through it fully, but if you want I'll wait for you before merging.

gcampax and others added 30 commits February 24, 2019 21:22
This command is useful for three reasons:

1. it evaluates exactly what the user types, and tests that everything
   actually works
2. it evaluates all beams returned by the server (usually 5; see the
   sketch after this list)
3. it evaluates from a file rather than a fixed test set, so you can
   pass arbitrary tests
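As an illustration, a per-beam evaluation loop might look like the following sketch. The `/query` endpoint, the `candidates` field, and the `code` token array are assumptions made for the example, not the actual server API.

```js
// Hypothetical per-beam evaluation against a running parser server.
// Endpoint and response shape are assumed, not taken from this repo.
const fetch = require('node-fetch');

async function evaluateUtterance(serverUrl, utterance, expectedCode) {
    const res = await fetch(`${serverUrl}/query?q=${encodeURIComponent(utterance)}`);
    const { candidates } = await res.json();

    // Score every beam (usually 5), not just the top candidate, so we
    // know at which rank the correct parse appears, if it appears at all.
    return candidates.map((cand, rank) => ({
        rank,
        correct: cand.code.join(' ') === expectedCode,
    }));
}
```
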
"notify me when I have used more than a certain size on dropbox"
"send an email to someone"
"search something on bing"
etc.
Make it usable only from policies, because get predicates are
dangerously confusing in programs
Defaults to off; pass `--flag-set policies` on the command line
to enable.
All primitives go into the training set
So we are comparable with the old dataset
A hashtag is a single word
If the user passes `--quoted-probability 0.0`, there should be no
quoted data left
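A minimal sketch of what such a knob could control; the helper below and its wiring are assumptions, not the real implementation.

```js
// With probability quotedProbability, keep the quoted span verbatim;
// otherwise substitute a sampled replacement value. At 0.0 every quoted
// span is replaced, so no quoted data survives augmentation.
function sampleParameterValue(quotedProbability, quotedSpan, replacements, rng = Math.random) {
    if (rng() < quotedProbability)
        return `"${quotedSpan}"`;
    return replacements[Math.floor(rng() * replacements.length)];
}
```
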
Only allow exact-type parameter passing, no weird casting
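Illustrative only: what "exact-type parameter passing" amounts to is an assumption based on the commit message above, with types modeled as plain strings.

```js
// Exact match only: 'String' -> 'String' is allowed, while a castable
// pair such as 'Entity(tt:hashtag)' -> 'String' is rejected.
function canPassParameter(outputType, inputType) {
    return outputType === inputType;
}
```
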
It happens often if you use an old Thingpedia snapshot
Otherwise with the old snapshot we don't generate enough
Type annotations are on the chopping block, and we want to be able
to evaluate programs without type annotations
To the point that it starts
And add progress/metric extraction
We are not doing any training in this script; it's just for annotating sentences
decanlp can now read from tsv files directly
With some probability (configurable on the command line, defaults to 1%)
use the fallback lists (tt:word, tt:short_free_text, tt:long_free_text)
when replacing parameters.

This increases the variance of the training set when specialized
parameter lists are too short.
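A sketch of that fallback behavior, assuming a hypothetical `paramLists` map from list names to arrays of values; only the three list names and the 1% default come from the commit message above.

```js
const FALLBACK_LISTS = ['tt:word', 'tt:short_free_text', 'tt:long_free_text'];

function chooseValueList(paramLists, paramName, fallbackProbability = 0.01, rng = Math.random) {
    // With the configured probability, ignore the specialized list and
    // sample a generic fallback list instead; this adds variance when
    // the specialized parameter list is too short.
    if (rng() < fallbackProbability || !(paramName in paramLists)) {
        const name = FALLBACK_LISTS[Math.floor(rng() * FALLBACK_LISTS.length)];
        return paramLists[name];
    }
    return paramLists[paramName];
}
```
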
sileix and others added 19 commits March 13, 2019 15:04
It returns a list of devices and a list of examples.
almond-cloud/util/dataset.js takes the two lists, generates the final device list,
and passes it to gen_cheatsheet.js
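A rough sketch of that flow; the exports `loadDataset` and `genCheatsheet` are hypothetical stand-ins, only the module roles are described in the commit.

```js
const { loadDataset } = require('./util/dataset');     // hypothetical export
const { genCheatsheet } = require('./gen_cheatsheet'); // hypothetical export

async function buildCheatsheet() {
    // util/dataset.js returns a list of devices and a list of examples
    const { devices, examples } = await loadDataset();

    // combine the two into the final device list, grouping each device
    // with its own examples...
    const deviceList = devices.map((device) => ({
        ...device,
        examples: examples.filter((ex) => ex.device === device.kind),
    }));

    // ...and hand it to gen_cheatsheet.js for rendering
    return genCheatsheet(deviceList);
}
```
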
- It takes a list of picture URLs (for the different cheatsheets generated)
- The cheatsheet will be hidden once the turker starts answering questions
So we can run the Spotify experiments, with a Thingpedia that only
includes Spotify
Makes no difference, but it's good to do anyway
It must be under the if statement
@sileix (Member) commented on Apr 3, 2019

> A lot of work.
>
> We'll need to update the documentation on reproducibility as well.
>
> @rayslxu I am assuming you will not want to go through it fully, but if you want I'll wait for you before merging.

Wait until tomorrow. I have some messy code for preparing the spotify/tacl/aggregation cheatsheets. I will clean it up and add it here.

@gcampax (Contributor, Author) commented on Apr 3, 2019

We should fix the tests I suspect 😂

@gcampax gcampax merged commit dd7f383 into master Apr 3, 2019
@gcampax gcampax deleted the wip/evaluate-server branch April 3, 2019 22:11
gcampax added a commit that referenced this pull request Jun 23, 2020
Builtin manifests should always use the embedded copy, to avoid
mismatches and version skew between the client and the server.

Fixes #28