
Merge all the work we did for PLDI 19 #28

Merged 66 commits into master from wip/evaluate-server on Apr 3, 2019

Conversation

@gcampax (Contributor) commented on Apr 3, 2019

A lot of work.

We'll need to update the documentation on reproducibility as well.

@rayslxu I am assuming you will not want to go through it fully, but if you want I'll wait for you before merging.

gcampax and others added 30 commits February 24, 2019 21:22
This command is useful for three reasons:

1. it evaluates exactly what the user types, and tests that everything
   actually works
2. it evaluates all beams returned by the server (usually 5; see the
   sketch after this list)
3. it evaluates from a file rather than a fixed test set, so you can
   pass arbitrary tests
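As an illustration, a per-beam evaluation loop might look like the following sketch. The `/query` endpoint, the `candidates` field, and the `code` token array are assumptions made for the example, not the actual server API.

```js
// Hypothetical per-beam evaluation against a running parser server.
// Endpoint and response shape are assumed, not taken from this repo.
const fetch = require('node-fetch');

async function evaluateUtterance(serverUrl, utterance, expectedCode) {
    const res = await fetch(`${serverUrl}/query?q=${encodeURIComponent(utterance)}`);
    const { candidates } = await res.json();

    // Score every beam (usually 5), not just the top candidate, so we
    // know at which rank the correct parse appears, if it appears at all.
    return candidates.map((cand, rank) => ({
        rank,
        correct: cand.code.join(' ') === expectedCode,
    }));
}
```
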
"notify me when I have used more than a certain size on dropbox"
"send an email to someone"
"search something on bing"
etc.
Make it usable only from policies, because get predicates are
dangerously confusing in programs
Defaults to off; pass `--flag-set policies` on the command line
to enable.
All primitives go into the training set
So we are comparable with the old dataset
A hashtag is a single word
If the user passes `--quoted-probability 0.0`, there should be no
quoted data left
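A minimal sketch of what such a knob could control; the helper below and its wiring are assumptions, not the real implementation.

```js
// With probability quotedProbability, keep the quoted span verbatim;
// otherwise substitute a sampled replacement value. At 0.0 every quoted
// span is replaced, so no quoted data survives augmentation.
function sampleParameterValue(quotedProbability, quotedSpan, replacements, rng = Math.random) {
    if (rng() < quotedProbability)
        return `"${quotedSpan}"`;
    return replacements[Math.floor(rng() * replacements.length)];
}
```
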
Only allow exact-type parameter passing, no weird casting
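Illustrative only: what "exact-type parameter passing" amounts to is an assumption based on the commit message above, with types modeled as plain strings.

```js
// Exact match only: 'String' -> 'String' is allowed, while a castable
// pair such as 'Entity(tt:hashtag)' -> 'String' is rejected.
function canPassParameter(outputType, inputType) {
    return outputType === inputType;
}
```
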
It happens often if you use an old Thingpedia snapshot
Otherwise with the old snapshot we don't generate enough
Type annotations are on the chopping block, and we want to be able
to evaluate programs without type annotations
To the point that it starts
And add progress/metric extraction
We are not doing any training in this script; it's just for annotating sentences
decanlp can now read from tsv files directly
With some probability (configurable on the command line, defaults to 1%)
use the fallback lists (tt:word, tt:short_free_text, tt:long_free_text)
when replacing parameters.

This increases the variance of the training set when specialized
parameter lists are too short.
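A sketch of that fallback behavior, assuming a hypothetical `paramLists` map from list names to arrays of values; only the three list names and the 1% default come from the commit message above.

```js
const FALLBACK_LISTS = ['tt:word', 'tt:short_free_text', 'tt:long_free_text'];

function chooseValueList(paramLists, paramName, fallbackProbability = 0.01, rng = Math.random) {
    // With the configured probability, ignore the specialized list and
    // sample a generic fallback list instead; this adds variance when
    // the specialized parameter list is too short.
    if (rng() < fallbackProbability || !(paramName in paramLists)) {
        const name = FALLBACK_LISTS[Math.floor(rng() * FALLBACK_LISTS.length)];
        return paramLists[name];
    }
    return paramLists[paramName];
}
```
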
sileix and others added 19 commits March 13, 2019 15:04
It returns a list of devices and a list of examples.
almond-cloud/util/dataset.js takes the two lists, generates the final device list,
and passes it to gen_cheatsheet.js
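A rough sketch of that flow; the exports `loadDataset` and `genCheatsheet` are hypothetical stand-ins, only the module roles are described in the commit.

```js
const { loadDataset } = require('./util/dataset');     // hypothetical export
const { genCheatsheet } = require('./gen_cheatsheet'); // hypothetical export

async function buildCheatsheet() {
    // util/dataset.js returns a list of devices and a list of examples
    const { devices, examples } = await loadDataset();

    // combine the two into the final device list, grouping each device
    // with its own examples...
    const deviceList = devices.map((device) => ({
        ...device,
        examples: examples.filter((ex) => ex.device === device.kind),
    }));

    // ...and hand it to gen_cheatsheet.js for rendering
    return genCheatsheet(deviceList);
}
```
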
- It takes a list of picture URLs (for the different cheatsheets generated)
- The cheatsheet will be hidden once the turker starts answering questions
So we can run the Spotify experiments, with a Thingpedia that only
includes Spotify
Makes no difference, but it's good to do anyway
It must be under the if statement
@sileix (Member) commented on Apr 3, 2019

> A lot of work.
>
> We'll need to update the documentation on reproducibility as well.
>
> @rayslxu I am assuming you will not want to go through it fully, but if you want I'll wait for you before merging.

Wait until tomorrow. I have some messy code for preparing the spotify/tacl/aggregation cheatsheets. I will clean it up and add it here.

@gcampax (Contributor, Author) commented on Apr 3, 2019

We should fix the tests I suspect 😂

@gcampax gcampax merged commit dd7f383 into master Apr 3, 2019
@gcampax gcampax deleted the wip/evaluate-server branch April 3, 2019 22:11
gcampax added a commit that referenced this pull request Jun 23, 2020
Builtin manifests should always use the embedded copy, to avoid
mismatches and version skew between the client and the server.

Fixes #28