This repo contains data for plot formatting actions in VegaLite, which you can see in the viewer. You can find the processed data in the releases.
To see the last five lines of processed data, try `jq . plot-data.sample.jsonl`.
The processed data is inside `./data`, which is generated by `make` from the content of `hits`.
- URL of processed data: https://raw.githubusercontent.com/stanfordnlp/plot-data/master/data/plot-data.jsonl
- The `contextId` field of `plot-data.jsonl` corresponds to items in `contexts.json`, where 47 different context plots from VegaLite examples are used.
- contexts: https://raw.githubusercontent.com/stanfordnlp/plot-data/master/data/contexts.json
- statistics: https://raw.githubusercontent.com/stanfordnlp/plot-data/master/data/stats.json
- query log form: https://raw.githubusercontent.com/stanfordnlp/plot-data/master/data/query.jsonl
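For orientation, here is a minimal sketch of loading the processed data and joining each record with its context plot. It assumes only what the notes above state: each line of `plot-data.jsonl` carries a `contextId` that indexes into `contexts.json`; everything else about the record layout is an assumption, so adjust to the real schema.

```python
import json

# Sketch: join processed records with their context plots.
# Assumption: contexts.json can be indexed directly by contextId
# (a list position or an id key); adjust to the real layout.
with open("data/contexts.json") as f:
    contexts = json.load(f)

with open("data/plot-data.jsonl") as f:
    for line in f:
        record = json.loads(line)
        context = contexts[record["contextId"]]  # one of the 47 VegaLite context plots
        # ... use `record` together with its `context` here ...
```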
- First, deploy speaker HITs: `python mturk/create_speaker_hit.py --num-hit 10 --num-assignment 5`, optionally with `--is-sandbox`. This creates `hits/<timestamp>/speaker.HITs.txt` and `speaker.sample_hit`, and deploys the HITs. Note that assignment ids are only available once someone works on the HIT.
- Run `make speaker.assignments` to check if these are completed.
- In the `Makefile`, set the `SPEAKER_EXEC` variable to where the server log is located. Run `make speaker.jsonl` to filter and process the data, and `make speaker.review` to approve and reject HITs (a sketch of this kind of review step follows this list).
- Restart the server and use the previous speaker data as `VegaResources.examplesPath`, which selects randomly from the specified examples as the listeners.
- Run `python mturk/create_listener_hit.py hits/SPEAKER_HIT --num-hit 10 --num-assignment 5`, optionally with `--is-sandbox`.
- Wait for these HITs to complete; run `make listener.assignments` to check and `make listener.review` to approve.
- Set `LISTENER_EXEC` as well, and run `make speaker.listener.jsonl` to process the data. Alternatively, wait for both speaker and listener HITs to complete, and run `make visualize`.
- There seems to be some need to inspect `speaker.status` to make sure there are no incorrect rejections, and no new weird spam, before deploying to the listener. This prevents the process from being fully automated.
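For reference, a review step like `make speaker.review` ultimately approves or rejects individual assignments through the MTurk API. Below is a minimal `boto3` sketch of that pattern, not the repo's actual implementation: `looks_like_spam` is a hypothetical stand-in for whatever criteria the `speaker.status` inspection applies, and the HITs file path is the placeholder created by the deploy step above.

```python
import boto3

# Sketch of an assignment-review step (not the repo's actual code).
SANDBOX = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
client = boto3.client("mturk", endpoint_url=SANDBOX)  # drop endpoint_url for production

def looks_like_spam(assignment):
    """Hypothetical filter; the real criteria live in the repo's processing."""
    return False

# Assumption: one HIT id per line, as produced by the deploy step above.
with open("hits/<timestamp>/speaker.HITs.txt") as f:
    hit_ids = [line.strip() for line in f if line.strip()]

for hit_id in hit_ids:
    resp = client.list_assignments_for_hit(HITId=hit_id, AssignmentStatuses=["Submitted"])
    for assignment in resp["Assignments"]:
        if looks_like_spam(assignment):
            client.reject_assignment(
                AssignmentId=assignment["AssignmentId"],
                RequesterFeedback="Answer did not follow the instructions.",
            )
        else:
            client.approve_assignment(AssignmentId=assignment["AssignmentId"])
```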
Some useful `jq` one-liners for inspecting the data:

```sh
# Keep only the accepted responses from the raw speaker data
jq -c 'if .q[0]=="accept" then .q[1] else empty end' speaker.raw.jsonl
# Pull out just the utterances from the query log
cat data/query.json | jq -c '.q[1].utterance'
```
Use `split_data.py` to split the data into train/test (no dev split, since all the Turk data is dev data):
```sh
python split_data.py randomWithNoCanon.jsonl randomWithNoCanon_splitIndep   # Split each example separately
python split_data.py -s randomWithNoCanon.jsonl randomWithNoCanon_splitSess # Split by sessionId == MTurk ID
```
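For intuition, here is a minimal sketch of what the session-grouped split does, assuming each JSONL record carries a `sessionId` field as the comment above suggests; this is an illustration, not the actual `split_data.py`, and the fraction and seed are made-up defaults.

```python
import json
import random

def split_by_session(path, test_fraction=0.2, seed=0):
    """Sketch of a session-grouped train/test split (assumes a sessionId field)."""
    with open(path) as f:
        records = [json.loads(line) for line in f]
    sessions = sorted({r["sessionId"] for r in records})
    random.Random(seed).shuffle(sessions)
    test_sessions = set(sessions[: int(len(sessions) * test_fraction)])
    train = [r for r in records if r["sessionId"] not in test_sessions]
    test = [r for r in records if r["sessionId"] in test_sessions]
    return train, test

train, test = split_by_session("randomWithNoCanon.jsonl")
```

Grouping by `sessionId` keeps all examples from one Turker in the same split, so evaluation measures generalization to unseen workers rather than to new utterances from workers already seen in training.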