Added script and instructions for downloading corenlp. Added official_eval.py to default download.sh for convenience.
facebookresearch · Jul 28, 2017 · 9c851d1
1 parent 91fab68
commit 9c851d1

Showing 5 changed files with 64 additions and 1 deletion.
diff --git a/.gitignore b/.gitignore
@@ -3,4 +3,5 @@
 *~
 data/
 *.tar.gz
-*.egg-info
+*.egg-info
+scripts/reader/official_eval.py
diff --git a/README.md b/README.md
@@ -114,6 +114,21 @@ drqa.tokenizer.set_default('corenlp_classpath', '/your/corenlp/classpath/*')
 
 Ex: `export CLASSPATH=$CLASSPATH:/path/to/corenlp/download/*`.
 
+If you do not already have a CoreNLP [download](https://stanfordnlp.github.io/CoreNLP/index.html#download), you can run:
+
+```bash
+./install_corenlp.sh
+```
+
+_The script prompts for a download location and defaults to `data/corenlp`._
+
+Verify that it runs:
+```python
+from drqa.tokenizers import CoreNLPTokenizer
+tok = CoreNLPTokenizer()
+tok.tokenize('hello world').words()  # Should complete immediately
+```
+
 For convenience, the Document Reader, Retriever, and Pipeline modules will try to load default models if no model argument is given. See below for downloading these models.
 
 ### Trained Models and Data
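
Before running the tokenizer check from the README hunk above, it can help to confirm that `CLASSPATH` really resolves to CoreNLP jars. The following is a minimal sketch, not part of DrQA: the helper name `find_corenlp_jars` is hypothetical, and it assumes the jars keep the `stanford-corenlp` filename prefix used in the CoreNLP distribution.

```python
import glob
import os


def find_corenlp_jars(classpath):
    """Return CoreNLP jar files reachable from a CLASSPATH-style string.

    Hypothetical helper. Handles two kinds of entries: Java wildcard
    entries such as '/path/to/jars/*' and explicit '.jar' paths.
    """
    jars = []
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        if entry.endswith('*'):
            # A wildcard entry stands for every .jar in that directory.
            directory = os.path.dirname(entry)
            candidates = glob.glob(os.path.join(directory, '*.jar'))
        elif entry.endswith('.jar'):
            candidates = [entry]
        else:
            candidates = []
        jars.extend(
            path for path in candidates
            # Assumption: CoreNLP jars are named 'stanford-corenlp*.jar'.
            if os.path.basename(path).startswith('stanford-corenlp')
            and os.path.isfile(path)
        )
    return sorted(jars)


if __name__ == '__main__':
    found = find_corenlp_jars(os.environ.get('CLASSPATH', ''))
    print('CoreNLP jars on CLASSPATH:', found or 'none found')
```

An empty result suggests the `export CLASSPATH=...` line has not been applied in the current shell.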

diff --git a/download.sh b/download.sh
@@ -37,6 +37,9 @@ python scripts/convert/squad.py "$DATASET_PATH/SQuAD-v1.1-train.json" "$DATASET_
 wget -O "$DATASET_PATH/SQuAD-v1.1-dev.json" "https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json"
 python scripts/convert/squad.py "$DATASET_PATH/SQuAD-v1.1-dev.json" "$DATASET_PATH/SQuAD-v1.1-dev.txt"
 
+# Download official eval for SQuAD
+curl "https://worksheets.codalab.org/rest/bundles/0xbcd57bee090b421c982906709c8c27e1/contents/blob/" > "./scripts/reader/official_eval.py"
+
 # Get WebQuestions train
 wget -O "$DATASET_PATH/WebQuestions-train.json.bz2" "http://nlp.stanford.edu/static/software/sempre/release-emnlp2013/lib/data/webquestions/dataset_11/webquestions.examples.train.json.bz2"
 bunzip2 -f "$DATASET_PATH/WebQuestions-train.json.bz2"

diff --git a/install_corenlp.sh b/install_corenlp.sh
@@ -0,0 +1,38 @@
+#!/bin/bash
+# Copyright 2017-present, Facebook, Inc.
+# All rights reserved.
+#
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+
+set -e
+
+# Default to downloading into data/corenlp
+read -p "Specify download path or enter to use default (data/corenlp): " path
+DOWNLOAD_PATH="${path:-data/corenlp}"
+echo "Will download to: $DOWNLOAD_PATH"
+
+# Download zip, unzip
+pushd "/tmp"
+wget -O "stanford-corenlp-full-2017-06-09.zip" "http://nlp.stanford.edu/software/stanford-corenlp-full-2017-06-09.zip"
+unzip "stanford-corenlp-full-2017-06-09.zip"
+rm "stanford-corenlp-full-2017-06-09.zip"
+popd
+
+# Put jars in DOWNLOAD_PATH
+mkdir -p "$DOWNLOAD_PATH"
+mv "/tmp/stanford-corenlp-full-2017-06-09/"*".jar" "$DOWNLOAD_PATH/"
+
+# Append to bashrc, instructions
+while read -p "Add to ~/.bashrc CLASSPATH (recommended)? [yes/no]: " choice; do
+    case "$choice" in
+        yes )
+            echo "export CLASSPATH=\$CLASSPATH:$DOWNLOAD_PATH/*" >> ~/.bashrc;
+            break ;;
+        no )
+            break ;;
+        * ) echo "Please answer yes or no." ;;
+    esac
+done
+
+printf "\n*** NOW RUN: ***\n\nexport CLASSPATH=\$CLASSPATH:$DOWNLOAD_PATH/*\n\n****************\n"
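
One caveat with the default `data/corenlp`: it is a relative path, and the script writes `$DOWNLOAD_PATH` into `~/.bashrc` exactly as entered, so the resulting `CLASSPATH` entry only resolves when Java is started from the repository root. A minimal sketch of building the export line from an absolute path instead is shown below; the helper name `classpath_export_line` is hypothetical and not part of this commit.

```python
import os


def classpath_export_line(download_path):
    """Build the shell export line using an absolute jar directory.

    Hypothetical helper: expands '~' and resolves relative paths against
    the current working directory before appending the Java wildcard.
    """
    absolute_dir = os.path.abspath(os.path.expanduser(download_path))
    return 'export CLASSPATH=$CLASSPATH:{}'.format(
        os.path.join(absolute_dir, '*'))


if __name__ == '__main__':
    # Run from the repository root to mirror the script's default.
    print(classpath_export_line('data/corenlp'))
```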
diff --git a/scripts/reader/README.md b/scripts/reader/README.md
@@ -132,6 +132,12 @@ Optional arguments:
 
 Note: The CoreNLP NER annotator is not fully deterministic (depends on the order examples are processed). Predictions may fluctuate very slightly between runs if `num-workers` > 1 and the model was trained with `use-ner` on.
 
+Evaluation is done with the `official_eval.py` script from the SQuAD creators, available [here](https://worksheets.codalab.org/rest/bundles/0xbcd57bee090b421c982906709c8c27e1/contents/blob/). It is also available by default at `scripts/reader/official_eval.py` after running `./download.sh`.
+
+```bash
+python scripts/reader/official_eval.py /path/to/format/B/dataset.json /path/to/predictions/with/--official/flag/set.json
+```
+
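
For readers unfamiliar with the metrics, the sketch below illustrates the kind of computation SQuAD v1.1 evaluation performs: answers are normalized, then scored by exact match and token-level F1, taking the best score over all reference answers. It is a simplified illustration written for this note, not the contents of `official_eval.py`; the function names are assumptions, and the official script remains the authority for reported numbers.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, squeeze spaces."""
    text = text.lower()
    text = ''.join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r'\b(a|an|the)\b', ' ', text)
    return ' '.join(text.split())


def exact_match(prediction, ground_truth):
    """True when both strings are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def f1(prediction, ground_truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_ground_truths(metric, prediction, ground_truths):
    """Score a prediction against its best-matching reference answer."""
    return max(metric(prediction, truth) for truth in ground_truths)


if __name__ == '__main__':
    refs = ['Eiffel Tower', 'the Eiffel Tower in Paris']
    print(best_over_ground_truths(exact_match, 'The Eiffel Tower!', refs))
    print(best_over_ground_truths(f1, 'Eiffel Tower Paris', refs))
```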
 ## Interactive
 
 The Document Reader can also be used interactively (like the [full pipeline](../../README.md#quick-start-demo)).