Switching to a fractional split between test and training data (#30)
mtlynch committed May 1, 2018
1 parent f7d196a commit d3d8c20
Showing 4 changed files with 1,375 additions and 1,379 deletions.
6 changes: 4 additions & 2 deletions test_e2e
@@ -11,8 +11,10 @@ set -e
 set -x
 
 LABELLED_DATA_FILE=nyt-ingredients-snapshot-2015.csv
-COUNT_TRAIN=20000
-COUNT_TEST=2000
+LABELLED_EXAMPLE_COUNT=22000
+TRAINING_DATA_PERCENT=0.9
+COUNT_TRAIN=$(python -c "print int($TRAINING_DATA_PERCENT * $LABELLED_EXAMPLE_COUNT)")
+COUNT_TEST=$(python -c "print int((1.0 - $TRAINING_DATA_PERCENT) * $LABELLED_EXAMPLE_COUNT)")
 OUTPUT_DIR=$(mktemp -d)
 CRF_TRAINING_FILE="${OUTPUT_DIR}/training_data.crf"
 CRF_TESTING_FILE="${OUTPUT_DIR}/testing_data.crf"
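
A note on the split arithmetic: the test count comes out as 2199 rather than 2200, which matches the new `total: 2199` in the golden file. The likely cause is floating-point truncation, since `1.0 - 0.9` is slightly less than 0.1 and `int()` rounds toward zero. The sketch below reproduces the two computations in Python 3 syntax (the script itself uses the Python 2 `print` statement). The remainder-based variant is a hypothetical alternative for illustration, not something the commit does.

```python
# Python 3 sketch of the split arithmetic from test_e2e.
LABELLED_EXAMPLE_COUNT = 22000
TRAINING_DATA_PERCENT = 0.9

# Same expressions as the script's COUNT_TRAIN and COUNT_TEST.
count_train = int(TRAINING_DATA_PERCENT * LABELLED_EXAMPLE_COUNT)
count_test = int((1.0 - TRAINING_DATA_PERCENT) * LABELLED_EXAMPLE_COUNT)

# Hypothetical alternative (not in the commit): take the test count as the
# remainder, so the two counts always sum to the labelled example count.
count_test_remainder = LABELLED_EXAMPLE_COUNT - count_train

print(count_train, count_test, count_test_remainder)  # 19800 2199 2200
```

With the script's expressions, one of the 22,000 labelled examples ends up in neither set.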
12 changes: 6 additions & 6 deletions tests/golden/eval_output
@@ -1,10 +1,10 @@
 
 Sentence-Level Stats:
-correct: 1854
-total: 2000
-% correct: 92.7
+correct: 2044
+total: 2199
+% correct: 92.9513415189
 
 Word-Level Stats:
-correct: 11252
-total: 11459
-% correct: 98.1935596474
+correct: 12328
+total: 12547
+% correct: 98.2545628437
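
As a sanity check on the updated golden values, each percentage can be recomputed from its `correct` and `total` counts. This is a minimal sketch in Python 3; `percent_correct` is a hypothetical helper name used only here, not a function from the repository.

```python
def percent_correct(correct, total):
    """Return the share of correct items as a percentage."""
    return 100.0 * correct / total

# Counts taken from the new golden eval_output.
sentence_pct = percent_correct(2044, 2199)
word_pct = percent_correct(12328, 12547)

print(round(sentence_pct, 4), round(word_pct, 4))
```

Both results agree with the golden file's figures to within rounding.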
