
Error Analysis #31

Closed
3 of 5 tasks
j6mes opened this issue Feb 9, 2018 · 2 comments

Comments


j6mes commented Feb 9, 2018

- How often did DR (document retrieval) return the right page?
- How often did SR (sentence retrieval) return the right page?
- How often did SR return the original evidence?
- For the cases where SR returned different evidence: how do the BLEU/ROUGE similarities between the claim and the returned evidence compare with those between the claim and the gold evidence? (A sketch of this comparison follows the list.)
- Error coding scheme
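
A minimal sketch of the BLEU/ROUGE comparison in the item above, assuming NLTK is available; the helper names and the `similarity_gap` output format are illustrative, not the repository's evaluation code:

```python
from nltk.tokenize import word_tokenize
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction


def bleu(claim, evidence):
    # Sentence-level BLEU with smoothing, since individual evidence sentences are short.
    return sentence_bleu([word_tokenize(evidence.lower())],
                         word_tokenize(claim.lower()),
                         smoothing_function=SmoothingFunction().method1)


def rouge1_f1(claim, evidence):
    # Unigram-overlap approximation of ROUGE-1 F1, kept dependency-free.
    c, e = set(word_tokenize(claim.lower())), set(word_tokenize(evidence.lower()))
    overlap = len(c & e)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(c), overlap / len(e)
    return 2 * precision * recall / (precision + recall)


def similarity_gap(claim, retrieved_evidence, gold_evidence):
    # Compare claim-vs-returned-evidence similarity with claim-vs-gold-evidence similarity.
    return {
        "bleu_retrieved": bleu(claim, retrieved_evidence),
        "bleu_gold": bleu(claim, gold_evidence),
        "rouge1_retrieved": rouge1_f1(claim, retrieved_evidence),
        "rouge1_gold": rouge1_f1(claim, gold_evidence),
    }
```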

j6mes commented Feb 10, 2018

| Metric | NLTK | DrQA Sents (precomputed IDF) | DrQA Sents (new IDF) |
| --- | --- | --- | --- |
| Runtime | 2 hours | 10 hours | 12 hours |
| Strict accuracy (strict requirement for correct evidence) | 0.2476 | 0.1827 | 0.2698 |
| Classification accuracy (without need for evidence) | 0.4885 | 0.4588 | 0.4922 |
| Correct document return rate (dmatch) | 0.5793 | 0.5893 | 0.5893 |
| Correct document return rate after sentence selection (smatch) | 0.4773 | 0.2690 | 0.5596 |
| Correct text return rate (for Refutes/Supports) | 0.3647 | 0.1083 | 0.4680 |


j6mes commented Feb 10, 2018

@andreasvlachos using DrQA instead of NLTK for sentence selection gives us about a 2-point boost in strict accuracy, at the cost of an extra 10 hours of runtime. The dmatch and smatch figures give upper bounds on strict accuracy (considering the Supported/Refuted classes). With DrQA (new IDF), the correct document is still in the evidence after sentence selection 56% of the time (smatch 0.5596), whereas with NLTK this holds only 48% of the time (0.4773).
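
A rough sketch of how these figures relate, assuming a per-claim record format (gold/retrieved documents, gold/selected sentences, gold/predicted label) that is hypothetical rather than the repository's actual schema; NEI claims, which need no evidence, are ignored here for simplicity:

```python
def evaluate(records):
    """records: list of dicts with keys gold_docs, retrieved_docs, gold_sents,
    selected_sents, label, predicted_label (hypothetical schema)."""
    n = len(records)
    # dmatch: a gold document survives document retrieval.
    dmatch = sum(bool(set(r["gold_docs"]) & set(r["retrieved_docs"])) for r in records) / n
    # smatch: a gold sentence survives sentence selection.
    smatch = sum(bool(set(r["gold_sents"]) & set(r["selected_sents"])) for r in records) / n
    # Classification accuracy ignores evidence entirely.
    accuracy = sum(r["label"] == r["predicted_label"] for r in records) / n
    # Strict accuracy needs the right label AND a gold sentence in the selection,
    # so it can never exceed smatch (and, in this pipeline, smatch cannot exceed dmatch).
    strict = sum(r["label"] == r["predicted_label"]
                 and bool(set(r["gold_sents"]) & set(r["selected_sents"]))
                 for r in records) / n
    return {"dmatch": dmatch, "smatch": smatch, "accuracy": accuracy, "strict": strict}
```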

@j6mes j6mes closed this as completed Apr 10, 2018