Analyze PennText accuracy on 200 DOIs, Select 500 for additional calls #19

dhimmel · 2018-01-10T20:22:24Z

Refs #15.

Analyzes accuracy on 200 DOIs (100 where PennText was true, 100 where PennText was false).

Selects 500 DOIs for an expanded manual assessment, stratified on PennText to match the proportion in the entire DOI set. Reuses as many DOIs with existing calls as possible.
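For readers who want a concrete picture of the selection step, here is a minimal sketch of proportionate stratified sampling that prefers DOIs which already have manual calls. It is not the notebook's code: the function name `select_stratified`, its parameters, and the input shapes (a dict of DOI to PennText boolean, a set of already-called DOIs) are all assumptions made for illustration.

```python
import random


def select_stratified(all_dois, already_called, n_total=500, seed=0):
    """Pick n_total DOIs, stratified on the PennText indicator.

    all_dois: dict mapping DOI -> bool (hypothetical PennText indicator).
    already_called: set of DOIs that already have manual calls.
    """
    rng = random.Random(seed)

    # Quotas mirror the proportion of PennText == True in the full DOI set.
    frac_true = sum(all_dois.values()) / len(all_dois)
    quota = {True: round(n_total * frac_true)}
    quota[False] = n_total - quota[True]

    selected = []
    for stratum in (True, False):
        pool = sorted(doi for doi, flag in all_dois.items() if flag == stratum)
        called = [doi for doi in pool if doi in already_called]
        fresh = [doi for doi in pool if doi not in already_called]
        rng.shuffle(called)
        rng.shuffle(fresh)
        # Reuse DOIs with calls first, then fill the rest of the quota at random.
        # If a stratum is smaller than its quota, fewer DOIs are returned.
        selected.extend((called + fresh)[:quota[stratum]])
    return selected


# Toy data: 1000 made-up DOIs, 80% with PennText == True.
example_dois = {f"10.1000/{i}": i % 5 != 0 for i in range(1000)}
example_called = {f"10.1000/{i}" for i in range(0, 500, 5)}   # 100 False
example_called |= {f"10.1000/{i}" for i in range(1, 500, 5)}  # 100 True
example_selection = select_stratified(example_dois, example_called)
```

In this toy setup every previously called DOI fits inside its stratum's quota, so all of them are reused. When a stratum holds more called DOIs than its quota allows, the sketch keeps a random subset of them.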

Todo:

Switch notebook to project environment

dhimmel · 2018-01-10T20:34:49Z

Note that the overall inaccuracy of PennText calls is only 12.4%. This is because accuracy is 94% when PennText is true, and most DOIs have PennText == true.
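The reasoning here is a weighted average: overall accuracy is each stratum's accuracy weighted by that stratum's share of all DOIs. A small sketch, where `overall_accuracy` and its argument names are illustrative and the numbers in the usage line are placeholders rather than values from the notebook:

```python
def overall_accuracy(acc_true, acc_false, frac_true):
    """Combine per-stratum accuracies into one overall accuracy.

    acc_true: accuracy among DOIs where PennText is true.
    acc_false: accuracy among DOIs where PennText is false.
    frac_true: fraction of all DOIs where PennText is true.
    """
    return frac_true * acc_true + (1 - frac_true) * acc_false


# Placeholder inputs only, chosen to show how the larger stratum dominates.
example_overall = overall_accuracy(acc_true=0.9, acc_false=0.5, frac_true=0.75)
example_inaccuracy = 1 - example_overall
```

The larger the share of DOIs with PennText == true, the closer the overall figure sits to the accuracy of that stratum.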

dhimmel · 2018-01-10T20:49:17Z

The idea here is that curation would continue in manual-doi-checks-500.tsv. Currently, this file doesn't have the date-queried columns and uses different column names than before. Let me know if that's a problem. You could always edit the column names or add new ones manually if you wanted.

dhimmel · 2018-01-12T17:52:28Z

Pinging @Publicus

jglev · 2018-01-12T20:10:13Z

I've reviewed your sample, and it looks good to me. I've updated the facilitation script to use manual-doi-checks-500.tsv, as well, and have gotten started. The facilitation script does add the date columns back automatically; I do prefer keeping them if the data are going to be public, since they can help if there's a question later about journal subscription timelines. My understanding from your comment above is that you're fine with those date columns being retained; is that correct?

dhimmel · 2018-01-12T21:13:43Z

> My understanding from your comment above is that you're fine with those date columns being retained; is that correct?

Yep.

> I've updated the facilitation script to use manual-doi-checks-500.tsv

So this PR is ready to merge? If everything looks good to you, "approve" it under "files changed" > "review changes".

jglev · 2018-01-12T21:36:04Z

A quick logistics question: Is your idea that the edits to the facilitation script, and the results from the 500, be in their own PR? If so, yes, this is ready, and I'll mark it as approved.

dhimmel · 2018-01-12T21:47:25Z

> A quick logistics question: Is your idea that the edits to the facilitation script, and the results from the 500, be in their own PR? If so, yes, this is ready, and I'll mark it as approved.

Yes

dhimmel added 2 commits January 4, 2018 19:31

Exploratory notebook on PennText accuracy

baf783a

Overall accuracy and 500 random DOIs

bc59764

dhimmel mentioned this pull request Jan 10, 2018

Accuracy analysis of full_text_indicator calls #15

Closed

Update notebook to use library-access env

661736b

dhimmel requested a review from jglev January 10, 2018 20:49

jglev approved these changes Jan 12, 2018

dhimmel merged commit dcf151b into greenelab:master Jan 12, 2018

dhimmel deleted the penntext-accuracy branch January 12, 2018 21:52
