Code in "applications" doesn't work; experiments not reproducible #5

tmadl · 2021-07-15T16:38:59Z

A lot of the imports in the scripts under the "applications" subfolder fail - for example:
from trove.utils import score_umls_ontologies
from trove.labelers.norm import lowercase, strip_affixes

This makes it impossible to reproduce the experiments in the paper

Could you please share these functions (or the previous version of the repo) to make the code runnable?
Thanks a lot in advance!

dlrenz · 2021-07-16T06:09:10Z

I can confirm this. I even went through the repo's git history and couldn't find any working version.
Functions such as load_*_dict(ionary), load_*_dataset(s), load_*_abbrvs, score_umls_ontologies, umls_ontology_dicts are all non-existent, and I also cannot find anything in the codebase (functions that have different names etc.) that would implement similar functionality.
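
For anyone who wants to check which of these names resolve in their own checkout, here is a minimal sketch. It assumes the package is installed under the name `trove` in the active environment; the helper `find_missing` is hypothetical and not part of the repo. It only tests the three imports quoted at the top of this issue.

```python
import importlib


def find_missing(targets):
    """Return the (module, attribute) pairs that cannot be imported.

    `targets` is a list of (module_name, attribute_name) tuples.
    Hypothetical helper for illustration; not part of trove.
    """
    missing = []
    for module_name, attr in targets:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # The module itself is absent (or one of its own imports is).
            missing.append((module_name, attr))
            continue
        if not hasattr(module, attr):
            # The module exists but does not define the name.
            missing.append((module_name, attr))
    return missing


if __name__ == "__main__":
    # Names taken from the failing imports reported in this issue.
    targets = [
        ("trove.utils", "score_umls_ontologies"),
        ("trove.labelers.norm", "lowercase"),
        ("trove.labelers.norm", "strip_affixes"),
    ]
    for module_name, attr in find_missing(targets):
        print(f"missing: from {module_name} import {attr}")
```

Running this on each branch in turn should show quickly whether a given branch defines the functions the scripts expect.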

jason-fries · 2021-07-16T07:25:45Z

Hi @tmadl , @dlrenz,

Apologies for the confusion! On the main branch applications are being refactored to better serve as a reproducible benchmark, so they currently don't work out-of-box. I expect the updated versions will be ready in a few weeks.

The manuscript branch includes implementations of the functions you are referencing (score_umls_ontologies, strip_affixes, umls_ontology_dicts, etc) and some scripts for applying labeling functions, training the label model, and training BioBERT. There is some complexity around initializing and using the UMLS in the older branch, since it assumes a parquet file has already been generated. In general, this branch isn't as polished as it should be, but it should be possible to reproduce the public NER results. Don't hesitate to spam Issues if you encounter other blockers.

Thank you for your interest and questions!!

tmadl · 2021-07-20T18:28:29Z

Thanks Jason. I've tried the manuscript branch, but there is an error trying to reproduce the experiments in the scripts folder

When running scripts/experiments/drug_lfs.sh, the file trove/labelers/tools.py, at line 64, in load_medispan requires a file called MEDNAME and looks for it under data/supervision/ontologies/MEDNAME

However, the "ontologies" folder in the shared GDrive folder (linked in the QuickStart section of Readme) does not contain any file called MEDNAME - nor does any GDrive subfolder - and neither is this file present in the github repo it seems

Could you please share where to find MEDNAME, as well as any other dependencies or instructions required to reproduce the experiments in your manuscript? At a minimum, I'd love to be able to reproduce the i2b2 results...

Thanks a lot in advance!

tmadl · 2021-07-20T18:49:34Z

Similar issue when running disorder_lfs.sh - ontologies/CARD/cui2sty.tsv is missing...

Please let us know how to obtain the missing files (or, if it's easier, maybe do a bulk upload/share of the necessary files?)

tmadl · 2021-07-23T13:09:31Z

Since MEDNAME is not essential I was able to get i2b2 to work. However, without ontologies/CARD/cui2sty.tsv, ontologies/CARD/VABBR_CV_beta.txt and ontologies/CARD/VABBR_DS_beta.txt the shareclef2014 results cannot be reproduced

Please do share where and how to obtain these files - thanks in advance
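
In case it helps others hitting the same errors, below is a rough sketch for checking up front which of the files named in this thread are present before launching the scripts. It assumes the ontologies live under `data/supervision/ontologies` (the location reported for MEDNAME above) and that the CARD files sit in a `CARD` subfolder of that directory; the helper `missing_files` is hypothetical, and the list may not cover every dependency.

```python
from pathlib import Path


def missing_files(root, relative_paths):
    """Return the entries of `relative_paths` that are not files under `root`.

    Hypothetical helper for illustration; not part of trove.
    """
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    # Assumed root, based on the path reported for MEDNAME in this thread.
    ontology_root = "data/supervision/ontologies"
    # Files reported as missing in this thread.
    required = [
        "MEDNAME",
        "CARD/cui2sty.tsv",
        "CARD/VABBR_CV_beta.txt",
        "CARD/VABBR_DS_beta.txt",
    ]
    for path in missing_files(ontology_root, required):
        print(f"missing: {ontology_root}/{path}")
```

The check only reports absent files; it says nothing about whether a file that is present has the expected contents.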

jason-fries · 2021-07-24T06:44:58Z

Hi @tmadl,
Thanks for the great debugging! I went ahead and uploaded the CARD dictionaries to Google Drive (they are also available for direct download here). I've also updated the documentation for running the full pipeline and cleaned up the drug task scripts. Let us know if you encounter any more issues!

jason-fries · 2021-08-07T04:24:30Z

Let me know if you have additional problems! I'll close this issue for now, but happy to re-open.

jason-fries closed this as completed Aug 7, 2021
