Joint Biomedical Entity and Relation Extraction with Knowledge-Enhanced Collective Inference

Instructions

The code has been tested with Python 3. To install the dependencies, please run:

pip install -r requirements.txt

We use two datasets in this work:

ADE. We conducted 10-fold cross-validation. The dataset can be downloaded from here.
BioRelEx. The train and dev sets can be downloaded here. The test set is unreleased, so test performance can only be evaluated using CodaLab.
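As a rough illustration of the 10-fold setup for ADE, the sketch below pairs up the train/test files of each fold and summarizes per-fold scores. It assumes the ade_split_<i>_train.json / ade_split_<i>_test.json naming used in this repo, and it assumes fold scores are aggregated by a plain mean; the helper names ade_fold_paths and summarize_folds are hypothetical and are not part of the repo.

```python
from pathlib import Path
from statistics import mean, stdev


def ade_fold_paths(ade_dir="resources/ade", n_folds=10):
    """Yield (fold, train_path, test_path) for each cross-validation fold."""
    ade_dir = Path(ade_dir)
    for fold in range(n_folds):
        yield (
            fold,
            ade_dir / f"ade_split_{fold}_train.json",
            ade_dir / f"ade_split_{fold}_test.json",
        )


def summarize_folds(scores):
    """Return (mean, sample standard deviation) of per-fold scores.

    Assumption: folds are weighted equally. The repo's own aggregation
    may differ.
    """
    scores = list(scores)
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread
```

A training loop would iterate over ade_fold_paths(), train and evaluate once per fold, collect one score per fold, and pass the collected list to summarize_folds.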

After downloading the datasets, please create a new folder named resources at the root of the repo and put the datasets into it. Overall, the folder structure of the entire repo should look like:

...
models/
pymetamap/
resources/
--- ade/
------- ade_full.json
------- ade_split_0_test.json
------- ade_split_0_train.json
....
------- ade_split_9_test.json
------- ade_split_9_train.json
------- ade_types.json
--- biorelex/
------- train.json
------- dev.json
--- umls_embs.pkl
--- umls_rels.txt
--- umls_reltypes.txt
--- umls_semtypes.txt
--- text2graph.pkl
scorer/
.gitignore
ade_train.sh
...
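Before training, it can help to confirm that the resources folder matches the layout above. The following is a minimal sketch of such a check. The helper names expected_resource_files and missing_resources are hypothetical, and the sketch assumes that the elided ADE splits 1 through 8 follow the same naming pattern as splits 0 and 9.

```python
from pathlib import Path

# Files that sit directly under resources/, copied from the layout above.
TOP_LEVEL_FILES = [
    "umls_embs.pkl",
    "umls_rels.txt",
    "umls_reltypes.txt",
    "umls_semtypes.txt",
    "text2graph.pkl",
]


def expected_resource_files(n_folds=10):
    """List the expected files, as paths relative to the resources folder."""
    files = ["ade/ade_full.json", "ade/ade_types.json"]
    for fold in range(n_folds):
        # Assumption: every fold uses the same train/test naming pattern.
        files.append(f"ade/ade_split_{fold}_train.json")
        files.append(f"ade/ade_split_{fold}_test.json")
    files += ["biorelex/train.json", "biorelex/dev.json"]
    files += TOP_LEVEL_FILES
    return files


def missing_resources(root="resources"):
    """Return the expected files that are not present under root."""
    root = Path(root)
    return [rel for rel in expected_resource_files() if not (root / rel).is_file()]
```

Calling missing_resources() from the repo root returns an empty list when everything is in place; otherwise it lists the files that still need to be downloaded or requested.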

Additional files in the resources folder include:

The files umls_rels.txt, umls_reltypes.txt, and umls_semtypes.txt can be extracted directly from UMLS (to use UMLS, you need to request access permission).
umls_embs.pkl contains the embeddings of Maldonado et al. 2019 and also the embeddings of the UMLS definition sentences. Note that some UMLS concepts may not have any definition sentence.
text2graph.pkl is a cache that maps each input text in the datasets to a graph of all the UMLS concepts and relations that are potentially relevant to it (as found by MetaMap).
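The internal layout of the two .pkl files is not documented here, so a quick way to get oriented is to load each one and look at its top-level object. The sketch below does only that; describe_object and describe_pickle are hypothetical helpers, not part of the repo. If the files contain objects from other libraries (for example NumPy arrays), those libraries need to be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path


def describe_object(obj):
    """Summarize an object: its type name and, if it has one, its length."""
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["size"] = len(obj)
    return summary


def describe_pickle(path):
    """Load a pickle file and summarize its top-level object.

    Only unpickle files from a trusted source: pickle.load can execute
    arbitrary code while loading.
    """
    with Path(path).open("rb") as f:
        return describe_object(pickle.load(f))
```

For example, describe_pickle("resources/text2graph.pkl") reports the container type and the number of entries, which is usually enough to decide how to explore the file further.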

For training, please refer to the scripts ade_trainer.py and trainer.py. For example, to train a basic model for BioRelEx, you can simply run:

python trainer.py

Note: If you want me to send you the UMLS-related files, please email me at tuanml2@illinois.edu (together with some proof that you have access to UMLS). I am not putting the UMLS-related files online because of UMLS licensing restrictions.

There is some redundant code in this repo. I am going to remove it soon.
