Code for training NLP models on the BioRead dataset

The available models are Gated Attention Reader and BERT. Their code is in src/ga-reader and src/bert-lm, respectively.

Prepare dataset

The BioRead and BioReadLite datasets can be obtained from http://nlp.cs.aueb.gr/publications.html.

The data need to be converted to the CNN/Daily Mail format. Run

csplit -z --suppress-matched <file> '/^==============================$/' '{*}'

to split each file (train, valid, test) into multiple files (one per example), then format them using scripts/bioread_cloze.py.
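The splitting step can be wrapped in a small helper so that each split ends up in its own directory. This is only a sketch: the function name split_examples, the variable BIOREAD_DELIM, the example_ file prefix, and the directory layout are assumptions for illustration and are not part of this repo. It relies on GNU csplit, since --suppress-matched is a GNU coreutils option.

```shell
#!/usr/bin/env bash
# Sketch: split one BioRead file into one file per example with GNU csplit.
# split_examples and BIOREAD_DELIM are hypothetical names, not part of this repo.

# The separator line between examples, as used in the csplit command above.
BIOREAD_DELIM='=============================='

split_examples() {
    local input="$1" outdir="$2"
    mkdir -p "$outdir"
    # -z drops empty output files, --suppress-matched omits the separator lines,
    # -s silences the byte counts, -f sets the output prefix,
    # -n sets the number of digits in the numeric suffix.
    csplit -z --suppress-matched -s -f "$outdir/example_" -n 6 \
        "$input" "/^${BIOREAD_DELIM}\$/" '{*}'
}
```

For instance, assuming the three downloaded files are named train.txt, valid.txt and test.txt, calling split_examples train.txt split/train (and likewise for valid and test) would write split/train/example_000000, split/train/example_000001, and so on, ready to be passed to the formatting script.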

Clone this repo

git clone --recursive https://github.com/kiendang/bioread.git

Set up a Conda environment with all dependencies

conda env create -f environment.yml
