Code for training NLP models on the BioRead dataset
The available models are the Gated-Attention Reader and BERT. Their code is in src/ga-reader and src/bert-lm, respectively.
The BioRead and BioReadLite datasets can be obtained from http://nlp.cs.aueb.gr/publications.html.
The data need to be converted to the CNN/Daily Mail format. First, run
csplit -z --suppress-matched <file> '/^==============================$/' '{*}'
to split each file (train, valid, test) into multiple files, one per example. Then format them using scripts/bioread_cloze.py.
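The csplit step only drops the separator lines and writes each remaining non-empty chunk to its own file. `--suppress-matched` is a GNU coreutils option and may not be available in other csplit implementations (e.g. on macOS), so the same split can be sketched in Python. This is a minimal sketch, not part of this repository: the function name `split_examples`, the `example_NNNNN` output naming and the assumption that the separator is exactly the line matched by the csplit pattern are all illustrative.

```python
import os

# Assumed to be the exact line that delimits examples in the raw files
# (copied from the csplit pattern).
SEPARATOR = "=============================="


def split_examples(path, out_dir, prefix="example_"):
    """Hypothetical helper: write one file per example found in `path`.

    Like `csplit -z --suppress-matched`, separator lines are dropped and
    empty pieces are not written. Returns the written file paths in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    pieces, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.rstrip("\n") == SEPARATOR:
                # Separator found: close the current piece, drop the line.
                pieces.append(current)
                current = []
            else:
                current.append(line)
    pieces.append(current)  # text after the last separator

    written = []
    for piece in pieces:
        if not piece:  # mirrors csplit -z: skip empty pieces
            continue
        out_path = os.path.join(out_dir, f"{prefix}{len(written):05d}")
        with open(out_path, "w", encoding="utf-8") as out:
            out.writelines(piece)
        written.append(out_path)
    return written
```

For example, `split_examples("train.txt", "train_split")` (file and directory names hypothetical) would produce `train_split/example_00000`, `train_split/example_00001` and so on, which can then be passed to scripts/bioread_cloze.py.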
Clone this repository, including its submodules:
git clone --recursive https://github.com/kiendang/bioread.git
Set up a Conda environment with all dependencies:
conda env create -f environment.yml