kiendang/bioread

Code for training NLP models on BioRead dataset

Available models are Gated Attention Reader and BERT. Code for them is in src/ga-reader and src/bert-lm respectively.

Prepare dataset

The BioRead and BioReadLite datasets can be obtained from http://nlp.cs.aueb.gr/publications.html.

The data needs to be converted to CNN/Daily Mail format. Run

csplit -z --suppress-matched <file> '/^==============================$/' '{*}'

to split each file (train, valid, test) into one file per example, then format the resulting files using scripts/bioread_cloze.py.
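As written, the csplit command above puts its output (xx00, xx01, ...) in the current directory, so running it for train, valid and test in the same place would overwrite earlier output. A minimal sketch of one way to keep the splits apart is shown below. It assumes GNU csplit; the helper name split_examples, the example_ prefix, the output directory layout and the input file names are hypothetical and not part of this repo.

```shell
# Separator line between examples, copied from the csplit pattern above.
BIOREAD_SEPARATOR='=============================='

# split_examples <input-file> <output-dir>
# Hypothetical helper: writes one file per example into <output-dir>.
split_examples() {
  input="$1"
  outdir="$2"
  mkdir -p "$outdir"
  # -z drops empty pieces, --suppress-matched drops the separator lines,
  # --prefix/--suffix-format name the pieces example_00000.txt, example_00001.txt, ...
  csplit --quiet -z --suppress-matched \
    --prefix="$outdir/example_" --suffix-format='%05d.txt' \
    "$input" "/^${BIOREAD_SEPARATOR}\$/" '{*}'
}

# Example usage, assuming train.txt, valid.txt and test.txt exist
# in the current directory (assumed file names):
# for split in train valid test; do
#   split_examples "$split.txt" "data/$split"
# done
```

Each output directory can then be passed to scripts/bioread_cloze.py for formatting.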

Set up environment

Clone this repo

git clone --recursive https://github.com/kiendang/bioread.git

Set up a Conda environment with all dependencies

conda env create -f environment.yml
