HUNER is a state-of-the-art NER model for biomedical entities. It comes with models for genes/proteins, chemicals, diseases, species and cell lines.

It is based on the great LSTM-CRF NER tagger implementation glample/tagger by Guillaume Lample.


Installation   How to install HUNER
Usage          How to use HUNER
Models         Available pretrained models
Corpora        The HUNER Corpora


  1. Install docker
  2. Clone this repository to $dir
  3. Download the pretrained model you want to use from here, place it into $dir/models/$model_name and untar it using tar xzf $model_name


To tokenize, sentence split and tag a file INPUT.TXT:

  1. Start the HUNER server from $dir using ./start_server $model_name. The model must reside in the directory $dir/models/$model_name.
  2. Tag text with python INPUT.TXT OUTPUT.CONLL --name $model_name.

The output will then be written to OUTPUT.CONLL in the CoNLL-2003 format.
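For downstream processing, the tagged output can be read back into Python. The following is a minimal sketch, assuming one token per line with whitespace-separated columns, the token in the first column, the tag in the last column, and a blank line between sentences. The function name read_conll and the tag labels in the sample are hypothetical and not part of HUNER; check the column layout against an actual output file before relying on it.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group (token, tag) pairs into sentences.

    Assumes the token is the first column and the tag is the last column.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and -DOCSTART- markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))
    # Flush the last sentence if the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample; the tag names are made up for illustration.
sample = [
    "BRCA1 B-Gene",
    "mutations O",
    "",
    "Aspirin B-Chemical",
    "works O",
]
parsed = read_conll(sample)
```

With a real file, the same function can be applied to an open file handle, for example `read_conll(open("OUTPUT.CONLL", encoding="utf-8"))`, assuming the output is UTF-8 encoded.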

The available options are:

  • --assume_tokenized: The input is already tokenized and the tokens are separated by whitespace
  • --assume_sentence_splitted: The input is already split into sentences and each line of the input contains one sentence
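If the text has already been tokenized and split into sentences elsewhere, the input expected by the two options above can be produced as follows. This is a minimal sketch: format_pretokenized is a hypothetical helper, not part of HUNER, and the example sentences are made up.

```python
from typing import Iterable, List


def format_pretokenized(sentences: Iterable[List[str]]) -> str:
    """Render each sentence on its own line, tokens separated by single spaces."""
    lines = []
    for tokens in sentences:
        for token in tokens:
            # A token containing whitespace would be split again on reading,
            # so reject it here instead of silently changing the tokenization.
            if not token or any(ch.isspace() for ch in token):
                raise ValueError(f"invalid token: {token!r}")
        lines.append(" ".join(tokens))
    return "".join(line + "\n" for line in lines)


# Hypothetical example sentences.
sentences = [
    ["Mutations", "in", "BRCA1", "increase", "cancer", "risk", "."],
    ["Aspirin", "inhibits", "COX-1", "."],
]
text = format_pretokenized(sentences)
```

The resulting string can then be written to INPUT.TXT. The source does not state an expected encoding, so UTF-8 is an assumption.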


Model          Test sets P / R / F1 (%)   CRAFT P / R / F1 (%)
cellline_all   65.09 / 67.69 / 66.08      -
chemical_all   83.34 / 80.26 / 81.71      53.56 / 35.85 / 42.95
disease_all    75.01 / 77.71 / 76.20      -
gene_all       75.01 / 79.16 / 76.81      59.67 / 65.98 / 62.66
species_all    85.37 / 79.98 / 82.59      98.51 / 73.83 / 84.40


For details and instructions on the HUNER corpora please refer to and the corresponding readme.


To cite HUNER, please use the following BibTeX entry:

  title={HUNER: Improving Biomedical NER with Pretraining},
  author={Weber, Leon and M{\"u}nchmeyer, Jannes and Rockt{\"a}schel, Tim and Habibi, Maryam and Leser, Ulf},
