Here we will guide you through loading and using the CHEMDNER model pack. First, some imports. Note that the first two lines (`import os` and the `CUDA_VISIBLE_DEVICES` assignment) only matter with a CUDA-enabled tensorflow installation: they force it to run in CPU-only mode. If you would rather use a GPU, specify a GPU device number instead, e.g. `os.environ['CUDA_VISIBLE_DEVICES'] = '0'`. If you don't have a GPU or are using a CPU-only tensorflow version, you can remove these lines altogether.

In [None]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''

from scilk.collections.chemdner import loaders
from scilk.corpora import corpus
from scilk.corpora.chemdner import parse_abstracts
from scilk.util import intervals
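
The device choice described above can also be wrapped in a small helper. This is only a minimal sketch: `select_device` is a hypothetical name and not part of scilk. Because the variable is read when CUDA is initialised, it is safest to call the helper before tensorflow is imported.

```python
import os
from typing import Optional


def select_device(gpu: Optional[int] = None) -> str:
    """Set CUDA_VISIBLE_DEVICES (hypothetical helper, not part of scilk).

    gpu=None hides all GPUs (CPU-only mode); gpu=0 exposes only GPU 0.
    """
    if gpu is not None and gpu < 0:
        raise ValueError('GPU device number must be non-negative')
    value = '' if gpu is None else str(gpu)
    os.environ['CUDA_VISIBLE_DEVICES'] = value
    return value


select_device()       # CPU-only, same effect as the assignment above
# select_device(0)    # expose only the first GPU instead
```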

We then need to specify the paths to the serialised data and load the tokeniser and the NER model.

In [None]:
root = 'chemdner-collection'  # directory extracted from chemdner-collection.tgz

tokeniser_data = {
    'tokeniser_weights': f'{root}/tokeniser-weights.hdf5',
    'charmap': f'{root}/charmap.joblib'
}
detector_data = {
    'embeddings': f'{root}/vectors.txt.gz',
    'charmap': f'{root}/charmap.joblib',
    'ner_weights': f'{root}/ner-weights.hdf5'
}

tokeniser = loaders.load_tokeniser(tokeniser_data)
detector = loaders.load_detector(detector_data)
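
If loading fails, a missing or misnamed file is a likely cause, so it can help to check the paths first. The sketch below is an illustration under assumptions: `missing_files` is a hypothetical helper (not part of scilk), and the dictionaries simply repeat the layout used above so that the example is self-contained.

```python
from pathlib import Path
from typing import Dict, List


def missing_files(*configs: Dict[str, str]) -> List[str]:
    """Return the sorted, de-duplicated paths from the configs that are not files on disk."""
    paths = {path for config in configs for path in config.values()}
    return sorted(path for path in paths if not Path(path).is_file())


# Same layout as the cell above; substitute your own dictionaries.
root = 'chemdner-collection'
tokeniser_data = {
    'tokeniser_weights': f'{root}/tokeniser-weights.hdf5',
    'charmap': f'{root}/charmap.joblib'
}
detector_data = {
    'embeddings': f'{root}/vectors.txt.gz',
    'charmap': f'{root}/charmap.joblib',
    'ner_weights': f'{root}/ner-weights.hdf5'
}

missing = missing_files(tokeniser_data, detector_data)
if missing:
    print('Missing files:', ', '.join(missing))
```

An empty result means every listed file was found. It says nothing about whether the file contents are valid.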

Here we use the CHEMDNER corpus, but you can work with any other texts.

In [None]:
texts = [ab.body for ab in parse_abstracts('data/chemdner_corpus/testing.abstracts.txt')]
tokenised = tokeniser(texts[:100])  # only tokenise the first 100 texts to save time
annotations = detector(tokenised)  # note! you must pass texts preprocessed by the bundled tokeniser.

In [None]:
annotations[:5]

Note that the NER model returns annotations as `Interval` objects (defined in `scilk.util.intervals`), each representing a range within the source text that corresponds to a detected named entity. Finally, we can extract these intervals from the texts.

In [None]:
entities = [intervals.extract(text, ivs) for text, ivs in zip(texts, annotations)]

In [None]:
entities[:5]
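
To make the idea of an interval as a range within the source text concrete, here is a small stand-alone sketch. It does not use scilk: `Span` and `extract_spans` are hypothetical stand-ins, and the half-open `[start, stop)` convention is an assumption, so the actual `Interval` attributes and the behaviour of `intervals.extract` may differ.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an interval: a half-open [start, stop) character range."""
    start: int
    stop: int


def extract_spans(text: str, spans: List[Span]) -> List[str]:
    """Return the substring of `text` covered by each span."""
    return [text[span.start:span.stop] for span in spans]


text = 'Aspirin inhibits cyclooxygenase.'
spans = [Span(0, 7), Span(17, 31)]
print(extract_spans(text, spans))  # ['Aspirin', 'cyclooxygenase']
```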