custom models for named-entity recognition

fastent

The fastent Python library is a tool for end-to-end creation of custom models for named-entity recognition.

Custom Models

To train a model for a new type of entity, you just need a list of examples.

You are not limited to only predefined types like person, location and organization.

How It Works

fastent handles end-to-end creation: dataset generation, annotation, contextualization and model training.

You can also use fastent modules as standalone tools.

Made for Prod

fastent includes integrations with tools like spaCy, fastText pre-trained models and NLTK.

fastent is built to scale to very large text datasets in many languages.


Installation

fastent is developed for Python 3 on Unix systems.

Clone this repo or install from PyPI:

pip install fastent

Download NLTK data:

python -m nltk.downloader stopwords

Install and set up CouchDB:

wget -O - https://raw.githubusercontent.com/fastent/fastent/master/install.sh | bash

Downloading data files

TODO: fastText stuff

How To

Generation

fastent can generate a dataset from a list.

TODO

fastent can even generate a list from one or two examples.

from fastent import dataset_pseudo_generator

# Load the en_core_web_lg spaCy model
model = dataset_pseudo_generator.spacy_initialize('en_core_web_lg')
# Generate a longer list from the two seed examples
dataset_pseudo_generator.dataset_generate(model, ['cocaine', 'heroin'], 100)

The equivalent on the command line:

python dataset_pseudo_generator.py -m en_core_web_lg -s cocaine,heroin
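
To give a feel for how a short seed list can grow into a longer one, here is a minimal sketch of one common approach: ranking vocabulary words by cosine similarity to the average of the seed vectors. This is an assumption about the general idea, not a description of what dataset_generate actually does. The vectors are made-up toy values standing in for real word embeddings, and expand_seeds is a hypothetical helper, not part of fastent.

```python
import math

# Toy 3-dimensional vectors with made-up values; a real setup would use
# embeddings from a pre-trained model instead.
TOY_VECTORS = {
    "cocaine": [0.9, 0.1, 0.0],
    "heroin": [0.8, 0.2, 0.0],
    "morphine": [0.85, 0.15, 0.05],
    "codeine": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "table": [0.1, 0.9, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def expand_seeds(seeds, vectors, limit):
    """Hypothetical helper: rank non-seed words by similarity to the seed centroid."""
    dims = len(next(iter(vectors.values())))
    # Average the seed vectors component by component.
    centroid = [sum(vectors[s][i] for s in seeds) / len(seeds) for i in range(dims)]
    candidates = [word for word in vectors if word not in seeds]
    candidates.sort(key=lambda word: cosine(vectors[word], centroid), reverse=True)
    return candidates[:limit]


expanded = expand_seeds(["cocaine", "heroin"], TOY_VECTORS, 2)
print(expanded)
```

With these toy values the two closest words to the seeds are morphine and codeine, while unrelated words such as banana rank last.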

Annotation

TODO

Contextualization

TODO

Training

To train a model from the annotated and contextualized dataset:

For now the only supported learning framework is spaCy.
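
Since training targets spaCy, it may help to see what annotated NER data for spaCy can look like. The sketch below builds one example in the spaCy v2-style tuple format, (text, {"entities": [(start, end, label)]}), where start and end are character offsets. It uses only the standard library. The annotate helper and the DRUG label are hypothetical, and the format fastent's own training script expects may differ.

```python
import re


def annotate(sentence, entities, label):
    """Hypothetical helper: mark every whole-word mention of the given entities.

    Returns a spaCy v2-style training tuple with character offsets.
    """
    spans = []
    for entity in entities:
        pattern = r"\b" + re.escape(entity) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    spans.sort()  # order spans by their position in the sentence
    return (sentence, {"entities": spans})


example = annotate(
    "Police seized heroin and cocaine at the port.",
    ["cocaine", "heroin"],
    "DRUG",  # made-up label for illustration
)
print(example)
```

Here the entity spans come out as (14, 20, 'DRUG') for heroin and (25, 32, 'DRUG') for cocaine. Newer spaCy versions (v3 and later) wrap such data in Example objects rather than consuming raw tuples directly.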

Request support for a new learning framework

TODO: sample output

Testing

Coming soon!

Integrations

fastent includes integrations for downloading datasets and pre-trained models.

TODO

More

See how fastent performs on benchmarks

Try the tutorial or fork examples

Browse frequently asked questions

Report bugs or request new features