Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo

anaGo is a Python library for sequence labeling (NER, POS tagging, ...), implemented in Keras.

anaGo can solve sequence labeling tasks such as named entity recognition (NER), part-of-speech (POS) tagging and semantic role labeling (SRL). Unlike traditional sequence labeling solvers, anaGo does not require you to define any language-dependent features, so it can easily be applied to any language.

As an example of anaGo, the following image shows named entity recognition in English:

[Image: anaGo demo, English NER]

Get Started

In anaGo, the simplest type of model is the Sequence model. The Sequence model provides essential methods such as fit, score, analyze and save/load. For more advanced use, work directly with anaGo modules such as models and preprocessing.

Here is the data loader:

>>> from anago.utils import load_data_and_labels

>>> x_train, y_train = load_data_and_labels('train.txt')
>>> x_test, y_test = load_data_and_labels('test.txt')
>>> x_train[0]
['EU', 'rejects', 'German', 'call', 'to', 'boycott', 'British', 'lamb', '.']
>>> y_train[0]
['B-ORG', 'O', 'B-MISC', 'O', 'O', 'O', 'B-MISC', 'O', 'O']
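
To make the expected input shape concrete, here is a minimal sketch of a loader for this kind of data. It assumes one token and its tag per line, separated by a tab, with blank lines between sentences; the function name `read_tagged_sentences` is hypothetical and this is not anaGo's own implementation of `load_data_and_labels`.

```python
from typing import Iterable, List, Tuple


def read_tagged_sentences(lines: Iterable[str]) -> Tuple[List[List[str]], List[List[str]]]:
    """Parse 'token<TAB>tag' lines into parallel lists of sentences and tag sequences.

    Assumption: blank lines separate sentences (hypothetical format for this sketch).
    """
    sentences, labels = [], []
    words, tags = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            word, tag = line.split("\t")
            words.append(word)
            tags.append(tag)
        elif words:
            # A blank line closes the current sentence.
            sentences.append(words)
            labels.append(tags)
            words, tags = [], []
    if words:
        # Flush the last sentence when the input has no trailing blank line.
        sentences.append(words)
        labels.append(tags)
    return sentences, labels


sample = "EU\tB-ORG\nrejects\tO\nGerman\tB-MISC\ncall\tO\n\nPeter\tB-PER\nBlackburn\tI-PER\n"
x, y = read_tagged_sentences(sample.splitlines())
```

Each element of `x` is a list of tokens and the matching element of `y` is a list of tags of the same length, which is the shape shown for `x_train[0]` and `y_train[0]` above.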

You can now train a model on your training data:

>>> import anago

>>> model = anago.Sequence()
>>> model.fit(x_train, y_train, epochs=15)
Epoch 1/15
541/541 [==============================] - 166s 307ms/step - loss: 12.9774
...

Evaluate your performance in one line:

>>> model.score(x_test, y_test)
0.802  # f1-micro score
# For better performance, use pre-trained word embeddings.
# Currently, anaGo's best score is 90.94 (f1-micro).
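
For readers unfamiliar with the metric, the sketch below shows one common way to compute a micro-averaged, entity-level F1 over BIO tags: an entity counts as correct only if its type and both boundaries match. The helper names `extract_entities` and `f1_micro` are hypothetical, and whether `model.score` computes exactly this is an assumption.

```python
def extract_entities(tags):
    """Return the set of (type, start, end) spans in one BIO sequence (end exclusive).

    Lenient choice for this sketch: an I- tag with no preceding B- starts a new entity.
    """
    spans = set()
    start, ent_type = None, None
    # The trailing "O" is a sentinel that closes an entity ending at the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and ent_type == tag[2:]
        if continues:
            continue
        if ent_type is not None:
            spans.add((ent_type, start, i))
        if tag.startswith("B-") or tag.startswith("I-"):
            start, ent_type = i, tag[2:]
        else:
            start, ent_type = None, None
    return spans


def f1_micro(y_true, y_pred):
    """Micro-averaged F1: pool entity counts over all sentences, then compute P, R, F1."""
    tp = n_true = n_pred = 0
    for true_tags, pred_tags in zip(y_true, y_pred):
        true_spans = extract_entities(true_tags)
        pred_spans = extract_entities(pred_tags)
        tp += len(true_spans & pred_spans)
        n_true += len(true_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_true
    return 2 * precision * recall / (precision + recall)


gold = [["B-ORG", "O", "B-MISC", "O"], ["B-PER", "I-PER"]]
pred = [["B-ORG", "O", "O", "O"], ["B-PER", "I-PER"]]
score = f1_micro(gold, pred)
```

In this toy case the prediction finds two of the three gold entities and makes no false positives, so precision is 1, recall is 2/3, and the F1 is 0.8.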

Or tag new text:

>>> text = 'President Obama is speaking at the White House.'
>>> model.analyze(text)
{
    "words": [
        "President",
        "Obama",
        "is",
        "speaking",
        "at",
        "the",
        "White",
        "House."
    ],
    "entities": [
        {
            "beginOffset": 1,
            "endOffset": 2,
            "score": 1,
            "text": "Obama",
            "type": "PER"
        },
        {
            "beginOffset": 6,
            "endOffset": 8,
            "score": 1,
            "text": "White House.",
            "type": "LOC"
        }
    ]
}
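
The returned structure can be consumed like any dictionary. The sketch below copies the output shown above into a literal and pairs each entity type with its text; it assumes `beginOffset` and `endOffset` are token indices with an exclusive end, which is consistent with this sample but is not stated by the source. The helper name `entity_spans` is hypothetical.

```python
# Literal copy of the sample output above, so the sketch runs without a trained model.
result = {
    "words": ["President", "Obama", "is", "speaking", "at", "the", "White", "House."],
    "entities": [
        {"beginOffset": 1, "endOffset": 2, "score": 1, "text": "Obama", "type": "PER"},
        {"beginOffset": 6, "endOffset": 8, "score": 1, "text": "White House.", "type": "LOC"},
    ],
}


def entity_spans(result):
    """Return (type, text) pairs, rebuilding each text from the token offsets.

    Assumption: offsets index into result["words"] and endOffset is exclusive.
    """
    words = result["words"]
    return [
        (ent["type"], " ".join(words[ent["beginOffset"]:ent["endOffset"]]))
        for ent in result["entities"]
    ]


spans = entity_spans(result)
```

Rebuilding the text from the offsets yields the same strings as the `text` field for both entities in this sample, including the trailing period on the last token.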

To download a pre-trained model, call the download function:

>>> from anago.utils import download

>>> url = 'https://s3-ap-northeast-1.amazonaws.com/dev.tech-sketch.jp/chakki/public/conll2003_en.zip'
>>> weights, params, preprocessor = download(url)
>>> model = anago.Sequence.load(weights, params, preprocessor)
>>> model.score(x_test, y_test)
0.909446369856927

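If you want to use ELMo for better performance (f1: 92.22), you can use ELModel and ELMoTransformer:

from anago.models import ELModel
from anago.preprocessing import ELMoTransformer
from anago.trainer import Trainer

# Fit the preprocessor on the training data.
p = ELMoTransformer()
p.fit(x_train, y_train)

# Build the model (constructor arguments are elided here).
model = ELModel(...)
model, loss = model.build()
model.compile(loss=loss, optimizer='adam')

# Train the model, using the test set for validation.
trainer = Trainer(model, preprocessor=p)
trainer.train(x_train, y_train, x_test, y_test)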

For further details, see anago/examples/elmo_example.py.

Feature Support

anaGo supports the following features:

  • Model Training
  • Model Evaluation
  • Tagging Text
  • Custom Model Support
  • Downloading pre-trained model
  • GPU Support
  • Character feature
  • CRF Support
  • Custom Callback Support
  • 💥(new) ELMo

anaGo officially supports Python 3.4–3.6.

Installation

To install anaGo, simply use pip:

$ pip install anago

or install from the repository:

$ git clone https://github.com/Hironsan/anago.git
$ cd anago
$ python setup.py install

Documentation

(coming soon)

Reference

This library is based on the following papers: