Release 0.6.1 · flairNLP/flair

Release 0.6.1 is a bugfix release that fixes the issues caused by moving the server that originally hosted the Flair models. Additionally, this release adds a ton of new NER datasets, including the XTREME corpus for 40 languages, and a new model for NER on German-language legal text.

New Model: Legal NER (#1872)

Adds a legal NER model for German. It was trained on the German legal NER dataset, which can be loaded in Flair with the LER_GERMAN corpus object.

The model uses German Flair and FastText embeddings and reaches an F1 score of 96.35.
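For readers unfamiliar with how NER scores such as the one above are usually computed, here is a minimal sketch of span-level F1. The release notes do not say how the 96.35 was measured, so exact-match, micro-averaged scoring is an assumption here, and the function name `span_f1` and the example spans and labels are hypothetical.

```python
from typing import Set, Tuple

# A span is (start token index, end token index, label). Hypothetical format.
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1: a prediction counts only if boundaries and label both match."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative spans only, not taken from the LER dataset.
gold = {(0, 1, "PER"), (5, 6, "LOC"), (9, 12, "ORG")}
predicted = {(0, 1, "PER"), (5, 6, "ORG"), (9, 12, "ORG")}

score = span_f1(gold, predicted)
print(f"{100 * score:.2f}")
```

In this toy case two of the three predicted spans match the gold spans exactly, so precision and recall are both 2/3. The span with the right boundaries but the wrong label counts as an error.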

Use like this:

from flair.data import Sentence
from flair.models import SequenceTagger

# load German LER tagger
tagger = SequenceTagger.load('de-ler')

# example text
text = "vom 6. August 2020. Alle Beschwerdeführer befinden sich derzeit gemeinsam im Urlaub auf der Insel Mallorca , die vom Robert-Koch-Institut als Risikogebiet eingestuft wird. Sie wollen am 29. August 2020 wieder nach Deutschland einreisen, ohne sich gemäß § 1 Abs. 1 bis Abs. 3 der Verordnung zur Testpflicht von Einreisenden aus Risikogebieten auf das SARS-CoV-2-Virus testen zu lassen. Die Verordnung sei wegen eines Verstoßes der ihr zugrunde liegenden gesetzlichen Ermächtigungsgrundlage, des § 36 Abs. 7 IfSG , gegen Art. 80 Abs. 1 Satz 1 GG verfassungswidrig."

sentence = Sentence(text)

# predict and print entities
tagger.predict(sentence)

for entity in sentence.get_spans('ner'):
    print(entity)

New Datasets

Add XTREME and WikiANN corpora for multilingual NER (#1862)

These huge corpora provide training data for NER in 176 languages. You can either load the language-specific parts of them by supplying a language code:

from flair.datasets import XTREME

# load German Xtreme
german_corpus = XTREME('de')
print(german_corpus)

# load French Xtreme
french_corpus = XTREME('fr')
print(french_corpus)

Or you can load the default 40 languages at once into one huge MultiCorpus by not providing a language code:

# load Xtreme MultiCorpus for all
multi_corpus = XTREME()
print(multi_corpus)

Add Twitter NER Dataset (#1850)

Dataset of tweets annotated with NER tags. Load with:

from flair.datasets import TWITTER_NER

# load Twitter dataset
corpus = TWITTER_NER()

# print example tweet
print(corpus.test[0])

Add German Europarl NER Dataset (#1849)

Dataset of German-language speeches in the European Parliament annotated with standard NER tags like person and location. Load with:

from flair.datasets import EUROPARL_NER_GERMAN

# load corpus
corpus = EUROPARL_NER_GERMAN()
print(corpus)

# print an example test sentence
print(corpus.test[1])

Add MIT Restaurant NER Dataset (#1177)

Dataset of English restaurant reviews annotated with entities like "dish", "location" and "rating". Load with:

from flair.datasets import MIT_RESTAURANTS

# load restaurant dataset
corpus = MIT_RESTAURANTS()

# print example sentence
print(corpus.test[0])

Add Universal Proposition Banks for French and German (#1866)

As a first step toward supporting the Universal Proposition Banks, this release adds the first two UP datasets to Flair. Load with:

from flair.datasets import UP_GERMAN

# load German UP
corpus = UP_GERMAN()
print(corpus)

# print example sentence
print(corpus.dev[1])

Add Universal Dependencies Dataset for Chinese (#1880)

Adds the Kyoto dataset for Chinese. Load with:

from flair.datasets import UD_CHINESE_KYOTO

# load Chinese UD dataset
corpus = UD_CHINESE_KYOTO()

# print example sentence
print(corpus.test[0])

Bug fixes

Move models to HU server (#1834 #1839 #1842)
Fix deserialization issues in transformer tokenizers (#1865)
Documentation fixes (#1819 #1821 #1836 #1852)
Add link to a repo with examples of Flair on GCP (#1825)
Correct variable names (#1875)
Fix problem with custom delimiters in ColumnDataset (#1876)
Fix offensive language detection model (#1877)
Correct Dutch NER model (#1881)
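
To illustrate why the custom-delimiter fix listed above matters, here is a minimal sketch of reading column-formatted data: one token per line, columns separated by a delimiter, and a blank line between sentences. This is not Flair's ColumnDataset implementation. The function `read_column_data` and the sample text are hypothetical and only show how the choice of delimiter changes what counts as a column.

```python
import re
from typing import List, Tuple


def read_column_data(text: str, delimiter: str = r"\s+") -> List[List[Tuple[str, ...]]]:
    """Split column-formatted text into sentences of (token, tag, ...) rows."""
    sentences: List[List[Tuple[str, ...]]] = []
    current: List[Tuple[str, ...]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(tuple(re.split(delimiter, line)))
    if current:
        sentences.append(current)
    return sentences


# Tab-separated sample in which one token contains a space.
sample = "George\tB-PER\nWashington\tI-PER\nwent\tO\n\nNew York\tB-LOC\n"

by_tab = read_column_data(sample, delimiter="\t")
by_whitespace = read_column_data(sample)

print(by_tab[1])         # one row with two columns
print(by_whitespace[1])  # the same row is split into three columns
```

With a tab delimiter the token "New York" stays intact, whereas splitting on any whitespace breaks it into two columns and shifts the tag.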
