Support for more languages? #2

EmilStenstrom · 2018-07-05T18:28:56Z

Hi! Flair looks amazing. Clean code, easy to use. Thanks for making it open source!

I was wondering if you plan to add support for more languages? Maybe all the languages where Zalando operates? :) I'm working for a company that needs NLP code that works across pretty much the same set of countries.

Looking at the different libraries available, pre-trained models for more than just English (and German in this case!) are lacking in all the others.

alanakbik · 2018-07-05T19:01:13Z

Hello Emil! Thanks for the interest - we are thinking of adding more models in more languages. In particular, we are currently looking at French, Italian and Dutch. Which languages / tasks are you most interested in?

EmilStenstrom · 2018-07-05T19:08:36Z

It's a bit different depending on if it's for hobby projects or work projects.

Hobby projects: Swedish / POS, Swedish / NER
Business projects: Nordics (Swedish/Norwegian/Danish/Finnish), German, Spanish, English, Polish. POS and NER.

alanakbik · 2018-07-05T19:44:48Z

Ok great! The German models (POS/NER) will be put online probably sometime next week.

We will also progressively add more languages in the near future. Of your list, I think Polish and Spanish are the most likely to be added soonish, though I can't say exactly when.

EmilStenstrom · 2018-07-05T21:24:29Z

Is there something about adding a new language that I could help with?

For instance, there is one big Swedish dataset with POS and NER tags called SUC 3.0. It's available for download here: https://spraakbanken.gu.se/eng/resource/suc3

alanakbik · 2018-07-09T13:14:39Z

Yes, if you're interested you could train a new model for Swedish POS or NER. You would probably need to adapt the NLPTaskDataFetcher for the task you want to train it on, but otherwise could probably use pretty much the same code as given here (and in the experiments section).

I've added Swedish word embeddings to the project. I will also add issues for this task if you are interested!
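
For anyone picking this up, here is a minimal sketch of the kind of reading step an adapted data fetcher would need. It assumes the Swedish data has already been exported to a CoNLL-style column file (one token per line, whitespace-separated columns, a blank line between sentences). The function name `read_tagged_columns`, the column layout, and the sample rows are all made up for illustration; this is not Flair's own code and not the actual SUC 3.0 format.

```python
from typing import Iterable, List, Tuple

# One sentence is a list of (token, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def read_tagged_columns(lines: Iterable[str], tag_column: int = 1) -> List[TaggedSentence]:
    """Group column-format lines into sentences of (token, tag) pairs.

    Column 0 is taken to be the token; `tag_column` selects which of the
    remaining columns to keep (for example POS or NER).
    """
    sentences: List[TaggedSentence] = []
    current: TaggedSentence = []
    for raw in lines:
        fields = raw.split()
        if not fields:
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append((fields[0], fields[tag_column]))
    # Flush the last sentence when the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Invented sample: token, POS tag, NER tag.
sample = """Jag PN O
bor VB O
i PP O
Stockholm PM B-LOC

Hon PN O
""".splitlines()

ner_sentences = read_tagged_columns(sample, tag_column=2)
pos_sentences = read_tagged_columns(sample, tag_column=1)
print(ner_sentences[0])
```

Switching `tag_column` is the only change needed to go from a POS run to a NER run on the same file, which is why the tag column is a parameter rather than hard-coded.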

EmilStenstrom · 2018-07-10T08:31:17Z

Fantastic! If you add the issues I’ll see where I can help.

eduardompereira · 2018-08-23T17:39:04Z

Are you planning to work on the Portuguese language?

alanakbik · 2018-09-04T16:34:06Z

@eduardompereira I am not sure how quickly we can get around to Portuguese, so we'd welcome contributions here! If it helps, we could package standard word embeddings for Portuguese with the next release? Are you aware of good NER datasets for Portuguese?

mhham · 2018-10-24T09:52:41Z

Hi there.
Any news on the French models?
For NER and POS tagging there is the WikiNER French dataset, which comes in a quite easily adaptable format:
https://github.com/dice-group/FOX/tree/master/input/Wikiner

For the word embeddings one can also use the French fastText embeddings:
https://fasttext.cc/docs/en/crawl-vectors.html
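
To illustrate what "easily adaptable" could look like in practice, here is a rough sketch that turns WikiNER-style lines into column format. It assumes each line holds one sentence, with space-separated items of the form `token|POS|NER`; please check this against the actual files before relying on it. The helper names and the sample sentence (including its tags) are invented for the example.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_wikiner_line(line: str) -> List[Tuple[str, str, str]]:
    """Split one sentence line into (token, pos, ner) triples."""
    triples = []
    for item in line.split():
        # rsplit keeps a token intact even if it contains '|' itself.
        token, pos, ner = item.rsplit("|", 2)
        triples.append((token, pos, ner))
    return triples


def to_column_format(lines: Iterable[str]) -> Iterator[str]:
    """Yield one 'token pos ner' row per token and an empty row after each sentence."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        for token, pos, ner in parse_wikiner_line(line):
            yield f"{token} {pos} {ner}"
        yield ""


# Invented sample line in the assumed format.
sample = ["Paris|NNP|I-LOC est|VBZ|O belle|JJ|O .|.|O"]
rows = list(to_column_format(sample))
print("\n".join(rows))
```

The output has one token per line with a blank line between sentences, which is the usual CoNLL-style layout for sequence labeling data.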

alanakbik · 2018-10-24T17:12:45Z

Hi @mhham thanks for the pointers - more languages are definitely planned and French is high up on our priority list. I am hoping that the next release will be a lot more multilingual than currently, but I am not sure how quickly we can get around to which language. Of course contributions are always welcome!

lz-chen · 2019-02-13T12:22:05Z

Hi, thanks for the great work! I wonder how many languages Flair supports for NER now? From what I see in release 0.4, it seems that English, German, Dutch, French, Italian, Spanish, Portuguese and Polish are supported?
Btw, are there any updates on Nordic language models @EmilStenstrom? I am currently working with NER in Norwegian, so it would be very useful :) Thanks!

stefan-it · 2019-02-13T12:38:10Z

I've trained FlairEmbeddings on Wikipedia dumps + OPUS (1 epoch) for some more languages:

no, fa, ar, id, pl, da, hi, nl, eu, sl, he, hr, fi, bg, cs and sv.

I'll provide them as soon as I have checked their performance on UD :)

lz-chen · 2019-02-13T14:55:40Z

Thanks for the reply! @stefan-it
I just read Tutorial 2, so the pretrained NER model is available in German, French and Dutch, right?

alanakbik · 2019-02-13T14:58:03Z

Yes, you could also test our multilingual NER model, which can detect entities in English, German, Dutch and Spanish (and even other languages a little) even though it is only one model.

lz-chen · 2019-02-13T14:59:25Z

Thanks for the pointer! Will try that out:)

pvcastro · 2019-02-27T19:44:33Z

Hi @alanakbik. I study NER for Portuguese, and for "general" NER models I believe the best dataset is the one spaCy uses, which is the one from WikiNER (Learning multilingual named entity recognition from Wikipedia).
As for Portuguese word embeddings, there's a lab at a university here in Brazil that trained many different word embedding models for Portuguese here. In order for them to be available in Flair, should they be added to embeddings.WordEmbeddings?

alanakbik · 2019-02-28T12:45:57Z

@pvcastro yes good idea. We've already added the WikiNER dataset for Portuguese (see tutorial). You can load it with:

from flair.data_fetcher import NLPTaskDataFetcher, NLPTask
original_corpus = NLPTaskDataFetcher.load_corpus(NLPTask.WIKINER_PORTUGUESE)

Aside from this, I think it would be good to support a downloading and conversion routine for word embeddings such as the ones you linked, to make it easy to start experimenting with them!

pvcastro · 2019-02-28T18:15:13Z

OK, great. I'll work on this and submit a PR soon.
Thanks @alanakbik!

jimkts · 2019-05-20T18:47:46Z

Hi guys! Flair is amazing. I am reading up on your project because I am writing my MSc thesis in NLP. I was wondering if Flair supports the Greek language?

alanakbik · 2019-05-21T06:09:39Z

Hello @jimkts - only one embedding type currently supports Greek, namely BytePairEmbeddings, which you could use to embed sentences and train models for Greek:

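from flair.data import Sentence
from flair.embeddings import BytePairEmbeddings

# load byte-pair embeddings for Greek ("el")
embeddings = BytePairEmbeddings("el")

# embed an example sentence
sentence = Sentence('Αγαπώ την Ελλάδα')
embeddings.embed(sentence)

# print each token and its embedding vector
for token in sentence:
    print(token)
    print(token.embedding)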

In order to train a model, you would need to add a Greek training dataset. For instance, the Greek Universal Dependency Treebank or a dataset for Named Entity Recognition. You can check out the tutorials on how to read in your own datasets or train your own models. If you have questions do let us know - we'd be happy to add Greek support.

stefan-it · 2019-05-21T10:03:00Z

@jimkts I could train Flair embeddings for Greek if you want :)

Meanwhile, you could also try the multilingual BERT model (it also includes Greek, trained on Wikipedia).

jimkts · 2019-07-01T14:32:01Z

Hello @alanakbik, I trained a gensim Word2Vec model on a big Greek corpus (~17 Gb and ~3500000 words). How can I use this pre-trained model in Flair?

stale · 2020-04-30T00:10:52Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

marcomoriatbi · 2020-12-05T11:55:14Z

Dear Alan, is there any support for Italian NER? Or would Italian NER require training a new model? Thanks

alanakbik · 2020-12-07T07:32:43Z

Hello @marcomoriatbi, there is no pre-trained model for Italian NER yet. You could try 'ner-multi', which was trained over 4 languages and also kind of works for languages related to the ones it was trained on. I tried this model for French and it worked ok, so maybe that extends to Italian as well.

Otherwise, you would need to train your own Italian NER model. There are Italian Flair embeddings included, but on the dataset side, we currently only include NER datasets for Italian that were automatically generated: WIKINER_ITALIAN, WIKIANN and XTREME (see here for more info). I think there are better NER datasets for Italian out there.

longsc2603 · 2022-12-16T09:00:19Z

Hi, I am looking through Flair and wondering whether it supports Vietnamese. If not, will it in the future? Thank you!

alanakbik added the enhancement Improving of an existing feature label Jul 9, 2018

alanakbik mentioned this issue Jul 9, 2018

Train NER for Swedish #3

Closed

alanakbik mentioned this issue Jul 29, 2018

Add Trainer for custom language models #17

Closed

tabergma added the language model Related to language model label Oct 4, 2018

iamyihwa mentioned this issue Dec 10, 2018

'list' object has no attribute 'embed' when trying to predict with pretrained model #294

Closed

tabergma added a commit that referenced this issue Dec 12, 2018

GH-2: Add protuguese models.

0ab8648

tabergma mentioned this issue Dec 12, 2018

GH-2: Add protuguese models. #304

Merged

alanakbik pushed a commit that referenced this issue Dec 12, 2018

Merge pull request #304 from zalandoresearch/GH-2-portuguese-models

f96160c

GH-2: Add protuguese models.

tabergma added a commit that referenced this issue Dec 13, 2018

GH-270: Change semantic #2 - not working

3ea68a1

tabergma added a commit that referenced this issue Jan 17, 2019

GH-270: Change semantic #2 - not working

060ab5b

tabergma added a commit that referenced this issue Jan 17, 2019

GH-270: Change semantic #2 - not working

fd3dfa9

abeermohamed1 mentioned this issue Feb 3, 2020

aggregated_embedding not working with CPU #1406

Closed

sawankumar94 mentioned this issue Feb 4, 2020

Flair Model gives different results on same data #1404

Closed

CatarinaPC mentioned this issue Mar 10, 2020

By default, what is the best model? #1472

Closed

stale bot added the wontfix This will not be worked on label Apr 30, 2020

stale bot closed this as completed May 7, 2020

CourtVision mentioned this issue Aug 3, 2020

OSError when trying to load model trained on another commputer #1747

Closed

makcedward mentioned this issue Oct 10, 2020

Sentence Augmentation (NLPAug) #1903

Closed

alanakbik pushed a commit that referenced this issue Oct 15, 2020

Merge pull request #2 from flairNLP/master

3ebfa73

update

ashokxnarang mentioned this issue Dec 26, 2020

TARSClassifier crashes #2043

Closed

whoisjones added a commit that referenced this issue Feb 4, 2021

pull reviews #2 implemented

1eb9eb5

whoisjones added a commit that referenced this issue Feb 4, 2021

removed unecessary import from review of PR #2

9a4e954

whoisjones added a commit that referenced this issue Feb 4, 2021

Merge pull request #2 from whoisjones/multitask_model

0a50594

#2: Initial Multitask Model

alanakbik pushed a commit that referenced this issue Jun 8, 2021

Merge pull request #2 from melvelet/virtualize-negative-relations

ce5a1d8

Virtualize negative relations

whoisjones added a commit that referenced this issue Nov 9, 2021

multitask model adjustments #2

d84c368

Madhu000 mentioned this issue Aug 8, 2022

Fine-tuning t5-base model raises an error #1661

Closed

longsc2603 mentioned this issue Dec 21, 2022

Support for Vietnamese #3036

Closed

None-Such mentioned this issue May 29, 2023

[Question]: How to Train a Multi-label Text Classifier? #3255

Closed
