
Tweet sentiment analysis in French #21

Closed
ghost opened this issue Mar 28, 2020 · 7 comments

@ghost

ghost commented Mar 28, 2020

Hi,

Hope you are all well!

Is it possible to use Flaubert for sentiment analysis of tweets written in French? If so, how can we do that?

Long live France! :-)

Cheers,
X

@schwabdidier
Member

Hello,

Yes, it is possible. For instance, in FLUE (French Language Understanding Evaluation), we use the Cross-Lingual Sentiment (CLS) dataset (Prettenhofer and Stein, 2010). We obtain the best results on the French part with FlauBERT-large.

See the paper at https://arxiv.org/pdf/1912.05372.pdf (Sections 4.1 and 5.1).

Hope it helps.

@schwabdidier
Member

Of course, a similar method can be used for other kinds of sentiment analysis.

@ghost

ghost commented Mar 28, 2020

Thanks for your reply. Is there a repository available to test it?

@schwabdidier
Member

The FLUE part of this GitHub repository should help you.

https://github.com/getalp/Flaubert/tree/master/flue

Please give us feedback about your work.

@ghost

ghost commented Mar 28, 2020

Hi again,

To be honest, I don't really know how to do it, as I'm not very specialized in NLP.

I can do the scraping with https://github.com/twintproject/twint and the API.

I followed you on Twitter so I could ask my questions there without cluttering this issue, but I can't send you a direct message. My Twitter is https://twitter.com/lucmichalski

In any case, thanks for the details.

Cheers,
Luc

@schwabdidier
Member

OK. If you want to understand it better, I'd suggest starting with a Hugging Face tutorial. Everything is explained in https://github.com/huggingface/transformers, and since Flaubert is integrated into this library, what is explained for English won't be too hard to adapt to French. I'm adding you on Twitter.

@formiel
Contributor

formiel commented Mar 28, 2020

Hi @lucmichalski,

If you want to fine-tune FlauBERT for a sentiment analysis task, you can build on the following section of FLUE. There are only a few things to modify if you fine-tune on another task:

  1. Data processor: use one of the existing data processors provided by Hugging Face's Transformers if your task has the same number of labels, or add a new data processor depending on your needs.

  2. Prepare data: the input data to the DataProcessor class should be in .tsv format, so prepare your text accordingly. You may want to check the _create_examples function within the corresponding class to make sure each column of the .tsv file holds the content it expects.
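For a binary tweet sentiment task, the two steps above might look like the minimal sketch below. It assumes a two-column layout (text, then label) with a header row, similar to the SST-2 .tsv files read by the SST-2 processor in Transformers. InputExample here is a small stand-in that mirrors the fields of the library's class so the snippet runs on its own; TweetSentimentProcessor, to_tsv_lines, and the 0/1 label convention are hypothetical, not part of FLUE or Transformers.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class InputExample:
    """Stand-in mirroring the fields of Transformers' InputExample."""
    guid: str
    text_a: str
    text_b: Optional[str] = None
    label: Optional[str] = None


def to_tsv_lines(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Hypothetical helper: turn (tweet, label) pairs into TSV lines with a header."""
    lines = ["sentence\tlabel"]
    for text, label in rows:
        # Tabs or newlines inside a tweet would break the column layout,
        # so collapse every whitespace run to a single space.
        clean = " ".join(text.split())
        lines.append(f"{clean}\t{label}")
    return lines


class TweetSentimentProcessor:
    """Hypothetical processor for a binary tweet sentiment task."""

    def get_labels(self) -> List[str]:
        return ["0", "1"]  # assumed convention: 0 = negative, 1 = positive

    def _create_examples(self, lines: List[List[str]], set_type: str) -> List[InputExample]:
        examples = []
        for i, line in enumerate(lines):
            if i == 0:
                continue  # skip the header row
            text, label = line[0], line[1]  # column 0 = tweet, column 1 = label
            examples.append(InputExample(guid=f"{set_type}-{i}", text_a=text, label=label))
        return examples


if __name__ == "__main__":
    tweets = [("Super match ce soir !", "1"), ("Quelle\tdéception...\n", "0")]
    lines = to_tsv_lines(tweets)
    # In practice you would write these lines to a UTF-8 file such as train.tsv.
    parsed = [line.split("\t") for line in lines]
    for ex in TweetSentimentProcessor()._create_examples(parsed, "train"):
        print(ex)
```

If your labels or column order differ, only get_labels and the column indices in _create_examples should need to change.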

Once the data is in the right format, you can use the run_glue.py script for fine-tuning. The run_glue.py script in Hugging Face's Transformers currently does not evaluate on the test set or save the best validation model after fine-tuning, so we modified it slightly to output the test result and save the best validation model (as well as the last checkpoint, in case you want to resume training later). You can check it out here.
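The modification described above (keep the best validation model plus the last checkpoint, then report the test result) follows a common pattern. Below is a minimal, framework-agnostic sketch of that pattern, not the FLUE script itself; the callbacks train_one_epoch, evaluate, save, and load are hypothetical hooks you would connect to your own model code, and the scores in toy_run are placeholder numbers, not real results.

```python
from typing import Callable, Dict, List, Tuple


def fine_tune_with_best_checkpoint(
    num_epochs: int,
    train_one_epoch: Callable[[int], None],
    evaluate: Callable[[str], float],
    save: Callable[[str], None],
    load: Callable[[str], None],
) -> Tuple[int, float, float]:
    """Train, keep the best dev-set checkpoint, then score it on the test set.

    All four callbacks are hypothetical hooks. Returns
    (best_epoch, best_dev_score, test_score).
    """
    best_epoch, best_dev = -1, float("-inf")
    for epoch in range(num_epochs):
        train_one_epoch(epoch)
        dev_score = evaluate("dev")
        if dev_score > best_dev:  # strict ">" keeps the earliest best epoch on ties
            best_epoch, best_dev = epoch, dev_score
            save("best")  # overwrite the best-validation checkpoint
        save("last")  # always keep the latest checkpoint for resuming training
    load("best")  # report the test score of the best-validation model
    return best_epoch, best_dev, evaluate("test")


def toy_run() -> Tuple[int, float, float, Dict[str, int]]:
    """Exercise the loop with stubs; the scores are placeholders, not results."""
    dev_scores: List[float] = [0.80, 0.85, 0.83]
    test_scores: List[float] = [0.78, 0.84, 0.82]
    state = {"epoch": -1}
    saved: Dict[str, int] = {}  # checkpoint name -> epoch it was taken at

    def train_one_epoch(epoch: int) -> None:
        state["epoch"] = epoch  # the "model" is just the epoch index

    def evaluate(split: str) -> float:
        scores = dev_scores if split == "dev" else test_scores
        return scores[state["epoch"]]

    def save(name: str) -> None:
        saved[name] = state["epoch"]

    def load(name: str) -> None:
        state["epoch"] = saved[name]

    best_epoch, best_dev, test_score = fine_tune_with_best_checkpoint(
        3, train_one_epoch, evaluate, save, load
    )
    return best_epoch, best_dev, test_score, saved


if __name__ == "__main__":
    print(toy_run())
```

Reporting the test score of the best-validation checkpoint, rather than of the final epoch, avoids picking a model based on test performance.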

@formiel formiel closed this as completed Jun 17, 2020