Cross-lingual classification using multilingual word2vec embeddings

sunnymodi21/crosslingual-classification

Cross Lingual Classification without translation or retraining

This project aims to build a sentiment classification model that is trained in one language and then applied to a new language without retraining or translation.

Dependencies

Datasets

Amazon review datasets:

  • Book review dataset in data/amazon-data
  • 2000 reviews for training and 2000 for testing
  • Ratings used as labels for positive or negative sentiment
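
As a rough illustration of the last point, the rating-to-label step might look like the sketch below. The cutoff (more than 3 stars is positive, fewer than 3 is negative, exactly 3 is skipped) is an assumption and is not taken from the repository, and `rating_to_label` is a hypothetical helper name.

```python
from typing import Optional


def rating_to_label(rating: float) -> Optional[int]:
    """Map a star rating to a binary sentiment label.

    Assumed cutoff (not confirmed by the repository):
    above 3 stars -> 1 (positive), below 3 stars -> 0 (negative),
    exactly 3 stars -> None (treated as neutral and skipped).
    """
    if rating > 3:
        return 1
    if rating < 3:
        return 0
    return None


# Illustrative ratings only, not values from the dataset.
ratings = [5.0, 1.0, 3.0, 4.0]
labels = [rating_to_label(r) for r in ratings]
print(labels)  # [1, 0, None, 1]
```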

You can download the English (en), French (fr), and German (de) MUSE embeddings this way:

# English MUSE Wikipedia embeddings
curl -o data/wiki.en.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.en.vec
# French MUSE Wikipedia embeddings
curl -o data/wiki.fr.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.fr.vec
# German MUSE Wikipedia embeddings
curl -o data/wiki.de.vec https://dl.fbaipublicfiles.com/arrival/vectors/wiki.multi.de.vec
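
Once downloaded, the files can be read with a few lines of Python. The sketch below assumes they follow the usual word2vec text format (a `count dim` header line, then one `word v1 ... vd` line per word). The helper names `load_vec` and `embed_text` are hypothetical, and averaging word vectors into one review vector is an assumed feature scheme rather than the repository's confirmed method.

```python
import io
from typing import Dict, Iterable, List


def load_vec(lines: Iterable[str], max_words: int = 200000) -> Dict[str, List[float]]:
    """Parse word2vec text format into a word -> vector dictionary."""
    it = iter(lines)
    dim = int(next(it).split()[1])  # header line: "<word count> <dimension>"
    vectors: Dict[str, List[float]] = {}
    for line in it:
        if len(vectors) >= max_words:
            break
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def embed_text(text: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [vectors[tok] for tok in text.lower().split() if tok in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


# Tiny in-memory stand-in for a .vec file (toy values, not real embeddings).
sample = io.StringIO("2 3\ngood 1.0 0.0 0.0\nbon 0.0 1.0 0.0\n")
vectors = load_vec(sample)
print(embed_text("Good bon", vectors, dim=3))  # [0.5, 0.5, 0.0]

# With a real file:
# with open("data/wiki.en.vec", encoding="utf-8") as f:
#     en_vectors = load_vec(f)
```

Because the MUSE vectors for different languages share one space, a review vector built this way from French or German text is directly comparable to one built from English text.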

Train and test classifier

This project tests all nine language pairs: En-En, En-Fr, En-De, Fr-Fr, Fr-En, Fr-De, De-De, De-En, and De-Fr.
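
The pairs listed above can be enumerated programmatically. This is a minimal sketch, assuming the first language in each pair is the training language and the second is the test language; the variable names are illustrative and do not come from `crosslingual-classification.py`.

```python
from itertools import product

LANGS = ["en", "fr", "de"]

# Every (train, test) combination, including same-language pairs.
pairs = list(product(LANGS, repeat=2))

for train_lang, test_lang in pairs:
    # A classifier would be trained on train_lang reviews here
    # and scored on test_lang reviews.
    print(f"{train_lang.capitalize()}-{test_lang.capitalize()}")
```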

To evaluate the results simply run:

python crosslingual-classification.py
