MeLi Data Challenge 2019

Fifth place solution (balanced accuracy: 0.9108) for MercadoLibre's text classification challenge.
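For readers unfamiliar with the metric, the sketch below shows how balanced accuracy can be computed, assuming the standard definition (the mean of per-class recall). The function name and the category labels are hypothetical and are not taken from the challenge data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical labels: a classifier that always predicts the majority class.
y_true = ["PHONES", "PHONES", "PHONES", "PHONES", "SHOES"]
y_pred = ["PHONES", "PHONES", "PHONES", "PHONES", "PHONES"]

# Recall is 4/4 for PHONES and 0/1 for SHOES, so the mean is 0.5.
score = balanced_accuracy(y_true, y_pred)
print(score)
```

Plain accuracy on the same toy data would be 0.8, which is why a class-balanced metric is more demanding when categories are unevenly represented.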

Model

The solution is a word-level ensemble (average) of 10 LSTMs trained on FastText MUSE (Multilingual Unsupervised and Supervised Embeddings) embeddings. Our objective was to allow transfer learning between languages by mapping every word into the same embedding space.
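As an illustration of the averaging step, here is a minimal sketch. It assumes that each model outputs a vector of class probabilities and that these vectors are averaged before taking the most likely class. The description above only says "average", so averaging probabilities (rather than logits) is an assumption, and the function names and numbers are hypothetical.

```python
def average_ensemble(per_model_probs):
    """Average the class-probability vectors that several models give one title."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [
        sum(probs[c] for probs in per_model_probs) / n_models
        for c in range(n_classes)
    ]


def predict(per_model_probs):
    """Return the index of the class with the highest averaged probability."""
    averaged = average_ensemble(per_model_probs)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Made-up outputs of three models over three classes (the real ensemble has 10 models).
model_outputs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
]

averaged = average_ensemble(model_outputs)
predicted_class = predict(model_outputs)
print(averaged, predicted_class)
```

In this toy case the first model alone would pick class 0, but the averaged vote picks class 1.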

Our model is trained with Adam in two stages:

1. We first make sure the LSTM learns the embedding space given by MUSE.
2. We then fine-tune the embeddings for words in the MUSE vocabulary and learn embeddings for the words missing from it.
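One way to prepare an embedding table for this kind of two-stage training is sketched below: words found in the pretrained vectors start from those vectors, and missing words start from small random values that are learned later. This is only an illustration under assumptions. The random initialisation range, the function name `build_embedding_table`, and the toy vectors are hypothetical and do not come from the repository.

```python
import random


def build_embedding_table(vocabulary, pretrained, dim, seed=0):
    """Return (table, missing): one vector per word, plus the words not in `pretrained`."""
    rng = random.Random(seed)
    table, missing = [], []
    for word in vocabulary:
        if word in pretrained:
            # Known word: start from the pretrained (MUSE-like) vector.
            table.append(list(pretrained[word]))
        else:
            # Unknown word: small random start, to be learned during training.
            table.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
            missing.append(word)
    return table, missing


# Toy 3-dimensional vectors standing in for aligned embeddings; the values are made up.
# "zapatillas" (Spanish) and "tenis" (Portuguese) both mean sneakers, so they sit close together.
muse_vectors = {
    "zapatillas": [0.10, 0.20, 0.30],
    "tenis": [0.10, 0.20, 0.25],
}
vocabulary = ["zapatillas", "tenis", "xj900"]

table, missing = build_embedding_table(vocabulary, muse_vectors, dim=3)
print(missing)
```

Keeping the list of missing words separate makes it easy to decide, per stage, which rows of the table are allowed to change.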

Overall, the model is simple. More work should be done to improve the vocabulary, train with subsampling, and iterate on the model architecture to make it faster and more expressive.

Setting up resources

To set up the environment with all the datasets and resources, first run:

bash get_datasets.sh
bash get_embeddings.sh

Installing and running package

Install the package in development mode, then train the model and generate our submission:

pip3 install -e .
python3 -m multilingual_title_classifier.src.train
python3 -m multilingual_title_classifier.src.submission

Running Docker image

You must first install nvidia-docker to run the Docker image with a GPU.

Then, pull the base image we use:

docker pull nvidia/cuda

Finally, build and run our image:

docker build -t multilingual_title_classifier .
docker run --runtime=nvidia --ipc=host --mount source=${path_to_resources},target=/home/user/resources,type=bind multilingual_title_classifier

pablozivic/meli-challenge-2019-multilingual-classifier
