
Consensus Categorization (C²) for MercadoLivre Data Challenge 2019

The C² method is a supervised and transductive learning method based on the Consensus Clustering method that I investigated during my doctorate at ICMC-USP.

Here, the C² method has been adapted to handle the dataset provided by Meli Data Challenge 2019.

In short, the C² method has the following steps:

  • Preprocess product titles by removing stopwords (English, Portuguese, and Spanish), numbers, and special characters. Source: meli/
  • Learn a textual representation for product titles using fastText word embeddings. These embeddings are useful for initializing the classification models.
  • Draw several dataset samples by sampling both instances and features. Source: meli/
  • Train different classification models for each sample. Diversity among the resulting models is important. Source: meli/
  • Build a heterogeneous network with the following node types: products, terms, and classification models. Some network nodes are labeled using the training set and the categories predicted by the classification models. The heterogeneous network is regularized through a consensus function that returns the final categorization. Source: meli/
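
As a rough illustration of the preprocessing and sampling steps listed above, here is a minimal sketch using only the Python standard library. It is not the repository's code: the names `preprocess_title` and `sample_instances_and_features`, the tiny `STOPWORDS` set, and the sampling fractions are assumptions made for this example. A real run would presumably use full English, Portuguese, and Spanish stopword lists (for example from nltk).

```python
import random
import re

# Tiny illustrative stopword set (assumption); a real run would use full
# English, Portuguese, and Spanish lists, for example from nltk.
STOPWORDS = {"the", "for", "with", "de", "para", "com", "con", "el", "la"}


def preprocess_title(title):
    """Lowercase a title, drop numbers and special characters, remove stopwords."""
    # [\W\d_]+ matches runs of characters that are not letters,
    # so accented letters such as "ó" or "ç" are kept.
    letters_only = re.sub(r"[\W\d_]+", " ", title.lower())
    return " ".join(tok for tok in letters_only.split() if tok not in STOPWORDS)


def sample_instances_and_features(n_instances, n_features,
                                  instance_frac=0.5, feature_frac=0.5, seed=0):
    """Pick a random subset of row indices and of column indices."""
    rng = random.Random(seed)
    n_rows = max(1, int(n_instances * instance_frac))
    n_cols = max(1, int(n_features * feature_frac))
    rows = sorted(rng.sample(range(n_instances), n_rows))
    cols = sorted(rng.sample(range(n_features), n_cols))
    return rows, cols


clean = preprocess_title("Capa para iPhone 7 Plus - 128GB!")
# One (rows, cols) pair per seed; each pair would feed one classification model.
samples = [sample_instances_and_features(1000, 300, seed=s) for s in range(3)]
```

Changing the seed per sample is one simple way to get the diversity among models that the method relies on, since each model then sees a different subset of titles and features.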

The C² method ranked fourth (private leaderboard) in the Meli Data Challenge 2019. It can be improved by either adding more classification models or tuning the consensus function.
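
To make the idea of a consensus over a heterogeneous network more concrete, the sketch below runs a plain iterative label propagation over product, term, and model nodes. This is a swapped-in, simplified technique, not the consensus function used in the repository. The node naming scheme (`term:...`, `model_A=phones`), the choice of one node per model prediction, and the function `propagate_labels` are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seed_labels, categories, n_iter=20):
    """Spread category scores from labeled (seed) nodes to their neighbors."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Every node holds one score per category; seed nodes start at 1.0.
    nodes = set(neighbors) | set(seed_labels)
    scores = {n: {c: 0.0 for c in categories} for n in nodes}
    for node, category in seed_labels.items():
        scores[node][category] = 1.0

    for _ in range(n_iter):
        updated = {}
        for node in nodes:
            nbrs = neighbors.get(node, ())
            if node in seed_labels or not nbrs:
                updated[node] = scores[node]  # seed nodes stay clamped
                continue
            # Unlabeled nodes take the average score of their neighbors.
            updated[node] = {
                c: sum(scores[n][c] for n in nbrs) / len(nbrs) for c in categories
            }
        scores = updated

    # Keep only nodes that received some score and return their best category.
    return {
        node: max(categories, key=lambda c: s[c])
        for node, s in scores.items()
        if any(s.values())
    }


# Hypothetical toy network: two training products, one test product,
# two terms, and three model predictions for the test product.
edges = [
    ("p_train1", "term:iphone"),
    ("p_test", "term:iphone"),
    ("p_train2", "term:capa"),
    ("p_test", "model_A=phones"),
    ("p_test", "model_B=phones"),
    ("p_test", "model_C=cases"),
]
seed_labels = {
    "p_train1": "phones",        # label from the training set
    "p_train2": "cases",         # label from the training set
    "model_A=phones": "phones",  # labels from model predictions
    "model_B=phones": "phones",
    "model_C=cases": "cases",
}
labels = propagate_labels(edges, seed_labels, ["phones", "cases"])
```

In this toy setting, adding another classification model amounts to adding more prediction nodes and edges, while tuning the consensus would mean changing the update rule or the number of iterations.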

Requirements and Dependencies

  • python 3
  • numpy
  • pandas
  • keras
  • gensim
  • pickle (part of the Python standard library)
  • tqdm
  • scikit-learn (imported as sklearn)
  • networkx
  • nltk
  • fasttext (compiled from source code)
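
A quick way to see which of the dependencies listed above are importable in the current environment is sketched below. The helper name `check_dependencies` is hypothetical, and the module names are assumed to match the import names of the listed packages.

```python
import importlib.util

# Import names assumed for the packages listed above.
MODULES = [
    "numpy", "pandas", "keras", "gensim", "pickle",
    "tqdm", "sklearn", "networkx", "nltk", "fasttext",
]


def check_dependencies(modules=MODULES):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


status = check_dependencies()
for name, found in status.items():
    print(f"{name:10s} {'ok' if found else 'MISSING'}")
```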

How to use?

A Jupyter notebook describes all the steps for executing the C² method. Some parts may need to be adapted to your hardware (for example, if you have multiple GPUs).

The Jupyter notebook is available here: meli2019.ipynb.


This software is available under the MIT license.
