Overview | References | Code | Presentation | Report | About us
This project implements a text classification pipeline for Meetup event descriptions.
Within the Meetup platform, every organized event needs to be manually tagged by the organizers to allow the platform's recommendation system to suggest the event to users based on their interests.
In this context, a system that suggests labels to organizers based on the description they wrote for their event would be a useful tool.
Text classification is a well-known task in the literature and involves a series of operations and techniques, ranging from text preprocessing to text representation.
Our goal was to find the combination of preprocessing, text representation, and classifier that maximizes the chosen performance evaluation metrics.
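As an illustration of this kind of search, the sketch below tries every combination of two preprocessing functions and two text representations, scores each one on held-out examples, and keeps the best. It is a minimal sketch and not the project's actual pipeline: the toy descriptions, the tags, the stopword list, the simple term-overlap classifier, and all function names are assumptions made up for this example, and accuracy stands in for whichever metrics the project actually used.

```python
from collections import Counter
from itertools import product

# Invented toy data: (event description, tag). Not taken from the Meetup dataset.
TRAIN = [
    ("Weekly Python coding workshop for beginners", "tech"),
    ("Machine learning talks and coding night", "tech"),
    ("Sunday morning hiking in the hills", "outdoors"),
    ("Trail running and hiking for all levels", "outdoors"),
]
TEST = [
    ("Coding workshop on machine learning", "tech"),
    ("Evening hiking on the mountain trail", "outdoors"),
]

# Hypothetical stopword list, kept tiny on purpose.
STOPWORDS = {"and", "for", "in", "the", "on", "all"}


def tokenize(text):
    """Preprocessing option 1: lowercase and split on whitespace."""
    return text.lower().split()


def tokenize_no_stopwords(text):
    """Preprocessing option 2: same as above, with stopwords removed."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def count_vector(tokens):
    """Representation option 1: term counts."""
    return Counter(tokens)


def binary_vector(tokens):
    """Representation option 2: term presence (each term counted once)."""
    return Counter(set(tokens))


def fit(examples, preprocess, represent):
    """Sum the document vectors of each tag into one term profile per tag."""
    profiles = {}
    for text, tag in examples:
        profiles.setdefault(tag, Counter()).update(represent(preprocess(text)))
    return profiles


def predict(profiles, vector):
    """Return the tag whose profile has the largest dot product with the vector."""
    def overlap(tag):
        return sum(weight * profiles[tag][term] for term, weight in vector.items())
    return max(sorted(profiles), key=overlap)


def accuracy(profiles, examples, preprocess, represent):
    """Fraction of examples whose predicted tag matches the true tag."""
    hits = sum(
        predict(profiles, represent(preprocess(text))) == tag
        for text, tag in examples
    )
    return hits / len(examples)


PREPROCESSORS = {"lowercase": tokenize, "lowercase+stopwords": tokenize_no_stopwords}
REPRESENTATIONS = {"counts": count_vector, "binary": binary_vector}

# Evaluate every (preprocessing, representation) pair and keep the best one.
results = {}
for (p_name, p), (r_name, r) in product(PREPROCESSORS.items(), REPRESENTATIONS.items()):
    profiles = fit(TRAIN, p, r)
    results[(p_name, r_name)] = accuracy(profiles, TEST, p, r)

best_combo = max(results, key=results.get)
best_score = results[best_combo]
print(best_combo, best_score)
```

In a real setting the same loop would also range over several classifiers, and the score would come from cross-validation on a much larger labeled set rather than from two held-out examples.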
- D. M. Blei, A. Y. Ng, and M. I. Jordan (2003). "Latent Dirichlet Allocation", The Journal of Machine Learning Research, 3, 993-1022.
- T. Mikolov, G. Corrado, K. Chen, and J. Dean (2013). "Efficient Estimation of Word Representations in Vector Space", ICLR 2013, 1-12.
- Q. Le and T. Mikolov (2014). "Distributed Representations of Sentences and Documents", Proceedings of the 31st International Conference on Machine Learning, in PMLR, 32(2), 1188-1196.
- D. Xue and F. Li (2015). "Research of Text Categorization Model based on Random Forests", IEEE International Conference on Computational Intelligence & Communication Technology, 173-176.
All the code produced is contained in the src folder and described in the src README.
Slides are available here in PDF and PPTX formats.
Full report here.
⊜ Dario Bertazioli
- Current Studies: Data Science Master's student at Università degli Studi di Milano-Bicocca;
- Past Studies: Bachelor's degree in Physics at Università degli Studi di Milano.
⊜ Fabrizio D'Intinosante
- Current Studies: Data Science Master's student at Università degli Studi di Milano-Bicocca;
- Past Studies: Bachelor's degree in Economics and Statistics for Organizations at Università degli Studi di Torino.
⊜ Massimiliano Perletti
- Current Studies: Data Science Master's student at Università degli Studi di Milano-Bicocca;
- Past Studies: Bachelor's degree in Materials Engineering and Nanotechnology at Politecnico di Milano.