A Text Mining and Search project. The full repo, with data and pre-trained models, is available on GitLab.

faber6911/meetup-topics


A Text Mining and Search Project

Overview   |   References   |   Code   |   Presentation   |   Report   |   About us  

☍   Overview

This project implements a text classification pipeline for Meetup event descriptions. On the Meetup platform, organizers must tag every event manually so that the platform's recommendation system can suggest it to users based on their interests. In this context, a system that suggests tags to organizers based on how they described their event would be a useful tool. Text classification is well studied in the literature and involves a series of operations and tricks, from text preprocessing up to text representation. Our goal was to find the combination of preprocessing and text representation that, fed to the best-performing classifier, maximizes the chosen classification performance metrics.
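The paragraph above describes three stages: preprocessing, text representation and classification. The README does not say which combination performed best (the report and the src folder cover that). The following is a minimal, dependency-free Python sketch of how the stages fit together. It assumes TF-IDF as the text representation and a nearest-centroid rule as the classifier. Both are stand-ins chosen for brevity and are not the project's method; the references point instead towards topic models, word and document embeddings and Random Forests. The stop-word list, event descriptions and tags are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list used only for this illustration.
STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "on", "with", "join", "us"}


def preprocess(text):
    """Preprocessing: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency computed on the training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    """Text representation: L2-normalized sparse TF-IDF vector (unseen terms are ignored)."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def train_centroids(vectors, labels):
    """Classifier training: average the vectors of each label (nearest centroid)."""
    sums, sizes = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, Counter())
        for t, v in vec.items():
            acc[t] += v
    return {label: {t: v / sizes[label] for t, v in acc.items()} for label, acc in sums.items()}


def predict(text, idf, centroids):
    """Suggest the label whose centroid has the largest dot product with the description."""
    vec = tfidf(preprocess(text), idf)

    def score(label):
        centroid = centroids[label]
        return sum(v * centroid.get(t, 0.0) for t, v in vec.items())

    return max(centroids, key=score)


# Made-up event descriptions and tags, for illustration only.
descriptions = [
    "Join us for a Python workshop on machine learning and data science",
    "Morning hiking trip in the mountains with a picnic lunch",
    "Deep learning study group: neural networks and data",
    "Weekend trail running and hiking in the hills",
]
labels = ["tech", "outdoors", "tech", "outdoors"]

docs = [preprocess(d) for d in descriptions]
idf = fit_idf(docs)
centroids = train_centroids([tfidf(d, idf) for d in docs], labels)
print(predict("A hands-on machine learning workshop", idf, centroids))  # tech
```

Each stage can be swapped independently. For example, averaged word embeddings could replace `tfidf`, or a Random Forest could replace the centroid rule. Comparing such combinations is the kind of search the project describes.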

☍   References

  • D. M. Blei, A. Y. Ng and M. I. Jordan (2003). "Latent Dirichlet Allocation", The Journal of Machine Learning Research, 3, 993-1022.
  • T. Mikolov, K. Chen, G. Corrado and J. Dean (2013). "Efficient Estimation of Word Representations in Vector Space", ICLR 2013, 1-12.
  • Q. Le and T. Mikolov (2014). "Distributed Representations of Sentences and Documents", Proceedings of the 31st International Conference on Machine Learning, PMLR, 32(2), 1188-1196.
  • D. Xue and F. Li (2015). "Research of Text Categorization Model based on Random Forests", IEEE International Conference on Computational Intelligence & Communication Technology, 173-176.

☍   Code

All the code we produced is in the src folder and is described in the src README.

☍   Presentation

Slides are available here in PDF and PPTX formats.

☍   Report

Full report here.

☍   About us

⊜   Dario Bertazioli

  • Current Studies: Data Science Master Student at Università degli Studi di Milano-Bicocca;
  • Past Studies: Bachelor's degree in Physics at Università degli Studi di Milano.

⊜   Fabrizio D'Intinosante

  • Current Studies: Data Science Master Student at Università degli Studi di Milano-Bicocca;
  • Past Studies: Bachelor's degree in Economics and Statistics for Organizations at Università degli Studi di Torino.

⊜   Massimiliano Perletti

  • Current Studies: Data Science Master Student at Università degli Studi di Milano-Bicocca;
  • Past Studies: Bachelor's degree in Materials and Nanotechnology Engineering at Politecnico di Milano.
