luiscruz/remla-baseline-project

Multilabel classification on Stack Overflow tags

Predict tags for Stack Overflow posts using a multilabel classification approach.

Dataset

  • Dataset of post titles from Stack Overflow

Transforming text to a vector

  • Text data is transformed to numeric vectors using bag-of-words and TF-IDF.
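
As a rough illustration of the two representations, here is a minimal sketch using scikit-learn's CountVectorizer (bag-of-words) and TfidfVectorizer. The sample titles and the default vectorizer settings are assumptions made for this example and are not taken from the project's code.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical post titles, invented for illustration only.
titles = [
    "how to parse json in python",
    "python list comprehension syntax",
    "parse xml in java",
]

# Bag-of-words: each column holds the raw count of one vocabulary token.
bow = CountVectorizer()
X_bow = bow.fit_transform(titles)

# TF-IDF: counts are reweighted so tokens that appear in many titles weigh less.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(titles)

print(X_bow.shape, X_tfidf.shape)
print(sorted(bow.vocabulary_))
```

Both vectorizers return sparse matrices with one row per title and one column per vocabulary token, so they can be passed to a classifier interchangeably.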

MultiLabel classifier

MultiLabelBinarizer transforms the labels into binary form, so each prediction is a mask of 0s and 1s.
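
The binary label mask can be sketched with scikit-learn's MultiLabelBinarizer as follows. The tag lists are made up for this example; only the class name comes from the text above.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical tag sets, one list per post.
tags = [["python", "json"], ["java"], ["python", "list"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)  # one row per post, one column per distinct tag

print(mlb.classes_)  # column order: tags sorted alphabetically
print(Y)             # 1 where the post carries the tag, 0 otherwise

# A predicted mask can be mapped back to tag names.
decoded = mlb.inverse_transform(Y)
print(decoded)
```

Because the classifier outputs a mask in the same column order, inverse_transform is what turns a prediction back into readable tags.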

Logistic Regression for Multilabel classification

  • Regularization coefficient = 10
  • L2-regularization technique
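
One possible way to set this up in scikit-learn is sketched below. It assumes that the coefficient of 10 maps to the C parameter of LogisticRegression (which is the inverse of the regularization strength) and that one binary classifier is trained per tag via OneVsRestClassifier; neither detail is confirmed by the text above, and the training data is invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical titles and tags, for illustration only.
titles = [
    "parse json in python",
    "python list comprehension",
    "java parse xml",
    "java string builder",
    "python json dump",
    "java list sort",
]
tags = [["python", "json"], ["python"], ["java"],
        ["java"], ["python", "json"], ["java"]]

X = TfidfVectorizer().fit_transform(titles)
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

# Assumption: "Coefficient = 10" is read as C=10. L2 is the default penalty
# of LogisticRegression, so it is not passed explicitly here.
clf = OneVsRestClassifier(LogisticRegression(C=10, max_iter=1000))
clf.fit(X, Y)

pred = clf.predict(X)  # binary mask with one column per tag
print(mlb.inverse_transform(pred))
```

The one-vs-rest wrapper fits an independent logistic regression for every tag, which is what allows a single post to receive several tags at once.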

Evaluation

Results are evaluated using several classification metrics.
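
The specific metrics are not listed here, so the following is only a sketch of how multilabel predictions are commonly scored with scikit-learn. The choice of subset accuracy and micro/macro-averaged F1, as well as the toy label masks, are assumptions for this example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical ground-truth and predicted masks (rows: posts, columns: tags).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 1],
                   [0, 1, 1],
                   [1, 0, 0]])

# Subset accuracy: a post counts as correct only if every tag matches.
subset_acc = accuracy_score(y_true, y_pred)

# Micro-averaging pools all tag decisions; macro-averaging averages per-tag F1.
f1_micro = f1_score(y_true, y_pred, average="micro")
f1_macro = f1_score(y_true, y_pred, average="macro")

print(subset_acc, f1_micro, f1_macro)
```

Subset accuracy is strict in the multilabel setting, which is why averaged F1 scores are usually reported alongside it.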

Libraries

  • NumPy: a package for scientific computing.
  • Pandas: a library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  • scikit-learn: a tool for data mining and data analysis.
  • NLTK: a platform for working with natural language.

Note: this sample project was originally created by @partoftheorigin.

About

Simple ML project used as baseline for the REMLA course
