Classic ML starter project to classify messages from an SMS dataset into spam and ham

raphelemmanuvel/ml-spam-classification

ML - SMS Spam/Ham Clustering

A classic machine learning starter project that uses Python libraries to cluster a dataset of SMS messages into 'spam' and 'ham' with k-means.

The dataset is a collection of 5,574 SMS messages taken from the UCI Machine Learning Repository. Each message is to be tagged as "spam" or "ham".
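
As a minimal sketch of the loading step, the snippet below parses messages from a tab-separated source. It assumes each line holds a label (`ham` or `spam`), a tab, and then the message text, which is how the UCI SMS Spam Collection file is commonly distributed. The function name `load_sms` and the file name in the usage comment are hypothetical and are not taken from this repository.

```python
import csv
import io


def load_sms(lines):
    """Parse an iterable of 'label<TAB>message' lines into (label, text) pairs."""
    # QUOTE_NONE keeps quotation marks inside messages as ordinary characters.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip blank or malformed lines that do not have both fields.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


# Small in-memory sample standing in for the real file.
sample = io.StringIO("ham\tSee you at 5\nspam\tWIN a FREE prize now\n")
for label, text in load_sms(sample):
    print(label, "|", text)

# With a real file (hypothetical path), an open file object can be passed:
# with open("SMSSpamCollection", encoding="utf-8", newline="") as f:
#     rows = load_sms(f)
```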

The whole pipeline consists of the following steps:

  • Loading data
  • Data wrangling and pre-processing
  • Feature Selection
  • Feature Vector Modelling
  • k-means clustering and evaluation
  • Writing results
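
The clustering and evaluation steps above can be sketched as follows. This is a dependency-free illustration, not the repository's code: it runs a plain k-means with k = 2 on small numeric vectors, then scores the clusters against known labels by giving each cluster its majority label. That mapping is needed because k-means cluster ids are arbitrary. All function names and the toy vectors are assumptions made for the example. In practice a library implementation such as scikit-learn's `KMeans` would typically be used on the real feature vectors.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(vectors, k=2, iterations=20, seed=0):
    """Return a cluster index (0..k-1) for every vector."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return assignments


def majority_label_accuracy(assignments, labels):
    """Fraction of items whose label matches their cluster's majority label."""
    correct = 0
    for cluster in set(assignments):
        member_labels = [l for a, l in zip(assignments, labels) if a == cluster]
        correct += Counter(member_labels).most_common(1)[0][1]
    return correct / len(labels)


# Toy feature vectors (illustrative only): two well-separated groups.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = ["spam", "spam", "ham", "ham"]

toy_assignments = kmeans(toy_vectors, k=2)
print("clusters:", toy_assignments)
print("accuracy:", majority_label_accuracy(toy_assignments, toy_labels))
```

Majority-label accuracy is only one possible score. It can look optimistic when one class dominates, as ham does in most SMS spam datasets, so it is worth reading alongside per-class counts.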

Although there are several ways to solve this problem, this project uses a TF-IDF (term frequency-inverse document frequency) approach to build the feature vectors, with the aim of high prediction accuracy.
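
To make the TF-IDF idea concrete, here is a minimal sketch using only the standard library. It uses the textbook weighting, term frequency times log(N / document frequency), with whitespace tokenization. Both choices are assumptions for illustration, and the repository's actual vectorizer and settings may differ. Library implementations often apply smoothing and normalization on top of this. The names `tokenize` and `tfidf_vectors` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    # Terms that appear in every document get an IDF of zero.
    idf = {t: math.log(n_docs / doc_freq[t]) for t in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            [(counts[t] / total) * idf[t] if total else 0.0 for t in vocabulary]
        )
    return vocabulary, vectors


vocab, vecs = tfidf_vectors(["free prize now", "see you now"])
for term, weight in zip(vocab, vecs[0]):
    print(f"{term}: {weight:.3f}")
```

In this example the word "now" occurs in both messages, so its weight is zero, while "free" and "prize" occur in only one and receive positive weights. Down-weighting terms shared by all messages is what lets distinctive spam vocabulary dominate the vectors passed to k-means.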