ranjith-acharya/SpamFiltration

SpamFiltration

Spam filtering: classification of emails as spam or ham using Naive Bayes.

Abstract

Email spam is the sending of unwanted messages to email users, and it is a problem for every individual with a mailbox. A spam message is typically an advertisement for a company or product, or a carrier for a virus, that arrives in the recipient's mailbox unsolicited. Various spam filtering techniques are used to address this problem. Here we use a real-world data set to classify emails as spam or non-spam. The goal is to increase the accuracy of the system.

Working

The data set used is the Enron email data set; this project uses about 2000 emails from it.

The file spam.py uses Naive Bayes classification: it creates a dictionary of all repeated words and generates a model, which is saved to the file "text-classification.mdl".
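
The training step described above can be sketched roughly as follows. This is a minimal illustration, not the code in spam.py: the function names build_dictionary, train and save_model, the dictionary size of 3000, the Laplace smoothing, the pickle file format, and the toy emails are all assumptions made for the example.

```python
import os
import pickle
import tempfile
from collections import Counter


def build_dictionary(emails, size=3000):
    """Return up to `size` of the most frequent alphabetic words (size is an assumption)."""
    counts = Counter()
    for text, _label in emails:
        counts.update(w for w in text.lower().split() if w.isalpha())
    return [word for word, _ in counts.most_common(size)]


def train(emails, vocabulary):
    """Estimate class priors and Laplace-smoothed word likelihoods."""
    vocab = set(vocabulary)
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in emails:
        doc_counts[label] += 1
        word_counts[label].update(w for w in text.lower().split() if w in vocab)

    total_docs = sum(doc_counts.values())
    model = {"vocabulary": vocabulary, "priors": {}, "likelihoods": {}}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        model["priors"][label] = doc_counts[label] / total_docs
        # Add-one smoothing so unseen words never get probability zero.
        model["likelihoods"][label] = {
            w: (word_counts[label][w] + 1) / (total_words + len(vocabulary))
            for w in vocabulary
        }
    return model


def save_model(model, path):
    """Serialize the model with pickle (the real file format is an assumption)."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


# Toy stand-in for the Enron emails, as (text, label) pairs.
emails = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("project report attached", "ham"),
]
vocabulary = build_dictionary(emails)
model = train(emails, vocabulary)

# Written to the temp directory here so the demo does not clutter the working directory.
model_path = os.path.join(tempfile.gettempdir(), "text-classification.mdl")
save_model(model, model_path)
```

With real data, the list of (text, label) pairs would be read from the spam and ham folders of the data set instead of being written inline.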

The file detector.py uses the generated text-classification.mdl file for testing, that is, to classify whether an input email is spam or ham.
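
The classification step can be sketched along these lines. Again, this is only an illustration under stated assumptions and not the code in detector.py: the model layout (a dict of priors and per-class word likelihoods), the pickle format, the names load_model and classify, and the hand-written probabilities are all hypothetical.

```python
import math
import pickle


def load_model(path):
    """Load a pickled model dict (assumed format; only unpickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def classify(text, model):
    """Return the label with the highest log posterior for `text`."""
    scores = {}
    for label, prior in model["priors"].items():
        score = math.log(prior)
        for word in text.lower().split():
            likelihood = model["likelihoods"][label].get(word)
            if likelihood is not None:  # ignore words outside the dictionary
                score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical hand-written model, standing in for a loaded .mdl file.
model = {
    "priors": {"spam": 0.5, "ham": 0.5},
    "likelihoods": {
        "spam": {"free": 0.4, "money": 0.4, "meeting": 0.1, "report": 0.1},
        "ham": {"free": 0.1, "money": 0.1, "meeting": 0.4, "report": 0.4},
    },
}

print(classify("free money inside", model))  # prints "spam"
print(classify("meeting report", model))     # prints "ham"
```

Summing log probabilities instead of multiplying raw probabilities avoids numeric underflow on long emails, which is why the sketch works in log space.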

To download the Enron email data set, click here.

Commands

  • First, train the model: $ python spam.py
  • Then, classify an email: $ python detector.py