SMS_spam_classification

Modules needed before running the program

NumPy, SciPy, scikit-learn
TensorFlow
Gensim

pip install --upgrade gensim

How to run the program

python cleanData.py  - separate the spam and non-spam data into 2 files
python train.py      - train the model on the existing data (creates d2v file for storing the model)
python test.py       - test the accuracy of the model

This repository is my first step towards Doc2Vec, a module of Gensim. In simple machine learning terms, it generates a vector from a document, and it is related to word2vec in many ways. Normally we would build this vector with the bag-of-words method, analyzing the words present in all documents and using n-grams. Gensim's Doc2Vec module does this part for us and outputs a vector directly. Basics of Doc2Vec can be found here. The relation between word2vec and doc2vec, and how the latter is built on top of the former, is also discussed in that link.
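
For comparison, the manual bag-of-words approach mentioned above can be sketched with the standard library alone. This is only an illustration: the helper names `ngrams` and `bag_of_words`, the whitespace tokenization, and the choice of unigrams plus bigrams are assumptions, not code from this repository.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(messages, max_n=2):
    """Build count vectors over every 1..max_n-gram seen in the messages.

    Hypothetical helper: returns the sorted vocabulary and one count
    vector per message, aligned with that vocabulary.
    """
    counts = []
    for message in messages:
        tokens = message.lower().split()  # naive whitespace tokenization
        counter = Counter()
        for n in range(1, max_n + 1):
            counter.update(ngrams(tokens, n))
        counts.append(counter)

    vocab = sorted(set().union(*counts))
    vectors = [[counter[term] for term in vocab] for counter in counts]
    return vocab, vectors


if __name__ == "__main__":
    vocab, vectors = bag_of_words(["free prize now", "see you now"])
    for term, column in zip(vocab, zip(*vectors)):
        print(f"{term!r}: {column}")
```

The vectors grow with the vocabulary and are mostly zeros, which is one reason a dense, fixed-size Doc2Vec vector is attractive.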

The training data for SMS spam is in the SMSSpamCollection file. cleanData.py separates the spam and non-spam (called ham here) messages into two different files, which is useful when using Doc2Vec.
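
The separation step might look roughly like the sketch below. It assumes each line of the input has the form `label<TAB>message`, with the label being `ham` or `spam`. The function name `split_by_label` is hypothetical, and this is not the actual contents of cleanData.py.

```python
def split_by_label(lines):
    """Split 'label<TAB>message' lines into (ham_messages, spam_messages).

    Assumed input format: one message per line, label and text separated
    by a single tab. Blank or malformed lines are skipped.
    """
    ham, spam = [], []
    for line in lines:
        line = line.rstrip("\n")
        label, sep, text = line.partition("\t")
        if not sep:
            continue  # no tab found: blank or malformed line
        if label == "ham":
            ham.append(text)
        elif label == "spam":
            spam.append(text)
    return ham, spam


if __name__ == "__main__":
    sample = [
        "ham\tSee you at 5\n",
        "spam\tWIN a free prize\n",
    ]
    ham, spam = split_by_label(sample)
    print("ham:", ham)
    print("spam:", spam)
```

With a real file, the same function can be fed an open file handle, and each returned list can then be written to its own output file, one message per line.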

Then train.py generates a vector for each document (here each message is a document) using the Doc2Vec module. Gensim's LabeledLineSentence is used to feed the labeled messages to the model.

Classification Report

                precision    recall  f1-score      support

    Ham            0.95      1.00      0.97         1449
    Spam           0.97      0.68      0.80         225

avg / total        0.95      0.95      0.95         1674
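
The f1-score column can be recomputed from the precision and recall columns, since F1 is their harmonic mean. The sketch below does that with the rounded values from the table, so small rounding differences from the original run are possible. The support-weighted average is an assumption about how the `avg / total` row was produced.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, support) copied from the report above.
report = {
    "Ham": (0.95, 1.00, 1449),
    "Spam": (0.97, 0.68, 225),
}

f1 = {label: f1_score(p, r) for label, (p, r, _) in report.items()}

total_support = sum(support for _, _, support in report.values())
weighted_f1 = sum(
    f1[label] * support for label, (_, _, support) in report.items()
) / total_support

for label, value in f1.items():
    print(f"{label}: f1 = {value:.2f}")
print(f"weighted f1 = {weighted_f1:.2f} over {total_support} messages")
```

The low spam recall (0.68) is what pulls the spam F1 down to 0.80: about a third of spam messages are classified as ham.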
