This repository was archived by the owner on Nov 1, 2022. It is now read-only.

API usage for the custom classifier

Tasdik Rahman edited this page Apr 10, 2016 · 2 revisions

After cloning the repository, keep the following files. Your directory structure should look something like this:

$ tree spamfilter_api_demo/
spamfilter_api_demo/
├── classifier.py
├── demo.py
├── logfiles
│   └── logfile.txt
├── saved_classifiers
│   ├── __init__.py
│   ├── spam_classifier.pickle
│   └── trainer.pickle
└── train.py
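Of the files above, demo.py opens only the two pickles in saved_classifiers/, using paths relative to the current directory. A quick check before running it avoids an error at load time. This is a minimal sketch, assuming it is run from inside spamfilter_api_demo/; the names `REQUIRED_FILES` and `missing_files` are hypothetical and not part of the repository.

```python
import os

# The two files demo.py loads by relative path (taken from the tree above).
REQUIRED_FILES = [
    os.path.join("saved_classifiers", "spam_classifier.pickle"),
    os.path.join("saved_classifiers", "trainer.pickle"),
]


def missing_files(base_dir="."):
    """Return the required files that do not exist under base_dir."""
    return [path for path in REQUIRED_FILES
            if not os.path.isfile(os.path.join(base_dir, path))]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All classifier files found.")
```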

Contents of demo.py:

# -*- coding: utf-8 -*-
# @Author: tasdik
# @Date:   2016-04-03 22:32:38
# @Last Modified by:   tasdik
# @Last Modified time: 2016-04-03 22:59:28
# @GPLv3 License
# @http://tasdikrahman.me
# @https://github.com/prodicus

HAM_TEXT = \
"""
Hi, you must understand machine learning algorithms to get good at 
machine learning.

If you are like me then you understand something best when you can 
implement it from scratch. You need to understand each piece so you can understand 
the whole.

A sticking point with machine learning is the math. You want to dive into 
the details of machine learning algorithms but you don't want to spend 
the next 3 years studying advanced mathematics.

I've been working on this problem. It's a sticking point for a lot of beginners 
in machine learning. The solution involves two pieces:

1. Clear procedures for how machine learning algorithms learn from data and make predictions.
2. Step-by-step tutorials that show exactly how to make each procedure work, with 
real numbers rather than abstract equations.
"""

SPAM_TEXT = \
"""
My Dear Friend,

How are you and your family? I hope you all are fine.

My dear I know that this mail will come to you as a surprise, but it's for my 
urgent need for a foreign partner that made me to contact you for your sincere
genuine assistance My name is Mr.Herman Hirdiramani, I am a banker by 
profession currently holding the post of Director Auditing Department in 
the Islamic Development Bank(IsDB)here in Ouagadougou, Burkina Faso.

I got your email information through the Burkina's Chamber of Commerce 
and industry on foreign business relations here in Ouagadougou Burkina Faso 
I haven'disclose this deal to any body I hope that you will not expose or 
betray this trust and confident that I am about to repose on you for the 
mutual benefit of our both families.

I need your urgent assistance in transferring the sum of Eight Million,
Four Hundred and Fifty Thousand United States Dollars ($8,450,000:00) into
your account within 14 working banking days This money has been dormant for 
years in our bank without claim due to the owner of this fund died along with 
his entire family and his supposed next of kin in an underground train crash 
since years ago. For your further informations please visit 
(http://news.bbc.co.uk/2/hi/5141542.stm)
"""

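import dill  # extended pickle; used to load the saved objects
import bs4   # BeautifulSoup 4; provides UnicodeDammit for fixing encodings

# Load the trained classifier and the trainer, which turns raw text into
# features. Both paths are relative to the current working directory.
with open('saved_classifiers/spam_classifier.pickle', 'rb') as classifier_file:
    classifier_object = dill.load(classifier_file)

with open('saved_classifiers/trainer.pickle', 'rb') as trainer_file:
    trainer_object = dill.load(trainer_file)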

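def classify(text):
    """Return the predicted label ('spam' or 'ham') for a raw email body."""
    # Python 2: `text` is a byte string. detwingle() repairs Windows-1252
    # characters embedded in UTF-8; the result is then decoded and reduced
    # to ASCII, silently dropping characters ASCII cannot represent.
    email_text = bs4.UnicodeDammit.detwingle(text).decode('utf-8')
    email_text = email_text.encode('ascii', 'ignore')
    return classifier_object.classify(trainer_object.extract_features(email_text))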

def main():
    print(classify(HAM_TEXT))
    # returns 'ham'
    print(classify(SPAM_TEXT))
    # returns 'spam'

if __name__ == "__main__":
    main()
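`classify()` is a two-step call: `trainer_object.extract_features()` turns the email text into a feature set, and `classifier_object.classify()` maps that feature set to a label. The sketch below mimics that contract with hypothetical stand-in classes (`KeywordTrainer`, `KeywordClassifier`) so the flow can be tried without the pickles. The keyword rule is purely illustrative and is not how the pickled classifier decides.

```python
class KeywordTrainer(object):
    """Hypothetical stand-in for trainer_object: text -> feature dict."""

    def extract_features(self, text):
        # One boolean feature per lowercased, whitespace-separated word.
        return {word: True for word in text.lower().split()}


class KeywordClassifier(object):
    """Hypothetical stand-in for classifier_object: features -> label."""

    SPAM_WORDS = {"urgent", "million", "dollars", "account"}

    def classify(self, features):
        # Label as spam when at least two trigger words are present.
        hits = sum(1 for word in self.SPAM_WORDS if word in features)
        return "spam" if hits >= 2 else "ham"


def classify_text(text, trainer, classifier):
    # Same two-step call as demo.py: extract features, then classify them.
    return classifier.classify(trainer.extract_features(text))


if __name__ == "__main__":
    trainer, classifier = KeywordTrainer(), KeywordClassifier()
    print(classify_text("I need your urgent help to move one million dollars",
                        trainer, classifier))  # prints: spam
    print(classify_text("Step-by-step machine learning tutorials",
                        trainer, classifier))  # prints: ham
```

Because the function only relies on the two method names, any pair of objects exposing `extract_features()` and `classify()` can be passed in.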