The __textblob.classifiers__ module makes it simple to create custom classifiers.
As an example, let's create a custom sentiment analyzer.

### Loading Data and Creating a Classifier
First we'll create some training and test data.

In [32]:
train = [('I love this sandwich.', 'pos'),
 ('this is an amazing place!', 'pos'),
 ('I feel very good about these beers.', 'pos'),
 ('this is my best work.', 'pos'),
 ('what an awesome view', 'pos'),
 ('I do not like this restaurant', 'neg'),
 ('I am tired of this stuff.', 'neg'),
 ("I can't deal with this", 'neg'),
 ('he is my sworn enemy!', 'neg'),
 ('my boss is horrible.', 'neg')]

In [5]:
test = [('the beer was good.', 'pos'),
 ('I do not enjoy my job', 'neg'),
 ("I ain't feeling dandy today.", 'neg'),
 ('I feel amazing!', 'pos'),
 ('Gary is a friend of mine.', 'pos'),
 ("I can't believe I'm doing this.", 'neg')]

Create a __Naive Bayes classifier__, passing the training data into the constructor.

In [6]:
from textblob.classifiers import NaiveBayesClassifier

In [33]:
cl = NaiveBayesClassifier(train)

### Loading Data from Files
You can also load data from common file formats including CSV, JSON, and TSV.

CSV files should be formatted like so:
```csv
I love this sandwich.,pos
This is an amazing place!,pos
I do not like this restaurant,neg
```

In [40]:
with open("files/test_classifier.csv", "r") as fp:
    cl = NaiveBayesClassifier(fp, format='csv')
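
If your labeled data lives in Python objects, one way to produce a file in this layout is the standard library's `csv` module. The sketch below is only an illustration: the rows and the temporary file path are hypothetical, and it assumes the loader accepts standard quoted CSV. Using `csv.writer` matters when a text contains a comma, because the writer quotes that field so the row still has exactly two columns.

```python
import csv
import os
import tempfile

# Hypothetical labeled examples; the second text contains a comma.
rows = [
    ("I love this sandwich.", "pos"),
    ("Slow service, cold food.", "neg"),
]

# Hypothetical path, used only for this sketch.
csv_path = os.path.join(tempfile.mkdtemp(), "train.csv")

# newline="" lets the csv module control line endings itself.
with open(csv_path, "w", newline="", encoding="utf-8") as fp:
    csv.writer(fp).writerows(rows)

# Read the file back to confirm that every row is still (text, label).
with open(csv_path, newline="", encoding="utf-8") as fp:
    loaded = [tuple(row) for row in csv.reader(fp)]
```

After this runs, `loaded` holds the same two-column rows that were written, including the text with the embedded comma.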

### Classifying Text
Call the __classify(text)__ method to use the classifier.

Use the __prob_classify(text)__ method to get the label probability distribution.

In [41]:
cl.classify("This is an amazing library!")

'pos'

In [42]:
prob_dist = cl.prob_classify("This one's a doozy.")
prob_dist.max()

'pos'

In [43]:
round(prob_dist.prob("pos"), 2)

0.58

In [44]:
round(prob_dist.prob('neg'), 2)

0.42
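
The two probabilities sum to 1 because they form a distribution over the labels. To give a feel for where such numbers come from, here is a minimal pure-Python sketch of a Naive Bayes calculation over "word present / word absent" features. It is not the library's implementation: the function names `train_nb` and `nb_prob_classify`, the two-document training set, and the add-one smoothing are all assumptions made for illustration, so the values will not match the output above.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Count labels and per-label word occurrences.

    examples: list of (set_of_words, label) pairs.
    """
    label_counts = Counter(label for _, label in examples)
    vocab = set()
    word_counts = defaultdict(Counter)  # word_counts[label][word] -> doc count
    for words, label in examples:
        vocab |= words
        word_counts[label].update(words)
    return label_counts, word_counts, vocab

def nb_prob_classify(model, words):
    """Return {label: probability} for a set of words."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        for word in vocab:
            # Add-one smoothing (an assumption of this sketch).
            p_present = (word_counts[label][word] + 1) / (n_docs + 2)
            score += math.log(p_present if word in words else 1 - p_present)
        log_scores[label] = score
    # Normalize in a numerically stable way so the values sum to 1.
    top = max(log_scores.values())
    unnormalized = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    norm = sum(unnormalized.values())
    return {lab: value / norm for lab, value in unnormalized.items()}

# Hypothetical two-document training set.
model = train_nb([({"love", "it"}, "pos"), ({"hate", "it"}, "neg")])
dist = nb_prob_classify(model, {"love"})
```

Here `dist` maps each label to a probability, and the label with the largest value plays the role that `prob_dist.max()` plays above.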

### Classifying TextBlobs
Another way to classify text is to pass a classifier into the constructor of __TextBlob__ and call its __classify()__ method.

__Advantage:__ We can also classify each sentence within a TextBlob individually.

In [45]:
from textblob import TextBlob
blob = TextBlob("The beer is good. But the hangover is horrible.", classifier=cl)
blob.classify()

'pos'

In [46]:
for s in blob.sentences:
    print(s, "|", s.classify())

The beer is good. | pos
But the hangover is horrible. | neg


### Evaluating Classifiers
To compute the accuracy on our test set, use the __accuracy(test_data)__ method.

__Note:__
You can also pass a file object to the accuracy method. The file can be in any of the formats listed in the section on loading data from files.

In [47]:
cl.accuracy(test)

0.8333333333333334
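
Accuracy here is the fraction of test examples whose predicted label matches the true label, so 0.833... corresponds to 5 correct out of 6. The sketch below shows that arithmetic in plain Python. The helper `fraction_correct` is a hypothetical name, and the `predicted` list is invented for illustration: the output above does not tell us which test sentence was actually misclassified.

```python
def fraction_correct(predicted, expected):
    """Return the fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

# True labels of the six test sentences, in order.
expected = ["pos", "neg", "neg", "pos", "pos", "neg"]
# Hypothetical predictions with exactly one mistake (the fifth item).
predicted = ["pos", "neg", "neg", "pos", "neg", "neg"]

score = fraction_correct(predicted, expected)  # 5 correct out of 6
```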

Use the __show_informative_features()__ method to display a listing of the most informative features.

In [57]:
cl.show_informative_features(5)

Most Informative Features
        contains(friend) = False             neg : pos    =      1.5 : 1.0
             contains(I) = True              neg : pos    =      1.3 : 1.0
          contains(best) = False             neg : pos    =      1.3 : 1.0
            contains(an) = False             neg : pos    =      1.3 : 1.0
          contains(this) = True              neg : pos    =      1.3 : 1.0


### Updating Classifiers with New Data
Use the __update(new_data)__ method to update a classifier with new training data.

In [49]:
new_data = [('She is my best friend.', 'pos'),
           ("I'm happy to have a new friend.", 'pos'),
           ("Stay thirsty, my friend.", 'pos'),
           ("He ain't from around here.", 'neg')]

In [50]:
cl.update(new_data)

True

In [51]:
cl.accuracy(test)

1.0

### Feature Extractors
By default, the __NaiveBayesClassifier__ uses a simple feature extractor that indicates which words in the training set are contained in a document.

For example, the sentence _“I feel happy”_ might have the features __contains(happy): True__ or __contains(angry): False__.
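
To make the idea concrete, here is a rough pure-Python approximation of such a "contains" extractor. It is a sketch, not the library's code: the name `contains_extractor`, the explicit `vocabulary` argument, and the whitespace-plus-punctuation tokenization are all assumptions chosen to keep the example short.

```python
import string

def contains_extractor(document, vocabulary):
    """Map each vocabulary word to whether it occurs in the document."""
    # Naive tokenization: split on whitespace, strip surrounding punctuation.
    tokens = {word.strip(string.punctuation) for word in document.split()}
    return {
        "contains({0})".format(word): (word in tokens)
        for word in vocabulary
    }

# Hypothetical vocabulary standing in for the words of a training set.
vocabulary = ["happy", "angry"]
features = contains_extractor("I feel happy", vocabulary)
```

With this vocabulary, `features` has one boolean entry per known word, which is the kind of dictionary a classifier consumes.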

You can override this feature extractor by writing your own. A feature extractor is simply a function with __document__ (the text to extract features from) as the first argument. The function may include a second argument, __train_set__ (the training dataset), if necessary.

The function should return a dictionary of features for __document__.

For example, let’s create a feature extractor that just uses the first and last words of a document as its features.

In [81]:
>>> def end_word_extractor(document):
...     tokens = document.split()
...     first_word, last_word = tokens[0], tokens[-1]
...     feats = {}
...     feats["first({0})".format(first_word)] = True
...     feats["last({0})".format(last_word)] = False
...     return feats
>>> features = end_word_extractor("I feel happy")
>>> assert features == {'last(happy)': False, 'first(I)': True}

In [76]:
>>> cl2 = NaiveBayesClassifier(train, feature_extractor=end_word_extractor)
>>> blob = TextBlob("I'm excited to try my new classifier.", classifier=cl2)
>>> blob.classify()

'pos'