## Naive Bayes Classifier

We are going to use a BBC news dataset to train a model that predicts the category of an article from its text.

#### Importing Data

import pandas as pd
import numpy
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

In [3]:
df = pd.read_csv('bbc-text.csv')

In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2225 entries, 0 to 2224
Data columns (total 2 columns):
category    2225 non-null object
text        2225 non-null object
dtypes: object(2)
memory usage: 34.8+ KB


In [6]:
df.sample(5)

|      | category | text |
|------|----------|------|
| 2034 | politics | errors doomed first dome sale the initial att... |
| 1449 | sport    | taylor poised for scotland return simon taylor... |
| 159  | business | orange colour clash set for court a row over t... |
| 1155 | politics | bid to cut court witness stress new targets to... |
| 1949 | business | huge rush for jet airways shares indian airlin... |


#### Training our model

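We are using CountVectorizer to split each article into words and count how often each word occurs. The result is a sparse matrix with one row per article and one column per word in the vocabulary. Then we will train a Multinomial Naive Bayes (MNB) classifier on those counts.

MNB is commonly used for document classification because it works directly on word counts.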
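To see what the vectorizer actually produces, here is a minimal sketch on three made-up sentences. The names `docs`, `toy_vectorizer`, `toy_counts`, `terms` and `dense` are illustrative only and are not part of the BBC notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three made-up sentences (not taken from the BBC dataset)
docs = [
    "the striker scored a goal",
    "the bank raised rates",
    "goal after goal",
]

toy_vectorizer = CountVectorizer()
toy_counts = toy_vectorizer.fit_transform(docs)  # sparse matrix: one row per document

# vocabulary_ maps each word to its column index; sort the words by that index
terms = sorted(toy_vectorizer.vocabulary_, key=toy_vectorizer.vocabulary_.get)
dense = toy_counts.toarray()  # dense copy, fine for a tiny example

print(terms)
print(dense)
```

Each row is one sentence and each column is one word, so "goal after goal" gets a 2 in the `goal` column. Note that with the default settings the text is lowercased and one-letter tokens such as "a" are dropped.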

In [8]:
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(df['text'].values)

In [13]:
counts

<2225x29421 sparse matrix of type '<class 'numpy.int64'>'
	with 449254 stored elements in Compressed Sparse Row format>

In [10]:
classifier = MultinomialNB()
targets = df['category'].values
classifier.fit(counts, targets)

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

Our classifier is ready to classify new data.
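As an optional variation, the vectorizer and the classifier can be bundled into a single object with scikit-learn's `make_pipeline`, so raw text goes straight into `fit` and `predict`. The sketch below uses a tiny made-up corpus (`toy_texts`, `toy_labels`) instead of the BBC data, and `text_clf` is just an illustrative name.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training data, only to keep the example self-contained
toy_texts = [
    "the team scored a late goal in the final",
    "the striker missed a penalty in the match",
    "shares fell as the bank cut interest rates",
    "the company reported strong profit and sales",
]
toy_labels = ["sport", "sport", "business", "business"]

# The pipeline vectorizes the text first, then feeds the counts to the classifier
text_clf = make_pipeline(CountVectorizer(), MultinomialNB())
text_clf.fit(toy_texts, toy_labels)

toy_prediction = text_clf.predict(["the bank cut rates again"])
print(toy_prediction)
```

With a pipeline there is no separate `vectorizer.transform` step to remember when classifying new text.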

#### Testing

In [14]:
sample = ['Barnubry has scored 1 goal against Futsul in the semi-finals of ASL.']

This should be predicted as a sport article (you score goals in football, which is a sport!).

In [15]:
sample_counts = vectorizer.transform(sample)
predictions = classifier.predict(sample_counts)

In [16]:
predictions

array(['sport'],
      dtype='<U13')
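`predict` only returns the winning label. If you also want to know how sure the model is, `MultinomialNB` has a `predict_proba` method that returns one probability per class. Here is a small sketch on a made-up corpus; `toy_texts`, `toy_labels`, `toy_vectorizer` and `toy_classifier` are illustrative names, not the objects trained above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training data, only to keep the example self-contained
toy_texts = [
    "the team scored a late goal in the final",
    "the striker missed a penalty in the match",
    "shares fell as the bank cut interest rates",
    "the company reported strong profit and sales",
]
toy_labels = ["sport", "sport", "business", "business"]

toy_vectorizer = CountVectorizer()
toy_classifier = MultinomialNB()
toy_classifier.fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)

# Words that were never seen in training (e.g. "semi") are simply ignored
new_text = ["a goal in the semi-finals"]
new_counts = toy_vectorizer.transform(new_text)

probs = toy_classifier.predict_proba(new_counts)[0]
best_label = toy_classifier.predict(new_counts)[0]

# classes_ gives the label that belongs to each probability
for label, p in zip(toy_classifier.classes_, probs):
    print(f"{label}: {p:.3f}")
```

The same two calls should work with the `vectorizer` and `classifier` trained on the BBC data, giving one probability for each of its categories.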

Yay! We have made our computer smart enough to classify news articles! I wish it could have completed my assignment too, but it's still too dumb for that!