
# Naive Bayes

Naive Bayes methods are a set of supervised learning algorithms based on applying Bayes' theorem with the "naive" assumption of conditional independence between every pair of features given the value of the class variable. Bayes' theorem states the following relationship, given class variable $y$ and dependent feature vector $x_1$ through $x_n$:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y) P(x_1, \dots, x_n \mid y)}
                                 {P(x_1, \dots, x_n)}$$

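Using the naive conditional independence assumption, $P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$, and noting that the denominator $P(x_1, \dots, x_n)$ is constant for a given input, we have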

$$\begin{aligned}
P(y \mid x_1, \dots, x_n) &\propto P(y) \prod_{i=1}^{n} P(x_i \mid y)\\
&\Downarrow\\
\hat{y} &= \arg\max_y P(y) \prod_{i=1}^{n} P(x_i \mid y)
\end{aligned}$$

We can then use Maximum A Posteriori (MAP) estimation to estimate $P(y)$ and $P(x_i \mid y)$; the estimate of $P(y)$ is the relative frequency of class $y$ in the training set.
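As a minimal sketch of the decision rule above, the following evaluates $P(y) \prod_i P(x_i \mid y)$ for a hypothetical problem with two classes and three binary features. The probability tables `prior` and `likelihood` are invented for illustration and are not taken from any data set in this notebook. The products are computed as sums of logarithms, a common way to avoid numerical underflow when there are many features.

```python
import numpy as np

# Hypothetical probability tables (illustrative values only).
prior = np.array([0.6, 0.4])              # P(y) for classes 0 and 1
likelihood = np.array([[0.8, 0.1, 0.5],   # likelihood[c, i] = P(x_i = 1 | y = c)
                       [0.3, 0.7, 0.5]])

def posterior(x):
    """Return P(y | x) for a binary feature vector x under the naive assumption."""
    x = np.asarray(x)
    # Pick P(x_i | y) for the observed value of each feature (broadcast over classes).
    per_feature = np.where(x == 1, likelihood, 1.0 - likelihood)
    # log P(y) + sum_i log P(x_i | y)
    log_joint = np.log(prior) + np.log(per_feature).sum(axis=1)
    # Normalise so the posteriors sum to one (this plays the role of the denominator).
    joint = np.exp(log_joint - log_joint.max())
    return joint / joint.sum()

post = posterior([1, 0, 1])
y_hat = int(np.argmax(post))   # the arg max over classes
print(post, y_hat)
```

Normalising is only needed to report probabilities; the predicted class depends on the unnormalised scores alone.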

*References*:
H. Zhang (2004). The optimality of Naive Bayes. Proc. FLAIRS.

# 1 Gaussian Naive Bayes

GaussianNB implements the Gaussian Naive Bayes algorithm for classification.   
The likelihood of the features is assumed to be Gaussian:

$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma^2_y}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma^2_y}\right)$$

The parameters $\sigma_y$ and $\mu_y$ are estimated using maximum likelihood, separately for each class and each feature.
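The estimation step can be sketched by hand with NumPy on a small two-class toy data set. This is an illustration of the formula above, not scikit-learn's implementation: in particular it omits the small variance term that `GaussianNB` adds through `var_smoothing`, and the helper name `log_posterior_unnormalised` is made up for this sketch.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
Y = np.array([1, 1, 1, 2, 2, 2])

classes = np.unique(Y)
# Class priors: relative frequency of each class.
priors = np.array([np.mean(Y == c) for c in classes])
# Maximum likelihood estimates per class and per feature.
means = np.array([X[Y == c].mean(axis=0) for c in classes])
variances = np.array([X[Y == c].var(axis=0) for c in classes])  # ddof=0, i.e. the MLE

def log_posterior_unnormalised(x):
    """log P(y) + sum_i log N(x_i; mu, sigma^2) for every class."""
    x = np.asarray(x, dtype=float)
    log_lik = -0.5 * np.log(2 * np.pi * variances) - (x - means) ** 2 / (2 * variances)
    return np.log(priors) + log_lik.sum(axis=1)

scores = log_posterior_unnormalised([-0.8, -1])
prediction = classes[np.argmax(scores)]
print(means, variances, prediction)
```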

**Example** - The training data is generated as follows:

In [1]:
import numpy as np
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])

In [2]:
print(X)
print(Y)

[[-1 -1]
 [-2 -1]
 [-3 -2]
 [ 1  1]
 [ 2  1]
 [ 3  2]]
[1 1 1 2 2 2]


**Q1**: Training a GaussianNB model:

In [5]:
#[Code...]
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X, Y)

GaussianNB(priors=None, var_smoothing=1e-09)

In [10]:
print(clf.predict([[-0.8, -1]]))
print(clf.predict([[4, 5]]))
print(clf.predict([[10, -5]]))
print(clf.predict([[2, -5]]))

[1]
[2]
[2]
[1]


In [7]:
clf_pf = GaussianNB()
# partial_fit trains incrementally; the first call must list all classes
clf_pf.partial_fit(X, Y, np.unique(Y))

GaussianNB(priors=None, var_smoothing=1e-09)

In [8]:
print(clf_pf.predict([[-0.8, -1]]))

[1]
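
`partial_fit` is mainly useful when the data arrives in chunks. The sketch below feeds the same toy data in three batches of two rows (the batch size is an arbitrary choice for illustration) and checks that the incrementally trained model predicts the same labels as one trained with a single `fit` call.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])

full = GaussianNB().fit(X, Y)

incremental = GaussianNB()
all_classes = np.unique(Y)  # a single batch may not contain every class
for start in range(0, len(X), 2):
    batch = slice(start, start + 2)
    incremental.partial_fit(X[batch], Y[batch], classes=all_classes)

same_predictions = np.array_equal(full.predict(X), incremental.predict(X))
print(same_predictions)
```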


# 2 MultinomialNB

The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). 

*References*   
C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html

**Example** - The training data is generated as follows:

In [12]:
import numpy as np
X = np.random.randint(5, size=(6, 100))
print(X)
y = np.array([1, 2, 3, 4, 5, 6])
print(y)

[[2 4 2 0 2 1 2 3 0 2 2 0 4 0 4 1 4 1 0 4 3 4 2 1 2 0 3 1 3 1 4 3 2 3 2 0
  2 0 2 1 4 4 3 3 3 0 0 0 4 4 1 0 4 1 3 0 3 0 1 3 4 1 2 4 2 0 3 0 2 4 3 2
  1 2 0 4 3 4 3 4 4 3 0 2 3 4 1 0 0 2 4 4 3 0 0 1 2 3 3 2]
 [3 3 3 4 1 2 3 0 0 1 1 3 0 1 4 3 3 4 0 4 0 2 3 2 1 3 2 2 2 2 3 0 4 1 4 0
  0 0 4 1 2 1 3 3 4 0 1 0 3 4 4 1 3 4 3 1 2 3 0 0 2 0 0 4 4 3 0 1 3 3 0 1
  4 2 4 0 3 1 3 1 1 4 3 1 3 2 2 4 3 1 0 1 3 1 2 3 2 0 1 2]
 [0 3 3 0 2 4 2 2 4 1 1 3 2 0 2 4 1 1 3 2 1 0 4 2 3 0 3 2 3 2 2 2 2 3 3 2
  4 1 0 3 4 4 2 2 0 2 3 4 1 4 4 0 3 2 3 3 2 1 1 0 4 1 0 4 2 1 2 2 2 0 1 4
  1 2 4 1 4 2 4 1 3 0 1 2 1 3 0 4 0 1 0 0 4 4 4 1 0 4 3 1]
 [3 3 1 0 2 0 0 2 1 4 2 0 2 0 1 1 1 0 1 4 4 2 1 4 4 2 1 2 1 3 3 4 4 4 3 3
  3 4 4 4 3 1 2 3 3 0 1 4 4 3 1 0 1 2 1 2 0 1 1 2 4 3 0 0 0 1 4 3 3 0 4 4
  2 2 1 2 2 1 4 2 4 3 3 3 4 4 1 1 0 3 4 1 3 3 0 4 4 2 0 3]
 [0 2 0 3 4 2 2 3 4 1 1 0 4 0 4 0 1 0 3 0 4 0 1 0 3 3 1 0 4 4 1 3 0 3 0 3
  1 1 2 4 1 0 0 3 0 0 4 0 1 2 0 1 4 2 2 1 1 2 0 3 3 1 1 0 2 3 3 4 1 3 3 0
  3 4 4 3 4 3 4 4 4 3 0 

**Q3**: Training a MultinomialNB model:

In [13]:
#[Code...]
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(X, y)

MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)

In [16]:
print(clf.predict(X[2:3]))
print(clf.predict(X[3:5]))
print(clf.predict(X[1:5]))

[3]
[4 5]
[2 3 4 5]
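
The `alpha=1.0` shown in the model's repr is the additive (Laplace) smoothing parameter. Assuming the smoothed estimate described in scikit-learn's documentation, $\hat{\theta}_{yi} = (N_{yi} + \alpha) / (N_y + \alpha n)$, where $N_{yi}$ is the total count of feature $i$ in class $y$, $N_y$ is the total count over all features in class $y$, and $n$ is the number of features, the fitted parameters can be reproduced by hand. The count matrix below is a made-up toy example.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count data (illustrative): 4 samples, 3 features, 2 classes.
X = np.array([[2, 1, 0],
              [3, 0, 1],
              [0, 2, 3],
              [1, 1, 4]])
y = np.array([0, 0, 1, 1])
alpha = 1.0

clf = MultinomialNB(alpha=alpha).fit(X, y)

# Per-class feature counts N_yi, then the smoothed relative frequencies.
counts = np.array([X[y == c].sum(axis=0) for c in clf.classes_])
theta = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])

matches = np.allclose(np.log(theta), clf.feature_log_prob_)
print(theta, matches)
```

Smoothing keeps a feature that never occurs in a class from forcing that class's probability to zero.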


# 3 Process on 'Iris' Data

In Week 9, we studied how to use the KNN algorithm for classification on the 'iris' data. Here, we use GaussianNB for the same task.

In [19]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=142)
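
When no `test_size` is given, `train_test_split` holds out 25% of the samples. A short check of the resulting shapes makes the sizes explicit; with 150 Iris samples this gives 112 training rows and 38 test rows, which are the denominators behind the accuracies reported further down (34/38 and 110/112).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], random_state=142)

# Default split: 75% train, 25% test.
print(X_train.shape, X_test.shape)
```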

**Q5**: Report the accuracy on the test data:

In [20]:
#[Code...]
clf = GaussianNB()
clf.fit(X_train, y_train)

GaussianNB(priors=None, var_smoothing=1e-09)

In [22]:
y_pred = clf.predict(X_test)
np.mean(y_test == y_pred)

0.8947368421052632

In [25]:
y_train_pred = clf.predict(X_train)
np.mean(y_train == y_train_pred)

0.9821428571428571

In [26]:
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test,y_pred))

0.8947368421052632


In [27]:
from sklearn.metrics import accuracy_score
print(accuracy_score(y_train,y_train_pred))

0.9821428571428571


In [28]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)

array([[ 7,  0,  0],
       [ 0, 15,  2],
       [ 0,  2, 12]], dtype=int64)

In [30]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_train, y_train_pred)

array([[43,  0,  0],
       [ 0, 32,  1],
       [ 0,  1, 35]], dtype=int64)
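
A confusion matrix contains enough information to recover the accuracy and per-class metrics. In scikit-learn's convention, rows are true classes and columns are predicted classes. The sketch below copies the test-set matrix printed above and derives accuracy, recall, and precision from it with NumPy.

```python
import numpy as np

# Test-set confusion matrix from the Iris run above (rows: true, columns: predicted).
cm = np.array([[7, 0, 0],
               [0, 15, 2],
               [0, 2, 12]])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)       # per true class
precision = np.diag(cm) / cm.sum(axis=0)    # per predicted class

print(accuracy)
print(recall)
print(precision)
```

The accuracy obtained this way is 34/38, the same value that `accuracy_score` reports for the test set.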

# 4 Predict Human Activity Recognition (HAR)

The objective of this practice exercise is to predict the current human activity from physiological activity measurements, using the 52 features of the [HAR dataset](http://groupware.les.inf.puc-rio.br/har#sbia_paper_section) (53 columns including the `classe` label). The training (`har_train.csv`) and test (`har_validate.csv`) datasets are provided.

**Q6**: Build a Naive Bayes model, predict on the test dataset, and compute the [confusion matrix](https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62). Note: please refer to the [`sklearn.metrics.confusion_matrix`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) documentation.

In [53]:
#[Code...]
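import pandas as pd
# Note: here `x` is the full training table and `y` the full validation table
# (each includes the 'classe' label column); they are split into features/labels below.
x = pd.read_csv("har_train.csv")
y = pd.read_csv("har_validate.csv")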
print(x.shape)
print(y.shape)

(13737, 53)
(5885, 53)


In [54]:
x.columns

Index(['classe', 'roll_belt', 'pitch_belt', 'yaw_belt', 'total_accel_belt',
       'gyros_belt_x', 'gyros_belt_y', 'gyros_belt_z', 'accel_belt_x',
       'accel_belt_y', 'accel_belt_z', 'magnet_belt_x', 'magnet_belt_y',
       'magnet_belt_z', 'roll_arm', 'pitch_arm', 'yaw_arm', 'total_accel_arm',
       'gyros_arm_x', 'gyros_arm_y', 'gyros_arm_z', 'accel_arm_x',
       'accel_arm_y', 'accel_arm_z', 'magnet_arm_x', 'magnet_arm_y',
       'magnet_arm_z', 'roll_dumbbell', 'pitch_dumbbell', 'yaw_dumbbell',
       'total_accel_dumbbell', 'gyros_dumbbell_x', 'gyros_dumbbell_y',
       'gyros_dumbbell_z', 'accel_dumbbell_x', 'accel_dumbbell_y',
       'accel_dumbbell_z', 'magnet_dumbbell_x', 'magnet_dumbbell_y',
       'magnet_dumbbell_z', 'roll_forearm', 'pitch_forearm', 'yaw_forearm',
       'total_accel_forearm', 'gyros_forearm_x', 'gyros_forearm_y',
       'gyros_forearm_z', 'accel_forearm_x', 'accel_forearm_y',
       'accel_forearm_z', 'magnet_forearm_x', 'magnet_forearm_y',
       'ma

In [55]:
y.columns

Index(['classe', 'roll_belt', 'pitch_belt', 'yaw_belt', 'total_accel_belt',
       'gyros_belt_x', 'gyros_belt_y', 'gyros_belt_z', 'accel_belt_x',
       'accel_belt_y', 'accel_belt_z', 'magnet_belt_x', 'magnet_belt_y',
       'magnet_belt_z', 'roll_arm', 'pitch_arm', 'yaw_arm', 'total_accel_arm',
       'gyros_arm_x', 'gyros_arm_y', 'gyros_arm_z', 'accel_arm_x',
       'accel_arm_y', 'accel_arm_z', 'magnet_arm_x', 'magnet_arm_y',
       'magnet_arm_z', 'roll_dumbbell', 'pitch_dumbbell', 'yaw_dumbbell',
       'total_accel_dumbbell', 'gyros_dumbbell_x', 'gyros_dumbbell_y',
       'gyros_dumbbell_z', 'accel_dumbbell_x', 'accel_dumbbell_y',
       'accel_dumbbell_z', 'magnet_dumbbell_x', 'magnet_dumbbell_y',
       'magnet_dumbbell_z', 'roll_forearm', 'pitch_forearm', 'yaw_forearm',
       'total_accel_forearm', 'gyros_forearm_x', 'gyros_forearm_y',
       'gyros_forearm_z', 'accel_forearm_x', 'accel_forearm_y',
       'accel_forearm_z', 'magnet_forearm_x', 'magnet_forearm_y',
       'ma

In [56]:
print(x.head())

  classe  roll_belt  pitch_belt  yaw_belt  total_accel_belt  gyros_belt_x  \
0      A       1.41        8.07     -94.4                 3          0.00   
1      A       1.41        8.07     -94.4                 3          0.02   
2      A       1.42        8.07     -94.4                 3          0.00   
3      A       1.48        8.05     -94.4                 3          0.02   
4      A       1.45        8.06     -94.4                 3          0.02   

   gyros_belt_y  gyros_belt_z  accel_belt_x  accel_belt_y  ...  \
0           0.0         -0.02           -21             4  ...   
1           0.0         -0.02           -22             4  ...   
2           0.0         -0.02           -20             5  ...   
3           0.0         -0.03           -22             3  ...   
4           0.0         -0.02           -21             4  ...   

   total_accel_forearm  gyros_forearm_x  gyros_forearm_y  gyros_forearm_z  \
0                   36             0.03             0.00       

In [57]:
X_train = x.drop(['classe'], axis=1)
y_train = x['classe']
X_test = y.drop(['classe'], axis=1)
y_test = y['classe']
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)

(13737, 52)
(13737,)
(5885, 52)
(5885,)


In [58]:
clf = GaussianNB()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(np.mean(y_test == y_pred))
y_train_pred = clf.predict(X_train)
print(np.mean(y_train == y_train_pred))

0.5542905692438402
0.5580548882579893


In [59]:
confusion_matrix(y_test, y_pred)

array([[1070,   95,  262,  212,   35],
       [ 127,  685,  145,   76,  106],
       [ 223,  106,  512,  136,   49],
       [ 102,   35,  271,  441,  115],
       [  51,  239,   95,  143,  554]], dtype=int64)

In [60]:
confusion_matrix(y_train, y_train_pred)

array([[2491,  164,  667,  502,   82],
       [ 306, 1608,  358,  191,  195],
       [ 519,  252, 1197,  309,  119],
       [ 240,   70,  603, 1075,  264],
       [ 143,  518,  252,  317, 1295]], dtype=int64)

In [68]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [69]:
y_pred = knn.predict(X_test)
print(np.mean(y_test == y_pred))
y_train_pred = knn.predict(X_train)
print(np.mean(y_train == y_train_pred))

0.9097706032285472
0.9552303996505788


In [71]:
from sklearn import tree
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X_train, y_train)

In [72]:
y_pred = clf.predict(X_test)
print(np.mean(y_test == y_pred))
y_train_pred = clf.predict(X_train)
print(np.mean(y_train == y_train_pred))

0.9575191163976211
1.0


In [73]:
from sklearn.neural_network import MLPClassifier
clf = MLPClassifier(solver='lbfgs', alpha=1e-5,
                    hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X_train, y_train)

MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(5, 2), learning_rate='constant',
              learning_rate_init=0.001, max_iter=200, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='lbfgs', tol=0.0001,
              validation_fraction=0.1, verbose=False, warm_start=False)

In [74]:
y_pred = clf.predict(X_test)
print(np.mean(y_test == y_pred))
y_train_pred = clf.predict(X_train)
print(np.mean(y_train == y_train_pred))

0.27850467289719627
0.28135691926912715
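
The MLP scores far below the other models here. The results above do not establish why; one common thing to try with neural networks is standardising the inputs, since the HAR features are on very different scales. The sketch below shows how that could be wired up with a `Pipeline`. It uses the Iris data as a stand-in because the HAR files are not bundled with this example, and `max_iter=500` is an arbitrary choice, so it says nothing about how much (if at all) scaling would help on HAR.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data; replace with the HAR X_train / X_test to test the assumption.
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data["data"], data["target"], random_state=142)

# The scaler is fitted on the training data only, then applied before the MLP.
scaled_mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver="lbfgs", alpha=1e-5, hidden_layer_sizes=(5, 2),
                  random_state=1, max_iter=500),
)
scaled_mlp.fit(X_train, y_train)
test_accuracy = scaled_mlp.score(X_test, y_test)
print(test_accuracy)
```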


In [75]:
from sklearn import svm
clf = svm.SVC(gamma='scale')
clf.fit(X_train, y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

In [76]:
y_pred = clf.predict(X_test)
print(np.mean(y_test == y_pred))
y_train_pred = clf.predict(X_train)
print(np.mean(y_train == y_train_pred))

0.827017841971113
0.8270364708451627
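
The fit, predict, and score steps are repeated for every model above. One way to compare classifiers side by side is a small helper that returns both accuracies; `evaluate` is a hypothetical name introduced for this sketch, and the Iris data is used as a stand-in so the block runs without the HAR files.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit `model` and return (test accuracy, training accuracy)."""
    model.fit(X_train, y_train)
    # For classifiers, score() is the mean accuracy, i.e. np.mean(y == y_pred).
    return model.score(X_test, y_test), model.score(X_train, y_train)

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data["data"], data["target"], random_state=142)

models = {
    "GaussianNB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}
results = {name: evaluate(model, X_train, y_train, X_test, y_test)
           for name, model in models.items()}

for name, (test_acc, train_acc) in results.items():
    print(f"{name:>12}: test={test_acc:.3f}  train={train_acc:.3f}")
```

Reporting training and test accuracy together makes gaps such as the decision tree's 1.0 versus 0.958 on HAR easy to spot.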
