# **Day 11**

**Naive Bayes Classifier**

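Naive Bayes is a family of classifiers built on Bayes' theorem, under the "naive" assumption that the features are independent of each other given the class. For every data point the model estimates the probability of membership in each class and predicts the class with the highest probability.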
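
As a rough illustration of how Bayes' theorem turns class priors and per-class likelihoods into membership probabilities, the sketch below scores a single feature value under two Gaussian classes. The priors, means, and standard deviations are made-up numbers chosen for illustration, not values from the diabetes dataset, and `gaussian_pdf` and `posterior` are hypothetical helper names.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posterior(x, priors, params):
    """Return P(class | x) for each class via Bayes' theorem.

    priors: {label: P(class)}
    params: {label: (mean, std)} of the feature within that class
    """
    # Numerator of Bayes' theorem: P(x | class) * P(class)
    joint = {c: gaussian_pdf(x, *params[c]) * priors[c] for c in priors}
    evidence = sum(joint.values())  # P(x), the normalising constant
    return {c: joint[c] / evidence for c in joint}

# Hypothetical numbers, for illustration only
priors = {"non_diabetic": 0.65, "diabetic": 0.35}
params = {"non_diabetic": (110.0, 25.0), "diabetic": (140.0, 30.0)}

probs = posterior(150.0, priors, params)
predicted = max(probs, key=probs.get)  # class with the highest posterior
print(probs, predicted)
```

With several features, Gaussian Naive Bayes multiplies one such likelihood per feature, which is where the independence assumption comes in.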

**Problem Statement**

The goal is to classify patients as diabetic or non-diabetic. The dataset has several medical predictor features and a target, **Outcome** (loaded below as the `class` column). The predictors include the number of pregnancies the patient has had, BMI, insulin level, age, and so on.

In [None]:
#import libraries
import numpy as np
import pandas as pd

import seaborn as sns
sns.set(color_codes=True)
import matplotlib.pyplot as plt
%matplotlib inline

#sklearn libraries
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn import metrics

In [None]:
#lets define col names
colnames = ['preg','plas','pres','skin','test','mass','pedi','age','class']
pimadf = pd.read_csv("../input/data-science-machine-learning-and-ai-using-python/pima-indians-diabetes.data", names=colnames)

In [None]:
pimadf.head()

In [None]:
std = StandardScaler()

In [None]:
X = pimadf.drop("class", axis=1)
Y = pimadf['class']

In [None]:
X = std.fit_transform(X)

In [None]:
#Lets split the data
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.30, random_state=7)
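
One thing to keep in mind: in the cells above the scaler is fitted on the full dataset before the split, so the test rows contribute to the mean and standard deviation used for training. A minimal sketch of the alternative, splitting first and fitting the scaler on the training rows only, is shown below. It uses synthetic data as a stand-in, and the `_demo`, `_tr`, and `_te` names are made up for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the predictor matrix and target (not the real data)
rng = np.random.default_rng(7)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 8))
y_demo = rng.integers(0, 2, size=200)

# Split first, so the test rows never influence the scaler
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=7
)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn mean/std from training rows only
X_te_scaled = scaler.transform(X_te)      # reuse those statistics on the test rows
```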

In [None]:
#lets choose the model
model = GaussianNB()

In [None]:
#Lets train the model
model.fit(X_train, Y_train)
print(model)

In [None]:
#lets make the predictions
pred = model.predict(X_test)

In [None]:
#lets check the accuracy of the model by printing its score
from sklearn.metrics import accuracy_score, confusion_matrix

model_score = model.score(X_test, Y_test)
model_score

In [None]:
#confusion_matrix expects (y_true, y_pred): rows are true classes, columns are predictions
metrics.confusion_matrix(Y_test, pred)
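
scikit-learn's `confusion_matrix` takes the true labels first and the predictions second, so rows correspond to true classes and columns to predicted ones. The small sketch below uses hand-made labels (not the notebook's data) to show how the four cells map to precision, recall, and accuracy.

```python
from sklearn.metrics import confusion_matrix

# Hand-made labels, for illustration only
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_hat = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_hat)  # true labels first, predictions second
tn, fp, fn, tp = cm.ravel()           # binary case: row-major order of the 2x2 matrix

precision = tp / (tp + fp)       # of the predicted positives, how many are real
recall = tp / (tp + fn)          # of the real positives, how many were found
accuracy = (tp + tn) / cm.sum()  # share of all predictions that are correct
print(cm)
print(precision, recall, accuracy)
```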

In [None]:
#lets find the probability
y_pred_prob = model.predict_proba(X_test)

In [None]:
from sklearn.metrics import auc, roc_curve

#column 1 holds the predicted probability of the positive class
fpr,tpr,thresholds = roc_curve(Y_test, y_pred_prob[:, 1])
roc_auc = auc(fpr,tpr)
roc_auc

In [None]:
#lets plot the roc curve
plt.plot(fpr,tpr, color='darkorange', label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0,1],[0,1], color='navy', linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc='lower right')
plt.show()

**KMeans Clustering**

K-means divides the data points into a chosen number of clusters so that each point belongs to exactly one cluster: the one whose centroid is nearest.
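
The idea can be sketched in a few lines of NumPy by alternating two steps: assign each point to its nearest center, then move each center to the mean of its points. This is a simplified illustration on toy data, not scikit-learn's implementation, and `kmeans_sketch` is a hypothetical helper name.

```python
import numpy as np

def kmeans_sketch(points, k, n_iter=20, seed=0):
    """Simplified k-means: returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    # Final assignment against the final centers
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

# Toy data: two well-separated groups
toy = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]])
labels, centers = kmeans_sketch(toy, k=2)
print(labels)
print(centers)
```

Every point ends up with exactly one label, which is the "only one cluster" property described above.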

**Problem Statement**

To segment the customers in the market into groups based on their features (here, satisfaction and loyalty).

In [None]:
#lets import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

In [None]:
#load the data into data variable
data = pd.read_csv('../input/data-science-machine-learning-and-ai-using-python/Example.csv')
data.head()

In [None]:
#plot the data using scatter plot: hint: x=data['Satisfaction'], y=data['Loyalty']
plt.scatter(data['Satisfaction'], data['Loyalty'])
plt.xlabel('Satisfaction')
plt.ylabel('Loyalty')
plt.show()

In [None]:
#copy the entire data into X and select the features
X = data.copy()

In [None]:
#Clustering
from sklearn.cluster import KMeans

kmeans = KMeans(2)
kmeans.fit(X)

In [None]:
#Lets copy the clustering result
clusters = X.copy()
clusters['cluster_pred'] = kmeans.fit_predict(X)

In [None]:
#lets plot the clustered data
plt.scatter(clusters['Satisfaction'],clusters['Loyalty'], c = clusters['cluster_pred'], cmap='rainbow')
plt.xlabel('Satisfaction')
plt.ylabel('Loyalty')
plt.show()

In [None]:
#lets standardize the variables
from sklearn import preprocessing

x_scaled = preprocessing.scale(X)
x_scaled

In [None]:
#Elbow method
wcss = []

for i in range(1,30):
    kmeans = KMeans(i)
    kmeans.fit(x_scaled)
    wcss.append(kmeans.inertia_)

wcss

In [None]:
#lets visualize the elbow method
plt.plot(range(1,30), wcss)
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()
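
WCSS stands for within-cluster sum of squares: the squared distance from each point to the center of its own cluster, added up over all points. The sketch below computes it by hand on synthetic data and compares it with the `inertia_` attribute that the elbow loop collects. The data and variable names here are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three tight blobs (not the customer data)
rng = np.random.default_rng(0)
pts = np.vstack([
    rng.normal(c, 0.3, size=(20, 2)) for c in ([0, 0], [5, 5], [0, 5])
])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pts)

# Look up each point's own center via its label, then sum squared distances
wcss_manual = ((pts - km.cluster_centers_[km.labels_]) ** 2).sum()
print(wcss_manual, km.inertia_)
```

WCSS always shrinks as clusters are added, so the elbow plot is read by looking for the point where the decrease levels off rather than for a minimum.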

In [None]:
#now lets take clusters = 4
kmeans_new = KMeans(4)
kmeans_new.fit(x_scaled)

cluster_new = X.copy()
cluster_new['cluster_pred'] = kmeans_new.fit_predict(x_scaled)
cluster_new

In [None]:
plt.scatter(cluster_new['Satisfaction'],cluster_new['Loyalty'], c = cluster_new['cluster_pred'], cmap='rainbow')
plt.xlabel('Satisfaction')
plt.ylabel('Loyalty')
plt.show()

**Analysis**

1. **Blue Churn:** These customers are less satisfied and less loyal, and can therefore be termed *Alienated*.

2. **Yellow Churn:** These customers are less satisfied but highly loyal.

3. **Purple Churn:** These customers have high loyalty and high satisfaction, and are termed *Fans*.

4. **Red Churn:** These customers fall in between the other groups.
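
Because KMeans numbers its clusters arbitrarily, the colors in the plot can change from run to run. One way to ground the labels above is to look at the average feature values per cluster. The sketch below does this on a small invented frame that mimics the shape of `cluster_new`; the values and the `demo` and `profile` names are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the clustered frame; all values are invented
demo = pd.DataFrame({
    "Satisfaction": [2, 3, 9, 10, 4, 8],
    "Loyalty": [-1.5, -1.0, 1.2, 1.5, 1.0, -0.5],
    "cluster_pred": [0, 0, 1, 1, 2, 3],
})

# Mean satisfaction and loyalty per cluster
profile = demo.groupby("cluster_pred")[["Satisfaction", "Loyalty"]].mean()
print(profile)
```

A cluster with low means on both columns would match the *Alienated* description, and one with high means on both would match *Fans*.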