# **[Beginner's Guide to Classification Analysis and Plot Interpretation](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)**
![](https://expertsystem.com/wp-content/uploads/2017/03/machine-learning-definition.jpeg)
### Earlier I made a notebook on Regression ([Open Here](https://www.kaggle.com/kshitijmohan/regression-complete-analysis)); this time we'll be focusing on Classification.

## [Table of Contents:](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
* **Understanding Classification**
* **How does Classification Work?**
* **Types of Algorithms**
* **Testing of Algorithms**

## **[Algorithms that we will consider:-](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)**
* Logistic Regression
* K-Nearest Neighbour
* Support Vector Machine
* Kernel SVM
* Naive Bayes
* Decision Tree Classification
* Random Forest Classification

## [Let's Start by Understanding What Classification Is](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
**Classification is a type of supervised learning. It specifies the class to which each data element belongs and is best used when the output takes a finite set of discrete values. Given one or more inputs, a classification model tries to predict the value of one or more outcomes from the observed values.
For example: labelling emails as “spam” or “not spam”, or labelling transactions as “fraudulent” or “authorized”.**

## [How does Classification Work?](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
**Classification predictive modeling is the task of approximating a mapping function from input variables to discrete output variables. The main goal is to identify which class or category new data will fall into.**
![](https://www.comodo.com/images/best-free-spam-removal-software.png)
**Spam detection is a classification problem, and a binary one, since there are only two classes: a mail is either spam or not. The classifier needs training data to learn how the input variables relate to the class. Once it has been trained well, it can be used to detect whether a particular mail is spam.**

## [We can apply a machine learning model by following six steps:](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
* Identifying Problem
* Analysing Data
* Preparing Data
* Evaluating Algorithm
* Improving Results
* Presenting Results

# [Logistic Regression](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
It is a classification algorithm in machine learning that uses one or more independent variables to determine an outcome. The outcome is measured with a dichotomous variable, meaning it has only two possible values. It is derived by feeding the Linear Regression function into the Sigmoid function:
* **y = a0 + a1X1 + a2X2 + … + anXn - Linear Regression Function**
* **p = 1 / (1 + e^(-y)) - Sigmoid Function**

Substituting the first into the second, we get:
* **p = 1 / (1 + e^(-(a0 + a1X1 + a2X2 + … + anXn))) - Logistic Regression Function**
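
To make the formula concrete, here is a minimal sketch in plain Python. The coefficients `a0`, `a` and the input `sample` are made-up numbers for illustration, not values fitted on the dataset.

```python
import math

def sigmoid(y):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def predict_proba(coefs, intercept, x):
    """Logistic regression: sigmoid of the linear combination a0 + a1*x1 + ..."""
    y = intercept + sum(a_i * x_i for a_i, x_i in zip(coefs, x))
    return sigmoid(y)

# Hypothetical coefficients and one hypothetical (already scaled) sample
a0, a = -0.5, [2.0, 1.0]
sample = [0.75, -1.0]

p = predict_proba(a, a0, sample)   # linear part: -0.5 + 1.5 - 1.0 = 0.0
label = 1 if p >= 0.5 else 0       # 0.5 is the usual decision threshold
print(p, label)
```

A linear part of exactly 0 sits on the decision boundary, which is why the probability comes out as 0.5.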


## [Preparing Dataset](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)

In [None]:
# Importing Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## [About Dataset:](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
I am using the Breast Cancer Wisconsin dataset for this kernel, as it is one of the most popular datasets and among the easiest to understand. I will predict whether a tumour is malignant or benign on the basis of mean radius and mean texture (to show the working of the classification algorithms).

In [None]:
dataset = pd.read_csv('../input/breast-cancer-wisconsin-data/data.csv')
X = dataset.iloc[:, 2:4].values  # features: columns 2 and 3 (radius_mean, texture_mean)
y = dataset.iloc[:, 1].values    # target: column 1 (diagnosis, 'M' or 'B')
dataset.head()

In [None]:
# As y contains text, we need to encode it.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
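
It helps to know which integer each class receives. `LabelEncoder` assigns codes in sorted order of the class names, so a plain-Python equivalent looks like the sketch below. The short `diagnosis` list is made up for illustration.

```python
# Made-up sample of diagnosis values; the real column holds 'M' and 'B'
diagnosis = ['M', 'B', 'B', 'M', 'B']

# Sorted unique classes get the codes 0, 1, ... (the same ordering LabelEncoder uses)
classes = sorted(set(diagnosis))
code_of = {name: code for code, name in enumerate(classes)}
encoded = [code_of[d] for d in diagnosis]

print(classes)   # ['B', 'M']
print(encoded)   # [1, 0, 0, 1, 0]
```

So 0 stands for benign and 1 for malignant. In the notebook itself, `le.classes_` shows the same ordering.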

In [None]:
# Splitting dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

In [None]:
# Applying Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
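
What the scaler does to a single feature can be sketched in plain Python: subtract the training mean and divide by the training standard deviation, then reuse those same two numbers for the test data. The values below are made up for illustration.

```python
import statistics

# Made-up values of a single feature
train = [10.0, 12.0, 14.0, 16.0, 18.0]
test = [11.0, 20.0]

# Statistics are computed on the training data only
mean = statistics.fmean(train)
std = statistics.pstdev(train)   # population standard deviation, as StandardScaler uses

train_scaled = [(v - mean) / std for v in train]
test_scaled = [(v - mean) / std for v in test]   # test data reuses the training statistics

print(train_scaled)
print(test_scaled)
```

Reusing the training statistics is the reason the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.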

In [None]:
# Training the Model
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

In [None]:
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print('Accuracy = '+str(accuracy_score(y_test, y_pred)))

import seaborn as sns
plt.subplots(figsize=(5,5))
ax= plt.subplot()
sns.heatmap(cm, annot=True, ax = ax);

ax.set_xlabel('Prediction');ax.set_ylabel('Label'); 
ax.set_title('Confusion Matrix'); 
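
The heatmap cells can be turned into the usual summary metrics. The sketch below uses a made-up 2x2 matrix (not the notebook's actual output) laid out the way `confusion_matrix` returns it: rows are true labels, columns are predictions. Class 1 is treated as the positive class.

```python
# Hypothetical confusion matrix: rows = true class, columns = predicted class
cm_example = [[60, 5],    # true 0: 60 predicted as 0, 5 predicted as 1
              [10, 25]]   # true 1: 10 predicted as 0, 25 predicted as 1

tn, fp = cm_example[0]
fn, tp = cm_example[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are right
precision = tp / (tp + fp)                   # of the predicted positives, how many are real
recall = tp / (tp + fn)                      # of the real positives, how many were found

print(accuracy, precision, recall)
```

Accuracy alone can hide a weak recall. In this made-up example 85% of predictions are correct, yet 10 of the 35 positive cases are missed.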

In [None]:
def Label(val):
    # LabelEncoder sorts the classes, so 'B' (benign) -> 0 and 'M' (malignant) -> 1
    if val==0:
        return 'Benign'
    else:
        return 'Malignant'
from matplotlib.colors import ListedColormap
plt.style.use('fivethirtyeight')
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.15),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.3, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = Label(j), alpha = 0.7, s = 50)
plt.title('Logistic Regression')
plt.xlabel('Mean-Radius')   # X_set[:, 0] is radius_mean
plt.ylabel('Mean-Texture')  # X_set[:, 1] is texture_mean
plt.axis([5,25,0,50])
plt.legend(loc = 'upper left')
plt.show()

# [K-Nearest Neighbor](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
**Classification is computed from a simple majority vote of the k nearest neighbors of each point. The algorithm is supervised: it takes a set of labeled points and uses them to label other points. To label a new point, it looks at the labeled points closest to it, known as its nearest neighbors, and lets them vote; whichever label most of the neighbors have becomes the label of the new point. The “k” is the number of neighbors it checks.**
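
The voting procedure can be sketched in a few lines of plain Python. The points and labels are made up, and `knn_predict` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k):
    """Label `query` by majority vote among its k nearest labeled points."""
    # Order the training indices by Euclidean distance to the query point
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Made-up 2-D points belonging to two classes
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(points, labels, (2.0, 2.0), k=3))
print(knn_predict(points, labels, (6.0, 6.5), k=3))
```

With two classes, an odd `k` avoids tied votes.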

In [None]:
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 35, metric = 'minkowski', p = 2)
# I increased the value of K (number of neighbours) because the model was overfitting with fewer neighbours.
classifier.fit(X_train, y_train)

In [None]:
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print('Accuracy = '+str(accuracy_score(y_test, y_pred)))

import seaborn as sns
plt.subplots(figsize=(5,5))
ax= plt.subplot()
sns.heatmap(cm, annot=True, ax = ax);

ax.set_xlabel('Prediction');ax.set_ylabel('Label'); 
ax.set_title('Confusion Matrix'); 

In [None]:
def Label(val):
    # LabelEncoder sorts the classes, so 'B' (benign) -> 0 and 'M' (malignant) -> 1
    if val==0:
        return 'Benign'
    else:
        return 'Malignant'
from matplotlib.colors import ListedColormap
plt.style.use('fivethirtyeight')
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.15),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.3, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = Label(j), alpha = 0.7, s = 50)
plt.title('K-Nearest Neighbor')
plt.xlabel('Mean-Radius')   # X_set[:, 0] is radius_mean
plt.ylabel('Mean-Texture')  # X_set[:, 1] is texture_mean
plt.axis([5,25,0,50])
plt.legend(loc = 'upper left')
plt.show()

# [Support Vector Machine](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
**The support vector machine is a classifier that represents the training data as points in space, separated into categories by a gap that is as wide as possible. New points are then mapped into the same space and assigned a category according to which side of the gap they fall on.**
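
For a linear SVM, the separating boundary is a line `w.x + b = 0`. The sketch below shows how a point is classified and how wide the gap is. The weights `w` and offset `b` are hypothetical, not values fitted on the dataset.

```python
import math

# Hypothetical separating line w.x + b = 0 (not fitted on the dataset)
w = (3.0, 4.0)
b = -12.0

def decision_value(x):
    """Signed score: positive on one side of the line, negative on the other."""
    return w[0] * x[0] + w[1] * x[1] + b

def classify(x):
    return 1 if decision_value(x) >= 0 else 0

# With the margins defined as w.x + b = +1 and w.x + b = -1,
# the gap between them is 2 / ||w||
margin_width = 2.0 / math.hypot(*w)

print(classify((4.0, 3.0)), classify((1.0, 1.0)), margin_width)
```

Training an SVM amounts to choosing `w` and `b` so that this gap is as large as possible while the classes stay on their own sides.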

In [None]:
from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, y_train)

In [None]:
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print('Accuracy = '+str(accuracy_score(y_test, y_pred)))

import seaborn as sns
plt.subplots(figsize=(5,5))
ax= plt.subplot()
sns.heatmap(cm, annot=True, ax = ax);

ax.set_xlabel('Prediction');ax.set_ylabel('Label'); 
ax.set_title('Confusion Matrix'); 

In [None]:
def Label(val):
    # LabelEncoder sorts the classes, so 'B' (benign) -> 0 and 'M' (malignant) -> 1
    if val==0:
        return 'Benign'
    else:
        return 'Malignant'
from matplotlib.colors import ListedColormap
plt.style.use('fivethirtyeight')
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.15),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.3, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = Label(j), alpha = 0.7, s = 50)
plt.title('Support Vector Machine')
plt.xlabel('Mean-Radius')   # X_set[:, 0] is radius_mean
plt.ylabel('Mean-Texture')  # X_set[:, 1] is texture_mean
plt.axis([5,25,0,50])
plt.legend(loc = 'upper left')
plt.show()

# [Kernel SVM](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
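**Kernel SVM follows the same idea as the support vector machine above: the training data are represented as points in space, separated into categories by a gap that is as wide as possible. The difference is the kernel function (here the RBF kernel), which lets the separating boundary be a curve instead of a straight line. New points are then assigned a category according to which side of the boundary they fall on.**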
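
The RBF kernel used in this section measures how similar two points are: 1 for identical points, falling towards 0 as they move apart. A minimal sketch with made-up points follows. `rbf_kernel` here is a hand-written helper for illustration, and `gamma` matches the value passed to `SVC` in this section.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF similarity: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

gamma = 0.5  # same value as passed to SVC in this section

same = rbf_kernel((1.0, 2.0), (1.0, 2.0), gamma)   # identical points
near = rbf_kernel((1.0, 2.0), (2.0, 2.0), gamma)   # squared distance 1
far = rbf_kernel((1.0, 2.0), (5.0, 5.0), gamma)    # squared distance 25

print(same, near, far)
```

A larger `gamma` makes the similarity drop off faster, so each training point influences a smaller neighbourhood and the boundary can bend more.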

In [None]:
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0, C=0.5, gamma=0.5)
classifier.fit(X_train, y_train)

In [None]:
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print('Accuracy = '+str(accuracy_score(y_test, y_pred)))

import seaborn as sns
plt.subplots(figsize=(5,5))
ax= plt.subplot()
sns.heatmap(cm, annot=True, ax = ax);

ax.set_xlabel('Prediction');ax.set_ylabel('Label'); 
ax.set_title('Confusion Matrix'); 

In [None]:
def Label(val):
    # LabelEncoder sorts the classes, so 'B' (benign) -> 0 and 'M' (malignant) -> 1
    if val==0:
        return 'Benign'
    else:
        return 'Malignant'
from matplotlib.colors import ListedColormap
plt.style.use('fivethirtyeight')
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.15),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.3, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = Label(j), alpha = 0.7, s = 50)
plt.title('Kernel SVM')
plt.xlabel('Mean-Radius')   # X_set[:, 0] is radius_mean
plt.ylabel('Mean-Texture')  # X_set[:, 1] is texture_mean
plt.axis([5,25,0,50])
plt.legend(loc = 'upper left')
plt.show()

# [Naive Bayes](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
**It is a classification algorithm based on Bayes' theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Even if the features depend on each other, each of them is treated as contributing to the probability independently. A Naive Bayes model is easy to build and is particularly useful for comparatively large data sets.**
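
The independence assumption is what keeps the calculation simple: each feature gets its own one-dimensional Gaussian per class, and the per-feature log-likelihoods are just added up. The following is a minimal sketch on made-up data. The helper names are hypothetical, and scikit-learn's `GaussianNB` differs in details such as variance smoothing.

```python
import math

def fit_gaussian_nb(X, y):
    """For each class store its prior and the (mean, variance) of every feature."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var))
        model[c] = (len(rows) / len(X), stats)
    return model

def log_score(model, c, x):
    """log prior + sum of per-feature Gaussian log-likelihoods (the 'naive' step)."""
    prior, stats = model[c]
    total = math.log(prior)
    for value, (mean, var) in zip(x, stats):
        total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
    return total

def predict(model, x):
    """Pick the class with the highest score."""
    return max(model, key=lambda c: log_score(model, c, x))

# Made-up training data: two features, two classes
X_toy = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (3.0, 4.0), (3.2, 3.8), (2.8, 4.2)]
y_toy = [0, 0, 0, 1, 1, 1]

nb = fit_gaussian_nb(X_toy, y_toy)
print(predict(nb, (1.1, 2.0)), predict(nb, (3.0, 4.1)))
```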

In [None]:
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

In [None]:
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print('Accuracy = '+str(accuracy_score(y_test, y_pred)))

import seaborn as sns
plt.subplots(figsize=(5,5))
ax= plt.subplot()
sns.heatmap(cm, annot=True, ax = ax);

ax.set_xlabel('Prediction');ax.set_ylabel('Label'); 
ax.set_title('Confusion Matrix'); 

In [None]:
def Label(val):
    # LabelEncoder sorts the classes, so 'B' (benign) -> 0 and 'M' (malignant) -> 1
    if val==0:
        return 'Benign'
    else:
        return 'Malignant'
from matplotlib.colors import ListedColormap
plt.style.use('fivethirtyeight')
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.15),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.3, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = Label(j), alpha = 0.7, s = 50)
plt.title('Naive Bayes')
plt.xlabel('Mean-Radius')   # X_set[:, 0] is radius_mean
plt.ylabel('Mean-Texture')  # X_set[:, 1] is texture_mean
plt.axis([5,25,0,50])
plt.legend(loc = 'upper left')
plt.show()

# [Decision Tree Classification](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
**The decision tree algorithm builds the classification model in the form of a tree structure. It uses if-then rules that are mutually exclusive and exhaustive. The data is broken down into smaller and smaller subsets while the tree is grown incrementally. The final structure looks like a tree with nodes and leaves. The rules are learned sequentially from the training data, one at a time. Each time a rule is learned, the tuples covered by that rule are removed. The process continues on the training set until a termination condition is met.**
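
With the `entropy` criterion used in this section, each if-then rule is chosen to reduce the entropy of the class labels as much as possible. The sketch below scores two candidate thresholds on made-up data; `entropy` and `information_gain` are hand-written helpers for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction achieved by the rule `value <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Made-up feature values and their classes
values = [10, 12, 13, 17, 19, 21]
classes_toy = [0, 0, 0, 1, 1, 1]

gain_at_15 = information_gain(values, classes_toy, 15)   # separates the classes perfectly
gain_at_12 = information_gain(values, classes_toy, 12)   # leaves one side mixed
print(gain_at_15, gain_at_12)
```

The tree keeps the threshold with the larger gain and then repeats the search inside each resulting subset.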

In [None]:
from sklearn.tree import DecisionTreeClassifier
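# min_samples_split given as a float is a fraction: a node needs 80% of the training samples to be split
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0, max_depth=3, min_samples_split=0.8)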
classifier.fit(X_train, y_train)

In [None]:
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print('Accuracy = '+str(accuracy_score(y_test, y_pred)))

import seaborn as sns
plt.subplots(figsize=(5,5))
ax= plt.subplot()
sns.heatmap(cm, annot=True, ax = ax);

ax.set_xlabel('Prediction');ax.set_ylabel('Label'); 
ax.set_title('Confusion Matrix'); 

In [None]:
def Label(val):
    # LabelEncoder sorts the classes, so 'B' (benign) -> 0 and 'M' (malignant) -> 1
    if val==0:
        return 'Benign'
    else:
        return 'Malignant'
from matplotlib.colors import ListedColormap
plt.style.use('fivethirtyeight')
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.15),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.3, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = Label(j), alpha = 0.7, s = 50)
plt.title('Decision Tree Classification')
plt.xlabel('Mean-Radius')   # X_set[:, 0] is radius_mean
plt.ylabel('Mean-Texture')  # X_set[:, 1] is texture_mean
plt.axis([5,25,0,50])
plt.legend(loc = 'upper left')
plt.show()

# [Random Forest Classification](https://www.kaggle.com/kshitijmohan/lstm-stock-prediction)
**Random forests, or random decision forests, are an ensemble learning method for classification, regression, etc. A forest operates by constructing a multitude of decision trees at training time and outputs the class that is the mode of the classes predicted by the individual trees (classification) or their mean prediction (regression).**
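
The "mode of the classes" step can be sketched on its own. The three "trees" below are hypothetical hand-written rules with invented thresholds; a real forest would learn each tree from a random sample of the training data.

```python
from collections import Counter

# Three hypothetical "trees", each a simple rule on a (radius, texture) pair
def tree_a(x):
    return 1 if x[0] > 15 else 0

def tree_b(x):
    return 1 if x[1] > 20 else 0

def tree_c(x):
    return 1 if x[0] + x[1] > 38 else 0

forest = [tree_a, tree_b, tree_c]

def forest_predict(x):
    """Return the most common class among the individual tree predictions."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

print(forest_predict((18, 18)))   # votes 1, 0, 0
print(forest_predict((14, 25)))   # votes 0, 1, 1
```

Each tree is wrong on some inputs, but the vote only fails when most of the trees fail together, which is the reason for combining many of them.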

In [None]:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 20, criterion = 'entropy', random_state = 0, max_depth=5)
classifier.fit(X_train, y_train)

In [None]:
y_pred = classifier.predict(X_test)
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print('Accuracy = '+str(accuracy_score(y_test, y_pred)))

import seaborn as sns
plt.subplots(figsize=(5,5))
ax= plt.subplot()
sns.heatmap(cm, annot=True, ax = ax);

ax.set_xlabel('Prediction');ax.set_ylabel('Label'); 
ax.set_title('Confusion Matrix'); 

In [None]:
def Label(val):
    # LabelEncoder sorts the classes, so 'B' (benign) -> 0 and 'M' (malignant) -> 1
    if val==0:
        return 'Benign'
    else:
        return 'Malignant'
from matplotlib.colors import ListedColormap
plt.style.use('fivethirtyeight')
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.15),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.3, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], c = ListedColormap(('red', 'green'))(i), label = Label(j), alpha = 0.7, s = 50)
plt.title('Random Forest Classification')
plt.xlabel('Mean-Radius')   # X_set[:, 0] is radius_mean
plt.ylabel('Mean-Texture')  # X_set[:, 1] is texture_mean
plt.axis([5,25,0,50])
plt.legend(loc = 'upper left')
plt.show()

# Thank you very much for your attention to my work. I wish you great datasets for research!
![](https://i.pinimg.com/originals/4f/92/fe/4f92fe4ee07e79bc3495e41bb5ae1bd3.gif)