# Perceptron
You should build an end-to-end machine learning pipeline using a perceptron model. In particular, you should do the following:
- Load the `mnist` dataset using [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). You can find this dataset in the datasets folder.
- Split the dataset into training and test sets using [Scikit-Learn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html).
- Build an end-to-end machine learning pipeline, including a [perceptron](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html) model.
- Optimize your pipeline by cross-validating your design decisions. 
- Test the best pipeline on the test set and report various [evaluation metrics](https://scikit-learn.org/0.15/modules/model_evaluation.html).  
- Check the documentation to identify the most important hyperparameters, attributes, and methods of the model. Use them in practice.

## Importing libraries and the data

In [1]:
# importing the important libraries
import pandas as pd
from sklearn.model_selection import train_test_split

from sklearn.linear_model import Perceptron

from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.metrics import accuracy_score

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV



In [2]:
# importing the data
df = pd.read_csv("mnist.csv")


In [3]:
# dropping the id column
df.drop('id', inplace = True, axis = 1)
df.head()

Unnamed: 0,class,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783,pixel784
0,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,8,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,5,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,3,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Splitting the data into train and test data

In [4]:
# splitting the data into features and target
X = df.drop('class', axis = 1)
y = df['class']

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

(2800, 784)
(1200, 784)
(2800,)
(1200,)


## Building the model

In [5]:
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.fit_transform(x_test)

In [6]:
param = {"alpha": [0.0001, 0.001, 0.01],
        'tol' : [0.001, 0.01]}

model = GridSearchCV(Perceptron(), param, cv = 3)

model.fit(x_train, y_train)

# Checking accurary of the model and the best hyperparameters
print("Accurary of best decision tree classifier = {:.2f}%".format(model.best_score_*100))
print("Best hyperparameter for the model = {}".format(model.best_params_))


Accurary of best decision tree classifier = 82.75%
Best hyperparameter for the model = {'alpha': 0.0001, 'tol': 0.01}


In [7]:
# predicting using the model
y_pred = model.predict(x_test)
y_pred

array([1, 8, 7, ..., 7, 9, 1], dtype=int64)

## Model evaluation

In [8]:
# finding the accuracy
accuracy = accuracy_score(y_pred, y_test)
accuracy

0.7983333333333333

In [9]:
precision, recall, fscore,support = score(y_pred, y_test)
pd.DataFrame()