
In [None]:
# importing all required libraries
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

In [None]:
# read the dataset
pima_df = pd.read_csv("pima-indians-diabetes.csv")

In [None]:
# print first 5 observations
pima_df.head()

,preg,plas,Pres,Skin,Test,Mass,Pedi,Age,Class
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [None]:
# check for NA / null values in each feature/column
pima_df.isna().sum()

preg     0
plas     0
Pres     0
Skin     0
Test     0
Mass     0
Pedi     0
Age      0
Class    0
dtype: int64

In [None]:
# separate features (all columns but the last) and target (last column)
X = pima_df.iloc[:, :-1]
Y = pima_df.iloc[:, -1]
test_size = 0.30  # 70:30 split between training and test sets
seed = 7  # random seed so the split is repeatable
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

## Pipeline
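A `Pipeline` automates a machine learning workflow by chaining its steps, typically one or more transformers followed by a final estimator, and running them in sequence as a single object. During training, `fit` calls `fit_transform` on each transformer and then fits the final estimator on the transformed data. At prediction time, the already-fitted transformers only `transform` the new data, so the test set goes through exactly the operations learned from the training set.<br>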
<img src="Screenshots/pipeline.png"><br>
Image source: Python: Deeper Insights into Machine Learning
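
To make the fit/transform behaviour concrete, the sketch below compares a pipeline with the equivalent manual steps. It uses small synthetic data rather than the Pima dataset, and the names `X_demo_train`, `y_demo_train`, `X_demo_test` and `demo_pipeline` are assumptions introduced only for this illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; this is not the Pima dataset).
rng = np.random.default_rng(0)
X_demo_train = rng.normal(size=(80, 4))
y_demo_train = (X_demo_train[:, 0] + X_demo_train[:, 1] > 0).astype(int)
X_demo_test = rng.normal(size=(20, 4))

# Manual version: fit the scaler on the training data only,
# then reuse the fitted scaler to transform the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_demo_train)
knn = KNeighborsClassifier().fit(X_train_scaled, y_demo_train)
manual_pred = knn.predict(scaler.transform(X_demo_test))

# Pipeline version: the same steps, handled by one object.
demo_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])
demo_pipeline.fit(X_demo_train, y_demo_train)
pipeline_pred = demo_pipeline.predict(X_demo_test)

# Both approaches should produce the same predictions.
print(np.array_equal(manual_pred, pipeline_pred))
```

Because the scaler is fitted inside `fit`, the test data never influences the scaling statistics, which is the main practical reason to prefer the pipeline over scaling the whole dataset up front.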

In [None]:
from sklearn.pipeline import Pipeline

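# Pipeline takes a list of (name, estimator) tuples, applied in order
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier())
])

# fit the scaler on the training data, then fit KNN on the scaled data
pipeline.fit(X_train, y_train)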

Pipeline(memory=None,
         steps=[('scaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('knn',
                 KNeighborsClassifier(algorithm='auto', leaf_size=30,
                                      metric='minkowski', metric_params=None,
                                      n_jobs=None, n_neighbors=5, p=2,
                                      weights='uniform'))],
         verbose=False)

In [None]:
pipeline.named_steps['knn']

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [None]:
from sklearn import metrics

predict = pipeline.predict(X_test)
print(f'Accuracy on Test : {metrics.accuracy_score(y_test,predict)}\n ')
print(metrics.classification_report(y_test,predict))

Accuracy on Test : 0.7012987012987013
 
              precision    recall  f1-score   support

           0       0.75      0.80      0.77       147
           1       0.60      0.54      0.57        84

    accuracy                           0.70       231
   macro avg       0.68      0.67      0.67       231
weighted avg       0.70      0.70      0.70       231



## make_pipeline
The difference between `Pipeline` and `make_pipeline` is that `make_pipeline` names the steps automatically, using the lowercase name of each estimator's class. It does not require, and does not permit, naming the estimators.
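
As a small illustration of the naming rule, the sketch below builds the same two steps both ways and inspects the step names. The variable names `auto_named` and `hand_named` are assumptions made for this example; the generated names matter in practice because they are the prefixes used to address nested parameters.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# make_pipeline derives each step name from the lowercased class name.
auto_named = make_pipeline(StandardScaler(), KNeighborsClassifier())
auto_names = [name for name, _ in auto_named.steps]
print(auto_names)

# Pipeline uses whatever names are supplied in the tuples.
hand_named = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])
hand_names = [name for name, _ in hand_named.steps]
print(hand_names)

# Nested parameters are addressed as <step name>__<parameter name>,
# so the prefix differs between the two objects.
auto_named.set_params(kneighborsclassifier__n_neighbors=3)
hand_named.set_params(knn__n_neighbors=3)
print(auto_named.named_steps["kneighborsclassifier"].n_neighbors)
print(hand_named.named_steps["knn"].n_neighbors)
```

Shorter hand-picked names from `Pipeline` can make parameter grids easier to read, while `make_pipeline` saves typing when the default names are acceptable.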

In [None]:
from sklearn.pipeline import make_pipeline

make_pipe = make_pipeline(StandardScaler(),KNeighborsClassifier())
make_pipe.fit(X_train,y_train)

Pipeline(memory=None,
         steps=[('standardscaler',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('kneighborsclassifier',
                 KNeighborsClassifier(algorithm='auto', leaf_size=30,
                                      metric='minkowski', metric_params=None,
                                      n_jobs=None, n_neighbors=5, p=2,
                                      weights='uniform'))],
         verbose=False)

In [None]:
make_pipe.named_steps['kneighborsclassifier']

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [None]:
from sklearn import metrics

predict = make_pipe.predict(X_test)
print(f'Accuracy on Test : {metrics.accuracy_score(y_test,predict)}\n ')
print(metrics.classification_report(y_test,predict))

Accuracy on Test : 0.7012987012987013
 
              precision    recall  f1-score   support

           0       0.75      0.80      0.77       147
           1       0.60      0.54      0.57        84

    accuracy                           0.70       231
   macro avg       0.68      0.67      0.67       231
weighted avg       0.70      0.70      0.70       231

