<a href="https://colab.research.google.com/github/plaban1981/Pipelines/blob/master/Auomate_ML_with_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Automate the Machine Learning Model Implementation with Sklearn Pipeline

Many times while working on Machine Learning problems, we want to preprocess our data and then try several classifiers to choose the best one.

In such cases, fitting and testing each classifier individually is tedious, not to mention the large amount of redundant code involved.

Plus, suppose your workflow uses cross-validation and your preprocessing includes an operation like normalization or standardization. If you fit that operation on the full training set before cross-validating, the scaling statistics already include the rows that will later serve as validation folds. Information from the held-out data leaks into training, and the validation scores become optimistically biased. Wouldn’t it be nice if there were a single solution to all these problems?
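To make the leakage point concrete, here is a minimal sketch comparing the two approaches on the iris data. It assumes `load_iris(return_X_y=True)`, five folds, and `LogisticRegression(max_iter=1000)`, all choices made only for illustration. On a small, well-separated dataset like iris the two mean scores may be nearly identical. What matters is which rows each scaler is allowed to see.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Leaky: the scaler is fit on every row, including rows that cross_val_score
# will later hold out as validation folds.
X_scaled = StandardScaler().fit_transform(X)
leaky_scores = cross_val_score(LogisticRegression(max_iter=1000), X_scaled, y, cv=5)

# Leak-free: cross_val_score clones and refits the whole pipeline on each
# training fold, so the scaler's mean and variance come from that fold only.
pipe = Pipeline([("scaler", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
clean_scores = cross_val_score(pipe, X, y, cv=5)

print("leaky mean accuracy:", leaky_scores.mean())
print("clean mean accuracy:", clean_scores.mean())
```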

Well, there is! **Scikit-Learn has a `Pipeline` class** (in the `sklearn.pipeline` module) that provides an easy way to tackle these problems.


**Pipeline** is a class that sequentially applies a list of transforms followed by a final estimator. Every intermediate step must implement `fit` and `transform`; the final estimator only needs to implement `fit`. The purpose of the pipeline is to assemble several steps that can be cross-validated together while setting different parameters.
Now let’s see an implementation.
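The "setting different parameters" part works through the `<step name>__<parameter>` naming convention. A minimal sketch, assuming a scaler plus `LogisticRegression` pipeline and a small, arbitrary grid chosen only for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scaler", StandardScaler()),                  # transform: implements fit/transform
    ("clf", LogisticRegression(max_iter=1000)),    # final estimator: only needs fit
])

# Each key addresses one parameter of one step: <step name>__<parameter>.
param_grid = {
    "scaler__with_mean": [True, False],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
print(round(search.best_score_, 3))
```

Because `GridSearchCV` refits the whole pipeline on every training fold, the scaler is always fitted alongside the classifier and never on the validation fold.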

In [0]:
```python
from sklearn import datasets

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
```

# Load the data

In [0]:
```python
iris = datasets.load_iris()
X = iris.data
Y = iris.target
```

# Split the data into training and test sets

In [0]:
```python
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, random_state=10)
```
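As a quick sanity check on the split, the sketch below prints the array shapes. scikit-learn rounds a fractional `test_size` up, so 15% of iris's 150 samples gives ceil(22.5) = 23 test rows and leaves 127 for training.

```python
import math

from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X, Y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15, random_state=10)

# A fractional test_size is converted to a count with ceil(): ceil(0.15 * 150) = 23
print(X.shape)                   # (150, 4)
print(X_test.shape)              # (23, 4)
print(X_train.shape)             # (127, 4)
print(math.ceil(0.15 * len(X)))  # 23
```

With only 23 test samples, each misclassified flower costs about 4.3 percentage points of accuracy: an accuracy of 0.9565 corresponds to 22 of 23 correct.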

# Make a list of classifiers

Make a list of classifier names and their corresponding Scikit-Learn estimators, then zip them together. This lets us pass all the classifiers, along with their names, to our pipeline function in a single call.

In [0]:
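```python
classifier_names = ["Logistic Regression", "KNN", "Random Forest", "SVM"]
classifiers = [LogisticRegression(), KNeighborsClassifier(), RandomForestClassifier(), LinearSVC()]
# zip() returns a one-shot iterator in Python 3; list() makes the pairs reusable
# if classifier() is called more than once
zipped_clf = list(zip(classifier_names, classifiers))
```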
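In Python 3, `zip()` returns a one-shot iterator: a bare `zip(...)` object passed to `classifier()` a second time would yield no pairs, and the function would return an empty list. The sketch below shows the behavior with the same names and estimators, and how `list()` avoids it.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

classifier_names = ["Logistic Regression", "KNN", "Random Forest", "SVM"]
classifiers = [LogisticRegression(), KNeighborsClassifier(),
               RandomForestClassifier(), LinearSVC()]

# A zip object is an iterator: once consumed, it is empty.
zipped_iter = zip(classifier_names, classifiers)
first_pass = [name for name, _ in zipped_iter]
second_pass = [name for name, _ in zipped_iter]  # nothing left to iterate
print(len(first_pass), len(second_pass))         # 4 0

# A list of pairs can be iterated as many times as needed.
zipped_list = list(zip(classifier_names, classifiers))
print(len([n for n, _ in zipped_list]), len([n for n, _ in zipped_list]))  # 4 4
```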

# Prepare the pipelines

For each classifier, build a Pipeline that first standardizes the features with `StandardScaler` and then applies the classifier, and feed that pipeline to the `fit_classifier()` function.

In [0]:
```python
def classifier(named_classifiers, X_train, y_train, X_test, y_test):
    """Wrap each (name, estimator) pair in a StandardScaler pipeline and score it."""
    result = []
    for n, c in named_classifiers:
        print('Classifier : ', n)
        print('*' * 80)
        # The scaler inside the pipeline is fit on X_train only
        checker_pipeline = Pipeline([('standardize', StandardScaler()),
                                     ('classifier', c)])
        print("Validation result for {}".format(n))
        print(c)
        print('*' * 80)
        clf_acc = fit_classifier(checker_pipeline, X_train, y_train, X_test, y_test)
        result.append((n, clf_acc))
    return result
```
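To check that the pipeline really keeps the test set out of preprocessing, the sketch below fits a pipeline of the same shape as the one built in `classifier()` (KNN is chosen arbitrarily; the step names are our own) and inspects the fitted scaler through `named_steps`. The scaler's `mean_` and `scale_` should match statistics computed on `X_train` alone.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=10)

pipe = Pipeline([("standardize", StandardScaler()),
                 ("classifier", KNeighborsClassifier())])
pipe.fit(X_train, y_train)

# Each step is reachable by the name it was given in the Pipeline definition.
scaler = pipe.named_steps["standardize"]

# StandardScaler uses the population standard deviation (ddof=0), like np.std.
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))  # True
print(np.allclose(scaler.scale_, X_train.std(axis=0)))  # True
```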

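# Define fit_classifier()

This helper fits a pipeline on the training data, predicts on the test set, and prints and returns the accuracy.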

In [0]:
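```python
def fit_classifier(pipeline, x_train, y_train, x_test, y_test):
    # Fits the scaler on x_train, then the classifier on the scaled x_train
    model_fit = pipeline.fit(x_train, y_train)
    # Applies the already-fitted scaler to x_test before predicting
    y_pred = model_fit.predict(x_test)
    accuracy = accuracy_score(y_test, y_pred)
    print("accuracy score: {0:.2f}%".format(accuracy * 100))
    return accuracy
```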
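`fit_classifier()` could also lean on the pipeline's own `score()` method. When the final step is a classifier, `Pipeline.score` runs the fitted transforms on the test data and returns the classifier's mean accuracy, the same number `accuracy_score` computes. A minimal sketch, with `LogisticRegression` picked arbitrarily as the final step:

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.15, random_state=10)

pipe = Pipeline([("standardize", StandardScaler()),
                 ("classifier", LogisticRegression())])
pipe.fit(X_train, y_train)

# Pipeline.score transforms X_test with the fitted scaler, then calls the
# classifier's score(), which is mean accuracy for scikit-learn classifiers.
acc_from_score = pipe.score(X_test, y_test)
acc_from_metric = accuracy_score(y_test, pipe.predict(X_test))
print(acc_from_score == acc_from_metric)
```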

# Test the algorithm

In [30]:
```python
result = classifier(zipped_clf, X_train, y_train, X_test, y_test)
```

```
Classifier :  Logistic Regression
********************************************************************************
Validation result for Logistic Regression
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)
********************************************************************************
accuracy score: 100.00%
Classifier :  KNN
********************************************************************************
Validation result for KNN
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')
********************************************************************************
accuracy score: 95.65%
```

In [31]:
```python
result
```

```
[('Logistic Regression', 1.0),
 ('KNN', 0.9565217391304348),
 ('Random Forest', 1.0),
 ('SVM', 0.9565217391304348)]
```
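To pick a winner from `result` programmatically, a plain `max()` over the accuracies is enough. This sketch copies the list shown above. Note that Logistic Regression and Random Forest tie at 1.0 on only 23 test samples, so cross-validated scores would likely be a more reliable basis for choosing.

```python
# Accuracies copied from the result list above.
result = [('Logistic Regression', 1.0),
          ('KNN', 0.9565217391304348),
          ('Random Forest', 1.0),
          ('SVM', 0.9565217391304348)]

# max() returns the first entry with the highest accuracy, so ties go to
# whichever classifier appears earlier in the list.
best_name, best_acc = max(result, key=lambda pair: pair[1])
print(best_name, best_acc)  # Logistic Regression 1.0

# sorted() is stable even with reverse=True, so tied entries keep their order.
ranking = sorted(result, key=lambda pair: pair[1], reverse=True)
print([name for name, _ in ranking])
```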