# Sklearn Pipeline

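* A pipeline chains multiple steps together so that the output of each step is used as the input to the next step.
* Pipelines make it easy to apply exactly the same preprocessing to the training and test data: every step is fitted on the training data only, and the fitted steps are then reused to transform the test data.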
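To make the chaining concrete, here is a minimal sketch on a hypothetical toy array (not the weather dataset). It chains `StandardScaler` into `MinMaxScaler` and checks that the pipeline gives the same result as calling the two steps by hand.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical toy data: 4 training samples and 1 test sample, 2 features each
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_test = np.array([[2.5, 25.0]])

# The output of StandardScaler becomes the input of MinMaxScaler
pipe = Pipeline(steps=[
    ("scale", StandardScaler()),
    ("rescale", MinMaxScaler()),
])

# fit_transform learns the parameters of every step from the training data only
X_train_out = pipe.fit_transform(X_train)
# transform reuses those learned parameters on new data
X_test_out = pipe.transform(X_test)

# The same chain written out manually, to show what the pipeline does internally
scaler = StandardScaler().fit(X_train)
rescaler = MinMaxScaler().fit(scaler.transform(X_train))
manual_test = rescaler.transform(scaler.transform(X_test))

print(X_test_out)                          # [[0.5 0.5]] (2.5 is the training mean)
print(np.allclose(X_test_out, manual_test))  # True
```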

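We can create a pipeline in two ways:
1. The `Pipeline` class, imported with **from sklearn.pipeline import Pipeline**. Each step is passed as a `(name, estimator)` tuple, so we choose the step names ourselves.
2. The `make_pipeline` function, imported with **from sklearn.pipeline import make_pipeline**. We pass the estimators directly and the step names are generated automatically from the lowercased class names.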
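A short sketch comparing the two, using `StandardScaler` and `DecisionTreeClassifier` as example steps. The step names matter because they are how nested parameters are addressed with the `<step>__<param>` syntax (for example in `set_params` or a grid search).

```python
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Pipeline: every step is named explicitly as a (name, estimator) tuple
pipe_named = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("tree", DecisionTreeClassifier(random_state=20)),
])

# make_pipeline: names are derived from the lowercased class names
pipe_auto = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=20))

print([name for name, _ in pipe_named.steps])  # ['scaler', 'tree']
print([name for name, _ in pipe_auto.steps])   # ['standardscaler', 'decisiontreeclassifier']

# Step names are used to reach parameters of the estimators inside the pipeline
pipe_named.set_params(tree__max_depth=3)
pipe_auto.set_params(decisiontreeclassifier__max_depth=3)
print(pipe_named.named_steps["tree"].max_depth)  # 3
```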

We can also display the whole pipeline as a diagram with a small configuration change (recent scikit-learn versions already use this display by default):

       from sklearn import set_config
       set_config(display = 'diagram')

# Demonstrating the sklearn pipeline on a simple dataset

In [84]:
# import libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler,MinMaxScaler,OneHotEncoder,LabelEncoder
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score

In [64]:
# Loading the dataset

df = pd.read_csv('weather_classification_data.csv')

df.head()

|   | Temperature | Humidity | Wind Speed | Precipitation (%) | Cloud Cover | Atmospheric Pressure | UV Index | Season | Visibility (km) | Location | Weather Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.0 | 73 | 9.5 | 82.0 | partly cloudy | 1010.82 | 2 | Winter | 3.5 | inland | Rainy |
| 1 | 39.0 | 96 | 8.5 | 71.0 | partly cloudy | 1011.43 | 7 | Spring | 10.0 | inland | Cloudy |
| 2 | 30.0 | 64 | 7.0 | 16.0 | clear | 1018.72 | 5 | Spring | 5.5 | mountain | Sunny |
| 3 | 38.0 | 83 | 1.5 | 82.0 | clear | 1026.25 | 7 | Spring | 1.0 | coastal | Sunny |
| 4 | 27.0 | 74 | 17.0 | 66.0 | overcast | 990.67 | 1 | Winter | 2.5 | mountain | Rainy |


In [65]:
x = df.drop('Weather Type',axis = 1)
y = df['Weather Type']

In [66]:
x_train,x_test,y_train,y_test = train_test_split(x,y,
                                                test_size = 0.2,
                                                random_state = 20,
                                                stratify = y)

In [67]:
sta_sca_tuple = ('sta_sca',StandardScaler(),['Temperature','Humidity','Precipitation (%)','Atmospheric Pressure'])
min_max_tuple = ('min_max',MinMaxScaler(),['Wind Speed','UV Index','Visibility (km)'])
one_hot_tuple = ('one_hot',OneHotEncoder(sparse_output = False),['Cloud Cover','Season','Location'])

In [68]:
preprocessor = ColumnTransformer(transformers = [
    sta_sca_tuple,
    min_max_tuple,
    one_hot_tuple
],remainder = 'passthrough')

## Creating pipeline

In [69]:
from sklearn.pipeline import make_pipeline

pipe_1 = make_pipeline(preprocessor)

In [70]:
# applying pipeline to x_train 

x_train_transformed = pipe_1.fit_transform(x_train)

In [71]:
# applying pipeline to x_test

x_test_transformed = pipe_1.transform(x_test)
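The reason for calling `fit_transform` on the training set but only `transform` on the test set is that the test data must be scaled with statistics learned from the training data, otherwise information from the test set leaks into preprocessing. A minimal sketch with hypothetical one-column arrays:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_train = np.array([[0.0], [10.0]])  # hypothetical training column: mean 5, std 5
X_test = np.array([[20.0]])          # hypothetical test value

pipe = make_pipeline(StandardScaler())
pipe.fit_transform(X_train)          # learns mean_ and scale_ from X_train only

scaler = pipe.named_steps["standardscaler"]
print(scaler.mean_, scaler.scale_)   # [5.] [5.]
print(pipe.transform(X_test))        # [[3.]]  i.e. (20 - 5) / 5
```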

## Displaying pipeline

In [91]:
from sklearn import set_config
set_config(display = 'diagram')

In [92]:
pipe_1

In [76]:
# converting categorical target to numeric

label_encoder = LabelEncoder()

y_train_transformed = label_encoder.fit_transform(y_train)

y_test_transformed = label_encoder.transform(y_test)
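`LabelEncoder` assigns integer codes to the classes in sorted order, and `inverse_transform` maps predictions back to the original labels. A small sketch using only the three labels visible in `df.head()` above (the real column may contain more). Note that scikit-learn classifiers such as `DecisionTreeClassifier` also accept string targets directly, so this encoding step is optional here.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["Rainy", "Cloudy", "Sunny", "Sunny", "Rainy"]  # sample of Weather Type values

le = LabelEncoder()
codes = le.fit_transform(labels)

print(le.classes_)                  # ['Cloudy' 'Rainy' 'Sunny'] (sorted)
print(codes)                        # [1 0 2 2 1]
print(le.inverse_transform(codes))  # back to the original strings
```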

## Model Training and prediction

In [78]:
from sklearn.tree import DecisionTreeClassifier

In [79]:
model = DecisionTreeClassifier(random_state = 20)

In [80]:
model.fit(x_train_transformed,y_train_transformed)

In [82]:
y_pred = model.predict(x_test_transformed)

In [86]:
# accuracy_score expects (y_true, y_pred)
print("Accuracy : ", accuracy_score(y_test_transformed, y_pred))

Accuracy :  0.9109848484848485
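In the workflow above only the preprocessing lives inside the pipeline, while the model is trained separately. A common alternative is to make the classifier the last step, so one `fit` and one `predict` call cover everything and cross-validation refits the preprocessing on each fold. The sketch below assumes a small synthetic DataFrame (random values and a made-up labelling rule) in place of `weather_classification_data.csv`, and adds `handle_unknown="ignore"` to `OneHotEncoder` as an optional safeguard against categories unseen during training; its scores say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the weather CSV (assumption: random data)
rng = np.random.default_rng(20)
n = 200
data = pd.DataFrame({
    "Temperature": rng.normal(20, 10, n),
    "Humidity": rng.integers(20, 100, n),
    "Season": rng.choice(["Winter", "Spring", "Summer", "Autumn"], n),
})
# Made-up rule so the model has something to learn
data["Weather Type"] = np.where(data["Temperature"] > 20, "Sunny", "Rainy")

X = data.drop("Weather Type", axis=1)
y = data["Weather Type"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=20, stratify=y
)

preprocessor = ColumnTransformer(transformers=[
    ("sta_sca", StandardScaler(), ["Temperature", "Humidity"]),
    ("one_hot", OneHotEncoder(handle_unknown="ignore"), ["Season"]),
])

# fit() fits every transformer and then the tree; predict() transforms, then predicts
full_pipe = make_pipeline(preprocessor, DecisionTreeClassifier(random_state=20))
full_pipe.fit(X_train, y_train)

y_pred = full_pipe.predict(X_test)  # string labels, no LabelEncoder needed
acc = accuracy_score(y_test, y_pred)
print("Accuracy :", acc)

# Each fold refits the scaler and encoder on its own training part
scores = cross_val_score(full_pipe, X, y, cv=5)
print("CV accuracy :", scores.mean())
```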
