# **PIPELINES (Process flows)**

## **1. Introduction**

The scikit-learn library in Python allows you to **automate** machine learning workflows using the **Pipeline class**. This class **encapsulates** the different **steps** of a learning workflow (for example, preprocessing and classification) together with their respective parameters.

### **Advantages**



* A pipeline behaves like a single estimator, so it can be passed to other tools such as **GridSearchCV** for hyperparameter tuning.

* A single call to ```.fit()``` or ```.predict()``` runs the **full sequence** of steps.

* All intermediate steps in a pipeline must implement ```.fit()``` and ```.transform()``` methods. Only the final step can be a plain estimator such as a classifier or regressor.

* The pipeline will expose all the methods of the final step (e.g., ```.predict()```).
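As a minimal sketch of the "full sequence" point, the snippet below compares a hand-chained scaler and classifier with the equivalent pipeline. The Iris dataset, the names `clf_manual` and `pipe_demo`, and `random_state=0` are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Manual version: fit and apply each step by hand.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
clf_manual = Perceptron(max_iter=1000, random_state=0)
clf_manual.fit(X_scaled, y)
manual_pred = clf_manual.predict(scaler.transform(X))

# Pipeline version: one fit() and one predict() run both steps in order.
pipe_demo = Pipeline([
    ("scale", StandardScaler()),
    ("perceptron", Perceptron(max_iter=1000, random_state=0)),
])
pipe_demo.fit(X, y)
pipeline_pred = pipe_demo.predict(X)

# Both approaches apply the same operations, so the predictions should match.
same_predictions = np.array_equal(manual_pred, pipeline_pred)
print(same_predictions)
```

The pipeline version also removes the risk of forgetting to apply the scaler at prediction time.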

## **2. Creating a Pipeline**

To use the ```Pipeline``` class, you provide a list of tuples. Each tuple should contain:

* A name (or label) for the step (e.g., ```scale```)

* The transformer or estimator instance for that step (e.g., ```StandardScaler()```)



```
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron

pipe = Pipeline([
    ('scale', StandardScaler()),
    ('perceptron', Perceptron(max_iter=1000))
])
```



You can also **abbreviate** pipelines using the function ```make_pipeline()```, which doesn't require naming the steps. Each step name is generated automatically from the lowercased class name of the estimator.



```
from sklearn.pipeline import make_pipeline

pipe_short = make_pipeline(
    StandardScaler(),
    Perceptron(max_iter=1000)
)
```
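To see which names ```make_pipeline()``` assigns, you can list the steps of the resulting pipeline. This is a small sketch. The variable `step_names` is introduced here for illustration.

```python
from sklearn.linear_model import Perceptron
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe_short = make_pipeline(
    StandardScaler(),
    Perceptron(max_iter=1000)
)

# Each step is stored as a (name, estimator) tuple; collect the names.
step_names = [name for name, _ in pipe_short.steps]
print(step_names)  # ['standardscaler', 'perceptron']
```

These generated names are the ones to use when looking up steps or addressing their parameters.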



## **3. Accessing Pipeline Steps and Parameters**

Once a pipeline is created, it's important to know how to inspect and interact with its components. Scikit-learn provides two main ways to do this:

> **Accessing individual steps**


You can use the ```.named_steps``` attribute to access the components of the pipeline as a dictionary.

Example:


```
scaler = pipe.named_steps['scale']
classifier = pipe.named_steps['perceptron']
```



This is useful if you want to:

* Check the fitted scaler parameters

* Access the classifier's attributes like ```.coef_```, ```.intercept_```, etc.




```
# These fitted attributes exist only after pipe.fit(X, y) has been called
print(scaler.mean_)         # Mean of each feature
print(classifier.coef_)     # Coefficients learned by the Perceptron
```



> **Accessing all parameters**

You can get all parameters of all steps in the pipeline using the ```.get_params()``` method.

It returns a dictionary that maps each parameter name to its current value. Parameters that belong to a step are named in the format

```
<step_name>__<parameter_name>
```

This is especially useful for:

* Modifying parameters after creation with ```.set_params()```

* Passing the pipeline to ```GridSearchCV``` for hyperparameter tuning
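Both uses can be sketched as follows. The Iris dataset, the grid values, and `random_state=0` are assumptions chosen for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("perceptron", Perceptron(max_iter=1000, random_state=0)),
])

# Modify a step parameter after creation using <step_name>__<parameter_name>.
pipe.set_params(perceptron__max_iter=500)

# Hypothetical search grid; the keys use the same naming format.
param_grid = {
    "scale__with_mean": [True, False],
    "perceptron__alpha": [1e-4, 1e-3, 1e-2],
}

# The whole pipeline is refit on every fold, so scaling is learned
# from the training portion of each split only.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

Because the search treats the pipeline as one estimator, preprocessing and model parameters can be tuned together in a single grid.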



## **4. Example**

```
# Import libraries
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.metrics import balanced_accuracy_score
```

```
# Load and split the data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data,
                                                    data.target,
                                                    test_size=0.3,
                                                    random_state=1)
```

```
# Construct the pipeline: scaling, dimensionality reduction, then a linear SVM
model_svm = Pipeline([('scale', StandardScaler()),
                      ('pca', PCA(n_components=2)),
                      ('svm', LinearSVC(loss='hinge', max_iter=1000, tol=1e-3))])
```

```
# Fit the model
model_svm.fit(X_train, y_train)
```

```
# Inspect fitted attributes of individual steps
print("PCA explained variance ratio:", model_svm.named_steps['pca'].explained_variance_ratio_)
print("SVM coefficients:", model_svm.named_steps['svm'].coef_)
```

```
PCA explained variance ratio: [0.7215793  0.23455023]
SVM coefficients: [[-1.25743722  0.28778348]
 [ 0.19186945 -1.0235599 ]
 [ 1.94758891 -0.10479047]]
```


```
# Evaluate the model on the test set
y_pred = model_svm.predict(X_test)
balanced_acc = balanced_accuracy_score(y_test, y_pred)
print(f"Balanced Accuracy Score: {balanced_acc:.4f}")
```

```
Balanced Accuracy Score: 0.8262
```


```
# Access the steps by name
model_svm.named_steps
```

```
{'scale': StandardScaler(),
 'pca': PCA(n_components=2),
 'svm': LinearSVC(loss='hinge')}
```

```
# Access a single step
model_svm.named_steps["pca"]
```

```
# Access the steps as a list of (name, estimator) tuples
model_svm.steps
```

```
[('scale', StandardScaler()),
 ('pca', PCA(n_components=2)),
 ('svm', LinearSVC(loss='hinge'))]
```

```
# Access all parameters of pipeline
model_svm.get_params()
```

```
{'memory': None,
 'steps': [('scale', StandardScaler()),
  ('pca', PCA(n_components=2)),
  ('svm', LinearSVC(loss='hinge'))],
 'transform_input': None,
 'verbose': False,
 'scale': StandardScaler(),
 'pca': PCA(n_components=2),
 'svm': LinearSVC(loss='hinge'),
 'scale__copy': True,
 'scale__with_mean': True,
 'scale__with_std': True,
 'pca__copy': True,
 'pca__iterated_power': 'auto',
 'pca__n_components': 2,
 'pca__n_oversamples': 10,
 'pca__power_iteration_normalizer': 'auto',
 'pca__random_state': None,
 'pca__svd_solver': 'auto',
 'pca__tol': 0.0,
 'pca__whiten': False,
 'svm__C': 1.0,
 'svm__class_weight': None,
 'svm__dual': 'auto',
 'svm__fit_intercept': True,
 'svm__intercept_scaling': 1,
 'svm__loss': 'hinge',
 'svm__max_iter': 1000,
 'svm__multi_class': 'ovr',
 'svm__penalty': 'l2',
 'svm__random_state': None,
 'svm__tol': 0.0001,
 'svm__verbose': 0}
```