### Pipeline
`sklearn.pipeline`

##### Why use pipelines?
1. <b>Simplifies Workflow</b> - Chains preprocessing and modeling steps into a single estimator that is trained with one `fit` call and used with one `predict` call.
2. <b>Prevents Data Leakage</b> - Ensures transformations are fitted only on the training data (including each training fold during cross-validation) and then applied unchanged to the test data.
3. <b>Consistent Data Processing</b> - Applies exactly the same fitted preprocessing at training and prediction time, avoiding mismatches between the two.
4. <b>Hyperparameter Tuning</b> - Integrates with `GridSearchCV` or `RandomizedSearchCV`, which can tune the parameters of any step using the `<step_name>__<parameter>` naming syntax.
5. <b>Improved Code Readability</b> - Reduces manual transformations and separate function calls.
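To illustrate points 2 and 4, here is a minimal sketch that tunes a pipeline with `GridSearchCV`. It assumes synthetic data and swaps in `Ridge` (a regularized linear model) for `LinearRegression`, because `Ridge` has an `alpha` parameter to tune; the data, grid values, and step names are illustrative choices, not part of the original example.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('model', Ridge()),  # Ridge swapped in because it has an `alpha` to tune
])

# Parameters of a step are addressed as <step_name>__<parameter>
param_grid = {'model__alpha': [0.01, 0.1, 1.0, 10.0]}

# In every CV split the scaler is fitted on that split's training folds only,
# so the validation fold never influences the scaling statistics
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print("Best alpha:", search.best_params_['model__alpha'])
print("Best CV R^2:", search.best_score_)
```

Because the whole pipeline is passed to `GridSearchCV`, the scaler is refitted inside every cross-validation split, so no statistics from a validation fold leak into training.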

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Sample dataset (tiny toy data where y = x1 + x2)
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y = np.array([3, 5, 7, 9, 11])

# Split data into train/test (20% of 5 samples -> 1 test sample)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create pipeline: a list of (name, estimator) steps run in order
pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Step 1: standardize features
    ('model', LinearRegression())  # Step 2: fit linear regression on scaled features
])

# Train pipeline: fits the scaler on X_train, transforms it, then fits the model
pipeline.fit(X_train, y_train)

# Predict: transforms X_test with the already-fitted scaler, then predicts
y_pred = pipeline.predict(X_test)

print("Predictions:", y_pred)
```
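scikit-learn also provides `make_pipeline`, a shorthand that builds the same kind of pipeline but names each step automatically after its lowercased class name. A minimal sketch of the same two-step pipeline, repeating the toy data so the block runs on its own (fitting on all five samples here is just for illustration):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y = np.array([3, 5, 7, 9, 11])

# Step names are derived automatically: 'standardscaler', 'linearregression'
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X, y)

print("Step names:", list(pipe.named_steps))
# Fitted steps can be inspected by name, e.g. the learned per-feature means
print("Scaler means:", pipe.named_steps['standardscaler'].mean_)
print("Predictions:", pipe.predict(X))
```

The generated names matter when tuning: with this pipeline the grid key would be `linearregression__<parameter>` rather than `model__<parameter>`.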


##### How it works
1. <b>Scaling (`StandardScaler`)</b>: Standardizes each feature to mean = 0 and variance = 1, using the mean and standard deviation learned from the training data.
2. <b>Linear Regression</b>: Trains the model on the scaled data.
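3. <b>Pipeline handles everything</b>: You don't need to manually transform `X_train` and `X_test`; `pipeline.fit` fits and applies the scaler before training the model, and `pipeline.predict` applies the already-fitted scaler before predicting.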
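These steps can be checked with a short sketch that rebuilds the same toy example by hand and compares it with the pipeline. It repeats the original data so it runs on its own; the `_pipe`/`_manual` variable names are just illustrative.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y = np.array([3, 5, 7, 9, 11])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Pipeline version
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LinearRegression()),
])
pipeline.fit(X_train, y_train)
y_pred_pipe = pipeline.predict(X_test)

# Manual version: what the pipeline does under the hood
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training mean/std
model = LinearRegression().fit(X_train_scaled, y_train)
y_pred_manual = model.predict(X_test_scaled)

print("Pipeline:", y_pred_pipe)
print("Manual:  ", y_pred_manual)
print("Same predictions:", np.allclose(y_pred_pipe, y_pred_manual))

# Scaled training features have mean ~0 and standard deviation ~1 per column
print("Train means:", X_train_scaled.mean(axis=0))
print("Train stds: ", X_train_scaled.std(axis=0))
```

The fitted scaler inside the pipeline is also available as `pipeline.named_steps['scaler']`, so its learned `mean_` can be compared with the manual scaler's.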