## 🔧 **What is a Pipeline in Machine Learning?**

A **Pipeline** (in scikit-learn, `sklearn.pipeline.Pipeline`) is a tool that chains multiple steps of a machine-learning workflow into **a single object**.

In other words:

> **You put all the steps together so they run automatically in the correct order.**

Example steps:
Data Cleaning → Scaling → Model Training → Prediction

With a pipeline, all of these steps run from a single call, so you do not repeat the preprocessing code for training and prediction.

---

## 👉 Simple Example

Without a pipeline, you write each step separately:

* Scale the data
* Train the model

With a `Pipeline`:

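```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Each step is a (name, estimator) pair; steps run in the order listed
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression())
])
```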

Now, calling `pipeline.fit(...)` on your training data will **scale the data and then train the model** in a single step.
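To make the fit-and-predict flow concrete, here is a minimal sketch of the same two-step pipeline in use. The data is an assumption: it is synthetic, generated with `make_classification` purely for illustration, and the sample and feature counts are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used here only for illustration (not part of the original example)
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression())
])

# fit() fits the scaler on X_train, transforms X_train, then trains the model
pipeline.fit(X_train, y_train)

# predict() applies the already-fitted scaler to X_test before predicting
predictions = pipeline.predict(X_test)
accuracy = pipeline.score(X_test, y_test)

print(predictions[:5])
print(accuracy)
```

Note that the scaler is never fitted on `X_test`; the pipeline only transforms it with the statistics learned from `X_train`.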

---

## 🎯 Why do we use Pipelines?

✔ Keeps code clean
✔ Helps prevent data leakage, because preprocessing is fitted only on the training data
✔ Works well with cross-validation, since the whole pipeline is refitted on each fold
✔ Makes your workflow easy to repeat
✔ Ensures every step runs in the correct order
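The cross-validation and data-leakage points can be sketched as follows. This is an illustrative example on synthetic data (an assumption, not the dataset used in this notebook): passing the whole pipeline to `cross_val_score` means the scaler is refitted on the training portion of each fold, so the held-out fold never influences the scaling.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used here only for illustration
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('model', LogisticRegression())
])

# For each of the 5 folds, the pipeline is cloned and fitted from scratch:
# the scaler sees only that fold's training rows, then the model is trained,
# and the held-out rows are scaled with those training statistics and scored.
scores = cross_val_score(pipeline, X, y, cv=5)

print(scores)
print(scores.mean())
```

Scaling the full dataset before calling `cross_val_score` would instead let statistics from the held-out rows leak into training, which is the mistake the pipeline avoids.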



In [None]:
# import necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.compose import ColumnTransformer

# Load the dataset
data = sns.load_dataset('titanic')

# Split the dataset into features and target variable
# (select the feature columns used by the preprocessing steps below)
X = data[['pclass', 'sex', 'age', 'fare', 'embarked']]
y = data['survived']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the columns for the column transformer, which imputes missing values and encodes categorical variables
numeric_features = ['age', 'fare']
categorical_features = ['pclass', 'sex', 'embarked']

# Define the preprocessing for numeric features: fill missing values with the column mean
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean'))
])

