### In multiple regression (and in supervised learning generally), a pipeline is a sequence of data-transformation steps followed by a model, chained together so that they always run in the same order. It is a core concept in modern machine learning workflows, especially when data must pass through several operations, such as preprocessing and model fitting, in an organized and reproducible way.
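To make the "sequential process" concrete, here is a minimal sketch of roughly what `Pipeline.fit` followed by `Pipeline.predict` does for a scaler plus a regressor. The helper `manual_fit_predict` and the synthetic data are hypothetical and exist only for illustration. This is a simplification (the real `Pipeline` also handles caching, `'passthrough'` steps, and parameter routing), not scikit-learn's actual implementation.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def manual_fit_predict(steps, X_train, y_train, X_test):
    """Hypothetical helper: mimic Pipeline.fit followed by Pipeline.predict."""
    *transformers, model = [clone(est) for est in steps]
    Xt = X_train
    for t in transformers:
        # Each transformer learns its parameters from the training data only
        Xt = t.fit_transform(Xt, y_train)
    model.fit(Xt, y_train)
    # At prediction time the fitted transformers are only applied, never refit
    Xs = X_test
    for t in transformers:
        Xs = t.transform(Xs)
    return model.predict(Xs)


# Synthetic data (assumption): an exactly linear target with 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 3.0
X_train, X_test, y_train, y_test = X[:15], X[15:], y[:15], y[15:]

pipe = Pipeline([("scaler", StandardScaler()), ("regressor", LinearRegression())])
pipe.fit(X_train, y_train)

manual = manual_fit_predict([StandardScaler(), LinearRegression()], X_train, y_train, X_test)
print(np.allclose(pipe.predict(X_test), manual))  # True
```

The point of the comparison is that the pipeline does nothing magical: it calls `fit_transform` on each intermediate step during training and `transform` during prediction, which is exactly the bookkeeping that is easy to get wrong by hand.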

#### Why Use a Pipeline? 
#### Streamlines Workflows: Automates and connects steps like data transformation, scaling, encoding, and modeling.
#### Prevents Data Leakage: Preprocessing steps (e.g., scaling or encoding) learn their parameters, such as means, standard deviations, or category lists, from the training data only; the same fitted transformations are then applied unchanged to the test data.
#### Improves Code Maintenance: Keeps the code modular, clean, and easy to understand.
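The leakage point can be checked directly. The sketch below uses synthetic data (an assumption, not part of this notebook), passes the whole pipeline to `cross_validate` with `return_estimator=True`, and confirms that the scaler fitted in each fold learned its mean from that fold's training rows only. Scaling the full dataset first and cross-validating afterwards would instead let every validation fold influence the scaling parameters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 3 numeric features, noisy linear target
rng = np.random.default_rng(42)
X = rng.normal(loc=50, scale=10, size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=5, size=100)

# An integer random_state makes the shuffled folds identical on every split() call
cv = KFold(n_splits=5, shuffle=True, random_state=0)
pipe = make_pipeline(StandardScaler(), LinearRegression())

# cross_validate clones and re-fits the whole pipeline on each training fold
results = cross_validate(pipe, X, y, cv=cv, scoring="r2", return_estimator=True)

for (train_idx, _), est in zip(cv.split(X), results["estimator"]):
    fold_mean = est.named_steps["standardscaler"].mean_
    # The scaler's learned mean comes from that fold's training rows only
    assert np.allclose(fold_mean, X[train_idx].mean(axis=0))

print("R^2 per fold:", results["test_score"].round(3))
print("Mean over all rows (what a leaky scaler would use):", X.mean(axis=0).round(2))
```

For plain least squares with an intercept, standardizing the features does not change the predictions, so the leak would barely move the scores in this particular setup. It matters more for steps whose learned parameters change the model, such as imputation, feature selection, or regularized regression.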

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import pandas as pd

# Sample dataset
data = pd.DataFrame({
    'age': [25, 30, 35, 40, 28],
    'salary': [50000, 60000, 70000, 80000, 52000],
    'city': ['A', 'B', 'A', 'C', 'B'],
    'target': [200, 250, 300, 350, 220]
})

# Features and target
X = data[['age', 'salary', 'city']]
y = data['target']

# Train-test split (with only 5 rows, test_size=0.2 leaves a single test row)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Preprocessing steps
numerical_features = ['age', 'salary']
categorical_features = ['city']

preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_features),  # Scale numerical features
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)  # Encode categorical features; cities unseen in training become all zeros
    ])

# Pipeline definition
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),  # Preprocessing step
    ('regressor', LinearRegression())  # Regression model
])

# Fit the pipeline
pipeline.fit(X_train, y_train)

# Predict
y_pred = pipeline.predict(X_test)

# Print predictions
print("Predictions:", y_pred)


Predictions: [249.37840785]
