# Standard Scaler in Machine Learning

This notebook demonstrates how to use `StandardScaler` from `sklearn.preprocessing` to standardize numerical features in a dataset. Standardization is a common preprocessing step that rescales each feature to have a mean of 0 and a standard deviation of 1.

We'll create a sample DataFrame and observe how scaling transforms the data.


The cells below apply `StandardScaler` to the `Age` and `Salary` columns of a small dataset, after first showing `MinMaxScaler` on the same columns for comparison. Standard scaling matters for models that rely on distance calculations, such as k-nearest neighbors, because a feature with a large raw range (here `Salary`) would otherwise dominate the result. The output shows the scaled features alongside the original names for reference.
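To make the transformation concrete, the sketch below computes the z-score `z = (x - mean) / std` by hand for the `Age` values used in this notebook and compares the result with `StandardScaler`. It assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses; the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

ages = np.array([25, 30, 35, 40, 45], dtype=float)

# Manual z-score using the population standard deviation (ddof=0)
manual = (ages - ages.mean()) / ages.std(ddof=0)

# StandardScaler expects a 2-D array of shape (n_samples, n_features)
sklearn_scaled = StandardScaler().fit_transform(ages.reshape(-1, 1)).ravel()

print(manual)
print(np.allclose(manual, sklearn_scaled))
```

The mean of `ages` is 35 and its population standard deviation is about 7.07, so the first value becomes roughly -1.414, matching the output shown further down.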

In [1]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
import numpy as np
# Import the libraries for this notebook (the cells below only use pandas, MinMaxScaler, and StandardScaler)

In [3]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000]
}
# A small sample dataset

In [4]:
df = pd.DataFrame(data)

In [5]:
print('Original Data:')
print(df)
# Display the original dataset

Original Data:
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000
3    David   40   80000
4      Eva   45   90000


In [6]:
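min_max_scaler = MinMaxScaler()
# Create a Min-Max scaler, which rescales each feature to the range [0, 1]

In [7]:
min_max_df = df.copy()
min_max_df[['Age', 'Salary']] = min_max_scaler.fit_transform(df[['Age', 'Salary']])
# Scale a copy so that df keeps the raw values for the StandardScaler cells below

In [8]:
print("\nAfter Min-Max Normalization:")
print(min_max_df)
# Result of Min-Max scaling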


After Min-Max Normalization:
      Name   Age  Salary
0    Alice  0.00    0.00
1      Bob  0.25    0.25
2  Charlie  0.50    0.50
3    David  0.75    0.75
4      Eva  1.00    1.00
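The values above follow from the Min-Max formula `(x - min) / (max - min)`. As a rough check, the sketch below applies that formula directly with pandas and compares it with `MinMaxScaler`; it rebuilds the two numeric columns so that it runs on its own, and the names `raw` and `manual` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

raw = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
})

# Column-wise Min-Max scaling written out by hand
manual = (raw - raw.min()) / (raw.max() - raw.min())

# The same operation through scikit-learn (returns a NumPy array)
sklearn_scaled = MinMaxScaler().fit_transform(raw)

print(manual)
print(np.allclose(manual.to_numpy(), sklearn_scaled))
```

Because both columns are evenly spaced, they map to the same values (0, 0.25, 0.5, 0.75, 1) even though their raw ranges differ.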


In [14]:
scaler = StandardScaler()
# Create a StandardScaler instance

In [15]:
scaled_values = scaler.fit_transform(df[['Age', 'Salary']])
# Fit the scaler and transform the numerical features

In [16]:
scaled_df = pd.DataFrame(scaled_values, columns=['Age_scaled', 'Salary_scaled'])
# Build a new DataFrame from the scaled values

In [17]:
result = pd.concat([df['Name'], scaled_df], axis=1)

In [18]:
print("\nStandard Scaled Data:\n", result)
# Display the standardized data


Standard Scaled Data:
       Name  Age_scaled  Salary_scaled
0    Alice   -1.414214      -1.414214
1      Bob   -0.707107      -0.707107
2  Charlie    0.000000       0.000000
3    David    0.707107       0.707107
4      Eva    1.414214       1.414214
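One way to confirm that the scaled columns have mean 0 and standard deviation 1 is to compute both directly. The sketch below does so on a standalone copy of the data. Note one assumption that is easy to miss: `StandardScaler` uses the population standard deviation (`ddof=0`), while pandas' `.std()` defaults to the sample version (`ddof=1`), so the two disagree on small datasets.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

raw = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
})

scaled = pd.DataFrame(
    StandardScaler().fit_transform(raw),
    columns=['Age_scaled', 'Salary_scaled'],
)

means = scaled.mean()            # approximately 0 for each column
pop_std = scaled.std(ddof=0)     # 1 for each column (population std)
sample_std = scaled.std()        # pandas default ddof=1, about 1.118 for five rows

print(means)
print(pop_std)
print(sample_std)
```

This also explains why the extreme values are ±1.414214 rather than ±1.264911, which is what a sample standard deviation would give.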


## Conclusion

Standardization brings all features to a comparable scale, which can improve the performance of many machine learning models, particularly those based on distances or trained with gradient descent. In this notebook, we transformed raw numerical data into a standardized format using `StandardScaler`.
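In practice, a fitted scaler is usually reused on data it has not seen, so that new rows are scaled with the same mean and standard deviation. The following is a minimal sketch of that pattern, not part of the original notebook: the extra record (`Age` 50, `Salary` 100000) is hypothetical, and `inverse_transform` is shown only to illustrate that the scaling is reversible.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

train = pd.DataFrame({
    'Age': [25, 30, 35, 40, 45],
    'Salary': [50000, 60000, 70000, 80000, 90000],
})

# Learn the mean and standard deviation from the original rows only
scaler = StandardScaler().fit(train)

# Hypothetical new record, not part of the original dataset
new_rows = pd.DataFrame({'Age': [50], 'Salary': [100000]})

# transform() reuses the statistics learned in fit(); it does not refit
new_scaled = scaler.transform(new_rows)

# inverse_transform() maps scaled values back to the original units
restored = scaler.inverse_transform(new_scaled)

print(new_scaled)
print(restored)
```

Both scaled values come out at about 2.12, since each new value lies three steps above the column mean, just beyond the largest row in the original data.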
