---
title: "Standardize Features with Sklearn's StandardScaler"
description: "Sklearn's StandardScaler standardizes features by removing the mean and scaling to unit variance, giving each feature a mean of 0 and a standard deviation of 1 (the shape of the distribution is unchanged). Centering and scaling happen independently on each feature. Standardized features are a common requirement for many machine learning estimators."
tags: Scikit learn, Sklearn, Data Cleaning / Preprocessing
URL: https://www.datacamp.com/community/blog/scikit-learn-cheat-sheet https://github.com/kailashahirwar/cheatsheets-ai/blob/master/Scikit%20Learn.png
Licence: 
Creator: 
Meta: "fit_transform"

---

 <div>
    	<img src="./coco.png" style="float: left;height: 55px">
    	<div style="height: 150px;text-align: center; padding-top:5px">
        <h1>
      	Standardize Features with Sklearn's StandardScaler
        </h1>
        <p>Sklearn's StandardScaler standardizes features by removing the mean and scaling to unit variance, giving each feature a mean of 0 and a standard deviation of 1 (the shape of the distribution is unchanged). Centering and scaling happen independently on each feature. Standardized features are a common requirement for many machine learning estimators.</p>
    	</div>
		</div> 

 <div style="height:40px">
		<div style="width:100%; text-align:center; border-bottom: 1px solid #000; line-height:0.1em; margin:40px 0 20px;">
    	<span style="background:#fff; padding:0 10px; font-size:25px; font-family: 'Open Sans', sans-serif;">
        Key Code
    	</span>
		</div>
		</div>
			

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
# for a single NumPy array named data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

In [None]:
# for a train/test split: fit on the training data only, then apply the same scaling to both sets
scaler = StandardScaler().fit(X_train)
standardized_X = scaler.transform(X_train)
standardized_X_test = scaler.transform(X_test)
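
The two patterns above are related: `fit_transform` gives the same result as calling `fit` and then `transform` on the same array. A minimal sketch of that equivalence follows; the small `X_train` and `X_test` arrays are hypothetical values chosen only for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy arrays standing in for X_train / X_test
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
X_test = np.array([[2.0, 30.0]])

# One step: learn the column statistics and transform the same array
one_step = StandardScaler().fit_transform(X_train)

# Two steps: learn the column statistics, then transform
scaler = StandardScaler().fit(X_train)
two_step = scaler.transform(X_train)

same_result = np.allclose(one_step, two_step)
print(same_result)

# The fitted scaler reuses the training statistics on new data
standardized_X_test = scaler.transform(X_test)
print(standardized_X_test)
```

Keeping the fitted `scaler` object around is what makes the second pattern useful: the test rows are scaled with the training means and standard deviations rather than their own.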

 <div style="height:40px">
		<div style="width:100%; text-align:center; border-bottom: 1px solid #000; line-height:0.1em; margin:40px 0 20px;">
    	<span style="background:#fff; padding:0 10px; font-size:25px; font-family: 'Open Sans', sans-serif;">
        Example
    	</span>
		</div>
		</div>
			

## Preliminaries

In [28]:
import numpy as np
from sklearn.preprocessing import StandardScaler

## Example data

In [34]:
data = np.array([[0,0], [1,1], [1,0], [0,1]])

## Standardize it

In [35]:
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)
scaled_data

array([[-1., -1.],
       [ 1.,  1.],
       [ 1., -1.],
       [-1.,  1.]])

In [37]:
scaled_data.mean(axis = 0)

array([0., 0.])

In [38]:
scaled_data.std(axis = 0)

array([1., 1.])
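
To make the arithmetic behind this output explicit, the sketch below repeats the standardization by hand with NumPy, computing `(x - mean) / std` per column and comparing it with `StandardScaler`. It assumes the population standard deviation (`ddof=0`, NumPy's default), which is what `StandardScaler` uses; `mean_` and `scale_` are the attributes where the fitted scaler stores the per-feature statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])

# Per-column mean and population standard deviation (ddof=0)
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)

# Standardize by hand: subtract the mean, divide by the standard deviation
manual = (data - col_mean) / col_std

# Same operation through scikit-learn
scaler = StandardScaler().fit(data)
sklearn_scaled = scaler.transform(data)

matches = np.allclose(manual, sklearn_scaled)
print(matches)

# The fitted scaler exposes the statistics it learned
print(scaler.mean_)   # per-feature means
print(scaler.scale_)  # per-feature standard deviations
```

Because every column is handled separately, a feature measured in thousands and a feature measured in fractions both end up on the same scale.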

 <div style="height:40px">
		<div style="width:100%; text-align:center; border-bottom: 1px solid #000; line-height:0.1em; margin:40px 0 20px;">
    	<span style="background:#fff; padding:0 10px; font-size:25px; font-family: 'Open Sans', sans-serif;">
        Example
    	</span>
		</div>
		</div>
			

## Preliminaries

In [32]:
import pandas as pd
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

## Example data

This is the Boston house prices dataset. See the Learn More section for a full description of this example dataset.

In [39]:
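# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs an older scikit-learn release
boston = datasets.load_boston()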
columns = boston.feature_names
bos_df = pd.DataFrame(boston.data, columns = columns)
bos_df['PRICE'] = boston.target
bos_df.head()

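|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | PRICE |
|---|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|---|-------|-------|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |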


## Split into test and train batches

In [40]:
X = bos_df.drop('PRICE', axis = 1)
y = bos_df['PRICE']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size = 0.20, random_state = 13)

## Using StandardScaler

In [51]:
scaler = StandardScaler().fit(X_train)
standardized_X = scaler.transform(X_train)
standardized_X_test = scaler.transform(X_test)
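
The cell above produces no output, so here is a hedged sketch of how the result could be checked. It uses synthetic data as a stand-in (an assumption, not the Boston data) so that it runs on any scikit-learn version. The training set comes out with mean 0 and standard deviation 1 per feature, while the test set is only approximately standardized because it is scaled with the training statistics. `inverse_transform` maps scaled values back to the original units.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix (not the Boston data):
# three features with very different locations and spreads
rng = np.random.default_rng(13)
X = rng.normal(loc=[5.0, 100.0, -2.0], scale=[1.0, 25.0, 0.5], size=(200, 3))

X_train, X_test = train_test_split(X, test_size=0.20, random_state=13)

# Fit on the training data only, then scale both sets with those statistics
scaler = StandardScaler().fit(X_train)
standardized_X = scaler.transform(X_train)
standardized_X_test = scaler.transform(X_test)

# Training features: mean 0 and standard deviation 1 (up to rounding)
train_mean = standardized_X.mean(axis=0)
train_std = standardized_X.std(axis=0)
print(train_mean, train_std)

# Test features: close to 0, but generally not exactly 0
test_mean = standardized_X_test.mean(axis=0)
print(test_mean)

# Undo the scaling to get back the original units
restored = scaler.inverse_transform(standardized_X_test)
print(np.allclose(restored, X_test))
```

Fitting on `X_train` alone keeps information about the test set out of the preprocessing step, which is why the test means are allowed to drift slightly from zero.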

 <div style="height:40px">
		<div style="width:100%; text-align:center; border-bottom: 1px solid #000; line-height:0.1em; margin:40px 0 20px;">
    	<span style="background:#fff; padding:0 10px; font-size:25px; font-family: 'Open Sans', sans-serif;">
        Learn More
    	</span>
		</div>
		</div>
			

In [52]:
print(boston.DESCR)

.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pu