## Scikit-Learn Preprocessing

In this notebook, we'll use scikit-learn to impute missing values and scale numeric columns. For other ways to prepare data for machine learning or feature extraction, see the [sklearn.preprocessing documentation](http://scikit-learn.org/stable/modules/preprocessing.html), which walks through worked examples.

In [None]:
from sklearn import preprocessing
import pandas as pd
from datetime import datetime

In [None]:
hvac = pd.read_csv('../data/HVAC_with_nulls.csv')

## Checking Data Quality

In [None]:
hvac.dtypes

In [None]:
hvac.shape

In [None]:
hvac.head()
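
Since the file name suggests the data contains nulls, it can also help to count them per column before imputing. The following is a minimal sketch on a made-up frame; `toy` and its values are assumptions for illustration, not the real HVAC data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the HVAC data
toy = pd.DataFrame({
    'TargetTemp': [66.0, np.nan, 70.0, 68.0],
    'SystemAge': [20.0, 3.0, np.nan, np.nan],
})

# Number of missing values in each column
null_counts = toy.isna().sum()
print(null_counts)
```

Running `hvac.isna().sum()` on the real frame would give the same kind of per-column summary.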

## Impute missing values with mean

In [None]:
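# preprocessing.Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it
from sklearn.impute import SimpleImputer
imp = SimpleImputer(strategy='mean')  # missing_values defaults to np.nan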

In [None]:
hvac_numeric = hvac[['TargetTemp', 'SystemAge']]

In [None]:
imp = imp.fit(hvac_numeric)  # learn each column's mean from all rows

In [None]:
transformed = imp.transform(hvac_numeric)  # replace NaNs with the learned means

In [None]:
transformed

In [None]:
hvac['TargetTemp'], hvac['SystemAge'] = transformed[:,0], transformed[:,1]

In [None]:
hvac.head()
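
To see what mean imputation does in isolation, here is a small sketch on made-up numbers. It assumes a scikit-learn version that provides `sklearn.impute.SimpleImputer` (0.20 or later); the `toy` frame and the `toy_imp` name are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data with one missing value per column
toy = pd.DataFrame({
    'TargetTemp': [66.0, np.nan, 70.0],
    'SystemAge': [20.0, 4.0, np.nan],
})

toy_imp = SimpleImputer(strategy='mean')
filled = toy_imp.fit_transform(toy)  # returns a NumPy array, not a DataFrame

# statistics_ holds the per-column means computed from the non-missing values
print(toy_imp.statistics_)
print(filled)
```

Each missing entry is replaced by the mean of the observed values in its own column, which is why the result is assigned back column by column.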

## Scale temperature values

In [None]:
hvac['ScaledTemp'] = preprocessing.scale(hvac['ActualTemp'])

In [None]:
hvac['ScaledTemp'].head()
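
`preprocessing.scale` standardizes values to zero mean and unit variance. As a rough sketch on made-up temperatures (the `temps` array is an assumption), the result should match subtracting the mean and dividing by the population standard deviation:

```python
import numpy as np
from sklearn import preprocessing

temps = np.array([60.0, 65.0, 70.0, 75.0])  # made-up values

scaled = preprocessing.scale(temps)

# Same computation by hand; np.std uses ddof=0 (population std) by default
manual = (temps - temps.mean()) / temps.std()

print(np.allclose(scaled, manual))
print(scaled.mean(), scaled.std())
```

Note that pandas' `Series.std()` defaults to `ddof=1`, so a hand-rolled version written with pandas would differ slightly unless `ddof=0` is passed.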

## Scale using a min-max scaler

In [None]:
min_max_scaler = preprocessing.MinMaxScaler()

In [None]:
temp_minmax = min_max_scaler.fit_transform(hvac[['ActualTemp']])

In [None]:
temp_minmax
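
With its default settings, `MinMaxScaler` maps each column onto the range [0, 1] using `(x - min) / (max - min)`. The sketch below checks that on made-up values; the `temps` array is an assumption for illustration.

```python
import numpy as np
from sklearn import preprocessing

# Made-up values; MinMaxScaler expects 2-D input (rows x columns)
temps = np.array([[60.0], [65.0], [80.0]])

scaled = preprocessing.MinMaxScaler().fit_transform(temps)

# Same computation by hand, column by column
col_min = temps.min(axis=0)
col_max = temps.max(axis=0)
manual = (temps - col_min) / (col_max - col_min)

print(np.allclose(scaled, manual))
print(scaled.shape)
```

The output keeps the 2-D shape of the input, with one row per sample and one column per feature.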

### Exercise: add `temp_minmax` back to the DataFrame as a new column

In [None]:
# %load ../solutions/preprocessing.py

