# Feature Normalisation

We normalise the features selected for clustering with scikit-learn's `MaxAbsScaler`.

As explained in [this article](https://engineering.teknasyon.com/how-to-normalize-your-unsupervised-data-for-clustering-methods-9389298d20d5), `MaxAbsScaler` divides each feature by its maximum absolute value without centring it, so the clustering algorithm can still detect outliers after scaling. It is also a better choice than `StandardScaler` here because most of our features are of different types (binary flags, counts, ratios and prices).
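
To make the scaling concrete, here is a minimal sketch on a small made-up array (not our dataset) that contrasts `MaxAbsScaler` with `StandardScaler`. The values and variable names are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler, StandardScaler

# Toy data (made up for illustration): a binary flag and a heavy-tailed count.
X = np.array([
    [1.0, 10.0],
    [0.0, 20.0],
    [1.0, 30.0],
    [0.0, 1000.0],  # outlier in the second column
])

# MaxAbsScaler divides each column by its largest absolute value.
max_abs = MaxAbsScaler().fit(X)
X_max_abs = max_abs.transform(X)

# StandardScaler subtracts the column mean and divides by the standard deviation.
X_standard = StandardScaler().fit_transform(X)

print(max_abs.max_abs_)  # per-column divisors learned during fit
print(X_max_abs)
print(X_standard)
```

With `MaxAbsScaler` the binary column is left as 0s and 1s and the outlier maps to 1.0 while the remaining counts are compressed towards zero. `StandardScaler` instead recentres both columns, so the binary flag no longer reads as 0/1.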

**Setting up**

In [4]:
%load_ext kedro.ipython
%load_ext autoreload
%matplotlib inline
%autoreload 2

In [5]:
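import logging

import numpy as np
import pandas as pd
import polars as pl
import seaborn as sb  # imported explicitly; sb.set() below relies on it
from sklearn.preprocessing import MaxAbsScaler

# Project helpers (wildcard import).
from usg.utils import *

log = logging.getLogger(__name__)
log.setLevel(logging.INFO)
sb.set()  # apply seaborn's default plot theme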
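
The normalisation step that follows fits the scaler on a DataFrame and wraps the result back into a DataFrame. A minimal standalone sketch of that pattern, without the Kedro catalog, might look like the block below. The `features` table and its values are hypothetical stand-ins, not rows from our data.

```python
import pandas as pd
from sklearn.preprocessing import MaxAbsScaler

# Hypothetical stand-in for the joined feature table, indexed by appid.
features = pd.DataFrame(
    {"price": [7.19, 3.99, 0.0], "positive_ratings": [124534, 3318, 2]},
    index=pd.Index([10, 20, 30], name="appid"),
)

scaler = MaxAbsScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(features),  # returns a plain NumPy array
    columns=features.columns,        # so the labels are reattached here
    index=features.index,
)

# Sanity check: the largest absolute value in every column should be 1.
print(scaled.abs().max())

# The fitted scaler can map scaled values back to the original units.
restored = pd.DataFrame(
    scaler.inverse_transform(scaled),
    columns=features.columns,
    index=features.index,
)
print(restored)
```

Keeping the fitted scaler around is what makes the round trip possible, which is one reason to persist it alongside the scaled training set.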

In [6]:
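# `catalog` is injected by the kedro.ipython extension.
df1 = catalog.load('features_eng_1').set_index('appid')
df2 = catalog.load('features_eng_2').set_index('appid')
# Keep only the features selected for clustering (`columns` is assumed to come from usg.utils).
df = df1.join(df2, how='left')[columns]

# fit_transform returns a NumPy array, so rebuild the DataFrame with the original labels.
scaler = MaxAbsScaler()
scaled = scaler.fit_transform(df)
df = pd.DataFrame(data=scaled, columns=df.columns, index=df.index).reset_index(names="appid")

# Persist the scaled training set and the fitted scaler for reuse.
catalog.save('train', df)
catalog.save('model@scaler', scaler)
df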

,appid,english,windows,mac,linux,Single-player,Multi-player,Indie,Action,Casual,...,price,est_owners,num_categories,num_genres,num_steamspy_tags,positive_ratings,negative_ratings,ratings_ratio,average_playtime,median_playtime
0,10,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,...,0.017038,0.1000,0.222222,0.0625,1.0,4.709341e-02,0.006855,0.143174,0.092391,0.001663
1,20,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,...,0.009455,0.0500,0.222222,0.0625,1.0,1.254725e-03,0.001300,0.020122,0.001453,0.000325
2,30,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,...,0.009455,0.0500,0.111111,0.0625,1.0,1.291784e-03,0.000817,0.032948,0.000981,0.000178
3,40,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,...,0.009455,0.0500,0.222222,0.0625,1.0,4.813939e-04,0.000548,0.018302,0.001353,0.000965
4,50,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,0.0,...,0.009455,0.0500,0.166667,0.0625,1.0,1.985324e-03,0.000591,0.069978,0.003273,0.002177
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27070,1065230,1.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,...,0.004953,0.0001,0.111111,0.1875,1.0,1.134471e-06,0.000000,0.000000,0.000000,0.000000
27071,1065570,1.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,...,0.004005,0.0001,0.055556,0.1875,1.0,3.025256e-06,0.000002,0.030710,0.000000,0.000000
27072,1065650,1.0,1.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,...,0.009455,0.0001,0.388889,0.1875,1.0,0.000000e+00,0.000002,0.000000,0.000000,0.000000
27073,1066700,1.0,1.0,1.0,0.0,1.0,0.0,1.0,0.0,1.0,...,0.012299,0.0001,0.111111,0.1875,1.0,7.563141e-07,0.000000,0.000000,0.000000,0.000000
