### Miscellaneous Concepts about Datasets

Previous example: [/examples/toy_datasets.ipynb](https://github.com/serhatsoyer/py4ML/blob/main/examples/toy_datasets.ipynb)  
Next example: [/examples/shallow/random_forests.ipynb](https://github.com/serhatsoyer/py4ML/blob/main/examples/shallow/random_forests.ipynb)

In [1]:
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.utils import shuffle

In [2]:
data, label = make_blobs(n_samples=256, n_features=3, centers=3, cluster_std=2, random_state=1)
print(type(data), data.shape, data.dtype)
print(type(label), label.shape, label.dtype)

<class 'numpy.ndarray'> (256, 3) float64
<class 'numpy.ndarray'> (256,) int64


In [3]:
# Print statistics computed over the whole array (all features pooled together)
def print_stats(vec, msg):
    print(f'{msg}   -   Mean: {vec.mean():.2},  Std: {vec.std():.2},  Min: {vec.min():.2},  Max: {vec.max():.2}')

print_stats(data, 'Original')

# MinMaxScaler rescales each feature (column) to the range [0, 1]
minmax_scaler = MinMaxScaler()
minmax_scaler.fit(data)
minmax_scaled = minmax_scaler.transform(data)
print_stats(minmax_scaled, 'MinMax Scaled')
# inverse_transform maps the scaled values back to the original units
inv_minmax_data = minmax_scaler.inverse_transform(minmax_scaled)
print_stats(inv_minmax_data, 'Inverse MinMax Scaled')

# StandardScaler shifts each feature to zero mean and scales it to unit variance
std_scaler = StandardScaler()
std_scaler.fit(data)
std_scaled = std_scaler.transform(data)
print_stats(std_scaled, 'STD Scaled')
inv_std_data = std_scaler.inverse_transform(std_scaled)
print_stats(inv_std_data, 'Inverse STD Scaled')

Original   -   Mean: -4.1,  Std: 4.4,  Min: -1.4e+01,  Max: 9.2
MinMax Scaled   -   Mean: 0.45,  Std: 0.22,  Min: 0.0,  Max: 1.0
Inverse MinMax Scaled   -   Mean: -4.1,  Std: 4.4,  Min: -1.4e+01,  Max: 9.2
STD Scaled   -   Mean: -4.5e-17,  Std: 1.0,  Min: -2.3,  Max: 2.8
Inverse STD Scaled   -   Mean: -4.1,  Std: 4.4,  Min: -1.4e+01,  Max: 9.2
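
The statistics above pool all three features together, which hides the fact that both scalers work column by column. The sketch below repeats the scaling on the same toy data and looks at each feature separately with `axis=0`. It also compares the results against hand-written formulas, assuming the scalers' default settings; the names `manual_minmax` and `manual_std` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Same toy data as in the cells above
data, label = make_blobs(n_samples=256, n_features=3, centers=3, cluster_std=2, random_state=1)

minmax_scaled = MinMaxScaler().fit_transform(data)
std_scaled = StandardScaler().fit_transform(data)

# axis=0 reduces over the samples, leaving one value per feature (column)
print('MinMax min per feature:', minmax_scaled.min(axis=0))
print('MinMax max per feature:', minmax_scaled.max(axis=0))
print('STD mean per feature:  ', std_scaled.mean(axis=0).round(6))
print('STD std per feature:   ', std_scaled.std(axis=0))

# Hand-written equivalents, assuming default scaler settings
manual_minmax = (data - data.min(axis=0)) / (data.max(axis=0) - data.min(axis=0))
manual_std = (data - data.mean(axis=0)) / data.std(axis=0)  # population std (ddof=0)

print('Manual MinMax matches:', np.allclose(manual_minmax, minmax_scaled))
print('Manual STD matches:   ', np.allclose(manual_std, std_scaled))
```

Each feature should individually span 0 to 1 after MinMax scaling and have mean close to 0 and standard deviation close to 1 after standard scaling, even though the pooled mean of the MinMax output above is 0.45 rather than 0.5.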


In [4]:
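# Print the first five values of feature 0 together with the first five labels
def print_head(data, label, msg):
    print(f'{msg}\ndata: {data[:5,0]},  label: {label[:5]}')

print_head(data, label, 'Original')
# shuffle applies the same permutation to every array and returns shuffled copies
data_shuffled, labels_shuffled = shuffle(data, label, random_state=1)
# The input arrays are not modified in place
print_head(data, label, 'Original after shuffle')
print_head(data_shuffled, labels_shuffled, 'Shuffled')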

Original
data: [-8.44776017 -3.73563744 -2.59298899 -5.39246083 -2.02357399],  label: [2 0 1 0 1]
Original after shuffle
data: [-8.44776017 -3.73563744 -2.59298899 -5.39246083 -2.02357399],  label: [2 0 1 0 1]
Shuffled
data: [-8.95084114  1.72409235 -1.43312382 -8.39327843 -2.85048057],  label: [2 0 0 2 0]
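
The printed heads show that the order changed, but not that each row still carries its own label. One way to check this, sketched below, is to pass an index array through `shuffle` alongside the data; the `indices` array is a helper introduced here and is not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.utils import shuffle

# Same toy data as in the cells above
data, label = make_blobs(n_samples=256, n_features=3, centers=3, cluster_std=2, random_state=1)

# Helper array recording each sample's original position
indices = np.arange(len(data))

# All arrays passed to shuffle receive the same permutation
data_shuffled, labels_shuffled, indices_shuffled = shuffle(data, label, indices, random_state=1)

# Looking up the original arrays at the shuffled positions should reproduce the shuffled arrays
rows_match = np.array_equal(data_shuffled, data[indices_shuffled])
labels_match = np.array_equal(labels_shuffled, label[indices_shuffled])
print('Rows aligned:  ', rows_match)
print('Labels aligned:', labels_match)
```

If both checks print `True`, the samples and labels were permuted consistently, which is what keeps a shuffled dataset usable for supervised training.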

