# Modeling With Transformations

In [1]:
import tensorflow as tf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split

## Import Some Data

In [2]:
dataUrl = "https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv"
dataFromWeb = pd.read_csv(dataUrl)
dataFromWeb.head()

   age     sex     bmi  children  smoker     region      charges
0   19  female  27.900         0     yes  southwest  16884.92400
1   18    male  33.770         1      no  southeast   1725.55230
2   28    male  33.000         3      no  southeast   4449.46200
3   33    male  22.705         0      no  northwest  21984.47061
4   32    male  28.880         0      no  northwest   3866.85520


In [3]:
labelField = 'charges'
featureData = dataFromWeb.drop(labelField, axis=1)
labelData = dataFromWeb[labelField]

## Split into Training & Testing Data

In [4]:
testDataPercentage = .2 # how much of our data should we use for "testing"
randomVal = 42
feature_training_data, feature_testing_data, label_training_data, label_testing_data = train_test_split(
    featureData,
    labelData,
    test_size=testDataPercentage,
    random_state=randomVal,  # set random state for reproducible splits
)

## Transform
The `make_column_transformer` ([docs](https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_transformer.html)) function takes one or more `(transformer, columns)` tuples, each pairing a transformer with the list of columns it applies to, and returns a transformer instance.  
The transformer instance is then fitted to the training data with the `fit` method.  
Once fitted, the transformer instance is applied to the data with the `transform` method.  
Two transformers are applied here:
- the `MinMaxScaler` ([docs](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)), to scale the `age`, `bmi`, and `children` column values to the range 0 to 1
- the `OneHotEncoder` ([docs](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html)), to one-hot encode the `sex`, `smoker`, and `region` values
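
As a minimal sketch of the fit-then-transform flow described above, the example below applies both transformers to a small, made-up DataFrame. The `toy_train`, `toy_test`, and `toy_transformer` names and their values are illustrative assumptions, not part of the insurance dataset. Passing `sparse_threshold=0` only forces a dense array so the result is easy to read.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Made-up training and testing rows, for illustration only
toy_train = pd.DataFrame({"age": [20, 30, 40], "smoker": ["yes", "no", "no"]})
toy_test = pd.DataFrame({"age": [25], "smoker": ["yes"]})

toy_transformer = make_column_transformer(
    (MinMaxScaler(), ["age"]),
    (OneHotEncoder(handle_unknown="ignore"), ["smoker"]),
    sparse_threshold=0,  # always return a dense NumPy array
)

# Learn the min/max of "age" and the categories of "smoker" from the training rows only
toy_transformer.fit(toy_train)

# Apply the learned scaling and encoding to both splits
toy_train_out = toy_transformer.transform(toy_train)
toy_test_out = toy_transformer.transform(toy_test)

print(toy_train_out)  # columns: scaled age, smoker == "no", smoker == "yes"
print(toy_test_out)
```

Because the scaler is fitted on the training rows only, the test row is scaled with the training minimum and maximum rather than its own.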

In [5]:
dataTransformer = make_column_transformer(
    # get all values between 0 and 1
    (MinMaxScaler(), ["age", "bmi", "children"]),
    (OneHotEncoder(handle_unknown="ignore"), ["sex", "smoker", "region"])
)

dataTransformer.fit(feature_training_data)

## Normalize

In [6]:
normailized_feature_training_data = dataTransformer.transform(feature_training_data)
normailized_feature_testing_data = dataTransformer.transform(feature_testing_data)

### Compare normalized vs non-normalized

In [7]:
normailized_feature_training_data[0]

array([0.60869565, 0.10734463, 0.4       , 1.        , 0.        ,
       1.        , 0.        , 0.        , 1.        , 0.        ,
       0.        ])

In [8]:
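# Note: .loc[0] selects the row whose index label is 0, which is not the first
# row of the shuffled training split; .iloc[0] would select the row that
# corresponds to normailized_feature_training_data[0].
feature_training_data.loc[0]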

age                19
sex            female
bmi              27.9
children            0
smoker            yes
region      southwest
Name: 0, dtype: object

In [9]:
normailized_feature_training_data.shape

(1070, 11)

In [10]:
feature_training_data.shape

(1070, 6)
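
The jump from 6 to 11 columns comes from one-hot encoding, which emits one column per category. A quick arithmetic check, assuming the category counts below (two values each for `sex` and `smoker`, four for `region`) match what the encoder saw in the training data:

```python
# Numeric columns pass through MinMaxScaler one-to-one
scaled_columns = ["age", "bmi", "children"]

# Assumed number of distinct categories per one-hot-encoded column
category_counts = {"sex": 2, "smoker": 2, "region": 4}

total_columns = len(scaled_columns) + sum(category_counts.values())
print(total_columns)  # 3 + 2 + 2 + 4 = 11
```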

## Build A Model
This will be based on the `insurance_model_2` model that can be found in the `modeling-and-wrangling` notebook.  
This model version, though, will use normalized data: one-hot-encoded column values and scaled column values.

In [11]:
tf.random.set_seed(42)
m = tf.keras.Sequential()
epochs = 100
# different & more layers
l1 = tf.keras.layers.Dense(100)
l2 = tf.keras.layers.Dense(10)
l3 = tf.keras.layers.Dense(1)

m.add(l1)
m.add(l2)
m.add(l3)

# Compile the model
m.compile(loss=tf.keras.losses.mae,
                          optimizer=tf.keras.optimizers.Adam(),
                          metrics=['mae'])

# Fit the model and save the history (we can plot this)
m_history = m.fit(normailized_feature_training_data, label_training_data, epochs=epochs, verbose=0)

### Review The Model

In [12]:
m.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 100)               1200      
                                                                 
 dense_1 (Dense)             (None, 10)                1010      
                                                                 
 dense_2 (Dense)             (None, 1)                 11        
                                                                 
Total params: 2221 (8.68 KB)
Trainable params: 2221 (8.68 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
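
The `Param #` column can be reproduced by hand: a `Dense` layer has one weight per input-output pair plus one bias per output unit. The helper `dense_param_count` below is a hypothetical name used only for this check.

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# 11 input features, followed by the three Dense layers of the model
layer_sizes = [11, 100, 10, 1]
counts = [dense_param_count(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [1200, 1010, 11] 2221
```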


In [13]:
m.evaluate(normailized_feature_testing_data,label_testing_data)



[3437.778564453125, 3437.778564453125]

In [14]:
print(f'Training Label Median: {label_training_data.median()}')
print(f'Training Label Mean: {label_training_data.mean()}')
print(f'm MAE: {m.get_metrics_result()["mae"].numpy()}')

Training Label Median: 9575.4421
Training Label Mean: 13346.089736364485
m MAE: 3437.778564453125


Compare this model's MAE to that of the `insurance_model_2` (_im2_) model in `modeling-and-wrangling`:
- `im2` had an MAE of `~4700`
- the new model's MAE is `~3400`

**Normalizing this model's data, with one-hot encoding and scaling, made this model perform better!**
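
To put the MAE in context, it can help to express it as a fraction of the mean training label printed above. This is a rough sanity check rather than a formal metric; the numbers are copied from the outputs in this notebook.

```python
training_label_mean = 13346.089736364485  # from the printed output above
model_mae = 3437.778564453125             # from the printed output above

# Average absolute error as a share of the average charge
relative_error = model_mae / training_label_mean
print(f"MAE is about {relative_error:.1%} of the mean charge")
```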

## Experiment With The Model
### Double The Epochs

In [15]:
m2 = tf.keras.Sequential()
m2epochs = 200

m2.add(l1)
m2.add(l2)
m2.add(l3)

# Compile the model
m2.compile(loss=tf.keras.losses.mae,
                          optimizer=tf.keras.optimizers.Adam(),
                          metrics=['mae'])

# Fit the model and save the history (we can plot this)
m2_history = m2.fit(normailized_feature_training_data, label_training_data, epochs=m2epochs, verbose=0)

#### Review The Model

In [16]:
m2.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 100)               1200      
                                                                 
 dense_1 (Dense)             (None, 10)                1010      
                                                                 
 dense_2 (Dense)             (None, 1)                 11        
                                                                 
Total params: 2221 (8.68 KB)
Trainable params: 2221 (8.68 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [17]:
m2.evaluate(normailized_feature_testing_data,label_testing_data)



[3163.11376953125, 3163.11376953125]

In [24]:
print(f'Training Label Median: {label_training_data.median()}')
print(f'Training Label Mean: {label_training_data.mean()}')
print(f'm2 MAE: {m2.get_metrics_result()["mae"].numpy()}')
print(f'SHAPE: {normailized_feature_training_data.shape}')

Training Label Median: 9575.4421
Training Label Mean: 13346.089736364485
m2 MAE: 3163.11376953125
SHAPE: (1070, 11)


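Doubling the epochs reduced the MAE slightly, from `~3400` to `~3200`. Note that `m2` reuses the layer objects `l1`, `l2`, and `l3` that `m` already trained, so it continues from `m`'s weights rather than starting from scratch, and part of the improvement may simply come from the extra training on top of `m`'s 100 epochs.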
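
A Keras layer object keeps its weights when it is added to another model, so experiments that share layer objects also share training progress. A minimal sketch of one way to start each experiment from freshly initialized weights is a small factory function; `build_model` and its parameters are hypothetical names, not part of the original notebook.

```python
import tensorflow as tf

def build_model(n_features: int = 11, learning_rate: float = 0.001) -> tf.keras.Model:
    """Return a newly initialized model with the same 100-10-1 architecture."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(100),
        tf.keras.layers.Dense(10),
        tf.keras.layers.Dense(1),
    ])
    model.compile(
        loss="mae",
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        metrics=["mae"],
    )
    return model

# Each call creates new layer objects, so no weights are shared between models
model_a = build_model()
model_b = build_model()
print(model_a.layers[0] is model_b.layers[0])  # False
```

With a factory like this, each experiment (more epochs, an extra layer, a different learning rate) could be trained on its own model, which makes the resulting MAE values easier to compare.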

### Add A Layer, Change Layer Values

In [27]:
m3 = tf.keras.Sequential()

l4 = tf.keras.layers.Dense(100)

m3.add(l1)
m3.add(l4)
m3.add(l2)
m3.add(l3)

# Compile the model
m3.compile(loss=tf.keras.losses.mae,
                          optimizer=tf.keras.optimizers.Adam(),
                          metrics=['mae'])

# Fit the model and save the history (we can plot this)
# verbose=0 is omitted here, so per-epoch progress is printed
m3_history = m3.fit(normailized_feature_training_data, label_training_data, epochs=m2epochs)

Epoch 1/200
...
Epoch 77/200
[output truncated]

#### Review The Model

In [28]:
m3.summary()

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 100)               1200      
                                                                 
 dense_7 (Dense)             (None, 100)               10100     
                                                                 
 dense_1 (Dense)             (None, 10)                1010      
                                                                 
 dense_2 (Dense)             (None, 1)                 11        
                                                                 
Total params: 12321 (48.13 KB)
Trainable params: 12321 (48.13 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [29]:
m3.evaluate(normailized_feature_testing_data,label_testing_data)



[3223.422119140625, 3223.422119140625]

In [30]:
print(f'Training Label Median: {label_training_data.median()}')
print(f'Training Label Mean: {label_training_data.mean()}')
print(f'm3 MAE: {m3.get_metrics_result()["mae"].numpy()} vs m2 MAE: {m2.get_metrics_result()["mae"].numpy()}')

Training Label Median: 9575.4421
Training Label Mean: 13346.089736364485
m3 MAE: 3223.422119140625 vs m2 MAE: 3163.11376953125


Adding a layer left the MAE roughly the same (`~3220` vs `~3160` for `m2`).

### Change the Learning Rate, Fewer Epochs

In [32]:
m4 = tf.keras.Sequential()

# l4 = tf.keras.layers.Dense(100)

m4.add(l1)
# m4.add(l4)
m4.add(l2)
m4.add(l3)

# Compile the model
m4.compile(loss=tf.keras.losses.mae,
                          optimizer=tf.keras.optimizers.Adam(learning_rate=.008),
                          metrics=['mae'])

# Fit the model and save the history (we can plot this)
# verbose=0 is omitted here, so per-epoch progress is printed
m4_history = m4.fit(normailized_feature_training_data, label_training_data, epochs=epochs)

Epoch 1/100
...
Epoch 77/100
[output truncated]

#### Review The Model

In [28]:
m4.summary()



In [29]:
m4.evaluate(normailized_feature_testing_data,label_testing_data)




In [33]:
print(f'Training Label Median: {label_training_data.median()}')
print(f'Training Label Mean: {label_training_data.mean()}')
print(f'm4 MAE: {m4.get_metrics_result()["mae"].numpy()} vs m2 MAE: {m2.get_metrics_result()["mae"].numpy()}')

Training Label Median: 9575.4421
Training Label Mean: 13346.089736364485
m4 MAE: 3494.249755859375 vs m2 MAE: 3163.11376953125


Setting the learning rate to `.008`, compared to the default `.001`, made the outcome slightly worse here (`~3490` vs `~3160` for `m2`).

### Change the Learning Rate Again

In [34]:
m5 = tf.keras.Sequential()

# l4 = tf.keras.layers.Dense(100)

m5.add(l1)
# m5.add(l4)
m5.add(l2)
m5.add(l3)

# Compile the model
m5.compile(loss=tf.keras.losses.mae,
                          optimizer=tf.keras.optimizers.Adam(learning_rate=.01),
                          metrics=['mae'])

# Fit the model and save the history (we can plot this)
# verbose=0 is omitted here, so per-epoch progress is printed
m5_history = m5.fit(normailized_feature_training_data, label_training_data, epochs=epochs)

Epoch 1/100
...
Epoch 77/100
[output truncated]

#### Review The Model

In [35]:
m5.summary()

Model: "sequential_9"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 100)               1200      
                                                                 
 dense_1 (Dense)             (None, 10)                1010      
                                                                 
 dense_2 (Dense)             (None, 1)                 11        
                                                                 
Total params: 2221 (8.68 KB)
Trainable params: 2221 (8.68 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________


In [29]:
m5.evaluate(normailized_feature_testing_data,label_testing_data)




In [36]:
print(f'Training Label Median: {label_training_data.median()}')
print(f'Training Label Mean: {label_training_data.mean()}')
print(f'm5 MAE: {m5.get_metrics_result()["mae"].numpy()} vs m2 MAE: {m2.get_metrics_result()["mae"].numpy()}')

Training Label Median: 9575.4421
Training Label Mean: 13346.089736364485
m5 MAE: 3509.501220703125 vs m2 MAE: 3163.11376953125


Setting the learning rate to `.01`, compared to the default `.001`, made the outcome slightly worse here (`~3510` vs `~3160` for `m2`).
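
To compare the experiments side by side, the MAE values printed throughout this notebook can be collected and sorted. The dictionary below only copies those printed numbers; the short descriptions in the keys are paraphrases added for readability.

```python
# MAE values copied from the printed outputs in this notebook
reported_mae = {
    "m (100 epochs)": 3437.778564453125,
    "m2 (200 epochs)": 3163.11376953125,
    "m3 (extra Dense(100) layer, 200 epochs)": 3223.422119140625,
    "m4 (learning_rate=.008, 100 epochs)": 3494.249755859375,
    "m5 (learning_rate=.01, 100 epochs)": 3509.501220703125,
}

# Sort from lowest (best) to highest MAE
ranked = sorted(reported_mae.items(), key=lambda item: item[1])
for name, mae in ranked:
    print(f"{name}: {mae:.0f}")
```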