# Loss Functions

In this exercise, you will compare the effects of Loss functions on a `LinearRegression` model.

👇 Let's download a CSV file to use for this challenge and parse it into a DataFrame

In [1]:
import pandas as pd

data = pd.read_csv("https://wagon-public-datasets.s3.amazonaws.com/05-Machine-Learning/04-Under-the-Hood/loss_functions_dataset.csv")
data.sample(5)

Unnamed: 0,Relative Compactness,Surface Area,Wall Area,Roof Area,Overall Height,Glazing Area,Average Temperature
274,0.69,735.0,294.0,220.5,3.5,0.1,12.785
36,0.66,759.5,318.5,220.5,3.5,0.0,9.79
477,0.62,808.5,367.5,220.5,3.5,0.25,15.085
234,0.64,784.0,343.0,220.5,3.5,0.1,17.37
76,0.71,710.5,269.5,220.5,3.5,0.1,12.255


🎯 Your task is to predict the average temperature inside a greenhouse based on its design. Your temperature predictions will help you select the appropriate greenhouse design for each plant, based on their climate needs. 

🌿 You know that plants can handle small temperature variations, but are exponentially more sensitive as the temperature variations increase. 

## 1. Theory 

❓ Theoretically, which Loss function would you train your model on to limit the risk of killing plants?

<details>
<summary> 🆘 Answer </summary>
    
By theory, you would use a Mean Square Error (MSE) Loss function. It would penalize outlier predictions and prevent your model from committing large errors. This would ensure smaller temperature variations and a lower risk for plants.

</details>

Sorry I read the answer before answering. So I would say ....MSE???

## 2. Application

### 2.1 Preprocessing

❓ Standardise the features

In [7]:
# YOUR CODE HERE
data.shape
data.describe
print(data['Relative Compactness'].value_counts())
print(data['Surface Area'].value_counts())
print(data['Wall Area'].value_counts())
print(data['Roof Area'].value_counts())
print(data['Overall Height'].value_counts())
print(data['Glazing Area'].value_counts())

0.98    64
0.90    64
0.86    64
0.82    64
0.79    64
0.76    64
0.74    64
0.71    64
0.69    64
0.66    64
0.64    64
0.62    64
Name: Relative Compactness, dtype: int64
514.5    64
563.5    64
588.0    64
612.5    64
637.0    64
661.5    64
686.0    64
710.5    64
735.0    64
759.5    64
784.0    64
808.5    64
Name: Surface Area, dtype: int64
294.0    192
318.5    192
343.0    128
416.5     64
245.0     64
269.5     64
367.5     64
Name: Wall Area, dtype: int64
220.50    384
147.00    192
122.50    128
110.25     64
Name: Roof Area, dtype: int64
7.0    384
3.5    384
Name: Overall Height, dtype: int64
0.10    240
0.25    240
0.40    240
0.00     48
Name: Glazing Area, dtype: int64


I will use a MinMax scaler

In [8]:
from sklearn.preprocessing import MinMaxScaler

In [10]:
features_to_scale = ['Relative Compactness', 'Surface Area', 'Wall Area', 'Roof Area', 'Overall Height', 'Glazing Area']

scaler = MinMaxScaler()

data[features_to_scale] = scaler.fit_transform(data[features_to_scale])
data

Unnamed: 0,Relative Compactness,Surface Area,Wall Area,Roof Area,Overall Height,Glazing Area,Average Temperature
0,1.000000,0.000000,0.285714,0.000000,1.0,0.0,18.440
1,1.000000,0.000000,0.285714,0.000000,1.0,0.0,18.440
2,1.000000,0.000000,0.285714,0.000000,1.0,0.0,18.440
3,1.000000,0.000000,0.285714,0.000000,1.0,0.0,18.440
4,0.777778,0.166667,0.428571,0.111111,1.0,0.0,24.560
...,...,...,...,...,...,...,...
763,0.055556,0.916667,0.571429,1.000000,0.0,1.0,19.640
764,0.000000,1.000000,0.714286,1.000000,0.0,1.0,16.710
765,0.000000,1.000000,0.714286,1.000000,0.0,1.0,16.775
766,0.000000,1.000000,0.714286,1.000000,0.0,1.0,16.545


### 2.2 Modeling

In this section, you are going to verify the theory by evaluating models optimized on different Loss functions.

### Least Squares (MSE) Loss

❓ **10-Fold Cross-validate** a Linear Regression model optimized by **Stochastic Gradient Descent** (SGD) on a **Least Squares Loss** (MSE)



In [19]:
# YOUR CODE HERE
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

In [14]:
data.head()

Unnamed: 0,Relative Compactness,Surface Area,Wall Area,Roof Area,Overall Height,Glazing Area,Average Temperature
0,1.0,0.0,0.285714,0.0,1.0,0.0,18.44
1,1.0,0.0,0.285714,0.0,1.0,0.0,18.44
2,1.0,0.0,0.285714,0.0,1.0,0.0,18.44
3,1.0,0.0,0.285714,0.0,1.0,0.0,18.44
4,0.777778,0.166667,0.428571,0.111111,1.0,0.0,24.56


In [17]:
X_scaled = data[['Relative Compactness', 'Surface Area', 'Wall Area', 'Roof Area', 'Overall Height', 'Glazing Area']]
y = data ['Average Temperature']

In [25]:
sgd_regressor = SGDRegressor(loss='squared_error')
regress = sgd_regressor.fit(X_scaled, y)
regress

#scores = cross_val_score(sgd_regressor, X_scaled, y, cv=10, scoring='neg_mean_squared_error')

#mse_scores = -scores

#print("MSE scores for each fold:")
#for i, mse in enumerate(mse_scores):
    #print(f"Fold {i+1}: {mse}")

#mean_mse = np.mean(mse_scores)
#print("Mean MSE:", mean_mse)


In [32]:
scores = cross_val_score(sgd_regressor, X_scaled, y, cv=10, scoring='neg_mean_squared_error')
scores
mse_scores = -scores
mse_scores.mean()

9.613553497843657

❓ Compute 
- the mean cross-validated R2 score and save it in the variable `r2`
- the single biggest prediction error in °C of all your folds and save it in the variable `max_error_celsius`?

(Tips: `max_error` is an accepted scoring metric in sklearn)

In [34]:
import numpy as np

In [43]:
# YOUR CODE HERE
r2 = cross_val_score(sgd_regressor, X_scaled, y, cv=10, scoring='r2').mean()
print("r2:",r2)
max_error_celsius = cross_val_score(sgd_regressor, X_scaled, y, cv=10, scoring='max_error')
max_error_celsius = np.abs(max_error_celsius).max()
print("max_error_celcius:",max_error_celsius)

r2: 0.8915651787314214
max_error_celcius: 9.753468551101307


### Mean Absolute Error (MAE) Loss

What if we optimize our model on the MAE instead?

❓ **10-Fold Cross-validate** a Linear Regression model optimized by **Stochastic Gradient Descent** (SGD) on a **MAE** Loss

<details>
<summary>💡 Hints</summary>

- MAE loss cannot be directly specified in `SGDRegressor`. It must be engineered by adjusting the right parameters

</details>

In [56]:
# YOUR CODE HERE
sgd_regressor2 = SGDRegressor(loss='epsilon_insensitive', epsilon=0)

r2_mae = cross_val_score(sgd_regressor2, X_scaled, y, cv=10, scoring='r2').mean()
print ("r2_mae" , r2_mae)

max_error_mae_m = cross_val_score(sgd_regressor, X_scaled, y, cv=10, scoring='neg_mean_absolute_error').max()
max_error_mae = -max_error_mae_m
print ("max_error_mae:" , max_error_mae)

r2_mae 0.8635320160157767
max_error_mae: 1.8240715897167996


❓ Compute 
- the mean cross-validated R2 score, store it in `r2_mae`
- the single biggest prediction error of all your folds, store it in `max_error_mae`?

In [57]:
# YOUR CODE HERE
r2_mae 
max_error_mae

1.8240715897167996

## 3. Conclusion

❓Which of the models you evaluated seems the most appropriate for your task?

<details>
<summary> 🆘Answer </summary>
    
Although mean cross-validated r2 scores are approximately similar between the two models, the one optimized on a MAE has more chance to make larger mistakes from time to time, increasing the risk of killing plants!

    
</details>

I would have said the first one so I probably did a mistake.

# 🏁 Check your code and push your notebook

In [58]:
from nbresult import ChallengeResult

result = ChallengeResult(
    'loss_functions',
    r2 = r2,
    r2_mae = r2_mae,
    max_error = max_error_celsius,
    max_error_mae = max_error_mae
)

result.write()
print(result.check())


platform darwin -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0 -- /Users/Laetitia/.pyenv/versions/lewagon/bin/python3
cachedir: .pytest_cache
rootdir: /Users/Laetitia/code/juliensoudet/05-ML/04-Under-the-hood/data-loss-functions/tests
plugins: asyncio-0.19.0, typeguard-2.13.3, anyio-3.6.2
asyncio: mode=strict
[1mcollecting ... [0mcollected 3 items

test_loss_functions.py::TestLossFunctions::test_max_error_order [31mFAILED[0m[31m   [ 33%][0m
test_loss_functions.py::TestLossFunctions::test_r2 [32mPASSED[0m[31m                [ 66%][0m
test_loss_functions.py::TestLossFunctions::test_r2_mae [32mPASSED[0m[31m            [100%][0m

[31m[1m____________________ TestLossFunctions.test_max_error_order ____________________[0m

self = <tests.test_loss_functions.TestLossFunctions testMethod=test_max_error_order>

    [94mdef[39;49;00m [92mtest_max_error_order[39;49;00m([96mself[39;49;00m):
>       [96mself[39;49;00m.assertLess([96mabs[39;49;00m([96mself[39;49;00m.result.

In [59]:
! git add tests/loss_functions.pickle

! git commit -m 'Completed loss_functions step'

! git push origin master



[master cb40356] Completed loss_functions step
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 tests/loss_functions.pickle
Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Delta compression using up to 4 threads
Compressing objects: 100% (12/12), done.
Writing objects: 100% (13/13), 3.59 KiB | 612.00 KiB/s, done.
Total 13 (delta 2), reused 0 (delta 0), pack-reused 0 (from 0)
remote: Resolving deltas: 100% (2/2), done.[K
To github.com:juliensoudet/data-loss-functions.git
 * [new branch]      master -> master
