
* @author: krakowiakpawel9@gmail.com  
* @site: e-smartdata.org

### scikit-learn
>Library website: [https://scikit-learn.org](https://scikit-learn.org)  
>
>Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)
>
>A fundamental machine learning library for Python.
>
>To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics for regression problems:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

    

### <a name='a0'></a>  Importing libraries

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [3]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([ 94.67043356,  99.06728142, 103.03616231, 124.65170651,
        87.97418399,  97.16653462, 108.58046859, 119.09087462,
       143.04148494,  68.8747762 , 125.52677245,  69.95641197,
       113.68044107, 129.45303822, 106.88540561, 110.04817419,
       133.82037531, 105.5026382 ,  80.85834449,  92.52540563,
       133.62544735,  89.32838825, 120.80597668, 101.0134295 ,
       130.67341617, 104.65141865,  95.07043569,  83.57372534,
        96.3669113 ,  96.19764057, 113.37686991, 102.20064078,
       113.09543327, 120.2893458 , 114.43357855, 103.85245399,
        69.4463771 ,  67.56818858, 103.62154774, 124.49932146,
       104.90376845,  83.94105972, 128.64423127, 130.54998485,
        89.55812698,  78.49186088, 102.92807697,  87.4566408 ,
       114.043134  , 130.11399707])

In [4]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([105.16535254, 101.31521092,  93.1802259 , 115.28021021,
        79.99517723,  77.39971543, 115.75702927, 133.61904329,
       149.67468702,  60.9109361 , 113.06854766,  77.31505434,
       118.32936264, 132.840181  , 115.8180292 , 106.51151083,
       134.76403096, 122.43253601,  73.95672095,  98.39748282,
       137.57463625, 101.18764836, 110.64140428, 110.85420861,
       116.14588749, 125.37383114, 102.1567368 ,  79.5388764 ,
        80.97160375, 100.70380974, 108.0648394 ,  91.03058581,
       121.27495839, 125.32886189,  98.76788223,  92.20508615,
        60.25908081,  69.60095714, 105.3369338 , 124.81775981,
       102.66445799,  78.27539385, 140.13188223, 139.3367103 ,
        80.34622647,  67.56774258, 105.39755371,  89.46316424,
       127.20618104, 136.33462613])

In [7]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()


| | y_true | y_pred |
|---|---|---|
| 0 | 94.670434 | 105.165353 |
| 1 | 99.067281 | 101.315211 |
| 2 | 103.036162 | 93.180226 |
| 3 | 124.651707 | 115.28021 |
| 4 | 87.974184 | 79.995177 |


In [9]:
results['error'] = results['y_true'] - results['y_pred']
results.head(10)

| | y_true | y_pred | error |
|---|---|---|---|
| 0 | 94.670434 | 105.165353 | -10.494919 |
| 1 | 99.067281 | 101.315211 | -2.247929 |
| 2 | 103.036162 | 93.180226 | 9.855936 |
| 3 | 124.651707 | 115.28021 | 9.371496 |
| 4 | 87.974184 | 79.995177 | 7.979007 |
| 5 | 97.166535 | 77.399715 | 19.766819 |
| 6 | 108.580469 | 115.757029 | -7.176561 |
| 7 | 119.090875 | 133.619043 | -14.528169 |
| 8 | 143.041485 | 149.674687 | -6.633202 |
| 9 | 68.874776 | 60.910936 | 7.96384 |
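
With `error` defined as `y_true - y_pred`, a positive value means the model predicted too low and a negative value means it predicted too high. A minimal sketch of that reading, using made-up toy values (not the notebook's random data):

```python
import numpy as np

# Hypothetical toy values, chosen only to illustrate the sign convention
y_true = np.array([100.0, 120.0, 90.0, 110.0])
y_pred = np.array([105.0, 110.0, 90.0, 100.0])

error = y_true - y_pred  # same convention as the 'error' column

under_predicted = int((error > 0).sum())  # prediction below the true value
over_predicted = int((error < 0).sum())   # prediction above the true value
exact = int((error == 0).sum())           # prediction equal to the true value

print(error)
print(under_predicted, over_predicted, exact)
```

The sign is lost once the errors are passed through an absolute value or a square, so it is worth inspecting before computing the metrics.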



### <a name='a2'></a> Graphical interpretation

In [10]:
def plot_regression_results(y_true, y_pred):
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Common axis range; named lo/hi to avoid shadowing the built-in min() and max()
    lo = results[['y_true', 'y_pred']].min().min()
    hi = results[['y_true', 'y_pred']].max().max()

    # Scatter of predictions plus the y = x reference line (perfect predictions)
    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                          go.Scatter(x=[lo, hi], y=[lo, hi])],
                    layout=go.Layout(showlegend=False, width=800, height=500,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression results'))
    fig.show()


plot_regression_results(y_true, y_pred)

In [11]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)
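
Because `y_pred` is built as `y_true` plus Gaussian noise with standard deviation 10, the errors should be centred near 0 with a spread near 10, which is the shape the histogram shows. A quick numeric check, sketched with a seeded generator (the seed 0 and the use of `np.random.default_rng` are choices made here for reproducibility, not part of the original cells):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, assumed only for this sketch

# Same construction as in the cell above, but reproducible
y_true = 100 + 20 * rng.standard_normal(1000)
y_pred = y_true + 10 * rng.standard_normal(1000)
error = y_true - y_pred

# Mean should be close to 0, standard deviation close to 10
print(error.mean(), error.std())
```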

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true,i} - y_{pred,i}|$$

In [12]:
def mean_absolute_error(y_true, y_pred):
    return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

8.017463153866377

In [13]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)

8.017463153866377
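
As a hand check of the formula, the sketch below computes MAE for the first five `error` values printed in the results table near the top of the notebook (the 50-sample data, not the 1000-sample arrays used in the cells above). The variable names are illustrative.

```python
import numpy as np

# First five 'error' values copied from the results table shown earlier
errors = np.array([-10.494919, -2.247929, 9.855936, 9.371496, 7.979007])

abs_errors = np.abs(errors)                      # drop the sign of each error
mae_first_five = abs_errors.sum() / len(errors)  # average absolute error

print(mae_first_five)  # about 7.99
```

The result is expressed in the same units as the target, which is what makes MAE easy to interpret.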

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}$$

In [14]:
def mean_squared_error(y_true, y_pred):
    return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

100.13856798493292

In [15]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

100.13856798493292
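
Squaring the errors makes MSE much more sensitive to a single large miss than MAE. The sketch below illustrates this with two hypothetical prediction vectors that have the same MAE but different MSE; the helper names `mae` and `mse` are introduced here only for the example.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.abs(y_true - y_pred).mean()

def mse(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean()

y_true = np.array([10.0, 10.0, 10.0, 10.0])
y_small = np.array([12.0, 8.0, 12.0, 8.0])      # four errors of size 2
y_outlier = np.array([10.0, 10.0, 10.0, 18.0])  # one error of size 8

# MAE is 2.0 in both cases
print(mae(y_true, y_small), mae(y_true, y_outlier))
# MSE is 4.0 for the small errors but 16.0 with the single outlier
print(mse(y_true, y_small), mse(y_true, y_outlier))
```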

### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [16]:
def root_mean_squared_error(y_true, y_pred):
    return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

10.00692600077231

In [17]:
np.sqrt(mean_squared_error(y_true, y_pred))

10.00692600077231
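
Taking the square root brings the metric back to the units of the target: rescaling the data by a factor k multiplies MSE by k squared but RMSE only by k. A small sketch with hypothetical lengths in metres and centimetres (the helpers `mse` and `rmse` are defined here just for the illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean()

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))

# Hypothetical lengths in metres
y_true_m = np.array([1.0, 2.0, 3.0])
y_pred_m = np.array([1.5, 2.0, 2.0])

# The same data expressed in centimetres (factor k = 100)
y_true_cm = 100 * y_true_m
y_pred_cm = 100 * y_pred_m

# MSE grows by k**2 = 10000, RMSE by k = 100
print(mse(y_true_cm, y_pred_cm) / mse(y_true_m, y_pred_m))
print(rmse(y_true_cm, y_pred_cm) / rmse(y_true_m, y_pred_m))
```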

### <a name='a6'></a>  Max Error

$$ME = \max_{i}|y_{true,i} - y_{pred,i}|$$

In [18]:
def max_error(y_true, y_pred):
    return abs(y_true - y_pred).max()

In [19]:
max_error(y_true, y_pred)

31.382602990117327

In [20]:
from sklearn.metrics import max_error

max_error(y_true, y_pred)

31.382602990117327
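
The max error says how bad the worst prediction is, but not which sample it belongs to. One possible way to locate it is `np.argmax` on the absolute errors, sketched here on hypothetical toy values:

```python
import numpy as np

# Hypothetical toy values
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

abs_errors = np.abs(y_true - y_pred)
worst_idx = int(np.argmax(abs_errors))      # position of the largest absolute error
worst_value = float(abs_errors[worst_idx])  # equals the max error

print(worst_idx, worst_value)  # 2 1.5
```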

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}}{\sum_{i=1}^{n}(y_{true,i} - \overline{y}_{true})^{2}}$$

In [21]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.7480838273806351

In [22]:
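def r2_score(y_true, y_pred):
    numerator = ((y_true - y_pred) ** 2).sum()
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    # NumPy division by zero returns inf/nan with a warning instead of raising
    # ZeroDivisionError, so the constant-target case is checked explicitly
    if denominator == 0:
        raise ValueError('R2 is undefined when y_true is constant')
    return 1 - numerator / denominator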

In [23]:
r2_score(y_true, y_pred)

0.7480838273806351
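
The formula compares the model with a baseline that always predicts the mean of `y_true`: a perfect model scores 1, the mean baseline scores 0, and a model worse than the baseline scores below 0. A minimal sketch on hypothetical toy values (the helper name `r2` is introduced here to avoid clashing with `r2_score`):

```python
import numpy as np

def r2(y_true, y_pred):
    ss_res = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

perfect = y_true.copy()                          # exact predictions
mean_only = np.full_like(y_true, y_true.mean())  # always predict the mean
reversed_pred = y_true[::-1]                     # worse than the mean baseline

r2_perfect = r2(y_true, perfect)         # 1.0
r2_mean = r2(y_true, mean_only)          # 0.0
r2_reversed = r2(y_true, reversed_pred)  # -3.0

print(r2_perfect, r2_mean, r2_reversed)
```

For the simulated data in this notebook, the noise variance is 10² = 100 and the variance of `y_true` is about 20² = 400, so a value near 1 - 100/400 = 0.75 is expected, consistent with the 0.748 reported above.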