<a href="https://colab.research.google.com/github/krakowiakpawel9/ml_course/blob/master/cont/07_a_regression_metrics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### scikit-learn
Library website: [https://scikit-learn.org](https://scikit-learn.org)  

Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)

The core machine learning library for Python.

To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics - Regression problem:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

    

### <a name='a0'></a>  Importing libraries

In [0]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [0]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([121.3657205 , 130.93777812, 110.16246423, 104.18819432,
        77.66807674, 115.75933482,  93.98690649,  56.40921021,
       116.0190815 , 103.60314348,  84.239909  , 110.95852152,
       108.0279204 , 103.04388788,  69.3439374 ,  60.93045047,
       112.98993856,  81.88251843, 106.94264021, 157.21196295,
       130.4812196 ,  70.32309885,  99.52599511, 117.86755165,
        74.46726742, 107.60474192,  80.51986284,  93.95754237,
        95.80492029, 104.56465014, 119.51715816,  96.02423389,
       112.63493465,  96.90750576, 109.52002534,  75.55595941,
       107.92962247,  64.40066709, 113.45772635, 112.57060437,
       109.11526948, 107.43076447, 130.14202265,  96.70134789,
       113.12475414, 126.15401153,  98.37954351, 105.94471871,
       110.60330167,  96.68986699])

In [0]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([133.78186582, 129.62692734, 124.53872168, 107.33464342,
        84.10777748, 112.20514557, 102.15289919,  54.20992603,
       101.06809486,  97.77524511,  72.70863489, 107.2606363 ,
       108.97098616,  93.82442287,  75.16714134,  69.42516664,
       121.55210862,  57.08199153, 110.71170401, 141.64623644,
       136.00685241,  47.43460005,  86.17846086, 112.86693839,
        72.59691938, 114.23165246,  84.47427088,  73.61284563,
        99.0606639 , 109.20154972, 129.87416588, 100.28078324,
       113.61979593,  93.67761966, 111.28427007,  91.30041864,
       112.55817954,  59.68491932, 113.17341683, 128.02216637,
       105.06390407, 123.50472458, 131.95737922,  93.52083878,
       116.529797  , 123.48771806, 109.38159795,  97.02942248,
        85.87554508,  89.37038434])

In [0]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()

|   | y_true | y_pred |
|---|---|---|
| 0 | 121.365721 | 133.781866 |
| 1 | 130.937778 | 129.626927 |
| 2 | 110.162464 | 124.538722 |
| 3 | 104.188194 | 107.334643 |
| 4 | 77.668077 | 84.107777 |


In [0]:
results['error'] = results['y_true'] - results['y_pred']
results.head()

|   | y_true | y_pred | error |
|---|---|---|---|
| 0 | 121.365721 | 133.781866 | -12.416145 |
| 1 | 130.937778 | 129.626927 | 1.310851 |
| 2 | 110.162464 | 124.538722 | -14.376257 |
| 3 | 104.188194 | 107.334643 | -3.146449 |
| 4 | 77.668077 | 84.107777 | -6.439701 |


In [0]:
results[['y_true', 'y_pred']].min()

y_true    56.40921
y_pred    47.43460
dtype: float64


### <a name='a2'></a> Graphical interpretation

In [0]:
def plot_regression_results(y_true, y_pred):
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Common range for the diagonal; lo/hi avoid shadowing the built-ins min/max.
    lo = results[['y_true', 'y_pred']].min().min()
    hi = results[['y_true', 'y_pred']].max().max()

    fig = go.Figure(data=[
        # One marker per sample: true value (x) against predicted value (y).
        go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
        # Diagonal y = x: a perfect prediction would lie exactly on this line.
        go.Scatter(x=[lo, hi], y=[lo, hi])],
        layout=go.Layout(showlegend=False, width=800,
                         xaxis_title='y_true',
                         yaxis_title='y_pred',
                         title='Regression results'))
    fig.show()

plot_regression_results(y_true, y_pred)

In [0]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)
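
The histogram can also be summarized numerically. The sketch below rebuilds the same kind of data with a seeded generator so that it is reproducible. The seed and the `_demo` names are assumptions made for illustration and are not part of the original notebook. Because the noise is drawn with standard deviation 10, the error mean should land close to 0 and the error standard deviation close to 10.

```python
import numpy as np

# Seeded generator so the sketch is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)

# Same construction as in the cell above: targets plus Gaussian noise.
y_true_demo = 100 + 20 * rng.standard_normal(1000)
y_pred_demo = y_true_demo + 10 * rng.standard_normal(1000)
errors_demo = y_true_demo - y_pred_demo

# Centre and spread of the error distribution shown in the histogram.
error_mean = errors_demo.mean()
error_std = errors_demo.std()
print(f'mean error: {error_mean:.3f}, std of error: {error_std:.3f}')
```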

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true} - y_{pred}|$$

In [0]:
def mean_absolute_error(y_true, y_pred):
    return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

7.919552059574351

In [0]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)

7.919552059574351
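
To make the formula concrete, here is a minimal worked example on three hypothetical values (the `_toy` arrays are made up for illustration). The absolute errors are 2, 3 and 0, so the MAE is (2 + 3 + 0) / 3.

```python
import numpy as np

# Hypothetical toy values chosen so the arithmetic is easy to follow.
y_true_toy = np.array([10.0, 20.0, 30.0])
y_pred_toy = np.array([12.0, 17.0, 30.0])

# Element-wise absolute errors: [2., 3., 0.]
abs_errors_toy = np.abs(y_true_toy - y_pred_toy)

# MAE is the plain average of those absolute errors: (2 + 3 + 0) / 3
mae_toy = abs_errors_toy.mean()
print(mae_toy)
```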

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}$$

In [0]:
def mean_squared_error(y_true, y_pred):
    return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

101.04843365106575

In [0]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

101.04843365106575
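
Because the errors are squared, MSE weights large misses more heavily than MAE does. The sketch below compares two hypothetical error vectors (invented for illustration) that have the same MAE but different MSE.

```python
import numpy as np

# Two hypothetical error vectors with the same total absolute error (8).
errors_even = np.array([2.0, 2.0, 2.0, 2.0])    # error spread evenly
errors_spike = np.array([0.0, 0.0, 0.0, 8.0])   # one large miss

mae_even = np.abs(errors_even).mean()
mae_spike = np.abs(errors_spike).mean()
mse_even = (errors_even ** 2).mean()
mse_spike = (errors_spike ** 2).mean()

print(f'even:  MAE = {mae_even}, MSE = {mse_even}')    # MAE = 2.0, MSE = 4.0
print(f'spike: MAE = {mae_spike}, MSE = {mse_spike}')  # MAE = 2.0, MSE = 16.0
```

Both vectors have an MAE of 2, but the single large miss gives an MSE four times as large.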

### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [0]:
def root_mean_squared_error(y_true, y_pred):
    return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

10.052284996510284

In [0]:
np.sqrt(mean_squared_error(y_true, y_pred))

10.052284996510284
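
One practical reason to report RMSE rather than MSE is that RMSE is expressed in the same units as the target. The sketch below illustrates this with hypothetical values given in kilograms and then in grams. The helper names `rmse_sketch` and `mse_sketch` are assumptions for this example, not library functions.

```python
import numpy as np

def mse_sketch(y_true, y_pred):
    # Mean of squared errors (units: target units squared).
    return np.mean((y_true - y_pred) ** 2)

def rmse_sketch(y_true, y_pred):
    # Square root of MSE (units: same as the target).
    return np.sqrt(mse_sketch(y_true, y_pred))

# Hypothetical targets and predictions in kilograms.
y_true_kg = np.array([1.0, 2.0, 3.0])
y_pred_kg = np.array([1.5, 2.0, 2.0])

# The same data expressed in grams.
y_true_g = 1000 * y_true_kg
y_pred_g = 1000 * y_pred_kg

rmse_kg, rmse_g = rmse_sketch(y_true_kg, y_pred_kg), rmse_sketch(y_true_g, y_pred_g)
mse_kg, mse_g = mse_sketch(y_true_kg, y_pred_kg), mse_sketch(y_true_g, y_pred_g)

# RMSE scales by 1000 like the data; MSE scales by 1000 ** 2.
print(rmse_g / rmse_kg, mse_g / mse_kg)
```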

### <a name='a6'></a>  Max Error

$$ME = \max(|y_{true} - y_{pred}|)$$

In [0]:
def max_error(y_true, y_pred):
    return abs(y_true - y_pred).max()

In [0]:
max_error(y_true, y_pred)

34.578152388161485

In [0]:
from sklearn.metrics import max_error

max_error(y_true, y_pred)

34.578152388161485
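
Max error is determined by the single worst prediction, so it is often useful to know which sample produced it. The sketch below uses hypothetical toy arrays (not from the notebook) and `np.argmax` to locate that sample, and prints the MAE next to it for contrast.

```python
import numpy as np

# Hypothetical toy values: three good predictions and one large miss.
y_true_toy = np.array([10.0, 20.0, 30.0, 40.0])
y_pred_toy = np.array([11.0, 19.0, 30.0, 25.0])

abs_errors_toy = np.abs(y_true_toy - y_pred_toy)   # [1., 1., 0., 15.]

# Index and size of the worst prediction.
worst_idx = int(np.argmax(abs_errors_toy))
max_err_toy = abs_errors_toy[worst_idx]

# MAE on the same data, for comparison.
mae_toy = abs_errors_toy.mean()

print(f'worst sample: {worst_idx}, max error: {max_err_toy}, MAE: {mae_toy}')
```

Here the max error is 15 while the MAE is 4.25, which shows how one outlier dominates the former.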

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}}{\sum_{i=1}^{n}(y_{true} - \overline{y_{true}})^{2}}$$

In [0]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.7556370201591597

In [0]:
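def r2_score(y_true, y_pred):
    numerator = ((y_true - y_pred) ** 2).sum()
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    # NumPy division by zero yields inf/nan with a warning rather than raising
    # ZeroDivisionError, so the denominator is checked explicitly.
    if denominator == 0:
        raise ValueError('R2 is undefined when y_true is constant')
    return 1 - numerator / denominator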

In [0]:
r2_score(y_true, y_pred)

0.7556370201591597
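
Three reference points help when reading an R2 value: perfect predictions give 1, always predicting the mean of `y_true` gives 0, and predictions worse than the mean give a negative score. The sketch below checks these cases on hypothetical data. The helper `r2_sketch` is an illustrative name and simply follows the formula above.

```python
import numpy as np

def r2_sketch(y_true, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

# Hypothetical targets.
y_toy = np.array([1.0, 2.0, 3.0, 4.0])

r2_perfect = r2_sketch(y_toy, y_toy)                              # predictions equal targets
r2_mean = r2_sketch(y_toy, np.full_like(y_toy, y_toy.mean()))     # always predict the mean
r2_reversed = r2_sketch(y_toy, y_toy[::-1])                       # predictions in reverse order

print(r2_perfect, r2_mean, r2_reversed)  # 1.0 0.0 -3.0
```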