<a href="https://colab.research.google.com/github/khyz/AI_Python_examples/blob/main/03_metryki_regresja.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* @author: krakowiakpawel9@gmail.com  
* @site: e-smartdata.org

### scikit-learn
>Library website: [https://scikit-learn.org](https://scikit-learn.org)  
>
>Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)
>
>A core machine learning library for Python.
>
>To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics - Regression problem:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

    

### <a name='a0'></a>  Importing libraries

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [None]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([ 91.87231326, 136.58311605, 118.44601536, 123.59763434,
        86.897222  ,  92.14742968, 134.48174613,  98.89159179,
       138.21067832, 104.65382054,  91.04818498,  82.32029224,
        68.37193219, 127.22148943, 107.55533535,  88.57052364,
       105.66981868, 104.27559689,  83.57134414,  52.88236775,
        68.26345151,  74.98244492, 103.15220391,  67.60314825,
       109.08511903, 107.82404567,  81.3800559 , 105.16797145,
       103.46602799,  92.58793796,  80.49500952, 123.69014571,
       102.20471806, 118.9410803 , 114.19361136, 103.70723672,
       100.05247439,  77.00626419,  89.89058989, 130.08030016,
       103.06909737,  94.09774165,  94.94536925,  86.80385006,
       109.93808924,  76.38623029, 122.87562316, 152.43996805,
        90.70480474, 117.42488591])

In [None]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([ 86.4313349 , 136.26036192, 102.37229848, 130.96594222,
        71.51001282,  95.58876346, 145.45307758,  99.6443144 ,
       117.66910509, 100.90897139, 105.79854909,  65.61853079,
        61.25308387, 136.20186666,  98.2853987 ,  95.89200794,
       108.13335962, 103.28218317,  82.20228298,  51.65453551,
        74.33360628,  59.98012969,  98.33527118,  53.60288822,
       113.11957528,  93.48076714,  68.75816694, 109.45244309,
        95.11312123,  93.50623412,  81.92380831, 108.79717569,
       110.91507104, 110.90659489, 110.64612004, 101.97866754,
       109.96005723,  93.83252271,  72.94861669, 126.64387245,
       102.95343915,  70.45436039,  96.8330074 ,  88.60131986,
       111.83387839,  66.80264034, 117.79624187, 155.90583658,
        70.35151988, 117.67758019])
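
The arrays above are drawn without a seed, so every run of the notebook produces different values and different metric results. A minimal sketch of a reproducible variant follows, assuming NumPy's `default_rng` generator; the seed value 42 and the `*_demo` / `*_again` names are arbitrary choices for illustration and are not part of the original notebook.

```python
import numpy as np

# Seed 42 is an arbitrary, illustrative choice.
rng = np.random.default_rng(42)

# Same construction as above: targets around 100 (std 20),
# predictions = targets plus Gaussian noise (std 10).
y_true_demo = 100 + 20 * rng.standard_normal(50)
y_pred_demo = y_true_demo + 10 * rng.standard_normal(50)

# A fresh generator with the same seed reproduces exactly the same draws.
rng_again = np.random.default_rng(42)
y_true_again = 100 + 20 * rng_again.standard_normal(50)

print(np.array_equal(y_true_demo, y_true_again))
```

With a fixed seed, the metrics computed later can be compared between runs and between implementations.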

In [None]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()

| | y_true | y_pred |
|---|---|---|
| 0 | 91.872313 | 86.431335 |
| 1 | 136.583116 | 136.260362 |
| 2 | 118.446015 | 102.372298 |
| 3 | 123.597634 | 130.965942 |
| 4 | 86.897222 | 71.510013 |


In [None]:
results['error'] = results['y_true'] - results['y_pred']
results.head()

| | y_true | y_pred | error |
|---|---|---|---|
| 0 | 91.872313 | 86.431335 | 5.440978 |
| 1 | 136.583116 | 136.260362 | 0.322754 |
| 2 | 118.446015 | 102.372298 | 16.073717 |
| 3 | 123.597634 | 130.965942 | -7.368308 |
| 4 | 86.897222 | 71.510013 | 15.387209 |



### <a name='a2'></a> Graphical interpretation

In [None]:
def plot_regression_results(y_true, y_pred):
    """Scatter predictions against true values, with the ideal y_pred = y_true line."""
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Common range for the reference line; the names avoid shadowing the built-ins min() and max()
    lower = results[['y_true', 'y_pred']].min().min()
    upper = results[['y_true', 'y_pred']].max().max()

    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                    go.Scatter(x=[lower, upper], y=[lower, upper])],
                    layout=go.Layout(showlegend=False, width=800, height=500,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression results'))
    fig.show()

plot_regression_results(y_true, y_pred)

In [None]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true,i} - y_{pred,i}|$$

In [None]:
def mean_absolute_error(y_true, y_pred):
    return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

7.619688270606848

In [None]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)

7.619688270606848
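
To make the formula easy to check by hand, here is a small sketch on four made-up values. The arrays `y_true_small` and `y_pred_small` are illustrative and are not data from this notebook.

```python
import numpy as np

# Four made-up samples, small enough to verify by hand.
y_true_small = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_small = np.array([2.5, 0.0, 2.0, 8.0])

# Absolute error per sample: [0.5, 0.5, 0.0, 1.0]
abs_errors = np.abs(y_true_small - y_pred_small)

# MAE is the mean of these values: (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5
mae_small = abs_errors.mean()
print(mae_small)  # 0.5
```

MAE is expressed in the same units as the target variable, which makes it straightforward to interpret.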

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}$$

In [None]:
def mean_squared_error(y_true, y_pred):
    return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

92.46509955021571

In [None]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

92.46509955021571
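
Because the errors are squared, MSE penalizes a single large error more heavily than several small ones. The sketch below illustrates this on made-up predictions that share the same MAE; the helper names `mae_sketch` and `mse_sketch` and all arrays are hypothetical and used only for this illustration.

```python
import numpy as np

def mae_sketch(y_true, y_pred):
    # Mean of absolute errors
    return np.abs(y_true - y_pred).mean()

def mse_sketch(y_true, y_pred):
    # Mean of squared errors
    return ((y_true - y_pred) ** 2).mean()

y_true_zero = np.zeros(4)

# Same total absolute error (8), spread evenly vs. concentrated in one sample.
pred_even = np.array([2.0, 2.0, 2.0, 2.0])
pred_outlier = np.array([0.0, 0.0, 0.0, 8.0])

print(mae_sketch(y_true_zero, pred_even), mae_sketch(y_true_zero, pred_outlier))  # 2.0 2.0
print(mse_sketch(y_true_zero, pred_even), mse_sketch(y_true_zero, pred_outlier))  # 4.0 16.0
```

Both prediction sets have an MAE of 2, yet the one containing an outlier has an MSE four times larger.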

### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [None]:
def root_mean_squared_error(y_true, y_pred):
    return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

9.615877471672343

In [None]:
np.sqrt(mean_squared_error(y_true, y_pred))

9.615877471672343
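
Taking the square root brings the error back to the units of the target variable. The following sketch shows this on made-up values: rescaling the data by a factor of 10 multiplies MSE by 100 but RMSE only by 10. The name `rmse_sketch`, the arrays, and the scale factor are assumptions made for this illustration. As far as I know, scikit-learn 1.4 and later also provide `sklearn.metrics.root_mean_squared_error`; the sketch sticks to NumPy so that it does not depend on the installed version.

```python
import numpy as np

def rmse_sketch(y_true, y_pred):
    # Square root of the mean squared error
    return np.sqrt(((y_true - y_pred) ** 2).mean())

y_true_small = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_small = np.array([2.5, 0.0, 2.0, 8.0])

mse_small = ((y_true_small - y_pred_small) ** 2).mean()  # (0.25 + 0.25 + 0 + 1) / 4 = 0.375
rmse_small = rmse_sketch(y_true_small, y_pred_small)

# Rescale both arrays, e.g. a change of units by a factor of 10.
scale = 10.0
mse_scaled = ((scale * y_true_small - scale * y_pred_small) ** 2).mean()
rmse_scaled = rmse_sketch(scale * y_true_small, scale * y_pred_small)

print(mse_scaled / mse_small)    # ratio close to 100
print(rmse_scaled / rmse_small)  # ratio close to 10
```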

### <a name='a6'></a>  Max Error

$$ME = \max_{i}|y_{true,i} - y_{pred,i}|$$

In [None]:
def max_error(y_true, y_pred):
    return abs(y_true - y_pred).max()

In [None]:
max_error(y_true, y_pred)

30.437870047583985

In [None]:
from sklearn.metrics import max_error

max_error(y_true, y_pred)

30.437870047583985
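
Max error reports only the single worst prediction, so it is often useful to know which sample produced it. A minimal sketch on made-up values follows, using `np.argmax` to locate that sample; the array and variable names are illustrative only.

```python
import numpy as np

y_true_small = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_small = np.array([2.5, 0.0, 2.0, 8.0])

# Absolute error per sample: [0.5, 0.5, 0.0, 1.0]
abs_errors = np.abs(y_true_small - y_pred_small)

worst_error = abs_errors.max()           # the max error itself
worst_index = int(np.argmax(abs_errors)) # position of the worst prediction

print(worst_error, worst_index)  # 1.0 3
```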

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}}{\sum_{i=1}^{n}(y_{true,i} - \overline{y_{true}})^{2}}$$

In [None]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.7495827323771318

In [None]:
def r2_score(y_true, y_pred):
    numerator = ((y_true - y_pred) ** 2).sum()
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    # NumPy division by zero does not raise ZeroDivisionError, so check explicitly
    if denominator == 0:
        raise ValueError('R2 is undefined when y_true is constant')
    return 1 - numerator / denominator

In [None]:
r2_score(y_true, y_pred)

0.7580519720109518
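
On identical inputs the manual formula and the scikit-learn function return the same value, so the two different outputs above most likely come from the random data being regenerated between cell runs. The sketch below checks three reference points of the formula on made-up data: a perfect prediction gives 1, always predicting the mean gives 0, and a model worse than the mean gives a negative score. It also checks the identity R2 = 1 - MSE / Var(y_true). The name `r2_sketch` and the arrays are hypothetical.

```python
import numpy as np

def r2_sketch(y_true, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares)
    numerator = ((y_true - y_pred) ** 2).sum()
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - numerator / denominator

y_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r2_perfect = r2_sketch(y_demo, y_demo)                           # exact predictions
r2_mean = r2_sketch(y_demo, np.full_like(y_demo, y_demo.mean())) # always predict the mean
r2_bad = r2_sketch(y_demo, y_demo[::-1])                         # reversed order, worse than the mean

print(r2_perfect, r2_mean, r2_bad)  # 1.0 0.0 -3.0

# Equivalent form: R2 = 1 - MSE / Var(y_true), using the population variance (ddof=0).
mse_bad = ((y_demo - y_demo[::-1]) ** 2).mean()
r2_from_mse = 1 - mse_bad / y_demo.var()
print(np.isclose(r2_bad, r2_from_mse))
```

The negative value shows that R2 is not bounded below by zero: a model can perform worse than the constant mean predictor.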