<a href="https://colab.research.google.com/github/sebekpro/OculusRiftInAction/blob/master/06_uczenie_maszynowe/03_metryki_regresja.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* @author: krakowiakpawel9@gmail.com  
* @site: e-smartdata.org

### scikit-learn
>Library website: [https://scikit-learn.org](https://scikit-learn.org)  
>
>Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)
>
>A core machine learning library for Python.
>
>To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics - Regression problem:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

    

### <a name='a0'></a>  Importing libraries

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [None]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([100.7367195 , 101.65159728,  84.24599086,  89.14513935,
       121.52901072,  87.69250891,  73.12731193, 108.72308229,
       100.39086177, 120.15909071, 115.27362614, 128.09167576,
        92.59479064,  80.19131329, 103.86775422, 109.91801851,
       120.37973818,  86.50291957, 121.97710474, 104.89224632,
       115.90332196,  77.48656675, 131.51512122,  86.9276638 ,
        86.60910037, 123.64224561,  96.8541802 ,  86.67701011,
       119.04696991,  70.62401906,  95.460972  ,  97.44276247,
        90.39650547,  97.83514879,  89.4489184 ,  98.56452517,
       139.79963122, 108.93267912, 106.89231655, 123.61787538,
        74.44107721, 122.56433414,  87.59079107,  82.05575793,
       130.26475773,  75.18403842, 102.89443578,  67.79936786,
       112.87437366, 147.42119357])

In [None]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([ 97.63290308, 100.13616571,  88.8926698 ,  79.76003349,
       108.38849134, 100.86353767,  73.36808135,  95.94904531,
       105.42726587, 129.28082892, 110.01443034, 126.70569968,
        94.75261542,  83.82423382,  92.98096771, 113.92207587,
       129.85891497,  99.36726984, 128.90179841, 107.615502  ,
       120.85792324,  75.42792976, 139.84123598, 100.7764949 ,
        88.98244084, 119.55091231, 105.41224132, 102.89767125,
       108.53782635,  50.42297338,  91.30949439,  96.22361272,
        94.64734916,  94.07345612, 102.80126162, 101.48745019,
       141.90266124, 107.41523056, 128.38028204, 129.46066223,
        73.47225881, 129.18477224,  79.41103218,  83.87979332,
       128.66252669,  61.28021212, 114.81524365,  67.55910636,
       118.43286482, 145.29773499])
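
The arrays above come from unseeded random draws, so rerunning the cells produces different values (and different metric results further down). A minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator; the seed `42` and the `*_demo` / `*_again` names are illustrative and not part of the original notebook:

```python
import numpy as np

# The seed value 42 is an arbitrary, illustrative choice.
rng = np.random.default_rng(seed=42)

y_true_demo = 100 + 20 * rng.standard_normal(50)          # "ground truth" centered at 100
y_pred_demo = y_true_demo + 10 * rng.standard_normal(50)  # ground truth plus noise

# Re-creating the generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(seed=42)
y_true_again = 100 + 20 * rng_again.standard_normal(50)

print(np.array_equal(y_true_demo, y_true_again))  # True
```

Separate variable names are used so the notebook's own `y_true` and `y_pred` are not overwritten.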

In [None]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()

| | y_true | y_pred |
|---|---|---|
| 0 | 100.73672 | 97.632903 |
| 1 | 101.651597 | 100.136166 |
| 2 | 84.245991 | 88.89267 |
| 3 | 89.145139 | 79.760033 |
| 4 | 121.529011 | 108.388491 |


In [None]:
results['error'] = results['y_true'] - results['y_pred']
results.head()

| | y_true | y_pred | error |
|---|---|---|---|
| 0 | 100.73672 | 97.632903 | 3.103816 |
| 1 | 101.651597 | 100.136166 | 1.515432 |
| 2 | 84.245991 | 88.89267 | -4.646679 |
| 3 | 89.145139 | 79.760033 | 9.385106 |
| 4 | 121.529011 | 108.388491 | 13.140519 |
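
The `error` column is signed, so positive and negative errors can cancel when averaged. A small sketch with made-up numbers (the `*_toy` arrays are hypothetical) shows why the metrics in this notebook work with absolute or squared errors instead:

```python
import numpy as np

# Hypothetical two-point example: one under-prediction, one over-prediction.
y_true_toy = np.array([10.0, 20.0])
y_pred_toy = np.array([5.0, 25.0])

error_toy = y_true_toy - y_pred_toy          # signed errors: +5 and -5
mean_error = error_toy.mean()                # signed errors cancel out
mean_abs_error = np.abs(error_toy).mean()    # absolute errors do not

print(mean_error, mean_abs_error)  # 0.0 5.0
```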



### <a name='a2'></a> Graphical interpretation

In [None]:
def plot_regression_results(y_true, y_pred):
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Common range for both axes; the names avoid shadowing the built-ins min() and max()
    lo = results[['y_true', 'y_pred']].min().min()
    hi = results[['y_true', 'y_pred']].max().max()

    # Scatter of predictions plus the diagonal y_pred = y_true (perfect predictions)
    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                    go.Scatter(x=[lo, hi], y=[lo, hi])],
                    layout=go.Layout(showlegend=False, width=800, height=500,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression results'))
    fig.show()

plot_regression_results(y_true, y_pred)

In [None]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)
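
Because `y_pred` is built as `y_true` plus noise scaled by 10, the errors in the histogram should be centered near 0 with a spread of roughly 10. A minimal numeric check, assuming a seeded generator (the seed `0` and the `*_sim` names are illustrative, not from the original notebook):

```python
import numpy as np

# Same construction as the cell above, but seeded so the check is repeatable.
rng = np.random.default_rng(seed=0)
y_true_sim = 100 + 20 * rng.standard_normal(1000)
y_pred_sim = y_true_sim + 10 * rng.standard_normal(1000)
error_sim = y_true_sim - y_pred_sim

error_mean = error_sim.mean()  # expected to be close to 0
error_std = error_sim.std()    # expected to be close to 10

print(round(error_mean, 2), round(error_std, 2))
```

The exact values depend on the seed; only their approximate size is implied by the construction.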

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true,i} - y_{pred,i}|$$

In [None]:
def mean_absolute_error(y_true, y_pred):
    return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

7.771271560850102

In [None]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)

7.771271560850102
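
To make the formula concrete, here is a worked example on three hand-picked values (the `*_small` arrays are hypothetical), small enough to verify by hand:

```python
import numpy as np

# Hypothetical three-point example, chosen so the arithmetic is easy to follow.
y_true_small = np.array([3.0, 5.0, 2.0])
y_pred_small = np.array([2.0, 5.0, 4.0])

abs_errors = np.abs(y_true_small - y_pred_small)  # |3-2|, |5-5|, |2-4| -> 1, 0, 2
mae_small = abs_errors.mean()                     # (1 + 0 + 2) / 3

print(mae_small)  # 1.0
```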

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}$$

In [None]:
def mean_squared_error(y_true, y_pred):
    return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

94.6850979786076

In [None]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

94.6850979786076
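
Squaring weights large errors more heavily than small ones. The sketch below compares two hypothetical error vectors with the same MAE: one spreads the error evenly, the other concentrates it in a single point.

```python
import numpy as np

# Two hypothetical error vectors with the same total absolute error (8).
errors_even = np.array([2.0, 2.0, 2.0, 2.0])   # error spread evenly
errors_spiky = np.array([0.0, 0.0, 0.0, 8.0])  # one large error

mae_even = np.abs(errors_even).mean()    # 8 / 4
mae_spiky = np.abs(errors_spiky).mean()  # 8 / 4

mse_even = (errors_even ** 2).mean()     # (4 + 4 + 4 + 4) / 4
mse_spiky = (errors_spiky ** 2).mean()   # (0 + 0 + 0 + 64) / 4

print(mae_even, mae_spiky)  # 2.0 2.0
print(mse_even, mse_spiky)  # 4.0 16.0
```

MAE cannot tell the two cases apart, while MSE is four times larger for the vector with the single large error.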

### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [None]:
def root_mean_squared_error(y_true, y_pred):
    return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

9.730626802966373

In [None]:
np.sqrt(mean_squared_error(y_true, y_pred))

9.730626802966373
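
One practical reason to report RMSE is that it is expressed in the same units as the target, while MSE is in squared units. A small sketch with made-up lengths (the arrays and the helper names `mse` and `rmse` are hypothetical) shows how each metric reacts to converting meters to centimeters:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of squared differences."""
    return ((y_true - y_pred) ** 2).mean()

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of MSE."""
    return np.sqrt(mse(y_true, y_pred))

# Hypothetical lengths in meters.
y_true_m = np.array([1.0, 2.0, 3.0])
y_pred_m = np.array([1.5, 2.0, 2.0])

# The same data expressed in centimeters.
y_true_cm = y_true_m * 100
y_pred_cm = y_pred_m * 100

rmse_m, rmse_cm = rmse(y_true_m, y_pred_m), rmse(y_true_cm, y_pred_cm)
mse_m, mse_cm = mse(y_true_m, y_pred_m), mse(y_true_cm, y_pred_cm)

print(rmse_cm / rmse_m)  # scales with the unit factor (about 100)
print(mse_cm / mse_m)    # scales with the squared factor (about 10000)
```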

### <a name='a6'></a>  Max Error

$$ME = \max_{i}|y_{true,i} - y_{pred,i}|$$

In [None]:
def max_error(y_true, y_pred):
    return abs(y_true - y_pred).max()

In [None]:
max_error(y_true, y_pred)

35.295737025438

In [None]:
from sklearn.metrics import max_error

max_error(y_true, y_pred)

35.295737025438
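
Max error reports only the single worst prediction, so one outlier determines it completely. A sketch with made-up data (the `*_flat` and `*_outlier` arrays are hypothetical) contrasts it with MAE:

```python
import numpy as np

# Hypothetical data: 100 predictions, each off by exactly 1.
y_true_flat = np.zeros(100)
y_pred_flat = np.ones(100)

mae_before = np.abs(y_true_flat - y_pred_flat).mean()
me_before = np.abs(y_true_flat - y_pred_flat).max()

# Introduce a single bad prediction that is off by 21.
y_pred_outlier = y_pred_flat.copy()
y_pred_outlier[0] = 21.0

mae_after = np.abs(y_true_flat - y_pred_outlier).mean()  # (99 * 1 + 21) / 100
me_after = np.abs(y_true_flat - y_pred_outlier).max()

print(mae_before, me_before)  # 1.0 1.0
print(mae_after, me_after)    # 1.2 21.0
```

The single outlier moves MAE from 1.0 to 1.2, while max error jumps from 1.0 to 21.0.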

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}}{\sum_{i=1}^{n}(y_{true,i} - \overline{y}_{true})^{2}}$$

In [None]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.7644107287039972

In [None]:
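def r2_score(y_true, y_pred):
    numerator = ((y_true - y_pred) ** 2).sum()
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    # NumPy division by zero does not raise ZeroDivisionError (it yields inf/nan
    # with a warning), so the denominator is checked explicitly.
    if denominator == 0:
        print('Division by zero: y_true is constant')
        return np.nan
    return 1 - numerator / denominator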

In [None]:
r2_score(y_true, y_pred)

0.7644107287039972
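
Three reference points help when reading an R2 value: a perfect model scores 1, a model that always predicts the mean of `y_true` scores 0, and a model worse than that baseline scores below 0. A minimal sketch with made-up data (the helper `r2_manual` and the arrays are hypothetical and follow the formula above):

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R2 as 1 - (residual sum of squares) / (total sum of squares)."""
    numerator = ((y_true - y_pred) ** 2).sum()
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - numerator / denominator

y_ref = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r2_perfect = r2_manual(y_ref, y_ref)                              # predictions equal the targets
r2_baseline = r2_manual(y_ref, np.full_like(y_ref, y_ref.mean())) # always predict the mean (3.0)
r2_reversed = r2_manual(y_ref, y_ref[::-1])                       # predictions in reverse order

print(r2_perfect, r2_baseline, r2_reversed)  # 1.0 0.0 -3.0
```

For the reversed predictions the residual sum of squares is 40 against a total sum of squares of 10, which gives 1 - 40/10 = -3.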