<a href="https://colab.research.google.com/github/kurek0010/data-science-bootcamp/blob/main/06_uczenie_maszynowe/03_metryki_regresja.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* @author: krakowiakpawel9@gmail.com  
* @site: e-smartdata.org

### scikit-learn
>Library website: [https://scikit-learn.org](https://scikit-learn.org)  
>
>Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)
>
>The core machine learning library for Python.
>
>To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics - Regression problem:
1. [Import libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

    

### <a name='a0'></a>  Import libraries

In [1]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [2]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([ 65.2098087 , 104.00071204,  70.69983052,  68.61141868,
        77.51454487,  80.64487189,  91.88593939,  91.89761679,
        92.92788113,  87.21725444, 113.10931365,  99.09574681,
       138.19961933, 114.86081782, 134.45239538, 101.06907811,
        91.07769673, 111.4488275 ,  97.13980962,  83.26228415,
        66.23155612, 119.75743666, 101.50563996,  83.29080164,
       121.13071311, 107.24174177,  93.02124272, 101.32399643,
        77.72255659,  98.59193208,  92.51785433, 121.48217175,
        77.65556536, 102.87734735,  94.5418033 , 104.91683336,
        99.96762523,  89.09272233, 134.17845836, 130.89755005,
        75.58876231, 141.76401613,  97.56606779,  78.43019099,
       133.83754576, 129.26885055, 100.27154369,  81.60415287,
       114.51333256, 122.39784969])

In [3]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([ 68.6359407 ,  86.53413793,  60.7992371 ,  63.75664362,
        73.65074281,  96.41832432,  80.2450281 , 100.61757829,
       103.35719786,  81.20991222, 108.26122872, 105.33749961,
       150.42393172, 114.73594378, 114.56358272, 104.57351276,
       102.85793289, 106.19593634, 105.85037929, 103.41130921,
        42.80121546, 106.02905034, 110.28380315, 101.20523442,
       134.50146274,  87.85594543,  80.13305878,  86.34721062,
        80.15163027,  98.51420245,  96.54384526, 122.7195952 ,
        72.26045183, 110.5378878 ,  94.76929465, 104.70947414,
       109.14659896,  76.50073863, 146.14274598, 133.58177771,
        61.78723254, 145.64528722, 103.40388384,  92.59614856,
       147.90969305, 129.34961772, 109.61292843,  90.85212201,
       128.2188276 , 125.36880249])
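The arrays above come from an unseeded generator, so the values differ on every run. Below is a minimal sketch of a reproducible variant. It assumes NumPy's `np.random.default_rng` and an arbitrary seed of 42; neither appears in the original notebook, and the `_demo` names are hypothetical.

```python
import numpy as np

# Assumption: seed 42 is arbitrary; any fixed integer gives repeatable draws.
rng = np.random.default_rng(42)

# Same construction as above: targets around 100, predictions = targets + noise.
y_true_demo = 100 + 20 * rng.standard_normal(50)
y_pred_demo = y_true_demo + 10 * rng.standard_normal(50)

# Re-creating the generator with the same seed reproduces the same targets.
rng_again = np.random.default_rng(42)
y_true_again = 100 + 20 * rng_again.standard_normal(50)
print(np.array_equal(y_true_demo, y_true_again))  # True
```

With a fixed seed, the metric values computed later in the notebook would stay the same between runs.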

In [4]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()


| | y_true | y_pred |
|---|---|---|
| 0 | 65.209809 | 68.635941 |
| 1 | 104.000712 | 86.534138 |
| 2 | 70.699831 | 60.799237 |
| 3 | 68.611419 | 63.756644 |
| 4 | 77.514545 | 73.650743 |


In [5]:
results['error'] = results['y_true'] - results['y_pred']
results.head(10)

| | y_true | y_pred | error |
|---|---|---|---|
| 0 | 65.209809 | 68.635941 | -3.426132 |
| 1 | 104.000712 | 86.534138 | 17.466574 |
| 2 | 70.699831 | 60.799237 | 9.900593 |
| 3 | 68.611419 | 63.756644 | 4.854775 |
| 4 | 77.514545 | 73.650743 | 3.863802 |
| 5 | 80.644872 | 96.418324 | -15.773452 |
| 6 | 91.885939 | 80.245028 | 11.640911 |
| 7 | 91.897617 | 100.617578 | -8.719962 |
| 8 | 92.927881 | 103.357198 | -10.429317 |
| 9 | 87.217254 | 81.209912 | 6.007342 |



### <a name='a2'></a> Graphical interpretation

In [6]:
def plot_regression_results(y_true, y_pred):
    """Scatter y_pred against y_true and draw the ideal line y_pred == y_true."""
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Named lo/hi to avoid shadowing the built-ins min() and max()
    lo = results[['y_true', 'y_pred']].min().min()
    hi = results[['y_true', 'y_pred']].max().max()

    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                          go.Scatter(x=[lo, hi], y=[lo, hi])],
                    layout=go.Layout(showlegend=False, width=800, height=500,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression results'))
    fig.show()


plot_regression_results(y_true, y_pred)

In [7]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)
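The histogram should look roughly bell-shaped and centred at zero, because by construction the error is just the injected noise with its sign flipped. The sketch below checks this numerically. It assumes a seeded generator so the numbers are repeatable; the notebook itself is unseeded and the `_demo` names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: fixed seed for repeatability
y_true_demo = 100 + 20 * rng.standard_normal(1000)
y_pred_demo = y_true_demo + 10 * rng.standard_normal(1000)
error_demo = y_true_demo - y_pred_demo

# The error equals minus the injected noise, so its mean should be close to 0
# and its standard deviation close to 10.
print(f"mean error:   {error_demo.mean():.2f}")
print(f"std of error: {error_demo.std():.2f}")
```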

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true} - y_{pred}|$$

In [8]:
def mean_absolute_error(y_true, y_pred):
    return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

8.134597070989171

In [9]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)

8.134597070989171
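To make the formula concrete, here is a small hand-checkable example. The four values are made up for illustration and are not taken from the notebook's data.

```python
import numpy as np

# Made-up values chosen so the arithmetic is easy to verify by hand.
y_true_small = np.array([3.0, 5.0, 2.0, 7.0])
y_pred_small = np.array([2.0, 5.0, 4.0, 8.0])

abs_errors = np.abs(y_true_small - y_pred_small)  # [1., 0., 2., 1.]
mae_small = abs_errors.mean()                     # (1 + 0 + 2 + 1) / 4
print(mae_small)  # 1.0
```

MAE is expressed in the same units as the target, so a value of 1.0 here means the predictions are off by one unit on average.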

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}$$

In [10]:
def mean_squared_error(y_true, y_pred):
    return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

102.35537136983471

In [11]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

102.35537136983471
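Because the errors are squared, MSE reacts much more strongly to a single large error than MAE does. The following sketch illustrates this on made-up values; the helpers `mae` and `mse` are hypothetical names that mirror the formulas above.

```python
import numpy as np

def mae(y_true, y_pred):
    # Hypothetical helper: mean of absolute errors
    return np.abs(y_true - y_pred).mean()

def mse(y_true, y_pred):
    # Hypothetical helper: mean of squared errors
    return ((y_true - y_pred) ** 2).mean()

y_true_small = np.array([10.0, 10.0, 10.0, 10.0])
y_pred_close = np.array([11.0, 9.0, 11.0, 9.0])     # every error has size 1
y_pred_outlier = np.array([11.0, 9.0, 11.0, 19.0])  # last error has size 9

print(mae(y_true_small, y_pred_close), mse(y_true_small, y_pred_close))      # 1.0 1.0
print(mae(y_true_small, y_pred_outlier), mse(y_true_small, y_pred_outlier))  # 3.0 21.0
```

One outlier triples the MAE but multiplies the MSE by 21, which is why MSE is often preferred when large errors are especially costly.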

### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [12]:
def root_mean_squared_error(y_true, y_pred):
    return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

10.117083145345536

In [13]:
np.sqrt(mean_squared_error(y_true, y_pred))

10.117083145345536
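Taking the square root brings the metric back to the units of the target, whereas MSE is in squared units. The sketch below illustrates this by rescaling made-up data; the kilometre and metre units and the helper names are assumptions for illustration only.

```python
import numpy as np

def mse(y_true, y_pred):
    # Hypothetical helper: mean of squared errors
    return ((y_true - y_pred) ** 2).mean()

def rmse(y_true, y_pred):
    # Hypothetical helper: square root of the MSE
    return np.sqrt(mse(y_true, y_pred))

# Assumption for illustration: targets measured in kilometres.
y_true_km = np.array([1.0, 2.0, 3.0])
y_pred_km = np.array([1.5, 2.0, 2.0])

# The same data expressed in metres.
y_true_m, y_pred_m = 1000 * y_true_km, 1000 * y_pred_km

mse_ratio = mse(y_true_m, y_pred_m) / mse(y_true_km, y_pred_km)
rmse_ratio = rmse(y_true_m, y_pred_m) / rmse(y_true_km, y_pred_km)
print(mse_ratio, rmse_ratio)  # approximately 1e6 and 1e3
```

Rescaling the target by 1000 scales RMSE by 1000 but MSE by a factor of one million, so RMSE is usually easier to interpret next to the raw data.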

### <a name='a6'></a>  Max Error

$$ME = \max(|y_{true} - y_{pred}|)$$

In [14]:
def max_error(y_true, y_pred):
    return abs(y_true - y_pred).max()

In [15]:
max_error(y_true, y_pred)

33.50123368867641

In [16]:
from sklearn.metrics import max_error

max_error(y_true, y_pred)

33.50123368867641
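The max error reports only how large the worst miss is, not where it occurs. A minimal sketch of locating the offending observation with `np.argmax`, using made-up values:

```python
import numpy as np

# Made-up values for illustration.
y_true_small = np.array([10.0, 20.0, 30.0, 40.0])
y_pred_small = np.array([12.0, 19.0, 37.0, 40.0])

abs_errors = np.abs(y_true_small - y_pred_small)  # [2., 1., 7., 0.]
worst_idx = int(np.argmax(abs_errors))            # position of the largest error
print(worst_idx, abs_errors[worst_idx])           # 2 7.0
```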

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}}{\sum_{i=1}^{n}(y_{true} - \overline{y_{true}})^{2}}$$

In [17]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.735868040061618

In [None]:
def r2_score(y_true, y_pred):
    numerator = ((y_true - y_pred) ** 2).sum()
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    # NumPy floats do not raise ZeroDivisionError, so check the denominator explicitly
    if denominator == 0:
        print('Division by zero')
        return np.nan
    return 1 - numerator / denominator

In [18]:
r2_score(y_true, y_pred)

0.735868040061618
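Three reference points help when reading an R2 value: a perfect model scores 1, a model that always predicts the mean of `y_true` scores 0, and a model worse than that baseline scores below 0. The sketch below checks these cases on made-up values; `r2_demo` is a hypothetical helper that follows the formula above.

```python
import numpy as np

def r2_demo(y_true, y_pred):
    # Hypothetical helper: 1 - (residual sum of squares / total sum of squares)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

y_true_small = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # mean 3, total sum of squares 10

r2_perfect = r2_demo(y_true_small, y_true_small)                       # residuals are all 0
r2_baseline = r2_demo(y_true_small, np.full(5, y_true_small.mean()))   # always predict the mean
r2_reversed = r2_demo(y_true_small, y_true_small[::-1])                # residual sum of squares 40

print(r2_perfect, r2_baseline, r2_reversed)  # 1.0 0.0 -3.0
```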