<a href="https://colab.research.google.com/github/szarpan/data-science-bootcamp/blob/main/06_uczenie_maszynowe/03_metryki_regresja.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* @author: krakowiakpawel9@gmail.com  
* @site: e-smartdata.org

### scikit-learn
>Library website: [https://scikit-learn.org](https://scikit-learn.org)  
>
>Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)
>
>The core machine learning library for Python.
>
>To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics - Regression problem:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

    

### <a name='a0'></a>  Importing libraries

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [None]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([ 55.83305101, 108.8709246 , 103.31428952, 105.70692292,
       106.01355385, 130.67504907, 115.81227799,  51.00108041,
        91.32509971, 106.43204073,  82.05819029,  93.52359033,
        98.37670887, 103.39899636,  92.41134459,  96.49100897,
       132.83922635, 109.38091874, 107.15026685, 104.99898953,
       109.81193246,  64.55421937, 113.26996181,  89.65327897,
       125.22902369, 105.63983923, 116.84063843,  73.76374526,
       129.0271842 , 109.09601011,  95.53239539,  82.68921518,
        88.66478054,  85.13138219, 137.51522206,  89.73191998,
        87.19796779,  60.0872086 ,  72.83531364,  98.67432564,
        76.60119761, 114.93751774,  96.1908119 , 115.06174635,
        71.48794912, 119.09921435, 120.65584661,  37.22079732,
       129.35087413,  77.02981311])

In [None]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([ 54.32159813, 106.26023522, 114.39660879, 112.07804685,
       106.0162226 , 141.63364726, 114.56997706,  55.01169579,
        77.00489765, 117.73495593,  84.1714808 , 106.01266771,
        97.31511906, 109.98086126,  92.44554922, 106.758285  ,
       114.57091112,  95.78556163, 100.83845087, 121.52746445,
       111.52745638,  47.25221762, 118.80783518,  97.84925819,
       136.34809869, 113.93005633, 121.41700926,  78.67247712,
       128.05887509, 106.25536497, 112.91822712,  78.1435241 ,
       114.8893338 ,  84.97060421, 141.96051187,  80.98547532,
        76.51442847,  45.70917295,  70.88060819,  91.52769274,
        92.28227839, 124.26624769, 109.23797686, 114.15221044,
        76.05796327, 111.06776028, 121.74601233,  43.22201026,
       135.4087355 ,  73.50222759])
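
The values printed above change on every run because no random seed is set. A minimal sketch of a reproducible variant, assuming repeatability is wanted: it uses NumPy's `np.random.default_rng`, and both the seed value 42 and the `*_demo` names are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# The seed value 42 is arbitrary and only serves this illustration
rng = np.random.default_rng(42)

# Same construction as above: targets around 100, predictions = targets + noise
y_true_demo = 100 + 20 * rng.standard_normal(50)
y_pred_demo = y_true_demo + 10 * rng.standard_normal(50)

# Re-creating the generator with the same seed reproduces the same sample
rng_again = np.random.default_rng(42)
y_true_again = 100 + 20 * rng_again.standard_normal(50)

print(np.array_equal(y_true_demo, y_true_again))
```

With a fixed seed, the metric values computed later would stay the same across runs, which makes the notebook easier to compare and debug.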

In [None]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()

| | y_true | y_pred |
|---|---|---|
| 0 | 55.833051 | 54.321598 |
| 1 | 108.870925 | 106.260235 |
| 2 | 103.31429 | 114.396609 |
| 3 | 105.706923 | 112.078047 |
| 4 | 106.013554 | 106.016223 |


In [None]:
results['error'] = results['y_true'] - results['y_pred']
results.head()

| | y_true | y_pred | error |
|---|---|---|---|
| 0 | 55.833051 | 54.321598 | 1.511453 |
| 1 | 108.870925 | 106.260235 | 2.610689 |
| 2 | 103.31429 | 114.396609 | -11.082319 |
| 3 | 105.706923 | 112.078047 | -6.371124 |
| 4 | 106.013554 | 106.016223 | -0.002669 |



### <a name='a2'></a> Graphical interpretation

In [None]:
def plot_regression_results(y_true, y_pred):
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Common axis range; the names avoid shadowing the built-ins min() and max()
    lo = results[['y_true', 'y_pred']].min().min()
    hi = results[['y_true', 'y_pred']].max().max()

    # Scatter of predictions against targets, plus the y = x reference line
    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                    go.Scatter(x=[lo, hi], y=[lo, hi])],
                    layout=go.Layout(showlegend=False, width=800, height=500,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression results'))
    fig.show()

plot_regression_results(y_true, y_pred)

In [None]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)
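
The histogram can be complemented with two summary numbers. By construction the error is minus the added noise, so it should be centred near 0 with a standard deviation near 10. A minimal sketch, assuming a seeded generator (the seed 0 and the `*_demo` names are illustrative assumptions) so the check is repeatable:

```python
import numpy as np

# Arbitrary seed, used only to make this illustration repeatable
rng = np.random.default_rng(0)

# Same construction as in the cell above, with 1000 samples
y_true_demo = 100 + 20 * rng.standard_normal(1000)
y_pred_demo = y_true_demo + 10 * rng.standard_normal(1000)
error_demo = y_true_demo - y_pred_demo

# Summary statistics of the error distribution
error_mean = error_demo.mean()
error_std = error_demo.std()
print(f'mean = {error_mean:.2f}, std = {error_std:.2f}')
```

Because the mean error is close to 0, the mean squared error of such data should come out near 10² = 100, which is in line with the MSE value reported in the MSE section.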

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true} - y_{pred}|$$

In [None]:
def mean_absolute_error(y_true, y_pred):
    return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

7.802812669603978

In [None]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)

7.802812669603978
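
To make the formula concrete, here is a small worked example in plain Python, so every step of the arithmetic is visible. The toy values and the `*_toy` names are hypothetical and chosen only so the numbers are easy to check by hand.

```python
# Hypothetical toy data, chosen so the arithmetic is easy to follow
y_true_toy = [3.0, 5.0, 2.0, 7.0]
y_pred_toy = [2.0, 5.0, 4.0, 8.0]

# Absolute errors: |3-2|, |5-5|, |2-4|, |7-8| -> 1, 0, 2, 1
abs_errors = [abs(t - p) for t, p in zip(y_true_toy, y_pred_toy)]

# MAE is their mean: (1 + 0 + 2 + 1) / 4 = 1.0
mae_toy = sum(abs_errors) / len(abs_errors)
print(mae_toy)  # 1.0
```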

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}$$

In [None]:
def mean_squared_error(y_true, y_pred):
    return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

95.84892821073035

In [None]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

95.84892821073035
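
Squaring the errors makes MSE more sensitive to large individual errors than MAE. A minimal sketch of that effect on hypothetical toy data: both prediction sets have the same MAE, but the one with a single large error has a much higher MSE. The helper names `mae_demo` and `mse_demo` are illustrative and avoid shadowing the functions used above.

```python
import numpy as np

# Hypothetical toy data: the true value is 10 everywhere
y_true_toy = np.array([10.0, 10.0, 10.0, 10.0])
pred_even = np.array([12.0, 12.0, 12.0, 12.0])     # four errors of size 2
pred_outlier = np.array([10.0, 10.0, 10.0, 18.0])  # one error of size 8

def mae_demo(y_true, y_pred):
    # Mean of absolute errors
    return np.abs(y_true - y_pred).mean()

def mse_demo(y_true, y_pred):
    # Mean of squared errors
    return ((y_true - y_pred) ** 2).mean()

# Same MAE for both prediction sets: 8 / 4 = 2.0
print(mae_demo(y_true_toy, pred_even), mae_demo(y_true_toy, pred_outlier))

# MSE differs: 16 / 4 = 4.0 versus 64 / 4 = 16.0
print(mse_demo(y_true_toy, pred_even), mse_demo(y_true_toy, pred_outlier))
```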

### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [None]:
def root_mean_squared_error(y_true, y_pred):
    return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

9.790246585798048

In [None]:
np.sqrt(mean_squared_error(y_true, y_pred))

9.790246585798048
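
One reason to report RMSE rather than MSE is that RMSE is expressed in the same units as the target. A minimal sketch on hypothetical toy data: converting the values from metres to centimetres multiplies RMSE by 100 but MSE by 100² = 10000. The units, values and the `*_demo` names are assumptions made for illustration.

```python
import numpy as np

def mse_demo(y_true, y_pred):
    # Mean of squared errors
    return ((y_true - y_pred) ** 2).mean()

def rmse_demo(y_true, y_pred):
    # Square root of the mean squared error
    return np.sqrt(mse_demo(y_true, y_pred))

# Hypothetical measurements in metres
y_true_m = np.array([3.0, 5.0, 2.0, 7.0])
y_pred_m = np.array([2.0, 5.0, 4.0, 8.0])

# The same measurements in centimetres
y_true_cm = y_true_m * 100
y_pred_cm = y_pred_m * 100

# MSE scales with the square of the unit change, RMSE scales linearly
print(mse_demo(y_true_m, y_pred_m), mse_demo(y_true_cm, y_pred_cm))
print(rmse_demo(y_true_m, y_pred_m), rmse_demo(y_true_cm, y_pred_cm))
```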

### <a name='a6'></a>  Max Error

$$ME = \max(|y_{true} - y_{pred}|)$$

In [None]:
def max_error(y_true, y_pred):
    return abs(y_true - y_pred).max()

In [None]:
max_error(y_true, y_pred)

30.81917349365932

In [None]:
from sklearn.metrics import max_error

max_error(y_true, y_pred)

30.81917349365932
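
The maximum error says how bad the worst prediction is, but not which sample caused it. A minimal sketch, on hypothetical toy data, of locating that sample with `np.argmax`:

```python
import numpy as np

# Hypothetical toy data
y_true_toy = np.array([3.0, 5.0, 2.0, 7.0])
y_pred_toy = np.array([2.0, 5.0, 4.0, 8.0])

# Absolute errors per sample: 1, 0, 2, 1
abs_errors = np.abs(y_true_toy - y_pred_toy)

# Index of the worst prediction and the size of its error
worst_idx = int(np.argmax(abs_errors))
print(worst_idx, abs_errors[worst_idx])  # 2 2.0
```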

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}}{\sum_{i=1}^{n}(y_{true} - \overline{y_{true}})^{2}}$$

In [None]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.7580519720109518

In [None]:
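def r2_score(y_true, y_pred):
    # Residual sum of squares and total sum of squares
    numerator = ((y_true - y_pred) ** 2).sum()
    denominator = ((y_true - y_true.mean()) ** 2).sum()
    # NumPy division by zero does not raise ZeroDivisionError, so check explicitly
    if denominator == 0:
        raise ValueError('y_true is constant, so R2 is undefined (division by zero)')
    return 1 - numerator / denominator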

In [None]:
r2_score(y_true, y_pred)

0.7580519720109518
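
Three reference points help when reading an R2 value: a perfect model scores 1, a model that always predicts the mean of `y_true` scores 0, and a model worse than that mean baseline scores below 0. A minimal sketch on hypothetical toy data, using an illustrative helper `r2_demo` that follows the formula above:

```python
import numpy as np

def r2_demo(y_true, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1 - ss_res / ss_tot

# Hypothetical toy targets
y_true_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Perfect predictions: residuals are all zero
r2_perfect = r2_demo(y_true_toy, y_true_toy.copy())

# Always predicting the mean: residual sum equals total sum
r2_mean = r2_demo(y_true_toy, np.full_like(y_true_toy, y_true_toy.mean()))

# Reversed predictions: worse than the mean baseline (1 - 40 / 10)
r2_reversed = r2_demo(y_true_toy, y_true_toy[::-1])

print(r2_perfect, r2_mean, r2_reversed)  # 1.0 0.0 -3.0
```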