<a href="https://colab.research.google.com/github/kszymon/data-science-bootcamp/blob/main/06_uczenie_maszynowe/03_metryki_regresja.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### scikit-learn
>Library website: [https://scikit-learn.org](https://scikit-learn.org)  
>
>Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)
>
>A fundamental machine learning library for Python.
>
>To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics - Regression problem:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

### <a name='a0'></a>  Importing libraries

In [2]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [3]:
y_true = 100 + 20*np.random.randn(50)
y_true

array([ 78.82623319,  84.35402804,  90.34657172,  76.93084846,
       101.44050199, 105.61427331, 119.13462356, 121.33060623,
       104.42164518, 136.78252341, 116.81078737, 101.23405573,
        94.06483678,  99.07405291, 115.67783236,  73.01434732,
       144.28706048,  74.96507053, 122.98421183, 123.09999978,
        79.02500004, 111.16923156,  95.84394058, 124.01519347,
        49.61020343,  70.71950485,  93.36777946, 127.43236561,
        97.86014088,  81.37531135,  60.06378723,  74.78270946,
        90.48934474, 114.23398526,  75.11041359, 107.64840326,
        87.82110676,  78.94250387, 102.40400841, 104.09726922,
       119.47712437,  78.82803208, 123.90728512,  67.88330365,
        89.39971635,  88.00466942,  98.48856128,  74.17174399,
        78.31485876, 119.54304148])

In [5]:
y_pred = y_true + 10*np.random.randn(50)
y_pred

array([ 64.16966413,  92.90885255,  96.96004382,  60.03539473,
        83.9447638 , 105.68718419, 122.56095768, 129.25558092,
        99.26493757, 125.16492628, 126.03642615, 104.49368591,
        94.77276014, 105.18836764, 116.19843305,  67.29232314,
       149.04694522,  79.88097677, 121.09746635, 131.33954081,
        74.63372937, 122.89822953,  76.74819373, 133.7393108 ,
        40.9590869 ,  85.20415951,  99.12546701, 127.65687199,
       113.40537383,  69.97386232,  54.20893125,  77.88162594,
       115.69735839, 114.2254713 ,  81.80852055,  99.97116927,
        91.57695044,  63.04689835, 121.07441164,  90.49227058,
       105.09460815,  89.81545449, 120.62978724,  39.25553191,
        98.02849953,  74.50639891,  86.79096344,  78.37378757,
        64.60405969, 113.4749303 ])

In [6]:
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results.head()

| | y_true | y_pred |
|---|---|---|
| 0 | 78.826233 | 64.169664 |
| 1 | 84.354028 | 92.908853 |
| 2 | 90.346572 | 96.960044 |
| 3 | 76.930848 | 60.035395 |
| 4 | 101.440502 | 83.944764 |


In [7]:
results['error'] = results['y_true'] - results['y_pred']
results.head()

| | y_true | y_pred | error |
|---|---|---|---|
| 0 | 78.826233 | 64.169664 | 14.656569 |
| 1 | 84.354028 | 92.908853 | -8.554825 |
| 2 | 90.346572 | 96.960044 | -6.613472 |
| 3 | 76.930848 | 60.035395 | 16.895454 |
| 4 | 101.440502 | 83.944764 | 17.495738 |



### <a name='a2'></a> Graphical interpretation

In [10]:
def plot_regression_results(y_true, y_pred):
    results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
    # Common axis range; named lo/hi to avoid shadowing the built-in min/max.
    lo = results[['y_true', 'y_pred']].min().min()
    hi = results[['y_true', 'y_pred']].max().max()

    # Scatter of predictions against targets, plus the y_pred = y_true reference line.
    fig = go.Figure(data=[go.Scatter(x=results['y_true'], y=results['y_pred'], mode='markers'),
                          go.Scatter(x=[lo, hi], y=[lo, hi])],
                    layout=go.Layout(showlegend=False, width=800, height=500,
                                     xaxis_title='y_true',
                                     yaxis_title='y_pred',
                                     title='Regression results'))
    fig.show()

plot_regression_results(y_true, y_pred)

In [12]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true': y_true, 'y_pred': y_pred})
results['error'] = results['y_true'] - results['y_pred']

px.histogram(results, x='error', nbins=50, width=800)
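
Because `y_pred` is built by adding Gaussian noise with standard deviation 10 to `y_true`, the errors in the histogram should be centered near 0 with a spread near 10. The sketch below checks this numerically. The seeded generator `rng` and the seed value 42 are assumptions added for reproducibility; the notebook itself draws unseeded samples.

```python
import numpy as np

# Assumption: a seeded generator, used here only so the run is repeatable.
rng = np.random.default_rng(42)

y_true = 100 + 20 * rng.standard_normal(1000)
y_pred = y_true + 10 * rng.standard_normal(1000)
error = y_true - y_pred

# The error equals minus the added noise, so it is roughly N(0, 10^2).
print(f"mean error:   {error.mean():.2f}")
print(f"std of error: {error.std():.2f}")
```

A mean far from 0 would indicate a systematic bias in the predictions, which the symmetric noise used here does not introduce.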


### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true} - y_{pred}|$$

In [13]:
def mean_absolute_error(y_true, y_pred):
  return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

8.084340877284774

In [14]:
from sklearn.metrics import mean_absolute_error

mean_absolute_error(y_true, y_pred)

8.084340877284774
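
To make the formula concrete, here is a small worked example on hand-picked numbers. The arrays `y_true_toy` and `y_pred_toy` are hypothetical and are not part of the notebook's data.

```python
import numpy as np

# Hypothetical toy data, chosen so the arithmetic is easy to follow.
y_true_toy = np.array([3.0, 5.0, 8.0, 10.0])
y_pred_toy = np.array([2.0, 7.0, 8.0, 14.0])

abs_errors = np.abs(y_true_toy - y_pred_toy)  # [1, 2, 0, 4]
mae = abs_errors.mean()                       # (1 + 2 + 0 + 4) / 4

print(abs_errors)
print(mae)  # 1.75
```

Every error contributes in proportion to its size, so the result reads directly as "on average, predictions are off by 1.75 units".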

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}$$

In [15]:
def mean_squared_error(y_true, y_pred):
  return ((y_true - y_pred) ** 2).sum() / len(y_true)

mean_squared_error(y_true, y_pred)

104.43492401829074

In [16]:
from sklearn.metrics import mean_squared_error

mean_squared_error(y_true, y_pred)

104.43492401829074
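
Squaring the errors makes MSE more sensitive to large individual misses than MAE. The sketch below illustrates this with two hypothetical prediction vectors that share the same MAE. The helper names `mae` and `mse` are illustrative and chosen so they do not overwrite the functions defined in the notebook.

```python
import numpy as np

# Hypothetical toy data: every prediction is off by 1 ...
y_true_toy = np.array([10.0, 10.0, 10.0, 10.0, 10.0])
y_pred_small = np.array([11.0, 9.0, 11.0, 9.0, 11.0])
# ... versus four exact predictions and a single miss of 5.
y_pred_outlier = np.array([10.0, 10.0, 10.0, 10.0, 15.0])

def mae(y_true, y_pred):
    return np.abs(y_true - y_pred).mean()

def mse(y_true, y_pred):
    return ((y_true - y_pred) ** 2).mean()

print(mae(y_true_toy, y_pred_small), mse(y_true_toy, y_pred_small))      # 1.0 1.0
print(mae(y_true_toy, y_pred_outlier), mse(y_true_toy, y_pred_outlier))  # 1.0 5.0
```

Both prediction vectors have an MAE of 1, but the single large miss raises the MSE from 1 to 5.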

### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

In [17]:
def root_mean_squared_error(y_true, y_pred):
  return np.sqrt(((y_true - y_pred) ** 2).sum() / len(y_true))

root_mean_squared_error(y_true, y_pred)

10.219340684128833

In [18]:
np.sqrt(mean_squared_error(y_true, y_pred))

10.219340684128833
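
Taking the square root brings the metric back to the units of the target, which makes RMSE directly comparable with MAE. A minimal sketch on hypothetical toy values:

```python
import numpy as np

# Hypothetical toy data (the targets could be, for example, prices).
y_true_toy = np.array([100.0, 150.0, 200.0, 250.0])
y_pred_toy = np.array([110.0, 140.0, 200.0, 220.0])

errors = y_true_toy - y_pred_toy   # [-10, 10, 0, 30]
mse = (errors ** 2).mean()         # (100 + 100 + 0 + 900) / 4 = 275
rmse = np.sqrt(mse)                # about 16.58, in the same units as y
mae = np.abs(errors).mean()        # (10 + 10 + 0 + 30) / 4 = 12.5

print(mse, rmse, mae)
```

RMSE is never smaller than MAE on the same data, and the gap widens when a few errors dominate. If the installed scikit-learn is version 1.4 or later, `sklearn.metrics.root_mean_squared_error` computes the same quantity directly.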

### <a name='a6'></a>  Max Error

### $$ME = \max(|y_{true} - y_{pred}|)$$

In [20]:
def max_error(y_true, y_pred):
  return abs(y_true - y_pred).max()

In [21]:
max_error(y_true, y_pred)

33.64923716993046

In [22]:
from sklearn.metrics import max_error

max_error(y_true, y_pred)

33.64923716993046
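
Max error reports only the single worst prediction, so it is often useful to know which sample produced it. A small sketch on hypothetical toy arrays, using `argmax` to locate that sample:

```python
import numpy as np

# Hypothetical toy data.
y_true_toy = np.array([3.0, 5.0, 8.0, 10.0])
y_pred_toy = np.array([2.0, 7.0, 8.0, 14.0])

abs_errors = np.abs(y_true_toy - y_pred_toy)  # [1, 2, 0, 4]
worst_idx = int(abs_errors.argmax())          # position of the largest error

print(abs_errors.max(), worst_idx)  # 4.0 3
```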

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true} - y_{pred})^{2}}{\sum_{i=1}^{n}(y_{true} - \overline{y_{true}})^{2}}$$

In [23]:
from sklearn.metrics import r2_score

r2_score(y_true, y_pred)

0.7369592467802477

In [26]:
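def r2_score(y_true, y_pred):
  numerator = ((y_true - y_pred) ** 2).sum()
  denominator = ((y_true - y_true.mean()) ** 2).sum()
  # NumPy division by zero yields inf/nan with a warning instead of raising
  # ZeroDivisionError, so the constant-target case is checked explicitly.
  if denominator == 0:
    return 0.0
  return 1 - numerator / denominator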

In [27]:
r2_score(y_true, y_pred)

0.7369592467802477
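
The value of R2 is easier to interpret against three reference cases: perfect predictions, always predicting the mean of the targets, and predictions worse than the mean. The sketch below uses hypothetical toy data and a helper named `r2` (an illustrative name, chosen so it does not overwrite `r2_score`).

```python
import numpy as np

def r2(y_true, y_pred):
    ss_res = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1 - ss_res / ss_tot

# Hypothetical toy targets with non-zero variance.
y_true_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

perfect = y_true_toy.copy()
mean_only = np.full_like(y_true_toy, y_true_toy.mean())
reversed_pred = y_true_toy[::-1]

print(r2(y_true_toy, perfect))        # 1.0
print(r2(y_true_toy, mean_only))      # 0.0
print(r2(y_true_toy, reversed_pred))  # -3.0
```

A score of 1 means no residual error, 0 means the model does no better than the constant mean, and negative values mean it does worse.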