
* @author: krakowiakpawel9@gmail.com  
* @site: e-smartdata.org

### scikit-learn
>Library website: [https://scikit-learn.org](https://scikit-learn.org)  
>
>Documentation/User Guide: [https://scikit-learn.org/stable/user_guide.html](https://scikit-learn.org/stable/user_guide.html)
>
>The core machine learning library for Python.
>
>To install scikit-learn, run the command below:
```
pip install scikit-learn
```

### Metrics - Regression problem:
1. [Importing libraries](#a0)
2. [Graphical interpretation](#a2)
3. [Mean Absolute Error - MAE](#a3)
4. [Mean Squared Error - MSE](#a4)
5. [Root Mean Squared Error - RMSE](#a5)
6. [Max Error](#a6)
7. [R2 score - coefficient of determination](#a7)

    

### <a name='a0'></a>  Importing libraries

In [2]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

In [3]:
y_true = 100 + 20 * np.random.randn(50)
y_true

array([ 81.66205812, 131.62753431,  91.72101971,  90.82374652,
        91.34469896,  84.42187852, 127.26592331,  94.65984558,
       105.83229658,  72.82390615, 100.93943895,  87.95432641,
        73.72059132, 114.9148111 ,  96.45018522, 102.68631358,
       104.32257632,  76.00190984,  84.23034178,  77.08489355,
        65.31642508,  91.27601247, 102.94822194, 115.86151649,
        99.92517257, 105.34751012, 115.35819946,  96.2695152 ,
       118.90448057,  93.97624026,  87.58522144,  84.68875896,
       106.69731787,  94.01009513,  72.87377703, 100.20624037,
       125.75718816, 113.82067146, 109.97601752, 126.44655379,
        88.91345757, 106.58481162, 111.15056903,  83.71671367,
       117.16903357, 115.54741952, 127.35886647, 111.49842825,
        97.41796096,  82.09937447])

In [4]:
y_pred = y_true + 10 * np.random.randn(50)
y_pred

array([ 72.87912626, 128.73598629, 100.10582901,  79.79754858,
        85.20561703, 104.6001129 , 111.65997082,  86.22421348,
       102.74554927,  85.81855119, 100.82269626,  96.95386503,
        84.26659122, 112.23683208,  93.47506907, 104.13394683,
       107.08937115,  60.50579725,  89.1375272 ,  67.26129768,
        67.09290171,  88.52579551, 126.06914943, 100.86071582,
       102.60378415,  90.72204919, 105.25043137,  84.36651948,
       143.70035685,  78.64155599,  83.73296298,  78.26453752,
       102.32815843, 101.82572183,  65.32941085,  95.07155759,
       116.51026935, 104.67611925, 120.75925887, 113.38042279,
        87.42447101,  92.2468633 , 122.74810645,  74.76543619,
       104.09265528, 114.85690642, 140.32430665, 113.42738104,
       111.06200765,  69.94035934])
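
Because `np.random.randn` draws fresh values on every run, the arrays printed above will differ from session to session. A minimal sketch of a reproducible variant follows. It assumes NumPy's `default_rng` generator, and the helper name `make_data` and the seed `42` are arbitrary choices made here, not part of the original notebook.

```python
import numpy as np

def make_data(n=50, seed=42):
    # n and seed are illustrative defaults, not values from the notebook
    rng = np.random.default_rng(seed)
    y_true = 100 + 20 * rng.standard_normal(n)     # targets centred at 100
    y_pred = y_true + 10 * rng.standard_normal(n)  # targets plus noise
    return y_true, y_pred

y_true_a, y_pred_a = make_data()
y_true_b, y_pred_b = make_data()

# The same seed yields identical arrays on every call
print(np.array_equal(y_true_a, y_true_b), np.array_equal(y_pred_a, y_pred_b))
```

With a fixed seed the metric values computed later stay the same between runs, which makes the notebook easier to compare and debug.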

In [5]:
results = pd.DataFrame({'y_true':y_true, 'y_pred':y_pred})
results.head()

|   | y_true | y_pred |
|---|--------|--------|
| 0 | 81.662058 | 72.879126 |
| 1 | 131.627534 | 128.735986 |
| 2 | 91.72102 | 100.105829 |
| 3 | 90.823747 | 79.797549 |
| 4 | 91.344699 | 85.205617 |


In [6]:
results['error'] = results.y_true - results.y_pred
results.head()

|   | y_true | y_pred | error |
|---|--------|--------|-------|
| 0 | 81.662058 | 72.879126 | 8.782932 |
| 1 | 131.627534 | 128.735986 | 2.891548 |
| 2 | 91.72102 | 100.105829 | -8.384809 |
| 3 | 90.823747 | 79.797549 | 11.026198 |
| 4 | 91.344699 | 85.205617 | 6.139082 |
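
The `error` column keeps its sign: with `y_true - y_pred`, a positive value means the prediction was too low and a negative value means it was too high. The sketch below reuses the five rows shown above, rounded to three decimals, to illustrate why signed errors are not a good quality measure on their own: opposite signs partly cancel when averaged. The variable names are illustrative.

```python
import numpy as np

# First five rows of the results table, rounded to three decimals
y_true_head = np.array([81.662, 131.628, 91.721, 90.824, 91.345])
y_pred_head = np.array([72.879, 128.736, 100.106, 79.798, 85.206])
error_head = y_true_head - y_pred_head

under = int((error_head > 0).sum())   # predictions below the true value
over = int((error_head < 0).sum())    # predictions above the true value
mean_signed = error_head.mean()       # opposite signs partly cancel
mean_abs = np.abs(error_head).mean()  # magnitudes only

print(under, over, round(mean_signed, 3), round(mean_abs, 3))
```

The gap between the mean signed error and the mean absolute error is the reason the metrics in the following sections work with absolute or squared errors.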



### <a name='a2'></a> Graphical interpretation

In [8]:
def plot_regression_results(y_true, y_pred):
  results = pd.DataFrame({'y_true':y_true, 'y_pred':y_pred})
  # Shared axis range; named lo/hi so the built-in min/max are not shadowed
  lo = results[['y_true', 'y_pred']].min().min()
  hi = results[['y_true', 'y_pred']].max().max()

  fig = go.Figure(data=[
                        # One marker per sample: true value vs. prediction
                        go.Scatter(x=results.y_true, y=results.y_pred, mode='markers'),
                        # Diagonal y = x: a perfect prediction lies on this line
                        go.Scatter(x=[lo, hi], y=[lo, hi])],
                  layout=go.Layout(showlegend=False,
                                   width=800,
                                   xaxis_title='y_true',
                                   yaxis_title='y_pred',
                                   title='Regression results'))
  
  fig.show()

plot_regression_results(y_true, y_pred)

In [11]:
y_true = 100 + 20 * np.random.randn(1000)
y_pred = y_true + 10 * np.random.randn(1000)
results = pd.DataFrame({'y_true':y_true, 'y_pred':y_pred})
results['error'] = results.y_true - results.y_pred

px.histogram(results, x='error', nbins=50)
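
By construction the errors equal the negated noise term, so the histogram should resemble a normal distribution centred at 0 with a standard deviation close to 10. A hedged numeric check of that shape is sketched below. It uses `np.histogram` in place of the plot, and the seed and the `_sim` names are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen for illustration only
y_true_sim = 100 + 20 * rng.standard_normal(1000)
y_pred_sim = y_true_sim + 10 * rng.standard_normal(1000)
error_sim = y_true_sim - y_pred_sim

# Same binning idea as the plot: 50 bins over the observed error range
counts, edges = np.histogram(error_sim, bins=50)

# Share of errors within one nominal standard deviation (10) of zero
within_one_sd = np.mean(np.abs(error_sim) <= 10)

print(error_sim.mean(), error_sim.std(), within_one_sd)
```

For normally distributed errors roughly 68% of the values are expected to fall within one standard deviation of the mean, so `within_one_sd` should land near 0.68.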

### <a name='a3'></a> Mean Absolute Error - MAE
### $$MAE = \frac{1}{n}\sum_{i=1}^{n}|y_{true,i} - y_{pred,i}|$$

In [15]:
def mean_absolute_error(y_true, y_pred):
  return abs(y_true - y_pred).sum() / len(y_true)

mean_absolute_error(y_true, y_pred)

7.989666858421548

In [16]:
from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_true, y_pred)

7.989666858421548
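
A small worked example makes the formula easy to verify by hand. The four values below are made up for illustration and do not come from the notebook's data.

```python
import numpy as np

y_true_small = np.array([3.0, 5.0, 2.0, 7.0])
y_pred_small = np.array([2.5, 5.0, 4.0, 8.0])

# Absolute errors per sample: 0.5, 0.0, 2.0, 1.0
abs_errors = np.abs(y_true_small - y_pred_small)

# MAE = (0.5 + 0.0 + 2.0 + 1.0) / 4
mae_small = abs_errors.mean()
print(mae_small)  # 0.875
```

For the simulated data the errors are normal with standard deviation 10, and the expected absolute value of such a variable is 10 · sqrt(2/π) ≈ 7.98, which agrees with the value of about 7.99 obtained above.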

### <a name='a4'></a> Mean Squared Error - MSE
### $$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}$$

In [17]:
def mse(y_true, y_pred):
  return ((y_true - y_pred)**2).sum() / len(y_true)

mse(y_true, y_pred)

99.70510373760375

In [18]:
from sklearn.metrics import mean_squared_error
mean_squared_error(y_true, y_pred)

99.70510373760375
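
Squaring the errors makes MSE much more sensitive to a single large miss than MAE. The sketch below compares two hypothetical prediction vectors with the same MAE but very different MSE. The helper names `mae_np` and `mse_np` and all values are illustrative.

```python
import numpy as np

def mae_np(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mse_np(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_true_flat = np.array([10.0, 10.0, 10.0, 10.0])
y_pred_even = np.array([12.0, 8.0, 12.0, 8.0])     # four errors of size 2
y_pred_spike = np.array([10.0, 10.0, 10.0, 18.0])  # one error of size 8

print(mae_np(y_true_flat, y_pred_even), mse_np(y_true_flat, y_pred_even))    # 2.0 4.0
print(mae_np(y_true_flat, y_pred_spike), mse_np(y_true_flat, y_pred_spike))  # 2.0 16.0
```

Both prediction vectors have the same total absolute error, yet the one that concentrates it in a single sample has four times the MSE. For the simulated data the noise variance is 10² = 100, which matches the MSE of about 99.7 above.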

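### <a name='a5'></a> Root Mean Squared Error - RMSE
### $$RMSE = \sqrt{MSE}$$

RMSE is expressed in the same units as the target and indicates how far, typically, predictions deviate from the true values. Because the errors are squared before averaging, large errors weigh more than they do in MAE.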

In [20]:
def rmse(y_true, y_pred):
  return np.sqrt(((y_true - y_pred)**2).sum() / len(y_true))
rmse(y_true, y_pred)

9.985244300346574

In [23]:
np.sqrt(mean_squared_error(y_true, y_pred))

9.985244300346574
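
The units argument can be checked directly: rescaling the target by a factor k multiplies RMSE by k but MSE by k². The sketch below assumes a hypothetical target measured in metres and converted to centimetres. The helper `rmse_np` and the values are illustrative.

```python
import numpy as np

def rmse_np(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical target in metres; every prediction is off by 0.5 m
y_true_m = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_m = np.array([1.5, 1.5, 3.5, 3.5])

rmse_m = rmse_np(y_true_m, y_pred_m)               # in metres
rmse_cm = rmse_np(100 * y_true_m, 100 * y_pred_m)  # same data in centimetres

mse_m = rmse_m ** 2
mse_cm = rmse_cm ** 2

print(rmse_m, rmse_cm)  # 0.5 50.0
print(mse_m, mse_cm)    # 0.25 2500.0
```

RMSE follows the unit conversion one to one, which is why it is usually easier to report than MSE.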

### <a name='a6'></a>  Max Error

$$ME = \max_{i}|y_{true,i} - y_{pred,i}|$$

In [25]:
def max_error(y_true, y_pred):
  return abs(y_true - y_pred).max()
max_error(y_true, y_pred)


33.83917333644185

In [26]:
from sklearn.metrics import max_error
max_error(y_true, y_pred)

33.83917333644185
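
The max error reports only the size of the worst miss. When debugging a model it is often useful to know which sample produced it as well. A minimal sketch using `np.argmax` on made-up values:

```python
import numpy as np

y_true_four = np.array([100.0, 80.0, 120.0, 95.0])
y_pred_four = np.array([98.0, 85.0, 101.0, 96.0])

# Absolute errors per sample: 2, 5, 19, 1
abs_err_four = np.abs(y_true_four - y_pred_four)

worst_idx = int(np.argmax(abs_err_four))  # position of the largest error
worst_err = abs_err_four[worst_idx]       # equals the max error

print(worst_idx, worst_err)  # 2 19.0
```

Since this metric depends on a single sample, it says nothing about typical performance and is best read alongside MAE or RMSE.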

### <a name='a7'></a>  R2 score - coefficient of determination
### $$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_{true,i} - y_{pred,i})^{2}}{\sum_{i=1}^{n}(y_{true,i} - \overline{y_{true}})^{2}}$$

In [28]:
def r2_score(y_true, y_pred):
  return 1 - ((y_true - y_pred)**2).sum() / ((y_true - y_true.mean())**2).sum()
r2_score(y_true, y_pred)

0.7576346398333683

In [29]:
from sklearn.metrics import r2_score
r2_score(y_true, y_pred)

0.7576346398333683
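
Three reference points help when reading an R2 value: a perfect model scores 1, a model that always predicts the mean of `y_true` scores 0, and a model worse than that baseline scores below 0. The sketch below checks these cases on made-up values. The helper name `r2_np` is illustrative.

```python
import numpy as np

def r2_np(y_true, y_pred):
    ss_res = ((y_true - y_pred) ** 2).sum()         # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return 1 - ss_res / ss_tot

y_ref = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r2_perfect = r2_np(y_ref, y_ref.copy())                        # predictions equal targets
r2_mean = r2_np(y_ref, np.full_like(y_ref, y_ref.mean()))      # always predict the mean
r2_reversed = r2_np(y_ref, y_ref[::-1])                        # worse than the mean baseline

print(r2_perfect, r2_mean, r2_reversed)  # 1.0 0.0 -3.0
```

For the simulated data, the variance of `y_true` is about 20² = 400 and the error variance is about 10² = 100, so the expected score is roughly 1 - 100/400 = 0.75, consistent with the value of about 0.76 obtained above.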