___
<h1> Machine Learning </h1>
<h2> M. Sc. in Electrical and Computer Engineering </h2>
<h3> Instituto Superior de Engenharia / Universidade do Algarve </h3>

[MEEC](https://ise.ualg.pt/en/curso/1477) / [ISE](https://ise.ualg.pt) / [UAlg](https://www.ualg.pt)

Pedro J. S. Cardoso (pcardoso@ualg.pt)
___

# CV analysis

This notebook presents some sketches on using cross-validation (CV) to help fine-tune regression and classification methods. The data come from the Wine Quality Data Set (https://archive.ics.uci.edu/ml/datasets/Wine+Quality), which can be treated as either a classification or a regression task.
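
As a rough illustration of what k-fold CV computes, the sketch below reproduces `cross_val_score` by hand. The synthetic data from `make_regression`, the `LinearRegression` estimator, and all variable names are assumptions made for illustration only; they are not part of the wine analysis.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption: not the wine data)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

model = LinearRegression()
kfold = KFold(n_splits=5)  # the splitter that cv=5 selects for a regressor

manual_scores = []
for train_idx, test_idx in kfold.split(X_demo):
    fold_model = clone(model)  # fresh, unfitted copy for each fold
    fold_model.fit(X_demo[train_idx], y_demo[train_idx])
    # score() returns R^2 on the held-out fold
    manual_scores.append(fold_model.score(X_demo[test_idx], y_demo[test_idx]))

manual_scores = np.array(manual_scores)
auto_scores = cross_val_score(model, X_demo, y_demo, cv=5)

print(manual_scores)
print(auto_scores)
```

Each sample is used for testing exactly once, and the model is always evaluated on data it did not see during fitting.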


In [None]:
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_val_predict

from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Lasso

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [None]:
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv', sep=';')

In [None]:
features = data.columns[:-1]
target = data.columns[-1]
X = data[features].values
y = data[target].values

## Classification problem

The problem can be treated as a classification problem. Let us, for example, use Logistic Regression and run a CV analysis over several values of the inverse regularization strength `C`.

In [None]:
for C in (.01, .1, 1, 10, 100):
    print(f''' *********************** C = {C}''')
    # multi_class is not passed: 'auto' was its default value and the
    # parameter is deprecated in recent scikit-learn releases
    log_reg = LogisticRegression(
                                random_state=1,
                                C=C,            # inverse of the regularization strength
                                max_iter=10000
                                )

    scores = cross_val_score(estimator=log_reg, # model
                             X=X, y=y, # X, y
                             cv=5,       # number of folds (stratified for classifiers; see alternatives in documentation)
                             n_jobs=-1,  # use all CPUs
                             verbose=1,  # verbosity level
                            )

    # for a classifier the default score is the accuracy on each held-out fold
    print(f'''scores = {scores} \n [mean value: {scores.mean()}]''')
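
The manual loop over `C` can also be delegated to `GridSearchCV`. The sketch below additionally standardizes the features inside a pipeline, so the scaler is fitted only on the training part of each fold. The synthetic data from `make_classification`, the scaling step, and the names `pipe` and `search` are assumptions for illustration, not part of the analysis above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the wine data)
X_demo, y_demo = make_classification(n_samples=300, n_features=8,
                                     n_informative=4, random_state=0)

# Scaling lives inside the pipeline, so it is re-fitted on each training fold
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# make_pipeline names each step after its lowercased class name
param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10, 100]}

search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X_demo, y_demo)

print(search.best_params_)  # value of C with the best mean CV accuracy
print(search.best_score_)   # that mean CV accuracy
```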
    

## Regression problem

The problem can also be treated as a regression problem. Let us, for example, use Lasso and run a CV analysis over several values of the regularization strength `alpha`.

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso

In [None]:
for alpha in (.01, .1, 1, 10, 100):
    print(f''' *********************** alpha = {alpha}''')
    # `normalize` is not passed: it was removed from Lasso in recent
    # scikit-learn releases, and no normalization was the previous default
    estimator = Lasso(alpha=alpha,
                      tol=10**-6,
                      random_state=1)

    scores = cross_val_score(estimator=estimator, # model
                             X=X, y=y, # X, y
                             cv=5,       # number of folds (see alternatives in documentation)
                             n_jobs=-1,  # use all CPUs
                             verbose=1,  # verbosity level
                            )
    # for a regressor the default score is R^2 on each held-out fold
    print(f'''scores = {scores} \n [mean value: {scores.mean()}]''')
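
If feature scaling is wanted for Lasso, one option is to put a `StandardScaler` in a pipeline. The sketch below also swaps the default R² score for the mean absolute error through the `scoring` argument. The synthetic data, the scaling step, and the name `mae_by_alpha` are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the wine data)
X_demo, y_demo = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)

mae_by_alpha = {}
for alpha in (0.01, 0.1, 1, 10, 100):
    # The scaler is fitted on the training part of each fold only
    model = make_pipeline(StandardScaler(), Lasso(alpha=alpha, tol=1e-6))
    # scikit-learn maximizes scores, so the MAE is reported with a negative sign
    neg_mae = cross_val_score(model, X_demo, y_demo, cv=5,
                              scoring="neg_mean_absolute_error")
    mae_by_alpha[alpha] = -neg_mae.mean()
    print(f"alpha = {alpha}: mean MAE = {mae_by_alpha[alpha]:.3f}")
```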


In [None]:
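estimator = Lasso(alpha=0.01,
                  tol=10**-6,
                  random_state=1)

# Out-of-fold predictions: each sample is predicted by a model that did not see it during training
y_pred = cross_val_predict(estimator=estimator, # model
                           X=X, y=y, # X, y
                           cv=5,       # number of folds (see alternatives in documentation)
                           n_jobs=-1,  # use all CPUs
                           verbose=1,  # verbosity level
                           )

# Round to the nearest integer so the predictions are comparable with the integer quality scores
y_pred = y_pred.round()
y_pred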
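
Once the regression output is rounded, it can be compared with the integer labels as if it came from a classifier. The sketch below uses made-up numbers to show the idea. Clipping to the observed label range is an optional extra step assumed here, not something done in the cell above.

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([5, 6, 6, 7, 5])             # made-up integer labels
y_cont = np.array([5.2, 5.6, 6.4, 6.49, 4.4])  # made-up continuous predictions

# Round to the nearest integer label
y_round = np.rint(y_cont).astype(int)

# Optionally keep predictions inside the range of labels actually observed
y_clip = np.clip(y_round, y_true.min(), y_true.max())

acc_round = accuracy_score(y_true, y_round)
acc_clip = accuracy_score(y_true, y_clip)

print(y_round, acc_round)
print(y_clip, acc_clip)
```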

In [None]:
_ = plt.figure(figsize=(30,6))
err = y_pred - y  # signed error of the rounded predictions
plt.plot(err, ".")

print("MEAN ERROR", err.mean())
print("MEAN ABSOLUTE ERROR", np.absolute(err).mean())

In [None]:
_ = plt.figure(figsize=(30,10))
plt.plot(y,"rd")
plt.plot(y_pred,"b.")
plt.legend(["y", "y_pred"])
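
With thousands of overlapping points, the plot above can be hard to read. A table of counts of true versus predicted labels is one possible complement. The sketch below uses `pd.crosstab` on made-up labels; the arrays and the name `table` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

y_true = np.array([5, 5, 6, 6, 7])  # made-up true labels
y_hat = np.array([5, 6, 6, 6, 6])   # made-up rounded predictions

# Rows: true label; columns: predicted label; cells: number of samples
table = pd.crosstab(pd.Series(y_true, name="true"),
                    pd.Series(y_hat, name="predicted"))
print(table)
```

Counts on the diagonal are correct predictions; off-diagonal counts show which quality levels get confused with which.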