### Activity 5: Metrics

The Wine dataset is a classic benchmark in pattern recognition and machine learning. It is distributed through the UCI Machine Learning Repository and is commonly used for classification tasks.

Features:

The Wine dataset comprises 13 measurements taken from three types of wine grown in the same region of Italy. The 13 measurements (or features) are:

* Alcohol content
* Malic acid
* Ash
* Alkalinity of ash
* Magnesium
* Total phenols
* Flavonoids
* Non-flavonoid phenols
* Proanthocyanins
* Color intensity
* Hue
* OD280/OD315 of diluted wines
* Proline

The primary goal when working with the Wine dataset is to predict the type of wine from the given features. There are three classes in the dataset, each corresponding to a different cultivar grown in the same region of Italy.
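As a quick check of the structure described above, the following sketch loads the dataset with scikit-learn's `load_wine` and counts the samples in each class. The variable names (`wine`, `class_counts`) are illustrative and are not used elsewhere in this activity.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# Feature matrix: one row per wine sample, one column per measurement.
n_samples, n_features = wine.data.shape

# Count how many samples belong to each of the three classes.
class_counts = np.bincount(wine.target)

print(f"{n_samples} samples, {n_features} features")
for name, count in zip(wine.target_names, class_counts):
    print(f"{name}: {count} samples")
```

The classes are not perfectly balanced, which is worth keeping in mind when reading averaged metrics later on.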

### Import libraries
Hint: Consider importing the pandas, sklearn, seaborn, and matplotlib libraries.

In [31]:
#Your code (1 point)
import pandas as pd
from sklearn.datasets import load_wine
import seaborn as sns
import matplotlib.pyplot as plt

### Define functions to plot the confusion matrix and train/evaluate the models.

In [41]:
import numpy as np
from sklearn.metrics import confusion_matrix

# Your code (1 point)
def plot_confusion_matrix(model_name, y_true, y_pred, classes, title='Confusion Matrix', cmap=plt.cm.Blues):
    """Plot the confusion matrix of y_true vs. y_pred as an annotated heatmap."""
    cm = confusion_matrix(y_true, y_pred)
    plt.figure(figsize=(8, 6))
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(f"{title}: {model_name}")
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    # confusion_matrix returns integer counts, so annotate each cell as an integer.
    fmt = 'd'
    # Use white text on dark cells and black text on light cells.
    thresh = cm.max() / 2.
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            plt.text(j, i, format(cm[i, j], fmt),
                     horizontalalignment="center",
                     color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    # Adjust the layout after the axis labels are set so they are not clipped.
    plt.tight_layout()
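To make the layout of a confusion matrix concrete, here is a minimal sketch on made-up labels (the `y_true` and `y_pred` lists below are hypothetical, not taken from the Wine data). It shows that rows correspond to true labels and columns to predicted labels, and how the `normalize='true'` option of `confusion_matrix` turns each row into per-class fractions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for a three-class problem (illustration only).
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 0]

# Raw counts: cm[i, j] is the number of samples of true class i predicted as class j.
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Row-normalized version: each row sums to 1, so the diagonal holds per-class recall.
cm_norm = confusion_matrix(y_true, y_pred, normalize='true')
print(np.round(cm_norm, 2))
```

A normalized matrix holds floats, so a plotting helper would need a format such as `'.2f'` instead of `'d'` when annotating its cells.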

### Load the Wine dataset into a pandas DataFrame and show the first 10 rows

In [42]:
# Your code (1 point)
data = load_wine()
df = pd.DataFrame(data = data['data'], columns = data['feature_names'])
df.head(10)

,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280/od315_of_diluted_wines,proline
0,14.23,1.71,2.43,15.6,127.0,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065.0
1,13.2,1.78,2.14,11.2,100.0,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050.0
2,13.16,2.36,2.67,18.6,101.0,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185.0
3,14.37,1.95,2.5,16.8,113.0,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480.0
4,13.24,2.59,2.87,21.0,118.0,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735.0
5,14.2,1.76,2.45,15.2,112.0,3.27,3.39,0.34,1.97,6.75,1.05,2.85,1450.0
6,14.39,1.87,2.45,14.6,96.0,2.5,2.52,0.3,1.98,5.25,1.02,3.58,1290.0
7,14.06,2.15,2.61,17.6,121.0,2.6,2.51,0.31,1.25,5.05,1.06,3.58,1295.0
8,14.83,1.64,2.17,14.0,97.0,2.8,2.98,0.29,1.98,5.2,1.08,2.85,1045.0
9,13.86,1.35,2.27,16.0,98.0,2.98,3.15,0.22,1.85,7.22,1.01,3.55,1045.0


### Split the dataset. Use 25% as test data and 75% as training data.

In [43]:
# Your code (1 point)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df.values, data.target, test_size=0.25, random_state=42)
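The sketch below checks what a 75/25 split produces on this dataset. It additionally passes `stratify=y`, which is an assumption not present in the cell above: it asks `train_test_split` to keep the class proportions roughly equal in both parts. The names `X_tr`, `X_te`, `y_tr`, and `y_te` are chosen so they do not overwrite the variables used in this activity.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)

# Same split ratio and seed as above, plus stratification on the class labels (assumption).
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print("train:", X_tr.shape, "test:", X_te.shape)

# Class proportions in each part; with stratify=y they should be close to each other.
print("train proportions:", np.round(np.bincount(y_tr) / len(y_tr), 3))
print("test proportions: ", np.round(np.bincount(y_te) / len(y_te), 3))
```

With only 178 samples, the test set is small, so a few misclassified wines can move the reported metrics noticeably.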

### Create a dictionary with the ML models.
Use at least 4 different classification algorithms.


In [44]:
# Your code (2 points)
# K-Nearest Neighbors
# Decision Tree
# Random Forest
# Support Vector Machine (SVM)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

dict = {
    'K-N-Neighbors': KNeighborsClassifier(),
    'DecisionTree': DecisionTreeClassifier(),
    'RandomForest': RandomForestClassifier(),
    'Support-V-M': SVC()
}

### Train the models and show the metrics (accuracy, precision, recall, and F1-score) and the confusion matrix for each one.

In [47]:
# Your code (2 points)
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
trained_models = {}

for model_name, model in dict.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    trained_models[model_name] = {'model': model, 
                                  'accuracy': accuracy,
                                  'precision': precision,
                                  'recall': recall,
                                  'f1': f1}
    print(f"{model_name} -> Accuracy: {accuracy:.4f} | Precision: {precision:.4f} | Recall: {recall:.4f} | F1: {f1:.4f}")

def conf_matrix(model, X_test, y_test):
    """Return the confusion matrix of a fitted model on the test set."""
    y_pred = model.predict(X_test)
    return confusion_matrix(y_test, y_pred)

print('\nConfusion matrix of each model:')
for key, value in dict.items():
    cm = conf_matrix(value, X_test, y_test)
    print(f'{key}:')
    # Transpose so that true labels run along the x-axis and predictions along the y-axis.
    sns.heatmap(cm.T, square=True, annot=True, fmt='d', cbar=False)
    plt.xlabel('true label')
    plt.ylabel('predicted label')
    plt.show()
    print()

K-N-Neighbors -> Accuracy: 0.7111 | Precision: 0.7111 | Recall: 0.7111 | F1: 0.7111


ValueError: Classification metrics can't handle a mix of continuous-multioutput and multiclass targets
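To clarify what `average='weighted'` does in the metric calls above, the following sketch recomputes weighted precision and recall by hand from a confusion matrix and compares them with scikit-learn's results. The labels are made up for illustration and are unrelated to the Wine predictions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical labels for a three-class problem (illustration only).
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 2, 2, 2, 2, 0, 1])

cm = confusion_matrix(y_true, y_pred)
tp = np.diag(cm)                 # correctly classified samples per class
support = cm.sum(axis=1)         # number of true samples per class
predicted = cm.sum(axis=0)       # number of predictions per class

per_class_precision = tp / predicted
per_class_recall = tp / support

# 'weighted' averages the per-class scores using each class's share of true samples.
weights = support / support.sum()
weighted_precision = np.sum(weights * per_class_precision)
weighted_recall = np.sum(weights * per_class_recall)

print("weighted precision:", weighted_precision,
      "| sklearn:", precision_score(y_true, y_pred, average='weighted'))
print("weighted recall:   ", weighted_recall,
      "| sklearn:", recall_score(y_true, y_pred, average='weighted'))
print("accuracy:          ", accuracy_score(y_true, y_pred))
```

Because the weights are proportional to class support, weighted recall always equals accuracy. Per-class scores, or `average='macro'`, say more about how a model handles the smaller classes.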

### According to the results, which model do you select and why? Put your response in the following cell. (2 points)