# Week 5 Assignment

**Attention!** In this assignment, the presentation of the data and the overall organization of the notebook will be graded. (5 pts)

## A. KNN 

Use the [Glass Classification](https://www.kaggle.com/uciml/glass) dataset to classify the type of glass from the features of each sample.

1. Display the data in an n-dimensional plot (2 pts)
2. Fit a KNN model (2 pts)
3. Show the accuracy of the fitted model (1 pt)
4. Show the confusion matrix of the fitted model (2 pts)
5. Plot the original data and the model's prediction contour (see notebook 1) (3 pts)


In [None]:
import pandas as pd
import numpy as np

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import plot_confusion_matrix

import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

In [None]:
glass = pd.read_csv("../input/glass/glass.csv")
glass.head()

## Data Analysis

In [None]:
glass.describe()

### Histograms

In [None]:
fig = make_subplots(rows=(glass.shape[1]//3)+1, cols=3)

for i, col in enumerate(glass.columns):
    fig.add_trace(go.Histogram(x=glass[col], name=col), row=(i//3)+1, col=(i%3)+1)
    
fig.update_layout(height=1000,)
    
fig.show()

### Boxplots

In [None]:
fig = make_subplots(rows=(glass.shape[1]//3)+1, cols=3)

for i, col in enumerate(glass.columns):
    fig.add_trace(go.Box(y=glass[col], boxpoints="all", name=col), row=(i//3)+1, col=(i%3)+1)
    
fig.update_layout(height=1000,)

fig.show()

### Remove Outliers

In [None]:
# Keep only the rows at or below the 99th percentile of each listed feature
glass1 = glass[
            (glass['RI'] <= glass['RI'].quantile(.99)) &
            (glass['Na'] <= glass['Na'].quantile(.99)) &
            (glass['K'] <= glass['K'].quantile(.99)) &
            (glass['Ba'] <= glass['Ba'].quantile(.99)) &
            (glass['Fe'] <= glass['Fe'].quantile(.99))
    ]
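
To make the quantile filter concrete, here is a minimal sketch on a toy frame. The frame `df` and the names `cols`, `thresholds`, `mask`, and `filtered` are made up for illustration and are not part of the glass data. It uses the same "at or below the 99th percentile" rule, written as a single mask over several columns.

```python
import pandas as pd

# Toy data (assumption): column "a" has one extreme value, column "b" is constant
df = pd.DataFrame({"a": [1, 2, 3, 4, 100], "b": [1, 1, 1, 1, 1]})
cols = ["a", "b"]

# One 99th-percentile threshold per column
thresholds = df[cols].quantile(0.99)

# A row is kept only if every listed column is at or below its threshold
mask = (df[cols] <= thresholds).all(axis=1)
filtered = df[mask]

print(thresholds)
print(filtered)
```

Because the conditions are combined with a logical AND, a row is dropped as soon as any one of its listed features exceeds its threshold.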

### Correlation Matrix

In [None]:
corr = glass1.corr()
# Styler.set_precision was removed in pandas 2.0; format(precision=...) is the replacement
corr.style.background_gradient(cmap='viridis').format(precision=2)

## 1.- 3D Scatter Plot

In [None]:
fig = go.Figure()

# One marker per sample, colored by glass type
fig.add_trace(go.Scatter3d(
    x=glass['Al'],
    y=glass['Na'],
    z=glass['Mg'],
    mode='markers',
    marker=dict(
        size=15,
        color=glass['Type'],
        colorscale='portland'
    )
))

fig.update_layout(
    scene=dict(
        xaxis_title='Aluminum',
        yaxis_title='Sodium',
        zaxis_title='Magnesium'),
    width=700,
    margin=dict(r=20, b=10, l=10, t=10),
    showlegend=False)

fig.show()

## 2.- Fit a KNN model

In [None]:
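# Use every column except the target as a feature.
# Note: this uses the full `glass` frame, not the outlier-filtered `glass1`.
X = glass.drop('Type', axis=1)
y = glass['Type']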

knnmodel = KNeighborsClassifier(n_neighbors=5)
knnmodel.fit(X, y)

## 3.- Show model accuracy

In [None]:
# Scored on the same rows the model was fit on, so this is training accuracy
accuracy = knnmodel.score(X, y)
print(f"Accuracy: {round(accuracy, 2)}")
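
The accuracy above is measured on the rows used for fitting. As a complement, here is a minimal sketch of a cross-validated estimate with `cross_val_score`, which is already imported at the top of the notebook. The data is a synthetic stand-in from `make_classification`, and `X_demo`, `y_demo`, and `model` are illustrative names, not the glass data or the notebook's model.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in (assumption): 9 numeric features, 3 classes
X_demo, y_demo = make_classification(
    n_samples=200, n_features=9, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0)

model = KNeighborsClassifier(n_neighbors=5)

# Training accuracy: scored on the same rows used for fitting
train_acc = model.fit(X_demo, y_demo).score(X_demo, y_demo)

# Cross-validated accuracy: each fold is scored on rows held out from fitting
cv_scores = cross_val_score(model, X_demo, y_demo, cv=5)

print(f"Training accuracy:        {train_acc:.2f}")
print(f"Cross-validated accuracy: {cv_scores.mean():.2f} (+/- {cv_scores.std():.2f})")
```

With the notebook's own data, the same call would be `cross_val_score(knnmodel, X, y, cv=5)`. The number of folds is a choice, not something the assignment specifies.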

## 4.- Plot confusion matrix

In [None]:
plot_confusion_matrix(knnmodel, X, y)
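
`plot_confusion_matrix` was deprecated in scikit-learn 1.0 and removed in 1.2. The sketch below shows one way to get the same information on newer versions: `confusion_matrix` for the raw counts and `ConfusionMatrixDisplay.from_estimator` for the plot. The data is synthetic, and `X_demo`, `y_demo`, `model`, and the helper `plot_cm` are hypothetical names used only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in (assumption): 6 numeric features, 3 classes
X_demo, y_demo = make_classification(
    n_samples=150, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0)

model = KNeighborsClassifier(n_neighbors=5).fit(X_demo, y_demo)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_demo, model.predict(X_demo))
print(cm)


def plot_cm(fitted_model, X, y):
    """Hypothetical helper: draw the matrix (requires matplotlib)."""
    return ConfusionMatrixDisplay.from_estimator(fitted_model, X, y)
```

In the notebook, the plotting call would be `ConfusionMatrixDisplay.from_estimator(knnmodel, X, y)`. The diagonal of the matrix counts the correctly classified samples.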

## 5.- Plot data and contour prediction (2 features 'Na' & 'Mg')

In [None]:
fig = make_subplots(rows=1, cols=1)

fig.add_trace(go.Contour(
    x=X['Na'],
    y=X['Mg'],
    z=knnmodel.predict(X),
    showscale=False,
    opacity=0.40,
    colorscale='portland'
), row=1, col=1)

fig.add_trace(go.Scatter(
    x=X['Na'], 
    y=X['Mg'],
    text=y,
    mode='markers',
    marker_symbol=y,
    marker=dict(color=y, colorscale='portland')
), row=1, col=1)

fig.update_layout(showlegend=False)

fig.show()
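
The cell above feeds the per-sample predictions straight into `go.Contour`. A common alternative, sketched here under assumptions, is to fit a model on only the two plotted features and evaluate it on a regular grid, so that the contour covers the whole plane. The functions `prediction_grid` and `boundary_figure` and the synthetic data are hypothetical, not part of the notebook.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier


def prediction_grid(model, x, y, steps=100, pad=0.5):
    """Predict the class on a regular grid spanning the range of x and y."""
    xs = np.linspace(x.min() - pad, x.max() + pad, steps)
    ys = np.linspace(y.min() - pad, y.max() + pad, steps)
    xx, yy = np.meshgrid(xs, ys)  # both have shape (len(ys), len(xs))
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xs, ys, zz


def boundary_figure(model, x, y, labels):
    """Overlay the original points on the grid of predicted classes."""
    import plotly.graph_objects as go  # imported here so the grid helper works without plotly

    xs, ys, zz = prediction_grid(model, x, y)
    fig = go.Figure()
    fig.add_trace(go.Contour(x=xs, y=ys, z=zz, showscale=False,
                             opacity=0.40, colorscale='portland'))
    fig.add_trace(go.Scatter(x=x, y=y, mode='markers',
                             marker=dict(color=labels, colorscale='portland')))
    fig.update_layout(showlegend=False)
    return fig


# Synthetic two-feature stand-in (assumption): 3 well-separated clusters
X_demo, y_demo = make_blobs(n_samples=150, centers=3, n_features=2, random_state=0)
model_2d = KNeighborsClassifier(n_neighbors=5).fit(X_demo, y_demo)

xs, ys, zz = prediction_grid(model_2d, X_demo[:, 0], X_demo[:, 1])
print(zz.shape)
```

Calling `boundary_figure(model_2d, X_demo[:, 0], X_demo[:, 1], y_demo).show()` would render the plot. For the glass data, this would mean fitting a separate KNN on `glass[['Na', 'Mg']].to_numpy()`, because a model fit on all features cannot predict on a two-column grid.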

## B. LDA

1. Reduce the number of dimensions to 2. (2 pts)
2. Fit another KNN model (2 pts)
3. Show the accuracy of the fitted model (1 pt)
4. Show the confusion matrix of the fitted model (2 pts)
5. Plot the original data and the model's prediction contour (see notebook 3) (3 pts)


In [None]:
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

## 1.- Apply LDA dimensionality reduction

In [None]:
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

varexp = lda.explained_variance_ratio_

print(f'Variance explained by components: {varexp}')
print(f'Total explained variance: {round(varexp.sum(),2)}')
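
LDA can produce at most `min(n_classes - 1, n_features)` components, so the choice of 2 has to fit under that bound. The sketch below checks this on synthetic data. The 6-class, 9-feature setup only imitates the shape of the glass problem and is an assumption, as are the names `X_demo`, `y_demo`, `lda_demo`, and `max_components`.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in (assumption): 9 numeric features, 6 classes
X_demo, y_demo = make_classification(
    n_samples=300, n_features=9, n_informative=5, n_redundant=0,
    n_classes=6, n_clusters_per_class=1, random_state=0)

# Upper bound on the number of discriminant components
n_classes = len(set(y_demo))
max_components = min(n_classes - 1, X_demo.shape[1])

lda_demo = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda_demo.fit_transform(X_demo, y_demo)

print(f"Upper bound on components: {max_components}")
print(f"Reduced shape: {X_2d.shape}")
print(f"Variance ratio kept: {lda_demo.explained_variance_ratio_.sum():.2f}")
```

Unlike PCA, the `fit_transform` call needs the labels `y`, because the components are chosen to separate the classes.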

## 2.- Fit a KNN model

In [None]:
knnmodel_lda = KNeighborsClassifier(n_neighbors=5)
knnmodel_lda.fit(X_lda, y)

## 3.- Show model accuracy

In [None]:
# Scored on the same rows the model was fit on, so this is training accuracy
acc1 = knnmodel_lda.score(X_lda, y)
print(f"Accuracy: {round(acc1, 2)}")

## 4.- Plot confusion matrix

In [None]:
plot_confusion_matrix(knnmodel_lda, X_lda, y);

## 5.- Plot data and contour prediction (2 LDA components)

In [None]:
fig = make_subplots(rows=1, cols=1)

fig.add_trace(go.Contour(
    x=X_lda[:,0],
    y=X_lda[:,1],
    z=knnmodel_lda.predict(X_lda),
    showscale=False,
    opacity=0.40,
    colorscale='portland'
), row=1, col=1)

fig.add_trace(go.Scatter(
    x=X_lda[:,0],
    y=X_lda[:,1],
    text=y,
    mode='markers',
    marker_symbol=y,
    marker=dict(color=y, colorscale='portland')
), row=1, col=1)

fig.update_layout(showlegend=False)

fig.show() 

## Video of the Week

[Explained In A Minute: Neural Networks](https://www.youtube.com/watch?v=rEDzUT3ymw4)

[A Eurovision song created by Artificial Intelligence](https://www.youtube.com/watch?v=4MKAf6YX_7M)

If you want to go deeper into eigenvectors and eigenvalues, I recommend this channel:

[Eigenvectors and eigenvalues](https://www.youtube.com/watch?v=PFDu9oVAE-g)


## Recommended Reading

> Foundation (Isaac Asimov)