# Supervised Learning

1. **Classification**: Predict a label from a predefined set of categories
2. **Regression**: Predict a continuous variable (example: the price of a house)

**Feature**: X
**Target**: Y

Requirements for supervised learning:
- No missing values
- Data in numeric format
- Data stored as a pandas DataFrame or a NumPy array
- Perform an EDA (Exploratory Data Analysis) first
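
The first two requirements can be checked in code before fitting anything. A minimal sketch, assuming a small made-up DataFrame (`df_check` and its column names and values are purely illustrative, not part of any dataset in these notes):

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy data: one missing value and one non-numeric column
df_check = pd.DataFrame({
    "account_length": [101, 87, np.nan, 120],
    "plan": ["basic", "premium", "basic", "basic"],
    "churn": [0, 1, 0, 1],
})

# Requirement: no missing values -> count them per column
missing_per_column = df_check.isna().sum()

# Requirement: numeric data -> list the columns that are not numeric
non_numeric = [col for col in df_check.columns
               if not is_numeric_dtype(df_check[col])]

print(missing_per_column)
print("Non-numeric columns:", non_numeric)
```

Missing values are usually dropped or imputed, and categorical columns encoded as numbers, before calling `fit`.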

```python
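# General scikit-learn syntax
# (`module` and `Model` are placeholders, not real names)

from sklearn.module import Model

model = Model()                     # instantiate the estimator
model.fit(X, y)                     # learn from features X and target y
predictions = model.predict(X_new)  # predict for new observations
print(predictions)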
```


In [None]:
from sklearn.datasets import load_iris
import pandas as pd

# Load Iris dataset
iris = load_iris()

# Convert to Pandas DataFrame
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

print(df.head())


# Classifying labels of unseen data

1. Build a model
2. Model learns from the labeled data we pass to it
3. Pass unlabeled data to the model as input
4. Model predicts the labels of the unseen data

Labeled data = training data
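
The four steps above can be sketched on the Iris dataset. This is only an illustration under stated assumptions: holding out every 30th row as "unseen" data is an arbitrary choice, and `KNeighborsClassifier` stands in for any scikit-learn classifier.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Hold out every 30th row and treat it as "unseen" data (illustrative choice)
unseen_mask = np.zeros(len(iris.target), dtype=bool)
unseen_mask[::30] = True

X_labeled, y_labeled = iris.data[~unseen_mask], iris.target[~unseen_mask]
X_unseen = iris.data[unseen_mask]  # the labels of these rows are never shown to the model

# 1. Build a model
clf = KNeighborsClassifier(n_neighbors=5)
# 2. The model learns from the labeled (training) data
clf.fit(X_labeled, y_labeled)
# 3-4. Pass unlabeled data as input; the model predicts its labels
predicted_labels = clf.predict(X_unseen)
print(predicted_labels)
```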

# K-Nearest Neighbors
- Predict the label of a data point by
    - Looking at the K closest labeled data points
    - Taking a majority vote
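
To make the two steps concrete, here is a from-scratch sketch of the idea, assuming Euclidean distance. The helper `knn_predict` and the toy data are hypothetical and only for illustration; this is not how scikit-learn implements it internally.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query point to every labeled point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest labeled points
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                  [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([1.2, 1.4]), k=3))  # near the first cluster -> 0
print(knn_predict(X_toy, y_toy, np.array([6.4, 6.6]), k=3))  # near the second cluster -> 1
```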

In [None]:
# Import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier 

# churn_df is assumed to be a pandas DataFrame loaded earlier (not shown in these notes)
y = churn_df["churn"].values
X = churn_df[["account_length", "customer_service_calls"]].values

# Create a KNN classifier with 6 neighbors
knn = KNeighborsClassifier(n_neighbors=6)

# Fit the classifier to the data
knn.fit(X, y)

In [None]:
import numpy as np

# New observations; columns follow the order used for X:
# account_length, customer_service_calls
X_new = np.array([[30.0, 17.5],
                  [107.0, 24.1],
                  [213.0, 10.9]])
# Predict the labels for X_new
y_pred = knn.predict(X_new)

# Print the predictions
print("Predictions: {}".format(y_pred)) 

# Measuring model performance 

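**Accuracy** = correct predictions / total observations

- We could compute accuracy on the data used to fit the classifier, but that is not indicative of the model's ability to generalize to unseen data.
- Instead, split the data into a training set and a test set: fit on the training set, then compute accuracy on the test set.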
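
The accuracy formula is simple enough to compute by hand. A small sketch with made-up labels (`y_true` and `y_hat` are hypothetical values for illustration):

```python
import numpy as np

# Hypothetical true labels and predictions for 8 observations
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_hat = np.array([0, 1, 0, 0, 1, 1, 0, 1])

# Accuracy = correct predictions / total observations
accuracy = np.mean(y_true == y_hat)
print(accuracy)  # 6 correct out of 8 -> 0.75
```

For classifiers, the `score` method in scikit-learn returns this same quantity.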


![Image Description](images/computing_accuracy.png)



`stratify=y` keeps the proportion of each label in the training and test sets the same as in the full dataset.
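
The effect of stratification is easiest to see on imbalanced labels. A sketch with made-up data (`X_demo` and `y_demo` are hypothetical: 90 observations of class 0 and 10 of class 1):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 90 of class 0, 10 of class 1
y_demo = np.array([0] * 90 + [1] * 10)
X_demo = np.arange(100).reshape(-1, 1)

# Same split with and without stratification
_, _, _, y_test_plain = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=21
)
_, _, _, y_test_strat = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=21, stratify=y_demo
)

# Share of class 1 in each test set (10% in the full data)
print("Class 1 share without stratify:", y_test_plain.mean())
print("Class 1 share with stratify:   ", y_test_strat.mean())
```

With `stratify`, the test set keeps the 10% share of class 1; without it, the share depends on the random draw.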

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=21, stratify=y
)

knn = KNeighborsClassifier(n_neighbors = 6)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))

# Model Complexity

- **Larger K** = less complex model = can cause underfitting
- **Smaller K** = more complex model = can lead to overfitting
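
A quick way to see both extremes is to compare a very small and a very large K on the same split. This sketch uses the Iris dataset as a stand-in; the values `k=1` and `k=100` are arbitrary illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=21, stratify=y_iris
)

scores = {}
for k in (1, 100):  # very complex model vs. very simple model
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # Store (training accuracy, test accuracy) for each k
    scores[k] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"k={k}: train={scores[k][0]:.3f}, test={scores[k][1]:.3f}")
```

With `k=1`, each training point is its own nearest neighbor, so the training accuracy says little about generalization. With a K close to the size of the training set, almost every point votes and the model can no longer follow the structure of the data.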


In [None]:
train_accuracies = {}
test_accuracies = {}

neighbors = np.arange(1, 26)  # np.arange excludes the stop value, so this gives 1 to 25
for neighbor in neighbors:
    knn = KNeighborsClassifier(n_neighbors = neighbor)
    knn.fit(X_train, y_train)
    train_accuracies[neighbor] = knn.score(X_train, y_train)
    test_accuracies[neighbor] = knn.score(X_test, y_test)

In [None]:
# Plotting our results
import matplotlib.pyplot as plt
plt.figure(figsize=(8, 6))
plt.title("KNN: Varying Number of Neighbors")
# Convert dict values to lists so plotting works across matplotlib versions
plt.plot(neighbors, list(train_accuracies.values()), label="Training Accuracy")
plt.plot(neighbors, list(test_accuracies.values()), label="Testing Accuracy")
plt.legend()
plt.xlabel("Number of Neighbors")
plt.ylabel("Accuracy")
plt.show() 