<a href="https://colab.research.google.com/github/mayait/ClaseMachineLearning/blob/main/SupervisedLearning/Classification/Ejercicio_TelecomChurn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Supervised Learning
## Classification


**Dataset**

https://archive.ics.uci.edu/ml/datasets/Iranian+Churn+Dataset

https://www.kaggle.com/datasets/royjafari/customer-churn


**Data Dictionary**

| Column | Explanation |
|---|---|
| Call Failure | number of call failures |
| Complaints | binary (0: No complaint, 1: complaint) |
| Subscription Length | total months of subscription |
| Charge Amount | ordinal attribute (0: lowest amount, 9: highest amount) |
| Seconds of Use | total seconds of calls |
| Frequency of use | total number of calls |
| Frequency of SMS | total number of text messages |
| Distinct Called Numbers | total number of distinct phone calls |
| Age Group | ordinal attribute (1: younger age, 5: older age) |
| Tariff Plan | binary (1: Pay as you go, 2: contractual) |
| Status | binary (1: active, 2: non-active) |
| Age | age of customer |
| Customer Value | the calculated value of customer |
| Churn | class label (1: churn, 0: non-churn) |


In [None]:
# Download the Telecom Churn dataset
!wget https://raw.githubusercontent.com/mayait/ClaseAnalisisDatos/main/machine_learning/datasets/telecom_churn_clean.csv

In [None]:
# Imports

import numpy as np
import pandas as pd


from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets, neighbors

from mlxtend.plotting import plot_decision_regions

import matplotlib.pyplot as plt
plt.rcParams['figure.dpi'] = 100
plt.rcParams['savefig.dpi'] = 100
from matplotlib.colors import ListedColormap
import seaborn as sns
# %config InlineBackend.figure_format = 'retina' # sharper plots

In [None]:
df = pd.read_csv("telecom_churn_clean.csv")

In [None]:
df.sample(10)

In [None]:
sns.scatterplot(data=df, x="customer_service_calls", y="account_length", hue="churn")

In [None]:
sns.pairplot(df[["account_length", "customer_service_calls",'churn']], hue='churn')

In [None]:
# We will use this function later
def knn_comparison(data, k):
  # Use the DataFrame passed in as `data` instead of an undefined global
  y = data["churn"].values.astype(int)  # plot_decision_regions expects integer labels
  x = data[["account_length", "customer_service_calls"]].values

  clf = neighbors.KNeighborsClassifier(n_neighbors=k)
  clf.fit(x, y)
  # Plotting decision region
  plot_decision_regions(x, y, clf=clf, legend=2)
  # Adding axes annotations
  plt.xlabel('account_length')
  plt.ylabel('customer_service_calls')
  plt.title('KNN with K=' + str(k))
  plt.show()

knn_comparison(df, 5)

**k-Nearest Neighbors**

The idea of k-Nearest Neighbors (KNN) is to predict the label of a data point by looking at the k closest labeled data points (for example, k = 3) and letting them vote on the label of the unlabeled observation. KNN uses majority voting: the prediction is the label held by most of the nearest neighbors.
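The voting idea described above can be sketched in a few lines of NumPy. This is only an illustration, not how scikit-learn implements it: the helper `knn_predict` and the toy arrays `X_toy` and `y_toy` are made up for this example, and Euclidean distance is assumed.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): two features and a binary label
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                  [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

pred_low = knn_predict(X_toy, y_toy, np.array([1.2, 1.4]), k=3)
pred_high = knn_predict(X_toy, y_toy, np.array([8.2, 8.4]), k=3)
print(pred_low, pred_high)
```

Each query point sits inside one of the two clusters, so its three nearest neighbors all share one label and the vote is unanimous.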

**k-Nearest Neighbors: Fit**

In this exercise, you will build your first classification model using the churn dataset loaded above as the DataFrame df.

The features to use will be "account_length" and "customer_service_calls". The target, "churn", needs to be a single column with the same number of observations as the feature data.

You will convert the features and the target variable into NumPy arrays, create an instance of a KNN classifier, and then fit it to the data.

NumPy was imported above as np.

# 🌶️ Import KNeighborsClassifier

**Instructions**
* Import KNeighborsClassifier from sklearn.neighbors.
* Create an array called X containing values from the "account_length" and "customer_service_calls" columns, and an array called y for the values of the "churn" column.
* Instantiate a KNeighborsClassifier called knn with 6 neighbors.
* Fit the classifier to the data using the .fit() method.

In [None]:
# Import KNeighborsClassifier
from ____.____ import ____ 

In [None]:
# Create arrays for the features and the target variable
y = ____["____"].values
X = ____[["____", "____"]].values

In [None]:
# Create a KNN classifier with 6 neighbors
knn = ____

In [None]:
# Fit the classifier to the data
knn.____(____, ____)

In [None]:
# SOLUTION
# Import KNeighborsClassifier
import numpy as np
from sklearn.neighbors import KNeighborsClassifier 

# Create arrays for the features and the target variable
y = df["churn"].values
X = df[["account_length", "customer_service_calls"]].values

# Create a KNN classifier with 6 neighbors
knn = KNeighborsClassifier(n_neighbors=6)

# Fit the classifier to the data
knn.fit(X, y)
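The requirement that the target be a single column with as many observations as the feature data can be checked through the array shapes. The sketch below uses a small made-up DataFrame, `toy_df`, that only borrows the column names from the exercise; its values are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Toy frame (made-up values) with the same column names as the exercise
toy_df = pd.DataFrame({
    "account_length": [101, 73, 161, 147],
    "customer_service_calls": [1, 4, 0, 2],
    "churn": [0, 1, 0, 0],
})

# Double brackets select a 2-D block of features; single brackets select a 1-D column
X_toy = toy_df[["account_length", "customer_service_calls"]].values
y_toy = toy_df["churn"].values

print(X_toy.shape)  # (n_samples, n_features)
print(y_toy.shape)  # (n_samples,)

# The target must be 1-D and have one entry per row of the features
shapes_match = y_toy.ndim == 1 and X_toy.shape[0] == y_toy.shape[0]
print(shapes_match)
```

The same two `print` calls on the real `X` and `y` are a quick sanity check before calling `.fit()`.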

## **k-Nearest Neighbors: Predict**
Now that you have fit a KNN classifier, you can use it to predict the labels of new data points. All available data was used for training; fortunately, new observations are available as X_new.

You will use the knn model that you created and fit in the last exercise to predict the labels of a set of new data points:

```
X_new = np.array([[30.0, 17.5],
                  [107.0, 24.1],
                  [213.0, 10.9]])
```

In [None]:
X_new = np.array([[30.0, 17.5],
                  [107.0, 24.1],
                  [213.0, 10.9]])

🌶️ Predict the labels for X_new

In [None]:
# Predict the labels for the X_new
y_pred = ____

# Print the predictions for X_new
print("Predictions: {}".format(____)) 

In [None]:
# SOLUTION
# Predict the labels for the X_new
y_pred = knn.predict(X_new)

# Print the predictions for X_new
print("Predictions: {}".format(y_pred)) 
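To see what lies behind a prediction, scikit-learn's `KNeighborsClassifier` also offers `predict_proba` (the share of neighbors voting for each class, with the default uniform weights) and `kneighbors` (the distances to and indices of the nearest training points). The sketch below runs on made-up toy arrays rather than the churn data; `toy_knn`, `X_toy`, `y_toy` and `X_query` are names introduced only for this illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (made up for illustration): two well-separated clusters
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                  [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_knn = KNeighborsClassifier(n_neighbors=3)
toy_knn.fit(X_toy, y_toy)

X_query = np.array([[1.2, 1.4], [8.2, 8.4]])

# Predicted label for each query point
toy_pred = toy_knn.predict(X_query)
# Fraction of the 3 neighbors that voted for each class (each row sums to 1)
toy_proba = toy_knn.predict_proba(X_query)
# Distances to, and row indices of, the 3 nearest training points
distances, indices = toy_knn.kneighbors(X_query)

print(toy_pred)
print(toy_proba)
print(indices)
```

Applied to the fitted `knn` and `X_new`, the same calls would show how many of the 6 neighbors voted for churn on each new customer.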

The model predicts that the first and third customers in the new array will not churn. But how do we know how accurate these predictions are?
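One common way to answer that question is to hold out part of the labeled data, fit on the rest, and measure accuracy on the held-out part. The sketch below is a hedged illustration on synthetic data from `make_classification`, not on the churn dataset; the 70/30 split, the `random_state` values and all variable names are assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-feature binary classification data (not the churn dataset)
X_syn, y_syn = make_classification(n_samples=200, n_features=2,
                                   n_informative=2, n_redundant=0,
                                   random_state=42)

# Hold out 30% of the rows; stratify keeps the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42, stratify=y_syn)

holdout_knn = KNeighborsClassifier(n_neighbors=6)
holdout_knn.fit(X_train, y_train)

# .score() returns the fraction of held-out labels predicted correctly
holdout_accuracy = holdout_knn.score(X_test, y_test)
print("Held-out accuracy: {:.3f}".format(holdout_accuracy))
```

Because the held-out rows were never seen during fitting, this accuracy is a fairer estimate of performance on new customers than accuracy on the training data.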