# **k-Nearest Neighbors**

The k-Nearest Neighbors (KNN) algorithm predicts the label of an unlabeled data point in three steps:

| Step | Description |
|---|---|
| 1. **Identify Neighbors** |  Locate the *k* closest labeled data points to the unlabeled data point.  The value of *k* is a parameter set by the user. |
| 2. **Majority Vote** | Determine the most frequent label among the *k* nearest neighbors. |
| 3. **Prediction** | Assign the most frequent label (from step 2) as the predicted label for the unlabeled data point. |
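
The three steps in the table can be sketched from scratch with NumPy. This is only an illustration, not how scikit-learn implements KNN: the function name `knn_predict`, the toy data, and the choice of Euclidean distance are all assumptions made for this example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of one point by majority vote among its k nearest neighbors."""
    # Step 1: Euclidean distance from x_new to every labeled point (assumed metric),
    # then keep the indices of the k smallest distances
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    # Step 2: count how often each label occurs among those neighbors
    votes = Counter(y_train[nearest].tolist())
    # Step 3: the most frequent label is the prediction
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated groups labeled 0 and 1
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3)
pred_b = knn_predict(X_train, y_train, np.array([5.0, 5.0]), k=3)
print(pred_a, pred_b)  # 0 1
```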



**Instructions: Using Scikit-Learn to Fit a KNN Classifier**

**Step-by-Step Guide**

1. **Import the Necessary Library:**

    - Begin by importing the KNeighborsClassifier class from the sklearn.neighbors module.

In [None]:
from sklearn.neighbors import KNeighborsClassifier

2. **Prepare Your Data:**

    - Split your dataset into X (features) and y (target values).
    - Ensure X is a 2D array where each column represents a feature and each row represents an observation.
    - Ensure y is a 1D array with the same number of observations as X.

In [None]:
# Example data
X = dataset[['feature1', 'feature2']].values  # Features
y = dataset['churn_status'].values           # Target

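3. **Convert Data to NumPy Arrays:**

    - If X and y are still pandas objects, use the .values attribute to convert them to NumPy arrays. The example in step 2 already does this, and calling .values again on a NumPy array would raise an AttributeError.

In [None]:
# Only needed when X and y are still pandas objects; NumPy arrays have no .values attribute
X = X.values if hasattr(X, "values") else X
y = y.values if hasattr(y, "values") else y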

4. **Check the Shape of the Data:**

    - Print the shape of X and y to verify the number of observations and features.

In [None]:
print(X.shape)  # Should output (3333, 2) if there are 3333 observations and 2 features
print(y.shape)  # Should output (3333,) if there are 3333 target values
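
The shape requirements from steps 2 to 4 can be checked on a small example. The sketch below assumes pandas is installed and uses a tiny made-up DataFrame as a stand-in for `dataset`. The column names match the example above, but the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for the dataset used in the steps above
dataset = pd.DataFrame({
    "feature1": [10.0, 20.0, 30.0, 40.0],
    "feature2": [1.5, 2.5, 3.5, 4.5],
    "churn_status": [0, 1, 0, 1],
})

X = dataset[["feature1", "feature2"]].values  # double brackets keep X two-dimensional
y = dataset["churn_status"].values            # single brackets give a 1D array

# X must be 2D (observations x features), y must be 1D, and the row counts must match
assert X.ndim == 2 and y.ndim == 1
assert X.shape[0] == y.shape[0]
print(X.shape, y.shape)  # (4, 2) (4,)

# A single feature selected with single brackets is 1D; reshape it into one column
single = dataset["feature1"].values.reshape(-1, 1)
print(single.shape)  # (4, 1)
```

The reshape at the end matters when a model uses only one feature, since scikit-learn still expects a 2D feature array in that case.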

5. **Instantiate the KNN Classifier:**

    - Create an instance of KNeighborsClassifier, specifying the number of neighbors with the n_neighbors parameter.

In [None]:
knn = KNeighborsClassifier(n_neighbors=15)

6. **Fit the Classifier to the Data:**

    - Use the .fit method of the classifier to train it on the labeled data. Pass X (features) and y (target values) as arguments.

In [None]:
knn.fit(X, y)

**Summary of Code**

Here's the complete code to fit a KNN classifier using scikit-learn:

In [None]:
from sklearn.neighbors import KNeighborsClassifier

# Split data into features (X) and target (y)
X = dataset[['feature1', 'feature2']].values  # Features
y = dataset['churn_status'].values           # Target

# Check the shape of the data
print(X.shape)  # e.g. (3333, 2) for 3333 observations and 2 features
print(y.shape)  # e.g. (3333,) for 3333 target values

# Instantiate the KNN classifier with 15 neighbors
knn = KNeighborsClassifier(n_neighbors=15)

# Fit the classifier to the data
knn.fit(X, y)

**Using scikit-learn to fit a classifier**

In [None]:
from sklearn.neighbors import KNeighborsClassifier

# Assuming 'churn_df' is a pandas DataFrame with columns 'total_day_charge', 'total_eve_charge', and 'churn'
X = churn_df[["total_day_charge", "total_eve_charge"]].values  # Features
y = churn_df["churn"].values  # Target variable

print(X.shape, y.shape)  # Print the shapes of the feature and target arrays

(3333, 2) (3333,)

knn = KNeighborsClassifier(n_neighbors=15)  # Initialize KNN classifier with 15 neighbors
knn.fit(X, y)  # Fit the classifier to the data

**Explanation:**

- To fit a KNN model using scikit-learn, we import KNeighborsClassifier from sklearn.neighbors. We split our data into X, a 2D array of our features, and y, a 1D array of the target values (in this case, churn status).
- scikit-learn requires the features to be in an array where each column is a feature and each row is a different observation. Similarly, the target needs to be a single column with the same number of observations as the feature data. We use the .values attribute to convert X and y to NumPy arrays.
- Printing the shapes of X and y, we see there are 3333 observations of two features and 3333 observations of the target variable.
- We then instantiate our KNeighborsClassifier, setting n_neighbors equal to 15, and assign it to the variable knn.
- Then we fit this classifier to our labeled data by calling its .fit method with two arguments: the feature values, X, and the target values, y.
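
Since `churn_df` is not included here, the same fitting steps can be tried on synthetic data. The following is a minimal sketch under that assumption: the two clusters, their centers, and the random seed are invented stand-ins for the charge features, not the real churn data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Two synthetic clusters standing in for the two charge features (hypothetical values)
X_no_churn = rng.normal(loc=[20.0, 15.0], scale=2.0, size=(50, 2))
X_churn = rng.normal(loc=[50.0, 25.0], scale=2.0, size=(50, 2))

X = np.vstack([X_no_churn, X_churn])  # shape (100, 2): rows are observations
y = np.array([0] * 50 + [1] * 50)     # shape (100,): 0 = no churn, 1 = churn

knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(X, y)

# Attributes set by fit: the number of stored samples and the labels seen in y
print(knn.n_samples_fit_, knn.classes_)

# Accuracy on the training data itself; useful only as a sanity check
train_accuracy = knn.score(X, y)
print(train_accuracy)
```

Fitting a KNN classifier mostly stores the labeled data, so the real work happens at prediction time, when the neighbors of each new point are looked up.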

**Predicting on unlabeled data**

In [None]:
import numpy as np

# Sample data points for prediction (unlabeled)
X_new = np.array([[56.8, 17.5],
                  [24.4, 24.1],
                  [50.1, 10.9]])

# Print the shape of the data (number of samples, number of features)
print(X_new.shape)  # Output: (3, 2)

# Assuming 'knn' is the K-Nearest Neighbors classifier fitted above
# Predict the classes for the new data points
predictions = knn.predict(X_new)

# Print the predictions
print('Predictions: {}'.format(predictions)) # Output: Predictions: [1 0 0]

**Explanation:**

- Here we have a set of new observations, X_new. Checking the shape of X_new, we see it has three rows and two columns, that is, three observations and two features. 
- We use the classifier's .predict method and pass it the unseen data as a 2D NumPy array with features in columns and observations in rows.
- Printing the predictions returns a binary value for each observation or row in X_new. It predicts 1, which corresponds to 'churn', for the first observation, and 0, which corresponds to 'no churn', for the second and third observations.
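
To see the majority vote behind each prediction, the classifier's `predict_proba` method returns the share of the k neighbors that voted for each class (with the default uniform weights). The sketch below is self-contained and assumes synthetic training data in place of the churn dataset; the cluster centers and query points are invented for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic, well-separated training data (hypothetical stand-in for the churn features)
X = np.vstack([rng.normal(loc=[20.0, 15.0], scale=2.0, size=(50, 2)),
               rng.normal(loc=[50.0, 25.0], scale=2.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

knn = KNeighborsClassifier(n_neighbors=15).fit(X, y)

# New observations must be 2D: one row per observation, one column per feature
X_new = np.array([[50.0, 25.0],
                  [20.0, 15.0]])
predictions = knn.predict(X_new)
vote_share = knn.predict_proba(X_new)  # columns follow the order of knn.classes_

print(predictions)  # [1 0]
print(vote_share)

# A single observation still needs a 2D shape: reshape (2,) into (1, 2)
one_point = np.array([50.0, 25.0]).reshape(1, -1)
single_prediction = knn.predict(one_point)
```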