# K-Nearest Neighbors (KNN) Algorithm - Hands-on Implementation
**Phase 2: From Concept to Code**

Welcome to the hands-on coding session! 🎓

In this notebook, we'll implement the KNN algorithm step by step using the student exam prediction example from our presentation.

---

**What you'll learn:**
- ✅ Implement KNN from scratch
- ✅ Calculate distances between data points
- ✅ Find K nearest neighbors
- ✅ Make predictions using majority voting
- ✅ Compare with scikit-learn implementation
- ✅ Visualize results and analyze performance

In [None]:
# ====================================================================
# K-NEAREST NEIGHBORS (KNN) - HANDS-ON IMPLEMENTATION
# Phase 2: From Concept to Code
# ====================================================================

# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder
import warnings
warnings.filterwarnings('ignore')

# Set style for better plots
plt.style.use('default')
sns.set_palette("husl")

print("🎓 Welcome to KNN Hands-on Implementation!")
print("Let's convert our student prediction concept into working code!")
print("📚 All libraries imported successfully!")

## 📊 1. Quick Concept Recap

Before we start coding, let's quickly recap what we learned in Phase 1 (the interactive presentation).

In [None]:
print("📚 QUICK RECAP - What we learned in Phase 1:")
print("=" * 50)
print("✅ KNN finds similar examples to make predictions")
print("✅ We calculate distance between data points")
print("✅ We find K nearest neighbors")
print("✅ We use majority vote to predict")
print("✅ We tested with Alex's and Sam's data")
print("\nNow let's code this step by step! 🚀")

## 🗃️ 2. Creating Our Dataset

Let's recreate the exact student dataset from our presentation - 12 students with their study hours, sleep hours, and exam results.

In [None]:
# Let's recreate the exact dataset from our presentation
print("📋 STEP 1: Creating Our Student Dataset")
print("=" * 40)

# Our 12 students' data from the presentation
students_data = {
    'Student_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'Hours_Studied': [8, 2, 9, 1, 7, 3, 6, 2, 8, 4, 9, 1],
    'Hours_Slept': [7, 6, 8, 5, 7, 4, 8, 3, 6, 9, 7, 8],
    'Result': ['Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Fail', 
               'Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Fail']
}

# Create DataFrame
df = pd.DataFrame(students_data)
print("Our Dataset:")
print(df)

# Convert Result to numeric for calculations (0=Fail, 1=Pass)
df['Result_Numeric'] = df['Result'].map({'Fail': 0, 'Pass': 1})
print(f"\n📊 Dataset Info:")
print(f"Total students: {len(df)}")
print(f"Passed: {sum(df['Result_Numeric'])} students")
print(f"Failed: {len(df) - sum(df['Result_Numeric'])} students")

## 📈 3. Visualizing Our Data

Let's visualize our data to understand the patterns and relationships between study hours, sleep hours, and exam results.

In [None]:
# Let's visualize our data to understand the patterns
print("📈 STEP 2: Visualizing Our Data")
print("=" * 40)

# Create visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Scatter plot
colors = ['red' if result == 'Fail' else 'green' for result in df['Result']]
ax1.scatter(df['Hours_Studied'], df['Hours_Slept'], c=colors, s=100, alpha=0.7)
ax1.set_xlabel('Hours Studied')
ax1.set_ylabel('Hours Slept')
ax1.set_title('Student Performance: Study vs Sleep Hours')
ax1.grid(True, alpha=0.3)

# Add student IDs to points
for i, row in df.iterrows():
    ax1.annotate(f'S{row["Student_ID"]}', 
                (row['Hours_Studied'], row['Hours_Slept']),
                xytext=(5, 5), textcoords='offset points')

# Create custom legend
import matplotlib.patches as mpatches
red_patch = mpatches.Patch(color='red', label='Fail')
green_patch = mpatches.Patch(color='green', label='Pass')
ax1.legend(handles=[red_patch, green_patch])

# Bar plot showing pass/fail distribution
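result_counts = df['Result'].value_counts()
# Color each bar by its label; value_counts() orders bars by frequency, not by name
bar_colors = ['green' if label == 'Pass' else 'red' for label in result_counts.index]
ax2.bar(result_counts.index, result_counts.values, color=bar_colors, alpha=0.7)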
ax2.set_title('Pass/Fail Distribution')
ax2.set_ylabel('Number of Students')

plt.tight_layout()
plt.show()

print("👀 What patterns do you see?")
print("🤔 Can you identify clusters of similar students?")

## 🧮 4. Implementing Distance Calculation

The core of KNN is measuring how far apart data points are: the closer two students are, the more similar we consider them. We'll use the Euclidean (straight-line) distance formula.
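The loop in the next cell computes one distance at a time. As an optional sketch (not part of the original notebook), NumPy broadcasting can compute the distance from one query point to every training point in a single call: `np.linalg.norm(..., axis=1)` sums the squared differences along each row and takes the square root. For Alex at (5, 7) and Student 7 at (6, 8) this gives √[(5−6)² + (7−8)²] = √2 ≈ 1.41. The helper name `distances_to_all` is hypothetical.

```python
import numpy as np

def distances_to_all(X, query):
    """Euclidean distance from `query` to every row of `X` (vectorized)."""
    X = np.asarray(X, dtype=float)
    query = np.asarray(query, dtype=float)
    # Broadcasting subtracts `query` from each row; the norm is then taken per row
    return np.linalg.norm(X - query, axis=1)

# The 12 students from the presentation: (hours studied, hours slept)
X = np.array([[8, 7], [2, 6], [9, 8], [1, 5], [7, 7], [3, 4],
              [6, 8], [2, 3], [8, 6], [4, 9], [9, 7], [1, 8]])

d = distances_to_all(X, [5, 7])  # Alex
print(np.round(d, 2))
```

The result has one entry per student, in the same order as the dataset rows.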

In [None]:
print("🧮 STEP 3: Implementing Distance Calculation")
print("=" * 50)

def calculate_euclidean_distance(point1, point2):
    """
    Calculate Euclidean distance between two points
    Formula: √[(x₁-x₂)² + (y₁-y₂)²]
    """
    return np.sqrt((point1[0] - point2[0])**2 + (point1[1] - point2[1])**2)

# Test with Alex's example from presentation
alex_data = [5, 7]  # 5 hours studied, 7 hours slept
print(f"Alex's data: {alex_data[0]} hours studied, {alex_data[1]} hours slept")
print("\nCalculating distances to all students:")
print("-" * 40)

distances = []
for i, row in df.iterrows():
    student_data = [row['Hours_Studied'], row['Hours_Slept']]
    distance = calculate_euclidean_distance(alex_data, student_data)
    distances.append({
        'Student_ID': row['Student_ID'],
        'Hours_Studied': row['Hours_Studied'],
        'Hours_Slept': row['Hours_Slept'],
        'Result': row['Result'],
        'Distance': round(distance, 2)
    })
    print(f"Student {row['Student_ID']}: ({row['Hours_Studied']}, {row['Hours_Slept']}) -> Distance: {distance:.2f}")

# Convert to DataFrame and sort by distance
distances_df = pd.DataFrame(distances)
distances_df = distances_df.sort_values('Distance', kind='stable')  # stable: equal distances keep student order
print("\n📊 Sorted by Distance (Closest First):")
print(distances_df)

## 🎯 5. Finding K Nearest Neighbors

Now let's find the K closest students to Alex and see how different K values affect our prediction.
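Sorting a DataFrame works fine here. As an alternative sketch, `np.argsort` returns the row indices of the closest training points directly. Passing `kind="stable"` keeps equal distances in their original row order, which matters for this dataset because Students 2 and 9 are both exactly √10 ≈ 3.16 from Alex. The helper name `k_nearest_indices` is hypothetical.

```python
import numpy as np

def k_nearest_indices(X, query, k):
    """Return row indices of the k training points closest to `query`."""
    dists = np.linalg.norm(np.asarray(X, dtype=float) - np.asarray(query, dtype=float), axis=1)
    # A stable sort breaks distance ties by original row order
    return np.argsort(dists, kind="stable")[:k]

X = np.array([[8, 7], [2, 6], [9, 8], [1, 5], [7, 7], [3, 4],
              [6, 8], [2, 3], [8, 6], [4, 9], [9, 7], [1, 8]])
labels = np.array(['Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Fail',
                   'Pass', 'Fail', 'Pass', 'Fail', 'Pass', 'Fail'])

idx = k_nearest_indices(X, [5, 7], k=3)  # Alex
print("Student IDs:", idx + 1)  # row index 0 is Student 1
print("Labels:", labels[idx])
```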

In [None]:
print("🎯 STEP 4: Finding K Nearest Neighbors")
print("=" * 45)

def find_k_nearest_neighbors(distances_df, k):
    """Find K nearest neighbors"""
    return distances_df.head(k)

# Let's test with different K values
k_values = [1, 3, 5]

print(f"Alex's nearest neighbors for different K values:")
print("=" * 50)

for k in k_values:
    print(f"\n🔍 K = {k}:")
    neighbors = find_k_nearest_neighbors(distances_df, k)
    print(neighbors[['Student_ID', 'Hours_Studied', 'Hours_Slept', 'Result', 'Distance']])
    
    # Count votes
    pass_votes = sum(neighbors['Result'] == 'Pass')
    fail_votes = sum(neighbors['Result'] == 'Fail')
    
    print(f"Votes: Pass={pass_votes}, Fail={fail_votes}")
    
    # Make prediction (a tie, only possible with even K, defaults to 'Fail')
    prediction = 'Pass' if pass_votes > fail_votes else 'Fail'
    confidence = max(pass_votes, fail_votes) / k * 100
    
    print(f"🎯 Prediction: {prediction}")
    print(f"📊 Confidence: {confidence:.1f}%")
    print("-" * 30)

## 🤖 6. Complete KNN Implementation

Now let's build a complete KNN class that we can reuse for different predictions.
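The class in the next cell loops over training points one at a time. As an alternative sketch (an assumption, not the notebook's method), a whole batch of query points can be handled at once by building a query-by-training distance matrix with broadcasting and then voting with `collections.Counter`. `Counter.most_common` lists equal counts in first-seen order, so on a tied vote the label whose member is closest wins, which mirrors how `max(votes, key=votes.get)` behaves in `SimpleKNN`. The function name `knn_predict_batch` is hypothetical.

```python
from collections import Counter
import numpy as np

def knn_predict_batch(X_train, y_train, X_query, k=3):
    """Predict a label for every row of X_query by majority vote of its k neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    # Shape (n_query, n_train): distance from each query to each training point
    dist = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training points per query (ties keep row order)
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    predictions = []
    for row in nearest:
        # Labels of this query's neighbors are ordered nearest-first
        label, _ = Counter(y_train[row]).most_common(1)[0]
        predictions.append(label)
    return predictions

X = np.array([[8, 7], [2, 6], [9, 8], [1, 5], [7, 7], [3, 4],
              [6, 8], [2, 3], [8, 6], [4, 9], [9, 7], [1, 8]])
y = np.array(['Pass', 'Fail'] * 6)  # same alternating labels as the dataset

preds = knn_predict_batch(X, y, [[5, 7], [4, 6], [7, 5]], k=3)  # Alex, Sam, Maya
print(preds)
```

The distance matrix grows as (number of queries) × (number of training points), so for large datasets a library implementation with spatial indexing is usually the better choice.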

In [None]:
print("🤖 STEP 5: Complete KNN Implementation")
print("=" * 42)

class SimpleKNN:
    def __init__(self, k=3):
        self.k = k
        self.X_train = None
        self.y_train = None
    
    def fit(self, X, y):
        """Store training data"""
        self.X_train = np.array(X)
        self.y_train = np.array(y)
        print(f"✅ Training completed with {len(X)} samples")
    
    def _calculate_distance(self, point1, point2):
        """Calculate Euclidean distance"""
        return np.sqrt(np.sum((point1 - point2)**2))
    
    def predict_single(self, x, verbose=False):
        """Predict single point"""
        # Calculate distances to all training points
        distances = []
        for i, train_point in enumerate(self.X_train):
            dist = self._calculate_distance(np.array(x), train_point)
            distances.append((dist, self.y_train[i]))
        
        # Sort by distance and get K nearest
        distances.sort(key=lambda pair: pair[0])  # list.sort is stable: ties keep training order
        k_nearest = distances[:self.k]
        
        if verbose:
            print(f"K={self.k} nearest neighbors:")
            for i, (dist, label) in enumerate(k_nearest):
                print(f"  {i+1}. Distance: {dist:.2f}, Result: {label}")
        
        # Count votes
        votes = {}
        for _, label in k_nearest:
            votes[label] = votes.get(label, 0) + 1
        
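        # Make prediction (on a tied vote, the label whose member is closest wins,
        # because `votes` keeps labels in nearest-first insertion order)
        prediction = max(votes, key=votes.get)
        # Divide by the number of neighbors actually used (fewer than k if k > training size)
        confidence = votes[prediction] / len(k_nearest) * 100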
        
        if verbose:
            print(f"Votes: {votes}")
            print(f"Prediction: {prediction} ({confidence:.1f}% confidence)")
        
        return prediction, confidence, votes
    
    def predict(self, X):
        """Predict multiple points"""
        predictions = []
        for x in X:
            pred, _, _ = self.predict_single(x)
            predictions.append(pred)
        return predictions

# Test our implementation
print("🧪 Testing Our KNN Implementation")
print("=" * 35)

# Prepare training data
X_train = df[['Hours_Studied', 'Hours_Slept']].values
y_train = df['Result'].values

# Create and train KNN
knn = SimpleKNN(k=3)
knn.fit(X_train, y_train)

# Test with Alex
print("\n👨‍🎓 Testing with Alex (5 hours studied, 7 hours slept):")
alex_prediction, alex_confidence, alex_votes = knn.predict_single([5, 7], verbose=True)

## 🧪 7. Testing Different Students

Let's test our KNN implementation with different students, including the examples from our presentation.

In [None]:
print("🧪 STEP 6: Testing Different Students")
print("=" * 40)

# Test cases from our presentation
test_students = {
    'Alex': [5, 7],
    'Sam': [4, 6],  # The edge case
    'Maya': [7, 5]  # The practice exercise
}

print("Testing multiple students with K=3:")
print("=" * 40)

results = []
for name, data in test_students.items():
    print(f"\n🎓 {name}: {data[0]} hours studied, {data[1]} hours slept")
    prediction, confidence, votes = knn.predict_single(data, verbose=False)
    results.append({
        'Student': name,
        'Hours_Studied': data[0],
        'Hours_Slept': data[1],
        'Prediction': prediction,
        'Confidence': f"{confidence:.1f}%",
        'Votes': str(votes)
    })
    print(f"   Prediction: {prediction} ({confidence:.1f}% confidence)")
    print(f"   Votes: {votes}")

# Create results DataFrame
results_df = pd.DataFrame(results)
print("\n📊 Summary of Predictions:")
print(results_df)

## 📊 8. Visualizing Predictions

Let's create a visualization showing our training data and the predictions for new students.
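The plot in the next cell shows individual predictions. A common extension, sketched here under the assumption that a 0.5-hour grid over 0 to 10 hours is fine enough, is to classify every point of a grid so the decision regions become visible. Only the grid computation is shown; the returned arrays could be shaded with `plt.contourf(xx, yy, zz, alpha=0.2)`. The helper name `predict_grid` is hypothetical.

```python
import numpy as np

def predict_grid(X_train, y_train, k=3, step=0.5, max_hours=10.0):
    """Classify every point on a regular grid; returns (xx, yy, zz) with zz = 1 for Pass."""
    X_train = np.asarray(X_train, dtype=float)
    is_pass = (np.asarray(y_train) == 'Pass').astype(int)
    ticks = np.arange(0.0, max_hours + step, step)
    xx, yy = np.meshgrid(ticks, ticks)  # xx: hours studied, yy: hours slept
    grid = np.column_stack([xx.ravel(), yy.ravel()])
    # Distance from every grid point to every training point
    dist = np.linalg.norm(grid[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    # Pass if more than half of the k neighbors passed
    zz = (is_pass[nearest].sum(axis=1) * 2 > k).astype(int)
    return xx, yy, zz.reshape(xx.shape)

X = np.array([[8, 7], [2, 6], [9, 8], [1, 5], [7, 7], [3, 4],
              [6, 8], [2, 3], [8, 6], [4, 9], [9, 7], [1, 8]])
y = np.array(['Pass', 'Fail'] * 6)

xx, yy, zz = predict_grid(X, y, k=3)
print("Grid shape:", zz.shape, "| share of grid predicted Pass:", zz.mean().round(2))
```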

In [None]:
print("📊 STEP 7: Visualizing Predictions")
print("=" * 38)

# Create visualization with predictions
fig, ax = plt.subplots(1, 1, figsize=(12, 8))

# Plot training data
for i, row in df.iterrows():
    color = 'green' if row['Result'] == 'Pass' else 'red'
    marker = 'o'
    ax.scatter(row['Hours_Studied'], row['Hours_Slept'], 
              c=color, s=100, alpha=0.7, marker=marker)
    ax.annotate(f'S{row["Student_ID"]}', 
                (row['Hours_Studied'], row['Hours_Slept']),
                xytext=(5, 5), textcoords='offset points')

# Plot prediction points
for result in results:
    name = result['Student']
    x, y = result['Hours_Studied'], result['Hours_Slept']
    color = 'darkgreen' if result['Prediction'] == 'Pass' else 'darkred'
    ax.scatter(x, y, c=color, s=200, alpha=0.9, marker='*', 
              edgecolors='black', linewidth=2)
    ax.annotate(f'{name}\n({result["Prediction"]})', 
                (x, y), xytext=(10, -20), textcoords='offset points',
                bbox=dict(boxstyle='round,pad=0.3', facecolor=color, alpha=0.3))

ax.set_xlabel('Hours Studied')
ax.set_ylabel('Hours Slept')
ax.set_title('KNN Predictions: Training Data vs New Students')
ax.grid(True, alpha=0.3)

# Create custom legend
import matplotlib.lines as mlines
train_fail = mlines.Line2D([], [], color='red', marker='o', linestyle='None', markersize=8, label='Training: Fail')
train_pass = mlines.Line2D([], [], color='green', marker='o', linestyle='None', markersize=8, label='Training: Pass')
pred_star = mlines.Line2D([], [], color='black', marker='*', linestyle='None', markersize=12, label='New Predictions')
ax.legend(handles=[train_fail, train_pass, pred_star], loc='upper left')

plt.tight_layout()
plt.show()

print("⭐ Stars represent our predictions!")
print("🔴 Red = Predicted Fail, 🟢 Green = Predicted Pass")

## 🔄 9. Comparing Different K Values

Let's see how different K values affect our predictions and understand the trade-offs.
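One trade-off is easy to see concretely: as K grows toward the size of the training set, every query gets nearly the same votes because almost all students are counted. This dataset is split 6 Pass / 6 Fail, so K = 12 produces a 6 to 6 tie for any query, and even values of K can tie in general. That is one reason odd K is usually preferred for two-class problems. The sketch below (the helper `vote_counts` is hypothetical) tallies Alex's votes for several K.

```python
import numpy as np

X = np.array([[8, 7], [2, 6], [9, 8], [1, 5], [7, 7], [3, 4],
              [6, 8], [2, 3], [8, 6], [4, 9], [9, 7], [1, 8]])
y = np.array(['Pass', 'Fail'] * 6)

def vote_counts(X, y, query, k):
    """Return (pass_votes, fail_votes) among the k nearest neighbors of `query` (k <= len(X))."""
    dists = np.linalg.norm(X - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    n_pass = int(np.sum(y[nearest] == 'Pass'))
    return n_pass, k - n_pass

for k in [1, 3, 5, 7, 12]:
    p, f = vote_counts(X, y, [5, 7], k)
    print(f"K={k:2d}: Pass={p}, Fail={f}")
```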

In [None]:
print("🔄 STEP 8: Comparing Different K Values")
print("=" * 45)

# Test different K values for Alex
k_values = [1, 3, 5, 7]
alex_data = [5, 7]

print(f"Alex's predictions with different K values:")
print("=" * 45)

k_comparison = []
for k in k_values:
    knn_k = SimpleKNN(k=k)
    knn_k.fit(X_train, y_train)
    prediction, confidence, votes = knn_k.predict_single(alex_data)
    
    k_comparison.append({
        'K': k,
        'Prediction': prediction,
        'Confidence': f"{confidence:.1f}%",
        'Pass_Votes': votes.get('Pass', 0),
        'Fail_Votes': votes.get('Fail', 0)
    })
    
    print(f"K={k}: {prediction} ({confidence:.1f}% confidence) - Votes: {votes}")

# Create comparison DataFrame
k_comparison_df = pd.DataFrame(k_comparison)
print("\n📊 K Values Comparison:")
print(k_comparison_df)

# Visualize K comparison
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
x_pos = range(len(k_values))
confidences = [float(row['Confidence'].rstrip('%')) for row in k_comparison]

bars = ax.bar(x_pos, confidences, alpha=0.7, 
              color=['red' if row['Prediction'] == 'Fail' else 'green' 
                     for row in k_comparison])

ax.set_xlabel('K Value')
ax.set_ylabel('Confidence (%)')
ax.set_title('Alex\'s Prediction Confidence vs K Value')
ax.set_xticks(x_pos)
ax.set_xticklabels([f'K={k}' for k in k_values])

# Add value labels on bars
for bar, comp in zip(bars, k_comparison):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height + 1,
            f'{comp["Prediction"]}\n{comp["Confidence"]}',
            ha='center', va='bottom')

plt.tight_layout()
plt.show()

## 🆚 10. Comparing with Scikit-Learn

Let's compare our implementation with the industry-standard scikit-learn library to validate our results.
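If the two implementations ever disagree, it helps to see which neighbors scikit-learn picked. `KNeighborsClassifier.kneighbors` returns the distances and training-row indices of the nearest neighbors, sorted by distance. The sketch below trains directly on the string labels (scikit-learn classifiers accept them, so the `LabelEncoder` step is optional). When several training points are equally far away, the library and `SimpleKNN` may break the tie differently, which can make results differ.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[8, 7], [2, 6], [9, 8], [1, 5], [7, 7], [3, 4],
              [6, 8], [2, 3], [8, 6], [4, 9], [9, 7], [1, 8]])
y = np.array(['Pass', 'Fail'] * 6)

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
dist, idx = clf.kneighbors([[5, 7]])  # Alex
proba = clf.predict_proba([[5, 7]])[0]

print("Neighbor student IDs:", idx[0] + 1)
print("Distances:", np.round(dist[0], 2))
print("Prediction:", clf.predict([[5, 7]])[0])
print("Classes (order used by predict_proba):", clf.classes_)
print("Probabilities:", np.round(proba, 3))
```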

In [None]:
print("🆚 STEP 9: Comparing with Scikit-Learn")
print("=" * 42)

# Using sklearn's KNN
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Prepare data for sklearn
le = LabelEncoder()
y_encoded = le.fit_transform(y_train)  # Convert Pass/Fail to 0/1

# Create sklearn KNN
sklearn_knn = KNeighborsClassifier(n_neighbors=3)
sklearn_knn.fit(X_train, y_encoded)

# Test with our students
print("Comparing Our Implementation vs Scikit-Learn:")
print("=" * 50)

test_data = [[5, 7], [4, 6], [7, 5]]  # Alex, Sam, Maya
test_names = ['Alex', 'Sam', 'Maya']

comparison_results = []
for i, (name, data) in enumerate(zip(test_names, test_data)):
    # Our implementation
    our_pred, our_conf, our_votes = knn.predict_single(data)
    
    # Sklearn implementation
    sklearn_pred_num = sklearn_knn.predict([data])[0]
    sklearn_pred = le.inverse_transform([sklearn_pred_num])[0]
    sklearn_proba = sklearn_knn.predict_proba([data])[0]
    sklearn_conf = max(sklearn_proba) * 100
    
    comparison_results.append({
        'Student': name,
        'Our_Prediction': our_pred,
        'Our_Confidence': f"{our_conf:.1f}%",
        'Sklearn_Prediction': sklearn_pred,
        'Sklearn_Confidence': f"{sklearn_conf:.1f}%",
        'Match': '✅' if our_pred == sklearn_pred else '❌'
    })
    
    print(f"\n👨‍🎓 {name}:")
    print(f"  Our KNN:     {our_pred} ({our_conf:.1f}% confidence)")
    print(f"  Sklearn KNN: {sklearn_pred} ({sklearn_conf:.1f}% confidence)")
    print(f"  Match: {'✅' if our_pred == sklearn_pred else '❌'}")

comparison_df = pd.DataFrame(comparison_results)
print("\n📊 Comparison Summary:")
print(comparison_df)

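if all(row['Match'] == '✅' for row in comparison_results):
    print("\n🎯 Great! Our implementation matches sklearn on every test student!")
else:
    print("\n⚠️ Some predictions differ. Check for tied distances or tied votes.")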

## 🎮 11. Interactive Exercise - Your Turn!

Now it's your turn to experiment! Try different scenarios and see what happens.

In [None]:
print("🎮 STEP 10: Your Turn - Interactive Exercise!")
print("=" * 48)

print("Now it's your turn to experiment!")
print("Try different scenarios and see what happens:")

def interactive_prediction(study_hours, sleep_hours, k_value):
    """Interactive function for students to try"""
    print("\n" + "="*50)
    print("🎯 INTERACTIVE KNN PREDICTOR")
    print("="*50)
    
    print(f"📚 Testing student with:")
    print(f"  - Study hours: {study_hours}")
    print(f"  - Sleep hours: {sleep_hours}")
    print(f"  - K value: {k_value}")
    
    # Create KNN with chosen K
    knn_interactive = SimpleKNN(k=k_value)
    knn_interactive.fit(X_train, y_train)
    
    # Make prediction
    prediction, confidence, votes = knn_interactive.predict_single(
        [study_hours, sleep_hours], verbose=True
    )
    
    print(f"\n🎯 Final Prediction: {prediction}")
    print(f"📊 Confidence: {confidence:.1f}%")
    
    # Rule-of-thumb commentary (hand-picked thresholds, not computed by KNN)
    print(f"\n💡 Analysis (rule of thumb):")
    if study_hours >= 7 and sleep_hours >= 6:
        print("  This student has good study and sleep habits!")
    elif study_hours <= 3:
        print("  Low study hours - risky!")
    elif sleep_hours <= 4:
        print("  Low sleep hours - might affect performance!")
    else:
        print("  This is a borderline case - could go either way!")
    
    return prediction, confidence

# Example predictions - YOU CAN MODIFY THESE VALUES!
print("\n🧪 Try these examples (or modify the values):")

# Example 1: High performer
interactive_prediction(study_hours=9, sleep_hours=8, k_value=3)

# Example 2: Struggling student
interactive_prediction(study_hours=2, sleep_hours=4, k_value=3)

# Example 3: Your custom example
interactive_prediction(study_hours=6, sleep_hours=6, k_value=5)

print("\n" + "="*50)
print("🎯 TRY IT YOURSELF!")
print("="*50)
print("1. Change the study_hours, sleep_hours, and k_value in the examples above")
print("2. Run the cell again to see different predictions")
print("3. Try extreme values like (1,1) or (10,10)")
print("4. See how different K values affect the results")

## 📈 12. Performance Analysis

Let's measure how well our KNN fits the training data and how that changes with different K values.
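Scoring on the training data is optimistic: with K = 1 every student is its own nearest neighbor (distance 0), so training accuracy is 100% by construction. A fairer check for a dataset this small is leave-one-out cross-validation: hold out one student, predict them from the other 11, and repeat for all 12. A minimal sketch, with a hypothetical helper `loo_accuracy` and the same stable tie-breaking used above:

```python
from collections import Counter
import numpy as np

X = np.array([[8, 7], [2, 6], [9, 8], [1, 5], [7, 7], [3, 4],
              [6, 8], [2, 3], [8, 6], [4, 9], [9, 7], [1, 8]], dtype=float)
y = np.array(['Pass', 'Fail'] * 6)

def loo_accuracy(X, y, k):
    """Leave-one-out accuracy of a majority-vote KNN."""
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i  # every student except the held-out one
        X_rest, y_rest = X[mask], y[mask]
        dists = np.linalg.norm(X_rest - X[i], axis=1)
        nearest = np.argsort(dists, kind="stable")[:k]
        prediction = Counter(y_rest[nearest]).most_common(1)[0][0]
        correct += int(prediction == y[i])
    return correct / len(X)

for k in [1, 3, 5, 7]:
    print(f"K={k}: leave-one-out accuracy = {loo_accuracy(X, y, k):.1%}")
```

With K = 1, Student 10 (4, 9) is misclassified when held out, because its single nearest neighbor, Student 7 at (6, 8), passed.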

In [None]:
print("📈 STEP 11: Performance Analysis")
print("=" * 38)

# Let's analyze how well our KNN performs on the training data
print("Testing KNN performance on training data:")
print("(Note: This is just for learning - normally we'd use separate test data)")

# Test with different K values
k_values = [1, 3, 5, 7]
performance_results = []

for k in k_values:
    knn_test = SimpleKNN(k=k)
    knn_test.fit(X_train, y_train)
    
    # Predict on training data
    predictions = knn_test.predict(X_train)
    
    # Calculate accuracy
    correct = sum(1 for i, pred in enumerate(predictions) if pred == y_train[i])
    accuracy = correct / len(y_train) * 100
    
    performance_results.append({
        'K': k,
        'Correct_Predictions': correct,
        'Total_Predictions': len(y_train),
        'Accuracy': f"{accuracy:.1f}%"
    })
    
    print(f"K={k}: {correct}/{len(y_train)} correct ({accuracy:.1f}%)")

# Create performance DataFrame
performance_df = pd.DataFrame(performance_results)
print("\n📊 Performance Summary:")
print(performance_df)

# Visualize performance
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
accuracies = [float(row['Accuracy'].rstrip('%')) for row in performance_results]

ax.plot(k_values, accuracies, marker='o', linewidth=2, markersize=8)
ax.set_xlabel('K Value')
ax.set_ylabel('Accuracy (%)')
ax.set_title('KNN Performance vs K Value')
ax.grid(True, alpha=0.3)
ax.set_xticks(k_values)

# Add value labels
for k, acc in zip(k_values, accuracies):
    ax.annotate(f'{acc:.1f}%', (k, acc), xytext=(0, 10), 
                textcoords='offset points', ha='center')

plt.tight_layout()
plt.show()

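print("\n💡 Key Insights:")
print("- K=1 scores 100% on training data by construction: each student is its own nearest neighbor (distance 0)")
print("- Larger K averages over more neighbors, which smooths out noisy labels but can blur real boundaries")
print("- Training accuracy is optimistic; for real applications, always evaluate on separate data!")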

## 🎯 13. Larger Synthetic Dataset Example

Let's test our KNN on a larger, more realistic dataset to see how it performs in practice.
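The cell below reports predictions for a few hand-picked profiles but no accuracy figure. A minimal sketch of a held-out evaluation, assuming a 75/25 random split (the split ratio, the split seed, and the helper `predict_one` are all assumptions): shuffle the row indices, use the first part as training data, and score on the rest. The data is regenerated with the same recipe and seed as the next cell so the block runs on its own. Because each synthetic student passes with a probability rather than deterministically, perfect accuracy should not be expected.

```python
from collections import Counter
import numpy as np

# Regenerate the synthetic data with the same recipe and seed as the next cell
np.random.seed(42)
n_students = 100
study = np.clip(np.random.normal(5, 2.5, n_students), 0, 12)
sleep = np.clip(np.random.normal(7, 1.5, n_students), 3, 12)
p_pass = np.clip(0.1 + 0.7 * (study / 12) + 0.2 * (sleep / 12), 0, 1)
y_all = np.where(np.random.binomial(1, p_pass, n_students) == 1, 'Pass', 'Fail')
X_all = np.column_stack([study, sleep])

# Separate generator for the split so the data itself is unchanged
rng = np.random.default_rng(0)
order = rng.permutation(n_students)
train_idx, test_idx = order[:75], order[75:]

def predict_one(X_train, y_train, query, k):
    """Majority vote among the k nearest training points."""
    dists = np.linalg.norm(X_train - query, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

scores = {}
for k in [1, 5, 15]:
    preds = [predict_one(X_all[train_idx], y_all[train_idx], q, k) for q in X_all[test_idx]]
    scores[k] = float(np.mean(np.array(preds) == y_all[test_idx]))
    print(f"K={k:2d}: held-out accuracy = {scores[k]:.1%}")
```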

In [None]:
print("🎯 STEP 12: Larger Synthetic Dataset Example")
print("=" * 38)

# Let's create a larger, more realistic dataset
np.random.seed(42)  # For reproducible results

# Generate synthetic student data
n_students = 100
study_hours = np.random.normal(5, 2.5, n_students)  # Mean 5, std 2.5
sleep_hours = np.random.normal(7, 1.5, n_students)  # Mean 7, std 1.5

# Clip to reasonable ranges
study_hours = np.clip(study_hours, 0, 12)
sleep_hours = np.clip(sleep_hours, 3, 12)

# Create realistic pass/fail based on study and sleep
# Higher study hours and adequate sleep increase pass probability
pass_probability = 0.1 + 0.7 * (study_hours / 12) + 0.2 * (sleep_hours / 12)
pass_probability = np.clip(pass_probability, 0, 1)

# Generate results based on probability
# (named `outcomes` so the `results` list from the test-student cell is not overwritten)
outcomes = np.random.binomial(1, pass_probability, n_students)
results_text = ['Pass' if r == 1 else 'Fail' for r in outcomes]

# Create DataFrame
large_df = pd.DataFrame({
    'Hours_Studied': study_hours,
    'Hours_Slept': sleep_hours,
    'Result': results_text
})

print(f"📊 Generated {len(large_df)} students")
print(f"Pass rate: {outcomes.mean()*100:.1f}%")

# Visualize the larger dataset
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Scatter plot
colors = ['red' if r == 'Fail' else 'green' for r in large_df['Result']]
ax1.scatter(large_df['Hours_Studied'], large_df['Hours_Slept'], 
           c=colors, alpha=0.6, s=30)
ax1.set_xlabel('Hours Studied')
ax1.set_ylabel('Hours Slept')
ax1.set_title('Large Dataset: Student Performance')
ax1.grid(True, alpha=0.3)

# Distribution
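result_counts = large_df['Result'].value_counts()
# Color each bar by its label; value_counts() orders bars by frequency, not by name
bar_colors = ['green' if label == 'Pass' else 'red' for label in result_counts.index]
ax2.bar(result_counts.index, result_counts.values, 
        color=bar_colors, alpha=0.7)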
ax2.set_title('Pass/Fail Distribution (Large Dataset)')
ax2.set_ylabel('Number of Students')

plt.tight_layout()
plt.show()

# Test KNN on larger dataset
print("\n🧪 Testing KNN on larger dataset:")
X_large = large_df[['Hours_Studied', 'Hours_Slept']].values
y_large = large_df['Result'].values

knn_large = SimpleKNN(k=5)
knn_large.fit(X_large, y_large)

# Test with Alex plus a few new student profiles
test_cases = [
    ('Alex', [5, 7]),
    ('High Performer', [9, 8]),
    ('Struggling Student', [2, 4]),
    ('Insomniac', [8, 3])
]

print("\nPredictions on larger dataset:")
for name, data in test_cases:
    pred, conf, votes = knn_large.predict_single(data)
    print(f"{name:20} -> {pred} ({conf:.1f}% confidence)")

## 🎓 14. Key Takeaways & Next Steps

Congratulations! You've successfully implemented KNN from scratch. Let's summarize what you've learned and what to explore next.

In [None]:
print("🎓 STEP 13: Key Takeaways & Next Steps")
print("=" * 46)

print("🏆 What you've accomplished:")
print("✅ Implemented KNN from scratch")
print("✅ Calculated Euclidean distances")
print("✅ Found K nearest neighbors")
print("✅ Made predictions using majority voting")
print("✅ Compared different K values")
print("✅ Validated against scikit-learn")
print("✅ Analyzed performance")
print("✅ Worked with larger datasets")

print("\n🎯 Key Insights:")
print("1. 📏 Distance calculation is the core of KNN")
print("2. 🗳️  Majority voting makes the final decision")
print("3. 🎚️  K changes both the prediction and its confidence")
print("4. 📊 Visualization helps understand the algorithm")
print("5. 🔄 Testing different scenarios builds intuition")

print("\n🚀 Next Steps:")
print("1. 📚 Try KNN on different datasets")
print("2. 🔧 Experiment with different distance metrics")
print("3. 🎯 Learn about cross-validation")
print("4. 📈 Explore other ML algorithms")
print("5. 🛠️  Build a complete ML project")

print("\n💡 Bonus Challenges:")
print("1. Add feature scaling/normalization")
print("2. Implement weighted KNN (closer neighbors have more influence)")
print("3. Handle ties in voting")
print("4. Add more distance metrics (Manhattan, Cosine)")
print("5. Create a web interface for predictions")

print("\n🎉 Congratulations! You've mastered KNN implementation!")
print("\n📝 Remember: The best way to learn ML is by doing!")
print("Keep experimenting and building projects! 🚀")

## 🔧 15. Bonus: Advanced Features (Optional)

For curious students who want to explore more advanced KNN features!
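One feature the class below does not cover is feature scaling, listed earlier as a bonus challenge. Euclidean distance adds raw squared differences, so a feature with a wider numeric range dominates the vote. Here both features are hours on similar scales, so the effect is modest, but a sketch of z-score standardization looks like this (the helper names are hypothetical). The mean and standard deviation are computed from the training data only and then reused for every query point.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return per-feature mean and standard deviation of the training data."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std

def standardize(X, mean, std):
    """Apply z-score scaling, (x - mean) / std, feature by feature."""
    return (np.asarray(X, dtype=float) - mean) / std

X = np.array([[8, 7], [2, 6], [9, 8], [1, 5], [7, 7], [3, 4],
              [6, 8], [2, 3], [8, 6], [4, 9], [9, 7], [1, 8]])

mean, std = fit_standardizer(X)
X_scaled = standardize(X, mean, std)
alex_scaled = standardize([[5, 7]], mean, std)
print("Feature means:", mean, "| stds:", np.round(std, 2))
print("Alex after scaling:", np.round(alex_scaled, 2))
```

`SimpleKNN` could then be fit on `X_scaled` and queried with `alex_scaled[0]`. scikit-learn's `StandardScaler` performs the same transformation.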

In [None]:
print("🔧 BONUS: Advanced KNN Features")
print("=" * 35)

class AdvancedKNN(SimpleKNN):
    def __init__(self, k=3, distance_metric='euclidean', weights='uniform'):
        super().__init__(k)
        self.distance_metric = distance_metric
        self.weights = weights
    
    def _calculate_distance(self, point1, point2):
        """Calculate distance using different metrics"""
        if self.distance_metric == 'euclidean':
            return np.sqrt(np.sum((point1 - point2)**2))
        elif self.distance_metric == 'manhattan':
            return np.sum(np.abs(point1 - point2))
        elif self.distance_metric == 'chebyshev':
            return np.max(np.abs(point1 - point2))
        else:
            raise ValueError("Unsupported distance metric")
    
    def predict_single(self, x, verbose=False):
        """Advanced prediction with weighted voting"""
        # Calculate distances
        distances = []
        for i, train_point in enumerate(self.X_train):
            dist = self._calculate_distance(np.array(x), train_point)
            distances.append((dist, self.y_train[i]))
        
        # Sort and get K nearest
        distances.sort(key=lambda pair: pair[0])  # list.sort is stable: ties keep training order
        k_nearest = distances[:self.k]
        
        if verbose:
            print(f"Using {self.distance_metric} distance, K={self.k}")
            for i, (dist, label) in enumerate(k_nearest):
                print(f"  {i+1}. Distance: {dist:.2f}, Result: {label}")
        
        # Voting with weights
        votes = {}
        for dist, label in k_nearest:
            if self.weights == 'uniform':
                weight = 1
            elif self.weights == 'distance':
                weight = 1 / (dist + 1e-10)  # Avoid division by zero
            else:
                raise ValueError(f"Unsupported weights: {self.weights}")
            
            votes[label] = votes.get(label, 0) + weight
        
        prediction = max(votes, key=votes.get)
        total_weight = sum(votes.values())
        confidence = votes[prediction] / total_weight * 100
        
        if verbose:
            print(f"Weighted votes: {votes}")
            print(f"Prediction: {prediction} ({confidence:.1f}% confidence)")
        
        return prediction, confidence, votes

# Test advanced features
print("🧪 Testing Advanced Features:")
print("=" * 30)

# Compare different distance metrics
metrics = ['euclidean', 'manhattan', 'chebyshev']
alex_data = [5, 7]

for metric in metrics:
    print(f"\n📏 Using {metric} distance:")
    advanced_knn = AdvancedKNN(k=3, distance_metric=metric, weights='distance')
    advanced_knn.fit(X_train, y_train)
    pred, conf, votes = advanced_knn.predict_single(alex_data, verbose=True)

print("\n🎉 You've now seen advanced KNN features!")
print("Try experimenting with different combinations!")
print("\n💡 Challenge: Implement your own distance metric!")

## 🎊 Conclusion

**Congratulations!** 🎉 You've successfully:

- ✅ **Understood** the KNN algorithm conceptually
- ✅ **Implemented** KNN from scratch in Python
- ✅ **Visualized** how the algorithm works
- ✅ **Compared** with industry-standard implementations
- ✅ **Analyzed** performance and edge cases
- ✅ **Experimented** with realistic scenarios on synthetic data

---

### 🚀 **What's Next?**

1. **Practice**: Try KNN on different datasets (iris, wine, digits)
2. **Explore**: Learn other ML algorithms (Decision Trees, SVM, Neural Networks)
3. **Build**: Create end-to-end ML projects
4. **Share**: Teach others what you've learned!

---

### 📚 **Resources for Further Learning**

- **Scikit-learn Documentation**: [sklearn.neighbors.KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)
- **Kaggle Learn**: Free micro-courses on Machine Learning
- **Python ML Libraries**: pandas, numpy, matplotlib, seaborn

---

**Happy Learning!** 🎓✨