# K-Nearest Neighbors Classifier Visualization and Analysis

This notebook explores how K-Nearest Neighbors (KNN) classifiers work by:
1. **Visualizing nearest neighbors** - See how neighbors influence predictions
2. **Plotting decision boundaries** - Understand classification regions  
3. **Interactive exploration** - Test different k values and distance metrics

**Goal**: Understand KNN behavior and parameter effects before formal benchmarking
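Before using `reml`'s classifier, it helps to see the prediction rule on its own. Below is a minimal NumPy sketch. It is not `reml`'s implementation. `knn_predict` is a hypothetical helper that works in four steps:

1. Compute Minkowski distances from each query point to every training point (`p=2` is Euclidean, `p=1` is Manhattan).
2. Keep the `k` closest training points.
3. Count the labels among those neighbors.
4. Return the majority label.

It assumes labels are non-negative integers, as in the breast cancer data used below.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5, p=2):
    """Majority vote among the k nearest training points under the
    Minkowski distance of order p (p=2: Euclidean, p=1: Manhattan)."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    # Pairwise distances, shape (n_query, n_train)
    diffs = np.abs(X_query[:, None, :] - X_train[None, :, :])
    dists = (diffs ** p).sum(axis=2) ** (1.0 / p)
    # Indices of the k closest training points for each query (stable on ties)
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :k]
    neighbor_labels = y_train[nearest]
    # Count votes per class; argmax returns the smallest label among tied counts
    n_classes = int(y_train.max()) + 1
    return np.array([np.bincount(row, minlength=n_classes).argmax()
                     for row in neighbor_labels])
```

Take three points near the origin labeled 0 and three near (5, 5) labeled 1. Then `knn_predict(X_train, y_train, [[0.5, 0.5], [5.5, 5.5]], k=3)` returns `[0 1]`.

The vote depends on raw distances, so a feature measured on a larger numeric scale dominates the result. Standardizing each feature is a common remedy, for example with the training set's mean and standard deviation.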

In [19]:
%matplotlib inline
import warnings

import matplotlib.pyplot as plt
import numpy as np

from sklearn import datasets
from sklearn.model_selection import train_test_split

from reml.neighbors import KNeighborsClassifier
from reml.metrics import accuracy_score, confusion_matrix

# Interactive widgets
import ipywidgets as widgets
from IPython.display import display

plt.style.use('default')
np.random.seed(42)
warnings.filterwarnings('ignore')  # silence library warnings for a cleaner demo

## 📊 Load and Explore Data

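We use the Wisconsin breast cancer dataset bundled with scikit-learn: 569 samples, 30 numeric features, and two classes (malignant and benign). Selecting two features at a time gives a 2D view in which the data, and later the classifier's decisions, can be plotted directly.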
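The plots in this notebook scatter the points but do not shade the regions each class would be assigned to. One common way to draw those regions is to evaluate the fitted classifier on a dense grid spanning the two selected features. The sketch below assumes only a `predict` callable that maps an `(n, 2)` array to labels. `decision_grid` is a hypothetical helper name.

```python
import numpy as np

def decision_grid(predict, X, resolution=200, margin=0.05):
    """Evaluate `predict` on a regular grid covering the 2D points X.
    Returns (xx, yy, Z), ready for matplotlib's contourf."""
    X = np.asarray(X, dtype=float)
    x_min, y_min = X.min(axis=0)
    x_max, y_max = X.max(axis=0)
    # Pad each axis by a fraction of its range so edge points are not on the border
    pad_x = margin * (x_max - x_min)
    pad_y = margin * (y_max - y_min)
    xx, yy = np.meshgrid(
        np.linspace(x_min - pad_x, x_max + pad_x, resolution),
        np.linspace(y_min - pad_y, y_max + pad_y, resolution),
    )
    # Flatten the grid to (resolution**2, 2) query points, predict, reshape back
    grid_points = np.column_stack([xx.ravel(), yy.ravel()])
    Z = np.asarray(predict(grid_points)).reshape(xx.shape)
    return xx, yy, Z
```

Inside a plotting function, after `knn.fit(X_train, y_train)`, one could call `xx, yy, Z = decision_grid(knn.predict, X_train)`. Then call `ax.contourf(xx, yy, Z, levels=[-0.5, 0.5, 1.5], colors=colors, alpha=0.2)` before scattering the points. This assumes `reml`'s `predict` accepts a 2D array of query points, as the notebook's own call on `X_test` suggests.

A brute-force KNN compares every grid point with every training point. If redrawing feels slow in the widget, lowering `resolution` is the simplest lever.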

In [33]:
breast_cancer = datasets.load_breast_cancer()
feature_names = breast_cancer.feature_names
target_names = breast_cancer.target_names

# Create dropdown for feature selection with better styling
feature_x = widgets.Dropdown(
    options=[(name, i) for i, name in enumerate(feature_names)], 
    value=6, 
    description='X Feature:',
    style={'description_width': 'initial'}
)
feature_y = widgets.Dropdown(
    options=[(name, i) for i, name in enumerate(feature_names)], 
    value=4, 
    description='Y Feature:',
    style={'description_width': 'initial'}
)

@widgets.interact
def update_plot(x_feature=feature_x, y_feature=feature_y):
    """Interactive plot with proper fixed sizing"""
    X = breast_cancer.data[:, [x_feature, y_feature]]
    y_target = breast_cancer.target
    
    # Force specific figure size that VS Code respects
    plt.ioff()  # Turn off interactive mode
    
    # Create figure with explicit size control
    fig = plt.figure(figsize=(4, 3.2), dpi=120)  # Smaller, higher DPI
    ax = fig.add_subplot(111)
    
    # Remove black borders for clean look
    for spine in ax.spines.values():
        spine.set_visible(False)
    
    # Beautiful color palette and markers
    colors = ['#FF6B6B', '#4ECDC4']  # Modern coral and teal
    markers = ['o', 's']  # Circle and square markers
    
    # Plot with enhanced styling - NO WHITE EDGES for clean theme
    for i, target_name in enumerate(target_names):
        mask = y_target == i
        ax.scatter(
            X[mask, 0], X[mask, 1], 
            c=colors[i], 
            marker=markers[i],
            label=target_name, 
            alpha=0.8, 
            s=25,  # Even smaller points
            edgecolors='none'  # NO EDGES - clean modern look
        )
    
    # Compact styling
    ax.set_xlabel(feature_names[x_feature], fontsize=8, fontweight='bold', color='#2C3E50')
    ax.set_ylabel(feature_names[y_feature], fontsize=8, fontweight='bold', color='#2C3E50')
    ax.set_title("Breast Cancer Dataset", fontsize=9, fontweight='bold', color='#2C3E50', pad=6)
    
    # Compact legend
    legend = ax.legend(
        frameon=True, 
        fontsize=7,
        loc='best',
        framealpha=0.9,
        handletextpad=0.5,
        columnspacing=0.5
    )
    legend.get_frame().set_facecolor('#F8F9FA')
    legend.get_frame().set_edgecolor('#E9ECEF')
    
    # Clean background
    ax.set_facecolor('#FAFBFC')
    
    # Clean ticks
    ax.tick_params(colors='#7F8C8D', which='both', length=0, labelsize=7)
    
    # Ultra tight layout
    plt.subplots_adjust(left=0.12, right=0.95, top=0.9, bottom=0.15)
    
    # Show with explicit size
    plt.show()
    plt.close(fig)  # Explicitly close
    plt.ion()  # Turn interactive mode back on
    
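    # Compact info (scikit-learn encodes 0 = malignant, 1 = benign)
    n_malignant = int(np.sum(y_target == 0))
    n_benign = int(np.sum(y_target == 1))
    print(f"📊 {X.shape} | {feature_names[x_feature]} vs {feature_names[y_feature]}")
    print(f"🎯 {len(X)} samples | Malignant: {n_malignant} | Benign: {n_benign}")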


## 🎯 KNN Classification with Interactive K-Value

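Now let's explore how the number of neighbors K affects classification. A small K follows individual training points closely. A large K averages over a wider neighborhood and produces smoother decisions.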
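The slider shows one K at a time. A sweep makes the trend visible: for each K, fit a classifier and record both training and test accuracy.

`sweep_k` is a hypothetical helper. It relies only on the `fit`/`predict` pattern and the `n_neighbors` keyword that this notebook already uses with `reml`'s `KNeighborsClassifier`, passed in through a factory function.

```python
import numpy as np

def sweep_k(make_classifier, X_train, y_train, X_test, y_test, ks=range(1, 21)):
    """Fit one classifier per k; return {k: (train_accuracy, test_accuracy)}."""
    y_train = np.asarray(y_train)
    y_test = np.asarray(y_test)
    results = {}
    for k in ks:
        model = make_classifier(k)  # e.g. a KNN classifier with n_neighbors=k
        model.fit(X_train, y_train)
        # Accuracy = fraction of predictions that match the true labels
        train_acc = float(np.mean(np.asarray(model.predict(X_train)) == y_train))
        test_acc = float(np.mean(np.asarray(model.predict(X_test)) == y_test))
        results[k] = (train_acc, test_acc)
    return results
```

A possible call is `sweep_k(lambda k: KNeighborsClassifier(n_neighbors=k), X_train, y_train, X_test, y_test)`.

With K=1 every training point is its own nearest neighbor. Training accuracy is therefore typically 1.0; duplicated feature vectors with conflicting labels are the exception. The gap between that and test accuracy shows overfitting. As K grows, the two curves usually move closer together.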

In [None]:
# Create interactive KNN classifier exploration
k_slider = widgets.IntSlider(
    value=5,
    min=1,
    max=20,
    step=1,
    description='K Value:',
    style={'description_width': 'initial'}
)

# Feature selection dropdowns (same options as above, but separate widget instances)
feature_x_knn = widgets.Dropdown(
    options=[(name, i) for i, name in enumerate(breast_cancer.feature_names)],
    value=6,  # mean concavity
    description='X Feature:',
    style={'description_width': 'initial'}
)
feature_y_knn = widgets.Dropdown(
    options=[(name, i) for i, name in enumerate(breast_cancer.feature_names)], 
    value=4,  # mean smoothness  
    description='Y Feature:',
    style={'description_width': 'initial'}
)

@widgets.interact
def knn_classification_demo(k=k_slider, x_feature=feature_x_knn, y_feature=feature_y_knn):
    """Interactive KNN classification with different K values"""
    
    # Prepare data
    X = breast_cancer.data[:, [x_feature, y_feature]]
    y = breast_cancer.target
    
    # Split data for training and testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    
    # Train KNN classifier
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    
    # Make predictions
    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    
    # Create visualization
    plt.ioff()
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    
    # Plot 1: Training data (scatter only)
    colors = ['#FF6B6B', '#4ECDC4']
    markers = ['o', 's']
    
    # Remove spines for clean look
    for ax in [ax1, ax2]:
        for spine in ax.spines.values():
            spine.set_visible(False)
    
    # Training data visualization
    for i, target_name in enumerate(breast_cancer.target_names):
        mask = y_train == i
        ax1.scatter(
            X_train[mask, 0], X_train[mask, 1], 
            c=colors[i], marker=markers[i], label=f'{target_name} (train)',
            alpha=0.7, s=30, edgecolors='none'
        )
    
    ax1.set_xlabel(breast_cancer.feature_names[x_feature], fontsize=9, fontweight='bold')
    ax1.set_ylabel(breast_cancer.feature_names[y_feature], fontsize=9, fontweight='bold')
    ax1.set_title(f'Training Data (K={k})', fontsize=10, fontweight='bold', color='#2C3E50')
    ax1.legend(fontsize=8)
    ax1.set_facecolor('#FAFBFC')
    ax1.tick_params(colors='#7F8C8D', labelsize=8)
    
    # Plot 2: Test predictions
    for i, target_name in enumerate(breast_cancer.target_names):
        # True labels
        mask_true = y_test == i
        ax2.scatter(
            X_test[mask_true, 0], X_test[mask_true, 1], 
            c=colors[i], marker=markers[i], label=f'{target_name} (true)',
            alpha=0.8, s=40, edgecolors='white', linewidth=1
        )
        
        # Predicted labels as 'x' markers; a misclassified point gets an 'x' in the other class's color
        mask_pred = y_pred == i
        ax2.scatter(
            X_test[mask_pred, 0], X_test[mask_pred, 1], 
            c=colors[i], marker='x', label=f'{target_name} (pred)',
            alpha=0.9, s=60, linewidth=2
        )
    
    ax2.set_xlabel(breast_cancer.feature_names[x_feature], fontsize=9, fontweight='bold')
    ax2.set_ylabel(breast_cancer.feature_names[y_feature], fontsize=9, fontweight='bold')
    ax2.set_title(f'Test Predictions (Acc: {accuracy:.3f})', fontsize=10, fontweight='bold', color='#2C3E50')
    ax2.legend(fontsize=7, ncol=2)
    ax2.set_facecolor('#FAFBFC')
    ax2.tick_params(colors='#7F8C8D', labelsize=8)
    
    plt.tight_layout()
    plt.show()
    plt.close(fig)
    plt.ion()
    
    # Print classification results
    print(f"🎯 K={k} | Accuracy: {accuracy:.3f} | Train: {len(X_train)} | Test: {len(X_test)}")
    print(f"📊 Features: {breast_cancer.feature_names[x_feature]} vs {breast_cancer.feature_names[y_feature]}")
    
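    # Show confusion matrix (assumes reml follows scikit-learn's layout:
    # rows = true class, columns = predicted class, labels in sorted order).
    # Label 0 is malignant, so malignant is treated as the positive class here.
    cm = confusion_matrix(y_test, y_pred)
    print(f"🔍 Confusion Matrix: TP={cm[0,0]} (malignant caught), FN={cm[0,1]} (malignant missed), "
          f"FP={cm[1,0]} (benign flagged), TN={cm[1,1]}")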

print("🚀 Interactive KNN Classification Demo Ready!")
print("📋 Adjust K value and features to explore KNN behavior")
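In `load_breast_cancer`, label 0 is malignant and label 1 is benign. The clinically costly error is therefore a malignant case predicted as benign. Counting the four outcomes directly from the label arrays avoids depending on how `reml.metrics.confusion_matrix` orders its rows and columns. It also yields sensitivity (the share of malignant cases caught) and specificity (the share of benign cases cleared). `malignant_positive_summary` is a hypothetical helper name.

```python
import numpy as np

def malignant_positive_summary(y_true, y_pred):
    """Confusion counts for breast-cancer labels (0 = malignant, 1 = benign),
    treating malignant as the positive class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 0) & (y_pred == 0)))  # malignant caught
    fn = int(np.sum((y_true == 0) & (y_pred == 1)))  # malignant missed
    fp = int(np.sum((y_true == 1) & (y_pred == 0)))  # benign flagged as malignant
    tn = int(np.sum((y_true == 1) & (y_pred == 1)))  # benign cleared
    # Rates are undefined (NaN) when a class is absent from y_true
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn,
            "sensitivity": sensitivity, "specificity": specificity}
```

Calling `malignant_positive_summary(y_test, y_pred)` inside the demo adds these two rates next to accuracy. This matters because accuracy alone can hide missed malignant cases.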

## 📋 Next Steps: From Notebook to Experiment

This notebook provided **interactive exploration** of KNN behavior. Based on insights gained:

### 🧪 **Next: Create Systematic Experiment**
- **Location**: `experiments/knn_analysis/benchmark_vs_sklearn.py`  
- **Purpose**: Formal performance comparison across datasets
- **Output**: `experiments/knn_analysis/benchmark_results.json`
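The benchmark script is still to be written. Its core loop might look like the sketch below, under three assumptions:

- each estimator is built by a zero-argument factory;
- each estimator exposes `fit`/`predict`;
- each dataset is a `(X_train, y_train, X_test, y_test)` tuple.

The harness records test accuracy and combined fit+predict wall-clock time per (dataset, estimator) pair, then writes the rows to JSON. `run_benchmark` and the JSON field names are hypothetical.

```python
import json
import time
from pathlib import Path

import numpy as np

def run_benchmark(estimators, datasets, out_path):
    """Time fit+predict and score accuracy for every (dataset, estimator) pair;
    write the rows to out_path as JSON and return them."""
    rows = []
    for data_name, (X_train, y_train, X_test, y_test) in datasets.items():
        for est_name, make_estimator in estimators.items():
            model = make_estimator()
            start = time.perf_counter()
            model.fit(X_train, y_train)
            y_pred = model.predict(X_test)
            elapsed = time.perf_counter() - start
            rows.append({
                "dataset": data_name,
                "estimator": est_name,
                "accuracy": float(np.mean(np.asarray(y_pred) == np.asarray(y_test))),
                "fit_predict_seconds": elapsed,
            })
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)  # create the results folder
    out_path.write_text(json.dumps(rows, indent=2))
    return rows
```

To compare the two implementations, one could pass `{"reml": lambda: KNeighborsClassifier(n_neighbors=5), "sklearn": lambda: SkKNN(n_neighbors=5)}`. Here `SkKNN` is `sklearn.neighbors.KNeighborsClassifier` imported under an alias. The output path would be `"experiments/knn_analysis/benchmark_results.json"`.

A single timing run is noisy. Repeating each measurement and reporting the median gives more stable numbers.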

### 📊 **Finally: Generate Reports**
- **Location**: `reports/knn_performance_analysis.md`
- **Purpose**: Professional documentation of results
- **Content**: Charts, tables, and insights from experiments

### 🎯 **Workflow Summary:**
1. ✅ **Notebook**: Interactive exploration and understanding
2. 🔄 **Experiment**: Systematic benchmarking (next step)  
3. 📊 **Report**: Documentation and visualization (final step)