# Runtime Comparisons: CPU vs. GPU

This notebook compares the runtime of three `NearestNeighbors` implementations: scikit-learn (CPU), cuML with brute-force search (GPU), and cuML with the `ivfflat` index (GPU, approximate search).

We benchmark `NearestNeighbors` (building the model with `fit`, then querying every point with `kneighbors`) across a range of dataset sizes (`n_samples`) and plot the results to see how each implementation scales.

In [None]:
# %%
import time

import cupy as cp
import matplotlib.pyplot as plt
import pandas as pd

# Import the models
from sklearn.neighbors import NearestNeighbors as skNearestNeighbors
from cuml.neighbors import NearestNeighbors as cumlNearestNeighbors
from cuml.datasets import make_blobs

# Plotting style
plt.style.use('dark_background')

In [None]:
# %%
def benchmark_model(model_type, n_samples, n_features, n_neighbors):
    """
    Generates data and times a NearestNeighbors model.
    
    Parameters:
    - model_type (str): 'sklearn', 'cuml_brute', or 'cuml_ivfflat'.
    - n_samples (int): Number of samples to generate.
    - n_features (int): Number of features per sample.
    - n_neighbors (int): Number of neighbors to find.
    
    Returns:
    - float: Execution time in seconds.
    """
    # 1. Generate Data
    X_gpu, _ = make_blobs(n_samples=n_samples, 
                          n_features=n_features, 
                          random_state=42)
    
    # 2. Select Model and Data
    if model_type == 'sklearn':
        model = skNearestNeighbors(n_neighbors=n_neighbors)
        X_data = X_gpu.get()  # Copy the CuPy array to host memory as a NumPy array
    elif model_type == 'cuml_brute':
        model = cumlNearestNeighbors(n_neighbors=n_neighbors, algorithm='brute')
        X_data = X_gpu
    elif model_type == 'cuml_ivfflat':
        model = cumlNearestNeighbors(n_neighbors=n_neighbors, algorithm='ivfflat')
        X_data = X_gpu
    else:
        raise ValueError(f"Unknown model type: {model_type!r}")

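    # 3. Time the Execution (fit + query) with a monotonic, high-resolution clock
    start_time = time.perf_counter()

    model.fit(X_data)
    distances, indices = model.kneighbors(X_data)

    # GPU work can run asynchronously, so wait for it to finish before stopping the clock
    if 'cuml' in model_type:
        cp.cuda.runtime.deviceSynchronize()

    end_time = time.perf_counter()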
    
    return end_time - start_time
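
`benchmark_model` times a single `fit` + `kneighbors` call, so a measurement may include one-time setup costs and ordinary run-to-run noise. A minimal sketch of a more robust timer follows, assuming a warm-up run is discarded and the fastest of several repeats is reported. The helper name `time_best_of` and its `sync` hook are hypothetical and not part of the notebook or of cuML.

```python
import time


def time_best_of(fn, repeats=3, warmup=1, sync=None):
    """Return the fastest wall-clock time (seconds) of `fn` over `repeats` runs.

    `sync` is an optional zero-argument callable that blocks until queued
    device work has finished (hypothetical hook; leave as None on CPU).
    """
    def run_once():
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        return time.perf_counter() - start

    for _ in range(warmup):
        run_once()  # discarded: absorbs one-time setup cost
    return min(run_once() for _ in range(repeats))


# CPU-only usage example with a stand-in workload
elapsed = time_best_of(lambda: sum(i * i for i in range(100_000)))
print(f"best of 3: {elapsed:.6f} s")
```

For the GPU models, the same synchronization call used in the notebook could be passed as the hook, for example `sync=cp.cuda.runtime.deviceSynchronize`.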

In [None]:
# %%
SAMPLES_LIST = [10_000, 50_000, 100_000, 250_000, 500_000]
N_FEATURES = 50
N_NEIGHBORS = 10

# List of models to test
MODEL_TYPES = ['sklearn', 'cuml_brute', 'cuml_ivfflat']

# Dictionary to store the results
results = {}

# Main benchmark loop
print("Starting benchmarks...")
for n_samples in SAMPLES_LIST:
    print(f"\nTesting with n_samples = {n_samples:,}...")
    # Dictionary to store the results of this iteration
    times = {}
    for model_type in MODEL_TYPES:
        # Run the benchmark and store the time
        exec_time = benchmark_model(model_type, n_samples, N_FEATURES, N_NEIGHBORS)
        times[model_type] = exec_time
        print(f"  - {model_type}: {exec_time:.4f} seconds")
    results[n_samples] = times

# Convert the results dictionary to a Pandas DataFrame for visualization
results_df = pd.DataFrame.from_dict(results, orient='index')

print("\n--- Results Table (in seconds) ---")
display(results_df)
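
Raw seconds are hard to compare across dataset sizes, so it can help to express each GPU timing as a speedup over the CPU baseline. The sketch below works on a nested dictionary shaped like `results` (`n_samples` mapped to per-model times). The function name `speedup_over_baseline` is hypothetical, and the timings are placeholders made up for illustration, not measurements.

```python
def speedup_over_baseline(results, baseline="sklearn"):
    """Map each n_samples to {model: baseline_time / model_time}."""
    speedups = {}
    for n_samples, times in results.items():
        base = times[baseline]
        speedups[n_samples] = {
            model: base / t for model, t in times.items() if model != baseline
        }
    return speedups


# Placeholder timings in seconds (made up for illustration, not measured)
example_results = {
    10_000: {"sklearn": 2.0, "cuml_brute": 0.5, "cuml_ivfflat": 1.0},
    100_000: {"sklearn": 40.0, "cuml_brute": 2.0, "cuml_ivfflat": 1.0},
}

for n_samples, row in speedup_over_baseline(example_results).items():
    print(n_samples, row)
```

A value above 1 means the model ran faster than the baseline for that dataset size.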

In [None]:
# %%
# Create the figure and axes for the plot
fig, ax = plt.subplots(figsize=(12, 8))

# Plot the results for each model
ax.plot(results_df.index, results_df['sklearn'], marker='o', linestyle='--', label='Scikit-learn (CPU)')
ax.plot(results_df.index, results_df['cuml_brute'], marker='s', linestyle='-', label='cuML Brute (GPU)')
ax.plot(results_df.index, results_df['cuml_ivfflat'], marker='^', linestyle='-', label='cuML IVF-Flat (GPU)')

# Configure the labels and title
ax.set_xlabel("Number of Samples (n_samples)")
ax.set_ylabel("Execution Time (seconds, log scale)")
ax.set_title("Performance Comparison: CPU vs GPU (NearestNeighbors)")

# Use a logarithmic scale for the Y-axis
ax.set_yscale('log')

# Add legend and grid
ax.legend()
ax.grid(True, which="both", linestyle='--', linewidth=0.5)

# Show the plot
plt.show()
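
The plot shows how runtime grows, but a single number per model can make the scaling easier to compare. Because a brute-force self-query computes distances between all pairs of points, its work grows roughly quadratically with `n_samples`. The sketch below estimates an empirical scaling exponent as the least-squares slope of log(time) against log(size). The function name `scaling_exponent` is hypothetical, and the timings are synthetic, not measured.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Synthetic timings that grow exactly quadratically (not measured data)
sizes = [10_000, 50_000, 100_000]
times = [1e-9 * n ** 2 for n in sizes]
print(round(scaling_exponent(sizes, times), 3))
```

Applied to one column of `results_df`, an exponent near 2 would suggest quadratic growth, while a smaller value would suggest the implementation scales more gently over the tested range.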

## Conclusion

As the plot shows, the runtime advantage of cuML's GPU-accelerated `NearestNeighbors` over scikit-learn widens as the dataset grows.

- **GPU vs CPU**: Both cuML implementations run one to two orders of magnitude faster than CPU-based scikit-learn.
- **Brute vs IVF-Flat**: On smaller datasets, brute force is competitive because IVF-Flat must first build its index, and that overhead dominates. As `n_samples` grows, the indexed `ivfflat` search becomes significantly faster than brute force. Keep in mind that IVF-Flat is an approximate method, so its neighbors can differ from the exact ones that brute force returns.
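
Since an indexed search trades some accuracy for speed, a runtime comparison between brute force and IVF-Flat is more informative when paired with a quality measure. A minimal sketch of recall@k follows: the fraction of exact neighbors that the approximate search also returned. The function name `recall_at_k` is hypothetical, and the neighbor lists are toy values for illustration only.

```python
def recall_at_k(exact_indices, approx_indices):
    """Fraction of true neighbors that the approximate search also returned."""
    hits = 0
    total = 0
    for exact_row, approx_row in zip(exact_indices, approx_indices):
        hits += len(set(exact_row) & set(approx_row))
        total += len(exact_row)
    return hits / total


# Toy neighbor lists for two query points (illustrative values only)
exact = [[0, 3, 7], [1, 4, 9]]
approx = [[0, 7, 2], [1, 4, 9]]
print(recall_at_k(exact, approx))
```

With the notebook's models, the two inputs would be the `indices` arrays returned by `kneighbors` for `cuml_brute` and `cuml_ivfflat`, converted to host-side lists first (for CuPy arrays, presumably via `.get().tolist()`).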