# Probabilistic k-NN Model (Using Only NumPy)

In this notebook, we will:

1. Load `dataset.csv` (containing `launch_speed, launch_angle, target`).
2. Build a k-NN model from scratch, using only NumPy (plus `collections.Counter` from the standard library).
3. Output a probability distribution across 7 possible classes:
   - single
   - double
   - triple
   - home run
   - groundoutable
   - flyoutable
   - lineoutable

In [1]:
import numpy as np
from collections import Counter

# We will assume the following 7 classes:
ALL_CLASSES = [
    'Single',
    'Double',
    'Triple',
    'Home Run',
    'Groundoutable',
    'Flyoutable',
    'Lineoutable']

# Load numeric columns (launch_speed, launch_angle)
# from dataset.csv using NumPy.

data_numeric = np.genfromtxt(
    'dataset.csv',       # CSV filename/path
    delimiter=',',
    skip_header=1,       # skip header row
    usecols=(0, 1),
    dtype=float
)

# Load the target (third column) as strings:
data_labels = np.genfromtxt(
    'dataset.csv',
    delimiter=',',
    skip_header=1,
    usecols=2,           # single column index; (2) is just the int 2, not a tuple
    dtype=str
)

def knn_predict_probabilities(query_point, features, labels, k=3):
    """
    Given:
      query_point: (launch_speed, launch_angle)
      features:    array of shape (N, 2) holding the numeric data
      labels:      array of shape (N,) holding the string labels
      k:           number of neighbors to consider
    Return:
      A dictionary mapping each class in ALL_CLASSES to a probability.
    """
    # Convert query_point to an array for vectorized math
    query_arr = np.array(query_point)

    # 1. Compute Euclidean distances to all points in 'features'
    #    features has shape (N, 2), query_arr is shape (2,)
    diffs = features - query_arr  # shape (N, 2)
    squared_diffs = diffs ** 2    # shape (N, 2)
    dist_array = np.sqrt(np.sum(squared_diffs, axis=1))  # shape (N,)

    # 2. Sort by distance and pick the indices of the k nearest
    sorted_indices = np.argsort(dist_array)
    k_nearest_indices = sorted_indices[:k]

    # 3. Retrieve the labels of the k nearest neighbors
    k_labels = labels[k_nearest_indices]

    # 4. Count occurrences of each label in the k nearest
    counts = Counter(k_labels)

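    # 5. Convert counts to probabilities.
    #    Divide by the number of neighbors actually retrieved, which is
    #    smaller than k when the dataset has fewer than k rows; dividing
    #    by k in that case would make the probabilities sum to less than 1.
    n_neighbors = len(k_labels)
    probabilities = {}
    for cls in ALL_CLASSES:
        probabilities[cls] = counts[cls] / n_neighbors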

    return probabilities

print("Data and function loaded successfully.")

Data and function loaded successfully.
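
To make the steps inside `knn_predict_probabilities` concrete, here is a minimal sketch that runs the same distance, sort, and vote logic on a tiny hand-made array. The five rows and their labels are hypothetical values invented for illustration; they are not taken from `dataset.csv`.

```python
import numpy as np
from collections import Counter

# Hypothetical toy data (NOT from dataset.csv): (launch_speed, launch_angle)
toy_features = np.array([
    [90.0, -10.0],
    [95.0, -14.0],
    [100.0, 25.0],
    [105.0, 30.0],
    [88.0, -12.0],
])
toy_labels = np.array(
    ['Groundoutable', 'Single', 'Home Run', 'Home Run', 'Groundoutable']
)

query = np.array([92.0, -13.0])
k = 3

# Euclidean distance from the query to every row
dists = np.sqrt(np.sum((toy_features - query) ** 2, axis=1))

# Indices of the k closest rows
nearest = np.argsort(dists)[:k]

# Vote shares among those k neighbors
counts = Counter(toy_labels[nearest].tolist())
toy_probs = {label: count / k for label, count in counts.items()}
print(toy_probs)
```

Rows 1, 0, and 4 are the three closest to the query, so two of the three votes go to `Groundoutable` and one to `Single`, giving shares of 2/3 and 1/3. Classes with no votes among the neighbors are simply absent here, whereas the notebook's function reports them with probability 0.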


### Example Usage

We'll query the model with a hypothetical `launch_speed` and `launch_angle`, then print the predicted probability for each of the 7 classes.

In [2]:
# Example query:
query_point = (92.4,-13.0)
k_value = 10000

# Compute probabilities:
probs = knn_predict_probabilities(query_point, data_numeric, data_labels, k=k_value)

print(f"Probabilities for query = {query_point} (k={k_value}):")
for cls in ALL_CLASSES:
    print(f"  {cls:<15}: {probs[cls]:.3f}")


Probabilities for query = (92.4, -13.0) (k=10000):
  Single         : 0.142
  Double         : 0.041
  Triple         : 0.004
  Home Run       : 0.000
  Groundoutable  : 0.814
  Flyoutable     : 0.000
  Lineoutable    : 0.000
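
With a `k` as large as 10000, fully sorting every distance does more work than the vote needs, because the order among the selected neighbors does not affect the counts. One possible alternative, sketched below, uses `np.argpartition` to pick the `k` smallest distances without a full sort. The helper name `k_nearest_indices` and the random distances are assumptions made for illustration, not part of the notebook above.

```python
import numpy as np

def k_nearest_indices(dist_array, k):
    """Return the indices of the k smallest distances, in no particular order.

    Hypothetical helper: a possible drop-in for the argsort step when
    only the set of neighbors matters, not their ranking.
    """
    n = dist_array.shape[0]
    k = min(k, n)  # guard against k larger than the dataset
    if k == n:
        return np.arange(n)
    # argpartition puts the k smallest entries in the first k positions
    return np.argpartition(dist_array, k - 1)[:k]

# Made-up distances standing in for dist_array
rng = np.random.default_rng(0)
dists = rng.random(50_000)

idx = k_nearest_indices(dists, 10_000)
print(len(idx))
```

The selected indices form the same set as `np.argsort(dists)[:10_000]`, just in a different order, so the resulting class counts would be unchanged.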
