<a href="https://colab.research.google.com/github/kevenbazile/AiBall/blob/main/NBA_Comparison_Engine.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Install and import libraries

In [None]:
# STEP 1: Install dependencies (if not already available)
!pip install pandas scikit-learn

# STEP 2: Import libraries
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import StandardScaler




Load the dataset (Merged_Player_Dataset.csv) and preview it

In [None]:
# STEP 3: Load the dataset
file_path = '/content/Merged_Player_Dataset.csv'  # You'll upload this file manually
nba_df = pd.read_csv(file_path)

# STEP 4: Preview it
nba_df.head()


Unnamed: 0.1,Unnamed: 0,seas_id,season,player_id,player,birth_year,pos,age,experience,lg,...,fg_percent_from_x16_3p_range,fg_percent_from_x3p_range,percent_assisted_x2p_fg,percent_assisted_x3p_fg,percent_dunks_of_fga,num_of_dunks,percent_corner_3s_of_3pa,corner_3_point_percent,num_heaves_attempted,num_heaves_made
0,0,31871.0,2025,5025.0,A.J. Green,,SG,25.0,3,NBA,...,0.368,0.423,0.714,0.938,0.0,0.0,0.302,0.513,2.0,1.0
1,1,31872.0,2025,5026.0,A.J. Lawson,,SG,24.0,3,NBA,...,,0.4,1.0,1.0,0.125,1.0,0.2,0.0,0.0,0.0
2,2,31873.0,2025,5210.0,AJ Johnson,,SG,20.0,1,NBA,...,,0.4,0.571,0.75,0.074,1.0,0.2,1.0,1.0,0.0
3,3,31874.0,2025,5210.0,AJ Johnson,,SG,20.0,1,NBA,...,,0.6,0.6,0.667,0.053,0.0,0.2,1.0,0.0,0.0
4,4,31875.0,2025,5210.0,AJ Johnson,,SG,20.0,1,NBA,...,,0.2,0.5,1.0,0.125,1.0,0.2,1.0,1.0,0.0


Prepare the stat columns for comparison

In [None]:
# STEP 5: Select relevant numerical stat columns
# Drop columns that aren't useful for comparison (names, IDs, leftover index columns)
drop_cols = ['Unnamed: 0.1', 'Unnamed: 0', 'seas_id', 'season', 'player_id', 'player', 'birth_year', 'pos', 'lg']
stat_df = nba_df.drop(columns=drop_cols, errors='ignore')

# STEP 6: Keep numeric columns only and drop columns that are entirely NaN
stat_df = stat_df.select_dtypes(include=[np.number]).dropna(axis=1, how='all')

# STEP 7: Fill missing values with each column's mean
stat_df = stat_df.fillna(stat_df.mean())

# STEP 8: Normalize the dataset
scaler = StandardScaler()
normalized_stats = scaler.fit_transform(stat_df)

# Keep a reference to the players for later
player_names = nba_df['player'].values
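
To make steps 7 and 8 concrete, here is a minimal standard-library sketch of what mean imputation followed by standardization does to a single column. The function name `impute_and_standardize` and the toy values are hypothetical and not part of the notebook; the sketch assumes the population standard deviation, which is what `StandardScaler` uses.

```python
import math
import statistics

def impute_and_standardize(column):
    """Fill missing entries with the column mean, then z-score the column.

    Hypothetical helper for illustration only; the notebook itself uses
    DataFrame.fillna and sklearn's StandardScaler.
    """
    def is_missing(value):
        return value is None or math.isnan(value)

    observed = [v for v in column if not is_missing(v)]
    mean = statistics.fmean(observed)
    filled = [mean if is_missing(v) else v for v in column]

    # Population standard deviation (divide by n)
    std = statistics.pstdev(filled)
    if std == 0:
        # A constant column carries no information; map it to zeros
        return [0.0 for _ in filled]
    return [(v - mean) / std for v in filled]

# Toy column with one missing value
toy_dunks = [0.0, 1.0, None, 5.0]
scaled_dunks = impute_and_standardize(toy_dunks)
print(scaled_dunks)
```

An imputed entry always lands at exactly 0 after scaling, so a missing stat neither helps nor hurts a player in the later similarity comparison.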


Define Matching Function

In [None]:
def find_closest_nba_player(input_stats: dict):
    """Return the dataset row most similar to input_stats (cosine similarity on scaled stats)."""
    # Build one row in the same column order as stat_df.
    # Stats that are not provided default to a raw value of 0 (before scaling).
    input_row = {col: input_stats.get(col, 0) for col in stat_df.columns}

    # Use a DataFrame so the column names match those the scaler was fitted on
    input_frame = pd.DataFrame([input_row], columns=stat_df.columns)

    # Normalize using the same scaler
    normalized_input = scaler.transform(input_frame)

    # Compute cosine similarity with all player rows
    similarities = cosine_similarity(normalized_input, normalized_stats)[0]

    # Get the index of the best match
    best_index = int(np.argmax(similarities))
    best_score = similarities[best_index]
    matched_player = player_names[best_index]

    return {
        "matched_player": matched_player,
        "similarity_score": float(best_score)
    }
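
The function above returns only the single best row. The dataset preview shows that one player can occupy several rows (AJ Johnson appears three times), so a ranked list with one entry per player may be more useful. The following is a minimal standard-library sketch of that idea; `cosine`, `top_k_matches`, and the toy vectors are hypothetical and stand in for the scaled stat rows.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_k_matches(query, rows, names, k=3):
    """Rank players by their best-scoring row and return the top k (name, score) pairs."""
    best = {}
    for name, row in zip(names, rows):
        score = cosine(query, row)
        # Keep only the highest score seen for each player
        if name not in best or score > best[name]:
            best[name] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

# Toy data: "Player X" has two rows, as a player with multiple rows would
toy_names = ["Player X", "Player X", "Player Y", "Player Z"]
toy_rows = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
toy_query = [1.0, 0.2]

matches = top_k_matches(toy_query, toy_rows, toy_names, k=2)
for name, score in matches:
    print(f"{name}: {score:.3f}")
```

Because cosine similarity compares direction rather than magnitude, a player whose scaled stats are a uniformly larger or smaller version of the input still scores close to 1.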


Test it with dummy data

In [None]:
# Example input (you'll replace this with actual player stats later)
custom_player_stats = {
    'age': 25,
    'experience': 3,
    'fg_percent_from_x16_3p_range': 0.35,
    'fg_percent_from_x3p_range': 0.40,
    'percent_assisted_x2p_fg': 0.7,
    'percent_assisted_x3p_fg': 0.9,
    'percent_dunks_of_fga': 0.05,
    'num_of_dunks': 5,
    'percent_corner_3s_of_3pa': 0.25,
    'corner_3_point_percent': 0.38,
    'num_heaves_attempted': 1,
    'num_heaves_made': 0
}

find_closest_nba_player(custom_player_stats)




{'matched_player': 'Bruce Bowen', 'similarity_score': 0.6222373089657987}

Checkpoint


**Generating a scouting report with a Hugging Face language model in Colab**

In [None]:
!pip install transformers accelerate bitsandbytes

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch



**Downloading the model (Qwen/Qwen2.5-7B) with the Hugging Face `transformers` pipeline**

In [None]:
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-7B",
    trust_remote_code=True,
    device_map="auto"
)


Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


Device set to use cuda:0


**Test Scouting Prompt**

In [None]:
prompt = """
You are an NBA scout. Compare the two players below and generate a report with these sections:

1. Overall Summary
2. Strengths and Weaknesses
3. Matchup Breakdown
4. Recommendations

Player A compares to: Bruce Bowen
Strengths: elite corner shooting, lockdown defense
Weaknesses: limited ball-handling, slow in transition

Player B compares to: Josh Hart
Strengths: physical rebounding, hustle plays
Weaknesses: inconsistent shooting, foul trouble

Scouting Report:
"""

# Generate output
response = pipe(prompt, max_new_tokens=500, temperature=0.7, do_sample=True, return_full_text=False)
print(response[0]['generated_text'])

1. Overall Summary
Both Player A and Player B have distinct skill sets that make them valuable assets on the court. Player A excels in corner shooting and defensive prowess, while Player B is known for his physical rebounding and hustle plays. However, Player A faces challenges with ball-handling and speed in transition, whereas Player B struggles with consistency in his shooting and is prone to fouls.

2. Strengths and Weaknesses
Player A's strengths lie in his elite corner shooting and his ability to provide effective lockdown defense. These skills contribute to his overall effectiveness on the defensive end of the court. On the downside, Player A's limited ball-handling and slow movement in transition hinder his ability to contribute offensively in fast-paced situations.

Player B, on the other hand, showcases his physical rebounding and hustle plays as his primary strengths. These skills make him a reliable presence on the defensive glass and in team efforts to secure loose balls. 
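
The prompt in the cell above is written by hand. One way to assemble the same text from structured data is sketched below. The function `build_scouting_prompt` and the layout of the `players` dictionary are hypothetical; the strengths and weaknesses still have to be supplied separately, because `find_closest_nba_player` returns only a name and a similarity score.

```python
def build_scouting_prompt(players):
    """Assemble a scouting prompt from a dict of player label -> comparison details.

    Hypothetical helper: each value needs the keys 'comparison',
    'strengths', and 'weaknesses' (the last two are lists of strings).
    """
    sections = [
        "Overall Summary",
        "Strengths and Weaknesses",
        "Matchup Breakdown",
        "Recommendations",
    ]
    lines = [
        "You are an NBA scout. Compare the two players below "
        "and generate a report with these sections:",
        "",
    ]
    lines += [f"{i}. {title}" for i, title in enumerate(sections, start=1)]

    for label, info in players.items():
        lines.append("")
        lines.append(f"{label} compares to: {info['comparison']}")
        lines.append("Strengths: " + ", ".join(info["strengths"]))
        lines.append("Weaknesses: " + ", ".join(info["weaknesses"]))

    # End with the cue the model is expected to continue from
    lines += ["", "Scouting Report:", ""]
    return "\n".join(lines)

# Same values as the hand-written prompt
players = {
    "Player A": {
        "comparison": "Bruce Bowen",
        "strengths": ["elite corner shooting", "lockdown defense"],
        "weaknesses": ["limited ball-handling", "slow in transition"],
    },
    "Player B": {
        "comparison": "Josh Hart",
        "strengths": ["physical rebounding", "hustle plays"],
        "weaknesses": ["inconsistent shooting", "foul trouble"],
    },
}

generated_prompt = build_scouting_prompt(players)
print(generated_prompt)
```

The resulting string can be passed to `pipe(...)` in place of the hand-written `prompt`.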