# NBA Player Position Classification

**Objective:** Classify NBA players into 5 positions based on their statistical performance

## Project Overview
This project uses machine learning to classify NBA players into their positions (PG, SG, SF, PF, C) based on performance statistics from the 2023 regular season.

## Data Import and Setup

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns

# Load the dataset
nba = pd.read_csv('nba_stats.csv')
print(f"Dataset shape: {nba.shape}")
print(f"\nColumns: {nba.columns.tolist()}")

## Exploratory Data Analysis

In [None]:
# Display basic information about the dataset
print("Dataset Info:")
print(nba.info())

print("\nPosition distribution:")
print(nba['Pos'].value_counts())

# Visualize position distribution
plt.figure(figsize=(10, 6))
nba['Pos'].value_counts().plot(kind='bar')
plt.title('Distribution of NBA Player Positions')
plt.xlabel('Position')
plt.ylabel('Number of Players')
plt.xticks(rotation=0)
plt.show()

## Feature Selection and Data Preparation

In [None]:
# Define features and target variable
classColumn = 'Pos'
attributes = ['MP','FGA', '3PA', '2PA', 'FTA', 'ORB', 'DRB', 'AST', 'STL', 'BLK', 'TOV', 'PTS']

print(f"Selected features: {attributes}")
print(f"Target variable: {classColumn}")

# Prepare features and target
nbaAttributes = nba[attributes]
nbaClass = nba[classColumn]

# Display feature statistics
print("\nFeature Statistics:")
print(nbaAttributes.describe())

## Model Training and Evaluation

In [None]:
# Split the data: 80% training, 20% held-out test (stratified by position)
train_feature, test_feature, train_class, test_class = train_test_split(
    nbaAttributes, nbaClass, 
    stratify=nbaClass, 
    test_size=0.20, 
    random_state=0
)

print(f"Training set size: {len(train_feature)}")
print(f"Test set size: {len(test_feature)}")

# Train Naive Bayes classifier
nb = GaussianNB().fit(train_feature, train_class)

# Calculate accuracies
train_accuracy = nb.score(train_feature, train_class)
test_accuracy = nb.score(test_feature, test_class)

print(f"\nTraining set accuracy: {train_accuracy:.3f}")
print(f"Test set accuracy: {test_accuracy:.3f}")

## Model Performance Analysis

In [None]:
# Generate predictions
prediction = nb.predict(test_feature)

# Confusion Matrix
print("Confusion Matrix:")
cm = pd.crosstab(test_class, prediction, rownames=['True'], colnames=['Predicted'], margins=True)
print(cm)

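# Visualize confusion matrix
plt.figure(figsize=(8, 6))
# Use the classifier's own label order for both the matrix and the tick labels,
# so rows and columns stay aligned even if a position is missing from the test split
positions = nb.classes_
cm_numeric = confusion_matrix(test_class, prediction, labels=positions)
sns.heatmap(cm_numeric, annot=True, fmt='d', cmap='Blues',
            xticklabels=positions, yticklabels=positions)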
plt.title('Confusion Matrix - NBA Position Classification')
plt.xlabel('Predicted Position')
plt.ylabel('True Position')
plt.show()

# Classification Report
print("\nClassification Report:")
print(classification_report(test_class, prediction))

## Cross-Validation Analysis

In [None]:
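# 10-fold stratified cross-validation (an integer cv with a classifier uses
# stratified folds; the estimator is cloned and refit on each fold)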
cv_scores = cross_val_score(nb, nbaAttributes, nbaClass, cv=10, scoring='accuracy')

print(f"10-Fold Cross-validation scores: {cv_scores}")
print(f"Average CV accuracy: {cv_scores.mean():.3f} (+/- {cv_scores.std() * 2:.3f})")

# Visualize CV scores
plt.figure(figsize=(10, 6))
plt.plot(range(1, 11), cv_scores, 'bo-', linewidth=2, markersize=8)
plt.axhline(y=cv_scores.mean(), color='r', linestyle='--', label=f'Mean: {cv_scores.mean():.3f}')
plt.xlabel('Fold Number')
plt.ylabel('Accuracy Score')
plt.title('10-Fold Cross-Validation Results')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## Test on Dummy Dataset

In [None]:
# Load and test on dummy test set
dummy_test = pd.read_csv('dummy_test.csv')
dummy_features = dummy_test[attributes]
dummy_classes = dummy_test[classColumn]

# Make predictions
dummy_predictions = nb.predict(dummy_features)
dummy_accuracy = nb.score(dummy_features, dummy_classes)

print(f"Dummy test set accuracy: {dummy_accuracy:.3f}")
print("\nDummy test confusion matrix:")
dummy_cm = pd.crosstab(dummy_classes, dummy_predictions, 
                      rownames=['True'], colnames=['Predicted'], margins=True)
print(dummy_cm)

## Key Findings and Insights

### Model Performance:
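- The Gaussian Naive Bayes classifier is evaluated on a stratified 20% hold-out split, 10-fold cross-validation, and a separate dummy test set; its accuracy is best judged against a majority-class baseline
- Averaging accuracy over 10 folds gives a more stable estimate than a single train-test split, and the spread across folds shows how sensitive the result is to the split
- Different positions show distinct statistical patterns that the model can learn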
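
One way to put an accuracy figure in context is to compare it with a majority-class baseline under the same cross-validation scheme. The sketch below does this on synthetic data from `make_blobs`, an assumption made only so the example runs without `nba_stats.csv`. The variable names are illustrative, and the resulting scores say nothing about the NBA data.

```python
from sklearn.datasets import make_blobs
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real data: 5 classes, 12 features (assumption)
X, y = make_blobs(n_samples=500, centers=5, n_features=12, random_state=0)

# Same 10-fold scheme for the model and for a baseline that always
# predicts the most frequent class
nb_scores = cross_val_score(GaussianNB(), X, y, cv=10, scoring="accuracy")
baseline_scores = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, cv=10, scoring="accuracy"
)

print(f"GaussianNB mean accuracy: {nb_scores.mean():.3f}")
print(f"Baseline mean accuracy:   {baseline_scores.mean():.3f}")
```

In the notebook, the same comparison would use `nbaAttributes` and `nbaClass` in place of `X` and `y`.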

### Basketball Insights:
- Guards (PG, SG) typically record more assists and steals
- Centers (C) and Power Forwards (PF) dominate in rebounding and blocks
- Statistical profiles effectively capture positional roles in basketball
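
Positional patterns like these can be checked by averaging each statistic within each position. The sketch below uses a small, made-up table (the numbers are hypothetical, not real player statistics) to show the `groupby` pattern.

```python
import pandas as pd

# Made-up rows for illustration only; these are not real player statistics
toy = pd.DataFrame({
    "Pos": ["PG", "PG", "SG", "PF", "C", "C"],
    "AST": [8.0, 6.0, 4.0, 2.0, 1.0, 3.0],
    "STL": [1.5, 1.1, 1.0, 0.6, 0.5, 0.7],
    "DRB": [3.0, 4.0, 3.5, 7.0, 9.0, 8.0],
    "BLK": [0.2, 0.4, 0.3, 1.0, 2.0, 1.5],
})

# Average each statistic within each position
position_means = toy.groupby("Pos")[["AST", "STL", "DRB", "BLK"]].mean()
print(position_means)
```

On the real data, the equivalent call would be `nba.groupby(classColumn)[attributes].mean()`.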

### Technical Implementation:
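- Twelve box-score features (MP, FGA, 3PA, 2PA, FTA, ORB, DRB, AST, STL, BLK, TOV, PTS) selected using basketball domain knowledge
- Stratified 80/20 train-test split with a fixed seed (`random_state=0`)
- Evaluation with accuracy, a confusion matrix, a per-class classification report, and 10-fold cross-validation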
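
To see what stratification does in practice, the following sketch compares class proportions in the full label set with those in the train and test splits. The label counts are hypothetical and the single feature column `x` is a placeholder; only the `train_test_split` arguments mirror the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical, deliberately unbalanced label counts (not the real dataset)
labels = pd.Series(["PG"] * 40 + ["SG"] * 30 + ["SF"] * 20 + ["PF"] * 60 + ["C"] * 50)
features = pd.DataFrame({"x": np.arange(len(labels))})  # placeholder feature

X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, stratify=labels, test_size=0.20, random_state=0
)

# Share of each position in the full data and in each split
proportions = pd.DataFrame({
    "full": labels.value_counts(normalize=True),
    "train": y_tr.value_counts(normalize=True),
    "test": y_te.value_counts(normalize=True),
})
print(proportions)

# Largest difference between the test-split share and the full-data share
max_gap = (proportions["test"] - proportions["full"]).abs().max()
print(f"Largest proportion gap: {max_gap:.3f}")
```

Without `stratify=labels`, the proportions in a small test split can drift from those of the full data, which matters most for the rarest positions.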