 <h1>Building an Artificial Neural Network (ANN) for Selecting the Optimal NBA Team</h1>
 <h2>Nathan Dilla & John Haviland</h2>

 
<h3>Problem Statement</h3>
    The problem is to identify the optimal team of 5 players from a pool of 603 NBA players from the 2021-22 NBA season. The team is chosen using the statistics given in the dataset: we set thresholds on specific player characteristics in order to build a balanced team.



<h3>Data Preparation</h3>

After loading the dataset of NBA players, we selected the input features for our ANN. The selected features include:

    - player height (player_height)
    - player weight (player_weight)
    - games played (gp)
    - points per game (pts)
    - rebounds per game (reb)
    - assists per game (ast)
    - offensive rebounding percentage (oreb_pct)
    - defensive rebounding percentage (dreb_pct)
    - usage percentage (usg_pct)
    - true shooting percentage (ts_pct)
    - assist percentage (ast_pct)

The selection criteria are:

    - At least 70 games played in the 2021-22 season
        - Generally, games played can help quantify how healthy a player is throughout the season.
        - The last few NBA Champions (2022-2023 Nuggets, 2021-2022 Warriors, 2020-2021 Bucks) were teams where the key players were able to stay healthy and play most of their games.

    - A player is taller than 196 centimeters (~6'5")
        - The dataset did not include players' positions, so in order to build a balanced team without set positions, we need everyone on the team to be of similar size. This reduces the chance of the team being outsized in matchups.

    - A player averages more than 15 points per game
        - Every player on an optimal NBA team should be a threat to score

    - A player has a true shooting percentage over 56%
        - League average true shooting percentage in 2021-2022: 56.6%
        - On top of every player on the team needing to be a threat to score, all the players need to be able to shoot efficiently as well. A player who averages a lot of points per game but has a 

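The listed criteria can be expressed as a single boolean mask over the player table. The following is a minimal sketch using NumPy and four made-up stat lines; the values are illustrative only and are not taken from the dataset, and the function name `meets_criteria` is hypothetical.

```python
import numpy as np

# Made-up stat lines for illustration only (not real players).
# Columns: games played, height in cm, points per game, true shooting pct.
toy_stats = np.array([
    [75, 201.0, 22.4, 0.61],   # meets every criterion
    [48, 206.0, 27.1, 0.63],   # too few games played
    [80, 188.0, 18.0, 0.58],   # not taller than 196 cm
    [72, 198.0, 16.2, 0.52],   # true shooting not above 56%
])

def meets_criteria(stats):
    """Return a 0/1 label per row: 1 only if all four listed criteria hold."""
    gp, height, pts, ts_pct = stats.T
    mask = (gp >= 70) & (height > 196) & (pts > 15) & (ts_pct > 0.56)
    return mask.astype(int)

labels = meets_criteria(toy_stats)
print(labels)
```

Because the conditions are combined with `&`, a player who fails any single criterion is labeled 0, so the positive class is expected to be small relative to the full pool.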



<h3>Algorithm of the Solution</h3>

    1. Load the NBA player dataset and the necessary libraries.
    2. Define the selection criteria for the optimal team based on player statistics/characteristics.
    3. Extract the input features and the target variable from the dataset.
    4. Split the data into training and testing sets.
    5. Build the ANN with the following architecture:
        - Input Layer: the number of neurons is determined by the number of input features.
        - 2 Hidden Layers: the first hidden layer has 128 neurons and the second hidden layer has 64 neurons. Both layers use the ReLU activation function.
        - Output Layer: one output neuron with a sigmoid activation function for binary classification (1 for selected to team, 0 for not selected).
    6. Compile the ANN with binary cross-entropy loss and accuracy as the metric.
    7. Train the ANN on the training data for 500 epochs.
    8. Evaluate the model's performance on the test data.
    9. Use the trained model to predict the probability of each player being selected to the team.
    10. Apply a 0.5 threshold to each player's predicted probability to obtain a binary label.
    11. Select the 5 players with the highest predicted probabilities as the optimal team.
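To make steps 5, 6, and 11 concrete, here is a rough NumPy sketch of what the 128-64-1 network computes in a single forward pass, how binary cross-entropy scores the output, and how a top-5 ranking is read off the probabilities. It is not the Keras implementation: the weights are random and untrained, the inputs are random numbers rather than player stats, and the helper names (`init_layer`, `forward`, `binary_cross_entropy`) are hypothetical. It assumes 11 input features, matching the `input_features` list in the implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(n_in, n_out):
    """Small random weights and zero biases for one dense layer (untrained)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

n_features = 11  # assumed feature count for this sketch
W1, b1 = init_layer(n_features, 128)  # first hidden layer
W2, b2 = init_layer(128, 64)          # second hidden layer
W3, b3 = init_layer(64, 1)            # output layer

def forward(x):
    """Dense(128, relu) -> Dense(64, relu) -> Dense(1, sigmoid)."""
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)[:, 0]

def binary_cross_entropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

# Random stand-in batch: 8 "players" with made-up labels.
x_batch = rng.normal(size=(8, n_features))
y_batch = np.array([1, 0, 0, 0, 1, 0, 0, 0])

probs = forward(x_batch)
loss = binary_cross_entropy(y_batch, probs)

# Step 11: rank by probability (not by thresholded label) and keep the top 5.
top5 = np.argsort(probs)[::-1][:5]
print(probs.round(3), loss, top5)
```

Ranking on the raw probabilities matters here: after thresholding, every player is either 0 or 1, so sorting the labels cannot distinguish between players on the same side of the threshold.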

    

            

<h3>Code Implementation</h3>

In [1]:
# Import libraries
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
import pandas as pd

# Load NBA player dataset
# ADD PATH MANUALLY AFTER DOWNLOADING ACCOMPANYING FILE
nba_players_dataset = pd.read_json(r"C:\Users\jhavi\Downloads\nba-players_21-22.json")

input_features = [
    "player_height", "player_weight", "gp", "pts", "reb", "ast", "oreb_pct", "dreb_pct", "usg_pct", "ts_pct", "ast_pct"
]


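# Define selection criteria (a player must meet all of them to be labeled 1).
# The thresholds follow the Data Preparation section; the rebound and assist
# percentage cutoffs (reb, ast_pct) are the values used in this code.
criteria = (
    (nba_players_dataset['gp'] >= 70) &
    (nba_players_dataset['player_height'] > 196) &
    (nba_players_dataset['pts'] > 15) &
    (nba_players_dataset['reb'] > 4) &
    (nba_players_dataset['ast_pct'] > 0.1) &
    (nba_players_dataset['ts_pct'] > 0.56)
)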

# Create the 'selected_for_team' column based on the criteria
nba_players_dataset['selected_for_team'] = criteria.astype(int)


# Extract the input features from the dataset
x = nba_players_dataset[input_features].values


# Target variable
y = nba_players_dataset['selected_for_team'].values


# Split data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)


# Define the MLP model
model = Sequential()
model.add(Dense(128, activation='relu', input_dim=x_train.shape[1]))
model.add(Dense(64, activation='relu')) 
model.add(Dense(1, activation='sigmoid'))  


# Compile, train, and evaluate model
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = model.fit(x_train, y_train, epochs=500, batch_size=64, validation_split=0.2)

loss, accuracy = model.evaluate(x_test, y_test)
print(f'Test loss: {loss:.4f}')
print(f'Test accuracy: {accuracy:.4f}')

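# Predict the selection probability for every player in the dataset,
# so that row positions line up with nba_players_dataset
predictions = model.predict(x)
threshold = 0.5
predicted_labels = (predictions > threshold).astype(int)


# Select the top 5 players with the highest predicted probability
# (rank on the probabilities, not on the thresholded 0/1 labels)
top_players_indices = predictions[:, 0].argsort()[-5:][::-1]
top_players = nba_players_dataset.iloc[top_players_indices]
print("Optimal Team:")
print(top_players)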


# Check the number of players meeting each criterion (TESTING)
print("Number of players meeting each criterion:")
print("Criterion 1 (gp >= 70):", (nba_players_dataset['gp'] >= 70).sum())
print("Criterion 2 (player_height > 196):", (nba_players_dataset['player_height'] > 196).sum())
print("Criterion 3 (pts > 15):", (nba_players_dataset['pts'] > 15).sum())
print("Criterion 4 (reb > 4):", (nba_players_dataset['reb'] > 4).sum())
print("Criterion 5 (ast_pct > 0.1):", (nba_players_dataset['ast_pct'] > 0.1).sum())
print("Criterion 6 (ts_pct > 0.56):", (nba_players_dataset['ts_pct'] > 0.56).sum())
print("Players labeled as selected:", nba_players_dataset['selected_for_team'].sum())
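One point worth considering: the implementation feeds raw feature values to the network, and features such as `player_height` (around 200) and `ts_pct` (below 1) sit on very different scales, which can make training slower or less stable. A common option, which is not part of the code above, is to standardize each feature using statistics computed on the training split only. The sketch below shows the idea on made-up numbers; `fit_standardizer` and `standardize` are hypothetical helper names.

```python
import numpy as np

def fit_standardizer(x_train):
    """Compute per-feature mean and standard deviation from the training data."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return mean, std

def standardize(x, mean, std):
    """Scale features to zero mean and unit variance using the given statistics."""
    return (x - mean) / std

# Made-up values for illustration: columns are height in cm and true shooting pct.
toy_train = np.array([[201.0, 0.61], [188.0, 0.55], [211.0, 0.58], [196.0, 0.50]])
toy_test = np.array([[199.0, 0.56]])

mean, std = fit_standardizer(toy_train)
train_scaled = standardize(toy_train, mean, std)
test_scaled = standardize(toy_test, mean, std)  # reuse the training statistics
print(train_scaled.round(2))
print(test_scaled.round(2))
```

In the notebook, the same two steps would be applied to `x_train` and `x_test` after `train_test_split` and before `model.fit`, so that no information from the test split leaks into the scaling.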


<h3>Analysis of Findings</h3>




<h3>References</h3>