# NBA Player Data Analysis Project

## Project Outline

1. **Introduction**
   - Overview of the project
   - Goals and objectives

2. **Data Collection**
   - Import necessary libraries
   - Fetch player data from the NBA API

3. **Data Processing**
   - Load data into a Pandas DataFrame
   - Data cleaning and transformation

4. **Data Analysis**
   - Exploratory data analysis (EDA)
   - Visualizations

5. **Conclusion**
   - Summary of findings
   - Future work


## 0. Environment

### Virtual Environment

It's generally recommended that you use a virtual environment (or venv) for this project. That way, all dependencies can be installed for the project without affecting the rest of your system. You can create a venv with Python:

`python -m venv .venv`

To activate the virtual environment in your shell, you can use the following commands.

On Windows:

`.venv\Scripts\activate`

On macOS and Linux:

`source .venv/bin/activate`
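
To confirm from Python (or from a notebook cell) that a virtual environment is actually active, one option is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch. The helper name `in_virtualenv` is made up for illustration and is not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("Running inside a virtual environment:", in_virtualenv())
```

If this prints `False` after activation, the notebook kernel is probably using a different interpreter than the one in `.venv`.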

### Dependencies

This project uses [Poetry](https://python-poetry.org/) to manage its dependencies. You can install the dependencies with the `poetry` command:

`poetry install`

If you don't want to use Poetry, a `requirements.txt` is also provided. You can install this using `pip`:

`pip install -r requirements.txt`

### Imports

In [None]:
import requests
import json
import time
import os
from dotenv import load_dotenv

### Environment Variables

This project uses the [BALLDONTLIE](https://app.balldontlie.io/) API, which offers a free key [if you sign up for an account](https://app.balldontlie.io/signup). The API key is loaded from a `.env` file if one is available.

In [None]:
load_dotenv()
BALLDONTLIE_API_KEY = os.getenv("BALLDONTLIE_API_KEY")
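
`os.getenv` returns `None` when the variable is missing, so a forgotten `.env` file would only surface later as a failed request. A minimal sketch of a fail-fast check follows. The helper `require_env` and the variable `EXAMPLE_API_KEY` are hypothetical names used only for this illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell."
        )
    return value


# Demonstration with a placeholder variable so the sketch runs anywhere.
os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
print(require_env("EXAMPLE_API_KEY"))
```

In this notebook the same idea would apply to `BALLDONTLIE_API_KEY` after `load_dotenv()` has run.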

### Notebook (Developer Only)

This notebook uses `nbstripout` to strip notebook output from Git commits. If you are committing code, please run the following command to set up the Git filter.

In [None]:
!nbstripout --install

## 1. Data Collection

### Fetch Player Data from NBA API

Before we analyze the statistics for each player, we need a list of all players who logged minutes in the 2023 season.

In [None]:
BASE_PLAYERS_URL = "https://api.balldontlie.io/v1/players"
HEADERS = {
    "Authorization": f"{BALLDONTLIE_API_KEY}"
}
OUTPUT_FILE = "../data/all_players.json"

# Reuse the cached JSON file if it already exists
if os.path.exists(OUTPUT_FILE):
    print(f"{OUTPUT_FILE} already exists.")
    with open(OUTPUT_FILE, "r") as f:
        all_players = json.load(f)
else:
    all_players = []
    next_cursor = None
    fetch_complete = False

    while True:
        # Pass the cursor returned by the previous page, if there was one
        params = {"cursor": next_cursor} if next_cursor else {}
        response = requests.get(BASE_PLAYERS_URL, headers=HEADERS, params=params)

        if response.status_code == 200:
            data = response.json()
            all_players.extend(data["data"])
            next_cursor = data["meta"].get("next_cursor")

            if not next_cursor:
                fetch_complete = True
                break

            time.sleep(2)  # Sleep for 2 seconds before the next request
        else:
            print(f"Request failed with status code {response.status_code}")
            break

    # Only cache the result if every page was fetched, so that a partial
    # list is never mistaken for the full one on the next run
    if fetch_complete:
        os.makedirs(os.path.dirname(OUTPUT_FILE), exist_ok=True)
        with open(OUTPUT_FILE, "w") as file:
            json.dump(all_players, file, indent=4)

        print(f"All player data has been saved to {OUTPUT_FILE}")

print(all_players[0])
len(all_players)
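
The loop above follows a cursor: each response carries a `next_cursor` in its `meta` object, and the loop stops when that field is missing. The sketch below isolates that pattern in a generator and runs it against a fake in-memory API so it can be tried without a key. `iter_paginated`, `fake_fetch`, and `FAKE_PAGES` are illustrative names, and the page contents are made up.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_paginated(fetch_page: Callable[[Optional[int]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every record from a cursor-paginated source."""
    cursor: Optional[int] = None
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        # A missing next_cursor means this was the last page.
        cursor = page["meta"].get("next_cursor")
        if not cursor:
            break


# Fake three-page API, keyed by the cursor that requests each page.
FAKE_PAGES: Dict[Optional[int], Page] = {
    None: {"data": [{"id": 1}, {"id": 2}], "meta": {"next_cursor": 2}},
    2: {"data": [{"id": 3}, {"id": 4}], "meta": {"next_cursor": 4}},
    4: {"data": [{"id": 5}], "meta": {}},
}


def fake_fetch(cursor: Optional[int]) -> Page:
    """Stand-in for an HTTP request; returns the page for the given cursor."""
    return FAKE_PAGES[cursor]


players = list(iter_paginated(fake_fetch))
print([p["id"] for p in players])  # [1, 2, 3, 4, 5]
```

Separating the paging logic from the HTTP call makes it possible to test the loop offline. A real `fetch_page` would wrap `requests.get` with the headers and delay used above.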


This JSON file contains a list of all NBA players, past and present. We only care about players from the current season, so we need to filter out everyone else. One quick heuristic is draft year: a player who was drafted in 1986 will not be playing now. We want a cutoff year as close as possible to the current year so that it eliminates as many players as possible. The easiest way to choose one is to look at the oldest players still playing in the NBA, mark them as exceptions, and use their draft years as a starting point.
A list of the oldest players still in the NBA can be found at [this Wikipedia page](https://en.wikipedia.org/wiki/List_of_oldest_and_youngest_NBA_players#Active). I chose 2010 as the cutoff.

In [None]:
OLD_PLAYERS = ["LeBron James", "Chris Paul", "Kyle Lowry", "PJ Tucker", "Kevin Durant", "Al Horford", "Mike Conley", "Jeff Green", "Derrick Rose", "Russell Westbrook", "Kevin Love", "Eric Gordon", "Brook Lopez", "Nicolas Batum", "DeAndre Jordan", "James Harden", "Stephen Curry", "DeMar DeRozan", "Jrue Holiday", "Taj Gibson", "Paul George"]
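# Keep undrafted players (draft_year is None), players drafted after 2010, and the listed veterans.
# Single quotes inside the f-string keep this valid on Python versions before 3.12.
all_players_after_2010 = [player for player in all_players if player["draft_year"] is None or player["draft_year"] > 2010 or f"{player['first_name']} {player['last_name']}" in OLD_PLAYERS]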
len(all_players_after_2010)
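
To make the filter rule easier to check, here is a small sketch that applies the same three conditions to made-up records. The function names and sample players are hypothetical. Only the field names `first_name`, `last_name`, and `draft_year` come from the data above.

```python
from typing import Any, Dict, Iterable, List, Set

Player = Dict[str, Any]


def full_name(player: Player) -> str:
    """Build the 'First Last' string used for name matching."""
    return f"{player['first_name']} {player['last_name']}"


def filter_by_draft_year(players: Iterable[Player], cutoff: int, exceptions: Set[str]) -> List[Player]:
    """Keep undrafted players, players drafted after the cutoff, and named exceptions."""
    return [
        p for p in players
        if p["draft_year"] is None
        or p["draft_year"] > cutoff
        or full_name(p) in exceptions
    ]


SAMPLE_PLAYERS = [
    {"first_name": "Veteran", "last_name": "Exception", "draft_year": 2003},
    {"first_name": "Retired", "last_name": "Player", "draft_year": 1986},
    {"first_name": "Recent", "last_name": "Draftee", "draft_year": 2019},
    {"first_name": "Undrafted", "last_name": "Player", "draft_year": None},
    {"first_name": "Cutoff", "last_name": "Year", "draft_year": 2010},
]

kept = filter_by_draft_year(SAMPLE_PLAYERS, cutoff=2010, exceptions={"Veteran Exception"})
print([full_name(p) for p in kept])
# ['Veteran Exception', 'Recent Draftee', 'Undrafted Player']
```

Note that the comparison is strict: a player drafted in exactly the cutoff year is kept only if they appear in the exceptions.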

You'll notice that I kept players who have `null` for their draft year. Those players went undrafted, and some undrafted players are currently in the NBA, so we can't exclude them on that basis alone. The website [2KRatings](https://www.2kratings.com/) maintains [a list](https://www.2kratings.com/lists/undrafted-nba-players) of all active undrafted players, but it also includes players in the G League. I decided to keep only the top 35 players on that list; beyond that point, players log so few minutes that their stats will have a negligible impact on the analysis.

In [None]:
UNDRAFTED_PLAYERS = ["Fred VanVleet", "Austin Reaves", "Naz Reid", "T.J. McConnell", "Luguentz Dort", "Alex Caruso", "Derrick Jones Jr.", "Duncan Robinson", "Simone Fontecchio", "Gary Payton II", "Max Strus", "Luke Kornet", "Jock Landale", "Christian Wood", "Caleb Martin", "Chris Boucher", "Dorian Finney-Smith", "Robert Covington", "Jose Alvarado", "Javonte Green", "Sam Hauser", "Keon Ellis", "Duop Reath", "Royce O'Neale", "Naji Marshall", "Scotty Pippen Jr.", "Haywood Highsmith", "Drew Eubanks", "Gabe Vincent", "Daniel Theis", "Maxi Kleber", "Jordan McLaughlin", "Jordan Goodwin", "Damion Lee", "Lamar Stevens"]
# Keep every drafted player, plus only those undrafted players named in UNDRAFTED_PLAYERS.
all_players_after_2010_without_undrafted = [player for player in all_players_after_2010 if player["draft_year"] is not None or f"{player['first_name']} {player['last_name']}" in UNDRAFTED_PLAYERS]
len(all_players_after_2010_without_undrafted)
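
Both filters match on the exact `first_name last_name` string, so any spelling difference between a hand-typed list and the API (punctuation in initials, or a suffix such as "Jr.") would silently drop a player. Whether such mismatches occur in this dataset is not shown here. The sketch below is one way to check, using a hypothetical helper `unmatched_names` and made-up records.

```python
from typing import Any, Dict, Iterable, List


def unmatched_names(players: Iterable[Dict[str, Any]], names: Iterable[str]) -> List[str]:
    """Return the names from `names` that match no player record exactly."""
    found = {f"{p['first_name']} {p['last_name']}" for p in players}
    return sorted(set(names) - found)


SAMPLE_PLAYERS = [
    {"first_name": "Alex", "last_name": "Example", "draft_year": None},
    {"first_name": "P.J.", "last_name": "Sample", "draft_year": 2006},
]
SAMPLE_NAMES = ["Alex Example", "PJ Sample"]

print(unmatched_names(SAMPLE_PLAYERS, SAMPLE_NAMES))  # ['PJ Sample']
```

Running the same check with `all_players` against `OLD_PLAYERS` and `UNDRAFTED_PLAYERS` would list any names that need their spelling adjusted to match the API.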

### Get Stats for Each Player

Now that we've gotten our list of players, we can get the stats for each of them.