In [None]:
# Importing the necessary libraries

# Pandas: Data manipulation and analysis library
import pandas as pd

# Matplotlib: Plotting library for creating static, animated, and interactive visualizations
import matplotlib.pyplot as plt

# Seaborn: Data visualization library based on Matplotlib, provides a high-level interface for drawing attractive and informative statistical graphics
import seaborn as sns

# IPython.display: Provides tools for displaying rich media representations in the IPython notebook
from IPython.display import display, Image

# scipy.stats: Part of SciPy library, contains statistical functions, including ANOVA (f_oneway)
from scipy.stats import f_oneway

# statsmodels.stats.multicomp: Part of Statsmodels library, provides tools for multiple comparison tests, including Tukey's HSD
from statsmodels.stats.multicomp import pairwise_tukeyhsd

In [None]:
nba_df = pd.read_csv('./all_seasons.csv')

In [None]:
# Confirming column names for consistency and adherence to a standard format.
nba_df.columns

In [None]:
# Remove the 'Unnamed: 0' column, which is an unnecessary index column when the dataset is read.
nba_df.drop(columns='Unnamed: 0', inplace=True)
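
An `Unnamed: 0` column usually appears when a CSV was saved together with its index. As a minimal sketch, assuming that is the case here, passing `index_col=0` to `pd.read_csv` reads that column as the index, so nothing needs to be dropped afterwards. The two-row CSV below is a made-up stand-in for `all_seasons.csv`.

```python
import io
import pandas as pd

# Hypothetical two-row sample that mimics a CSV saved with its index column.
sample_csv = io.StringIO(
    ",player_name,pts\n"
    "0,Player A,10.5\n"
    "1,Player B,4.2\n"
)

# index_col=0 uses the first (unnamed) column as the row index.
sample_df = pd.read_csv(sample_csv, index_col=0)

print(sample_df.columns.tolist())
```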

### Column Name Updates

In the current dataset, certain column names consist of abbreviations that might pose challenges, especially for those unfamiliar with the context. To enhance readability and comprehension, I intend to rename specific columns:

- `gp`: Renamed to `games_played` (Number of games played)
- `pts`: Renamed to `points` (Average number of points scored)
- `reb`: Renamed to `rebounds` (Average number of rebounds grabbed)
- `ast`: Renamed to `assists` (Average number of assists distributed)
- `oreb_pct`: Renamed to `offensive_rebounds_pct` (Percentage of available offensive rebounds grabbed)
- `dreb_pct`: Renamed to `defensive_rebounds_pct` (Percentage of available defensive rebounds grabbed)
- `usg_pct`: Renamed to `usage_pct` (Percentage of team plays used by the player)
- `ts_pct`: Renamed to `true_shooting_pct` (Measure of the player's shooting efficiency)
- `ast_pct`: Renamed to `assists_pct` (Percentage of teammate field goals the player assisted in)

This adjustment aims to create more descriptive and accessible column names for a broader audience.


In [None]:
# Renaming columns so that each abbreviation is replaced by the statistic it stands for
nba_df.rename(columns={'gp':'games_played', 'pts':'points', 'reb':'rebounds', 'ast':'assists', 
                       'oreb_pct':'offensive_rebounds_pct', 'dreb_pct':'defensive_rebounds_pct', 'usg_pct':'usage_pct',
                      'ts_pct':'true_shooting_pct', 'ast_pct':'assists_pct'}, inplace=True)
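
By default, `rename` silently skips keys that are not existing columns, so a mistyped abbreviation can go unnoticed. The sketch below, on a small hypothetical frame, passes `errors="raise"` so that pandas reports such keys. This only guards the old names. A typo in a new name still has to be caught by reading the output of `nba_df.columns`.

```python
import pandas as pd

# Hypothetical frame with a few of the abbreviated column names.
sample_df = pd.DataFrame({"gp": [82], "pts": [10.5], "dreb_pct": [0.15]})

rename_map = {
    "gp": "games_played",
    "pts": "points",
    "dreb_pct": "defensive_rebounds_pct",
}

# errors="raise" makes pandas fail loudly if a key is not an existing column.
renamed_df = sample_df.rename(columns=rename_map, errors="raise")
print(renamed_df.columns.tolist())

# A misspelled source column is reported instead of being skipped silently.
try:
    sample_df.rename(columns={"ptz": "points"}, errors="raise")
except KeyError as exc:
    print("Rename failed:", exc)
```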

In [None]:
#checking the head of the dataframe to see if the changes are applied
nba_df.head()

In [None]:
#inspecting the shape of the dataframe
print(nba_df.shape)
nba_df.info() #overview of the datatypes and non-null values; info() prints directly and returns None

### Initial Data Inspection

After reviewing the output of `nba_df.info()`, the data types match the data stored in each column. Notably, `draft_year`, `draft_round`, and `draft_number` are of type `object`, which is expected because undrafted players are recorded with the text "Undrafted" rather than a number. There are also no null values, so no missing-data handling is needed before the analysis.

### Draft Analysis

For the purpose of draft analysis, I plan to convert the "Undrafted" entries to 0. This conversion will facilitate numerical analysis and provide a clearer representation of players who were not drafted. The following steps will be taken:

1. Convert "Undrafted" entries in the `draft_round` column to 0.
2. Convert "Undrafted" entries in the `draft_number` column to 0.

This adjustment aims to enhance the dataset for draft-related numerical analysis.

In [None]:
# Replace 'Undrafted' with 0 and convert to integer dtype for 'draft_round' and 'draft_number'
nba_df['draft_round'] = nba_df['draft_round'].replace('Undrafted', 0).astype(int)
nba_df['draft_number'] = nba_df['draft_number'].replace('Undrafted', 0).astype(int)
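
Using 0 as a stand-in works for grouping, but it is still a number, so it sorts before round 1 and would distort any arithmetic on `draft_number`. As an alternative sketch (not what this notebook does), non-numeric entries can be kept as missing values in pandas' nullable `Int64` dtype, with a separate flag for draft status. The column names `draft_round_num` and `was_drafted` are hypothetical.

```python
import pandas as pd

# Hypothetical draft column stored as strings, as in the raw data.
sample_df = pd.DataFrame({"draft_round": ["1", "2", "Undrafted", "1"]})

# Non-numeric entries such as "Undrafted" become missing values (<NA>).
sample_df["draft_round_num"] = pd.to_numeric(
    sample_df["draft_round"], errors="coerce"
).astype("Int64")

# A separate flag keeps the drafted/undrafted split explicit.
sample_df["was_drafted"] = sample_df["draft_round_num"].notna()

print(sample_df)
```

With this approach, `groupby` drops the missing group by default. Passing `dropna=False` keeps undrafted players as their own group.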

### Draft Analysis Questions

1. What is the distribution of drafted and undrafted players in the dataset?
2. Do players who were drafted in earlier rounds tend to have higher average points or better performance?

In [None]:

plt.figure(figsize=(10, 6))
sns.countplot(x='draft_round', data=nba_df, palette='viridis')
plt.title('Distribution of Players Across Draft Rounds')
plt.xlabel('Draft Round')
plt.ylabel('Number of Players')
plt.show()


### Draft Analysis Summary

A quick look at the chart shows the following distribution across draft rounds:

- Approximately 7000 players were drafted in the first round.
- Around 3000 players were drafted in the second round.
- Approximately 2500 players are marked as "Undrafted" (shown as round 0).
- Rounds 3 to 8 contain very few players, not even reaching the hundreds.

The file name `all_seasons.csv` suggests one row per player per season, so these counts are likely player-seasons rather than unique players.
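
Reading counts off a chart only gives rough figures. A minimal sketch of how exact counts and shares per round could be tabulated with `value_counts` follows. The data is a small hypothetical sample, not the real dataset.

```python
import pandas as pd

# Hypothetical draft_round values (0 stands for "Undrafted", as above).
sample_df = pd.DataFrame({"draft_round": [1, 1, 1, 2, 2, 0, 0, 3]})

# Exact counts per round, ordered by round number.
round_counts = sample_df["draft_round"].value_counts().sort_index()

# Share of all rows per round, as a percentage.
round_share = (round_counts / round_counts.sum() * 100).round(1)

print(pd.DataFrame({"count": round_counts, "share_pct": round_share}))
```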

In [None]:
# Grouping by 'draft_round' and computing the average 'points', 'rebounds' and 'assists'
nba_groupedby_draft = nba_df.groupby('draft_round')[['points', 'rebounds', 'assists']].mean().reset_index()

print(nba_groupedby_draft)

- Players drafted in the first round have higher average points, rebounds, and assists compared to other rounds.
- Players drafted in the second round also show good performance, though typically not as high as those in the first round.
- Undrafted players generally have lower averages in points, rebounds, and assists compared to drafted players.
- Caution should be taken when interpreting the performance of players drafted in less common rounds (e.g., 3rd, 7th) due to the smaller number of observations.
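
To make the caution about small groups concrete, the group size can be reported next to each average. The sketch below uses named aggregation on a hypothetical sample. The single round-3 row shows how one observation can produce a misleadingly high mean.

```python
import pandas as pd

# Hypothetical sample; round 3 has only one observation.
sample_df = pd.DataFrame({
    "draft_round": [1, 1, 1, 2, 2, 0, 3],
    "points": [15.0, 12.0, 9.0, 8.0, 6.0, 4.0, 20.0],
})

# Named aggregation: group size, mean and median side by side.
summary = sample_df.groupby("draft_round")["points"].agg(
    n="count", mean="mean", median="median"
)

print(summary)
```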


In [None]:
# Function to save a figure to disk and display the saved image
def save_and_display_plot(figure, filename):
    figure.savefig(filename)
    plt.close(figure)  # close it so the notebook does not render the same plot twice
    display(Image(filename))

# Bar plot: Average points per draft round
plt.figure(figsize=(12, 8))
sns.barplot(x='draft_round', y='points', data=nba_groupedby_draft, errorbar=None)
plt.title('Average Points by Draft Round')
plt.xlabel('Draft Round')
plt.ylabel('Average Points')
save_and_display_plot(plt.gcf(), 'average_points.png')  # pass the current Figure, not the plt module

# Bar plot: Average rebounds per draft round
plt.figure(figsize=(12, 8))
sns.barplot(x='draft_round', y='rebounds', data=nba_groupedby_draft, errorbar=None)
plt.title('Average Rebounds by Draft Round')
plt.xlabel('Draft Round')
plt.ylabel('Average Rebounds')
save_and_display_plot(plt.gcf(), 'average_rebounds.png')

# Bar plot: Average assists per draft round
plt.figure(figsize=(12, 8))
sns.barplot(x='draft_round', y='assists', data=nba_groupedby_draft, errorbar=None)
plt.title('Average Assists by Draft Round')
plt.xlabel('Draft Round')
plt.ylabel('Average Assists')
save_and_display_plot(plt.gcf(), 'average_assists.png')
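
The three plotting blocks differ only in the metric name, so they could be written as one loop. The following is a sketch of that idea using plain Matplotlib on hypothetical per-round averages. The output folder `draft_plots` is an assumption made for illustration.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-round averages standing in for nba_groupedby_draft.
sample_means = pd.DataFrame({
    "draft_round": [0, 1, 2],
    "points": [5.1, 9.8, 6.3],
    "rebounds": [2.4, 4.1, 3.0],
})

output_dir = Path("draft_plots")  # hypothetical output folder
output_dir.mkdir(exist_ok=True)

saved_files = []
for metric in ["points", "rebounds"]:
    fig, ax = plt.subplots(figsize=(12, 8))
    # Rounds are converted to strings so they are treated as categories.
    ax.bar(sample_means["draft_round"].astype(str), sample_means[metric])
    ax.set_title(f"Average {metric.capitalize()} by Draft Round")
    ax.set_xlabel("Draft Round")
    ax.set_ylabel(f"Average {metric.capitalize()}")
    target = output_dir / f"average_{metric}.png"
    fig.savefig(target)
    plt.close(fig)  # free the figure once it is on disk
    saved_files.append(target)

print([str(path) for path in saved_files])
```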

## Average Performance by Draft Round

Visualizing the average performance metrics (points, rebounds, and assists) of players across different draft rounds provides insights into their relative performance.

### Average Points

![Average Points by Draft Round](average_points.png)

The bar plot illustrates the average points scored by players in each draft round. Players drafted in the first round exhibit higher average points compared to other rounds.

### Average Rebounds

![Average Rebounds by Draft Round](average_rebounds.png)

The second bar plot showcases the average rebounds for players in each draft round. While players in the first round lead in rebounds, second-round draftees also display notable performance.

### Average Assists

![Average Assists by Draft Round](average_assists.png)

The final bar plot demonstrates the average assists for players in each draft round. First-round and second-round draftees tend to have higher average assists compared to other rounds.



In [None]:
# Perform ANOVA for each metric
anova_points = f_oneway(*[group['points'] for name, group in nba_df.groupby('draft_round')])
anova_rebounds = f_oneway(*[group['rebounds'] for name, group in nba_df.groupby('draft_round')])
anova_assists = f_oneway(*[group['assists'] for name, group in nba_df.groupby('draft_round')])

# Display the results
print("ANOVA Results for Average Points:")
print("F-statistic:", anova_points.statistic)
print("P-value:", anova_points.pvalue)

print("\nANOVA Results for Average Rebounds:")
print("F-statistic:", anova_rebounds.statistic)
print("P-value:", anova_rebounds.pvalue)

print("\nANOVA Results for Average Assists:")
print("F-statistic:", anova_assists.statistic)
print("P-value:", anova_assists.pvalue)


## ANOVA Results for Average Performance Metrics by Draft Round

### Average Points
- F-statistic: 318.51
- P-value: < 0.001 (reported as 0.00)

### Average Rebounds
- F-statistic: 238.46
- P-value: < 0.001 (reported as 0.00)

### Average Assists
- F-statistic: 109.86
- P-value: 4.95e-157

The ANOVA results indicate statistically significant differences in average points, rebounds, and assists among different draft rounds. The low p-values suggest that at least one group mean is different from the others.

Given the significance in ANOVA, it's advisable to perform post-hoc tests to identify which specific draft rounds exhibit significant differences in performance metrics.
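
To show what the F-statistic measures, the sketch below computes it by hand on three small hypothetical groups and compares it with `f_oneway`. It also computes eta squared, an effect size that is not part of the original analysis. With a large dataset, very small p-values are easy to obtain, so an effect size can help judge how large the differences are.

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical points-per-game samples for three draft groups.
groups = [
    np.array([12.0, 15.0, 9.0, 14.0]),  # e.g. first round
    np.array([7.0, 8.0, 6.0, 9.0]),     # e.g. second round
    np.array([4.0, 5.0, 6.0, 3.0]),     # e.g. undrafted
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)      # number of groups
n = all_values.size  # total number of observations

# Between-group sum of squares: spread of group means around the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of values around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of between-group to within-group mean squares.
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
f_scipy, p_value = f_oneway(*groups)

# Eta squared: share of total variance explained by group membership.
eta_squared = ss_between / (ss_between + ss_within)

print(f"F (manual) = {f_manual:.3f}, F (scipy) = {f_scipy:.3f}, p = {p_value:.4f}")
print(f"eta squared = {eta_squared:.3f}")
```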



In [None]:
# Tukey's HSD for Points
tukey_points = pairwise_tukeyhsd(nba_df['points'], nba_df['draft_round'])
print("\nTukey's HSD Results for Average Points:")
print(tukey_points.summary())

# Tukey's HSD for Rebounds
tukey_rebounds = pairwise_tukeyhsd(nba_df['rebounds'], nba_df['draft_round'])
print("\nTukey's HSD Results for Average Rebounds:")
print(tukey_rebounds.summary())

# Tukey's HSD for Assists
tukey_assists = pairwise_tukeyhsd(nba_df['assists'], nba_df['draft_round'])
print("\nTukey's HSD Results for Average Assists:")
print(tukey_assists.summary())
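
With eight draft-round groups, each summary lists 28 pairwise comparisons, which is hard to scan by eye. The sketch below, run on randomly generated hypothetical data, loads the summary table into a DataFrame and keeps only the pairs where the null hypothesis is rejected. It assumes the table returned by `summary()` exposes its rows through `.data` with the header row first.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data: three groups, one of which has a clearly higher mean.
rng = np.random.default_rng(0)
sample_df = pd.DataFrame({
    "draft_round": np.repeat([0, 1, 2], 30),
    "points": np.concatenate([
        rng.normal(5.0, 2.0, 30),
        rng.normal(10.0, 2.0, 30),
        rng.normal(5.2, 2.0, 30),
    ]),
})

tukey = pairwise_tukeyhsd(sample_df["points"], sample_df["draft_round"])

# Assumption: .data is a list of rows, with the column headers as the first row.
rows = tukey.summary().data
tukey_table = pd.DataFrame(rows[1:], columns=rows[0])

# Keep only pairs whose difference is significant at the default alpha of 0.05.
significant = tukey_table[tukey_table["reject"].astype(bool)]

print(significant[["group1", "group2", "meandiff", "p-adj"]])
```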

## Tukey's HSD Results Interpretation

***
### Average Points:

- **Significant Differences:**
  - Players drafted in the first round have significantly higher average points than those in the second, third, fourth, sixth, and eighth rounds, as well as undrafted players.
  - Significant differences also exist between players drafted in the second round and those in the third, fourth, sixth, and eighth rounds.

- **No Significant Differences:**
  - Undrafted players show no significant differences from players in the third, fourth, sixth, seventh, and eighth rounds.
  - There are no significant differences in average points among players drafted in the third, fourth, sixth, seventh, and eighth rounds.
***
### Average Rebounds:

- **Significant Differences:**
  - Players drafted in the first round have significantly higher average rebounds than those in the second, sixth, and eighth rounds, as well as undrafted players.
  - Significant differences also exist between players drafted in the second round and those in the third and seventh rounds.

- **No Significant Differences:**
  - There are no significant differences in average rebounds among players drafted in the third, fourth, and sixth rounds, or between players drafted in the seventh and eighth rounds.
***
### Average Assists:

- **Significant Differences:**
  - Players drafted in the first round have significantly higher average assists than those in the second, fourth, sixth, and eighth rounds, as well as undrafted players.
  - Significant differences also exist between players drafted in the second round and those in the fourth and eighth rounds.

- **No Significant Differences:**
  - There are no significant differences in average assists among players drafted in the third, fourth, sixth, seventh, and eighth rounds.

## Key Findings and Insights:

***
- Players drafted in the _**first round**_ consistently outperform others across the three performance metrics examined.
- _**Second-round**_ draftees also demonstrate strong performance, although generally not reaching the levels of _**first-round**_ picks.
- _**Undrafted players**_, on average, lag behind drafted players in _**points**_, _**rebounds**_, and _**assists**_.
- The ANOVA and Tukey's HSD results indicate that the first-round advantage over second-round and undrafted players is statistically significant, while comparisons involving rounds 3 to 8 rest on few observations and should be read with caution.