In [1]:
import pandas as pd

df = pd.read_csv("analysis_dataframe2.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,Player,Tm,Pos,Age,MP,Team Performance,Team Performance (20),Season,Composite,ORtg,DRtg
0,0,Jason Terry,ATL,PG,22,1888,Average,bottom,1999-2000,0.209845,100.0,108.0
1,1,LaPhonso Ellis,ATL,PF,29,1309,Average,bottom,1999-2000,0.189767,103.0,108.0
2,2,Matt Maloney,ATL,PG,29,1403,Average,bottom,2000-2001,0.164912,100.0,107.0
3,3,DerMarr Johnson,ATL,SF,20,1313,Average,bottom,2000-2001,0.096842,88.0,106.0
4,4,Jacque Vaughn,ATL,PG,26,1856,Average,Average,2001-2002,0.199747,110.0,109.0


# Steps to analyze / visualize this data

<ol>
<li>Descriptive stats
<ol>
<li>Find the average age of these sixth men.</li>
<li>Group the data and find the average PIE to see if there is any noticeable difference between top, average, and bottom teams.</li>
<li>Look for changes in usage across different generations.</li>
</ol>
</li>
<li>Visualizations
<ol>
<li>Bar charts showing average PIE for sixth men on top, average, and bottom teams (maybe also a heatmap of PIE vs. team performance).</li>
<li>A scatter plot with age on the x-axis and PIE on the y-axis.</li>
<li>A violin plot of PIE by position.</li>
<li>Line graphs showing the change in average PIE of sixth men over the seasons.</li>
<li>A box plot of PIE by age group.</li>
</ol>
</li>
<li>Advanced analysis
<ol>
<li>Association rules mining: explore the relationship between team success and the characteristics of its sixth man, e.g. "Teams with sixth men aged over 30 and with a PIE above X tend to finish in the top Y%."</li>
<li>Cluster analysis: segment sixth men into distinct groups based on multiple factors like PIE, age, minutes played, and team performance. This could reveal unique profiles of sixth men, such as high-impact veterans or efficient young talents.</li>
</ol>
</li>
</ol>

By the way, I took two players from each team, so technically they are sixth and seventh men, but I will refer to them all as sixth men.

## Let's start with descriptive stats

In [2]:
average_age = df.loc[:, 'Age'].mean() #calculating average
print(f"Average Age of Sixth Men: {average_age} | Average age in NBA : 26.18 ")

Average Age of Sixth Men: 26.348951048951047 | Average age in NBA : 26.18 


The average age of sixth men is slightly higher than the league average, but the gap is too small to draw any conclusions from. We will explore the age of sixth men in more depth later.
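One rough way to judge whether a gap like this means anything is to compare it to the standard error of the sample mean. The sketch below uses made-up ages, not the real dataset. `LEAGUE_AVG_AGE` is just the 26.18 figure quoted above, and treating it as a fixed reference value is an assumption.

```python
import math
import pandas as pd

# Made-up ages for illustration only; in the notebook this would be df["Age"].
ages = pd.Series([22, 29, 29, 20, 26, 31, 24, 27, 33, 25])

LEAGUE_AVG_AGE = 26.18  # league-wide figure quoted in the text (assumed fixed)

mean_age = ages.mean()
# Standard error of the mean: sample std (ddof=1) divided by sqrt(n).
sem = ages.std(ddof=1) / math.sqrt(len(ages))
# How many standard errors the sample mean sits from the league figure.
z_score = (mean_age - LEAGUE_AVG_AGE) / sem

print(f"mean={mean_age:.2f}, sem={sem:.2f}, z={z_score:.2f}")
```

A `z_score` close to zero would support the "no conclusions" reading. A large one would suggest the difference deserves a closer look.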

In [3]:
#grouping the different team performances 10 percentile
top_teams = df[df["Team Performance"] == "top"]
mid_teams = df[df["Team Performance"] == "Average"]
bottom_teams = df[df["Team Performance"] == "bottom"] 

#averaging the groups 10 percentile
averageTopPIE =  top_teams.loc[:, 'Composite'].mean() 
averagemidPIE = mid_teams.loc[:, 'Composite'].mean() 
averageBotPIE = bottom_teams.loc[:, 'Composite'].mean() 

#grouping the different team performances 20 percentile
top_teams_20 = df[df["Team Performance (20)"] == "top"]
mid_teams_20 = df[df["Team Performance (20)"] == "Average"]
bottom_teams_20 = df[df["Team Performance (20)"] == "bottom"] 

# Averaging the groups' 20 percentile
averageTopPIE_20 = top_teams_20['Composite'].mean() 
averageMidPIE_20 = mid_teams_20['Composite'].mean() 
averageBotPIE_20 = bottom_teams_20['Composite'].mean() 

print(f"average PIE score of sixth man on top 4 team : {averageTopPIE} | average PIE score of sixth man on average team : {averagemidPIE} | average PIE score of sixth man on bottom 4 team : {averageBotPIE}")
print(f"average PIE score of sixth man on top 6 team : {averageTopPIE_20} | average PIE score of sixth man on average team : {averageMidPIE_20} | average PIE score of sixth man on bottom 6 team : {averageBotPIE_20}")

average PIE score of sixth man on top 4 team : 0.17128029240641215 | average PIE score of sixth man on average team : 0.18472674474154335 | average PIE score of sixth man on bottom 4 team : 0.19705700807143223
average PIE score of sixth man on top 6 team : 0.1756481409099202 | average PIE score of sixth man on average team : 0.18392866707869326 | average PIE score of sixth man on bottom 6 team : 0.19544757538400306


This is very interesting because you'd think it would be the opposite: average PIE rises from top teams to bottom teams under both cutoffs. I think top teams have sixth men with lower PIE scores because these bench players, while still very important, do not need to contribute as much statistically. They can focus on bringing energy and getting the team hyped!

Please note that these PIE scores might differ from other sources. However, after verification, the scores are still accurate relative to each other.
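The tier comparison can also be written as a single `groupby`, which makes it easy to see the spread and the group sizes next to the means. This is only a sketch on made-up rows. The column names mirror the notebook's `df`, but the values and the `tier_order` list are assumptions for illustration.

```python
import pandas as pd

# Toy rows with made-up values; column names mirror the notebook's df.
toy = pd.DataFrame({
    "Team Performance": ["top", "top", "Average", "Average", "bottom", "bottom"],
    "Composite": [0.16, 0.18, 0.17, 0.20, 0.19, 0.21],
})

tier_order = ["top", "Average", "bottom"]  # display order (assumed labels)

# Mean, standard deviation, and row count of PIE for each tier.
summary = (
    toy.groupby("Team Performance")["Composite"]
    .agg(["mean", "std", "count"])
    .reindex(tier_order)
)
print(summary)
```

Seeing `std` and `count` beside each mean helps show whether a difference of one or two hundredths is large relative to the variation inside each tier.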

Now we will look at how the usage of these "sixth men" changed over time, comparing seasons starting in 1999-2010 with seasons starting in 2011-2023.

In [5]:
def get_start_year(season):
    """Return the first year of a season string such as '1999-2000'."""
    return int(season.split('-')[0])

df['StartYear'] = df['Season'].apply(get_start_year)

# Split into two eras by the season's starting year:
# period 1 covers start years 1999-2010, period 2 covers 2011-2023
period1_df = df[(df['StartYear'] >= 1999) & (df['StartYear'] <= 2010)]
period2_df = df[(df['StartYear'] > 2010) & (df['StartYear'] <= 2023)]

def calculate_averages(group):
    """Average the key metrics for one group of rows."""
    averages = {
        'AveragePIE': group['Composite'].mean(),
        'AverageORtg': group['ORtg'].mean(),
        'AverageDRtg': group['DRtg'].mean(),
        'Minutes Played': group['MP'].mean(),
        'Age': group['Age'].mean()
    }
    return pd.Series(averages)

# Columns the helper reads; selecting them explicitly keeps the grouping
# column out of apply(), which avoids the warning newer pandas versions emit
metric_cols = ['Composite', 'ORtg', 'DRtg', 'MP', 'Age']

# Per-season averages within each period
averages_period1 = period1_df.groupby('Season')[metric_cols].apply(calculate_averages)
averages_period2 = period2_df.groupby('Season')[metric_cols].apply(calculate_averages)

# Average of the per-season averages for each period
summary_averages_period1 = averages_period1.mean()
summary_averages_period2 = averages_period2.mean()

print("Averages by Season (1999-2010):\n", averages_period1)
print("\nSummary Averages for 1999-2010:\n", summary_averages_period1)
print("\nAverages by Season (2010-2023):\n", averages_period2)
print("\nSummary Averages for 2010-2023:\n", summary_averages_period2)

Averages by Season (1999-2010):
            AveragePIE  AverageORtg  AverageDRtg  Minutes Played        Age
Season                                                                    
1999-2000    0.184813   106.137931   104.534483     1708.017241  27.172414
2000-2001    0.177429   103.517241   103.586207     1756.500000  27.741379
2001-2002    0.184884   105.034483   104.620690     1750.655172  26.793103
2002-2003    0.179517   103.775862   103.741379     1727.637931  27.551724
2003-2004    0.166162   102.896552   103.844828     1664.086207  27.431034
2004-2005    0.181842   106.433333   106.383333     1608.633333  26.966667
2005-2006    0.175380   106.016667   106.816667     1707.550000  26.016667
2006-2007    0.182174   106.850000   107.100000     1740.233333  25.933333
2007-2008    0.177955   107.350000   108.100000     1781.150000  25.350000
2008-2009    0.174786   108.950000   108.983333     1729.650000  26.316667
2009-2010    0.191340   108.850000   108.033333     1716.016667  25



## Conclusions from Averages
<ol>
<li>PIE scores: The PIE scores of players from the later period (seasons starting 2011-2023) were higher than those from the earlier period. This could be due to a variety of factors, but my guess is that sixth men have become more and more specialized, and teams have learned to use these players better. For example, a player like Kyle Korver is very specialized in shooting threes. Teams adapt to fit his playstyle, giving him a higher ORtg and PIE score.</li>
<li>Minutes: Minutes played actually decreased from the first period to the second. This is very interesting because the efficiency ratings and PIE scores both increased over the same span, which suggests sixth men are producing more in less time.</li>
<li>Age: There was a slight decrease in average age, but I think this is just because there are more and more young players in the NBA.</li>
</ol>
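One detail behind these period comparisons: averaging the per-season averages is not the same as averaging all player rows in a period, because seasons can have different numbers of rows. Here is a minimal sketch of both on made-up rows. The `Period` column and its labels are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows; column names mirror the notebook's df.
toy = pd.DataFrame({
    "Season": ["2009-2010", "2009-2010", "2010-2011",
               "2011-2012", "2011-2012", "2012-2013"],
    "Composite": [0.18, 0.20, 0.17, 0.19, 0.21, 0.22],
    "MP": [1700, 1650, 1800, 1600, 1550, 1500],
})

# First year of each season string, then a period label (hypothetical names).
toy["StartYear"] = toy["Season"].str.split("-").str[0].astype(int)
toy["Period"] = np.where(toy["StartYear"] <= 2010, "early", "late")

# Pooled average: every player row counts equally.
pooled = toy.groupby("Period")[["Composite", "MP"]].mean()

# Mean of season means: every season counts equally.
season_means = toy.groupby(["Period", "Season"])[["Composite", "MP"]].mean()
mean_of_means = season_means.groupby(level="Period").mean()

print(pooled)
print(mean_of_means)
```

With equal rows per season the two agree. When the counts differ, as they do in this toy data, the results diverge, so it is worth stating which one a summary uses.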

Next we will take a look at the positions of sixth men.

In [6]:
#finding the totals for each position for different timeframes
Positions1999_2023 = df['Pos'].value_counts()
Positions1999_2010 = period1_df["Pos"].value_counts()
Positions2011_2023 = period2_df["Pos"].value_counts()

#Calculate the avg PIE score by position
avg_pie_by_position = df.groupby('Pos')['Composite'].mean()
avg_pie_by_position_1999_2010 = period1_df.groupby('Pos')['Composite'].mean()
avg_pie_by_position_2011_2023 = period2_df.groupby('Pos')['Composite'].mean()


print(f"Position Count 1999-2023 \n {Positions1999_2023}")
print(f"Position Count 1999-2010 \n {Positions1999_2010}")
print(f"Position Count 2010-2023 \n {Positions2011_2023}")

print("Average PIE Score by Position (1999-2023):")
print(avg_pie_by_position)
print("\nAverage PIE Score by Position (1999-2010):")
print(avg_pie_by_position_1999_2010)
print("\nAverage PIE Score by Position (2011-2023):")
print(avg_pie_by_position_2011_2023)

Position Count 1999-2023 
 Pos
SG    392
PF    295
PG    279
SF    276
C     188
Name: count, dtype: int64
Position Count 1999-2010 
 Pos
SG    177
SF    155
PF    148
PG    136
C      94
Name: count, dtype: int64
Position Count 2010-2023 
 Pos
SG    215
PF    147
PG    143
SF    121
C      94
Name: count, dtype: int64
Average PIE Score by Position (1999-2023):
Pos
C     0.198901
PF    0.195844
PG    0.189285
SF    0.169512
SG    0.176432
Name: Composite, dtype: float64

Average PIE Score by Position (1999-2010):
Pos
C     0.171524
PF    0.192803
PG    0.188324
SF    0.171606
SG    0.174113
Name: Composite, dtype: float64

Average PIE Score by Position (2011-2023):
Pos
C     0.226277
PF    0.198905
PG    0.190199
SF    0.166830
SG    0.178342
Name: Composite, dtype: float64


## Positional Impact Analysis (1999-2023)

The shift in position counts over time, with shooting guards (SG) rising from 177 to 215 and small forwards (SF) falling from 155 to 121, likely reflects strategic changes in the NBA favoring versatility and three-point shooting, roles typically associated with SGs.

Centers show a marked increase in average PIE, from 0.172 in 1999-2010 to 0.226 in 2011-2023, suggesting their role has grown in importance even though they remain the least common sixth-man position (94 in each period). Power forwards (0.193 to 0.199) and point guards (0.188 to 0.190) show small increases, indicating sustained impact. Small forwards slipped slightly (0.172 to 0.167), potentially reflecting changes in their traditional role or an overlap with shooting guards, who stayed relatively stable (0.174 to 0.178). Overall, the data suggests that the contribution of each position has shifted as the NBA has evolved, with a trend toward valuing efficiency and versatility.
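Because the two periods contain slightly different numbers of rows, it can help to compare shares rather than raw counts. The sketch below re-enters the position counts printed above by hand and converts them to percentages. The `counts`, `shares`, and `change` names are just illustrative.

```python
import pandas as pd

# Position counts copied from the printed output above.
counts = pd.DataFrame(
    {
        "1999-2010": [177, 155, 148, 136, 94],
        "2011-2023": [215, 121, 147, 143, 94],
    },
    index=["SG", "SF", "PF", "PG", "C"],
)

# Share of each position within its period (each column sums to 1).
shares = counts / counts.sum()

# Change in share between periods, in percentage points.
change = (shares["2011-2023"] - shares["1999-2010"]) * 100

print((shares * 100).round(1))
print(change.round(1))
```

In the notebook itself, `period1_df["Pos"].value_counts(normalize=True)` would give the same shares directly from the data.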


In [7]:
#calculating positions for top mid and bottom teams
topTeamPos_20 = top_teams_20["Pos"].value_counts()
topTeamPos = top_teams["Pos"].value_counts()
midTeamPos= mid_teams["Pos"].value_counts()
midteamPos20 = mid_teams_20["Pos"].value_counts()
botTeamPos = bottom_teams["Pos"].value_counts()
botTeamPos_20 = bottom_teams_20["Pos"].value_counts()

print("Position Counts for Top 4 Teams:")
print(topTeamPos)
print("\nPosition Counts for Top 6 Teams (Team Performance 20):")
print(topTeamPos_20)
print("\nPosition Counts for Average Teams:")
print(midTeamPos)
print("\nPosition Counts for Average Teams (Team Performance 20):")
print(midteamPos20)
print("\nPosition Counts for Bottom 4 Teams:")
print(botTeamPos)
print("\nPosition Counts for Bottom 6 Teams (Team Performance 20):")
print(botTeamPos_20)

Position Counts for Top 4 Teams:
Pos
SG    59
PG    42
PF    39
SF    30
C     18
Name: count, dtype: int64

Position Counts for Top 6 Teams (Team Performance 20):
Pos
SG    87
PG    61
PF    60
SF    48
C     24
Name: count, dtype: int64

Position Counts for Average Teams:
Pos
SG    282
SF    216
PF    212
PG    201
C     145
Name: count, dtype: int64

Position Counts for Average Teams (Team Performance 20):
Pos
SG    234
SF    179
PF    167
PG    165
C     125
Name: count, dtype: int64

Position Counts for Bottom 4 Teams:
Pos
SG    51
PF    44
PG    36
SF    30
C     25
Name: count, dtype: int64

Position Counts for Bottom 6 Teams (Team Performance 20):
Pos
SG    71
PF    68
PG    53
SF    49
C     39
Name: count, dtype: int64


## Team Performance and Usage of Sixth Men at Different Positions
- SG is the most common sixth-man position in every tier.
- Centers are the least common.
- One trend we see is that point guards are the second most common sixth-man position on top teams, while on average and bottom teams they rank only third or fourth.
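Since the "Average" tier has far more rows than the top and bottom tiers, a row-normalized cross-tabulation makes the position mix easier to compare across tiers. This is a sketch on made-up rows. Only the column names come from the notebook.

```python
import pandas as pd

# Made-up rows; column names mirror the notebook's df.
toy = pd.DataFrame({
    "Team Performance": ["top", "top", "top", "top",
                         "Average", "Average", "Average",
                         "bottom", "bottom", "bottom"],
    "Pos": ["SG", "PG", "PG", "PF",
            "SG", "SF", "C",
            "SG", "PF", "PF"],
})

# normalize="index" turns each row (tier) into shares that sum to 1.
pos_share = pd.crosstab(toy["Team Performance"], toy["Pos"], normalize="index")

print(pos_share.round(2))
```

Applied to the real data, each row would show what fraction of a tier's sixth men play each position, independent of how many teams fall in that tier.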

Now let's take a look at the average ages.


In [8]:
avgAge_topTeam_20 = top_teams_20["Age"].mean()
avgAge_topTeam = top_teams["Age"].mean()
avgAge_midTeam = mid_teams["Age"].mean()
avgAge_midTeam_20 = mid_teams_20["Age"].mean()
avgAge_botTeam = bottom_teams["Age"].mean()
avgAge_botTeam_20 = bottom_teams_20["Age"].mean()

print("Average Age for Top 4 Teams:")
print(f"{avgAge_topTeam:.2f} years")
print("\nAverage Age for Top 6 Teams (Team Performance 20):")
print(f"{avgAge_topTeam_20:.2f} years")
print("\nAverage Age for Average Teams:")
print(f"{avgAge_midTeam:.2f} years")
print("\nAverage Age for Average Teams (Team Performance 20):")
print(f"{avgAge_midTeam_20:.2f} years")
print("\nAverage Age for Bottom 4 Teams:")
print(f"{avgAge_botTeam:.2f} years")
print("\nAverage Age for Bottom 6 Teams (Team Performance 20):")
print(f"{avgAge_botTeam_20:.2f} years")

Average Age for Top 4 Teams:
27.63 years

Average Age for Top 6 Teams (Team Performance 20):
27.85 years

Average Age for Average Teams:
26.41 years

Average Age for Average Teams (Team Performance 20):
26.37 years

Average Age for Bottom 4 Teams:
24.73 years

Average Age for Bottom 6 Teams (Team Performance 20):
24.79 years


## Team Performance and Average Ages

The data indicates a trend where the average age of sixth men increases with the team's performance: from 24.73 years on bottom-4 teams to 26.41 on average teams and 27.63 on top-4 teams (24.79, 26.37, and 27.85 with the 20% cutoffs). This suggests that more successful teams tend to rely on older, and potentially more experienced, bench players.
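To check this trend at the player level instead of comparing three group means, one option is a rank correlation between age and tier. The sketch below uses made-up rows and computes Spearman's rho as the Pearson correlation of ranks, so it only needs pandas. The numeric tier mapping is an assumption for illustration.

```python
import pandas as pd

# Made-up rows; column names mirror the notebook's df.
toy = pd.DataFrame({
    "Team Performance": ["bottom", "bottom", "Average", "Average",
                         "Average", "top", "top"],
    "Age": [22, 25, 24, 27, 26, 29, 28],
})

# Map tiers to an ordered scale (assumed ordering: bottom < Average < top).
tier_rank = toy["Team Performance"].map({"bottom": 0, "Average": 1, "top": 2})

# Spearman's rho equals the Pearson correlation of the two rank series.
rho = toy["Age"].rank().corr(tier_rank.rank())

print(f"Spearman rho between age and tier: {rho:.2f}")
```

A value near +1 would mean older sixth men sit almost exclusively on better teams. A value near 0 would mean the group means hide a lot of overlap.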
