<a href="https://colab.research.google.com/github/somas1/CT/blob/main/NBA_PANDAS/NBA_Regular_Season_2018_19_Data_Challenge.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NBA Regular Season 2018-19 Data Challenge

Your task will be to take the given dataset and create an analysis that answers the following 10 questions. This project will again test your knowledge of pandas, which you will use to find the answers in the data you are given.

In [1]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

In [3]:
nba = pd.read_csv('nbastats2018-2019.csv')
nba.head()

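|  | Name | Height | Weight | Team | Age | Salary | Points | Blocks | Steals | Assists | ... | MP | G | PER | OWS | DWS | WS | WS48 | USG | BPM | VORP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alex Abrines | 78 | 200 | Oklahoma City Thunder | 25 | 5455236 | 5.3 | 0.2 | 0.5 | 0.6 | ... | 19.0 | 31 | 6.3 | 0.1 | 0.6 | 0.6 | 0.053 | 12.2 | -3.4 | -0.2 |
| 1 | Quincy Acy | 79 | 240 | Phoenix Suns | 28 | 213949 | 1.7 | 0.4 | 0.1 | 0.8 | ... | 12.3 | 10 | 2.9 | -0.1 | 0.0 | -0.1 | -0.022 | 9.2 | -5.9 | -0.1 |
| 2 | Jaylen Adams | 74 | 190 | Atlanta Hawks | 22 | 236854 | 3.2 | 0.1 | 0.4 | 1.9 | ... | 12.6 | 34 | 7.6 | -0.1 | 0.2 | 0.1 | 0.011 | 13.5 | -4.4 | -0.3 |
| 3 | Steven Adams | 84 | 265 | Oklahoma City Thunder | 25 | 24157304 | 13.9 | 1.0 | 1.5 | 1.6 | ... | 33.4 | 80 | 18.5 | 5.1 | 4.0 | 9.1 | 0.163 | 16.4 | 2.7 | 3.2 |
| 4 | Bam Adebayo | 82 | 255 | Miami Heat | 21 | 2955840 | 8.9 | 0.8 | 0.9 | 2.2 | ... | 23.3 | 82 | 17.9 | 3.4 | 3.4 | 6.8 | 0.171 | 15.8 | 3.0 | 2.4 |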


# What was the average age of a player in the league?

In [4]:
nba['Age'].mean().round(2)

25.9

# What player scored the most points?

In [5]:
nba.sort_values('Points', ascending=False)[['Name', 'Points']].head(1)

|  | Name | Points |
|---|---|---|
| 202 | James Harden | 36.1 |


James Harden had the highest Points value in the dataset, at 36.1. No league scoring leader finishes a full season with only 36 points, so this suggests the values in the dataset are per-game averages for the season, not totals.

Since the values presented are most likely per-game averages, I will avoid averaging these columns further with functions such as .mean().
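If season totals are ever needed, they can be estimated by multiplying a per-game column by games played (G), the same idea the notebook uses for minutes later on. Below is a minimal sketch on a few rows copied from this notebook's outputs; the `Est Total Points` column name is hypothetical, and because the per-game inputs are rounded to one decimal, the products are only approximations.

```python
import pandas as pd

# A few rows copied from this notebook's outputs: per-game Points and games played (G)
sample = pd.DataFrame({
    'Name': ['James Harden', 'Steven Adams', 'Bam Adebayo', 'Alex Abrines'],
    'Points': [36.1, 13.9, 8.9, 5.3],
    'G': [78, 80, 82, 31],
})

# Estimated season total = per-game average x games played.
# Rounded to one decimal because the per-game inputs are themselves rounded.
sample['Est Total Points'] = (sample['Points'] * sample['G']).round(1)
print(sample)
```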

# What player had the most blocks during the season? Was it a post player (F/C)?

In [6]:
most_blocks = nba.sort_values(by='Blocks', ascending=False)[['Name', 'Height', 'Points', 'Blocks', 'Rebounds', 'Assists', 'FGA']].head(1)
most_blocks

|  | Name | Height | Points | Blocks | Rebounds | Assists | FGA |
|---|---|---|---|---|---|---|---|
| 476 | Myles Turner | 83 | 13.3 | 2.7 | 7.2 | 1.6 | 10.5 |


In [14]:
nba['FGA'].median()

6.0

Myles Turner had the most blocks during the 2018-2019 season, averaging 2.7 per game. He is 83 inches (6'11") tall, which suggests he might play center, and centers are usually post players.

A post player operates close to the basket, often with their back to it, which puts them in a good position to contest and block opponents' shots.

Turner averaged 7.2 rebounds and 1.6 assists a game. He attempted 10.5 field goals (non-free-throw shots) per game, well above the dataset's median of 6.0 field goal attempts. All of this suggests he's a post player.
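The Height column is recorded in inches. A small helper makes the conversion to feet and inches explicit; `inches_to_feet` is a hypothetical name for illustration, not part of pandas or the dataset.

```python
def inches_to_feet(inches: int) -> str:
    """Format a height in inches as feet and inches, e.g. 83 -> 6'11"."""
    feet, rem = divmod(inches, 12)  # 12 inches per foot
    return f"{feet}'{rem}\""

print(inches_to_feet(83))  # 6'11"
```

Applied to the whole column, the same helper could be used as `nba['Height'].apply(inches_to_feet)`.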

# Based on the regular season, who had the best chance to win a title given their win percentage?

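With the information available in the dataset, we will have to use Win Shares (WS) to estimate how many wins each team had in the 2018-2019 season. Win Shares estimate the number of wins a player contributed, so summing the WS of every player on a roster gives a rough estimate of that team's wins.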

In [15]:
win_shares_by_team = nba.groupby('Team')[['WS']].sum()
win_shares_by_team

| Team | WS |
|---|---|
| Atlanta Hawks | 28.0 |
| Boston Celtics | 52.0 |
| Brooklyn Nets | 45.4 |
| Charlotte Hornets | 38.8 |
| Chicago Bulls | 20.9 |
| Cleveland Cavaliers | 16.2 |
| Dallas Mavericks | 33.7 |
| Denver Nuggets | 50.5 |
| Detroit Pistons | 38.9 |
| Golden State Warriors | 55.9 |


In [158]:
win_shares_by_team.idxmax(), win_shares_by_team.max()

(WS    Milwaukee Bucks
 dtype: object,
 WS    63.0
 dtype: float64)

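According to the summed WS values, the Milwaukee Bucks had the most estimated wins (63.0). Since every team plays 82 regular-season games, that also gives them the best estimated win percentage, about 63.0 / 82 ≈ 76.8%, and therefore the best chance to win a title based on the regular season.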
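To turn summed Win Shares into estimated win percentages for every team at once, divide by the 82 games each team plays. This is a minimal sketch on a subset of team totals copied from the outputs above, and it relies on the notebook's own assumption that summed player WS approximates team wins.

```python
import pandas as pd

# Team Win Shares totals copied from the outputs above (subset of teams)
win_shares = pd.Series(
    {'Boston Celtics': 52.0, 'Denver Nuggets': 50.5,
     'Golden State Warriors': 55.9, 'Milwaukee Bucks': 63.0},
    name='WS',
)

GAMES_PER_SEASON = 82  # every NBA team plays 82 regular-season games

# Summed WS approximates wins, so WS / 82 approximates win percentage
est_win_pct = (win_shares / GAMES_PER_SEASON).sort_values(ascending=False)
print(est_win_pct.round(3))
```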

# What player had the best 3-pt percentage?

In [160]:
FG3_percentage = nba.sort_values('FG3%', ascending=False)
FG3_percentage[['Name','FG3%','FG3A']].head(5)

|  | Name | FG3% | FG3A |
|---|---|---|---|
| 312 | Scott Machado | 1.0 | 0.3 |
| 439 | Jordan Sibert | 1.0 | 1.0 |
| 147 | Trevon Duval | 1.0 | 0.3 |
| 352 | Eric Moreland | 1.0 | 0.2 |
| 95 | Troy Caupain | 0.667 | 0.8 |


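Scott Machado, Jordan Sibert, Trevon Duval and Eric Moreland all made 100% of their three-point attempts. Jordan Sibert averaged 1.0 three-point attempt per game, the most of the four, but he played only one game. With at most about one attempt per game, these perfect percentages come from very small samples and say little about shooting skill.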
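A fairer way to answer this question is to require a minimum number of attempts before ranking by percentage. The sketch below assumes estimated total attempts = FG3A × G and uses an arbitrary threshold of 100 attempts; the Sibert and Duval rows come from this notebook's outputs, while "Shooter A" and "Shooter B" are made-up rows for illustration.

```python
import pandas as pd

shooters = pd.DataFrame({
    'Name': ['Jordan Sibert', 'Trevon Duval',
             'Shooter A (hypothetical)', 'Shooter B (hypothetical)'],
    'FG3%': [1.0, 1.0, 0.45, 0.40],
    'FG3A': [1.0, 0.3, 5.0, 7.0],  # three-point attempts per game
    'G': [1, 3, 70, 80],           # games played
})

MIN_ATTEMPTS = 100  # arbitrary threshold chosen for illustration

# Estimate season attempts from the per-game average, then keep qualified shooters
shooters['Est FG3A Total'] = shooters['FG3A'] * shooters['G']
qualified = shooters[shooters['Est FG3A Total'] >= MIN_ATTEMPTS]
best = qualified.sort_values('FG3%', ascending=False)
print(best[['Name', 'FG3%', 'Est FG3A Total']])
```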

# Who played the most minutes during the season?

In [19]:
nba['Most Minutes'] = nba['MP']*nba['G']
nba.sort_values(by='Most Minutes', ascending=False)[['Name', 'G', 'MP', 'Most Minutes']].head(1)

|  | Name | G | MP | Most Minutes |
|---|---|---|---|---|
| 37 | Bradley Beal | 82 | 36.9 | 3025.8 |


Bradley Beal played in all 82 games and averaged 36.9 minutes per game, for an estimated 3,025.8 minutes in total, the most in the dataset.

# Which player, given their player efficiency rating (PER), was the clutchest during the season?

In [63]:
# Sort players by PER (highest first) and show games played and estimated total minutes.
# Selecting columns directly avoids aggregating the non-numeric Name column (the source of the FutureWarning).
nba.set_index('Name').sort_values(by='PER', ascending=False)[['PER', 'G', 'Most Minutes']].head(10)

| Name | PER | G | Most Minutes |
|---|---|---|---|
| Zhou Qi | 80.4 | 1 | 1.0 |
| Trevon Duval | 38.3 | 3 | 6.0 |
| Gary Payton II | 36.9 | 3 | 15.9 |
| Alan Williams | 32.9 | 5 | 26.0 |
| Giannis Antetokounmpo | 30.9 | 72 | 2361.6 |
| Troy Caupain | 30.7 | 4 | 16.0 |
| James Harden | 30.6 | 78 | 2870.4 |
| Anthony Davis | 30.3 | 56 | 1848.0 |
| Jordan Sibert | 29.7 | 1 | 4.0 |
| Karl-Anthony Towns | 26.3 | 80 | 2548.7 |


Zhou Qi had the highest PER during the season (80.4), but he played only one game and only one minute.

Giannis Antetokounmpo (72 games) and James Harden (78 games) both played over 70 games and maintained a PER above 30, which makes them the most consistent high performers of the season. Anthony Davis also finished above 30, but in only 56 games.
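The reasoning above amounts to filtering out small samples before ranking by PER. A minimal sketch on the rows copied from the PER output, assuming an arbitrary cutoff of 60 games as a "full season" workload:

```python
import pandas as pd

# Rows copied from the PER output above
per = pd.DataFrame({
    'Name': ['Zhou Qi', 'Trevon Duval', 'Gary Payton II', 'Alan Williams',
             'Giannis Antetokounmpo', 'Troy Caupain', 'James Harden',
             'Anthony Davis', 'Jordan Sibert', 'Karl-Anthony Towns'],
    'PER': [80.4, 38.3, 36.9, 32.9, 30.9, 30.7, 30.6, 30.3, 29.7, 26.3],
    'G': [1, 3, 3, 5, 72, 4, 78, 56, 1, 80],
})

MIN_GAMES = 60  # arbitrary cutoff chosen for illustration

# Keep only players with a substantial workload, then rank by PER
qualified = per[per['G'] >= MIN_GAMES].sort_values('PER', ascending=False)
print(qualified)
```

With this cutoff, Antetokounmpo and Harden rank first and second; the exact threshold is a judgment call.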

# What team had the youngest roster?

In [32]:
youngest = nba.groupby('Team')['Age'].mean().sort_values().head()
youngest


Team
Chicago Bulls       24.312500
Sacramento Kings    24.400000
Orlando Magic       24.533333
New York Knicks     24.625000
Phoenix Suns        24.761905
Name: Age, dtype: float64

The Chicago Bulls had the youngest roster, with a mean player age of about 24.3 years.

# Who was the highest-paid player during the season?

In [60]:
nba[['Salary']].dtypes

Salary    object
dtype: object

The Salary column holds strings rather than numbers, so sorting it would compare the values character by character instead of by amount. The strings must be converted to numbers before they can be sorted correctly.
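A short sketch of why the conversion matters, using salary strings that appear in this notebook's outputs (including the '-' placeholder). As strings, '988464' sorts after '37457154'; after `pd.to_numeric(..., errors='coerce')` the values sort by amount and '-' becomes NaN.

```python
import pandas as pd

# Salary strings taken from this notebook's outputs, including the '-' placeholder
salaries = pd.Series(['988464', '37457154', '1000000', '-'])

# As strings, values compare character by character ('-' < '1' < '3' < '9')
print(salaries.sort_values().tolist())
# ['-', '1000000', '37457154', '988464']

# errors='coerce' turns unparseable entries such as '-' into NaN
numeric = pd.to_numeric(salaries, errors='coerce')
print(numeric.sort_values(ascending=False).tolist())
# [37457154.0, 1000000.0, 988464.0, nan]
```

The lexicographic order also explains why the '-' rows appear first in the string-sorted output below.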

In [63]:
nba['Fixed-Salary'] = pd.to_numeric(nba['Salary'], errors='coerce')
nba.dtypes[['Salary','Fixed-Salary']]

Salary           object
Fixed-Salary    float64
dtype: object

In [70]:
nba[['Salary','Fixed-Salary']]

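|  | Salary | Fixed-Salary |
|---|---|---|
| 80 | - | NaN |
| 314 | - | NaN |
| 315 | - | NaN |
| 192 | - | NaN |
| 322 | - | NaN |
| ... | ... | ... |
| 157 | 9600000 | 9600000.0 |
| 129 | 9607500 | 9607500.0 |
| 402 | 9631250 | 9631250.0 |
| 188 | 988464 | 988464.0 |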


The '-' placeholders, which cannot be parsed as numbers, are now NaN in the new Fixed-Salary column.

In [80]:
fixed_nba = nba.dropna()
fixed_nba[['Salary','Fixed-Salary']]

|  | Salary | Fixed-Salary |
|---|---|---|
| 313 | 1000000 | 1000000.0 |
| 57 | 1000000 | 1000000.0 |
| 193 | 10000000 | 10000000.0 |
| 107 | 10000000 | 10000000.0 |
| 297 | 10002681 | 10002681.0 |
| ... | ... | ... |
| 157 | 9600000 | 9600000.0 |
| 129 | 9607500 | 9607500.0 |
| 402 | 9631250 | 9631250.0 |
| 188 | 988464 | 988464.0 |


In [85]:
# Sort by the numeric salary column (highest first) and keep the top row
fixed_nba.sort_values(by='Fixed-Salary', ascending=False)[['Name', 'Fixed-Salary']].head(1)


|  | Name | Fixed-Salary |
|---|---|---|
| 121 | Stephen Curry | 37457154.0 |


Stephen Curry was the highest-paid player in the dataset, at $37,457,154.

# At the end of a game, who WOULDN'T you want on the Free Throw Line?

We will use fixed_nba, in which every row containing a NaN value has been dropped. This removes the players whose salary was listed as '-', who appear to be the players in the dataset not associated with any team.
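Note that `dropna()` with no arguments drops a row when any column is NaN, not only Fixed-Salary, so fixed_nba may also lose players who have a salary but a missing value elsewhere. If only the salary filter is intended, `dropna(subset=['Fixed-Salary'])` restricts the check. A sketch on made-up rows (Player X, Y and Z are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical rows: X has no salary, Y has a salary but a missing FG3%
demo = pd.DataFrame({
    'Name': ['Player X', 'Player Y', 'Player Z'],
    'Fixed-Salary': [np.nan, 1000000.0, 2000000.0],
    'FG3%': [0.35, np.nan, 0.40],
})

# No arguments: a row is dropped if ANY column is NaN
print(demo.dropna()['Name'].tolist())                         # ['Player Z']

# subset=: only a missing salary causes a row to be dropped
print(demo.dropna(subset=['Fixed-Salary'])['Name'].tolist())  # ['Player Y', 'Player Z']
```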

In [93]:
fixed_nba.sort_values(by='FT%')[['Name','FT%','FTA','Team']].head(10)

|  | Name | FT% | FTA | Team |
|---|---|---|---|---|
| 154 | Jacob Evans | 0.0 | 0.0 | Golden State Warriors |
| 360 | Dzanan Musa | 0.0 | 0.2 | Brooklyn Nets |
| 268 | Terrence Jones | 0.0 | 0.5 | Houston Rockets |
| 325 | Tahjere McCall | 0.0 | 1.0 | Brooklyn Nets |
| 168 | Melvin Frazier | 0.25 | 0.4 | Orlando Magic |
| 310 | Tyler Lydon | 0.333 | 0.1 | Denver Nuggets |
| 451 | Ray Spalding | 0.333 | 0.9 | Dallas Mavericks |
| 324 | Luc Mbah a Moute | 0.4 | 1.3 | Los Angeles Clippers |
| 27 | Lonzo Ball | 0.417 | 1.0 | Los Angeles Lakers |
| 472 | Gary Trent Jr. | 0.429 | 0.5 | Portland Trail Blazers |


All of the players listed above had a poor free throw percentage during the 2018-2019 season, but Tahjere McCall stands out: he averaged 1.0 free throw attempt per game and made none of them. He's the last person I'd want on the free throw line.