### Prompt

The NBA is considered the world's premier basketball league due in large part to the tremendously talented players who dazzle fans every time out on the court. One of the highest honors for any player is to be named an All-Star. All-Stars are selected each year to recognize outstanding performance at each position. The All-Stars 2000-2016 file contains a list of players named All-Stars over that time period.

You may notice that many of the league's 30 teams have at least one player selected over this time period. That is largely a result of the fact that regardless of whether they come from collegiate basketball teams or international club teams, players must enter the NBA Draft when entering the league. The draft is a system devised to spread talent across the league by giving teams who had poor seasons the best odds of signing the most promising new players coming into the league. In the current version of the draft, each of the 30 teams has one pick per round for 2 rounds. However, the league used to have fewer teams and more rounds in the draft. The Drafts Info file contains a listing of the number of picks in each round of the draft from 1984-2016.

For this week's challenge, find the 3 latest-drafted players to play in an All-Star game between 2000 and 2016. (Undrafted players are ineligible; draft position is the overall pick number in a given draft.)

Data Source: https://www.kaggle.com/fmejia21/nba-all-star-game-20002016; https://www.nba.com/history/draft
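
To make the target quantity concrete: a player's overall pick is the number of selections made in all earlier rounds of that draft plus the pick within the player's own round. The sketch below uses a hypothetical helper, `overall_pick`, which is not part of the notebook; the round sizes are the 1988 row of the Drafts Info file (three rounds of 25 picks).

```python
def overall_pick(picks_per_round, round_num, pick_in_round):
    """Return the overall pick number within a single draft.

    picks_per_round: list of round sizes, first round first
    (a hypothetical input shape chosen for this sketch).
    """
    if not 1 <= round_num <= len(picks_per_round):
        raise ValueError("round_num is outside the rounds held that year")
    if not 1 <= pick_in_round <= picks_per_round[round_num - 1]:
        raise ValueError("pick_in_round exceeds the size of that round")
    # picks made in all earlier rounds, plus the position within this round
    return sum(picks_per_round[:round_num - 1]) + pick_in_round


# 1988 draft: three rounds of 25 picks each
print(overall_pick([25, 25, 25], 3, 3))  # 53
```

The rest of the notebook reaches the same number by building one row per pick and counting rows within each year.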

In [1]:
# importing lib

import pandas as pd

import os

import re

In [2]:
# importing data

cwd = os.getcwd()

file_names = ['all_stars.csv', 'drafts.csv']

# read each CSV from the working directory into a dict keyed by file name
dfs = {name: pd.read_csv(os.path.join(cwd, name)) for name in file_names}

df_allstars = dfs['all_stars.csv']

df_drafts = dfs['drafts.csv']

In [3]:
df_allstars.head()

Unnamed: 0,Year,Player,Pos,HT,WT,Team,Selection Type,NBA Draft Status,Nationality
0,2016,Stephen Curry,G,6-3,190,Golden State Warriors,Western All-Star Fan Vote Selection,2009 Rnd 1 Pick 7,United States
1,2016,James Harden,SG,6-5,220,Houston Rockets,Western All-Star Fan Vote Selection,2009 Rnd 1 Pick 3,United States
2,2016,Kevin Durant,SF,6-9,240,Golden State Warriors,Western All-Star Fan Vote Selection,2007 Rnd 1 Pick 2,United States
3,2016,Kawhi Leonard,F,6-7,230,San Antonio Spurs,Western All-Star Fan Vote Selection,2011 Rnd 1 Pick 15,United States
4,2016,Anthony Davis,PF,6-11,253,New Orleans Pelicans,Western All-Star Fan Vote Selection,2012 Rnd 1 Pick 1,United States


In [4]:
df_drafts.head()

Unnamed: 0,Year,Rd1,Rd2,Rd3,Rd4,Rd5,Rd6,Rd7,Rd8,Rd9,Rd10
0,1984,24,23,23.0,23.0,23.0,23.0,23.0,22.0,22.0,22.0
1,1985,24,23,23.0,23.0,23.0,23.0,23.0,,,
2,1986,24,23,23.0,23.0,23.0,23.0,23.0,,,
3,1987,23,23,23.0,23.0,22.0,24.0,23.0,,,
4,1988,25,25,25.0,,,,,,,


### Wrangling Drafts Data

In [5]:
# unpivoting data from wide to long (one row per year and round)

df_drafts_pivoted = pd.melt(df_drafts, id_vars = 'Year', var_name = 'Round', value_name = 'Pick')

df_drafts_pivoted.head()

Unnamed: 0,Year,Round,Pick
0,1984,Rd1,24.0
1,1985,Rd1,24.0
2,1986,Rd1,24.0
3,1987,Rd1,23.0
4,1988,Rd1,25.0


In [6]:
# scaffolding: one row per pick in each round
# (rounds with a NaN pick count add no rows, because pick_num <= NaN is False)

scaffold_list = []

for index, row in df_drafts_pivoted.iterrows():
    
    pick_num = 1
    
    while pick_num <= row['Pick']:
        
        new_row = row.copy()
        
        new_row['pick_num'] = pick_num
        
        scaffold_list.append(new_row)
        
        pick_num += 1

In [7]:
scaffold_list

[Year        1984
 Round        Rd1
 Pick        24.0
 pick_num       1
 Name: 0, dtype: object,
 Year        1984
 Round        Rd1
 Pick        24.0
 pick_num       2
 Name: 0, dtype: object,
 ...

In [8]:
df_drafts_scaffolded = pd.DataFrame(scaffold_list)

df_drafts_scaffolded.head()

Unnamed: 0,Year,Round,Pick,pick_num
0,1984,Rd1,24.0,1
0,1984,Rd1,24.0,2
0,1984,Rd1,24.0,3
0,1984,Rd1,24.0,4
0,1984,Rd1,24.0,5
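
The row-by-row loop works, but the same one-row-per-pick table can probably be built without `iterrows`. The sketch below is one possible vectorized alternative, not the notebook's method; the frame `melted` and its pick counts are made up for illustration.

```python
import pandas as pd

# small stand-in for the melted drafts table (pick counts are made up)
melted = pd.DataFrame({
    'Year': [1988, 1988, 1988],
    'Round': ['Rd1', 'Rd2', 'Rd3'],
    'Pick': [3.0, 2.0, float('nan')],
})

# rounds that were not held have NaN picks; drop them before repeating
held = melted.dropna(subset=['Pick'])

# repeat each row once per pick in that round
scaffold = held.loc[held.index.repeat(held['Pick'].astype(int))].copy()

# number the repeated rows 1..Pick within each (Year, Round)
scaffold['pick_num'] = scaffold.groupby(['Year', 'Round']).cumcount() + 1

print(scaffold)
```

On a table of this size the speed difference is unlikely to matter; the main gain is that the intent (repeat, then count) is visible in two lines.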


In [9]:
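# sorting by year, round number, and pick within the round
# (the key converts 'Rd10' etc. to integers; sorted as text, 'Rd10' would come before 'Rd2')

df_drafts_scaffolded.sort_values(by = ['Year', 'Round', 'pick_num'], key = lambda col: col.str.replace('Rd', '').astype(int) if col.name == 'Round' else col, inplace = True)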

In [10]:
# generating overall pick col per year

df_drafts_scaffolded['overall_pick'] = df_drafts_scaffolded.groupby('Year').cumcount()+1

df_drafts_scaffolded.head()

Unnamed: 0,Year,Round,Pick,pick_num,overall_pick
0,1984,Rd1,24.0,1,1
0,1984,Rd1,24.0,2,2
0,1984,Rd1,24.0,3,3
0,1984,Rd1,24.0,4,4
0,1984,Rd1,24.0,5,5
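
The cumulative count relies on rows being in true draft order within each year. One thing worth checking in that regard is how round labels sort: compared as text, `Rd10` comes before `Rd2`, which matters for a ten-round draft such as 1984. A small illustration on a made-up Series:

```python
import pandas as pd

labels = pd.Series(['Rd1', 'Rd2', 'Rd10'])

# text sort: 'Rd10' lands before 'Rd2'
text_order = labels.sort_values().tolist()

# numeric sort: strip the prefix and compare as integers
numeric_order = labels.sort_values(
    key=lambda s: s.str.replace('Rd', '', regex=False).astype(int)
).tolist()

print(text_order)     # ['Rd1', 'Rd10', 'Rd2']
print(numeric_order)  # ['Rd1', 'Rd2', 'Rd10']
```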


In [11]:
# extracting round number from round
# (kept as a string so it matches the 'round' text parsed from the All-Star data)

df_drafts_scaffolded['round_num'] = df_drafts_scaffolded['Round'].str.replace('Rd', '', regex = False)

In [12]:
df_drafts_scaffolded.head()

Unnamed: 0,Year,Round,Pick,pick_num,overall_pick,round_num
0,1984,Rd1,24.0,1,1,1
0,1984,Rd1,24.0,2,2,1
0,1984,Rd1,24.0,3,3,1
0,1984,Rd1,24.0,4,4,1
0,1984,Rd1,24.0,5,5,1


In [13]:
df_drafts_scaffolded.tail()

Unnamed: 0,Year,Round,Pick,pick_num,overall_pick,round_num
65,2016,Rd2,30.0,26,56,2
65,2016,Rd2,30.0,27,57,2
65,2016,Rd2,30.0,28,58,2
65,2016,Rd2,30.0,29,59,2
65,2016,Rd2,30.0,30,60,2


### Wrangling All Star Data

In [14]:
df_allstars.head()

Unnamed: 0,Year,Player,Pos,HT,WT,Team,Selection Type,NBA Draft Status,Nationality
0,2016,Stephen Curry,G,6-3,190,Golden State Warriors,Western All-Star Fan Vote Selection,2009 Rnd 1 Pick 7,United States
1,2016,James Harden,SG,6-5,220,Houston Rockets,Western All-Star Fan Vote Selection,2009 Rnd 1 Pick 3,United States
2,2016,Kevin Durant,SF,6-9,240,Golden State Warriors,Western All-Star Fan Vote Selection,2007 Rnd 1 Pick 2,United States
3,2016,Kawhi Leonard,F,6-7,230,San Antonio Spurs,Western All-Star Fan Vote Selection,2011 Rnd 1 Pick 15,United States
4,2016,Anthony Davis,PF,6-11,253,New Orleans Pelicans,Western All-Star Fan Vote Selection,2012 Rnd 1 Pick 1,United States


In [15]:
# selecting relevant col

df_allstars_2 = df_allstars[['Year', 'Player', 'NBA Draft Status']]

df_allstars_2.head()

Unnamed: 0,Year,Player,NBA Draft Status
0,2016,Stephen Curry,2009 Rnd 1 Pick 7
1,2016,James Harden,2009 Rnd 1 Pick 3
2,2016,Kevin Durant,2007 Rnd 1 Pick 2
3,2016,Kawhi Leonard,2011 Rnd 1 Pick 15
4,2016,Anthony Davis,2012 Rnd 1 Pick 1


In [16]:
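# filtering out where NBA draft status is undrafted
# (.copy() makes the result an independent frame, so later column assignments do not raise SettingWithCopyWarning)

df_allstars_filtered = df_allstars_2[~df_allstars_2['NBA Draft Status'].str.contains('Undrafted')].copy()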

df_allstars_filtered.head()

Unnamed: 0,Year,Player,NBA Draft Status
0,2016,Stephen Curry,2009 Rnd 1 Pick 7
1,2016,James Harden,2009 Rnd 1 Pick 3
2,2016,Kevin Durant,2007 Rnd 1 Pick 2
3,2016,Kawhi Leonard,2011 Rnd 1 Pick 15
4,2016,Anthony Davis,2012 Rnd 1 Pick 1


In [17]:
# parsing pick year, round, and pick from draft status

df_allstars_filtered['parse'] = df_allstars_filtered['NBA Draft Status'].apply(lambda x: pd.Series(re.findall(r'(\d{4})\sRnd\s(\d+)\sPick\s(\d+)', x)))


In [18]:
df_allstars_filtered

Unnamed: 0,Year,Player,NBA Draft Status,parse
0,2016,Stephen Curry,2009 Rnd 1 Pick 7,"(2009, 1, 7)"
1,2016,James Harden,2009 Rnd 1 Pick 3,"(2009, 1, 3)"
2,2016,Kevin Durant,2007 Rnd 1 Pick 2,"(2007, 1, 2)"
3,2016,Kawhi Leonard,2011 Rnd 1 Pick 15,"(2011, 1, 15)"
4,2016,Anthony Davis,2012 Rnd 1 Pick 1,"(2012, 1, 1)"
...,...,...,...,...
434,2000,Antonio McDyess,1995 Rnd 1 Pick 2,"(1995, 1, 2)"
435,2000,Gary Payton,1990 Rnd 1 Pick 2,"(1990, 1, 2)"
436,2000,Rasheed Wallace,1995 Rnd 1 Pick 4,"(1995, 1, 4)"
437,2000,David Robinson,1987 Rnd 1 Pick 1,"(1987, 1, 1)"


In [19]:
# splitting the parsed tuples into separate columns
# note: to_list() returns a list of tuples; pd.DataFrame() turns each tuple into a row,
# and the number of elements in each tuple determines how many columns are created

df_allstars_filtered[['pick_year', 'round', 'pick']] = pd.DataFrame(df_allstars_filtered['parse'].to_list(), index=df_allstars_filtered.index)

In [20]:
df_allstars_filtered

Unnamed: 0,Year,Player,NBA Draft Status,parse,pick_year,round,pick
0,2016,Stephen Curry,2009 Rnd 1 Pick 7,"(2009, 1, 7)",2009,1,7
1,2016,James Harden,2009 Rnd 1 Pick 3,"(2009, 1, 3)",2009,1,3
2,2016,Kevin Durant,2007 Rnd 1 Pick 2,"(2007, 1, 2)",2007,1,2
3,2016,Kawhi Leonard,2011 Rnd 1 Pick 15,"(2011, 1, 15)",2011,1,15
4,2016,Anthony Davis,2012 Rnd 1 Pick 1,"(2012, 1, 1)",2012,1,1
...,...,...,...,...,...,...,...
434,2000,Antonio McDyess,1995 Rnd 1 Pick 2,"(1995, 1, 2)",1995,1,2
435,2000,Gary Payton,1990 Rnd 1 Pick 2,"(1990, 1, 2)",1990,1,2
436,2000,Rasheed Wallace,1995 Rnd 1 Pick 4,"(1995, 1, 4)",1995,1,4
437,2000,David Robinson,1987 Rnd 1 Pick 1,"(1987, 1, 1)",1987,1,1


In [21]:
# another way of splitting the tuples into columns: expand each tuple into a Series

df_allstars_filtered['parse'].apply(lambda x: pd.Series(x))

Unnamed: 0,0,1,2
0,2009,1,7
1,2009,1,3
2,2007,1,2
3,2011,1,15
4,2012,1,1
...,...,...,...
434,1995,1,2
435,1990,1,2
436,1995,1,4
437,1987,1,1
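
A further option, offered here as a sketch and not as the notebook's approach, is `Series.str.extract`, which returns one column per capture group and so skips the intermediate tuple column. The sample strings follow the format of the NBA Draft Status column; the column names are set through named groups.

```python
import pandas as pd

# stand-in values in the format of the 'NBA Draft Status' column
status = pd.Series(['2009 Rnd 1 Pick 7', '1988 Rnd 3 Pick 3', 'Undrafted'])

# named groups become column names; non-matching rows become NaN
parsed = status.str.extract(
    r'(?P<pick_year>\d{4})\sRnd\s(?P<round>\d+)\sPick\s(?P<pick>\d+)'
)

# drop rows that did not match, then convert the digit strings to integers
parsed = parsed.dropna().astype(int)

print(parsed)
```

Because unmatched rows come back as NaN, this variant could also stand in for the separate "Undrafted" filter, assuming every drafted player's status follows the same pattern.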


### Bringing It Together

In [22]:
# changing data types

df_allstars_filtered['pick_year'] = df_allstars_filtered['pick_year'].astype('int64')


In [23]:
# changing data types

df_allstars_filtered['pick'] = df_allstars_filtered['pick'].astype('int64')


In [24]:
df_allstars_filtered.dtypes

Year                 int64
Player              object
NBA Draft Status    object
parse               object
pick_year            int64
round               object
pick                 int64
dtype: object

In [25]:
df_drafts_scaffolded.dtypes

Year              int64
Round            object
Pick            float64
pick_num          int64
overall_pick      int64
round_num        object
dtype: object
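
Merge keys need compatible dtypes on both sides. In the listings above, `round` and `round_num` are both stored as text, so they line up as they are; an alternative would be to cast both to integers. A minimal sketch with made-up frames (`left`, `right`, and their values are assumptions for illustration):

```python
import pandas as pd

# keys stored as text on one side and as integers on the other
left = pd.DataFrame({'round': ['1', '2'], 'player': ['A', 'B']})
right = pd.DataFrame({'round_num': [1, 2], 'overall': [7, 37]})

# cast the text key to int so both sides compare as numbers
left['round'] = left['round'].astype('int64')

merged = left.merge(right, left_on='round', right_on='round_num')

print(merged.dtypes)
```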

In [32]:
df_merge = df_allstars_filtered.merge(df_drafts_scaffolded, left_on = ['pick_year', 'round', 'pick'], right_on = ['Year', 'round_num', 'pick_num'])

df_merge

Unnamed: 0,Year_x,Player,NBA Draft Status,parse,pick_year,round,pick,Year_y,Round,Pick,pick_num,overall_pick,round_num
0,2016,Stephen Curry,2009 Rnd 1 Pick 7,"(2009, 1, 7)",2009,1,7,2009,Rd1,30.0,7,7,1
1,2015,Stephen Curry,2009 Rnd 1 Pick 7,"(2009, 1, 7)",2009,1,7,2009,Rd1,30.0,7,7,1
2,2014,Stephen Curry,2009 Rnd 1 Pick 7,"(2009, 1, 7)",2009,1,7,2009,Rd1,30.0,7,7,1
3,2013,Stephen Curry,2009 Rnd 1 Pick 7,"(2009, 1, 7)",2009,1,7,2009,Rd1,30.0,7,7,1
4,2016,James Harden,2009 Rnd 1 Pick 3,"(2009, 1, 3)",2009,1,3,2009,Rd1,30.0,3,3,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
428,2000,Allan Houston,1993 Rnd 1 Pick 11,"(1993, 1, 11)",1993,1,11,1993,Rd1,27.0,11,11,1
429,2000,Michael Finley,1995 Rnd 1 Pick 21,"(1995, 1, 21)",1995,1,21,1995,Rd1,29.0,21,21,1
430,2000,Antonio McDyess,1995 Rnd 1 Pick 2,"(1995, 1, 2)",1995,1,2,1995,Rd1,29.0,2,2,1
431,2000,David Robinson,1987 Rnd 1 Pick 1,"(1987, 1, 1)",1987,1,1,1987,Rd1,23.0,1,1,1
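
The default inner merge silently drops any All-Star row without a matching draft slot, for instance a player drafted before 1984, the first year in the drafts file. One way to see which rows fall out is a left merge with `indicator=True`. The frames below are made-up stand-ins that reuse the notebook's column names.

```python
import pandas as pd

# made-up stand-ins for the parsed All-Star rows and the draft scaffold
allstars = pd.DataFrame({
    'Player': ['Player A', 'Player B'],
    'pick_year': [2009, 1983],
    'round': ['1', '1'],
    'pick': [7, 1],
})
scaffold = pd.DataFrame({
    'Year': [2009], 'round_num': ['1'], 'pick_num': [7], 'overall_pick': [7],
})

# a left merge keeps every All-Star row; _merge records whether a match was found
check = allstars.merge(
    scaffold,
    how='left',
    left_on=['pick_year', 'round', 'pick'],
    right_on=['Year', 'round_num', 'pick_num'],
    indicator=True,
)

unmatched = check[check['_merge'] == 'left_only']
print(unmatched[['Player', 'pick_year']])
```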


In [33]:
# aggregating

df_agg = df_merge.groupby(['Player', 'NBA Draft Status']).agg(overall_pick = ('overall_pick', 'max'))

df_agg

Unnamed: 0_level_0,Unnamed: 1_level_0,overall_pick
Player,NBA Draft Status,Unnamed: 2_level_1
Al Horford,2007 Rnd 1 Pick 3,3
Allan Houston,1993 Rnd 1 Pick 11,11
Allen Iverson,1996 Rnd 1 Pick 1,1
Alonzo Mourning,1992 Rnd 1 Pick 2,2
Amar'e Stoudemire,2002 Rnd 1 Pick 9,9
...,...,...
Vlade Divac,1989 Rnd 1 Pick 26,26
Wally Szczerbiak,1999 Rnd 1 Pick 6,6
Yao Ming,2002 Rnd 1 Pick 1,1
Zach Randolph,2001 Rnd 1 Pick 19,19


In [34]:
df_agg.shape

(121, 1)

In [35]:
# selecting the three latest-drafted players (largest overall pick numbers)

df_agg.nlargest(3, 'overall_pick')

Unnamed: 0_level_0,Unnamed: 1_level_0,overall_pick
Player,NBA Draft Status,Unnamed: 2_level_1
Isaiah Thomas,2011 Rnd 2 Pick 30,60
Manu Ginobili,1999 Rnd 2 Pick 28,57
Anthony Mason,1988 Rnd 3 Pick 3,53
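
`nlargest` returns exactly three rows by default, so a tie for third place would be cut off arbitrarily. If ties should be reported, `keep='all'` is one option. The picks below are made up to show the difference.

```python
import pandas as pd

# made-up overall picks with a tie for third place
picks = pd.DataFrame(
    {'overall_pick': [60, 57, 53, 53, 10]},
    index=['Player A', 'Player B', 'Player C', 'Player D', 'Player E'],
)

# default keep='first' returns exactly three rows
top3 = picks.nlargest(3, 'overall_pick')

# keep='all' also returns every row tied with the last selected value
top3_with_ties = picks.nlargest(3, 'overall_pick', keep='all')

print(len(top3), len(top3_with_ties))  # 3 4
```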
