# Pre_processing: players_model_data

In this notebook I create a dataset of NBA players from the 1999/00 to the 2020/21 season, with the variables needed to solve the optimization problem:

- Season
- Team
- PER
- DRtg
- Salary 
- Position
- Unit (feature created according to the minutes played during the season)
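The `Unit` feature is produced by `create_unit_indicator` from the local `preprocessing_functions` module, which is not shown in this notebook. The `MP_Rank` and `Unit` columns printed in the examples at the end suggest that players are ranked by minutes played within their team and season, with ranks 1 to 5 giving Unit 1, ranks 6 to 10 giving Unit 2 and the rest Unit 3. A minimal sketch under those assumptions follows. The function name `create_unit_indicator_sketch`, the tie-breaking rule and the cut-offs are inferred from the outputs, not taken from the author's code.

```python
import pandas as pd


def create_unit_indicator_sketch(stats: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-implementation of `create_unit_indicator`.

    Assumes the unit comes from each player's minutes-played rank within
    his team and season: ranks 1-5 -> unit 1, 6-10 -> unit 2, 11+ -> unit 3.
    """
    out = stats.copy()
    # Rank by minutes played inside each team-season (1 = most minutes).
    out["MP_Rank"] = (
        out.groupby(["Team", "Season"])["MP"]
        .rank(ascending=False, method="first")
    )
    # pd.cut uses right-closed bins here: (0, 5], (5, 10], (10, inf].
    out["Unit"] = pd.cut(
        out["MP_Rank"], bins=[0, 5, 10, float("inf")], labels=[1, 2, 3]
    ).astype(int)
    return out


# Usage: 12 players on team A and 2 players on team B in the same season.
df = pd.DataFrame({
    "Team": ["A"] * 12 + ["B"] * 2,
    "Season": ["2020/21"] * 14,
    "MP": [1200 - 100 * i for i in range(12)] + [50, 900],
})
res = create_unit_indicator_sketch(df)
print(res)
```

Using `method="first"` gives every player a distinct rank even when two players logged the same minutes; the real helper may break ties differently.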

## Merging salaries with general stats poses some challenges:

- Players can change teams in the middle of a season while receiving the same salary. As a result, the player statistics (`stats`) and the salary dataset (`salaries`) have different shapes: `stats` has one row per player, team and season, while `salaries` has one row per player and season.
- Players who no longer play can still receive a salary. For example, Monta Ellis was waived in 2017 but kept receiving salary payments because the Indiana Pacers used the stretch provision on him.
- Names with accents or with suffixes such as 'III' or 'Jr' are written differently in the two datasets.

## Merging solutions:

- Players keep the same salary within a season, regardless of the team they are playing for.
- Players without stats are dropped: players who do not play are not relevant for this project, so I am not interested in players who receive salaries after retiring or being waived.
- Player names are reconciled with a few manual corrections and fuzzy matching.
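The first two solutions come down to joining on `Player` and `Season` only, not on the team, and keeping only the rows found in both tables. A small sketch with rows copied from the outputs further down (Matt Thomas's two 2020/21 stints, and Monta Ellis's 2017/18 salary, which has no matching stats) shows both effects. The `validate="many_to_one"` argument is an optional pandas safeguard, not used in this notebook, that raises a `MergeError` if a player-season appears twice in the salary table.

```python
import pandas as pd

# Two team stints in one season for the same player.
stats = pd.DataFrame({
    "Player": ["Matt Thomas", "Matt Thomas"],
    "Team": ["Utah Jazz", "Toronto Raptors"],
    "Season": ["2020/21", "2020/21"],
    "PER": [10.5, 8.2],
})
# One salary per player and season; Monta Ellis is paid but has no stats.
salaries = pd.DataFrame({
    "Player": ["Matt Thomas", "Monta Ellis"],
    "Season": ["2020/21", "2017/18"],
    "Salary": [1517981, 2245400],
})

# Joining on player and season (not team) copies the salary to every stint;
# the inner join drops salary rows without stats.
merged = stats.merge(
    salaries, on=["Player", "Season"], how="inner", validate="many_to_one"
)
print(merged)
```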

# Virtual environment and packages

In [550]:
import sys
sys.executable

'C:\\Users\\pipeg\\miniconda3\\envs\\nba_team_maker\\python.exe'

In [551]:
import os

print(os.getcwd())
path = 'c:/Users/pipeg/Documents/GitHub/nba-team-creator/'
os.chdir(path)
os.getcwd()

c:\Users\pipeg\Documents\GitHub\nba-team-creator


'c:\\Users\\pipeg\\Documents\\GitHub\\nba-team-creator'

In [553]:
import numpy as np 
import pandas as pd
from preprocessing_functions import remove_accents, create_unit_indicator
from thefuzz import fuzz, process

# Load raw data

In [554]:
adv_stats = pd.read_csv('raw_data/advanced.csv')
poss_stats = pd.read_csv('raw_data/100_possessions.csv')
salaries = pd.read_csv('raw_data/salaries.csv')


# Basic cleaning

In [556]:
# Remove the 'TOT' rows, which combine a player's stats across all the teams he played for in a season
stats = adv_stats[adv_stats.Tm != "TOT"][['Player', 'Pos', 'Team', 'Season', 'MP', 'PER']]

# Create a unit indicator
stats = create_unit_indicator(stats)
stats.drop(['MP', 'MP_Rank'], axis = 1, inplace = True)

# Add DRtg
stats = stats.merge(poss_stats[['Player', 'Team', 'Season', 'DRtg']], on = ['Player', 'Team', 'Season'])

# Remove accents and dots from player names
stats['Player'] = remove_accents(stats['Player'])
salaries['Player'] = remove_accents(salaries['Player'])
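`remove_accents` also comes from `preprocessing_functions` and is not shown. A common way to get the same effect with only the standard library is Unicode NFKD decomposition followed by an ASCII encode that ignores whatever cannot be encoded; the sketch below assumes that approach and calls it `strip_accents` (a hypothetical name). If the real helper works this way, it would also explain names such as `Omer Ask` and `Tibor Plei` in the fuzzy-matching output further down: letters like `ı` and `ß` have no ASCII decomposition, so they are dropped rather than transliterated.

```python
import unicodedata


def strip_accents(name: str) -> str:
    """Hypothetical stand-in for `remove_accents`: drop diacritics and dots."""
    # NFKD splits an accented letter into base letter + combining mark (č -> c + ˇ).
    decomposed = unicodedata.normalize("NFKD", name)
    # Encoding to ASCII with errors="ignore" discards the combining marks,
    # and also any letter with no ASCII decomposition (ı, ß, ...).
    ascii_name = decomposed.encode("ascii", errors="ignore").decode("ascii")
    # Remove dots so that "Jr." and "Jr" compare equal.
    return ascii_name.replace(".", "")


print(strip_accents("Nikola Vučević"))
print(strip_accents("Gary Trent Jr."))
```

It works on single strings, so on a column it would be applied as `stats['Player'].map(strip_accents)`.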

In [558]:
stats

Unnamed: 0,Player,Pos,Team,Season,PER,Unit,DRtg
0,Deni Avdija,SF,Washington Wizards,2020/21,7.6,2,113.0
1,Bradley Beal,SG,Washington Wizards,2020/21,22.7,1,115.0
2,Jordan Bell,C,Washington Wizards,2020/21,8.5,3,109.0
3,Davis Bertans,PF,Washington Wizards,2020/21,11.4,1,116.0
4,Isaac Bonga,SF,Washington Wizards,2020/21,4.0,3,114.0
...,...,...,...,...,...,...,...
11692,Roshown McLeod,SF,Atlanta Hawks,1999/00,9.1,2,110.0
11693,Dikembe Mutombo,C,Atlanta Hawks,1999/00,19.4,1,101.0
11694,Isaiah Rider,SG,Atlanta Hawks,1999/00,15.7,1,111.0
11695,Jason Terry,PG,Atlanta Hawks,1999/00,13.9,2,108.0


In [559]:
salaries

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season
0,Stephen Curry,43006362,43006362,2020/21
1,Chris Paul,41358814,41358814,2020/21
2,Russell Westbrook,41358814,41358814,2020/21
3,James Harden,41254920,41254920,2020/21
4,John Wall,41254920,41254920,2020/21
...,...,...,...,...
10869,Jonathan Kerner,5000,7755,1999/00
10870,Ira Bowman,4529,7025,1999/00
10871,Trevor Winter,4529,7025,1999/00
10872,Steve Goodrich,4529,7025,1999/00


In [560]:
stats[stats.duplicated()]

Unnamed: 0,Player,Pos,Team,Season,PER,Unit,DRtg


In [561]:
salaries[salaries.duplicated()]

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season
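Both checks come back empty, but `duplicated()` with no arguments only flags rows that are identical in every column. For the merge, what matters is whether a `(Player, Season)` key appears more than once in `salaries` (for instance, two different players with the same name in the same season), since such a key would be attached to every matching stats row. A sketch of the key-level check on made-up rows:

```python
import pandas as pd

# Made-up salary table: the same name twice in one season, different salaries.
salaries = pd.DataFrame({
    "Player": ["Player A", "Player A", "Player B"],
    "Season": ["2020/21", "2020/21", "2020/21"],
    "Salary": [1_000_000, 5_000_000, 2_000_000],
})

# Full-row duplicates: none, because the salaries differ.
full_dupes = salaries[salaries.duplicated()]

# Key duplicates: keep=False marks every row of a repeated (Player, Season) key.
key_dupes = salaries[salaries.duplicated(subset=["Player", "Season"], keep=False)]
print(key_dupes)
```

On `stats`, the equivalent key would be `['Player', 'Team', 'Season']`.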


# Merging two datasets with different player naming conventions

## Names to change by hand in the salaries dataset
- Aleksandar Pavlovic to Sasha Pavlovic
- BJ Mullens to Byron Mullens
- Cam Reynolds to Cameron Reynolds
- Didier Ilunga-Mbenga to DJ Mbenga
- Hidayet Turkoglu to Hedo Turkoglu
- Iakovos Tsakalidis to Jake Tsakalidis
- Ishmael Smith to Ish Smith
- Jeffery Taylor to Jeff Taylor
- Jose Juan Barea to JJ Barea
- Joseph Young to Joe Young
- Kiwane Garris to Kiwane Lemorris Garris
- KJ Martin to Kenyon Martin Jr
- Maurice Williams to Mo Williams
- Moe Harkless to Maurice Harkless
- Patrick Mills to Patty Mills
- Predrag Stojakovic to Peja Stojakovic
- Radoslav Nesterovic to Rasho Nesterovic
- Raymond Spalding to Ray Spalding
- Sergey Monya to Sergei Monia
- Timothe Luwawu to Timothe Luwawu-Cabarrot
- Walter Tavares to Edy Tavares


In [563]:
# E.g. Timothe Luwawu-Cabarrot does not appear in the salaries dataset
salaries[salaries.Player == 'Timothe Luwawu-Cabarrot']

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season


In [564]:
# But he does appear in the stats dataset
stats[stats.Player == 'Timothe Luwawu-Cabarrot']

Unnamed: 0,Player,Pos,Team,Season,PER,Unit,DRtg
579,Timothe Luwawu-Cabarrot,SF,Brooklyn Nets,2020/21,7.8,2,116.0
1170,Timothe Luwawu-Cabarrot,SF,Brooklyn Nets,2019/20,11.0,2,112.0
1414,Timothe Luwawu-Cabarrot,SF,Oklahoma City Thunder,2018/19,3.4,3,109.0
1754,Timothe Luwawu-Cabarrot,SF,Chicago Bulls,2018/19,9.0,3,114.0
1979,Timothe Luwawu-Cabarrot,SF,Philadelphia 76ers,2017/18,7.0,2,110.0
2572,Timothe Luwawu-Cabarrot,SF,Philadelphia 76ers,2016/17,8.5,2,111.0


In [565]:
# Correct some names by hand (salaries name -> stats name)
dict_players = {
    'Aleksandar Pavlovic': 'Sasha Pavlovic',
    'BJ Mullens': 'Byron Mullens',
    'Cam Reynolds': 'Cameron Reynolds',
    'Didier Ilunga-Mbenga': 'DJ Mbenga',
    'Hidayet Turkoglu': 'Hedo Turkoglu',
    'Iakovos Tsakalidis': 'Jake Tsakalidis',
    'Ishmael Smith': 'Ish Smith',
    'Jeffery Taylor': 'Jeff Taylor',
    'Jose Juan Barea': 'JJ Barea',
    'Joseph Young': 'Joe Young',
    'Kiwane Garris': 'Kiwane Lemorris Garris',
    'Moe Harkless': 'Maurice Harkless',
    'Patrick Mills': 'Patty Mills',
    'Predrag Stojakovic': 'Peja Stojakovic',
    'Radoslav Nesterovic': 'Rasho Nesterovic',
    'Raymond Spalding': 'Ray Spalding',
    'Sergey Monya': 'Sergei Monia',
    'Timothe Luwawu': 'Timothe Luwawu-Cabarrot',
    'Walter Tavares': 'Edy Tavares',
    'KJ Martin': 'Kenyon Martin Jr',
    'Maurice Williams': 'Mo Williams',
}


In [566]:
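# Assign the result back instead of calling replace(inplace=True) on a column
salaries['Player'] = salaries['Player'].replace(dict_players)

# Jaren Jackson Jr is named 'Jaren Jackson' in the salaries dataset
# (.at only accepts a single label, so use .loc for a list of index labels)
salaries.loc[[163, 740, 1283], 'Player'] = 'Jaren Jackson Jr'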

In [567]:
salaries.iloc[[163,740, 1283], :]

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season
163,Jaren Jackson Jr,7257360,7257360,2020/21
740,Jaren Jackson Jr,6927480,6972213,2019/20
1283,Jaren Jackson Jr,5922720,6059230,2018/19
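Hard-coding the index labels 163, 740 and 1283 ties this fix to the current row order of `salaries.csv`. A sketch of a label-free alternative selects the rows with a boolean mask instead, assuming (from the output above) that the rows to rename are exactly the 'Jaren Jackson' rows from 2018/19 onwards. `Season` strings have the form `YYYY/YY`, so plain string comparison orders them chronologically. The 1999/00 row is a placeholder standing for the earlier player Jaren Jackson Sr.; its salary value is made up.

```python
import pandas as pd

salaries = pd.DataFrame({
    "Player": ["Jaren Jackson"] * 4,
    "Season": ["2020/21", "2019/20", "2018/19", "1999/00"],
    # The 1999/00 salary is a placeholder, not real data.
    "Salary": [7257360, 6927480, 5922720, 1_000_000],
})

# 'YYYY/YY' strings sort chronologically, so ">=" selects 2018/19 and later.
is_junior = (salaries["Player"] == "Jaren Jackson") & (salaries["Season"] >= "2018/19")
salaries.loc[is_junior, "Player"] = "Jaren Jackson Jr"
print(salaries)
```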


# Fuzzy merging using `thefuzz`

In [495]:
# Unique player names in each dataset
player_list_salaries = salaries['Player'].unique()
player_list_stats = stats['Player'].unique()

# For each name in stats, store its most similar name in salaries and the score (0-100)
keys = {}

# Compares each name against every salary name; takes about 2 minutes
for player in player_list_stats:
    keys[player] = process.extract(player, player_list_salaries, limit=1)
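The loop scores every stats name against every salary name, even though most names turn out to match exactly. A sketch of a cheaper variant resolves exact matches with a set lookup first and only runs the fuzzy search on the leftovers. To stay dependency-free it swaps `thefuzz` for the standard library's `difflib.SequenceMatcher`; its ratio is scaled to 0-100 here but is not the same score as `thefuzz`'s, so the thresholds used below may need retuning. The function name `best_matches` is hypothetical.

```python
from difflib import SequenceMatcher


def best_matches(stats_names, salary_names):
    """Return {stats_name: (best_salary_name, score_0_to_100)}."""
    salary_set = set(salary_names)
    matches = {}
    for name in stats_names:
        # Exact hits need no fuzzy search.
        if name in salary_set:
            matches[name] = (name, 100)
            continue
        # Otherwise score every salary name and keep the highest ratio.
        best = max(
            salary_names,
            key=lambda cand: SequenceMatcher(None, name, cand).ratio(),
        )
        score = round(100 * SequenceMatcher(None, name, best).ratio())
        matches[name] = (best, score)
    return matches


result = best_matches(
    ["Stephen Curry", "Dennis Schroder"],
    ["Stephen Curry", "Dennis Schroeder", "Chris Paul"],
)
print(result)
```

It returns a `(name, score)` tuple per player instead of the one-element list that `process.extract(..., limit=1)` produces, so `keys[player][0][1]` would become `matches[player][1]`.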


In [568]:
# Best match for the last player of the loop: (salaries name, score)
keys[player][0]


('Wayne Turner', 100)

In [569]:
keys[player][0][1]

100

In [570]:
# How many names are a perfect match (score 100)?
clean_names = [match[0][0] for match in keys.values() if match[0][1] == 100]

print("Matched names :", len(clean_names))
print("No-matched names :", len(player_list_stats) - len(clean_names))
print(round(len(clean_names) / len(player_list_stats) * 100, 2), '% match between the players names')

Matched names : 2048
No-matched names : 79
96.29 % match between the players names
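The percentage is the share of unique stats names whose best salary match scored exactly 100:

```latex
\text{match rate} = \frac{2048}{2048 + 79} = \frac{2048}{2127} \approx 0.9629 = 96.29\,\%
```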


In [571]:
# Print the near matches (score strictly between 90 and 100)
# Above a score of 90, 29 of the 32 names are matched correctly
for player in keys:
    if 90 < keys[player][0][1] < 100:
        print(player, keys[player][0])

Troy Brown Jr ('Troy Brown', 95)
DeAndre' Bembry ('DeAndre Bembry', 97)
Marvin Bagley III ('Marvin Bagley', 95)
Derrick Jones Jr ('Derrick Jones', 95)
Wendell Carter Jr ('Wendell Carter', 95)
Frank Mason III ('Frank Mason', 95)
Dennis Smith Jr ('Dennis Smith', 95)
Kira Lewis Jr ('Kira Lewis', 95)
Jaren Jackson Jr ('Jaren Jackson', 95)
Xavier Tillman Sr ('Xavier Tillman', 95)
Dennis Schroder ('Dennis Schroeder', 97)
Lou Williams ('Louis Williams', 92)
Kevin Porter Jr ('Kevin Porter', 95)
Kelly Oubre Jr ('Kelly Oubre', 95)
Michael Porter Jr ('Michael Porter', 95)
Vernon Carey Jr ('Vernon Carey', 95)
Devonte' Graham ('Devonte Graham', 97)
Vince Edwards ('Vincent Edwards', 93)
Walt Lemon Jr ('Walter Lemon Jr', 93)
Omer Ask ('Omer Asik', 94)
James Webb III ('James Webb', 95)
John Lucas III ('John Lucas', 95)
Tibor Plei ('Tibor Pleiss', 91)
Lou Amundson ('Louis Amundson', 92)
Amar'e Stoudemire ('Amare Stoudemire', 97)
Toure' Murry ('Toure Murry', 96)
Viacheslav Kravtsov ('Vyacheslav Kravtsov

In [572]:
# Map each salaries name to its stats name for the near matches
players_matched = {}

for player in keys:
    if 90 < keys[player][0][1] < 100:
        players_matched[keys[player][0][0]] = player

# Drop some wrongly matched names (e.g. 'Ervin Johnson' to 'Kevin Johnson')
wrong_matched = ['Brandon Knight', 'Ervin Johnson', 'Kenyon Martin', 'Jaren Jackson']

for name in wrong_matched:
    players_matched.pop(name, None)
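The steps above keep matches scoring strictly between 90 and 100, invert them into a `salaries name -> stats name` mapping and drop a hand-checked list of wrong matches. Because the mapping is a plain dict, a salaries name that is the best match for two stats names silently keeps only the last one. A sketch of a variant that reports such conflicts instead (hypothetical function `build_name_map`; it assumes `keys` has the `{stats_name: [(salaries_name, score)]}` shape shown above, and the toy entries other than Troy Brown are made up):

```python
from collections import defaultdict


def build_name_map(keys, low=90, high=100, exclude=()):
    """Map salaries names to stats names for near matches, reporting conflicts."""
    claimed = defaultdict(list)
    for stats_name, candidates in keys.items():
        salary_name, score = candidates[0]
        if low < score < high and salary_name not in exclude:
            claimed[salary_name].append(stats_name)
    # A salaries name claimed by several stats names is ambiguous: report it.
    conflicts = {s: names for s, names in claimed.items() if len(names) > 1}
    mapping = {s: names[0] for s, names in claimed.items() if len(names) == 1}
    return mapping, conflicts


keys = {
    "Troy Brown Jr": [("Troy Brown", 95)],
    "Stephen Curry": [("Stephen Curry", 100)],  # exact match, not a near match
    "Player Z": [("Player Y", 92)],             # toy wrong match, excluded below
    "Player X Jr": [("Player X", 95)],          # toy father/son collision
    "Player X Sr": [("Player X", 95)],
}
mapping, conflicts = build_name_map(keys, exclude={"Player Y"})
print(mapping)
print(conflicts)
```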

In [573]:
players_matched

{'Troy Brown': 'Troy Brown Jr',
 'DeAndre Bembry': "DeAndre' Bembry",
 'Marvin Bagley': 'Marvin Bagley III',
 'Derrick Jones': 'Derrick Jones Jr',
 'Wendell Carter': 'Wendell Carter Jr',
 'Frank Mason': 'Frank Mason III',
 'Dennis Smith': 'Dennis Smith Jr',
 'Kira Lewis': 'Kira Lewis Jr',
 'Xavier Tillman': 'Xavier Tillman Sr',
 'Dennis Schroeder': 'Dennis Schroder',
 'Louis Williams': 'Lou Williams',
 'Kevin Porter': 'Kevin Porter Jr',
 'Kelly Oubre': 'Kelly Oubre Jr',
 'Michael Porter': 'Michael Porter Jr',
 'Vernon Carey': 'Vernon Carey Jr',
 'Devonte Graham': "Devonte' Graham",
 'Vincent Edwards': 'Vince Edwards',
 'Walter Lemon Jr': 'Walt Lemon Jr',
 'Omer Asik': 'Omer Ask',
 'James Webb': 'James Webb III',
 'John Lucas': 'John Lucas III',
 'Tibor Pleiss': 'Tibor Plei',
 'Louis Amundson': 'Lou Amundson',
 'Amare Stoudemire': "Amar'e Stoudemire",
 'Toure Murry': "Toure' Murry",
 'Vyacheslav Kravtsov': 'Viacheslav Kravtsov',
 'Vitor Faverani': 'Vitor Luiz Faverani',
 'Willie Solomon

In [575]:
# Before the fuzzy-match renaming, 'Troy Brown Jr' has no rows in salaries
salaries[salaries.Player == 'Troy Brown Jr']

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season


In [577]:
salaries['Player'] = salaries['Player'].replace(players_matched)

# Merge

In [587]:
outer_merge = stats.merge(salaries, on = ['Player', 'Season'], how = 'outer')

In [590]:
# Stats rows without a matching salary
outer_merge[outer_merge['Salary adjusted by inflation'].isnull()].sort_values(by = 'Unit', ascending = True)

Unnamed: 0,Player,Pos,Team,Season,PER,Unit,DRtg,Salary,Salary adjusted by inflation
6859,Lou Amundson,PF,Phoenix Suns,2008/09,13.3,2.0,109.0,,
1051,Marquese Chriss,PF,Golden State Warriors,2019/20,19.5,2.0,110.0,,
4591,Anthony Tolliver,PF,Charlotte Bobcats,2013/14,11.0,2.0,107.0,,
11255,Felipe Lopez,SG,Vancouver Grizzlies,1999/00,11.9,2.0,108.0,,
10458,Anthony Johnson,SG,New Jersey Nets,2001/02,11.8,2.0,98.0,,
...,...,...,...,...,...,...,...,...,...
5955,Orien Greene,SG,New Jersey Nets,2010/11,33.0,3.0,95.0,,
5970,Mario West,SG,New Jersey Nets,2010/11,9.1,3.0,108.0,,
6098,Mike Harris,PF,Houston Rockets,2010/11,17.9,3.0,108.0,,
5550,Mikki Moore,C,Golden State Warriors,2011/12,6.3,3.0,109.0,,
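A left join with `indicator=True` lists exactly which stats rows the inner join below drops, and how many, without inspecting the whole outer merge. A sketch on two rows taken from the outputs (Matt Thomas's 2020/21 salary, and Lou Amundson's 2008/09 row, which has no salary above):

```python
import pandas as pd

stats = pd.DataFrame({
    "Player": ["Matt Thomas", "Lou Amundson"],
    "Team": ["Utah Jazz", "Phoenix Suns"],
    "Season": ["2020/21", "2008/09"],
})
salaries = pd.DataFrame({
    "Player": ["Matt Thomas"],
    "Season": ["2020/21"],
    "Salary": [1517981],
})

# `_merge` is 'both' when a salary was found and 'left_only' when it was not.
flagged = stats.merge(salaries, on=["Player", "Season"], how="left", indicator=True)
missing = flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")
print(len(missing), "player-season rows without salary")
print(missing[["Player", "Team", "Season"]])
```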


In [591]:
# The inner merge drops 272 player-season observations (mostly fringe players or players without salary data on HoopsHype.com)
inner_merge = stats.merge(salaries, on = ['Player', 'Season'], how = 'inner')

In [592]:
inner_merge

Unnamed: 0,Player,Pos,Team,Season,PER,Unit,DRtg,Salary,Salary adjusted by inflation
0,Deni Avdija,SF,Washington Wizards,2020/21,7.6,2,113.0,4469160,4469160
1,Bradley Beal,SG,Washington Wizards,2020/21,22.7,1,115.0,28751774,28751774
2,Jordan Bell,C,Washington Wizards,2020/21,8.5,3,109.0,592368,592368
3,Jordan Bell,C,Golden State Warriors,2020/21,5.6,3,104.0,592368,592368
4,Davis Bertans,PF,Washington Wizards,2020/21,11.4,1,116.0,15000000,15000000
...,...,...,...,...,...,...,...,...,...
11422,Roshown McLeod,SF,Atlanta Hawks,1999/00,9.1,2,110.0,914760,1418907
11423,Dikembe Mutombo,C,Atlanta Hawks,1999/00,19.4,1,101.0,12820249,19885810
11424,Isaiah Rider,SG,Atlanta Hawks,1999/00,15.7,1,111.0,5410000,8391587
11425,Jason Terry,PG,Atlanta Hawks,1999/00,13.9,2,108.0,1468920,2278478


# Save created data

In [None]:
inner_merge.to_csv('out_data/players_model_data.csv', index = False)

# Examples of the challenges

In [40]:
stats[stats.Player.duplicated()]

Unnamed: 0,Player,Pos,Team,Season,MP,PER,MP_Rank,Unit,DRtg
51,Alex Len,C,Toronto Raptors,2020/21,76,4.3,20.0,3,111.0
56,Matt Thomas,SG,Toronto Raptors,2020/21,192,8.2,18.0,3,118.0
82,Terence Davis,SG,Sacramento Kings,2020/21,581,14.0,10.0,2,116.0
109,Rodney Hood,SF,Portland Trail Blazers,2020/21,726,5.0,11.0,3,119.0
117,Norman Powell,SF,Portland Trail Blazers,2020/21,928,14.2,9.0,2,118.0
...,...,...,...,...,...,...,...,...,...
11695,Roshown McLeod,SF,Atlanta Hawks,1999/00,860,9.1,9.0,2,110.0
11696,Dikembe Mutombo,C,Atlanta Hawks,1999/00,2984,19.4,1.0,1,101.0
11697,Isaiah Rider,SG,Atlanta Hawks,1999/00,2084,15.7,4.0,1,111.0
11698,Jason Terry,PG,Atlanta Hawks,1999/00,1888,13.9,6.0,2,108.0


In [41]:
stats[stats.Player == 'Matt Thomas']

Unnamed: 0,Player,Pos,Team,Season,MP,PER,MP_Rank,Unit,DRtg
38,Matt Thomas,SG,Utah Jazz,2020/21,134,10.5,15.0,3,112.0
56,Matt Thomas,SG,Toronto Raptors,2020/21,192,8.2,18.0,3,118.0
684,Matt Thomas,SG,Toronto Raptors,2019/20,440,13.3,12.0,3,109.0


In [42]:
salaries[salaries.Player == 'Matt Thomas']

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season
437,Matt Thomas,1517981,1517981,2020/21
990,Matt Thomas,898310,904110,2019/20


In [43]:
stats[stats.Player == 'Roshown McLeod'] 

Unnamed: 0,Player,Pos,Team,Season,MP,PER,MP_Rank,Unit,DRtg
10895,Roshown McLeod,SF,Philadelphia 76ers,2000/01,15,0.9,18.0,3,106.0
11218,Roshown McLeod,SF,Atlanta Hawks,2000/01,907,10.6,10.0,2,107.0
11695,Roshown McLeod,SF,Atlanta Hawks,1999/00,860,9.1,9.0,2,110.0


In [45]:
salaries[salaries.Player == 'Roshown McLeod'] 

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season
9726,Roshown McLeod,1509001,2185482,2001/02
10235,Roshown McLeod,978600,1463341,2000/01
10673,Roshown McLeod,914760,1418907,1999/00


In [46]:
stats[stats.Player == 'Monta Ellis'].head()

Unnamed: 0,Player,Pos,Team,Season,MP,PER,MP_Rank,Unit,DRtg
2766,Monta Ellis,SG,Indiana Pacers,2016/17,1998,9.9,5.0,1,110.0
3314,Monta Ellis,SG,Indiana Pacers,2015/16,2734,13.7,2.0,1,103.0
3966,Monta Ellis,SG,Dallas Mavericks,2014/15,2699,16.5,1.0,1,107.0
4512,Monta Ellis,SG,Dallas Mavericks,2013/14,3023,16.8,1.0,1,109.0
4864,Monta Ellis,SG,Milwaukee Bucks,2012/13,3076,16.2,1.0,1,105.0


In [47]:
salaries[salaries.Player == 'Monta Ellis'].head()

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season
330,Monta Ellis,2245400,2245400,2020/21
1411,Monta Ellis,2245400,2297153,2018/19
1975,Monta Ellis,2245400,2363117,2017/18
2355,Monta Ellis,10763500,11512824,2016/17
2871,Monta Ellis,10300000,11126933,2015/16


- Names with accents or with suffixes such as 'III' or 'Jr' are written differently in the two datasets.

In [48]:
stats[stats.Player == 'Nikola Vučević'].head()


Unnamed: 0,Player,Pos,Team,Season,MP,PER,MP_Rank,Unit,DRtg
186,Nikola Vučević,C,Orlando Magic,2020/21,1500,23.5,2.0,1,111.0
539,Nikola Vučević,C,Chicago Bulls,2020/21,848,21.8,9.0,2,109.0
803,Nikola Vučević,C,Orlando Magic,2019/20,1998,21.9,3.0,1,107.0
1402,Nikola Vučević,C,Orlando Magic,2018/19,2510,25.5,3.0,1,103.0
2007,Nikola Vučević,C,Orlando Magic,2017/18,1683,19.7,5.0,1,106.0


In [49]:
salaries[salaries.Player == 'Nikola Vučević'].head()


Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season


In [50]:
salaries[salaries.Player == 'Nikola Vucevic'].head()

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season
43,Nikola Vucevic,26000000,26000000,2020/21
600,Nikola Vucevic,28000000,28180805,2019/20
1191,Nikola Vucevic,12750000,13043869,2018/19
1766,Nikola Vucevic,12250000,12892217,2017/18
2342,Nikola Vucevic,11750000,12568002,2016/17


In [52]:
stats[stats.Player == 'Gary Trent Jr.'].head()

Unnamed: 0,Player,Pos,Team,Season,MP,PER,MP_Rank,Unit,DRtg
57,Gary Trent Jr.,SG,Toronto Raptors,2020/21,540,11.3,13.0,3,115.0
119,Gary Trent Jr.,SG,Portland Trail Blazers,2020/21,1262,12.4,7.0,2,119.0
745,Gary Trent Jr.,SF,Portland Trail Blazers,2019/20,1332,12.9,6.0,2,117.0
1334,Gary Trent Jr.,SG,Portland Trail Blazers,2018/19,111,3.6,16.0,3,115.0


In [53]:
salaries[salaries.Player == 'Gary Trent Jr.'].head()

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season


In [54]:
salaries[salaries.Player == 'Gary Trent Jr'].head()

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season
403,Gary Trent Jr,1663861,1663861,2020/21
962,Gary Trent Jr,1416852,1426001,2019/20
1550,Gary Trent Jr,838464,857789,2018/19
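An alternative to fuzzy matching for the suffix problem, not the approach used in this notebook, is to compare names through a normalized key that lowercases them and strips generational suffixes. The sketch below uses a hypothetical helper `name_key`. It makes `Gary Trent Jr.` and `Gary Trent Jr` (or `Marvin Bagley III` and `Marvin Bagley`) compare equal, but it also collapses a father and son into one key, which is exactly the Jaren Jackson problem handled by hand above, so such a key would need to be combined with the season or checked for collisions.

```python
import re

# Common generational suffixes (Jr, Sr, II, III, IV), with an optional dot.
SUFFIX_RE = re.compile(r"\s+(jr|sr|ii|iii|iv)\.?$", flags=re.IGNORECASE)


def name_key(name: str) -> str:
    """Hypothetical matching key: no generational suffix, no dots, lowercase."""
    key = SUFFIX_RE.sub("", name.strip())
    return key.replace(".", "").lower()


print(name_key("Gary Trent Jr."), "|", name_key("Marvin Bagley III"))
```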


In [60]:
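# `total_stats` is an outer merge of `stats` and `salaries` on Player and Season (it still has the MP and MP_Rank columns)
# Matt Thomas keeps the same salary for the Jazz and the Raptors, as he played for both within the same season.
total_stats[total_stats.Player == 'Matt Thomas']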

Unnamed: 0,Player,Pos,Team,Season,MP,PER,MP_Rank,Unit,DRtg,Salary,Salary adjusted by inflation
46,Matt Thomas,SG,Utah Jazz,2020/21,134.0,10.5,15.0,3.0,112.0,1517981.0,1517981.0
47,Matt Thomas,SG,Toronto Raptors,2020/21,192.0,8.2,18.0,3.0,118.0,1517981.0,1517981.0
693,Matt Thomas,SG,Toronto Raptors,2019/20,440.0,13.3,12.0,3.0,109.0,898310.0,904110.0


In [61]:
# Roshown McLeod did not play in 2001/02, so his stats for that season are NaN
total_stats[total_stats.Player == 'Roshown McLeod']

Unnamed: 0,Player,Pos,Team,Season,MP,PER,MP_Rank,Unit,DRtg,Salary,Salary adjusted by inflation
10926,Roshown McLeod,SF,Philadelphia 76ers,2000/01,15.0,0.9,18.0,3.0,106.0,978600.0,1463341.0
10927,Roshown McLeod,SF,Atlanta Hawks,2000/01,907.0,10.6,10.0,2.0,107.0,978600.0,1463341.0
11697,Roshown McLeod,SF,Atlanta Hawks,1999/00,860.0,9.1,9.0,2.0,110.0,914760.0,1418907.0
12565,Roshown McLeod,,,2001/02,,,,,,1509001.0,2185482.0


In [62]:
# Players with accented names, such as Nikola Vučević, were merged correctly
total_stats[total_stats.Player == 'Nikola Vucevic'].head()

Unnamed: 0,Player,Pos,Team,Season,MP,PER,MP_Rank,Unit,DRtg,Salary,Salary adjusted by inflation
220,Nikola Vucevic,C,Orlando Magic,2020/21,1500.0,23.5,2.0,1.0,111.0,26000000.0,26000000.0
221,Nikola Vucevic,C,Chicago Bulls,2020/21,848.0,21.8,9.0,2.0,109.0,26000000.0,26000000.0
823,Nikola Vucevic,C,Orlando Magic,2019/20,1998.0,21.9,3.0,1.0,107.0,28000000.0,28180805.0
1449,Nikola Vucevic,C,Orlando Magic,2018/19,2510.0,25.5,3.0,1.0,103.0,12750000.0,13043869.0
2030,Nikola Vucevic,C,Orlando Magic,2017/18,1683.0,19.7,5.0,1.0,106.0,12250000.0,12892217.0


In [63]:
# What about names with the 'Jr' suffix?
total_stats[total_stats.Player == 'Gary Trent Jr'].head()

Unnamed: 0,Player,Pos,Team,Season,MP,PER,MP_Rank,Unit,DRtg,Salary,Salary adjusted by inflation
68,Gary Trent Jr,SG,Toronto Raptors,2020/21,540.0,11.3,13.0,3.0,115.0,1663861.0,1663861.0
69,Gary Trent Jr,SG,Portland Trail Blazers,2020/21,1262.0,12.4,7.0,2.0,119.0,1663861.0,1663861.0
760,Gary Trent Jr,SF,Portland Trail Blazers,2019/20,1332.0,12.9,6.0,2.0,117.0,1416852.0,1426001.0
1367,Gary Trent Jr,SG,Portland Trail Blazers,2018/19,111.0,3.6,16.0,3.0,115.0,838464.0,857789.0


In [65]:
total_stats.shape

(12706, 11)

In [66]:
total_stats[total_stats.Salary.isna()]

Unnamed: 0,Player,Pos,Team,Season,MP,PER,MP_Rank,Unit,DRtg,Salary,Salary adjusted by inflation
6,Troy Brown Jr,SF,Washington Wizards,2020/21,287.0,6.3,15.0,3.0,115.0,,
7,Troy Brown Jr,SF,Chicago Bulls,2020/21,237.0,11.2,15.0,3.0,113.0,,
22,Ish Smith,PG,Washington Wizards,2020/21,924.0,11.6,9.0,2.0,113.0,,
50,DeAndre' Bembry,SF,Toronto Raptors,2020/21,972.0,11.2,9.0,2.0,111.0,,
84,Patty Mills,PG,San Antonio Spurs,2020/21,1685.0,11.8,5.0,1.0,117.0,,
...,...,...,...,...,...,...,...,...,...,...,...
11555,Moochie Norris,PG,Houston Rockets,1999/00,502.0,18.4,12.0,3.0,106.0,,
11603,Popeye Jones,PF,Denver Nuggets,1999/00,330.0,12.4,14.0,3.0,106.0,,
11622,Dennis Rodman,PF,Dallas Mavericks,1999/00,389.0,8.4,12.0,3.0,103.0,,
11640,Donny Marshall,SF,Cleveland Cavaliers,1999/00,39.0,0.2,17.0,3.0,107.0,,


In [35]:
salaries[salaries.Player == 'Popeye Jones']

Unnamed: 0,Player,Salary,Salary adjusted by inflation,Season
8850,Popeye Jones,1070000,1501593,2003/04
9309,Popeye Jones,1030000,1475991,2002/03
9728,Popeye Jones,1500000,2172446,2001/02
10077,Popeye Jones,2812500,4205650,2000/01
