### Analyze the NBA statistics dataframe (file nba.csv)


Output:
1. Rank the teams' salary budgets from highest to lowest.
2. The 10 highest-paid players in the league.
3. The 3 highest-paid players on each team.
4. Propose a starting five for each team based on salary (keep in mind 
   that a starting five has one player at each position (see figure)).
5. The youngest and the oldest team in the league.
6. The most frequently used jersey number in the league.
7. The name and height of the tallest and the shortest player in the league, in metric units.
   
P.S. Fill any missing salary values with the median salary of the team of the player whose salary is missing.

**Import of libraries and settings**

In [1]:
import pandas as pd
import numpy as np
import copy
import warnings
warnings.simplefilter('ignore')

In [2]:
data_nba_statistics = pd.read_csv('nba.csv', delimiter=',')
pd.set_option('display.max_columns', 20)
pd.set_option('display.max_rows', 20)
data_nba_statistics                                         #read data, set options and show part of the dataframe 

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...,...
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0


**Preparation of data**

In [3]:
data_nba_statistics.drop(data_nba_statistics.tail(1).index, inplace = True) #last row is empty so we can delete it
data_nba_statistics

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...,...
452,Trey Lyles,Utah Jazz,41.0,PF,20.0,6-10,234.0,Kentucky,2239800.0
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0


In [4]:
data_nba_statistics['Salary'].isnull().sum()                 #number of missing values in column "Salary"

11

In [5]:
data_nba_statistics.groupby('Team')['Salary'].median() #median salary in each team

Team
Atlanta Hawks         2854940.0
Boston Celtics        3021242.5
Brooklyn Nets         1335480.0
Charlotte Hornets     4204200.0
Chicago Bulls         2380440.0
                        ...    
Sacramento Kings      3156600.0
San Antonio Spurs     2814000.0
Toronto Raptors       2900000.0
Utah Jazz             2433333.0
Washington Wizards    4000000.0
Name: Salary, Length: 30, dtype: float64

In [6]:
data_nba_statistics['Team'].isnull().sum()                  #check empty values in column 'Team'

0

In [7]:
data_nba_statistics['Salary'] = data_nba_statistics.groupby('Team')['Salary'].transform(lambda x: x.fillna(x.median()))
data_nba_statistics                                         #fill empty fields in column "Salary"

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,3021242.5
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...,...
452,Trey Lyles,Utah Jazz,41.0,PF,20.0,6-10,234.0,Kentucky,2239800.0
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0


In [8]:
data_nba_statistics['Salary'].isnull().sum()                  #check that all missing values have been filled

0
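
The group-wise median fill used above can be tried in isolation on a toy frame. This is a minimal sketch: the values are made up (not taken from nba.csv), and `fill_with_team_median` is a hypothetical helper name, not part of the notebook.

```python
import pandas as pd

# Toy data with made-up values; None marks a missing salary
toy = pd.DataFrame({
    "Team": ["A", "A", "A", "B", "B"],
    "Salary": [100.0, None, 300.0, 50.0, None],
})

def fill_with_team_median(df, group_col="Team", value_col="Salary"):
    """Return value_col with NaNs replaced by the median of the row's group."""
    # transform("median") broadcasts each group's median (NaNs skipped) to every row
    medians = df.groupby(group_col)[value_col].transform("median")
    return df[value_col].fillna(medians)

toy["Salary"] = fill_with_team_median(toy)
print(toy)
```

Passing the string `"median"` to `transform` avoids a Python-level lambda per group; the result should match the lambda-based cell above whenever every team has at least one known salary.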

In [9]:
#convert height and weight to the International System of Units (kilos, meters)
data_nba_statistics['Height'] = data_nba_statistics['Height'].apply(lambda x: round((30.48 * int(x.split('-')[0]) + 2.54 * int(x.split('-')[1])) / 100, 2))
data_nba_statistics['Weight'] *= 0.4536
data_nba_statistics

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,1.88,81.6480,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,1.98,106.5960,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,1.96,92.9880,Boston University,3021242.5
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,1.96,83.9160,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,2.08,104.7816,,5000000.0
...,...,...,...,...,...,...,...,...,...
452,Trey Lyles,Utah Jazz,41.0,PF,20.0,2.08,106.1424,Kentucky,2239800.0
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,1.91,92.0808,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,1.85,81.1944,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,2.21,116.1216,,2900000.0
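
The conversion above is easier to check when it is pulled out into a named function. The sketch below assumes heights always come as `feet-inches` strings such as `6-2`; `feet_inches_to_meters` and `POUNDS_TO_KG` are illustrative names that do not appear in the notebook.

```python
def feet_inches_to_meters(height):
    """Convert a 'feet-inches' string such as '6-2' to meters, rounded to 2 places."""
    feet, inches = (int(part) for part in height.split("-"))
    # 1 foot = 30.48 cm, 1 inch = 2.54 cm; divide by 100 to get meters
    return round((feet * 30.48 + inches * 2.54) / 100, 2)

POUNDS_TO_KG = 0.4536  # the same rounded factor the notebook uses

print(feet_inches_to_meters("6-2"))    # height of the first row in the table
print(round(180 * POUNDS_TO_KG, 4))    # weight of the first row in the table
```

With pandas this could be applied as `data_nba_statistics['Height'].apply(feet_inches_to_meters)`, provided the column contains no missing values.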


**Task 1**
1. Rank the teams' salary budgets from highest to lowest.

In [10]:
#sort players by salary within each team (lists individual salaries; team totals are not aggregated here)
data_nba_statistics.groupby(['Team','Name']).max().sort_values(by=['Team','Salary'], ascending=[True, False]) 

Unnamed: 0_level_0,Unnamed: 1_level_0,Number,Position,Age,Height,Weight,College,Salary
Team,Name,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Atlanta Hawks,Paul Millsap,4.0,PF,31.0,2.03,111.5856,Louisiana Tech,18671659.0
Atlanta Hawks,Al Horford,15.0,C,30.0,2.08,111.1320,Florida,12000000.0
Atlanta Hawks,Tiago Splitter,11.0,C,31.0,2.11,111.1320,,9756250.0
Atlanta Hawks,Jeff Teague,0.0,PG,27.0,1.88,84.3696,Wake Forest,8000000.0
Atlanta Hawks,Kyle Korver,26.0,SG,35.0,2.01,96.1632,Creighton,5746479.0
...,...,...,...,...,...,...,...,...
Washington Wizards,Kelly Oubre Jr.,12.0,SF,20.0,2.01,92.9880,Kansas,1920240.0
Washington Wizards,Garrett Temple,17.0,SG,30.0,1.98,88.4520,LSU,1100602.0
Washington Wizards,Jarell Eddie,8.0,SG,24.0,2.01,98.8848,Virginia Tech,561716.0
Washington Wizards,JJ Hickson,21.0,C,27.0,2.06,109.7712,North Carolina State,273038.0
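
The task statement asks for team budgets, while the cell above lists individual players. One possible reading is to sum the salaries per team and sort the totals. The sketch below shows that on made-up data; it is not the notebook's own method, and the values are not from nba.csv.

```python
import pandas as pd

# Toy data with made-up salaries
toy = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "C"],
    "Salary": [5.0, 7.0, 20.0, 1.0, 9.0],
})

# Total payroll per team, highest first
team_payroll = (
    toy.groupby("Team")["Salary"]
       .sum()
       .sort_values(ascending=False)
)
print(team_payroll)
```

On the real dataframe the same chain would be applied to `data_nba_statistics` after the missing salaries have been filled.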


**Task 2**
2. The 10 highest-paid players in the league.

In [11]:
data_nba_statistics.nlargest(10, ['Salary']) 

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
109,Kobe Bryant,Los Angeles Lakers,24.0,SF,37.0,1.98,96.1632,,25000000.0
169,LeBron James,Cleveland Cavaliers,23.0,SF,31.0,2.03,113.4,,22970500.0
33,Carmelo Anthony,New York Knicks,7.0,SF,32.0,2.03,108.864,Syracuse,22875000.0
251,Dwight Howard,Houston Rockets,12.0,C,30.0,2.11,120.204,,22359364.0
339,Chris Bosh,Miami Heat,1.0,PF,32.0,2.11,106.596,Georgia Tech,22192730.0
100,Chris Paul,Los Angeles Clippers,3.0,PG,31.0,1.83,79.38,Wake Forest,21468695.0
414,Kevin Durant,Oklahoma City Thunder,35.0,SF,27.0,2.06,108.864,Texas,20158622.0
164,Derrick Rose,Chicago Bulls,1.0,PG,27.0,1.91,86.184,Memphis,20093064.0
349,Dwyane Wade,Miami Heat,3.0,SG,34.0,1.93,99.792,Marquette,20000000.0
23,Brook Lopez,Brooklyn Nets,11.0,C,28.0,2.13,124.74,Stanford,19689000.0


**Task 3**
3. The 3 highest-paid players on each team.

In [12]:
richest_players = pd.DataFrame()                           #create new dataframe for players
tmp = copy.deepcopy(data_nba_statistics)                   #making deep copy of main dataframe to keep it undamaged
tmp.set_index('Team', inplace=True)                        #set field 'team' as index
for i in tmp.index.unique():                               #go through each team
    max_salary = tmp.loc[i].nlargest(3, ['Salary'])        #find three highest paid players in each team
    richest_players = pd.concat([richest_players, max_salary]) #add them to new dataframe
richest_players

Unnamed: 0_level_0,Name,Number,Position,Age,Height,Weight,College,Salary
Team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Boston Celtics,Amir Johnson,90.0,PF,29.0,2.06,108.8640,,12000000.0
Boston Celtics,Avery Bradley,0.0,PG,25.0,1.88,81.6480,Texas,7730337.0
Boston Celtics,Isaiah Thomas,4.0,PG,27.0,1.75,83.9160,Washington,6912869.0
Brooklyn Nets,Brook Lopez,11.0,C,28.0,2.13,124.7400,Stanford,19689000.0
Brooklyn Nets,Thaddeus Young,30.0,PF,27.0,2.03,100.2456,Georgia Tech,11235955.0
...,...,...,...,...,...,...,...,...
Portland Trail Blazers,Ed Davis,17.0,C,27.0,2.08,108.8640,North Carolina,6980802.0
Portland Trail Blazers,Gerald Henderson,9.0,SG,28.0,1.96,97.5240,Duke,6000000.0
Utah Jazz,Gordon Hayward,20.0,SF,26.0,2.03,102.5136,Butler,15409570.0
Utah Jazz,Derrick Favors,15.0,PF,24.0,2.08,120.2040,Georgia Tech,12000000.0
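
The loop above can also be written without an explicit loop or repeated `pd.concat` calls. The following is a sketch of one alternative on made-up data: sort once, then keep the first three rows of each group. Unlike the loop, it orders teams alphabetically rather than by first appearance.

```python
import pandas as pd

# Toy data with made-up salaries
toy = pd.DataFrame({
    "Team": ["A", "A", "A", "A", "B", "B"],
    "Name": ["a1", "a2", "a3", "a4", "b1", "b2"],
    "Salary": [4.0, 9.0, 1.0, 6.0, 3.0, 8.0],
})

# Sort by team, then by salary descending; head(3) keeps the top rows per team
top3 = (
    toy.sort_values(["Team", "Salary"], ascending=[True, False])
       .groupby("Team")
       .head(3)
)
print(top3)
```

Teams with fewer than three players simply contribute all of their rows.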


**Task 4**
4. Propose a starting five for each team based on salary (keep in mind 
   that a starting five has one player at each position (see figure)).

In [13]:
tmp = copy.deepcopy(data_nba_statistics)                   #making deep copy of main dataframe to keep it undamaged
tmp.set_index(['Team', 'Position'], inplace=True)          #now we'll have multiindex 'team' + 'position'

In [14]:
teams_from_richest_players = pd.DataFrame()                           #new dataframe for result
for i in tmp.index.unique():                                         #iterate over unique index pairs: (team, position)
    max_salary_in_position = tmp.loc[i].nlargest(1, ['Salary'])      #pick the highest-paid player for this team and position
    teams_from_richest_players = pd.concat([teams_from_richest_players, max_salary_in_position]) 
teams_from_richest_players.sort_values(by = ['Team', 'Position'], inplace = True)
teams_from_richest_players

Unnamed: 0_level_0,Unnamed: 1_level_0,Name,Number,Age,Height,Weight,College,Salary
Team,Position,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
Atlanta Hawks,C,Al Horford,15.0,30.0,2.08,111.1320,Florida,12000000.0
Atlanta Hawks,PF,Paul Millsap,4.0,31.0,2.03,111.5856,Louisiana Tech,18671659.0
Atlanta Hawks,PG,Jeff Teague,0.0,27.0,1.88,84.3696,Wake Forest,8000000.0
Atlanta Hawks,SF,Thabo Sefolosha,25.0,32.0,2.01,99.7920,,4000000.0
Atlanta Hawks,SG,Kyle Korver,26.0,35.0,2.01,96.1632,Creighton,5746479.0
...,...,...,...,...,...,...,...,...
Washington Wizards,C,Nene Hilario,42.0,33.0,2.11,113.4000,,13000000.0
Washington Wizards,PF,Markieff Morris,5.0,26.0,2.08,111.1320,Kansas,8000000.0
Washington Wizards,PG,John Wall,2.0,25.0,1.93,88.4520,Kentucky,15851950.0
Washington Wizards,SF,Otto Porter Jr.,22.0,23.0,2.03,89.8128,Georgetown,4662960.0
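
The same selection (one highest-paid player per team and position) can be expressed with `idxmax`, which returns the row label of the maximum within each group. This is a sketch on made-up data, offered as an alternative to the loop rather than as the notebook's method.

```python
import pandas as pd

# Toy data with made-up salaries; only two positions for brevity
toy = pd.DataFrame({
    "Team": ["A", "A", "A", "B", "B"],
    "Position": ["PG", "PG", "C", "PG", "C"],
    "Name": ["a1", "a2", "a3", "b1", "b2"],
    "Salary": [2.0, 5.0, 7.0, 4.0, 1.0],
})

# Row label of the best-paid player for every (team, position) pair
best_idx = toy.groupby(["Team", "Position"])["Salary"].idxmax()
starters = toy.loc[best_idx].sort_values(["Team", "Position"])
print(starters)
```

If two players at the same position have the same top salary, `idxmax` keeps only the first one, so each position still gets exactly one starter.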


**Task 5**
5. The youngest and the oldest team in the league.

In [15]:
average_age = data_nba_statistics.groupby('Team')['Age'].mean() #calculate average age and then find min and max
average_age = round(average_age, 2)
print("The youngest team is", average_age.idxmin(), "with an average age of", average_age.min())
print("The oldest team is", average_age.idxmax(), "with an average age of", average_age.max())

The youngest team is Utah Jazz with an average age of 24.47
The oldest team is San Antonio Spurs with an average age of 31.6


**Task 6**
6. The most frequently used jersey number in the league.

In [16]:
data_nba_statistics['Number'].mode()    #the most frequently used value in field "Number"

0    5.0
dtype: float64
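
`mode()` returns a Series because several values can share the highest frequency. To see the actual counts, `value_counts()` is a convenient companion. The sketch below uses made-up jersey numbers with a deliberate tie.

```python
import pandas as pd

# Made-up jersey numbers; 5 and 12 both occur twice, None is a missing value
numbers = pd.Series([5.0, 5.0, 12.0, 12.0, 3.0, None])

counts = numbers.value_counts()                 # NaN is dropped by default
top_count = counts.max()                        # highest frequency
most_common = counts[counts == top_count].index.sort_values().tolist()

print(counts)
print(most_common)
```

Here `most_common` contains every number that reaches the top frequency, which is the same set that `numbers.mode()` reports.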

**Task 7**
7. The name and height of the tallest and the shortest player in the league, in metric units.

In [17]:
height_of_players = data_nba_statistics[['Name', 'Height']].sort_values(by = 'Height', ascending = False)
print("The tallest player is", height_of_players.iloc[0].Name, 'with height', height_of_players.iloc[0].Height, 'meters')
print("The shortest player is", height_of_players.iloc[-1].Name, 'with height', height_of_players.iloc[-1].Height, 'meters')
#here we just sorted players by their height so that the first player is the tallest and the last one is the shortest

The tallest player is Kristaps Porzingis with height 2.21 meters
The shortest player is Isaiah Thomas with height 1.75 meters
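
Sorting and taking the first and last rows reports a single name even when several players share the extreme height. In the tables shown earlier, Tibor Pleiss is also listed at 2.21 m. A sketch that keeps all tied players, using made-up names and heights:

```python
import pandas as pd

# Toy data with made-up names; two players share the maximum height
toy = pd.DataFrame({
    "Name": ["p1", "p2", "p3", "p4"],
    "Height": [2.21, 1.75, 2.21, 1.98],
})

# Boolean masks keep every row that matches the extreme value
tallest = toy[toy["Height"] == toy["Height"].max()]
shortest = toy[toy["Height"] == toy["Height"].min()]

print(tallest)
print(shortest)
```

Comparing against the column maximum is safe here because the value being compared is taken from the same column, so no rounding mismatch can occur.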
