DataFrames 1

DataFrames are two-dimensional data structures made up of rows and columns, whereas a Series is one-dimensional (a single column of values attached to an index). Because a DataFrame has two dimensions, extracting a single value requires two points of reference: the row and the column that together identify a specific cell.
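Here is a minimal sketch of the "two points of reference" idea. It uses a tiny hand-built DataFrame as a stand-in for nba.csv, with values copied from the first rows shown below. The variable names are only illustrative:

```python
import pandas as pd

# Tiny stand-in for nba.csv, built by hand from its first two rows
df = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder"],
    "Age": [25.0, 25.0],
})

# A Series is one-dimensional: one reference (the index label) reaches a value
ages = df["Age"]
first_age = ages[0]

# A DataFrame is two-dimensional: a row label AND a column label identify one cell
cell = df.loc[1, "Name"]

print(ages.ndim, df.ndim)  # 1 2
print(cell)                # Jae Crowder
```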

In this section we will cover the following topics:
- Introduction to DataFrames
- Shared Methods and Attributes between Series and DataFrames
- Differences between Shared Methods
- Select One Column From the DataFrame
- Select Two or More Columns from a DataFrame
- Add New Column to DataFrame
- Broadcasting Operations on DataFrame
- A Review of the .value_counts() Method
- .dropna() : Drop DataFrame Rows with NULL Values
- .fillna() : Fill in NULL DataFrame Values 
- .astype() : Convert DataFrame Column Types 
- .sort_values() : Sort a DataFrame 
- .sort_index() : Sort DataFrame Index 
- .rank() : Rank Series Values 

In [1]:
import pandas as pd

In [2]:
pd.__version__

'1.1.3'

In [4]:
nba = pd.read_csv("nba.csv")
nba 

# The index on the left-hand side is not part of the CSV file; pandas generates it automatically (0, 1, 2, ...).
# We can think of each column as an individual Series, with all of the Series connected by a common index.

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...,...
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0


In [5]:
# Quick notes on the dataset:
# There are null values, which pandas displays as NaN (Not a Number).
# The last row (index 457) is empty: it has NULL values in all 9 columns.

Shared Methods and Attributes Between Series and DataFrames

In [6]:
nba = pd.read_csv("nba.csv")
nba

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...,...
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0


In [8]:
nba.head(2)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0


In [10]:
nba.tail(n = 1)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
457,,,,,,,,,


In [11]:
nba.index

RangeIndex(start=0, stop=458, step=1)

In [13]:
nba.values

# Returns the underlying data as a two-dimensional NumPy array (an array of rows). Its dtype is object because the columns mix strings and floats.

array([['Avery Bradley', 'Boston Celtics', 0.0, ..., 180.0, 'Texas',
        7730337.0],
       ['Jae Crowder', 'Boston Celtics', 99.0, ..., 235.0, 'Marquette',
        6796117.0],
       ['John Holland', 'Boston Celtics', 30.0, ..., 205.0,
        'Boston University', nan],
       ...,
       ['Tibor Pleiss', 'Utah Jazz', 21.0, ..., 256.0, nan, 2900000.0],
       ['Jeff Withey', 'Utah Jazz', 24.0, ..., 231.0, 'Kansas', 947276.0],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=object)

In [15]:
nba.shape

# Returns a tuple with the dimensions of the DataFrame: (number of rows, number of columns).

(458, 9)

In [19]:
nba.dtypes

# Returns a Series whose index is the column names of our DataFrame and whose values are the dtype of each column.

Name         object
Team         object
Number      float64
Position     object
Age         float64
Height       object
Weight      float64
College      object
Salary      float64
dtype: object

In [22]:
nba.dtypes.value_counts()

# Counts how many times each unique value occurs, i.e. how many columns have each data type.

object     5
float64    4
dtype: int64

In [23]:
# Now, some attributes exclusive to DataFrames. 

In [24]:
# columns : returns an Index object holding the column names of the DataFrame.
nba.columns

Index(['Name', 'Team', 'Number', 'Position', 'Age', 'Height', 'Weight',
       'College', 'Salary'],
      dtype='object')

In [25]:
# axes : returns a Python list containing both indexes of our DataFrame: the row index first, then the column index.
nba.axes

[RangeIndex(start=0, stop=458, step=1),
 Index(['Name', 'Team', 'Number', 'Position', 'Age', 'Height', 'Weight',
        'College', 'Salary'],
       dtype='object')]

In [27]:
# info() : prints a big-picture summary of the DataFrame as a whole, bundling up what we saw previously into one output.
nba.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 458 entries, 0 to 457
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      457 non-null    object 
 1   Team      457 non-null    object 
 2   Number    457 non-null    float64
 3   Position  457 non-null    object 
 4   Age       457 non-null    float64
 5   Height    457 non-null    object 
 6   Weight    457 non-null    float64
 7   College   373 non-null    object 
 8   Salary    446 non-null    float64
dtypes: float64(4), object(5)
memory usage: 32.3+ KB
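The same attributes can be compared on a Series and a DataFrame side by side. This is a small sketch with hand-typed sample data rather than nba.csv. It uses .to_numpy(), which the pandas documentation recommends over .values (both return the underlying NumPy array here):

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder"],
    "Age": [25.0, 25.0],
    "Salary": [7730337.0, 6796117.0],
})

print(s.shape)    # (3,)   -> one dimension
print(df.shape)   # (2, 3) -> (rows, columns)

# .dtypes is a Series indexed by column name; count the float64 columns
float_cols = (df.dtypes == "float64").sum()

# Mixed string/float columns force a single object-dtype NumPy array
arr = df.to_numpy()

# .axes holds the row index first and the column index second
row_axis, col_axis = df.axes
```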


Differences Between Shared Methods

In [30]:
rev = pd.read_csv("revenue.csv", index_col = "Date")
rev 

Date,New York,Los Angeles,Miami
1/1/16,985,122,499
1/2/16,738,788,534
1/3/16,14,20,933
1/4/16,730,904,885
1/5/16,114,71,253
1/6/16,936,502,497
1/7/16,123,996,115
1/8/16,935,492,886
1/9/16,846,954,823
1/10/16,54,285,216


In [31]:
s = pd.Series([1, 2, 3])
s

0    1
1    2
2    3
dtype: int64

In [34]:
rev.sum()

# returns the sum for each column.

New York       5475
Los Angeles    5134
Miami          5641
dtype: int64

In [33]:
s.sum()

6

In [45]:
rev.sum(axis = "index")

# Moves down the index (row by row) within each column and adds the values up, giving one total per column. The axis number for index is 0.

New York       5475
Los Angeles    5134
Miami          5641
dtype: int64

In [46]:
rev.sum(axis = 0)

# Same as above. 

New York       5475
Los Angeles    5134
Miami          5641
dtype: int64

In [43]:
rev.sum(axis = "columns")

# Moves across the columns and sums the values for each index label, which in this case represents each date.

Date
1/1/16     1606
1/2/16     2060
1/3/16      967
1/4/16     2519
1/5/16      438
1/6/16     1935
1/7/16     1234
1/8/16     2313
1/9/16     2623
1/10/16     555
dtype: int64

In [44]:
rev.sum(axis = 1)

# Same as above. The axis position for columns is 1. 

Date
1/1/16     1606
1/2/16     2060
1/3/16      967
1/4/16     2519
1/5/16      438
1/6/16     1935
1/7/16     1234
1/8/16     2313
1/9/16     2623
1/10/16     555
dtype: int64
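To check the direction of each axis on numbers small enough to add by hand, here is a sketch that types in only the first three rows of revenue.csv instead of reading the file:

```python
import pandas as pd

# First three rows of revenue.csv, typed in by hand
rev = pd.DataFrame(
    {
        "New York": [985, 738, 14],
        "Los Angeles": [122, 788, 20],
        "Miami": [499, 534, 933],
    },
    index=pd.Index(["1/1/16", "1/2/16", "1/3/16"], name="Date"),
)

per_city = rev.sum(axis="index")   # collapse the rows -> one total per column
per_day = rev.sum(axis="columns")  # collapse the columns -> one total per date

print(per_city["New York"])  # 1737 (985 + 738 + 14)
print(per_day["1/1/16"])     # 1606 (985 + 122 + 499)
```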

Select One Column from a DataFrame

In [47]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [51]:
# An easy way to extract a column as a Series is attribute access: write a dot and the column name after the DataFrame name.
# However, this does not always work: the column name must be a valid Python identifier (e.g. no spaces) and must not clash with an existing DataFrame attribute or method.
nba.Team

0      Boston Celtics
1      Boston Celtics
2      Boston Celtics
3      Boston Celtics
4      Boston Celtics
            ...      
453         Utah Jazz
454         Utah Jazz
455         Utah Jazz
456         Utah Jazz
457               NaN
Name: Team, Length: 458, dtype: object

In [55]:
nba.Name

Output = None # Trick to hide the output: because the last line of the cell is an assignment, Jupyter displays nothing.

In [59]:
# The second option, bracket notation, works for any column name, including names that contain spaces.
nba["Name"]

0      Avery Bradley
1        Jae Crowder
2       John Holland
3        R.J. Hunter
4      Jonas Jerebko
           ...      
453     Shelvin Mack
454        Raul Neto
455     Tibor Pleiss
456      Jeff Withey
457              NaN
Name: Name, Length: 458, dtype: object

In [60]:
type(nba["Name"]) # Shows that when we extract a single column, it is extracted as a series object. 

pandas.core.series.Series

In [61]:
nba["Name"].head(2)

0    Avery Bradley
1      Jae Crowder
Name: Name, dtype: object
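The limits of attribute access are easiest to see with a column name that contains spaces, or one that collides with an existing DataFrame method. In this sketch the "count" column is hypothetical, added only to show the clash with the built-in DataFrame.count() method:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder"],
    "Team": ["Boston Celtics", "Boston Celtics"],
    "Salary in Millions": [7.730337, 6.796117],
    "count": [1, 2],  # hypothetical column whose name clashes with DataFrame.count()
})

team = df.Team                     # attribute access works for a simple name
salary = df["Salary in Millions"]  # brackets are required: the name contains spaces
count_col = df["count"]            # brackets return the column...
count_attr = df.count              # ...but the attribute returns the bound count() method
```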

Select Two or More Columns from a DataFrame

In [62]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [74]:
# To extract two or more columns, provide a list of the columns you want to extract within the square brackets.
nba[["Name", "Team"]]

Unnamed: 0,Name,Team
0,Avery Bradley,Boston Celtics
1,Jae Crowder,Boston Celtics
2,John Holland,Boston Celtics
3,R.J. Hunter,Boston Celtics
4,Jonas Jerebko,Boston Celtics
...,...,...
453,Shelvin Mack,Utah Jazz
454,Raul Neto,Utah Jazz
455,Tibor Pleiss,Utah Jazz
456,Jeff Withey,Utah Jazz


In [78]:
# The columns appear in the order in which you list them.
nba[["Team", "Name"]].head(3)

Unnamed: 0,Team,Name
0,Boston Celtics,Avery Bradley
1,Boston Celtics,Jae Crowder
2,Boston Celtics,John Holland


In [80]:
# You can extract as many columns as you want using this method. 
nba[["Number", "College", "Salary"]]

Unnamed: 0,Number,College,Salary
0,0.0,Texas,7730337.0
1,99.0,Marquette,6796117.0
2,30.0,Boston University,
3,28.0,Georgia State,1148640.0
4,8.0,,5000000.0
...,...,...,...
453,8.0,Butler,2433333.0
454,25.0,,900000.0
455,21.0,,2900000.0
456,24.0,Kansas,947276.0


In [82]:
# Another way is to first create a variable for the list. 
col_list = ["Number", "College", "Salary"]
nba[col_list]

Unnamed: 0,Number,College,Salary
0,0.0,Texas,7730337.0
1,99.0,Marquette,6796117.0
2,30.0,Boston University,
3,28.0,Georgia State,1148640.0
4,8.0,,5000000.0
...,...,...,...
453,8.0,Butler,2433333.0
454,25.0,,900000.0
455,21.0,,2900000.0
456,24.0,Kansas,947276.0
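One detail worth checking is what comes back: single brackets return a Series, while a list of names (even a list with just one name) returns a DataFrame. A small sketch with hand-typed sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder"],
    "Team": ["Boston Celtics", "Boston Celtics"],
    "Number": [0.0, 99.0],
})

one_series = df["Name"]           # single brackets -> Series
one_frame = df[["Name"]]          # list with one name -> one-column DataFrame
reordered = df[["Team", "Name"]]  # columns come back in the order listed

# Asking for a column that does not exist raises a KeyError
try:
    df[["Name", "Salary"]]
    missing_raised = False
except KeyError:
    missing_raised = True
```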


Add New Column to DataFrame

In [83]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [84]:
nba["Name"]

# If the column "Name" exists, this extracts that single column.
# If it does not exist, pandas raises a KeyError.

0      Avery Bradley
1        Jae Crowder
2       John Holland
3        R.J. Hunter
4      Jonas Jerebko
           ...      
453     Shelvin Mack
454        Raul Neto
455     Tibor Pleiss
456      Jeff Withey
457              NaN
Name: Name, Length: 458, dtype: object

In [88]:
# The first way to add a column is plain assignment. With this method, the new column is always added at the end.
nba["Sport"] = "Basketball"

# Even though the "Sport" column does not exist yet, we assign it the scalar value "Basketball", so every value
# in this column will be "Basketball".

# Note that if we use a column name that already exists, such as:
# nba["Team"] = "Basketball"
# then the whole "Team" column will be overwritten with the value "Basketball".

In [87]:
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Sport
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0,Basketball
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0,Basketball
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,,Basketball


In [89]:
nba["League"] = "National Basketball Association"
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Sport,League
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0,Basketball,National Basketball Association
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0,Basketball,National Basketball Association
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,,Basketball,National Basketball Association


In [96]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [97]:
# The second way to add a new column is the .insert() method, which lets you specify where the column goes.
nba.insert(3, column = "Sport", value = "Basketball")
nba.head(3)

# The parameters of the insert() method:
# loc : the integer position at which to insert the column (1st column --> loc 0, 2nd column --> loc 1, and so on).
# column : the name of the new column.
# value : the value(s) for the new column (a scalar, or one value per row).
# Note that .insert() modifies the DataFrame in place and raises a ValueError if the column already exists.

Unnamed: 0,Name,Team,Number,Sport,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,Basketball,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,Basketball,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,Basketball,SG,27.0,6-5,205.0,Boston University,


In [98]:
nba.insert(loc = 7, column = "League", value = "National Basketball Association")
nba.head(3)

Unnamed: 0,Name,Team,Number,Sport,Position,Age,Height,League,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,Basketball,PG,25.0,6-2,National Basketball Association,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,Basketball,SF,25.0,6-6,National Basketball Association,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,Basketball,SG,27.0,6-5,National Basketball Association,205.0,Boston University,
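Both ways of adding a column can be compared on a one-row sample (values hand-typed from the rows above). The sketch also shows the ValueError that .insert() raises for a name that is already taken:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Avery Bradley"],
    "Team": ["Boston Celtics"],
    "Position": ["PG"],
})

# Plain assignment always appends the new column at the end
df["League"] = "National Basketball Association"

# .insert() places the column at an integer position and modifies df in place (it returns None)
df.insert(loc=2, column="Sport", value="Basketball")

# Inserting a column name that already exists raises a ValueError
duplicate_error = None
try:
    df.insert(loc=0, column="Sport", value="Basketball")
except ValueError as err:
    duplicate_error = str(err)

print(list(df.columns))  # ['Name', 'Team', 'Sport', 'Position', 'League']
```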


Broadcasting Operations

In [99]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [100]:
# Recall from the Series section that the apply() method calls a function on every value within a Series.
# Broadcasting is a related idea: an operation with a single value (e.g. adding 5) is applied to every value in the Series, without writing a loop.
# We will look at a few methods that broadcast an operation across all values of a Series.

In [101]:
# add() : adds a number (provided in the argument) to every value in the series. 
nba["Age"].add(5)

# Adds 5 to every existing value in the "Age" column.
# NULL (NaN) values stay NaN, because arithmetic with NaN returns NaN.

0      30.0
1      30.0
2      32.0
3      27.0
4      34.0
       ... 
453    31.0
454    29.0
455    31.0
456    31.0
457     NaN
Name: Age, Length: 458, dtype: float64

In [103]:
nba["Age"] + 5

# Same function as the cell above. Adds 5 to every value within our "Age" column. 

0      30.0
1      30.0
2      32.0
3      27.0
4      34.0
       ... 
453    31.0
454    29.0
455    31.0
456    31.0
457     NaN
Name: Age, Length: 458, dtype: float64

In [104]:
# sub() : subtracts the number provided as the argument from every value in the column (.subtract() is an alias).
nba["Salary"].sub(5000000)

0      2730337.0
1      1796117.0
2            NaN
3     -3851360.0
4            0.0
         ...    
453   -2566667.0
454   -4100000.0
455   -2100000.0
456   -4052724.0
457          NaN
Name: Salary, Length: 458, dtype: float64

In [105]:
nba["Salary"] - 5000000

# Same result as above. Subtracts 5000000 from every value from the "Salary" column. 

0      2730337.0
1      1796117.0
2            NaN
3     -3851360.0
4            0.0
         ...    
453   -2566667.0
454   -4100000.0
455   -2100000.0
456   -4052724.0
457          NaN
Name: Salary, Length: 458, dtype: float64

In [106]:
# mul() : multiplies every value in the column by the number provided as the argument.
nba["Weight"].mul(0.45)

# Roughly converts weight in pounds to kilograms. 0.45 is an approximation; a more precise factor is 0.453592 kg per pound.

0       81.00
1      105.75
2       92.25
3       83.25
4      103.95
        ...  
453     91.35
454     80.55
455    115.20
456    103.95
457       NaN
Name: Weight, Length: 458, dtype: float64

In [108]:
nba["Weight in Kilograms"] = nba["Weight"] * 0.453592

# Same idea as above, but with the more precise factor 0.453592, and the result is assigned to a new column for weight in kilograms.
# The value assigned to a new column does not have to be a single scalar; it can be a whole Series.

In [110]:
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Weight in Kilograms
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0,81.64656
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0,106.59412
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,,92.98636


In [111]:
# div() : divides every value by the number provided as the argument. 
nba["Salary"].div(1000000)

0      7.730337
1      6.796117
2           NaN
3      1.148640
4      5.000000
         ...   
453    2.433333
454    0.900000
455    2.900000
456    0.947276
457         NaN
Name: Salary, Length: 458, dtype: float64

In [112]:
nba["Salary in Millions"] = nba["Salary"] / 1000000

# Assigns a new "Salary in Millions" column.

In [113]:
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Weight in Kilograms,Salary in Millions
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0,81.64656,7.730337
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0,106.59412,6.796117
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,,92.98636,
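A compact sketch of the broadcasting methods above on a short Series that includes a NaN (the values are copied from the rows shown above; the NaN stands in for the empty last row). It also shows the fill_value parameter of .add(), which the + operator does not offer: fill_value replaces NaN before the addition.

```python
import numpy as np
import pandas as pd

age = pd.Series([25.0, 27.0, np.nan], name="Age")
weight = pd.Series([180.0, 235.0, np.nan], name="Weight")

plus_method = age.add(5)                # method form
plus_operator = age + 5                 # operator form, same result
plus_filled = age.add(5, fill_value=0)  # NaN treated as 0 before adding

kg = weight.mul(0.453592)               # pounds -> kilograms
salary_millions = pd.Series([7730337.0, 6796117.0]).div(1_000_000)
```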


A Review of the .value_counts() Method

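.value_counts() is best known as a Series method, and it's a great way to count the unique values in a Series. (pandas 1.1 also added DataFrame.value_counts(), which counts unique rows; here we apply it to individual columns.)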

In [114]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [115]:
nba["Team"].value_counts()

# The New Orleans Pelicans have the most players (19), whereas the Minnesota Timberwolves and Orlando Magic have the fewest (14 each).

New Orleans Pelicans      19
Memphis Grizzlies         18
New York Knicks           16
Milwaukee Bucks           16
Portland Trail Blazers    15
Dallas Mavericks          15
Toronto Raptors           15
San Antonio Spurs         15
Chicago Bulls             15
Utah Jazz                 15
Charlotte Hornets         15
Atlanta Hawks             15
Cleveland Cavaliers       15
Los Angeles Clippers      15
Detroit Pistons           15
Indiana Pacers            15
Golden State Warriors     15
Houston Rockets           15
Denver Nuggets            15
Washington Wizards        15
Philadelphia 76ers        15
Los Angeles Lakers        15
Boston Celtics            15
Oklahoma City Thunder     15
Sacramento Kings          15
Miami Heat                15
Phoenix Suns              15
Brooklyn Nets             15
Minnesota Timberwolves    14
Orlando Magic             14
Name: Team, dtype: int64

In [134]:
nba["Position"].value_counts()

# Returns the count for each position. Shooting guards (SG) are the most common position, with 102 players.

SG    102
PF    100
PG     92
SF     85
C      78
Name: Position, dtype: int64

In [135]:
len(nba["Position"].value_counts())

# Returns the number of positions in the league. 

5

In [18]:
nba["Position"].value_counts().shape

# Another option to return the number of positions in the league. 

(5,)

In [20]:
nba["Position"].nunique()

# Easiest solution: .nunique() returns the number of unique (non-null) values directly.

5

In [17]:
nba["Salary"].value_counts().head(5)

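# Returns the 5 most common salaries in the NBA.
# The 11 zeros are the 11 missing salaries, which had already been filled with 0 (see the .fillna() section) when this cell was run.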

947276    31
845059    18
525093    13
0         11
981348     6
Name: Salary, dtype: int64
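A few value_counts() options that the cells above don't use, sketched on five hand-typed rows (one with a missing position). The last line uses DataFrame.value_counts(), which requires pandas 1.1 or later:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Team": ["Boston Celtics", "Boston Celtics", "Boston Celtics", "Utah Jazz", "Utah Jazz"],
    "Position": ["SG", "SG", "PG", "C", np.nan],
})

counts = df["Position"].value_counts()                       # NaN is excluded by default
counts_with_nan = df["Position"].value_counts(dropna=False)  # NaN gets its own count
shares = df["Position"].value_counts(normalize=True)         # proportions instead of counts
n_positions = df["Position"].nunique()                       # unique non-null values

# DataFrame.value_counts() counts unique combinations across columns (pandas >= 1.1)
pair_counts = df.value_counts(["Team", "Position"])
```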

Drop Rows with NULL Values

In [152]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [153]:
nba.tail(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0
457,,,,,,,,,


In [4]:
# dropna() : drops rows that have one or more NULL values (by default). By default axis is set to 0 (rows). 
nba.dropna()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
6,Jordan Mickey,Boston Celtics,55.0,PF,21.0,6-8,235.0,LSU,1170960.0
7,Kelly Olynyk,Boston Celtics,41.0,C,25.0,7-0,238.0,Gonzaga,2165160.0
...,...,...,...,...,...,...,...,...,...
449,Rodney Hood,Utah Jazz,5.0,SG,23.0,6-8,206.0,Duke,1348440.0
451,Chris Johnson,Utah Jazz,23.0,SF,26.0,6-6,206.0,Dayton,981348.0
452,Trey Lyles,Utah Jazz,41.0,PF,20.0,6-10,234.0,Kentucky,2239800.0
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0


In [5]:
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [156]:
# Drop rows where all values are equal to NULL. 
nba.dropna(how = "all", inplace = True)

# inplace = True : overwrites the original nba DataFrame. 

In [157]:
# Drop columns (axis position equal to 1) that contain one or more NULL values.
# To drop these columns permanently, use the inplace = True parameter.
nba.dropna(axis = 1)

# nba.dropna(axis = "columns") --> same.

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0
...,...,...,...,...,...,...,...
452,Trey Lyles,Utah Jazz,41.0,PF,20.0,6-10,234.0
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0


In [159]:
# Remove rows if there is a NULL value in a specific column.

nba.dropna(axis = 0, subset = ["Salary"])

# Drops the row whenever there is a NULL value in the "Salary" column (does not take into account whether there are 
# NULL values anywhere else). 

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
5,Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0
...,...,...,...,...,...,...,...,...,...
452,Trey Lyles,Utah Jazz,41.0,PF,20.0,6-10,234.0,Kentucky,2239800.0
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
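The dropna() options above can be compared on a four-row sample whose last row is entirely NULL, like row 457 of nba.csv. The sketch also uses thresh, which keeps rows that have at least the given number of non-NULL values. The values are hand-typed from rows shown in this notebook:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Avery Bradley", "John Holland", "Jonas Jerebko", np.nan],
    "College": ["Texas", "Boston University", np.nan, np.nan],
    "Salary": [7730337.0, np.nan, 5000000.0, np.nan],
})

complete_rows = df.dropna()                          # any NULL in a row -> row dropped
without_blank = df.dropna(how="all")                 # only the all-NULL last row is dropped
with_salary = df.dropna(subset=["Salary"])           # only NULLs in Salary matter
full_columns = without_blank.dropna(axis="columns")  # drop any column that still has a NULL
at_least_two = df.dropna(thresh=2)                   # keep rows with 2+ non-NULL values
```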


Fill in NULL Values with the .fillna() Method

In [160]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [161]:
# fillna() : used to fill the NULL values. Parameters include:
# value : what we want to replace every NULL value with. 

nba.fillna(value = 0)

# Every NULL value has been replaced with 0. This works well when all columns have the same data type (e.g. numeric).
# However, our DataFrame also has string columns (e.g. College), where 0 is not a sensible replacement.

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,0.0
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,0,5000000.0
...,...,...,...,...,...,...,...,...,...
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,0,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,0,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0


In [163]:
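# Instead of calling .fillna() on the whole DataFrame, we call it on a single column (Series) and overwrite it with inplace = True.
# (In newer pandas releases this chained inplace call triggers a warning and may leave nba unchanged;
# nba["Salary"] = nba["Salary"].fillna(0) is the safer equivalent.)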

nba["Salary"].fillna(value = 0, inplace = True)

# Only the NULL values in the "Salary" column have been replaced with 0. 

In [164]:
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,0.0


In [165]:
nba["College"].fillna(value = "No College", inplace = True)

In [167]:
nba.head(5)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,0.0
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,No College,5000000.0
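Two alternatives to the chained inplace pattern, sketched on hand-typed sample rows. Passing a dictionary to DataFrame.fillna() fills each column with its own value in one call. Reassigning the column avoids inplace entirely, which also sidesteps the warning that recent pandas releases emit for the chained form:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Avery Bradley", "John Holland", "Jonas Jerebko"],
    "College": ["Texas", "Boston University", np.nan],
    "Salary": [7730337.0, np.nan, 5000000.0],
})

# One call, one fill value per column; returns a new DataFrame and leaves df unchanged
filled = df.fillna({"Salary": 0, "College": "No College"})

# Reassigning a single column instead of using inplace=True
df["Salary"] = df["Salary"].fillna(0)
```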


The .astype() Method

Used to convert the data type of a column. Converting a column to an integer type fails if it contains any NULL (NaN) values, so we need to clean the data before using this method.

In [3]:
nba = pd.read_csv("nba.csv")
nba.tail(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0
457,,,,,,,,,


In [7]:
nba = nba.dropna(how = "all")
nba.tail(3)

# Removed rows which had NULL values for all columns. 

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0


In [9]:
nba["Salary"].fillna(value = 0, inplace = True)
nba["College"].fillna(value = "None", inplace = True )
nba.head(6)

# We have now taken care of the NULL values within our Salary and College columns. 

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,0.0
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,None,5000000.0
5,Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,None,12000000.0


In [11]:
nba.info()
# In pandas, a numeric column that contains NULL values is stored as float, because NaN is itself a floating-point value.
# Now that the NULL values are gone, we can convert those columns to integers.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 457 entries, 0 to 456
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      457 non-null    object 
 1   Team      457 non-null    object 
 2   Number    457 non-null    float64
 3   Position  457 non-null    object 
 4   Age       457 non-null    float64
 5   Height    457 non-null    object 
 6   Weight    457 non-null    float64
 7   College   457 non-null    object 
 8   Salary    457 non-null    float64
dtypes: float64(4), object(5)
memory usage: 35.7+ KB


In [13]:
nba["Salary"] = nba["Salary"].astype("int")

# .astype() : the first argument is the dtype you want to convert the column to.
# Here we convert the Salary column from float to integer.
# .astype() has no inplace parameter, so we assign the result back to nba["Salary"].

In [14]:
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,0


In [15]:
nba["Number"] = nba["Number"].astype("int")
nba["Age"] = nba["Age"].astype("int")

In [16]:
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0,PG,25,6-2,180.0,Texas,7730337
1,Jae Crowder,Boston Celtics,99,SF,25,6-6,235.0,Marquette,6796117
2,John Holland,Boston Celtics,30,SG,27,6-5,205.0,Boston University,0
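A sketch of why the NULL values have to go before converting to int, using a three-value stand-in for the Salary column. It also shows pandas' nullable "Int64" dtype (capital I), which can hold missing values as <NA>. That dtype is not used elsewhere in this notebook and is offered only as an alternative:

```python
import numpy as np
import pandas as pd

salary = pd.Series([7730337.0, np.nan, 5000000.0], name="Salary")

# NumPy int64 has no way to represent NaN, so this conversion fails
try:
    salary.astype("int64")
    cast_failed = False
except ValueError:
    cast_failed = True

# Filling the NULLs first makes the conversion succeed
as_int = salary.fillna(0).astype("int64")

# The nullable integer dtype keeps the missing value as <NA>
nullable = salary.astype("Int64")
```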


In [None]:
# The category dtype is ideal for columns with a small number of unique values, e.g. a gender column or a month column.
# It reduces memory compared with storing strings: each unique value is stored only once (e.g. just "Male" and "Female"), and every row holds a reference to one of them.

In [21]:
nba["Position"].nunique()

# Small number of unique values, therefore perfect for Category dtype. 

5

In [23]:
nba["Position"] = nba["Position"].astype("category")

# Rather than storing 457 separate strings, pandas stores just 5 objects, one for each position, and every row
# points to one of them in memory. You can check how much the memory usage has decreased with info().
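To see the memory savings without nba.csv, here is a sketch with a synthetic Position column: the five position codes repeated 100 times (made-up data). memory_usage(deep=True) counts the bytes used by the string objects themselves, which is where the category dtype saves space:

```python
import pandas as pd

# Synthetic column: 500 rows, only 5 distinct values
positions = pd.Series(["PG", "SF", "SG", "PF", "C"] * 100)
as_category = positions.astype("category")

object_bytes = positions.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(object_bytes, category_bytes)
print(list(as_category.cat.categories))  # ['C', 'PF', 'PG', 'SF', 'SG']
```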

In [26]:
nba["Team"] = nba["Team"].astype("category")

# Since there are only 30 teams, this is another ideal candidate to change to Category dtype. 

Sort a DataFrame with the .sort_values() Method - Part 1

Sorting a DataFrame using one column. 

In [27]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [28]:
# Sort by Name column in descending order. 

nba.sort_values(by = "Name", ascending = False)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
237,Zaza Pachulia,Dallas Mavericks,27.0,C,32.0,6-11,275.0,,5200000.0
271,Zach Randolph,Memphis Grizzlies,50.0,PF,34.0,6-9,260.0,Michigan State,9638555.0
402,Zach LaVine,Minnesota Timberwolves,8.0,PG,21.0,6-5,189.0,UCLA,2148360.0
270,Xavier Munford,Memphis Grizzlies,14.0,PG,24.0,6-3,180.0,Rhode Island,
386,Wilson Chandler,Denver Nuggets,21.0,SF,29.0,6-8,225.0,DePaul,10449438.0
...,...,...,...,...,...,...,...,...,...
404,Adreian Payne,Minnesota Timberwolves,33.0,PF,25.0,6-10,237.0,Michigan State,1938840.0
328,Aaron Harrison,Charlotte Hornets,9.0,SG,21.0,6-6,210.0,Kentucky,525093.0
356,Aaron Gordon,Orlando Magic,0.0,PF,20.0,6-9,220.0,Arizona,4171680.0
152,Aaron Brooks,Chicago Bulls,0.0,PG,31.0,6-0,161.0,Oregon,2250000.0


In [29]:
nba.sort_values(by = "Salary", ascending = False).head(3)

# Shows the 3 highest-paid players in the league. Again, this method does not modify the original DataFrame unless we add
# the inplace = True parameter.

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
109,Kobe Bryant,Los Angeles Lakers,24.0,SF,37.0,6-6,212.0,,25000000.0
169,LeBron James,Cleveland Cavaliers,23.0,SF,31.0,6-8,250.0,,22970500.0
33,Carmelo Anthony,New York Knicks,7.0,SF,32.0,6-8,240.0,Syracuse,22875000.0
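
The sketch below uses a small hypothetical table to show the difference between the two calls. Without inplace, .sort_values() returns a new DataFrame and leaves the original alone. With inplace = True, it reorders the original and returns None.

```python
import pandas as pd

# Small hypothetical salary table.
df = pd.DataFrame({"Name": ["A", "B", "C"], "Salary": [3.0, 1.0, 2.0]})

top = df.sort_values(by="Salary", ascending=False).head(2)  # a new DataFrame
original_order = df["Name"].tolist()
print(top["Name"].tolist())  # ['A', 'C']
print(original_order)        # ['A', 'B', 'C'], df is unchanged

returned = df.sort_values(by="Salary", ascending=False, inplace=True)
print(returned)              # None
print(df["Name"].tolist())   # ['A', 'C', 'B'], df itself is now sorted
```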


In [30]:
# When sorting, NULL values are placed at the very end by default,
# because the na_position parameter defaults to "last". Setting na_position = "first" puts the NULL values at the top instead.

nba.sort_values("Salary").tail(5)


Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
350,Briante Weber,Miami Heat,12.0,PG,23.0,6-2,165.0,Virginia Commonwealth,
353,Dorell Wright,Miami Heat,11.0,SF,30.0,6-9,205.0,,
397,Axel Toupane,Denver Nuggets,6.0,SG,23.0,6-7,210.0,,
409,Greg Smith,Minnesota Timberwolves,4.0,PF,25.0,6-10,250.0,Fresno State,
457,,,,,,,,,
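
The na_position parameter controls where missing values land. This sketch uses a hypothetical three-row table to compare the two settings.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries; player B has a missing salary.
salaries = pd.DataFrame({"Name": ["A", "B", "C"], "Salary": [5.0, np.nan, 1.0]})

nan_last = salaries.sort_values("Salary")["Name"].tolist()
nan_first = salaries.sort_values("Salary", na_position="first")["Name"].tolist()

print(nan_last)   # ['C', 'A', 'B'], NaN at the bottom (default)
print(nan_first)  # ['B', 'C', 'A'], NaN at the top
```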


Sort a DataFrame with the .sort_values() Method - Part 2

Sorting the DataFrame using multiple columns. 

In [31]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [34]:
nba.sort_values(by = ["Team", "Name"])

# Sorts by Team first, then by Name within each team. By default, both columns are sorted in ascending order.

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
312,Al Horford,Atlanta Hawks,15.0,C,30.0,6-10,245.0,Florida,12000000.0
318,Dennis Schroder,Atlanta Hawks,17.0,PG,22.0,6-1,172.0,,1763400.0
323,Jeff Teague,Atlanta Hawks,0.0,PG,27.0,6-2,186.0,Wake Forest,8000000.0
309,Kent Bazemore,Atlanta Hawks,24.0,SF,26.0,6-5,201.0,Old Dominion,2000000.0
311,Kirk Hinrich,Atlanta Hawks,12.0,SG,35.0,6-4,190.0,Kansas,2854940.0
...,...,...,...,...,...,...,...,...,...
376,Markieff Morris,Washington Wizards,5.0,PF,26.0,6-10,245.0,Kansas,8000000.0
375,Nene Hilario,Washington Wizards,42.0,C,33.0,6-11,250.0,,13000000.0
378,Otto Porter Jr.,Washington Wizards,22.0,SF,23.0,6-8,198.0,Georgetown,4662960.0
379,Ramon Sessions,Washington Wizards,7.0,PG,30.0,6-3,190.0,Nevada,2170465.0


In [35]:
nba.sort_values(["Team", "Name"], ascending = False)

# Sorts by Team then by Name, both in descending order. 

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
379,Ramon Sessions,Washington Wizards,7.0,PG,30.0,6-3,190.0,Nevada,2170465.0
378,Otto Porter Jr.,Washington Wizards,22.0,SF,23.0,6-8,198.0,Georgetown,4662960.0
375,Nene Hilario,Washington Wizards,42.0,C,33.0,6-11,250.0,,13000000.0
376,Markieff Morris,Washington Wizards,5.0,PF,26.0,6-10,245.0,Kansas,8000000.0
381,Marcus Thornton,Washington Wizards,15.0,SF,29.0,6-4,205.0,LSU,200600.0
...,...,...,...,...,...,...,...,...,...
309,Kent Bazemore,Atlanta Hawks,24.0,SF,26.0,6-5,201.0,Old Dominion,2000000.0
323,Jeff Teague,Atlanta Hawks,0.0,PG,27.0,6-2,186.0,Wake Forest,8000000.0
318,Dennis Schroder,Atlanta Hawks,17.0,PG,22.0,6-1,172.0,,1763400.0
312,Al Horford,Atlanta Hawks,15.0,C,30.0,6-10,245.0,Florida,12000000.0


In [40]:
# To sort the columns in different directions, e.g. one ascending and the other descending, we pass a list of booleans to the ascending parameter.

nba.sort_values(by = ["Team", "Name"], ascending = [True, False], inplace = True)
nba.head(5)
# Sorts by Team in ascending order, then by Name in descending order.
# The inplace = True parameter overwrites the original DataFrame.

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
322,Walter Tavares,Atlanta Hawks,22.0,C,24.0,7-3,260.0,,1000000.0
310,Tim Hardaway Jr.,Atlanta Hawks,10.0,SG,24.0,6-6,205.0,Michigan,1304520.0
321,Tiago Splitter,Atlanta Hawks,11.0,C,31.0,6-11,245.0,,9756250.0
320,Thabo Sefolosha,Atlanta Hawks,25.0,SF,32.0,6-7,220.0,,4000000.0
315,Paul Millsap,Atlanta Hawks,4.0,PF,31.0,6-8,246.0,Louisiana Tech,18671659.0
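
The sketch below uses a hypothetical four-player table. It shows that each entry in the ascending list applies to the column in the same position in `by`, so the two lists should have the same length (pandas raises a ValueError if they do not).

```python
import pandas as pd

# Hypothetical players from two teams.
players = pd.DataFrame({
    "Team": ["Hawks", "Celtics", "Hawks", "Celtics"],
    "Name": ["Al", "Jae", "Walter", "Avery"],
})

# Team ascending, then Name descending within each team.
ordered = players.sort_values(by=["Team", "Name"], ascending=[True, False])
rows = ordered[["Team", "Name"]].values.tolist()
print(rows)
# [['Celtics', 'Jae'], ['Celtics', 'Avery'], ['Hawks', 'Walter'], ['Hawks', 'Al']]
```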


Sort DataFrame with the .sort_index() Method

This works much like .sort_index() on a Series, since we do not have to specify any columns.

In [41]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [42]:
nba.sort_values(["Number", "Salary", "Name"], inplace = True)
nba.tail(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
68,Lucas Nogueira,Toronto Raptors,92.0,C,23.0,7-0,220.0,,1842000.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
457,,,,,,,,,


In [43]:
nba.sort_index()

# Sorts by the index labels, which essentially restores the original row order.
# Since we did not pass any arguments, the defaults are used (ascending = True). This returns a new DataFrame; nba itself stays sorted by Number.

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
...,...,...,...,...,...,...,...,...,...
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0
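
The original row order can be recovered here because read_csv assigns a default 0, 1, 2, ... index that matches the file order. This sketch uses a hypothetical three-row DataFrame to check that .sort_index() undoes a .sort_values() call.

```python
import pandas as pd

df = pd.DataFrame({"Number": [30, 0, 99]})  # default index 0, 1, 2

shuffled = df.sort_values("Number")  # index order becomes 1, 0, 2
restored = shuffled.sort_index()     # back to 0, 1, 2

print(shuffled.index.tolist())  # [1, 0, 2]
print(restored.equals(df))      # True
```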


In [45]:
nba.sort_index(ascending = False, inplace = True)
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
457,,,,,,,,,
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0


Rank Values with the .rank() Method

We can call the .rank() method on a Series to generate a brand-new Series of ranks.
By default, .rank() leaves NULL values as NaN, so we deal with them first; otherwise we could not convert the ranks to integers.
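
The sketch below uses a hypothetical four-value Series to show why the NULL values matter. The missing value keeps a NaN rank (na_option="keep" is the default), which blocks .astype("int"). Filling it with 0 first, as this notebook does, gives it the lowest rank instead.

```python
import numpy as np
import pandas as pd

salary = pd.Series([300.0, np.nan, 100.0, 200.0])

ranks = salary.rank(ascending=False)  # NaN stays NaN
print(ranks.tolist())                 # [1.0, nan, 3.0, 2.0]

# Fill the NULL value first, then rank and convert to int.
int_ranks = salary.fillna(0).rank(ascending=False).astype("int")
print(int_ranks.tolist())             # [1, 4, 3, 2]
```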

In [47]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [48]:
nba = nba.dropna(how = "all")
nba["Salary"] = nba["Salary"].fillna(0).astype("int")
nba.head(3)

# We don't use inplace = True with .fillna() here because we chain .astype() onto its result to change the dtype to int; with inplace = True, .fillna() would return None.


Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,0
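
The sketch below shows why chaining needs the returned Series rather than inplace = True. It uses a hypothetical salary column; the 7730337 value is borrowed from the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Salary": [7730337.0, np.nan]})

# Chaining works because .fillna() returns a new Series.
df["Salary"] = df["Salary"].fillna(0).astype("int")
print(df["Salary"].tolist())  # [7730337, 0]

# With inplace=True, .fillna() returns None, so there is nothing to chain onto.
s = pd.Series([1.0, np.nan])
result = s.fillna(0, inplace=True)
print(result)                 # None
```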


In [61]:
# Rank the players by their salary, e.g. the player with the highest salary has rank 1.

nba["Salary Rank"] = nba["Salary"].rank(ascending = False).astype("int")

# Now we have a new series of integers representing the rank of the players' salary relative to the entire salary series.  
# This series is stored in the column "Salary Rank".
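
Tied values deserve a note. With the default method="average", tied values get the mean of the positions they jointly occupy. This is why, in the next cell's output, all the players whose salary was filled with 0 share rank 452. Averages can be fractional, and .astype("int") truncates them (2.5 becomes 2). The method parameter offers alternatives, compared below on hypothetical salaries.

```python
import pandas as pd

# Hypothetical salaries with ties: two players at 300 and three at 0.
salary = pd.Series([500, 300, 300, 0, 0, 0])

rank_results = {
    method: salary.rank(ascending=False, method=method).tolist()
    for method in ["average", "min", "dense"]
}
for method, ranks in rank_results.items():
    print(method, ranks)
# average [1.0, 2.5, 2.5, 5.0, 5.0, 5.0]
# min [1.0, 2.0, 2.0, 4.0, 4.0, 4.0]
# dense [1.0, 2.0, 2.0, 3.0, 3.0, 3.0]
```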

In [64]:
nba.sort_values(by = "Salary", ascending = False)

# By sorting by salary in descending order, we can verify that the Salary Rank series is working properly. 
# Kobe has the highest salary with $25M. LeBron is second highest with $22,970,500. 

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Salary Rank
109,Kobe Bryant,Los Angeles Lakers,24.0,SF,37.0,6-6,212.0,,25000000,1
169,LeBron James,Cleveland Cavaliers,23.0,SF,31.0,6-8,250.0,,22970500,2
33,Carmelo Anthony,New York Knicks,7.0,SF,32.0,6-8,240.0,Syracuse,22875000,3
251,Dwight Howard,Houston Rockets,12.0,C,30.0,6-11,265.0,,22359364,4
339,Chris Bosh,Miami Heat,1.0,PF,32.0,6-11,235.0,Georgia Tech,22192730,5
...,...,...,...,...,...,...,...,...,...,...
353,Dorell Wright,Miami Heat,11.0,SF,30.0,6-9,205.0,,0,452
264,Jordan Farmar,Memphis Grizzlies,4.0,PG,29.0,6-2,180.0,UCLA,0,452
409,Greg Smith,Minnesota Timberwolves,4.0,PF,25.0,6-10,250.0,Fresno State,0,452
273,Alex Stepheson,Memphis Grizzlies,35.0,PF,28.0,6-10,270.0,USC,0,452
