## Import Libraries
- A library or a Python built-in module is a bundle of features that add new functionality.
- Libraries are not imported by default to save memory -- use them as you need them.
- Libraries and modules must be imported in every project (here, every Jupyter Notebook) that uses them.
- Import a library with the **`import`** keyword followed by the library name, for example `import pandas`.
- Libraries can be assigned an alias, or an alternate name. Aliases are optional.
- An alias reduces the number of characters a user has to type when referencing a library.
- An alias can have any name; the community standard for `pandas` is `pd`.
- Follow the `import` statement with the `as` keyword and the alias (no quotes).
- Multiple libraries can be imported in a single cell.
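
The import rules above can be sketched in a single cell. The alias `stats` for the built-in `statistics` module is an arbitrary choice made for this illustration; only `pd` is a community convention.

```python
# Import a built-in module under an alias; `stats` is an arbitrary alias for this sketch.
import statistics as stats

# Import a third-party library under its community-standard alias.
import pandas as pd

# The alias stands in for the full module name.
average = stats.mean([1, 2, 3, 4])
print(average)        # 2.5
print(pd.__name__)    # pandas
```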

## Introduction to pandas
- **Pandas** is a data analysis library built on top of the Python programming language.
- The `pandas` library is built on top of the `numpy` library (Numerical Python)

In [1]:
import pandas as pd
import numpy as np

## DataFrames
- A **DataFrame** is a 2-dimensional table consisting of rows and columns.
- Pandas uses a `NaN` designation for cells that have a missing value. It is short for "not a number". Most operations on `NaN` values will produce `NaN` values.
- Pandas assigns an index position/label to each **DataFrame** row.
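
As a minimal sketch of these points, the cell below builds a small table by hand instead of reading `nba.csv`. The player names and weights are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row table; the second player's weight is missing.
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Weight": [215.0, np.nan, 195.0],
})

# pandas assigns the default index positions 0, 1, 2.
print(players.index)    # RangeIndex(start=0, stop=3, step=1)
print(players.shape)    # (3, 2)

# Arithmetic on a missing value yields another missing value.
doubled = players["Weight"] * 2
print(doubled.isna().tolist())    # [False, True, False]
```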

In [21]:
nba = pd.read_csv("nba.csv")
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [3]:
nba.head()

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0


In [4]:
nba.head(n=1)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0


In [5]:
nba.head(8)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
5,Trent Forrest,Atlanta Hawks,G,6-4,210.0,Florida State,508891.0
6,AJ Griffin,Atlanta Hawks,F,6-6,220.0,Duke,3712920.0
7,Mouhamed Gueye,Atlanta Hawks,F,6-11,210.0,Washington State,1119563.0


In [6]:
nba.tail()

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0
591,,,,,,,


In [7]:
nba.tail(n=7)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
585,Eugene Omoruyi,Washington Wizards,F,6-6,235.0,Oregon,559782.0
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0
591,,,,,,,


In [8]:
nba.tail(1)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
591,,,,,,,


In [12]:
nba.index

RangeIndex(start=0, stop=592, step=1)

In [13]:
nba.values

array([['Saddiq Bey', 'Atlanta Hawks', 'F', ..., 215.0, 'Villanova',
        4556983.0],
       ['Bogdan Bogdanovic', 'Atlanta Hawks', 'G', ..., 225.0,
        'Fenerbahce', 18700000.0],
       ['Kobe Bufkin', 'Atlanta Hawks', 'G', ..., 195.0, 'Michigan',
        4094244.0],
       ...,
       ['Tristan Vukcevic', 'Washington Wizards', 'F', ..., 220.0,
        'Real Madrid', nan],
       ['Delon Wright', 'Washington Wizards', 'G', ..., 185.0, 'Utah',
        8195122.0],
       [nan, nan, nan, ..., nan, nan, nan]], shape=(592, 7), dtype=object)

In [14]:
nba.shape

(592, 7)

In [15]:
nba.dtypes

Name         object
Team         object
Position     object
Height       object
Weight      float64
College      object
Salary      float64
dtype: object

In [16]:
nba.columns

Index(['Name', 'Team', 'Position', 'Height', 'Weight', 'College', 'Salary'], dtype='object')

In [17]:
nba.axes

[RangeIndex(start=0, stop=592, step=1),
 Index(['Name', 'Team', 'Position', 'Height', 'Weight', 'College', 'Salary'], dtype='object')]

In [18]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 592 entries, 0 to 591
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      591 non-null    object 
 1   Team      591 non-null    object 
 2   Position  584 non-null    object 
 3   Height    585 non-null    object 
 4   Weight    584 non-null    float64
 5   College   578 non-null    object 
 6   Salary    488 non-null    float64
dtypes: float64(2), object(5)
memory usage: 32.5+ KB


## Differences between Shared Methods
- The `sum` method adds a **Series's** values.
- On a **DataFrame**, the `sum` method defaults to adding down the rows (`axis="index"` or `0`), which produces one total per column.
- The `axis` parameter customizes the direction that we add across. Pass `"columns"` or `1` to add "across" the columns, which produces one total per row.
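
A small sketch of the two directions, using only the first two rows of the revenue table from this section, typed in by hand so the cell does not depend on `revenue.csv`:

```python
import pandas as pd

# First two rows of the revenue table, entered manually.
revenue = pd.DataFrame(
    {"New York": [985, 738], "Los Angeles": [122, 788], "Miami": [499, 534]},
    index=pd.Index(["1/1/26", "1/2/26"], name="Date"),
)

per_city = revenue.sum()                 # same as axis="index" or axis=0
per_day = revenue.sum(axis="columns")    # same as axis=1

print(per_city.tolist())    # [1723, 910, 1033]
print(per_day.tolist())     # [1606, 2060]

# Both directions add up to the same grand total.
print(per_city.sum() == per_day.sum())    # True
```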

In [10]:
revenue = pd.read_csv("revenue.csv", index_col="Date")
revenue

Date,New York,Los Angeles,Miami
1/1/26,985,122,499
1/2/26,738,788,534
1/3/26,14,20,933
1/4/26,730,904,885
1/5/26,114,71,253
1/6/26,936,502,497
1/7/26,123,996,115
1/8/26,935,492,886
1/9/26,846,954,823
1/10/26,54,285,216


In [12]:
revenue.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, 1/1/26 to 1/10/26
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   New York     10 non-null     int64
 1   Los Angeles  10 non-null     int64
 2   Miami        10 non-null     int64
dtypes: int64(3)
memory usage: 320.0+ bytes


In [22]:
revenue.sum()

New York       5475
Los Angeles    5134
Miami          5641
dtype: int64

In [13]:
revenue.sum(axis="index")
# equivalent: revenue.sum(axis=0)

New York       5475
Los Angeles    5134
Miami          5641
dtype: int64

In [24]:
revenue.sum(axis="columns")
# equivalent: revenue.sum(axis=1)

Date
1/1/26     1606
1/2/26     2060
1/3/26      967
1/4/26     2519
1/5/26      438
1/6/26     1935
1/7/26     1234
1/8/26     2313
1/9/26     2623
1/10/26     555
dtype: int64

In [25]:
revenue.sum(axis="columns").sum()

np.int64(16250)

## Select One Column from a DataFrame
- We can use attribute syntax (`df.column_name`) to select a column from a **DataFrame**. The syntax will not work if the column name has spaces.
- We can also use square bracket syntax (`df["column name"]`) which will work for any column name.
- Pandas extracts a column from a **DataFrame** as a **Series**.
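- Whether the extracted **Series** shares data with the **DataFrame** depends on the pandas version: without Copy-on-Write, changes to the **Series** can affect the **DataFrame**; with Copy-on-Write (the default from pandas 3.0), they do not.
- Depending on the version, pandas may display a warning if you mutate the **Series**. Use the `copy` method to create an independent duplicate.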
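
The two selection syntaxes and the effect of `copy` can be sketched on a tiny table. The `Home City` column is invented here purely to show a name that contains a space.

```python
import pandas as pd

# Hypothetical table; "Home City" is an invented column whose name contains a space.
players = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Home City": ["Atlanta", "Boston"],
})

by_attribute = players.Name           # works: no space in the column name
by_brackets = players["Home City"]    # brackets work for any column name

# Work on an independent copy so the original table stays untouched.
names = players["Name"].copy()
names.iloc[0] = "Whatever"

print(type(by_brackets).__name__)    # Series
print(names.iloc[0])                 # Whatever
print(players.loc[0, "Name"])        # Player A
```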

In [14]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [15]:
nba.Team   # attribute syntax works because the name has no space
# nba.team   # AttributeError: column names are case-sensitive

0           Atlanta Hawks
1           Atlanta Hawks
2           Atlanta Hawks
3           Atlanta Hawks
4           Atlanta Hawks
              ...        
587    Washington Wizards
588    Washington Wizards
589    Washington Wizards
590    Washington Wizards
591                   NaN
Name: Team, Length: 592, dtype: object

In [27]:
nba["Salary"]  # bracket syntax works for any column name, including names with spaces

0       4556983.0
1      18700000.0
2       4094244.0
3      20616000.0
4       2581522.0
          ...    
587     1719864.0
588    10250000.0
589           NaN
590     8195122.0
591           NaN
Name: Salary, Length: 592, dtype: float64

In [22]:
#names = nba["Name"]
names = nba["Name"].copy()
names

0             Saddiq Bey
1      Bogdan Bogdanovic
2            Kobe Bufkin
3           Clint Capela
4         Bruno Fernando
             ...        
587         Ryan Rollins
588        Landry Shamet
589     Tristan Vukcevic
590         Delon Wright
591                  NaN
Name: Name, Length: 592, dtype: object

In [23]:
names.iloc[0] 

'Saddiq Bey'

In [24]:
names.iloc[0] = "Whatever"

In [26]:
names

0               Whatever
1      Bogdan Bogdanovic
2            Kobe Bufkin
3           Clint Capela
4         Bruno Fernando
             ...        
587         Ryan Rollins
588        Landry Shamet
589     Tristan Vukcevic
590         Delon Wright
591                  NaN
Name: Name, Length: 592, dtype: object

In [25]:
nba.head()

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0


## Select Multiple Columns from a DataFrame
- Use square brackets with a list of names to extract multiple **DataFrame** columns.
- Pandas stores the result in a new **DataFrame** (a copy).
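
A minimal sketch with invented data: the result is a **DataFrame** whose columns follow the order of the list, and changing it leaves the original table alone.

```python
import pandas as pd

# Hypothetical table used only for this illustration.
players = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Team": ["Team X", "Team Y"],
    "Salary": [1_000_000, 2_000_000],
})

columns_to_select = ["Salary", "Name"]
subset = players[columns_to_select]

print(type(subset).__name__)      # DataFrame
print(subset.columns.tolist())    # ['Salary', 'Name']

# Adding a column to the selection does not touch the original table.
subset = subset.assign(Bonus=0)
print("Bonus" in players.columns)    # False
```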

In [27]:
nba.head()

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0


In [28]:
nba[["Name", "Team"]]

Unnamed: 0,Name,Team
0,Saddiq Bey,Atlanta Hawks
1,Bogdan Bogdanovic,Atlanta Hawks
2,Kobe Bufkin,Atlanta Hawks
3,Clint Capela,Atlanta Hawks
4,Bruno Fernando,Atlanta Hawks
...,...,...
587,Ryan Rollins,Washington Wizards
588,Landry Shamet,Washington Wizards
589,Tristan Vukcevic,Washington Wizards
590,Delon Wright,Washington Wizards


In [29]:
nba[["Team", "Name"]]

Unnamed: 0,Team,Name
0,Atlanta Hawks,Saddiq Bey
1,Atlanta Hawks,Bogdan Bogdanovic
2,Atlanta Hawks,Kobe Bufkin
3,Atlanta Hawks,Clint Capela
4,Atlanta Hawks,Bruno Fernando
...,...,...
587,Washington Wizards,Ryan Rollins
588,Washington Wizards,Landry Shamet
589,Washington Wizards,Tristan Vukcevic
590,Washington Wizards,Delon Wright


In [46]:
nba[["Salary", "Team", "Name"]]

Unnamed: 0,Salary,Team,Name
0,4556983.0,Atlanta Hawks,Saddiq Bey
1,18700000.0,Atlanta Hawks,Bogdan Bogdanovic
2,4094244.0,Atlanta Hawks,Kobe Bufkin
3,20616000.0,Atlanta Hawks,Clint Capela
4,2581522.0,Atlanta Hawks,Bruno Fernando
...,...,...,...
587,1719864.0,Washington Wizards,Ryan Rollins
588,10250000.0,Washington Wizards,Landry Shamet
589,,Washington Wizards,Tristan Vukcevic
590,8195122.0,Washington Wizards,Delon Wright


In [30]:
columns_to_select = ["Salary", "Team", "Name"]
nba[columns_to_select]

Unnamed: 0,Salary,Team,Name
0,4556983.0,Atlanta Hawks,Saddiq Bey
1,18700000.0,Atlanta Hawks,Bogdan Bogdanovic
2,4094244.0,Atlanta Hawks,Kobe Bufkin
3,20616000.0,Atlanta Hawks,Clint Capela
4,2581522.0,Atlanta Hawks,Bruno Fernando
...,...,...,...
587,1719864.0,Washington Wizards,Ryan Rollins
588,10250000.0,Washington Wizards,Landry Shamet
589,,Washington Wizards,Tristan Vukcevic
590,8195122.0,Washington Wizards,Delon Wright


## Add New Column to DataFrame
- Use square bracket extraction syntax with an equal sign to add a new **Series** to a **DataFrame**.
- The `insert` method adds a new column at a specific column position (counting from 0) and modifies the **DataFrame** in place.
- On the right-hand side, we can reference an existing **DataFrame** column and perform a broadcasting operation on it to create the new **Series**.
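
The three techniques can be sketched together on a two-row table. The `League` column and its value are invented for this example.

```python
import pandas as pd

# Hypothetical table used only for this illustration.
players = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Salary": [1_000_000.0, 2_500_000.0],
})

# A scalar on the right-hand side is broadcast to every row; the column lands at the end.
players["Sport"] = "Basketball"

# insert places the new column at a chosen position and changes the table in place.
players.insert(loc=1, column="League", value="NBA")

# Derive a new column from an existing one.
players["Salary Doubled"] = players["Salary"].mul(2)

print(players.columns.tolist())
# ['Name', 'League', 'Salary', 'Sport', 'Salary Doubled']
print(players["Salary Doubled"].tolist())    # [2000000.0, 5000000.0]
```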

In [48]:
nba["Sport"] = "Basketball"

In [49]:
nba.head()

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary,Sport
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0,Basketball
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0,Basketball
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0,Basketball
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0,Basketball
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0,Basketball


In [50]:
nba.insert(loc=3, column="Sports", value="Basketball Test")

In [51]:
nba.head()

Unnamed: 0,Name,Team,Position,Sports,Height,Weight,College,Salary,Sport
0,Saddiq Bey,Atlanta Hawks,F,Basketball Test,6-7,215.0,Villanova,4556983.0,Basketball
1,Bogdan Bogdanovic,Atlanta Hawks,G,Basketball Test,6-5,225.0,Fenerbahce,18700000.0,Basketball
2,Kobe Bufkin,Atlanta Hawks,G,Basketball Test,6-5,195.0,Michigan,4094244.0,Basketball
3,Clint Capela,Atlanta Hawks,C,Basketball Test,6-10,256.0,Elan Chalon,20616000.0,Basketball
4,Bruno Fernando,Atlanta Hawks,F-C,Basketball Test,6-10,240.0,Maryland,2581522.0,Basketball


In [52]:
nba["Salary"] * 2

0       9113966.0
1      37400000.0
2       8188488.0
3      41232000.0
4       5163044.0
          ...    
587     3439728.0
588    20500000.0
589           NaN
590    16390244.0
591           NaN
Name: Salary, Length: 592, dtype: float64

In [53]:
nba.head()

Unnamed: 0,Name,Team,Position,Sports,Height,Weight,College,Salary,Sport
0,Saddiq Bey,Atlanta Hawks,F,Basketball Test,6-7,215.0,Villanova,4556983.0,Basketball
1,Bogdan Bogdanovic,Atlanta Hawks,G,Basketball Test,6-5,225.0,Fenerbahce,18700000.0,Basketball
2,Kobe Bufkin,Atlanta Hawks,G,Basketball Test,6-5,195.0,Michigan,4094244.0,Basketball
3,Clint Capela,Atlanta Hawks,C,Basketball Test,6-10,256.0,Elan Chalon,20616000.0,Basketball
4,Bruno Fernando,Atlanta Hawks,F-C,Basketball Test,6-10,240.0,Maryland,2581522.0,Basketball


In [54]:
nba["Salary"].mul(2)

0       9113966.0
1      37400000.0
2       8188488.0
3      41232000.0
4       5163044.0
          ...    
587     3439728.0
588    20500000.0
589           NaN
590    16390244.0
591           NaN
Name: Salary, Length: 592, dtype: float64

In [55]:
nba["Salary Doubled"] = nba["Salary"].mul(2)

In [56]:
nba.head()

Unnamed: 0,Name,Team,Position,Sports,Height,Weight,College,Salary,Sport,Salary Doubled
0,Saddiq Bey,Atlanta Hawks,F,Basketball Test,6-7,215.0,Villanova,4556983.0,Basketball,9113966.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,Basketball Test,6-5,225.0,Fenerbahce,18700000.0,Basketball,37400000.0
2,Kobe Bufkin,Atlanta Hawks,G,Basketball Test,6-5,195.0,Michigan,4094244.0,Basketball,8188488.0
3,Clint Capela,Atlanta Hawks,C,Basketball Test,6-10,256.0,Elan Chalon,20616000.0,Basketball,41232000.0
4,Bruno Fernando,Atlanta Hawks,F-C,Basketball Test,6-10,240.0,Maryland,2581522.0,Basketball,5163044.0


In [57]:
nba["Salary"] - 5000000

0       -443017.0
1      13700000.0
2       -905756.0
3      15616000.0
4      -2418478.0
          ...    
587    -3280136.0
588     5250000.0
589           NaN
590     3195122.0
591           NaN
Name: Salary, Length: 592, dtype: float64

In [58]:
nba["Salary"].sub(5000000)

0       -443017.0
1      13700000.0
2       -905756.0
3      15616000.0
4      -2418478.0
          ...    
587    -3280136.0
588     5250000.0
589           NaN
590     3195122.0
591           NaN
Name: Salary, Length: 592, dtype: float64

In [59]:
nba["New Salary"] = nba["Salary"].sub(5000000)

In [60]:
nba.head()

Unnamed: 0,Name,Team,Position,Sports,Height,Weight,College,Salary,Sport,Salary Doubled,New Salary
0,Saddiq Bey,Atlanta Hawks,F,Basketball Test,6-7,215.0,Villanova,4556983.0,Basketball,9113966.0,-443017.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,Basketball Test,6-5,225.0,Fenerbahce,18700000.0,Basketball,37400000.0,13700000.0
2,Kobe Bufkin,Atlanta Hawks,G,Basketball Test,6-5,195.0,Michigan,4094244.0,Basketball,8188488.0,-905756.0
3,Clint Capela,Atlanta Hawks,C,Basketball Test,6-10,256.0,Elan Chalon,20616000.0,Basketball,41232000.0,15616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,Basketball Test,6-10,240.0,Maryland,2581522.0,Basketball,5163044.0,-2418478.0


In [34]:
nba["test"] = ""

In [35]:
nba.head()

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary,test
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0,
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0,
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0,
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0,
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0,


In [36]:
nba = pd.read_csv("nba.csv")
nba.head()

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0


## A Review of the value_counts Method
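- The `value_counts` method counts the number of times that each unique value occurs in a **Series**. Missing values are excluded by default.
- Pass `normalize=True` to return each value's proportion of the total instead of its count.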
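
A short sketch on a hand-made **Series**. The missing entry is left out of both the counts and the proportions, which is the default behavior.

```python
import pandas as pd

# Hypothetical positions; the last entry is missing.
positions = pd.Series(["G", "G", "F", "C", None], name="Position")

counts = positions.value_counts()
shares = positions.value_counts(normalize=True)

print(counts["G"])     # 2
print(counts.sum())    # 4
print(shares["G"])     # 0.5
print(shares["F"])     # 0.25
```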

In [37]:
nba["Position"].value_counts()


Position
G      229
F      187
C       47
G-F     46
F-C     37
C-F     23
F-G     15
Name: count, dtype: int64

In [64]:
nba["Position"].value_counts(normalize=True)

Position
G      0.392123
F      0.320205
C      0.080479
G-F    0.078767
F-C    0.063356
C-F    0.039384
F-G    0.025685
Name: proportion, dtype: float64

## Drop Rows with Missing Values
- Pandas uses a `NaN` designation for cells that have a missing value.
- The `dropna` method deletes rows with missing values. Its default behavior is to remove a row if it has *any* missing values.
- Pass the `how` parameter an argument of `"all"` to delete only rows where all the values are `NaN`.
- The `subset` parameter limits the columns that pandas checks for missing values when deciding which rows to drop.
- The `thresh` parameter keeps only rows that have at least that many non-missing values.
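
To compare these options side by side, here is a sketch on a five-row table with invented values. Each call returns a new **DataFrame**, and the printed lists are the index labels of the rows that survive.

```python
import numpy as np
import pandas as pd

# Hypothetical table: row 0 is complete, rows 1-3 are partly missing, row 4 is empty.
players = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D", np.nan],
    "College": ["College X", np.nan, "College Z", np.nan, np.nan],
    "Salary": [1_000_000.0, 2_000_000.0, np.nan, np.nan, np.nan],
})

print(players.dropna().index.tolist())                     # [0]
print(players.dropna(how="all").index.tolist())            # [0, 1, 2, 3]
print(players.dropna(subset=["Salary"]).index.tolist())    # [0, 1]
print(players.dropna(subset=["College", "Salary"], how="all").index.tolist())    # [0, 1, 2]
print(players.dropna(thresh=2).index.tolist())             # [0, 1, 2]
```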

In [69]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [38]:
nba.dropna()   # delete every row that has any NaN value

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
585,Eugene Omoruyi,Washington Wizards,F,6-6,235.0,Oregon,559782.0
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0


In [39]:
nba = pd.read_csv("nba.csv")
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [40]:
nba.dropna(how="any")

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
585,Eugene Omoruyi,Washington Wizards,F,6-6,235.0,Oregon,559782.0
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0


In [50]:
nba = pd.read_csv("nba.csv")
nba.dropna(how="all")

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,


In [45]:
nba.dropna(subset=["Salary"])

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
585,Eugene Omoruyi,Washington Wizards,F,6-6,235.0,Oregon,559782.0
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0


In [47]:
nba.dropna(subset=["College"])

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,


In [49]:
nba.dropna(subset=["College", "Salary"])

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
585,Eugene Omoruyi,Washington Wizards,F,6-6,235.0,Oregon,559782.0
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0


In [51]:
nba.dropna(subset=["College", "Salary"], how="all")

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,


In [56]:
nba.isnull().sum(axis=1)

0      0
1      0
2      0
3      0
4      0
      ..
587    0
588    0
589    1
590    0
591    7
Length: 592, dtype: int64

In [57]:
nba = pd.read_csv("nba.csv")
nba.dropna(thresh=2)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,


## Fill in Missing Values with the fillna Method
- The `fillna` method replaces missing `NaN` values with its argument.
- The `fillna` method is available on both **DataFrames** and **Series**.
- The `fillna` method returns a copy instead of changing the extracted **Series** in place, so assign the result back to the **DataFrame** column to keep the change.
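
The difference between calling `fillna` and assigning its result can be sketched on a two-row table with invented values:

```python
import numpy as np
import pandas as pd

# Hypothetical table; the second row is missing both values.
players = pd.DataFrame({
    "College": ["College X", np.nan],
    "Salary": [1_000_000.0, np.nan],
})

# fillna returns a new object; the original column still holds its NaN.
filled = players["Salary"].fillna(0)
print(players["Salary"].isna().sum())    # 1
print(filled.tolist())                   # [1000000.0, 0.0]

# Assign the result back to make the change stick.
players["Salary"] = players["Salary"].fillna(0)
players["College"] = players["College"].fillna("Unknown")
print(players["College"].tolist())       # ['College X', 'Unknown']
print(players["Salary"].count())         # 2
```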

In [69]:
nba = pd.read_csv("nba.csv").dropna(how="all")
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,


In [79]:
nba.fillna(0)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0.0


In [70]:
nba["Salary"].count()

np.int64(488)

In [71]:
nba["Salary"] = nba["Salary"].fillna(0)

In [72]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0.0


In [73]:
nba["Salary"].count()

np.int64(591)

In [75]:
nba["College"].count()

np.int64(578)

In [78]:
nba["College"] = nba["College"].fillna(value="Unknown")

In [79]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0.0


In [80]:
nba["College"].count()

np.int64(591)

## The astype Method I
- The `astype` method converts a **Series's** values to a specified type.
- Pass in the specified type as either a string or the core Python data type.
- Pandas cannot convert `NaN` values to integer types, so we need to eliminate/replace them before we perform the conversion.
- The `dtypes` attribute returns a **Series** with the **DataFrame's** columns and their types.
- The `category` type is ideal for columns with a limited number of unique values.
- The `nunique` method will return a **Series** with the number of unique values in each column.
- With categories, pandas does not create a separate value in memory for each "cell". Rather, the cells point to a single copy for each unique value.
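
A compact sketch of both conversions on invented data: the missing salary is replaced before the integer conversion, and `nunique` is used to spot a column with few distinct values before converting it to `category`.

```python
import numpy as np
import pandas as pd

# Hypothetical table used only for this illustration.
players = pd.DataFrame({
    "Position": ["G", "F", "G", "G"],
    "Salary": [1_000_000.0, np.nan, 2_000_000.0, 3_000_000.0],
})

# Replace NaN first: converting a column that still contains NaN to int raises an error.
players["Salary"] = players["Salary"].fillna(0).astype(int)

# Few distinct values relative to the row count suggest a category column.
unique_counts = players.nunique()
print(unique_counts.tolist())    # [2, 4]

players["Position"] = players["Position"].astype("category")

print(players["Salary"].tolist())                       # [1000000, 0, 2000000, 3000000]
print(players["Position"].dtype)                        # category
print(players["Position"].cat.categories.tolist())      # ['F', 'G']
```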

In [85]:
nba = pd.read_csv("nba.csv").dropna(how="all")
nba["Salary"] = nba["Salary"].fillna(0)
nba["Weight"] = nba["Weight"].fillna(0)
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0.0


In [81]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
Index: 591 entries, 0 to 590
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      591 non-null    object 
 1   Team      591 non-null    object 
 2   Position  584 non-null    object 
 3   Height    585 non-null    object 
 4   Weight    584 non-null    float64
 5   College   591 non-null    object 
 6   Salary    591 non-null    float64
dtypes: float64(2), object(5)
memory usage: 36.9+ KB


In [82]:
# astype returns a new Series; without assignment, nba["Salary"] keeps its float64 type
nba["Salary"].astype("int")
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357.0
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0.0


In [88]:
nba["Salary"].astype(int)

0       4556983
1      18700000
2       4094244
3      20616000
4       2581522
         ...   
586    27955357
587     1719864
588    10250000
589           0
590     8195122
Name: Salary, Length: 591, dtype: int64

In [83]:
nba["Salary"] = nba["Salary"].astype(int)

In [84]:
nba


Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0


In [85]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
Index: 591 entries, 0 to 590
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      591 non-null    object 
 1   Team      591 non-null    object 
 2   Position  584 non-null    object 
 3   Height    585 non-null    object 
 4   Weight    584 non-null    float64
 5   College   591 non-null    object 
 6   Salary    591 non-null    int64  
dtypes: float64(1), int64(1), object(5)
memory usage: 36.9+ KB


In [87]:
df = pd.DataFrame({
    "int": [1, 2, 3, 4],
    "float": [1.0, 2.0, 3.0, 4.0],
})
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   int     4 non-null      int64  
 1   float   4 non-null      float64
dtypes: float64(1), int64(1)
memory usage: 196.0 bytes


In [88]:
df.memory_usage()

Index    132
int       32
float     32
dtype: int64

In [89]:
nba["Team"].nunique()

30

In [92]:
nba.nunique()

Name        591
Team         30
Position      7
Height       20
Weight       94
College     182
Salary      299
dtype: int64

In [90]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
Index: 591 entries, 0 to 590
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Name      591 non-null    object 
 1   Team      591 non-null    object 
 2   Position  584 non-null    object 
 3   Height    585 non-null    object 
 4   Weight    584 non-null    float64
 5   College   591 non-null    object 
 6   Salary    591 non-null    int64  
dtypes: float64(1), int64(1), object(5)
memory usage: 36.9+ KB


In [93]:
nba["Position"] = nba["Position"].astype("category")

In [94]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
Index: 591 entries, 0 to 590
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Name      591 non-null    object  
 1   Team      591 non-null    category
 2   Position  584 non-null    category
 3   Height    585 non-null    object  
 4   Weight    584 non-null    float64 
 5   College   591 non-null    object  
 6   Salary    591 non-null    int64   
dtypes: category(2), float64(1), int64(1), object(3)
memory usage: 30.5+ KB


In [95]:
nba["Team"] = nba["Team"].astype("category")

In [96]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
Index: 591 entries, 0 to 590
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Name      591 non-null    object  
 1   Team      591 non-null    category
 2   Position  584 non-null    category
 3   Height    585 non-null    object  
 4   Weight    584 non-null    float64 
 5   College   591 non-null    object  
 6   Salary    591 non-null    int64   
dtypes: category(2), float64(1), int64(1), object(3)
memory usage: 30.5+ KB
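
A rough way to see the memory effect of the `category` type is to compare `memory_usage(deep=True)` before and after the conversion. The team labels below are made up for illustration, and exact byte counts depend on the pandas version, so only the comparison is printed:

```python
import pandas as pd

# 3,000 values but only 3 distinct labels (hypothetical data)
teams = pd.Series(["Hawks", "Celtics", "Nets"] * 1000)
as_category = teams.astype("category")

# deep=True also counts the memory held by the string values themselves
plain_bytes = teams.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(teams.nunique())               # number of distinct labels
print(category_bytes < plain_bytes)  # the categorical copy is smaller
```

The saving shrinks as the share of unique values grows, which is why a column such as `Name` is a poor candidate for `category`.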


In [97]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0


## Sort a DataFrame with the sort_values Method I
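- The `sort_values` method sorts a **DataFrame** by the values in one or more columns and returns a new **DataFrame**. The default sort is ascending. String comparison is case-sensitive, so uppercase letters sort before lowercase ones (`AJ Green` comes before `Aaron Gordon`).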
- The first parameter (`by`) expects the column(s) to sort by.
- If sorting by a single column, pass a string with its name.
- The `ascending` parameter customizes the sort order.
- The `na_position` parameter customizes where pandas places `NaN` values: `"last"` (the default) or `"first"`. The placement does not depend on `ascending`.
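
The parameters above can be tried on a tiny made-up table (the names and salaries are illustrative, not from `nba.csv`):

```python
import numpy as np
import pandas as pd

players = pd.DataFrame({
    "Name": ["Cara", "Abe", "Bo"],
    "Salary": [300.0, np.nan, 100.0],
})

by_name = players.sort_values("Name")                                # ascending by default
highest_first = players.sort_values("Salary", ascending=False)       # NaN still goes last
missing_first = players.sort_values("Salary", na_position="first")   # NaN moved to the top

print(by_name.index.tolist())
print(highest_first.index.tolist())
print(missing_first.index.tolist())
```

Each call returns a new **DataFrame**; `players` itself keeps its original row order.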

In [99]:
nba.sort_values("Name")

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
122,A.J. Lawson,Dallas Mavericks,G,6-6,179.0,South Carolina,0
324,AJ Green,Milwaukee Bucks,G,6-5,190.0,Northern Iowa,1901769
6,AJ Griffin,Atlanta Hawks,F,6-6,220.0,Duke,3712920
141,Aaron Gordon,Denver Nuggets,F,6-8,235.0,Arizona,22266182
198,Aaron Holiday,Houston Rockets,G,6-0,185.0,UCLA,2346614
...,...,...,...,...,...,...,...
515,Zach Collins,San Antonio Spurs,F-C,6-11,250.0,Gonzaga,7700000
83,Zach LaVine,Chicago Bulls,G,6-5,200.0,UCLA,40064220
149,Zeke Nnaji,Denver Nuggets,F-C,6-9,240.0,Arizona,4306281
291,Ziaire Williams,Memphis Grizzlies,F,6-9,185.0,Stanford,4810200


In [100]:
nba["Team"].sort_values()

0           Atlanta Hawks
17          Atlanta Hawks
16          Atlanta Hawks
15          Atlanta Hawks
14          Atlanta Hawks
              ...        
571    Washington Wizards
570    Washington Wizards
589    Washington Wizards
579    Washington Wizards
590    Washington Wizards
Name: Team, Length: 591, dtype: category
Categories (30, object): ['Atlanta Hawks', 'Boston Celtics', 'Brooklyn Nets', 'Charlotte Hornets', ..., 'San Antonio Spurs', 'Toronto Raptors', 'Utah Jazz', 'Washington Wizards']

In [101]:
nba.sort_values(by="Name")

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
122,A.J. Lawson,Dallas Mavericks,G,6-6,179.0,South Carolina,0
324,AJ Green,Milwaukee Bucks,G,6-5,190.0,Northern Iowa,1901769
6,AJ Griffin,Atlanta Hawks,F,6-6,220.0,Duke,3712920
141,Aaron Gordon,Denver Nuggets,F,6-8,235.0,Arizona,22266182
198,Aaron Holiday,Houston Rockets,G,6-0,185.0,UCLA,2346614
...,...,...,...,...,...,...,...
515,Zach Collins,San Antonio Spurs,F-C,6-11,250.0,Gonzaga,7700000
83,Zach LaVine,Chicago Bulls,G,6-5,200.0,UCLA,40064220
149,Zeke Nnaji,Denver Nuggets,F-C,6-9,240.0,Arizona,4306281
291,Ziaire Williams,Memphis Grizzlies,F,6-9,185.0,Stanford,4810200


In [102]:
nba.sort_values(by="Name", ascending=True)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
122,A.J. Lawson,Dallas Mavericks,G,6-6,179.0,South Carolina,0
324,AJ Green,Milwaukee Bucks,G,6-5,190.0,Northern Iowa,1901769
6,AJ Griffin,Atlanta Hawks,F,6-6,220.0,Duke,3712920
141,Aaron Gordon,Denver Nuggets,F,6-8,235.0,Arizona,22266182
198,Aaron Holiday,Houston Rockets,G,6-0,185.0,UCLA,2346614
...,...,...,...,...,...,...,...
515,Zach Collins,San Antonio Spurs,F-C,6-11,250.0,Gonzaga,7700000
83,Zach LaVine,Chicago Bulls,G,6-5,200.0,UCLA,40064220
149,Zeke Nnaji,Denver Nuggets,F-C,6-9,240.0,Arizona,4306281
291,Ziaire Williams,Memphis Grizzlies,F,6-9,185.0,Stanford,4810200


In [103]:
nba.sort_values(by="Name", ascending=False)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
370,Zion Williamson,New Orleans Pelicans,F,6-6,284.0,Duke,34005250
291,Ziaire Williams,Memphis Grizzlies,F,6-9,185.0,Stanford,4810200
149,Zeke Nnaji,Denver Nuggets,F-C,6-9,240.0,Arizona,4306281
83,Zach LaVine,Chicago Bulls,G,6-5,200.0,UCLA,40064220
515,Zach Collins,San Antonio Spurs,F-C,6-11,250.0,Gonzaga,7700000
...,...,...,...,...,...,...,...
198,Aaron Holiday,Houston Rockets,G,6-0,185.0,UCLA,2346614
141,Aaron Gordon,Denver Nuggets,F,6-8,235.0,Arizona,22266182
6,AJ Griffin,Atlanta Hawks,F,6-6,220.0,Duke,3712920
324,AJ Green,Milwaukee Bucks,G,6-5,190.0,Northern Iowa,1901769


In [104]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0


## Sort a DataFrame with the sort_values Method II
- To sort by multiple columns, pass the `by` parameter a list of column names. Pandas will sort in the specified column order (first to last).
- Pass the `ascending` parameter a Boolean to sort all columns in a consistent order (all ascending or all descending).
- Pass `ascending` a list to customize the sort order *per* column. The `ascending` list length must match the `by` list.
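
A small sketch of a per-column sort order, on a hypothetical four-row roster:

```python
import pandas as pd

roster = pd.DataFrame({
    "Team": ["B", "A", "B", "A"],
    "Name": ["Zed", "Amy", "Bob", "Cal"],
})

# Both columns ascending: teams A to Z, names A to Z within each team
both_ascending = roster.sort_values(by=["Team", "Name"])

# Teams ascending, names descending within each team
mixed = roster.sort_values(by=["Team", "Name"], ascending=[True, False])

print(both_ascending["Name"].tolist())
print(mixed["Name"].tolist())
```

The second column only breaks ties in the first, so the order of the `by` list matters.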

In [98]:
nba = pd.read_csv("nba.csv")
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


In [99]:
nba.sort_values("Salary")

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
5,Trent Forrest,Atlanta Hawks,G,6-4,210.0,Florida State,508891.0
585,Eugene Omoruyi,Washington Wizards,F,6-6,235.0,Oregon,559782.0
540,Markquis Nowell,Toronto Raptors,G,5-8,160.0,Kansas State,559782.0
523,Sir'Jabari Rice,San Antonio Spurs,G,6-4,180.0,Texas,559782.0
103,Isaiah Mobley,Cleveland Cavaliers,F,6-8,238.0,Southern California,559782.0
...,...,...,...,...,...,...,...
547,Gary Trent Jr.,Toronto Raptors,G-F,6-5,209.0,Duke,
578,Taj Gibson,Washington Wizards,F,6-9,232.0,Southern California,
584,Kendrick Nunn,Washington Wizards,G,6-3,190.0,Oakland,
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,


In [100]:
nba.sort_values("Salary", ascending=False)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
175,Stephen Curry,Golden State Warriors,G,6-2,185.0,Davidson,51915615.0
461,Kevin Durant,Phoenix Suns,F,6-10,240.0,Texas,47649433.0
261,LeBron James,Los Angeles Lakers,F,6-9,250.0,St. Vincent-St. Mary HS (OH),47607350.0
436,Joel Embiid,Philadelphia 76ers,C-F,7-0,280.0,Kansas,47607350.0
145,Nikola Jokic,Denver Nuggets,C,6-11,284.0,Mega Basket,47607350.0
...,...,...,...,...,...,...,...
547,Gary Trent Jr.,Toronto Raptors,G-F,6-5,209.0,Duke,
578,Taj Gibson,Washington Wizards,F,6-9,232.0,Southern California,
584,Kendrick Nunn,Washington Wizards,G,6-3,190.0,Oakland,
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,


In [101]:
nba.sort_values("Salary", na_position="last")

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
5,Trent Forrest,Atlanta Hawks,G,6-4,210.0,Florida State,508891.0
585,Eugene Omoruyi,Washington Wizards,F,6-6,235.0,Oregon,559782.0
540,Markquis Nowell,Toronto Raptors,G,5-8,160.0,Kansas State,559782.0
523,Sir'Jabari Rice,San Antonio Spurs,G,6-4,180.0,Texas,559782.0
103,Isaiah Mobley,Cleveland Cavaliers,F,6-8,238.0,Southern California,559782.0
...,...,...,...,...,...,...,...
547,Gary Trent Jr.,Toronto Raptors,G-F,6-5,209.0,Duke,
578,Taj Gibson,Washington Wizards,F,6-9,232.0,Southern California,
584,Kendrick Nunn,Washington Wizards,G,6-3,190.0,Oakland,
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,


In [102]:
nba.sort_values("Salary", na_position="first")


Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
23,Blake Griffin,Boston Celtics,F,6-9,250.0,Oklahoma,
26,Mfiondu Kabengele,Boston Celtics,C,6-10,250.0,Florida State,
28,Svi Mykhailiuk,Boston Celtics,G-F,6-7,205.0,Kansas,
35,Robert Williams III,Boston Celtics,C-F,6-9,237.0,Texas A&M,
39,Nic Claxton,Brooklyn Nets,C,6-11,215.0,Georgia,
...,...,...,...,...,...,...,...
436,Joel Embiid,Philadelphia 76ers,C-F,7-0,280.0,Kansas,47607350.0
261,LeBron James,Los Angeles Lakers,F,6-9,250.0,St. Vincent-St. Mary HS (OH),47607350.0
145,Nikola Jokic,Denver Nuggets,C,6-11,284.0,Mega Basket,47607350.0
461,Kevin Durant,Phoenix Suns,F,6-10,240.0,Texas,47649433.0


In [103]:
nba.sort_values("Salary", na_position="first", ascending=False)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
23,Blake Griffin,Boston Celtics,F,6-9,250.0,Oklahoma,
26,Mfiondu Kabengele,Boston Celtics,C,6-10,250.0,Florida State,
28,Svi Mykhailiuk,Boston Celtics,G-F,6-7,205.0,Kansas,
35,Robert Williams III,Boston Celtics,C-F,6-9,237.0,Texas A&M,
39,Nic Claxton,Brooklyn Nets,C,6-11,215.0,Georgia,
...,...,...,...,...,...,...,...
55,Leaky Black,Charlotte Hornets,F,6-9,209.0,North Carolina,559782.0
74,Onuralp Bitim,Chicago Bulls,F,6-6,215.0,,559782.0
523,Sir'Jabari Rice,San Antonio Spurs,G,6-4,180.0,Texas,559782.0
540,Markquis Nowell,Toronto Raptors,G,5-8,160.0,Kansas State,559782.0


In [110]:
nba.sort_values(["Team", "Name"])
# nba.sort_values(by=["Team", "Name"])
# nba.sort_values(by=["Team", "Name"], ascending=True)
# nba.sort_values(by=["Team", "Name"], ascending=False)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
6,AJ Griffin,Atlanta Hawks,F,6-6,220.0,Duke,3712920
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000
8,De'Andre Hunter,Atlanta Hawks,F-G,6-8,221.0,Virginia,20089286
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864
578,Taj Gibson,Washington Wizards,F,6-9,232.0,Southern California,0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0
580,Tyus Jones,Washington Wizards,G,6-2,196.0,Duke,14000000


In [111]:
nba.sort_values(by=["Team", "Name"], ascending=[True, False])

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
12,Wesley Matthews,Atlanta Hawks,G,6-5,220.0,Marquette,3196448
5,Trent Forrest,Atlanta Hawks,G,6-4,210.0,Florida State,508891
17,Trae Young,Atlanta Hawks,G,6-1,164.0,Oklahoma,40064220
10,Seth Lundy,Atlanta Hawks,G,6-6,220.0,Penn State,559782
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983
...,...,...,...,...,...,...,...
577,Danilo Gallinari,Washington Wizards,F,6-10,236.0,Olimpia Milano,6802950
576,Daniel Gafford,Washington Wizards,F-C,6-10,234.0,Arkansas,12402000
581,Corey Kispert,Washington Wizards,F,6-6,224.0,Gonzaga,3722040
574,Bilal Coulibaly,Washington Wizards,G,6-6,195.0,Metropolitans 92,6614256


In [104]:
nba.sort_values(["Position", "Salary"])

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
143,Jay Huff,Denver Nuggets,C,7-1,240.0,Virginia,559782.0
252,Colin Castleton,Los Angeles Lakers,C,6-11,250.0,Florida,559782.0
343,Luka Garza,Minnesota Timberwolves,C,6-10,243.0,Iowa,559782.0
406,Olivier Sarr,Oklahoma City Thunder,C,6-11,240.0,Kentucky,559782.0
473,Ibou Badji,Portland Trail Blazers,C,7-1,240.0,FC Barcelona,559782.0
...,...,...,...,...,...,...,...
138,Armaan Franklin,Denver Nuggets,,,,,1119563.0
299,Caleb Daniels,Miami Heat,,,,,1119563.0
541,Kevin Obanor,Toronto Raptors,,6-8,235.0,,1119563.0
564,Nick Ongenda,Utah Jazz,,,,,1119563.0


In [108]:
nba = nba.sort_values(["Position", "Salary"], ascending=[False, True])
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
392,Dylan Windler,New York Knicks,G-F,6-7,196.0,Belmont,559782.0
202,Joshua Obiesie,Houston Rockets,G-F,6-6,195.0,,1119563.0
484,Rayan Rupert,Portland Trail Blazers,G-F,6-6,185.0,New Zealand Breakers,1119563.0
130,Joe Wieskamp,Dallas Mavericks,G-F,6-6,205.0,Iowa,2019706.0
197,Nate Hinton,Houston Rockets,G-F,6-5,210.0,Houston,2019706.0
...,...,...,...,...,...,...,...
138,Armaan Franklin,Denver Nuggets,,,,,1119563.0
299,Caleb Daniels,Miami Heat,,,,,1119563.0
541,Kevin Obanor,Toronto Raptors,,6-8,235.0,,1119563.0
564,Nick Ongenda,Utah Jazz,,,,,1119563.0


## Sort a DataFrame by its Index
- The `sort_index` method sorts the **DataFrame** by its index positions/labels.
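
One common use, sketched here on made-up data, is restoring the original row order after a value sort:

```python
import pandas as pd

players = pd.DataFrame({"Name": ["Cara", "Abe", "Bo"], "Salary": [300, 100, 200]})

shuffled = players.sort_values("Salary")         # index labels travel with their rows
restored = shuffled.sort_index()                 # back to index order 0, 1, 2
reversed_rows = shuffled.sort_index(ascending=False)

print(shuffled.index.tolist())
print(restored.index.tolist())
print(reversed_rows.index.tolist())
```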

In [109]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
392,Dylan Windler,New York Knicks,G-F,6-7,196.0,Belmont,559782.0
202,Joshua Obiesie,Houston Rockets,G-F,6-6,195.0,,1119563.0
484,Rayan Rupert,Portland Trail Blazers,G-F,6-6,185.0,New Zealand Breakers,1119563.0
130,Joe Wieskamp,Dallas Mavericks,G-F,6-6,205.0,Iowa,2019706.0
197,Nate Hinton,Houston Rockets,G-F,6-5,210.0,Houston,2019706.0
...,...,...,...,...,...,...,...
138,Armaan Franklin,Denver Nuggets,,,,,1119563.0
299,Caleb Daniels,Miami Heat,,,,,1119563.0
541,Kevin Obanor,Toronto Raptors,,6-8,235.0,,1119563.0
564,Nick Ongenda,Utah Jazz,,,,,1119563.0


In [110]:
nba.sort_index()
# nba.sort_index(ascending=True)
# nba.sort_index(ascending=False)
# nba = nba.sort_index(ascending=False)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983.0
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000.0
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244.0
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000.0
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522.0
...,...,...,...,...,...,...,...
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864.0
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000.0
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,
590,Delon Wright,Washington Wizards,G,6-5,185.0,Utah,8195122.0


## Rank Values with the rank Method
- The `rank` method assigns a numeric ranking to each **Series** value.
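- Pandas assigns the same rank to equal values. By default (`method="average"`), tied values share the average of the positions they occupy, which is why fractional ranks such as `197.5` appear and why some rank numbers are skipped (a "gap" in the ranks).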
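
The handling of ties can be sketched on a toy **Series**. The values are made up; `method` is a parameter of `rank` that selects the tie-breaking rule:

```python
import pandas as pd

salaries = pd.Series([100, 200, 200, 300])

average_rank = salaries.rank()               # ties share the mean of positions 2 and 3
min_rank = salaries.rank(method="min")       # ties take the lowest position; rank 3 is skipped
dense_rank = salaries.rank(method="dense")   # ties share a rank and no rank is skipped
descending = salaries.rank(ascending=False)  # the largest value gets rank 1

print(average_rank.tolist())
print(min_rank.tolist())
print(dense_rank.tolist())
print(descending.tolist())
```

If whole-number ranks are needed, `method="min"` may be a cleaner option than truncating averaged ranks with `astype(int)`.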

In [115]:
nba = pd.read_csv("nba.csv").dropna(how="all")
nba["Salary"] = nba["Salary"].fillna(0).astype(int)
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522
...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0


In [116]:
nba["Salary"].rank()

0      361.0
1      512.0
2      349.0
3      523.0
4      284.0
       ...  
586    544.0
587    197.5
588    452.0
589     52.0
590    429.0
Name: Salary, Length: 591, dtype: float64

In [117]:
nba["Salary"].rank(ascending=True)

0      361.0
1      512.0
2      349.0
3      523.0
4      284.0
       ...  
586    544.0
587    197.5
588    452.0
589     52.0
590    429.0
Name: Salary, Length: 591, dtype: float64

In [118]:
nba["Salary"].rank(ascending=False)

0      231.0
1       80.0
2      243.0
3       69.0
4      308.0
       ...  
586     48.0
587    394.5
588    140.0
589    540.0
590    163.0
Name: Salary, Length: 591, dtype: float64

In [119]:
nba["Salary"].rank(ascending=False).astype(int)

0      231
1       80
2      243
3       69
4      308
      ... 
586     48
587    394
588    140
589    540
590    163
Name: Salary, Length: 591, dtype: int64

In [120]:
nba["Salary Rank"] = nba["Salary"].rank(ascending=False).astype(int)

In [121]:
nba

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary,Salary Rank
0,Saddiq Bey,Atlanta Hawks,F,6-7,215.0,Villanova,4556983,231
1,Bogdan Bogdanovic,Atlanta Hawks,G,6-5,225.0,Fenerbahce,18700000,80
2,Kobe Bufkin,Atlanta Hawks,G,6-5,195.0,Michigan,4094244,243
3,Clint Capela,Atlanta Hawks,C,6-10,256.0,Elan Chalon,20616000,69
4,Bruno Fernando,Atlanta Hawks,F-C,6-10,240.0,Maryland,2581522,308
...,...,...,...,...,...,...,...,...
586,Jordan Poole,Washington Wizards,G,6-4,194.0,Michigan,27955357,48
587,Ryan Rollins,Washington Wizards,G,6-3,180.0,Toledo,1719864,394
588,Landry Shamet,Washington Wizards,G,6-4,190.0,Wichita State,10250000,140
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0,540


In [122]:
nba.sort_values("Salary", ascending=False).head(10)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary,Salary Rank
175,Stephen Curry,Golden State Warriors,G,6-2,185.0,Davidson,51915615,1
461,Kevin Durant,Phoenix Suns,F,6-10,240.0,Texas,47649433,2
145,Nikola Jokic,Denver Nuggets,C,6-11,284.0,Mega Basket,47607350,4
436,Joel Embiid,Philadelphia 76ers,C-F,7-0,280.0,Kansas,47607350,4
261,LeBron James,Los Angeles Lakers,F,6-9,250.0,St. Vincent-St. Mary HS (OH),47607350,4
456,Bradley Beal,Phoenix Suns,G,6-4,207.0,Florida,46741590,6
480,Damian Lillard,Portland Trail Blazers,G,6-2,195.0,Weber State,45640084,8
316,Giannis Antetokounmpo,Milwaukee Bucks,F,7-0,243.0,Filathlitikos,45640084,8
239,Paul George,Los Angeles Clippers,F,6-8,220.0,Fresno State,45640084,8
241,Kawhi Leonard,Los Angeles Clippers,F,6-7,225.0,San Diego State,45640084,8


In [123]:
nba.sort_values("Salary", ascending=False).tail(5)

Unnamed: 0,Name,Team,Position,Height,Weight,College,Salary,Salary Rank
543,Otto Porter Jr.,Toronto Raptors,F,6-8,198.0,Georgetown,0,540
544,Dennis Schroder,Toronto Raptors,G,6-1,172.0,Braunschweig,0,540
584,Kendrick Nunn,Washington Wizards,G,6-3,190.0,Oakland,0,540
547,Gary Trent Jr.,Toronto Raptors,G-F,6-5,209.0,Duke,0,540
589,Tristan Vukcevic,Washington Wizards,F,6-10,220.0,Real Madrid,0,540


# Filtering Data

In [124]:
employees = pd.read_csv("employees.csv")

In [125]:
employees.head()

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,8/6/1993,12:42 PM,97308,6.945,True,Marketing
1,Thomas,Male,3/31/1996,6:53 AM,61933,4.17,True,
2,Maria,Female,4/23/1993,11:17 AM,130590,11.858,False,Finance
3,Jerry,Male,3/4/2005,1:00 PM,138705,9.34,True,Finance
4,Larry,Male,1/24/1998,4:47 PM,101004,1.389,True,Client Services


In [126]:
employees.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   First Name         933 non-null    object 
 1   Gender             855 non-null    object 
 2   Start Date         1000 non-null   object 
 3   Last Login Time    1000 non-null   object 
 4   Salary             1000 non-null   int64  
 5   Bonus %            1000 non-null   float64
 6   Senior Management  933 non-null    object 
 7   Team               957 non-null    object 
dtypes: float64(1), int64(1), object(6)
memory usage: 62.6+ KB


In [128]:
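employees["Start Date"] = pd.to_datetime(employees["Start Date"], format="%m/%d/%Y")
# Caution: "%H" is the 24-hour directive, so "%p" (AM/PM) is ignored and "1:00 PM" parses as 01:00:00.
# Use format="%I:%M %p" to honor AM/PM.
employees["Last Login Time"] = pd.to_datetime(employees["Last Login Time"], format="%H:%M %p").dt.time
# Caution: astype(bool) turns missing values (NaN) into True, because NaN is truthy.
employees["Senior Management"] = employees["Senior Management"].astype(bool)
employees["Gender"] = employees["Gender"].astype("category")
employees.info()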

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   First Name         933 non-null    object        
 1   Gender             855 non-null    category      
 2   Start Date         1000 non-null   datetime64[ns]
 3   Last Login Time    1000 non-null   object        
 4   Salary             1000 non-null   int64         
 5   Bonus %            1000 non-null   float64       
 6   Senior Management  1000 non-null   bool          
 7   Team               957 non-null    object        
dtypes: bool(1), category(1), datetime64[ns](1), float64(1), int64(1), object(3)
memory usage: 49.1+ KB
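
Two details of the conversions above are easy to miss, so here is a small sketch on made-up values: how the `%H` and `%I` directives treat AM/PM, and what `astype(bool)` does with missing values. The nullable `"boolean"` dtype is shown as one possible alternative, not as the approach used in this notebook:

```python
import datetime as dt

import numpy as np
import pandas as pd

logins = pd.Series(["1:00 PM", "12:42 PM"])

# %H is the 24-hour directive, so the AM/PM marker does not change the hour
with_24_hour = pd.to_datetime(logins, format="%H:%M %p").dt.time
# %I is the 12-hour directive and respects the AM/PM marker
with_12_hour = pd.to_datetime(logins, format="%I:%M %p").dt.time

print(with_24_hour.tolist())
print(with_12_hour.tolist())

flags = pd.Series([True, False, np.nan], dtype=object)
plain_bool = flags.astype(bool)       # NaN is truthy, so it becomes True
nullable = flags.astype("boolean")    # keeps the missing value as <NA>

print(plain_bool.tolist())
print(nullable.isna().sum())
```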


In [129]:
employees.head()

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,06:53:00,61933,4.17,True,
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,01:00:00,138705,9.34,True,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services


In [132]:
import datetime as dt

In [137]:
employees["Gender"] == "Male"

0       True
1       True
2      False
3       True
4       True
       ...  
995    False
996     True
997     True
998     True
999     True
Name: Gender, Length: 1000, dtype: bool

In [133]:
employees["Gender"].unique()

['Male', 'Female', NaN]
Categories (2, object): ['Female', 'Male']

In [134]:
employees.nunique()

First Name           200
Gender                 2
Start Date           972
Last Login Time      542
Salary               995
Bonus %              971
Senior Management      2
Team                  10
dtype: int64

In [138]:
employees["Gender"].value_counts()

Gender
Female    431
Male      424
Name: count, dtype: int64

In [138]:
employees[employees["Gender"] == "Male"]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,06:53:00,61933,4.170,True,
3,Jerry,Male,2005-03-04,01:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
5,Dennis,Male,1987-04-18,01:35:00,115163,10.125,False,Legal
...,...,...,...,...,...,...,...,...
994,George,Male,2013-06-21,05:47:00,98874,4.479,True,Marketing
996,Phillip,Male,1984-01-31,06:30:00,42392,19.675,False,Finance
997,Russell,Male,2013-05-20,12:39:00,96914,1.421,False,Product
998,Larry,Male,2013-04-20,04:45:00,60500,11.985,False,Business Development


In [139]:
on_finance_team = employees["Team"] == "Finance"
employees[on_finance_team]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,01:00:00,138705,9.340,True,Finance
7,,Female,2015-07-20,10:43:00,45906,11.598,True,Finance
14,Kimberly,Female,1999-01-14,07:13:00,41426,14.543,True,Finance
46,Bruce,Male,2009-11-28,10:47:00,114796,6.796,False,Finance
...,...,...,...,...,...,...,...,...
907,Elizabeth,Female,1998-07-27,11:12:00,137144,10.081,False,Finance
954,Joe,Male,1980-01-19,04:06:00,119667,1.148,True,Finance
987,Gloria,Female,2014-12-08,05:08:00,136709,10.331,True,Finance
992,Anthony,Male,2011-10-16,08:35:00,112769,11.625,True,Finance


In [140]:
employees[employees["Senior Management"]].head()

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,06:53:00,61933,4.17,True,
3,Jerry,Male,2005-03-04,01:00:00,138705,9.34,True,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
6,Ruby,Female,1987-08-17,04:20:00,65476,10.012,True,Product


In [141]:
employees[employees["Salary"] > 110000]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,01:00:00,138705,9.340,True,Finance
5,Dennis,Male,1987-04-18,01:35:00,115163,10.125,False,Legal
9,Frances,Female,2002-08-08,06:51:00,139852,7.524,True,Business Development
12,Brandon,Male,1980-12-01,01:08:00,112807,17.492,True,Human Resources
...,...,...,...,...,...,...,...,...
987,Gloria,Female,2014-12-08,05:08:00,136709,10.331,True,Finance
991,Rose,Female,2002-08-25,05:12:00,134505,11.051,True,Marketing
992,Anthony,Male,2011-10-16,08:35:00,112769,11.625,True,Finance
995,Henry,,2014-11-23,06:09:00,132483,16.655,False,Distribution


In [142]:
employees[employees["Bonus %"] < 1.5]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
15,Lillian,Female,2016-06-05,06:09:00,59414,1.256,False,Product
58,Theresa,Female,2010-04-11,07:18:00,72670,1.481,True,Engineering
77,Charles,Male,2004-09-14,08:13:00,107391,1.26,True,Marketing
175,Willie,Male,1998-02-17,08:20:00,146651,1.451,True,Engineering
189,Clarence,Male,1998-05-02,03:16:00,85700,1.215,False,Sales
217,Douglas,Male,1999-09-03,04:00:00,83341,1.015,True,Client Services
273,Nicholas,Male,1994-04-12,08:21:00,74669,1.113,True,Product
279,Ruby,Female,2000-11-08,07:35:00,105946,1.139,False,Business Development
365,Gloria,,1983-07-19,01:57:00,140885,1.113,False,Human Resources


In [143]:
employees[employees["Start Date"] < "1985-01-01"]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
10,Louise,Female,1980-08-12,09:01:00,63241,15.132,True,
12,Brandon,Male,1980-12-01,01:08:00,112807,17.492,True,Human Resources
18,Diana,Female,1981-10-23,10:27:00,132940,19.082,False,Client Services
28,Terry,Male,1981-11-27,06:30:00,124008,13.464,True,Client Services
37,Linda,Female,1981-10-19,08:49:00,57427,9.557,True,Client Services
...,...,...,...,...,...,...,...,...
982,Rose,Female,1982-04-06,10:43:00,91411,8.639,True,Human Resources
983,John,Male,1982-12-23,10:35:00,146907,11.738,False,Engineering
985,Stephen,,1983-07-10,08:10:00,85668,1.909,False,Legal
986,Donna,Female,1982-11-26,07:04:00,82871,17.999,False,Marketing


## Filter with More than One Condition


In [139]:
# Female employees who work in Marketing and earn over $100k a year

is_female = employees["Gender"] == "Female"
is_in_marketing = employees["Team"] == "Marketing"
salary_over_100k = employees["Salary"] > 100000
is_female & is_in_marketing & salary_over_100k

0      False
1      False
2      False
3      False
4      False
       ...  
995    False
996    False
997    False
998    False
999    False
Length: 1000, dtype: bool

In [140]:
employees[is_female & is_in_marketing & salary_over_100k]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
98,Tina,Female,2016-06-16,07:47:00,100705,16.961,True,Marketing
140,Shirley,Female,1981-02-28,01:23:00,113850,1.854,False,Marketing
158,Norma,Female,1999-02-28,08:45:00,114412,8.756,True,Marketing
305,Margaret,Female,1993-02-06,01:05:00,125220,3.733,False,Marketing
319,Jacqueline,Female,1981-11-25,03:01:00,145988,18.243,False,Marketing
379,,Female,2002-09-18,12:39:00,118906,4.537,True,Marketing
468,Janice,Female,1997-06-28,01:48:00,136032,10.696,True,Marketing
490,Judith,Female,2007-11-23,01:22:00,117055,7.461,False,Marketing
531,Virginia,Female,2010-05-02,09:10:00,123649,10.154,True,Marketing
585,Shirley,Female,1988-04-16,11:09:00,132156,2.754,False,Marketing


In [141]:
# Employees who are either senior management OR started before January 1st, 1990

is_senior_management = employees["Senior Management"]
started_in_80s = employees["Start Date"] < "1990-01-01"

employees[is_senior_management | started_in_80s]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,06:53:00,61933,4.170,True,
3,Jerry,Male,2005-03-04,01:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
5,Dennis,Male,1987-04-18,01:35:00,115163,10.125,False,Legal
...,...,...,...,...,...,...,...,...
992,Anthony,Male,2011-10-16,08:35:00,112769,11.625,True,Finance
993,Tina,Female,1997-05-15,03:53:00,56450,19.040,True,Engineering
994,George,Male,2013-06-21,05:47:00,98874,4.479,True,Marketing
996,Phillip,Male,1984-01-31,06:30:00,42392,19.675,False,Finance


In [142]:
# Employees named Robert who work in Client Services, OR anyone whose Start Date is after 2016-06-01

In [143]:
is_robert = employees["First Name"] == "Robert"
is_in_client_services = employees["Team"] == "Client Services"
start_date_after_june_2016 = employees["Start Date"] > "2016-06-01"

In [144]:
employees[(is_robert & is_in_client_services) | start_date_after_june_2016]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
15,Lillian,Female,2016-06-05,06:09:00,59414,1.256,False,Product
98,Tina,Female,2016-06-16,07:47:00,100705,16.961,True,Marketing
387,Robert,Male,1994-10-29,04:26:00,123294,19.894,False,Client Services
451,Terry,,2016-07-15,12:29:00,140002,19.49,True,Marketing
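
When `&` and `|` appear in the same filter, the placement of parentheses decides which rows come back. The sketch below uses a small made-up table (the `staff` DataFrame and its values are assumptions, not rows from `employees.csv`) to contrast the two groupings.

```python
import pandas as pd

# Hypothetical miniature of the employees table, for illustration only.
staff = pd.DataFrame({
    "First Name": ["Robert", "Robert", "Tina", "Maria"],
    "Team": ["Client Services", "Finance", "Marketing", "Finance"],
    "Start Date": pd.to_datetime(
        ["1994-10-29", "2016-07-01", "2016-06-16", "1993-04-23"]
    ),
})

is_robert = staff["First Name"] == "Robert"
in_client_services = staff["Team"] == "Client Services"
started_after_june_2016 = staff["Start Date"] > "2016-06-01"

# (A and B) or C: Roberts in Client Services, plus anyone hired after the date.
either = staff[(is_robert & in_client_services) | started_after_june_2016]

# A and (B or C): only Roberts, who are in Client Services or hired after the date.
roberts_only = staff[is_robert & (in_client_services | started_after_june_2016)]

# Inline comparisons need their own parentheses, because & binds
# more tightly than == in Python.
maria_in_finance = staff[(staff["Team"] == "Finance") & (staff["First Name"] == "Maria")]

print(either.index.tolist())
print(roberts_only.index.tolist())
```

Storing each condition in a named variable, as the cells above do, sidesteps the precedence issue because every comparison is evaluated before the masks are combined.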


In [154]:
# Legal Team or Sales Team or Product Team

legal_team = employees["Team"] == "Legal"
sales_team = employees["Team"] == "Sales"
product_team = employees["Team"] == "Product"

employees[legal_team | sales_team | product_team]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
5,Dennis,Male,1987-04-18,01:35:00,115163,10.125,False,Legal
6,Ruby,Female,1987-08-17,04:20:00,65476,10.012,True,Product
11,Julie,Female,1997-10-26,03:19:00,102508,12.637,True,Legal
13,Gary,Male,2008-01-27,11:40:00,109831,5.831,False,Sales
15,Lillian,Female,2016-06-05,06:09:00,59414,1.256,False,Product
...,...,...,...,...,...,...,...,...
981,James,Male,1993-01-15,05:19:00,148985,19.280,False,Legal
985,Stephen,,1983-07-10,08:10:00,85668,1.909,False,Legal
989,Justin,,1991-02-10,04:58:00,38344,3.794,False,Legal
997,Russell,Male,2013-05-20,12:39:00,96914,1.421,False,Product


In [155]:
target_teams = employees["Team"].isin(["Legal", "Sales", "Product"])
employees[target_teams]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
5,Dennis,Male,1987-04-18,01:35:00,115163,10.125,False,Legal
6,Ruby,Female,1987-08-17,04:20:00,65476,10.012,True,Product
11,Julie,Female,1997-10-26,03:19:00,102508,12.637,True,Legal
13,Gary,Male,2008-01-27,11:40:00,109831,5.831,False,Sales
15,Lillian,Female,2016-06-05,06:09:00,59414,1.256,False,Product
...,...,...,...,...,...,...,...,...
981,James,Male,1993-01-15,05:19:00,148985,19.280,False,Legal
985,Stephen,,1983-07-10,08:10:00,85668,1.909,False,Legal
989,Justin,,1991-02-10,04:58:00,38344,3.794,False,Legal
997,Russell,Male,2013-05-20,12:39:00,96914,1.421,False,Product
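
The two cells above return the same rows, which suggests `isin` can stand in for a chain of `|` comparisons on one column. A minimal check on a made-up Series (the `teams` values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical team labels, including one missing value.
teams = pd.Series(["Legal", "Finance", "Sales", None, "Product"], name="Team")

# One comparison per team, joined with OR.
chained = (teams == "Legal") | (teams == "Sales") | (teams == "Product")

# The same test expressed as a single membership check.
with_isin = teams.isin(["Legal", "Sales", "Product"])

print(chained.tolist())
print(with_isin.tolist())
```

The `isin` form stays short as the list of targets grows, and the list can come from a variable or another column.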


In [145]:
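employees = pd.read_csv("employees.csv", parse_dates=["Start Date"], date_format="%m/%d/%Y")
# Note: %p only shifts the hour when paired with the 12-hour directive %I.
# With %H the AM/PM marker is matched but has no effect, so the times below stay in the 01-12 range.
employees["Last Login Time"] = pd.to_datetime(employees["Last Login Time"], format="%H:%M %p").dt.time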
employees["Senior Management"] = employees["Senior Management"].astype(bool)
employees["Gender"] = employees["Gender"].astype("category")
employees.head()

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,06:53:00,61933,4.17,True,
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,01:00:00,138705,9.34,True,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services


In [147]:
employees["Team"].isnull().sum()

np.int64(43)

In [4]:
employees[employees["Team"].isnull()]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
1,Thomas,Male,1996-03-31,06:53:00,61933,4.17,True,
10,Louise,Female,1980-08-12,09:01:00,63241,15.132,True,
23,,Male,2012-06-14,04:19:00,125792,5.042,True,
32,,Male,1998-08-21,02:27:00,122340,6.417,True,
91,James,,2005-01-26,11:00:00,128771,8.309,False,
109,Christopher,Male,2000-04-22,10:15:00,37919,11.449,False,
139,,Female,1990-10-03,01:08:00,132373,10.527,True,
199,Jonathan,Male,2009-07-17,08:15:00,130581,16.736,True,
258,Michael,Male,2002-01-24,03:04:00,43586,12.659,False,
290,Jeremy,Male,1988-06-14,06:20:00,129460,13.657,True,


In [5]:
employees[employees["Team"].notnull()]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,01:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
5,Dennis,Male,1987-04-18,01:35:00,115163,10.125,False,Legal
...,...,...,...,...,...,...,...,...
995,Henry,,2014-11-23,06:09:00,132483,16.655,False,Distribution
996,Phillip,Male,1984-01-31,06:30:00,42392,19.675,False,Finance
997,Russell,Male,2013-05-20,12:39:00,96914,1.421,False,Product
998,Larry,Male,2013-04-20,04:45:00,60500,11.985,False,Business Development


In [6]:
employees[employees["First Name"].isnull() & employees["Team"].notnull()]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
7,,Female,2015-07-20,10:43:00,45906,11.598,True,Finance
25,,Male,2012-10-08,01:12:00,37076,18.576,True,Client Services
39,,Male,2016-01-29,02:33:00,122173,7.797,True,Client Services
51,,,2011-12-17,08:29:00,41126,14.009,True,Sales
62,,Female,2007-06-12,05:25:00,58112,19.414,True,Marketing
116,,Male,1991-06-22,08:58:00,76189,18.988,True,Legal
149,,Female,2014-08-17,02:00:00,86230,8.578,True,Distribution
157,,Female,2005-07-27,08:32:00,79536,14.443,True,Product
165,,Female,2014-03-23,01:28:00,59148,9.061,True,Legal
166,,Female,1991-07-09,06:52:00,42341,7.014,True,Sales
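
`isnull` and `notnull` each return a Boolean Series, so they can be summed, inverted, and combined like any other filter. A small sketch on made-up data (the `people` DataFrame is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical rows with gaps in both columns.
people = pd.DataFrame({
    "First Name": ["Thomas", np.nan, "Maria", np.nan],
    "Team": [np.nan, "Finance", "Finance", np.nan],
})

missing_team = people["Team"].isnull()
has_team = people["Team"].notnull()

# The two masks are exact complements of each other.
complements = (missing_team == ~has_team).all()

# True counts as 1, so summing a mask counts the missing values.
n_missing = missing_team.sum()

# Rows with no first name that still have a team.
nameless_with_team = people[people["First Name"].isnull() & has_team]

print(n_missing, nameless_with_team.index.tolist())
```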


In [8]:
employees[employees["Salary"].between(60000, 70000)]


Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
1,Thomas,Male,1996-03-31,06:53:00,61933,4.170,True,
6,Ruby,Female,1987-08-17,04:20:00,65476,10.012,True,Product
10,Louise,Female,1980-08-12,09:01:00,63241,15.132,True,
20,Lois,,1995-04-22,07:18:00,64714,4.934,True,Legal
41,Christine,,2015-06-28,01:08:00,66582,11.308,True,Business Development
...,...,...,...,...,...,...,...,...
965,Catherine,Female,1989-09-25,01:31:00,68164,18.393,False,Client Services
970,Alice,Female,1988-09-03,08:54:00,63571,15.397,True,Product
974,Harry,Male,2011-08-30,06:31:00,67656,16.455,True,Client Services
978,Sean,Male,1983-01-17,02:23:00,66146,11.178,False,Human Resources


In [9]:
employees[employees["Start Date"].between("1991-01-01", "1992-01-01")]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
27,Scott,,1991-07-11,06:58:00,122367,5.218,False,Legal
75,Bonnie,Female,1991-07-02,01:27:00,104897,5.118,True,Human Resources
88,Donna,Female,1991-11-27,01:59:00,64088,6.155,True,Legal
116,,Male,1991-06-22,08:58:00,76189,18.988,True,Legal
148,Patrick,,1991-07-14,02:24:00,124488,14.837,True,Sales
166,,Female,1991-07-09,06:52:00,42341,7.014,True,Sales
172,Sara,Female,1991-09-23,06:17:00,97058,9.402,False,Finance
220,,Female,1991-06-17,12:49:00,71945,5.56,True,Marketing
245,Victor,Male,1991-04-11,07:44:00,70817,17.138,False,Engineering
277,Brenda,,1991-05-29,06:32:00,82439,19.062,False,Sales
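
By default `between` includes both endpoints, so it behaves like `>=` combined with `<=`. The sketch below checks this on made-up values (the `salaries` and `dates` Series are assumptions for illustration):

```python
import pandas as pd

# Hypothetical salaries sitting on and around the boundaries.
salaries = pd.Series([59999, 60000, 65000, 70000, 70001], name="Salary")

in_range = salaries.between(60000, 70000)
manual = (salaries >= 60000) & (salaries <= 70000)

# The same rule applies to dates: a value equal to the upper bound is kept.
dates = pd.Series(pd.to_datetime(["1990-12-31", "1991-01-01", "1992-01-01"]))
in_1991_window = dates.between("1991-01-01", "1992-01-01")

print(in_range.tolist())
print(in_1991_window.tolist())
```

For the date filter above, this means an employee who started exactly on 1992-01-01 would be included; an upper bound of "1991-12-31" would restrict the result to calendar year 1991.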


In [148]:
employees["First Name"].duplicated().sum()

np.int64(799)

In [149]:
employees["Senior Management"].duplicated().sum()

np.int64(998)

In [10]:
employees[employees["First Name"].duplicated()]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
23,,Male,2012-06-14,04:19:00,125792,5.042,True,
25,,Male,2012-10-08,01:12:00,37076,18.576,True,Client Services
32,,Male,1998-08-21,02:27:00,122340,6.417,True,
34,Jerry,Male,2004-01-10,12:56:00,95734,19.096,False,Client Services
39,,Male,2016-01-29,02:33:00,122173,7.797,True,Client Services
...,...,...,...,...,...,...,...,...
995,Henry,,2014-11-23,06:09:00,132483,16.655,False,Distribution
996,Phillip,Male,1984-01-31,06:30:00,42392,19.675,False,Finance
997,Russell,Male,2013-05-20,12:39:00,96914,1.421,False,Product
998,Larry,Male,2013-04-20,04:45:00,60500,11.985,False,Business Development


In [11]:
employees[employees["First Name"].duplicated(keep="first")]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
23,,Male,2012-06-14,04:19:00,125792,5.042,True,
25,,Male,2012-10-08,01:12:00,37076,18.576,True,Client Services
32,,Male,1998-08-21,02:27:00,122340,6.417,True,
34,Jerry,Male,2004-01-10,12:56:00,95734,19.096,False,Client Services
39,,Male,2016-01-29,02:33:00,122173,7.797,True,Client Services
...,...,...,...,...,...,...,...,...
995,Henry,,2014-11-23,06:09:00,132483,16.655,False,Distribution
996,Phillip,Male,1984-01-31,06:30:00,42392,19.675,False,Finance
997,Russell,Male,2013-05-20,12:39:00,96914,1.421,False,Product
998,Larry,Male,2013-04-20,04:45:00,60500,11.985,False,Business Development


In [12]:
employees[employees["First Name"].duplicated(keep="last")]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,06:53:00,61933,4.170,True,
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,01:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
...,...,...,...,...,...,...,...,...
959,Albert,Male,1992-09-19,02:35:00,45094,5.850,True,Business Development
960,Stephen,Male,1989-10-29,11:34:00,93997,18.093,True,Business Development
970,Alice,Female,1988-09-03,08:54:00,63571,15.397,True,Product
973,Russell,Male,2013-05-10,11:08:00,137359,11.105,False,Business Development


In [13]:
employees[employees["First Name"].duplicated(keep=False)]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,06:53:00,61933,4.170,True,
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,01:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
...,...,...,...,...,...,...,...,...
995,Henry,,2014-11-23,06:09:00,132483,16.655,False,Distribution
996,Phillip,Male,1984-01-31,06:30:00,42392,19.675,False,Finance
997,Russell,Male,2013-05-20,12:39:00,96914,1.421,False,Product
998,Larry,Male,2013-04-20,04:45:00,60500,11.985,False,Business Development


In [14]:
employees[~employees["First Name"].duplicated(keep=False)]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
5,Dennis,Male,1987-04-18,01:35:00,115163,10.125,False,Legal
8,Angela,Female,2005-11-22,06:29:00,95570,18.523,True,Engineering
33,Jean,Female,1993-12-18,09:07:00,119082,16.18,False,Business Development
190,Carol,Female,1996-03-19,03:39:00,57783,9.129,False,Finance
291,Tammy,Female,1984-11-11,10:30:00,132839,17.463,True,Client Services
495,Eugene,Male,1984-05-24,10:54:00,81077,2.117,False,Sales
688,Brian,Male,2007-04-07,10:47:00,93901,17.821,True,Legal
832,Keith,Male,2003-02-12,03:02:00,120672,19.467,False,Legal
887,David,Male,2009-12-05,08:48:00,92242,15.407,False,Legal
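
The `keep` parameter of `duplicated` controls which occurrence of a repeated value is left unmarked. A compact comparison on a made-up Series (the `names` values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical names: "Jerry" repeats, and so does the missing value.
names = pd.Series(
    ["Jerry", "Larry", "Jerry", "Dennis", np.nan, np.nan], name="First Name"
)

# keep="first" (the default): the first occurrence is not marked.
first = names.duplicated(keep="first").tolist()

# keep="last": the last occurrence is not marked.
last = names.duplicated(keep="last").tolist()

# keep=False: every occurrence of a repeated value is marked.
every = names.duplicated(keep=False).tolist()

# Inverting the keep=False mask leaves values that occur exactly once.
unique_only = names[~names.duplicated(keep=False)].tolist()

print(first)
print(last)
print(every)
print(unique_only)
```

Missing values are treated as equal to each other here, which is why rows with no first name show up in the duplicated output above.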


In [16]:
employees.drop_duplicates()

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,06:53:00,61933,4.170,True,
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,01:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
...,...,...,...,...,...,...,...,...
995,Henry,,2014-11-23,06:09:00,132483,16.655,False,Distribution
996,Phillip,Male,1984-01-31,06:30:00,42392,19.675,False,Finance
997,Russell,Male,2013-05-20,12:39:00,96914,1.421,False,Product
998,Larry,Male,2013-04-20,04:45:00,60500,11.985,False,Business Development


In [17]:
employees.drop_duplicates("Team")

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,06:53:00,61933,4.17,True,
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
5,Dennis,Male,1987-04-18,01:35:00,115163,10.125,False,Legal
6,Ruby,Female,1987-08-17,04:20:00,65476,10.012,True,Product
8,Angela,Female,2005-11-22,06:29:00,95570,18.523,True,Engineering
9,Frances,Female,2002-08-08,06:51:00,139852,7.524,True,Business Development
12,Brandon,Male,1980-12-01,01:08:00,112807,17.492,True,Human Resources
13,Gary,Male,2008-01-27,11:40:00,109831,5.831,False,Sales


In [18]:
employees.drop_duplicates("Team", keep="first")

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,06:53:00,61933,4.17,True,
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
5,Dennis,Male,1987-04-18,01:35:00,115163,10.125,False,Legal
6,Ruby,Female,1987-08-17,04:20:00,65476,10.012,True,Product
8,Angela,Female,2005-11-22,06:29:00,95570,18.523,True,Engineering
9,Frances,Female,2002-08-08,06:51:00,139852,7.524,True,Business Development
12,Brandon,Male,1980-12-01,01:08:00,112807,17.492,True,Human Resources
13,Gary,Male,2008-01-27,11:40:00,109831,5.831,False,Sales


In [19]:
employees.drop_duplicates("Team", keep="last")

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
951,,Female,2010-09-14,05:19:00,143638,9.662,True,
988,Alice,Female,2004-10-05,09:34:00,47638,11.209,False,Human Resources
989,Justin,,1991-02-10,04:58:00,38344,3.794,False,Legal
990,Robin,Female,1987-07-24,01:35:00,100765,10.982,True,Client Services
993,Tina,Female,1997-05-15,03:53:00,56450,19.04,True,Engineering
994,George,Male,2013-06-21,05:47:00,98874,4.479,True,Marketing
995,Henry,,2014-11-23,06:09:00,132483,16.655,False,Distribution
996,Phillip,Male,1984-01-31,06:30:00,42392,19.675,False,Finance
997,Russell,Male,2013-05-20,12:39:00,96914,1.421,False,Product
998,Larry,Male,2013-04-20,04:45:00,60500,11.985,False,Business Development


In [20]:
employees.drop_duplicates("Team", keep=False)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team


In [21]:
employees.drop_duplicates("First Name", keep=False)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
5,Dennis,Male,1987-04-18,01:35:00,115163,10.125,False,Legal
8,Angela,Female,2005-11-22,06:29:00,95570,18.523,True,Engineering
33,Jean,Female,1993-12-18,09:07:00,119082,16.18,False,Business Development
190,Carol,Female,1996-03-19,03:39:00,57783,9.129,False,Finance
291,Tammy,Female,1984-11-11,10:30:00,132839,17.463,True,Client Services
495,Eugene,Male,1984-05-24,10:54:00,81077,2.117,False,Sales
688,Brian,Male,2007-04-07,10:47:00,93901,17.821,True,Legal
832,Keith,Male,2003-02-12,03:02:00,120672,19.467,False,Legal
887,David,Male,2009-12-05,08:48:00,92242,15.407,False,Legal


In [22]:
employees.drop_duplicates(["Senior Management", "Team"]).sort_values("Team")

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
33,Jean,Female,1993-12-18,09:07:00,119082,16.18,False,Business Development
9,Frances,Female,2002-08-08,06:51:00,139852,7.524,True,Business Development
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
18,Diana,Female,1981-10-23,10:27:00,132940,19.082,False,Client Services
60,Paula,,2005-11-23,02:01:00,48866,4.271,False,Distribution
40,Michael,Male,2008-10-10,11:25:00,99283,2.665,True,Distribution
8,Angela,Female,2005-11-22,06:29:00,95570,18.523,True,Engineering
54,Sara,Female,2007-08-15,09:23:00,83677,8.999,False,Engineering
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,01:00:00,138705,9.34,True,Finance


In [23]:
employees.drop_duplicates(["Senior Management", "Team"], keep="last").sort_values("Team")

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
971,Patrick,Male,2002-12-30,02:01:00,75423,5.368,True,Business Development
998,Larry,Male,2013-04-20,04:45:00,60500,11.985,False,Business Development
965,Catherine,Female,1989-09-25,01:31:00,68164,18.393,False,Client Services
990,Robin,Female,1987-07-24,01:35:00,100765,10.982,True,Client Services
946,,Female,1985-09-15,01:50:00,133472,16.941,True,Distribution
995,Henry,,2014-11-23,06:09:00,132483,16.655,False,Distribution
993,Tina,Female,1997-05-15,03:53:00,56450,19.04,True,Engineering
984,Maria,Female,2011-10-15,04:53:00,43455,13.04,False,Engineering
996,Phillip,Male,1984-01-31,06:30:00,42392,19.675,False,Finance
992,Anthony,Male,2011-10-16,08:35:00,112769,11.625,True,Finance
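
`drop_duplicates` accepts the same `keep` options as `duplicated`, applied to one column or to a combination of columns. A sketch on a made-up table (the `staff` DataFrame is an assumption for illustration):

```python
import pandas as pd

# Hypothetical rows: Marketing and Finance each appear twice.
staff = pd.DataFrame({
    "First Name": ["Douglas", "Maria", "Jerry", "Larry", "Dennis"],
    "Senior Management": [True, False, True, True, False],
    "Team": ["Marketing", "Finance", "Finance", "Marketing", "Legal"],
})

# First row seen for each team.
first_per_team = staff.drop_duplicates(subset="Team")

# Last row seen for each team.
last_per_team = staff.drop_duplicates(subset="Team", keep="last")

# Only teams that appear exactly once.
singletons = staff.drop_duplicates(subset="Team", keep=False)

# Uniqueness judged on the pair (Senior Management, Team).
per_pair = staff.drop_duplicates(subset=["Senior Management", "Team"])

print(first_per_team.index.tolist())
print(last_per_team.index.tolist())
print(singletons.index.tolist())
print(per_pair.index.tolist())
```

With `keep=False` on a column where every value repeats, nothing survives, which matches the empty result for `"Team"` shown earlier.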


In [24]:
employees["Gender"].unique()

['Male', 'Female', NaN]
Categories (2, object): ['Female', 'Male']

In [25]:
type(employees["Gender"].unique())

pandas.core.arrays.categorical.Categorical

In [27]:
employees["Team"].unique()

array(['Marketing', nan, 'Finance', 'Client Services', 'Legal', 'Product',
       'Engineering', 'Business Development', 'Human Resources', 'Sales',
       'Distribution'], dtype=object)

In [26]:
type(employees["Team"].unique())

numpy.ndarray

In [28]:
employees["Team"].nunique()

10

In [29]:
employees["Team"].nunique(dropna=True)

10

In [30]:
employees["Team"].nunique(dropna=False)

11

In [31]:
employees.nunique()

First Name           200
Gender                 2
Start Date           972
Last Login Time      542
Salary               995
Bonus %              971
Senior Management      2
Team                  10
dtype: int64
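
The difference between 10 and 11 above comes from how missing values are counted. A small sketch on made-up data (the `teams` Series is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical team labels with two missing entries.
teams = pd.Series(["Marketing", np.nan, "Finance", "Marketing", np.nan], name="Team")

# unique() lists each distinct value once, and the missing value is one of them.
distinct = teams.unique()

# nunique() skips missing values unless dropna=False is passed.
n_without_nan = teams.nunique()
n_with_nan = teams.nunique(dropna=False)

print(len(distinct), n_without_nan, n_with_nan)
```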

## The set_index and reset_index Methods
- The `set_index` method sets an existing column as the index of the DataFrame.
- The `reset_index` method sets the standard ascending numeric index as the index of the DataFrame.
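
A minimal sketch of both methods on a made-up table (the `staff` DataFrame is an assumption for illustration). It also shows why calling `reset_index` repeatedly without `drop=True` leaves extra `index` and `level_0` columns behind.

```python
import pandas as pd

# Hypothetical three-row table.
staff = pd.DataFrame({
    "First Name": ["Douglas", "Thomas", "Maria"],
    "Start Date": pd.to_datetime(["1993-08-06", "1996-03-31", "1993-04-23"]),
})

# set_index moves the column out of the data and into the index.
by_date = staff.set_index("Start Date")

# reset_index moves the index back into a regular column...
restored = by_date.reset_index()

# ...unless drop=True is passed, in which case the old index is discarded.
discarded = by_date.reset_index(drop=True)

# Each plain reset_index call on a numeric index adds a column:
# first "index", then "level_0" because "index" is already taken.
twice = staff.reset_index().reset_index()

print(by_date.columns.tolist())
print(restored.columns.tolist())
print(discarded.columns.tolist())
print(twice.columns.tolist())
```

Both methods return a new DataFrame, so the result has to be assigned back to a variable to persist.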

In [32]:
employees

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,06:53:00,61933,4.170,True,
2,Maria,Female,1993-04-23,11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,01:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,04:47:00,101004,1.389,True,Client Services
...,...,...,...,...,...,...,...,...
995,Henry,,2014-11-23,06:09:00,132483,16.655,False,Distribution
996,Phillip,Male,1984-01-31,06:30:00,42392,19.675,False,Finance
997,Russell,Male,2013-05-20,12:39:00,96914,1.421,False,Product
998,Larry,Male,2013-04-20,04:45:00,60500,11.985,False,Business Development


In [34]:
employees = employees.reset_index().set_index("Start Date")
employees.head()

Unnamed: 0_level_0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team
Start Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1993-08-06,0,Douglas,Male,12:42:00,97308,6.945,True,Marketing
1996-03-31,1,Thomas,Male,06:53:00,61933,4.17,True,
1993-04-23,2,Maria,Female,11:17:00,130590,11.858,False,Finance
2005-03-04,3,Jerry,Male,01:00:00,138705,9.34,True,Finance
1998-01-24,4,Larry,Male,04:47:00,101004,1.389,True,Client Services


In [36]:
employees = employees.reset_index()
employees.head()

Unnamed: 0,Start Date,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team
0,1993-08-06,0,Douglas,Male,12:42:00,97308,6.945,True,Marketing
1,1996-03-31,1,Thomas,Male,06:53:00,61933,4.17,True,
2,1993-04-23,2,Maria,Female,11:17:00,130590,11.858,False,Finance
3,2005-03-04,3,Jerry,Male,01:00:00,138705,9.34,True,Finance
4,1998-01-24,4,Larry,Male,04:47:00,101004,1.389,True,Client Services


In [38]:
employees = employees.reset_index().set_index("Start Date")
employees.head()

Unnamed: 0_level_0,level_0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team
Start Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
1993-08-06,0,0,Douglas,Male,12:42:00,97308,6.945,True,Marketing
1996-03-31,1,1,Thomas,Male,06:53:00,61933,4.17,True,
1993-04-23,2,2,Maria,Female,11:17:00,130590,11.858,False,Finance
2005-03-04,3,3,Jerry,Male,01:00:00,138705,9.34,True,Finance
1998-01-24,4,4,Larry,Male,04:47:00,101004,1.389,True,Client Services


In [39]:
employees = employees.reset_index(drop=True)
employees.head()

Unnamed: 0,level_0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team
0,0,0,Douglas,Male,12:42:00,97308,6.945,True,Marketing
1,1,1,Thomas,Male,06:53:00,61933,4.17,True,
2,2,2,Maria,Female,11:17:00,130590,11.858,False,Finance
3,3,3,Jerry,Male,01:00:00,138705,9.34,True,Finance
4,4,4,Larry,Male,04:47:00,101004,1.389,True,Client Services


In [40]:
employees.iloc[5]

level_0                     5
index                       5
First Name             Dennis
Gender                   Male
Last Login Time      01:35:00
Salary                 115163
Bonus %                10.125
Senior Management       False
Team                    Legal
Name: 5, dtype: object

In [41]:
employees.iloc[[15, 20]]

Unnamed: 0,level_0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team
15,15,15,Lillian,Female,06:09:00,59414,1.256,False,Product
20,20,20,Lois,,07:18:00,64714,4.934,True,Legal


In [42]:
employees.iloc[4:8]

Unnamed: 0,level_0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team
4,4,4,Larry,Male,04:47:00,101004,1.389,True,Client Services
5,5,5,Dennis,Male,01:35:00,115163,10.125,False,Legal
6,6,6,Ruby,Female,04:20:00,65476,10.012,True,Product
7,7,7,,Female,10:43:00,45906,11.598,True,Finance


In [44]:
employees.iloc[:6]


Unnamed: 0,level_0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team
0,0,0,Douglas,Male,12:42:00,97308,6.945,True,Marketing
1,1,1,Thomas,Male,06:53:00,61933,4.17,True,
2,2,2,Maria,Female,11:17:00,130590,11.858,False,Finance
3,3,3,Jerry,Male,01:00:00,138705,9.34,True,Finance
4,4,4,Larry,Male,04:47:00,101004,1.389,True,Client Services
5,5,5,Dennis,Male,01:35:00,115163,10.125,False,Legal


In [46]:
employees.iloc[980:]

Unnamed: 0,level_0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team
980,980,980,Kimberly,Female,12:57:00,46233,8.862,True,Engineering
981,981,981,James,Male,05:19:00,148985,19.28,False,Legal
982,982,982,Rose,Female,10:43:00,91411,8.639,True,Human Resources
983,983,983,John,Male,10:35:00,146907,11.738,False,Engineering
984,984,984,Maria,Female,04:53:00,43455,13.04,False,Engineering
985,985,985,Stephen,,08:10:00,85668,1.909,False,Legal
986,986,986,Donna,Female,07:04:00,82871,17.999,False,Marketing
987,987,987,Gloria,Female,05:08:00,136709,10.331,True,Finance
988,988,988,Alice,Female,09:34:00,47638,11.209,False,Human Resources
989,989,989,Justin,,04:58:00,38344,3.794,False,Legal


In [47]:
employees.iloc[0, 2]

'Douglas'

In [48]:
employees.iloc[3, 5]

np.int64(138705)

In [49]:
employees.iloc[[0, 2], 3]


0      Male
2    Female
Name: Gender, dtype: category
Categories (2, object): ['Female', 'Male']

In [50]:
employees.iloc[[0, 2], [3, 5]]

Unnamed: 0,Gender,Salary
0,Male,97308
2,Female,130590


In [51]:
employees.iloc[:7, :3]

Unnamed: 0,level_0,index,First Name
0,0,0,Douglas
1,1,1,Thomas
2,2,2,Maria
3,3,3,Jerry
4,4,4,Larry
5,5,5,Dennis
6,6,6,Ruby


In [52]:
employees

Unnamed: 0,level_0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team
0,0,0,Douglas,Male,12:42:00,97308,6.945,True,Marketing
1,1,1,Thomas,Male,06:53:00,61933,4.170,True,
2,2,2,Maria,Female,11:17:00,130590,11.858,False,Finance
3,3,3,Jerry,Male,01:00:00,138705,9.340,True,Finance
4,4,4,Larry,Male,04:47:00,101004,1.389,True,Client Services
...,...,...,...,...,...,...,...,...,...
995,995,995,Henry,,06:09:00,132483,16.655,False,Distribution
996,996,996,Phillip,Male,06:30:00,42392,19.675,False,Finance
997,997,997,Russell,Male,12:39:00,96914,1.421,False,Product
998,998,998,Larry,Male,04:45:00,60500,11.985,False,Business Development


In [53]:
employees.loc[0 , "First Name"] = "Connery"

In [54]:
employees

Unnamed: 0,level_0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team
0,0,0,Connery,Male,12:42:00,97308,6.945,True,Marketing
1,1,1,Thomas,Male,06:53:00,61933,4.170,True,
2,2,2,Maria,Female,11:17:00,130590,11.858,False,Finance
3,3,3,Jerry,Male,01:00:00,138705,9.340,True,Finance
4,4,4,Larry,Male,04:47:00,101004,1.389,True,Client Services
...,...,...,...,...,...,...,...,...,...
995,995,995,Henry,,06:09:00,132483,16.655,False,Distribution
996,996,996,Phillip,Male,06:30:00,42392,19.675,False,Finance
997,997,997,Russell,Male,12:39:00,96914,1.421,False,Product
998,998,998,Larry,Male,04:45:00,60500,11.985,False,Business Development
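
Assigning through `loc` with a row label and a column name writes directly into the DataFrame, as the cell above does for row 0. A short sketch on made-up data (the `staff` DataFrame and the 70000 floor are assumptions for illustration) shows the same pattern with a single label and with a Boolean mask.

```python
import pandas as pd

# Hypothetical three-row table.
staff = pd.DataFrame({
    "First Name": ["Douglas", "Thomas", "Maria"],
    "Salary": [97308, 61933, 130590],
})

# Overwrite one cell: row label 0, column "First Name".
staff.loc[0, "First Name"] = "Connery"

# Overwrite every cell in a column where a condition holds.
staff.loc[staff["Salary"] < 70000, "Salary"] = 70000

print(staff["First Name"].tolist())
print(staff["Salary"].tolist())
```

Doing the row and column selection in one `loc` call is generally safer than chaining two square-bracket lookups, which may end up modifying a temporary copy.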


In [77]:
employees.drop(columns=["level_0"])

Unnamed: 0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team,Marketing,NaN,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
0,0,Connery,Male,12:42:00,97308,6.945,True,Marketing,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
1,1,Thomas,Male,06:53:00,61933,4.170,True,,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
2,2,Maria,Female,11:17:00,130590,11.858,False,Finance,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
3,3,Jerry,Male,01:00:00,138705,9.340,True,Finance,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
4,4,Larry,Male,04:47:00,101004,1.389,True,Client Services,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,995,Henry,,06:09:00,132483,16.655,False,Distribution,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
996,996,Phillip,Male,06:30:00,42392,19.675,False,Finance,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
997,997,Russell,Male,12:39:00,96914,1.421,False,Product,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
998,998,Larry,Male,04:45:00,60500,11.985,False,Business Development,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution


In [78]:
employees

Unnamed: 0,level_0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team,Marketing,NaN,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
0,0,0,Connery,Male,12:42:00,97308,6.945,True,Marketing,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
1,1,1,Thomas,Male,06:53:00,61933,4.170,True,,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
2,2,2,Maria,Female,11:17:00,130590,11.858,False,Finance,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
3,3,3,Jerry,Male,01:00:00,138705,9.340,True,Finance,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
4,4,4,Larry,Male,04:47:00,101004,1.389,True,Client Services,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,995,995,Henry,,06:09:00,132483,16.655,False,Distribution,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
996,996,996,Phillip,Male,06:30:00,42392,19.675,False,Finance,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
997,997,997,Russell,Male,12:39:00,96914,1.421,False,Product,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
998,998,998,Larry,Male,04:45:00,60500,11.985,False,Business Development,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution


In [79]:
employees.drop(columns=["level_0"], inplace=True)

In [80]:
employees

Unnamed: 0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team,Marketing,NaN,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
0,0,Connery,Male,12:42:00,97308,6.945,True,Marketing,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
1,1,Thomas,Male,06:53:00,61933,4.170,True,,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
2,2,Maria,Female,11:17:00,130590,11.858,False,Finance,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
3,3,Jerry,Male,01:00:00,138705,9.340,True,Finance,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
4,4,Larry,Male,04:47:00,101004,1.389,True,Client Services,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,995,Henry,,06:09:00,132483,16.655,False,Distribution,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
996,996,Phillip,Male,06:30:00,42392,19.675,False,Finance,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
997,997,Russell,Male,12:39:00,96914,1.421,False,Product,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
998,998,Larry,Male,04:45:00,60500,11.985,False,Business Development,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
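
The cells above show that `drop` leaves `employees` untouched unless `inplace=True` is passed. Reassigning the result is an equivalent alternative; the sketch below uses a made-up table (the `staff` DataFrame is an assumption for illustration).

```python
import pandas as pd

# Hypothetical table carrying leftover index columns.
staff = pd.DataFrame({
    "level_0": [0, 1],
    "index": [0, 1],
    "First Name": ["Connery", "Thomas"],
})

# drop returns a new DataFrame; the original still has the column.
trimmed = staff.drop(columns=["level_0"])
still_there = "level_0" in staff.columns

# Reassigning the result makes the change stick, like inplace=True.
staff = staff.drop(columns=["level_0", "index"])

print(trimmed.columns.tolist(), still_there, staff.columns.tolist())
```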


In [81]:
employees.sample()

Unnamed: 0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team,Marketing,NaN,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
302,302,Adam,Male,11:59:00,71276,5.027,True,Human Resources,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution


In [82]:
employees.sample(n=5)


Unnamed: 0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team,Marketing,NaN,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
273,273,Nicholas,Male,08:21:00,74669,1.113,True,Product,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
916,916,Marilyn,Female,07:18:00,118369,7.696,True,Business Development,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
209,209,Emily,Female,11:25:00,89434,11.295,False,Engineering,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
941,941,William,Male,08:33:00,104840,15.653,True,Engineering,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
204,204,Willie,Male,09:45:00,55281,4.935,True,Marketing,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution


In [83]:
employees.sample(n=3, axis="rows")

Unnamed: 0,index,First Name,Gender,Last Login Time,Salary,Bonus %,Senior Management,Team,Marketing,NaN,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
47,47,Kathy,Female,04:51:00,66820,9.0,True,Client Services,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
652,652,Willie,Male,05:39:00,141932,1.017,True,Engineering,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution
576,576,Michael,Male,05:35:00,35013,14.879,False,Product,Marketing,,Finance,Client Services,Legal,Product,Engineering,Business Development,Human Resources,Sales,Distribution


In [84]:
employees.sample(n=2, axis="columns")

Unnamed: 0,Finance,Gender
0,Finance,Male
1,Finance,Male
2,Finance,Female
3,Finance,Male
4,Finance,Male
...,...,...
995,Finance,
996,Finance,Male
997,Finance,Male
998,Finance,Male
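
The `sample` calls above return a different random selection on every run. As a minimal sketch on a small made-up DataFrame (the `df` variable and its values are hypothetical, not part of the notebook's data), the block below shows how `axis` controls whether rows or columns are drawn, and how the `random_state` parameter can make the draw repeatable:

```python
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4],
        "B": [10, 20, 30, 40],
        "C": [100, 200, 300, 400],
    }
)

# axis="rows" picks random rows and keeps every column.
row_sample = df.sample(n=2, axis="rows", random_state=0)

# axis="columns" picks random columns and keeps every row.
col_sample = df.sample(n=2, axis="columns", random_state=0)

# The same seed yields the same selection on every call.
repeat = df.sample(n=2, axis="rows", random_state=0)

print(row_sample.shape)  # (2, 3)
print(col_sample.shape)  # (4, 2)
```

Leaving out `random_state` restores the behavior seen in the cells above, where each run produces new rows or columns.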


## The nsmallest and nlargest Methods
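- The `nlargest` method returns a specified number of rows with the largest values from a given column. The number of rows defaults to 5.
- The `nsmallest` method returns the same number of rows with the smallest values from a given column.
- Both methods are available on a **Series** and on a **DataFrame**. The **DataFrame** version also takes the name of the column to rank by.
- When only a few rows are needed, `nlargest` and `nsmallest` are generally more efficient than sorting the entire **DataFrame** and calling `head`.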
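
As a minimal sketch of the points above, the block below calls the methods directly on a DataFrame and compares the result with a full sort. The `staff` DataFrame and its values are made up for illustration and are not the notebook's `employees.csv` data.

```python
import pandas as pd

# Hypothetical toy data; the names and salaries are made up for illustration.
staff = pd.DataFrame(
    {
        "Name": ["Ana", "Ben", "Cy", "Dee", "Eli"],
        "Salary": [52000, 87000, 61000, 45000, 99000],
    }
)

# DataFrame.nlargest takes the row count and the column to rank by.
top_two = staff.nlargest(2, "Salary")

# The same rows via a full sort followed by head().
top_two_sorted = staff.sort_values("Salary", ascending=False).head(2)

# DataFrame.nsmallest works the same way for the lowest values.
bottom_two = staff.nsmallest(2, "Salary")

print(top_two)
print(bottom_two)
print(top_two.equals(top_two_sorted))
```

Called on a Series such as `staff["Salary"]`, the methods need only the row count, which is the form used in the cells that follow.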

In [86]:
employees = pd.read_csv("employees.csv")
employees.sort_values("Salary", ascending=False).head(4)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
644,Katherine,Female,8/13/1996,12:21 AM,149908,18.912,False,Finance
429,Rose,Female,5/28/2015,8:40 AM,149903,5.63,False,Human Resources
828,Cynthia,Female,7/12/2006,8:55 AM,149684,7.864,False,Product
186,,Female,2/23/2005,9:50 PM,149654,1.825,,Sales


In [87]:
employees["Salary"].nlargest(4)

644    149908
429    149903
828    149684
186    149654
Name: Salary, dtype: int64

In [88]:
employees["Salary"].nsmallest(4)

576    35013
238    35061
82     35095
63     35203
Name: Salary, dtype: int64

In [89]:
employees.sort_values("Salary", ascending=True).head(4)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
576,Michael,Male,7/30/1993,5:35 PM,35013,14.879,False,Product
238,Kevin,Male,3/25/1982,7:31 AM,35061,5.128,False,Legal
82,Steven,Male,3/30/1980,9:20 PM,35095,8.379,True,Client Services
63,Matthew,Male,1/2/2013,10:33 PM,35203,18.04,False,Human Resources


In [94]:
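# Team is not defined in the cells shown here. The outputs below suggest it is
# a GroupBy object, presumably created with employees.groupby("Team").
len(Team)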

10

In [91]:
Team.size()

Team
Business Development    101
Client Services         106
Distribution             90
Engineering              92
Finance                 102
Human Resources          91
Legal                    88
Marketing                98
Product                  95
Sales                    94
dtype: int64

In [92]:
Team.first()

Unnamed: 0_level_0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management
Team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Business Development,Frances,Female,8/8/2002,6:51 AM,139852,7.524,True
Client Services,Larry,Male,1/24/1998,4:47 PM,101004,1.389,True
Distribution,Michael,Male,10/10/2008,11:25 AM,99283,2.665,True
Engineering,Angela,Female,11/22/2005,6:29 AM,95570,18.523,True
Finance,Maria,Female,4/23/1993,11:17 AM,130590,11.858,False
Human Resources,Brandon,Male,12/1/1980,1:08 AM,112807,17.492,True
Legal,Dennis,Male,4/18/1987,1:35 AM,115163,10.125,False
Marketing,Douglas,Male,8/6/1993,12:42 PM,97308,6.945,True
Product,Ruby,Female,8/17/1987,4:20 PM,65476,10.012,True
Sales,Gary,Male,1/27/2008,11:40 PM,109831,5.831,False


In [93]:
Team.last()

Unnamed: 0_level_0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management
Team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Business Development,Larry,Male,4/20/2013,4:45 PM,60500,11.985,False
Client Services,Robin,Female,7/24/1987,1:35 PM,100765,10.982,True
Distribution,Henry,Female,11/23/2014,6:09 AM,132483,16.655,False
Engineering,Tina,Female,5/15/1997,3:53 PM,56450,19.04,True
Finance,Phillip,Male,1/31/1984,6:30 AM,42392,19.675,False
Human Resources,Alice,Female,10/5/2004,9:34 AM,47638,11.209,False
Legal,Justin,Male,2/10/1991,4:58 PM,38344,3.794,False
Marketing,George,Male,6/21/2013,5:47 PM,98874,4.479,True
Product,Russell,Male,5/20/2013,12:39 PM,96914,1.421,False
Sales,Albert,Male,5/15/2012,6:24 PM,129949,10.169,True
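
The cells above call `len`, `size`, `first`, and `last` on a variable named `Team` whose definition is not shown. As a sketch under the assumption that `Team` is the result of `groupby("Team")`, the block below reproduces the same four calls on a small made-up DataFrame (`staff` and `teams` are hypothetical names):

```python
import pandas as pd

# Hypothetical toy data; not the employees.csv file used in the notebook.
staff = pd.DataFrame(
    {
        "Name": ["Ana", "Ben", "Cy", "Dee", "Eli"],
        "Team": ["Sales", "Legal", "Sales", "Legal", "Sales"],
        "Salary": [52000, 87000, 61000, 45000, 99000],
    }
)

# Assumption: a GroupBy object like this is what `Team` refers to above.
teams = staff.groupby("Team")

print(len(teams))     # number of groups
print(teams.size())   # number of rows in each group
print(teams.first())  # first non-null value of each column, per group
print(teams.last())   # last non-null value of each column, per group
```

Because `first` and `last` skip missing values column by column, a row in their output can combine values from different original rows. That may explain why the Distribution row in the `last()` output above shows a Gender for Henry, although row 995 of `employees` has none.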
