# AAI614: Data Science & its Applications

*Notebook 2.1: Practice with Pandas*

<a href="https://colab.research.google.com/github/harmanani/AAI614/blob/main/Week%202/Notebook2.1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

*This notebook contains excerpts from [Pandas in Action](https://www.manning.com/books/pandas-in-action) by Boris Paskhaver.*

### 1. Read the file

In [None]:
import pandas as pd
import numpy as np
import ssl

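# Workaround: skip HTTPS certificate verification so the download also works
# where root certificates are missing. Avoid this for untrusted sources.
ssl._create_default_https_context = ssl._create_unverified_context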

nba = pd.read_csv("https://raw.githubusercontent.com/harmanani/AAI614/main/Week%202/nba.csv")

### 1. Shared and Exclusive Attributes between Series and DataFrames

In [None]:
pd.Series([1, 2, 3]).dtype

In [None]:
nba.dtypes

In [None]:
nba.dtypes.value_counts()

In [None]:
nba.index

In [None]:
nba.columns

In [None]:
nba.ndim

In [None]:
nba.shape

In [None]:
nba.size

In [None]:
nba.count()

In [None]:
nba.count().sum()

In [None]:
data = {
    "A": [1, np.nan],
    "B": [2, 3]
}

df = pd.DataFrame(data)
df

In [None]:
df.size

In [None]:
df.count()

In [None]:
df.count().sum()
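The cells above contrast `size`, which counts every cell, with `count`, which skips missing values. The sketch below repeats the comparison on a slightly larger made-up frame; the name `toy` and its values are illustrative assumptions, not part of the NBA data set.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 frame with two missing values (illustrative data only).
toy = pd.DataFrame({
    "A": [1.0, np.nan],
    "B": [2.0, 3.0],
    "C": [np.nan, 4.0],
})

total_cells = toy.size                  # rows * columns, missing values included
non_missing = int(toy.count().sum())    # non-missing values across all columns
missing = int(toy.isna().sum().sum())   # missing values across all columns

print(total_cells, non_missing, missing)
```

The difference between `size` and `count().sum()` is the number of missing values, which `isna().sum().sum()` reports directly.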

### 2. Shared Methods between Series and DataFrames

In [None]:
nba.head(2)

In [None]:
nba.tail(n = 3)

In [None]:
nba.tail()

In [None]:
nba.sample(3)

In [None]:
nba.nunique()

In [None]:
nba.max()

In [None]:
nba.min()

In [None]:
nba.nlargest(n = 4, columns = "Salary")

In [None]:
nba['Birthday'] = pd.to_datetime(nba['Birthday'])

In [None]:
nba.nsmallest(n = 3, columns = ["Birthday"])

#### The call below is commented out because it raises an error on recent pandas versions: not every column supports summation. No need to panic :-)

In [None]:
#nba.sum()

In [None]:
nba.sum(numeric_only = True)

In [None]:
nba.mean(numeric_only = True)

In [None]:
nba.median(numeric_only = True)

In [None]:
nba.mode(numeric_only = True)

In [None]:
nba.std(numeric_only = True)
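The `numeric_only = True` argument used above restricts an aggregation to numeric columns. As a rough sketch, the same result can be obtained by selecting the numeric columns first with `select_dtypes`. The frame `mixed` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical mixed-type frame (illustrative values, not the NBA data).
mixed = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy"],
    "Joined": pd.to_datetime(["2020-01-01", "2021-06-15", "2019-03-30"]),
    "Pay": [100, 250, 175],
})

# Option 1: let the aggregation skip non-numeric columns.
numeric_sum = mixed.sum(numeric_only=True)

# Option 2: keep only the numeric columns, then aggregate.
selected_sum = mixed.select_dtypes(include="number").sum()

average_pay = mixed.mean(numeric_only=True)["Pay"]
print(numeric_sum, average_pay)
```

Both options ignore the text and datetime columns, so only `Pay` appears in the result.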

## 3. Sorting a DataFrame

### 3.1 Sorting by Single Column

In [None]:
# The two lines below are equivalent
nba.sort_values("Name")
nba.sort_values(by = "Name")

In [None]:
nba.sort_values("Name", ascending = False).head()

In [None]:
nba.sort_values("Birthday", ascending = False).head()

### 3.2 Sorting by Multiple Columns

In [None]:
nba.sort_values(by = ["Team", "Name"])

In [None]:
nba.sort_values(["Team", "Name"], ascending = False)

In [None]:
nba.sort_values(
    by = ["Team", "Salary"], ascending = [True, False]
)

In [None]:
nba = nba.sort_values(
    by = ["Team", "Salary"],
    ascending = [True, False]
)
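Passing a list to `ascending` sets one sort direction per column in `by`. The sketch below shows the effect on a four-row frame; `roster` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical roster (illustrative values only).
roster = pd.DataFrame({
    "Team": ["B", "A", "B", "A"],
    "Name": ["Dee", "Cal", "Eve", "Abe"],
    "Salary": [300, 100, 500, 200],
})

# Teams in ascending order; within each team, highest salary first.
ordered = roster.sort_values(by=["Team", "Salary"], ascending=[True, False])
print(ordered)
```

`sort_values` returns a new DataFrame, so `roster` keeps its original row order unless the result is assigned back to it.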

## 5 Sorting by Index

In [None]:
nba.head()

### 5.1 Sorting by Row Index

In [None]:
# The two lines below are equivalent
nba.sort_index().head()
nba.sort_index(ascending = True).head()

In [None]:
nba.sort_index(ascending = False).head()

In [None]:
nba = nba.sort_index()

### 5.2 Sorting by Column Index

In [None]:
# The two lines below are equivalent
nba.sort_index(axis = "columns").head()
nba.sort_index(axis = 1).head()

In [None]:
nba.sort_index(axis = "columns", ascending = False).head()

### 5.3 Setting a New Index

In [None]:
# Equivalent to the call in the next cell
nba.set_index(keys = "Name")

In [None]:
nba.set_index("Name")
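Neither call above changes `nba`, because `set_index` returns a new DataFrame. A minimal sketch with a made-up frame (`people` is an illustrative name):

```python
import pandas as pd

# Hypothetical two-row frame (illustrative values only).
people = pd.DataFrame({"Name": ["Ann", "Bob"], "Pay": [100, 250]})

# set_index returns a new frame; by default the column moves into the index.
indexed = people.set_index("Name")

print(people.columns.tolist())    # original frame still has both columns
print(indexed.columns.tolist())   # new frame has "Name" as its index
```

To keep the new index, assign the result back to a variable.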

## 6 Selecting Columns and Rows from a DataFrame

### 6.1 Selecting a Single Column from a DataFrame

In [None]:
nba.Salary

In [None]:
nba["Position"]

### 6.2 Selecting Multiple Columns from a DataFrame

In [None]:
nba[["Salary", "Birthday"]].head()

In [None]:
nba[["Birthday", "Salary"]].head()

In [None]:
nba.select_dtypes(include = "object")

In [None]:
nba.select_dtypes(exclude = ["object", "int"])
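`select_dtypes` filters columns by data type, and `include` and `exclude` are complementary. The sketch below uses the generic `"number"` selector on a made-up frame; `mixed` and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with one text, one integer, and one float column.
mixed = pd.DataFrame({
    "Name": ["Ann", "Bob"],
    "Pay": [100, 250],
    "Rate": [1.5, 2.5],
})

numeric_cols = mixed.select_dtypes(include="number").columns.tolist()
other_cols = mixed.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols, other_cols)
```

Every column lands in exactly one of the two results, so together they cover the whole frame.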

## 7 Selecting Rows from a DataFrame

### 7.1 Extracting Rows by Index Label

In [None]:
nba

In [None]:
# Move the "Name" column into the index (the column is dropped by default)
nba = nba.set_index("Name")

In [None]:
nba.loc["LeBron James"]

In [None]:
nba.loc[["Kawhi Leonard", "Paul George"]]

In [None]:
nba.loc[["Paul George", "Kawhi Leonard"]]

In [None]:
nba.sort_index().loc["Otto Porter":"Patrick Beverley"]

In [None]:
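# Plain Python list slicing excludes the stop position,
# unlike label slicing with .loc, which includes the stop label
players = ["Otto Porter", "PJ Dozier", "PJ Washington"]
players[0:2]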

In [None]:
nba.sort_index().loc["Zach Collins":]

In [None]:
nba.sort_index().loc[:"Al Horford"]
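Label slices with `loc` include the stop label, while Python list slices and `iloc` slices exclude the stop position. A small sketch of the difference, using a made-up Series (`pay` and its labels are illustrative):

```python
import pandas as pd

# Hypothetical Series with a sorted label index (illustrative values only).
names = ["Al", "Bo", "Cy", "Di"]
pay = pd.Series([10, 20, 30, 40], index=names)

list_slice = names[0:2]            # positions 0 and 1; the stop is excluded
label_slice = pay.loc["Al":"Cy"]   # labels Al through Cy; the stop is included
position_slice = pay.iloc[0:2]     # positional; the stop is excluded, as in a list

print(list_slice, label_slice.index.tolist(), position_slice.tolist())
```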

**NOTE**: I've commented out the code below so that the Notebook can run without raising an error.

In [None]:
#nba.loc["Bugs Bunny"]

### 7.2 Extracting Rows by Index Position

In [None]:
nba.iloc[300]

In [None]:
nba.iloc[[100, 200, 300, 400]]

In [None]:
nba.iloc[400:404]

In [None]:
nba.iloc[:2]

In [None]:
nba.iloc[447:]

In [None]:
nba.iloc[-10:-6]

In [None]:
nba.iloc[0:10:2]
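Negative positions and step values in `iloc` follow the same rules as Python list slicing. The sketch below checks this on a ten-element Series; `letters` is a made-up example.

```python
import pandas as pd

# Hypothetical Series holding the letters a through j at positions 0 through 9.
letters = pd.Series(list("abcdefghij"))

from_end = letters.iloc[-4:-2].tolist()      # 4th-from-last up to, not including, 2nd-from-last
every_other = letters.iloc[0:10:2].tolist()  # every second element of the first ten

print(from_end, every_other)
```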

### 7.3 Extracting Values from Specific Columns

In [None]:
nba['Team'].loc["Giannis Antetokounmpo"]

In [None]:
nba.loc["James Harden", ["Position", "Birthday"]]

In [None]:
nba.loc[
    ["Russell Westbrook", "Anthony Davis"],
    ["Team", "Salary"]
]

In [None]:
nba.loc["Joel Embiid", "Position":"Salary"]

In [None]:
nba.loc["Joel Embiid", "Salary":"Position"]

In [None]:
nba.iloc[57, 3]

In [None]:
nba.iloc[100:104, :3]

In [None]:
nba.at["Austin Rivers", "Birthday"]

In [None]:
nba.iat[263, 1]

In [None]:
%%timeit
nba.at["Austin Rivers", "Birthday"]

In [None]:
%%timeit
nba.loc["Austin Rivers", "Birthday"]

In [None]:
%%timeit
nba.iat[263, 1]

In [None]:
%%timeit
nba.iloc[263, 1]
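The `%%timeit` cells above need IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module. The frame `stats` is a made-up example, and the timings depend on the machine and pandas version, so no particular result is assumed here.

```python
import timeit

import pandas as pd

# Hypothetical frame indexed by name (illustrative values only).
stats = pd.DataFrame(
    {"Team": ["X", "Y", "Z"], "Pay": [100, 250, 175]},
    index=["Ann", "Bob", "Cy"],
)

# All four accessors return the same scalar.
via_at = stats.at["Bob", "Pay"]
via_loc = stats.loc["Bob", "Pay"]
via_iat = stats.iat[1, 1]
via_iloc = stats.iloc[1, 1]

# Time 1,000 repeated lookups with each label-based accessor.
at_seconds = timeit.timeit(lambda: stats.at["Bob", "Pay"], number=1000)
loc_seconds = timeit.timeit(lambda: stats.loc["Bob", "Pay"], number=1000)

print(f"at: {at_seconds:.4f}s  loc: {loc_seconds:.4f}s")
```

`at` and `iat` accept only a single row and a single column, whereas `loc` and `iloc` also accept lists and slices.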

## 8 Extracting Values from Series

In [None]:
nba["Salary"].loc["Damian Lillard"]

In [None]:
nba["Salary"].at["Damian Lillard"]

In [None]:
nba["Salary"].iloc[234]

In [None]:
nba["Salary"].iat[234]

## 9 Renaming Columns or Rows

In [None]:
nba.columns

In [None]:
nba.columns = ["Team", "Position", "Date of Birth", "Pay"]
nba.head(1)

In [None]:
nba.rename(columns = { "Date of Birth": "Birthday" })

In [None]:
nba = nba.rename(columns = { "Date of Birth": "Birthday" })

In [None]:
nba.loc["Giannis Antetokounmpo"]

In [None]:
nba = nba.rename(
    index = { "Giannis Antetokounmpo": "Greek Freak" }
)

In [None]:
nba.loc["Greek Freak"]
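`rename` accepts a mapping for `columns`, for `index`, or for both in one call, and returns a new DataFrame. A minimal sketch on a made-up frame (`pay` and the replacement labels are illustrative):

```python
import pandas as pd

# Hypothetical frame (illustrative values only).
pay = pd.DataFrame(
    {"Date of Birth": ["1990-01-01", "1992-02-02"], "Pay": [100, 250]},
    index=["Ann", "Bob"],
)

# Rename one column and one row label in a single call.
renamed = pay.rename(
    columns={"Date of Birth": "Birthday"},
    index={"Bob": "Bobby"},
)

print(renamed)
```

Labels that are not keys in the mapping are left unchanged, and `pay` itself is not modified.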

## 10 Resetting an Index

In [None]:
nba.set_index("Team").head()

In [None]:
nba.reset_index().head()

In [None]:
nba.reset_index().set_index("Team").head()

In [None]:
nba = nba.reset_index().set_index("Team")
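Chaining `reset_index` and `set_index` swaps one index for another without losing the old index values: `reset_index` first moves the current index back into a regular column. The sketch below uses a made-up frame; `people` is an illustrative name.

```python
import pandas as pd

# Hypothetical frame indexed by name (illustrative values only).
people = pd.DataFrame(
    {"Team": ["X", "Y"], "Pay": [100, 250]},
    index=pd.Index(["Ann", "Bob"], name="Name"),
)

# Calling set_index("Team") directly would discard the names;
# resetting first keeps them as a regular "Name" column.
swapped = people.reset_index().set_index("Team")

print(swapped)
```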