# DataFrames

Data frames are usually used for data with two or more dimensions. One way to think about dimensions is as the number of indices you need in order to get back a single value. In a 2-dimensional table you need two indices, one for the row and one for the column, to extract the value from a cell.
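
As a minimal sketch of the "two indices" idea, the example below builds a tiny made-up table (the names and ages are hypothetical, not taken from `nba.csv`) and pulls out one cell by label with `.loc` and by position with `.iloc`.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cy"], "Age": [25, 31, 28]},
    index=["a", "b", "c"],
)

# Label-based lookup: row label first, then column label
age_by_label = df.loc["b", "Age"]

# Position-based lookup: row position first, then column position
age_by_position = df.iloc[1, 1]

print(age_by_label, age_by_position)
```

Both lookups address the same cell; each one needs a row index and a column index.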

In [None]:
import pandas as pd

In [None]:
nba = pd.read_csv("data_files/nba.csv")

In [None]:
nba.shape

In [None]:
nba.head(5)

## Extract a Single Column (Returns a Series)

Selecting a single column returns a `Series` that keeps the original index and holds the values of the chosen column.

In [None]:
nba.Name

In [None]:
# NOTE: Bracket syntax is the better choice: it always works, even when the column name contains spaces or other characters that dot syntax cannot handle
nba["Name"]

## Select Two or More Columns from A `DataFrame`

In [None]:
name_and_team = nba[["Name", "Team"]]

In [None]:
name_and_team.head()

## Adding a Column to a DataFrame

In [None]:
name_and_team["Name"][0]

In [None]:
# NOTE: You can also assign a list whose length matches the number of rows
# NOTE: Depending on the pandas version, the scalar assignment here may emit a SettingWithCopyWarning
# NOTE: There is a better way below: use insert
# The reason for the warning is that pandas cannot tell whether name_and_team is a view of nba or an independent copy, so it is ambiguous whether the assignment should also affect nba
name_and_team["Booyah"] = "Scalar Example"

In [None]:
name_and_team.head()
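
One common way to sidestep the ambiguity behind that warning is to take an explicit `.copy()` of the subset before adding columns. The sketch below assumes a small made-up frame (`nba_demo`, `subset`, and the `Jersey` column are hypothetical names) and shows both the scalar and the list form of assignment.

```python
import pandas as pd

# Hypothetical stand-in for the nba DataFrame
nba_demo = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Team": ["X", "Y"], "Age": [25, 31]}
)

# .copy() makes the subset an independent DataFrame
subset = nba_demo[["Name", "Team"]].copy()

# A scalar is broadcast to every row
subset["Booyah"] = "Scalar Example"

# A list must have one entry per row
subset["Jersey"] = [7, 23]

print(subset)
```

Because `subset` is its own object, the new columns do not appear on `nba_demo`.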

In [None]:
# NOTE: This method gives you control of where the column goes
name_and_team.insert(1, "New Second Column", 0)

In [None]:
name_and_team.head()
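
A small sketch of how `insert` behaves, using a made-up two-column frame: the method changes the `DataFrame` in place and returns `None`, so its result should not be assigned back.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({"Name": ["Ann", "Bob"], "Team": ["X", "Y"]})

# insert(loc, column, value) modifies df in place and returns None
result = df.insert(1, "New Second Column", 0)

print(result)
print(list(df.columns))
```

Note that running the same `insert` a second time raises a `ValueError`, because the column already exists; that matters when re-running a notebook cell.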

## Broadcasting Operations

A single message is sent out and received by every cell in a `DataFrame` or column: one operation is applied to all of the values at once.

### Special Note

When broadcasting to a column, you get a new column back with the operation applied. The original column is unchanged until you assign the result back to it.

In [None]:
name_and_age = nba[["Name", "Age"]]

In [None]:
name_and_age.head()

In [None]:
# NOTE: pandas does not raise an error when a cell is empty; a missing value (NaN) stays NaN in the result
name_and_age["Age"].add(5)

In [None]:
name_and_age["Age"] + 100

In [None]:
name_and_age["Age"].sub(100)

In [None]:
name_and_age.head()

In [None]:
name_and_age["Age"] = name_and_age["Age"] + 5

In [None]:
name_and_age.head()
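
The behaviour described in this section can be sketched on a small made-up `Series` that contains a missing value. The `fill_value` argument shown here is an optional extra of `.add()` that the cells above do not use; it substitutes a value for missing entries before the arithmetic happens.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing value
ages = pd.Series([25.0, np.nan, 31.0], name="Age")

# Broadcasting returns a new Series; NaN stays NaN
plus_five = ages + 5  # equivalent to ages.add(5)

# Treat the missing value as 0 before adding
filled_plus_five = ages.add(5, fill_value=0)

print(plus_five.tolist())
print(filled_plus_five.tolist())
print(ages.tolist())  # the original Series is unchanged
```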

## Drop Rows That Have a `NULL` Value

NOTE: `dropna()` returns a new data frame; pass `inplace=True` if you would rather modify the original.

In [None]:
nba.tail()

In [None]:
nba.dropna(how="all").tail()

In [None]:
nba.head()

In [None]:
nba.dropna().head()

In [None]:
nba.head()

In [None]:
nba.dropna(subset=["Salary"]).head()
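
The three `dropna` variants above differ in which rows they consider "missing". A minimal sketch on a made-up frame (the values are hypothetical) makes the difference visible:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 3 is entirely empty
df = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cy", np.nan],
        "College": ["Duke", np.nan, "UCLA", np.nan],
        "Salary": [100.0, 200.0, np.nan, np.nan],
    }
)

# Drop a row only when every value in it is missing
all_missing_dropped = df.dropna(how="all")

# Default (how="any"): drop a row when any value is missing
any_missing_dropped = df.dropna()

# Only look at the Salary column when deciding
salary_required = df.dropna(subset=["Salary"])

print(all_missing_dropped.index.tolist())
print(any_missing_dropped.index.tolist())
print(salary_required.index.tolist())
```

None of these calls changes `df` itself, since `inplace` is left at its default.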

## Fill in Null Values with the `.fillna()` Method

`.fillna()` returns a new data frame by default. You can pass `inplace=True` to modify the original in place.

In [None]:
nba.head()

In [None]:
nba.fillna({"Salary": 0, "College": "No College"}).head()
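
Passing a dictionary to `.fillna()` fills each listed column with its own replacement and leaves other columns alone. A sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in every column of row 1
df = pd.DataFrame(
    {
        "College": ["Duke", np.nan],
        "Salary": [100.0, np.nan],
        "Age": [25.0, np.nan],
    }
)

# Keys are column names, values are the per-column replacements
filled = df.fillna({"Salary": 0, "College": "No College"})

print(filled)
```

`Age` is not in the dictionary, so its missing value is still `NaN` in `filled`, and `df` is untouched.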

## Convert `DataFrame` Column Types with the `astype()` Method

In [None]:
nba.dtypes

In [None]:
nba.info()

In [None]:
nba_copy = nba.copy()

In [None]:
nba_copy.fillna({"Salary": 0}, inplace=True)

In [None]:
nba_copy["Salary"] = nba_copy["Salary"].astype("int")

In [None]:
nba_copy.head()

In [None]:
nba_copy.info()
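
The reason `Salary` is filled before the conversion is that `NaN` has no integer representation, so casting a column that still contains it to `"int"` fails. A minimal sketch on a made-up `Series`:

```python
import numpy as np
import pandas as pd

# Hypothetical salaries with one missing value
salary = pd.Series([100.0, np.nan, 250.0], name="Salary")

try:
    salary.astype("int")
    cast_failed = False
except ValueError:
    # NaN cannot be represented as an integer, so the cast is rejected
    cast_failed = True

# Filling the gap first lets the conversion go through
cleaned = salary.fillna(0).astype("int")

print(cast_failed)
print(cleaned.tolist())
```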

## Unique Values Within A Column

In [None]:
nba_copy["Position"].nunique()

In [None]:
nba_copy.info()

In [None]:
# NOTE: Converting a column to a category when appropriate can save memory, allow for an alternate underlying sort order, and signal to other Python code that categorical statistical methods apply
nba_copy["Position"] = nba_copy["Position"].astype("category")

In [None]:
nba_copy.info()
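
To see the memory effect without relying on `info()` output, the sketch below compares `memory_usage(deep=True)` before and after the conversion. The position labels are made up and repeated many times, which is the situation where a category pays off: few unique values, many rows.

```python
import pandas as pd

# Hypothetical column: 5 unique labels repeated across 5000 rows
positions = pd.Series(["PG", "SG", "SF", "PF", "C"] * 1000)

as_category = positions.astype("category")

# deep=True counts the string data itself, not just the references
original_bytes = positions.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(as_category.nunique())
print(original_bytes, category_bytes)
```

The exact byte counts depend on the pandas version and platform, so no figures are quoted here.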

## Sorting Data Frames

In [None]:
nba.head()

A cool feature: `na_position` lets you choose whether `NaN` values are sorted to the start or the end.

NOTE: Relying on `na_position` is not a great idea. It is probably better to drop or replace those values first. If the `NaN` rows sort to the top, grabbing the top 10 values that are not `NaN` is unreliable, because you would have to take 10 rows plus the number of `NaN` rows.

In [None]:
nba.sort_values("Salary", ascending=False, na_position="last").head()

In [None]:
nba.sort_values(["Team", "Salary"], ascending=False).head()

In [None]:
nba.sort_values(["Team", "Salary"], ascending=[True, False]).head()
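
The concern about `na_position` can be sketched on a small made-up frame: with `NaN` rows sorted first, `head()` picks them up, whereas dropping them before sorting gives a clean "top N".

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing salary
df = pd.DataFrame(
    {
        "Team": ["B", "A", "A", "B"],
        "Salary": [300.0, np.nan, 100.0, 200.0],
    }
)

nan_last = df.sort_values("Salary", ascending=False, na_position="last")
nan_first = df.sort_values("Salary", ascending=False, na_position="first")

# Dropping NaN first makes "top 2" mean the two highest real salaries
top_two = df.dropna(subset=["Salary"]).sort_values("Salary", ascending=False).head(2)

# One direction per sort key: Team ascending, Salary descending
by_team = df.sort_values(["Team", "Salary"], ascending=[True, False])

print(nan_first.head(2))
print(top_two)
```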

## Sort Data Frame by Index

In [None]:
# The original row order
nba.head()

In [None]:
# Now the original index labels are out of order because we sorted by other columns
nba_sorted = nba.sort_values(["Team", "Salary"], ascending=[True, False])
nba_sorted.head()

In [None]:
# Let's put it back in its original order
nba_sorted.sort_index().head()
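
This works because `sort_values` reorders the rows but keeps each row's original index label, so `sort_index` can undo the shuffle. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data with the default 0..n-1 index
df = pd.DataFrame({"Team": ["B", "A", "C"], "Salary": [1, 2, 3]})

# Rows move, but each row keeps its original index label
shuffled = df.sort_values("Team")

# Sorting by index restores the original row order
restored = shuffled.sort_index()

print(shuffled.index.tolist())
print(restored.index.tolist())
```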

## Rank Values with the `.rank()` Method

In [None]:
nba.tail()

In [None]:
nba_clean_salaries = nba.dropna(how="all").copy()  # drop rows where every value is NaN

In [None]:
nba_clean_salaries.tail()

In [None]:
nba_clean_salaries.fillna({"Salary": 0}, inplace=True)
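
With the missing salaries filled, the column is ready to be ranked. The sketch below is an assumption about how `.rank()` might be applied next; it uses made-up salary numbers rather than the real data. It shows the default tie handling (`method="average"`) next to `method="dense"`.

```python
import pandas as pd

# Hypothetical salaries, including a tie and a filled-in 0
salaries = pd.Series([300, 100, 300, 0], name="Salary")

# Highest salary gets rank 1; ties share the average of their ranks
average_rank = salaries.rank(ascending=False)

# "dense" gives tied values the same rank with no gaps afterwards
dense_rank = salaries.rank(ascending=False, method="dense").astype("int")

print(average_rank.tolist())
print(dense_rank.tolist())
```

The default ranks come back as floats because tied values can land on fractional ranks such as 1.5.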