### Pandas

A library for working with `relational` or labeled data, built on top of the NumPy package. [<u>more details</u>](https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/)

In [None]:
import pandas as pd

### DataFrame

A DataFrame is a `two-dimensional table` made up of a collection of Series, one per column, that share the same index.

In [None]:
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}

df = pd.DataFrame(data)
display(df)
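
The relationship between a DataFrame and its Series can be checked directly. The sketch below reuses the toy data from the cell above; the variable names `apples` and `rebuilt` are introduced here only for illustration.

```python
import pandas as pd

# Same toy data as in the cell above
df = pd.DataFrame({
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2],
})

# Selecting a single column returns a Series that shares the DataFrame's index
apples = df['apples']
print(type(apples).__name__)  # Series
print(df.shape)               # (4, 2): 4 rows, 2 columns

# A DataFrame can also be assembled from Series objects
rebuilt = pd.DataFrame({'apples': df['apples'], 'oranges': df['oranges']})
print(rebuilt.equals(df))     # True
```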

### Read CSV


Import a comma-separated values (csv) file into a `DataFrame`.

In [None]:
df = pd.read_csv("_data/titanic.csv")
df.head()
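
`read_csv` also accepts file-like objects, which makes it possible to try it without a file on disk. The following is a minimal sketch; the CSV text is made up for illustration and is not taken from `titanic.csv`.

```python
import io
import pandas as pd

# Hypothetical CSV content, invented for this example
csv_text = """Name,Age,Sex
Alice,29,female
Bob,,male
"""

# read_csv accepts a path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                # (2, 3)
print(sample['Age'].isna().sum())  # 1: the empty field is read as NaN
```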

### Location

Both `iloc` and `loc` are useful for lookups and data cleaning: `iloc` selects by integer position, while `loc` selects by index label.

In [None]:
df = pd.read_csv("_data/titanic.csv")

print("Select 2nd to 4th:")
display(df.iloc[1:4])

df = df.set_index('Name', drop=False)  # use 'Name' as the index, keep the column too
print("Select by index (name):")
display(df.loc['Allen, Miss Elisabeth Walton'])
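
One difference that is easy to miss is how the two indexers treat slices. The sketch below uses a hypothetical toy frame (the names `toy`, `by_position` and `by_label` are assumptions for this example) to show that `iloc` excludes the end of a slice while `loc` includes it.

```python
import pandas as pd

# Hypothetical toy frame with a string index
toy = pd.DataFrame(
    {'age': [29, 41, 35]},
    index=['ann', 'bob', 'cy'],
)

# iloc: integer positions, the end of the slice is excluded
by_position = toy.iloc[0:2]      # rows 'ann' and 'bob'

# loc: index labels, the end of the slice is included
by_label = toy.loc['ann':'bob']  # rows 'ann' and 'bob'

print(by_position.equals(by_label))  # True
print(toy.loc['cy', 'age'])          # 35
```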

### Conditional

Selecting and `filtering` rows by condition are common tasks.

In [None]:
df = pd.read_csv("_data/titanic.csv")

females = df[df['Sex'] == 'female']
males_60 = df[(df['Sex'] == 'male') & (df['Age'] >= 60)]

print("Females:")
display(females)

print("Males aged 60 and over:")
display(males_60)
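
Conditions are boolean Series, so they can be stored and combined with `&` (and), `|` (or) and `~` (not). The sketch below assumes a small made-up frame named `people`; note that each comparison needs its own parentheses when written inline, and that a missing `Age` never satisfies a comparison.

```python
import pandas as pd

# Hypothetical data, invented for this example
people = pd.DataFrame({
    'Sex': ['female', 'male', 'male', 'female'],
    'Age': [72.0, 64.0, 30.0, None],
})

is_male = people['Sex'] == 'male'
is_senior = people['Age'] >= 60   # NaN >= 60 evaluates to False

both = people[is_male & is_senior]       # male AND 60 or older
either = people[is_male | is_senior]     # male OR 60 or older
not_male = people[~is_male]              # everyone who is not male
known_age = people[people['Age'].notna()]

print(len(both), len(either), len(not_male), len(known_age))  # 1 3 2 3
```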

### Replace

`replace` accepts literal values and, with `regex=True`, `regex` regular expressions.

In [None]:
df = pd.read_csv("_data/titanic.csv")

df['Sex'] = df['Sex'].replace(['female', 'male'], ['Woman', 'Man'])
df['PClass'] = df['PClass'].replace(r'1st', 'First', regex=True)

display(df)
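
The difference between a literal and a regex replacement can be seen on a small Series. This is a sketch with made-up values modeled on the `PClass` column; the pattern and the names `exact` and `stripped` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical values modeled on the PClass column
classes = pd.Series(['1st', '2nd', '3rd', '*'])

# Without regex=True, only values equal to '1st' are replaced
exact = classes.replace('1st', 'First')

# With regex=True, the first argument is a pattern and
# \1 in the replacement refers to the first captured group
stripped = classes.replace(r'^(\d)(st|nd|rd)$', r'\1', regex=True)

print(exact.tolist())     # ['First', '2nd', '3rd', '*']
print(stripped.tolist())  # ['1', '2', '3', '*']
```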

### Statistics

Pandas has multiple `built-in methods` for descriptive statistics.

In [81]:
df = pd.read_csv("_data/titanic.csv")

# Statistics (by Age)
A = pd.DataFrame()
A['max'] = [df['Age'].max()]
A['min'] = [df['Age'].min()]
A['avg'] = [df['Age'].mean()]
display(A)

# Value counts (by PClass)
A = pd.DataFrame()
A['PClass'] = df['PClass'].value_counts()
display(A)

# Unique values (by Sex)
A = pd.DataFrame()
A['unique_values'] = df['Sex'].unique()
display(A)


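|   | max  | min  | avg       |
|---|------|------|-----------|
| 0 | 71.0 | 0.17 | 30.397989 |


|     | PClass |
|-----|--------|
| 3rd | 711    |
| 1st | 322    |
| 2nd | 279    |
| *   | 1      |


|   | unique_values |
|---|---------------|
| 0 | female        |
| 1 | male          |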
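
Several of these statistics can also be computed in one call. The sketch below uses a small made-up `Age` Series rather than the Titanic data, and shows `agg` and `describe` as one possible shortcut; missing values are skipped by default.

```python
import pandas as pd

# Hypothetical ages, invented for this example; None becomes NaN
ages = pd.Series([22.0, 38.0, 26.0, None], name='Age')

# agg computes several statistics at once and returns a Series
summary = ages.agg(['max', 'min', 'mean'])
print(summary['max'])  # 38.0
print(summary['min'])  # 22.0

# describe adds count, std and quartiles; NaN is excluded from the count
stats = ages.describe()
print(stats['count'])  # 3.0
```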
