Exploring Data
-----------------

In this directory, there is a file called `employees.csv`. Let's use pandas to load it into a DataFrame:

In [None]:
import pandas as pd
from datetime import date

# Read in the data
df = pd.read_csv('employees.csv', index_col='employee_id')

# Show a few rows
df.head()
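
To see what `index_col` does without needing the real file, here is a minimal sketch that reads a small in-memory CSV. The `name` and `salary` columns and all three rows are made up for illustration; only `employee_id` comes from the call above.

```python
import io

import pandas as pd

# Hypothetical stand-in for employees.csv; the real file's columns may differ.
csv_text = """employee_id,name,salary
101,Ada,85000
102,Grace,92000
103,Linus,78000
"""

# index_col turns the employee_id column into the row labels (the index)
sample = pd.read_csv(io.StringIO(csv_text), index_col="employee_id")

# Rows can now be looked up by employee id with .loc
print(sample.loc[102, "name"])
print(list(sample.columns))
```

Note that `employee_id` no longer appears among the regular columns once it becomes the index.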

Get statistics for a series
---

In [None]:
mean_waist = df['waist'].mean()
print(f"The mean of the waist series is {mean_waist:.2f} meters.")
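
One detail worth knowing is how these statistics treat missing values. The sketch below uses a made-up series of waist measurements (not the real data) to show that `mean` skips missing values by default, and that `skipna=False` changes this.

```python
import pandas as pd

# Hypothetical waist measurements in meters, with one missing value
waist = pd.Series([0.8, 0.9, None, 1.0])

mean_default = waist.mean()              # skips the missing value
mean_strict = waist.mean(skipna=False)   # NaN, because one value is missing
median_waist = waist.median()            # also skips missing values by default

print(f"mean={mean_default:.2f}, median={median_waist:.2f}, strict mean={mean_strict}")
```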

The `describe` method gathers several statistics at once:

In [None]:
df.salary.describe()
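
For a numeric series, `describe` returns another series whose labels are the statistic names. A small sketch with made-up salaries (assumed values, not taken from `employees.csv`) shows how to pick out individual entries:

```python
import pandas as pd

# Hypothetical salaries; the real data will differ
salary = pd.Series([50000, 60000, 70000, 80000], name="salary")

stats = salary.describe()

# describe() returns a Series, so each statistic can be selected by its label
print(list(stats.index))
print(stats["mean"], stats["50%"])
```

The `50%` entry is the median; `25%` and `75%` are the lower and upper quartiles.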

Edit series (no loops)
---

In [None]:
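# Convert ISO-format strings (e.g. '1980-05-17') to dates for dob and death.
# Note: date.fromisoformat raises a TypeError on missing (NaN) values.
df['dob'] = df['dob'].apply(date.fromisoformat)
df['death'] = df['death'].apply(date.fromisoformat)

# Make a new column: the time between birth and death (a timedelta, not years)
df['final_age'] = df['death'] - df['dob']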

# Show a few rows
df.head()
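
As an alternative to `apply`, `pd.to_datetime` converts a whole column at once and turns missing values into `NaT` instead of raising an error. The sketch below is one possible approach, not the notebook's method; the two rows are made up, and `final_age_years` is a hypothetical column name introduced for illustration.

```python
import pandas as pd

# Hypothetical sample data; the second person has no recorded death date
sample = pd.DataFrame(
    {
        "dob": ["1950-01-01", "1960-06-15"],
        "death": ["2020-01-01", None],
    }
)

# Convert whole columns at once; missing values become NaT
sample["dob"] = pd.to_datetime(sample["dob"])
sample["death"] = pd.to_datetime(sample["death"])

# Subtracting datetime columns gives a timedelta column
sample["final_age"] = sample["death"] - sample["dob"]

# Approximate age in years (365.25 days per year is a simplification)
sample["final_age_years"] = sample["final_age"].dt.days / 365.25

print(sample[["final_age", "final_age_years"]])
```

Rows with a missing date stay missing in the derived columns, so they can be filtered out later with `dropna` if needed.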

Get info on categorical series
----

In [None]:
print("\n*** Gender ***")
series = df["gender"]
missing = series.isnull()
print(f"{missing.sum()} rows have no value for gender.")
series_counts = series.value_counts()
for value, count in series_counts.items():
    print(f'{count} employees are "{value}"')
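
`value_counts` leaves out missing values by default, which is why they are counted separately above. The following sketch, using a made-up gender column, shows two optional parameters: `dropna=False` to include missing values in the counts, and `normalize=True` to get fractions instead of counts.

```python
import pandas as pd

# Hypothetical gender column with one missing value
gender = pd.Series(["F", "M", "F", None, "M", "F"])

counts = gender.value_counts()                  # missing values are left out
counts_all = gender.value_counts(dropna=False)  # missing values get their own row
shares = gender.value_counts(normalize=True)    # fractions of the non-missing rows

print(counts_all)
print(shares)
```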