# Pandas Data Frames

This notebook contains exercises for Pandas Data Frames.

**At the end of each exercise there are cells containing `assert` statements that you can use to check your answers.**

In [None]:
import pandas as pd
import numpy as np
from utils.dataset_loader import FAKE_WHALE_DATASET_PATH
%autosave 30

## Exercise 1: Indexing Into Data Frames

---

You will now use pandas to extract some information from a dataset of fake whales.

It's useful to `print` what you select.
You can also put the variable you want displayed on the last line of a cell.
This is especially useful with pandas objects, which Jupyter renders more readably when they are displayed this way.

Displaying your selections often helps you catch mistakes early, for instance selecting rows when you meant to select columns.

### Question 1.1

Load the fake whale dataset (its path is imported above in `FAKE_WHALE_DATASET_PATH`).
You will need to specify that column `0` contains the index.
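
As a minimal sketch of what "column `0` contains the index" means, the example below reads a tiny in-memory table. It assumes the data is CSV, which this notebook does not state; `csv_text` and `toy_df` are hypothetical names used only for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
# The first (unnamed) column holds the row labels.
csv_text = ",name,length\n10,a,1.5\n20,b,2.5\n"

# index_col=0 tells pandas to use column 0 as the index
# instead of creating a fresh 0..N-1 index.
toy_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(toy_df.index.tolist())    # row labels taken from column 0
print(toy_df.columns.tolist())  # remaining columns
```

Without `index_col=0`, the label column would show up as an ordinary data column next to a newly generated index.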

In [None]:
whale_df = ...

### Question 1.2

The pandas method `df.head(N)` selects the first `N` rows of the DataFrame.
Use it to examine the dataset and familiarize yourself with it.

*HINT: Put a single call to `.head` in the cell below instead of using `print`. Pandas objects render better when displayed like this in Jupyter notebooks.*

In [None]:
...

### Question 1.3

Select the size of all whales in the dataset.

If you use `whale_df.size`, what goes wrong?
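
As a related illustration (deliberately on a different name), the sketch below shows what happens when a column shares its name with something a DataFrame already defines. The frame `toy_df` and its `count` column are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame: "count" is also the name of a DataFrame method.
toy_df = pd.DataFrame({"count": [3, 4], "age": [1, 2]})

# Attribute access finds the built-in method first, not the column.
print(callable(toy_df.count))

# Bracket access always refers to the column.
count_column = toy_df["count"]
print(count_column.tolist())
```

Bracket access is the safer habit whenever a column name might clash with an existing attribute or method.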

In [None]:
whale_sizes = ...

### Question 1.4

Select the weight of the whale at index 567.
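
For a single value identified by an index label and a column name, `.loc` is one option. A minimal sketch on a hypothetical frame (`toy_df` and the `height` column are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame whose index labels are not 0, 1, 2.
toy_df = pd.DataFrame({"height": [1.1, 2.2, 3.3]}, index=[100, 200, 300])

# .loc is label-based: 200 is an index label, not a row position.
value = toy_df.loc[200, "height"]
print(value)
```

Note that `.iloc` would instead interpret its argument as a position, so `toy_df.iloc[200]` would fail on this three-row frame.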

In [None]:
whale_567_weight = ...

### Question 1.5

Select the `age` and `whale_type` columns for the first 100 whales.
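
Selecting several columns and a leading block of rows can be combined in more than one way. The sketch below shows two equivalent forms on a hypothetical frame; the column names `a`, `b`, and `c` are placeholders.

```python
import pandas as pd

# Hypothetical toy frame with three columns and four rows.
toy_df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": list("wxyz"), "c": [0.1, 0.2, 0.3, 0.4]},
    index=[10, 20, 30, 40],
)

# A list of column names selects a sub-DataFrame; head(2) keeps the first 2 rows.
subset = toy_df[["a", "b"]].head(2)

# Equivalent: slice rows by position first, then pick the columns.
alt = toy_df.iloc[:2][["a", "b"]]

print(subset.shape)
```

The double brackets matter: `toy_df["a"]` returns a Series, while `toy_df[["a", "b"]]` returns a DataFrame.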

In [None]:
age_whale_type_first_100 = ...

### Question 1.6

Select the index of the largest whale (in size) in the dataset.
* You can use the `whale_sizes` Series you created above, which contains the sizes of all whales.
* Use `idxmax` (index-max) on this Series to find the index of the largest whale.
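
To see how `idxmax` differs from `max`, here is a minimal sketch on a hypothetical Series (`toy_series` is a made-up name):

```python
import pandas as pd

# Hypothetical Series with non-default index labels.
toy_series = pd.Series([5.0, 9.0, 7.0], index=[11, 22, 33])

largest_value = toy_series.max()     # the largest value itself
largest_label = toy_series.idxmax()  # the index label where that value sits

print(largest_value, largest_label)
```

`idxmax` returns the index label, not the position, so the result can be passed straight to `.loc`.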

In [None]:
largest_whale_index = ...

#### Run these cells after finishing the exercise questions to check your answers.

In [None]:
assert whale_sizes.shape == (whale_df.shape[0],), "Shape of whale_sizes is wrong!"

In [None]:
assert age_whale_type_first_100.shape == (100, 2), "Shape of age_whale_type_first_100 is wrong!"

In [None]:
assert np.isclose(whale_567_weight, 6.641571451865368, rtol=1e-6), "Weight of whale 567 is wrong!"

In [None]:
assert largest_whale_index == 2045, "Wrong largest whale selected!"

## Exercise 2: Mathematical Operations

---

You will now use pandas to compute some statistics about whales.

In [None]:
# Run this cell first to display some more information about the dataset.

whale_df.describe()

### Question 2.1

The above cell displays `whale_df.describe()`.

What kind of object is returned by `whale_df.describe()`? Store it into the variable `desc`.

In [None]:
desc = ...

### Question 2.2

Extract from `desc` the mean (average) size of whales.
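
The result of `describe()` can be indexed like any other pandas object: statistics are row labels and the original numeric columns are column labels. A minimal sketch on a hypothetical frame, extracting a different statistic than the one asked for here:

```python
import pandas as pd

# Hypothetical toy frame with two numeric columns.
toy_df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 60.0]})

summary = toy_df.describe()

# Row labels are the statistic names.
print(summary.index.tolist())

# Pick one statistic for one column by label.
max_y = summary.loc["max", "y"]
print(max_y)
```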

In [None]:
mean_size_whales = ...

### Question 2.3

Extract the mean (average) age of the first 100 whales.

*HINT: You need to use the original DataFrame for this question.*
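
Reductions such as `.mean()`, `.sum()`, and `.max()` can be called on any Series, including one you have already narrowed down to a subset of rows. A minimal sketch on a hypothetical frame (`toy_df` and column `x` are placeholders):

```python
import pandas as pd

# Hypothetical toy frame with a single numeric column.
toy_df = pd.DataFrame({"x": [2.0, 4.0, 6.0, 8.0]})

# Reduce over a subset: take the first 2 rows, then average them.
mean_first_two = toy_df["x"].head(2).mean()

# The same pattern works for other reductions over the full column.
total = toy_df["x"].sum()
largest = toy_df["x"].max()

print(mean_first_two, total, largest)
```

`describe()` always summarizes every row, which is why statistics over a subset have to be computed from the original data.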

In [None]:
mean_age_first_100_whales = ...

### Question 2.4

Extract the total weight of all whales.

In [None]:
total_weight_all_whales = ...

### Question 2.5

Extract the age of the oldest whale.

*HINT: Use `.max()` on the 'age' column of the original DataFrame.*

In [None]:
oldest_whale_age = ...

#### Run these cells after finishing the exercise questions to check your answers.

In [None]:
assert np.isclose(mean_size_whales, 6.621894, rtol=1e-4), "Mean size of whale is wrong!"

In [None]:
assert np.isclose(mean_age_first_100_whales, 37.78, rtol=1e-2), "Mean age of first 100 whales is wrong!"

In [None]:
assert np.isclose(total_weight_all_whales, 117907.7810983918, rtol=1e-5), "Total weight of all whales is wrong!"

In [None]:
assert np.isclose(oldest_whale_age, 114, rtol=1e-5), "Oldest whale age is wrong!"