# Data Vis: Visualizing Numerical and Categorical Data
* Notebook 1: Visualizing Proportions

## Setup

In [None]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import squarify
import seaborn as sns

## Data

In this notebook, we will use the NYC Flights 2013 dataset, which contains information about all domestic flights that departed from NYC in 2013. The dataset includes the following tables:
- `flights`: Contains information about each flight, including the origin and destination airports, departure and arrival times, and delays.
- `planes`: Contains information about the planes, including their tail numbers and model years.
- `airports`: Contains information about the airports, including their names and locations.
- `airlines`: Contains information about the airlines, including their names and IATA codes.
- `weather`: Contains information about the weather at the origin airports, including temperature, wind speed, and precipitation.
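The notebook loads a pre-joined file, `flights_joined.csv`, and its construction is not shown here. As a minimal sketch, assuming the join used the `carrier` key shared by `flights` and `airlines`, a left merge in pandas could look like this. The tiny frames are hypothetical toy data, not the real tables.

```python
import pandas as pd

# Hypothetical toy versions of the `flights` and `airlines` tables
flights = pd.DataFrame({
    "origin": ["EWR", "JFK", "LGA", "JFK"],
    "carrier": ["UA", "B6", "DL", "B6"],
})
airlines = pd.DataFrame({
    "carrier": ["UA", "B6", "DL"],
    "name": ["United Air Lines Inc.", "JetBlue Airways", "Delta Air Lines Inc."],
})

# Left join keeps every flight; validate raises if a carrier appears twice in `airlines`
flights_joined = flights.merge(airlines, on="carrier", how="left", validate="many_to_one")
print(flights_joined)
```

A left join is the natural choice here because every flight should survive the merge even if its carrier were missing from the lookup table.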

In [None]:
data = pd.read_csv('flights_joined.csv')

In [None]:
data.shape

In [None]:
data.head()

## Treemap

A treemap is a compact, space-filling way to visualize proportions. It is a hierarchical visualization that uses (nested) rectangles to represent the proportions of different categories within a dataset: the area of each rectangle is proportional to the value it represents, and color can be used to encode another variable. Unfortunately, seaborn does not have a built-in treemap function, but we can use the `squarify` library to create one (it draws with `matplotlib` under the hood).
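To make "area is proportional to value" concrete, here is a minimal pure-Python sketch of the simplest single-level treemap layout, slice-and-dice, where each value gets a vertical strip whose width is its share of the total. This is *not* the algorithm `squarify` uses (it implements the squarified layout, which keeps rectangles closer to square); the function name `slice_layout` and the toy values are illustrative assumptions.

```python
def slice_layout(values, width=100.0, height=100.0):
    """Split a width x height box into vertical strips, one per value.

    Returns a list of (x, y, dx, dy) tuples. Every strip has the full
    height, so its area is proportional to its value.
    """
    total = sum(values)
    rects = []
    x = 0.0
    for v in values:
        dx = width * v / total  # strip width = value's share of the total width
        rects.append((x, 0.0, dx, height))
        x += dx
    return rects


values = [50, 30, 20]  # hypothetical counts
rects = slice_layout(values)
areas = [dx * dy for _, _, dx, dy in rects]
print(rects)  # strip widths 50.0, 30.0, 20.0
print(areas)
```

Slice-and-dice is easy to follow but produces long, thin rectangles when there are many categories, which is the problem the squarified layout was designed to reduce.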

Before we can create a treemap, we need to aggregate the data, here by counting the number of flights per `origin` airport.

In [None]:
data_grouped = data.groupby('origin').size().reset_index(name='num_flights')
data_grouped.head()
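The aggregation above gives the same counts as `value_counts`. A sketch on a hypothetical toy frame that checks this and also adds each origin's share of all flights, which is the proportion a treemap encodes as area:

```python
import pandas as pd

# Hypothetical toy data standing in for `data`
toy = pd.DataFrame({"origin": ["EWR", "JFK", "EWR", "LGA", "EWR", "JFK"]})

# Same aggregation as in the notebook: one row per origin with its flight count
grouped = toy.groupby("origin").size().reset_index(name="num_flights")

# value_counts yields the same counts (sorted by count rather than by origin)
counts = toy["origin"].value_counts()
assert (grouped.set_index("origin")["num_flights"] == counts.sort_index()).all()

# Share of all flights per origin: the proportion each rectangle's area encodes
grouped["share"] = grouped["num_flights"] / grouped["num_flights"].sum()
print(grouped)
```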

Now we can use `squarify.plot` to create the treemap. It takes the following parameters (only `sizes` is required):
- `sizes`: The size of each rectangle (in this case, the number of flights from each origin).
- `label`: The label for each rectangle (in this case, the origin airport codes).
- `color`: The color of each rectangle (in this case, one color per origin airport, taken from the `viridis` palette).
- `text_kwargs`: Styling options for the label text (in this case, white, bold, 12-point font).
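The `color` list does not have to be a fixed palette: color can also encode a value, for example the flight count itself. A minimal sketch, assuming matplotlib 3.5 or newer (for the `matplotlib.colormaps` registry) and hypothetical counts, that maps each count to a viridis color:

```python
from matplotlib import colormaps
from matplotlib.colors import Normalize

num_flights = [120, 90, 60]  # hypothetical counts per origin

# Scale counts linearly to [0, 1], then look up the matching viridis RGBA color
norm = Normalize(vmin=min(num_flights), vmax=max(num_flights))
cmap = colormaps["viridis"]
colors = [cmap(norm(n)) for n in num_flights]

# `colors` could then be passed as squarify.plot(..., color=colors)
print(colors)
```

With this mapping the largest origin gets the brightest (yellow) end of viridis and the smallest gets the darkest, so color and area reinforce each other; a colorbar or a legend would be needed for readers to decode exact values.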

In [None]:
plt.figure(figsize=(16, 9))
# Rectangle areas are proportional to the flight counts; one viridis color per origin
squarify.plot(sizes=data_grouped['num_flights'], label=data_grouped['origin'],
              color=sns.color_palette('viridis', n_colors=len(data_grouped)),
              text_kwargs={'color': 'white', 'fontweight': 'bold', 'fontsize': 12})
plt.axis('off')  # axis ticks carry no meaning in a treemap
plt.title('Treemap of Number of Flights by Origin')
plt.show()
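A common refinement is to put the count and percentage into each rectangle's label, so readers do not have to judge areas by eye. A minimal sketch of building those label strings with pandas, using a hypothetical stand-in for `data_grouped`; the resulting list could be passed as `label=` to `squarify.plot`:

```python
import pandas as pd

# Hypothetical stand-in for `data_grouped`
grouped = pd.DataFrame({"origin": ["EWR", "JFK", "LGA"], "num_flights": [50, 30, 20]})

share = grouped["num_flights"] / grouped["num_flights"].sum()

# One multi-line label per rectangle: airport code, count, and percentage
labels = [
    f"{origin}\n{n:,} flights ({s:.1%})"
    for origin, n, s in zip(grouped["origin"], grouped["num_flights"].tolist(), share.tolist())
]
print(labels)
```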


`squarify` does not support nested treemaps. For more advanced, interactive treemap visualizations (including nested ones), have a look at `plotly`: https://plotly.com/python/treemaps/
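A nested treemap needs one row per leaf of the hierarchy. A sketch of that preparation with pandas, assuming a two-level hierarchy of origin and carrier (both are columns of the `flights` table, but the toy values here are hypothetical):

```python
import pandas as pd

# Hypothetical toy flights with two hierarchy levels
toy = pd.DataFrame({
    "origin": ["EWR", "EWR", "JFK", "JFK", "JFK", "LGA"],
    "carrier": ["UA", "UA", "B6", "B6", "DL", "DL"],
})

# One row per (origin, carrier) leaf with its flight count
nested = toy.groupby(["origin", "carrier"]).size().reset_index(name="num_flights")
print(nested)

# With plotly installed, this long format can feed a nested treemap:
# import plotly.express as px
# px.treemap(nested, path=["origin", "carrier"], values="num_flights").show()
```

Each origin's rectangle then becomes the sum of its carrier rectangles, so the outer level still shows the same per-origin proportions as the flat treemap above.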

Now it's your turn. Create a treemap of a different part-whole relationship in the dataset.

In [None]:
# YOUR CODE HERE