# Exercise 2: Notebooks tutorial

In this exercise, we learn to use Jupyter (`.ipynb`) notebooks and a few Python libraries that are helpful for working with data.

## Computational notebooks

One nice affordance of computational notebooks is that we can interleave blocks of text (like this one) written in [Markdown](https://www.markdownguide.org/getting-started/) with chunks of `code`, written in Python in our case. This style of coding is called **"literate programming"**. It facilitates process documentation, reproducible analyses, and reflection on analysis choices.


Some recommended practices for literate programming are:

1. Split up your code into distinct high-level operations (e.g., load data).
2. Say why you are doing things. Why did you transform the data?
3. Interpret your charts in text. What patterns do you see? What do they mean for your analysis?
4. Comment your code if syntax isn't obvious.

The idea is that somebody unfamiliar with your work should be able to read this documentation and repeat your analysis for themselves, including your thought process. This is critical to doing data science.

## Practice!

Let's practice literate programming by filling in the brief analysis below. Each time you see a code block with the comment `# PROMPT:...` or text *PROMPT: ...*, you should write code or text to complete the analysis.

For most of the quarter, you will write documents like this one yourself. However, you will first ease into literate programming by filling in this document.

### Loading data

One of the first things we will want to do in many computational notebooks is load a dataset. In our case, you will load the `storms.csv` dataset.
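
As a rough illustration of loading a CSV file with pandas, the sketch below writes a small made-up file to a temporary folder and reads it back with `pd.read_csv`. The file name `toy.csv` and its columns are hypothetical stand-ins, not the contents of `storms.csv`. A relative path such as `"data/toy.csv"` would be passed the same way and is resolved against the notebook's working directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical toy file standing in for a real dataset; the columns are made up.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "toy.csv"
    csv_path.write_text("name,wind,pressure\nA,25,1013\nB,70,985\n")

    # read_csv accepts a path (string or Path) and returns a DataFrame.
    toy = pd.read_csv(csv_path)

# head() shows the first rows (up to five by default) as a table.
print(toy.head())
```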

In [2]:
# PROMPT: load the storms dataset based on its relative path
import pandas as pd



Let's have a first look at this dataset.

In [None]:
# PROMPT: use the head method to see the data in a table

*PROMPT: what do you notice about the column labels? What do you notice as you scan across rows of the data table?*

### Visual exploration

Now we will practice making queries against this dataset and rendering views to show us specific combinations of variables or cross sections of the data. Creating some of these views will require a little bit of data wrangling.

In [None]:
# PROMPT: use the altair library to investigate the relationship between wind and pressure
import altair as alt
import vegafusion as vf
vf.enable(row_limit=50000)



*PROMPT: what do you notice about the chart you just made?*

Now, let's see how wind and pressure are related to other variables.

*PROMPT: Create three charts below showing the relationship between wind and pressure and other variables in the dataset. For each chart you create, add a cell below interpreting it. You should add 6 cells total (three chart cells and three interpretation cells).*

One thing we notice when exploring the data is the status column. Let's see unique values of the `status` variable.

In [None]:
# PROMPT: print unique values of the status column

Let's create an indicator variable `is_hurricane` to differentiate hurricanes from other storms in the dataset.
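
One possible shape for these two steps, shown on a toy DataFrame rather than the storms data: list the distinct labels in a column, then compare the column against one label to get a boolean indicator. The status labels below are illustrative assumptions, so check the actual values in your data before choosing what to compare against.

```python
import pandas as pd

# Toy data; these status labels are assumptions for illustration only.
toy = pd.DataFrame(
    {
        "status": ["tropical storm", "hurricane", "tropical depression", "hurricane"],
        "wind": [45, 80, 30, 100],
    }
)

# unique() returns the distinct values in order of first appearance.
print(toy["status"].unique())

# An element-wise comparison yields a boolean column: True where the label matches.
toy["is_hurricane"] = toy["status"] == "hurricane"
print(toy)
```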

In [16]:
# PROMPT: add a hurricane indicator variable to the dataframe

Now let's see where the hurricanes end up in our distribution.

In [None]:
# PROMPT: plot the distribution of wind and pressure highlighting hurricanes in color

Let's take a look at where hurricanes end up in different geographic regions. We do this by binning `lat` and `long` coordinates and using these bins to facet our preferred visualization of `wind`, `pressure`, and `is_hurricane`.

*PROMPT: Bin the lat and long variables to create a 4-by-4 grid of plots showing the relationship between wind, pressure, and is_hurricane across different geographic zones measured in the data*

In [21]:
# HINT: you can use the `pd.cut` function to bin lat and long coordinates
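
A minimal sketch of the binning idea, assuming toy coordinates rather than the real data: passing an integer to `bins` asks `pd.cut` for that many equal-width intervals spanning the observed range. The extra string columns (`lat_zone`, `long_zone`) are hypothetical names added here because plain string labels can be simpler to hand to a plotting library than interval objects.

```python
import pandas as pd

# Toy coordinates; the values are made up for illustration.
toy = pd.DataFrame(
    {
        "lat": [10.0, 20.0, 30.0, 40.0, 50.0],
        "long": [-90.0, -80.0, -70.0, -60.0, -50.0],
    }
)

# bins=4 splits each variable's observed range into four equal-width intervals.
toy["lat_bin"] = pd.cut(toy["lat"], bins=4)
toy["long_bin"] = pd.cut(toy["long"], bins=4)

# Hypothetical helper columns: the same bins as plain strings.
toy["lat_zone"] = toy["lat_bin"].astype(str)
toy["long_zone"] = toy["long_bin"].astype(str)

print(toy)
```

Crossing four latitude bins with four longitude bins gives up to sixteen zones, which is what a 4-by-4 grid of facets would display.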

*PROMPT: what do you notice about the chart you just made?*

Now let's look only at the storms that are hurricanes.

In [23]:
# PROMPT: create a filtered version of the dataframe containing only the storms that were designated as hurricanes

Let's take a look at storm wind and pressure grouped by hurricane category.

In [None]:
# PROMPT: using the filtered data, plot wind and pressure by hurricane category

*PROMPT: what do you notice about the chart you just made?*

Now, let's create a version of this chart where we superimpose lines representing the trend between wind and pressure within each category. 

We'll start by creating the line chart showing mean pressure by wind and category.

In [None]:
# PROMPT: using the filtered data, plot the trend of wind x pressure within categories

*PROMPT: what do you notice about the chart you just made?*

Now, let's layer our line chart on top of the chart we created earlier comparing wind, pressure, and hurricane category.  

In [32]:
# PROMPT: using the filtered data, plot the trend of wind x pressure within categories as a scatterplot with superimposed mean lines
# HINT: you can superimpose layers in altair using the `+` operator

As a final challenge, let's add an additional layer showing one standard error around our lines as an interval.

In [None]:
# PROMPT: add an interval representing the standard error of the mean to the scatterplot with lines superimposed
# HINT: you'll need a mark type called `mark_errorband` for this
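
It can help to see what "one standard error around the mean" works out to numerically. The sketch below computes it by hand with pandas on made-up numbers; it is not the Altair route the hint points to, where `mark_errorband` takes an `extent` argument (for example `extent="stderr"`). Note that pandas's `sem` uses the sample standard deviation (`ddof=1`) divided by the square root of the group size.

```python
import pandas as pd

# Toy data: three pressure readings at each of two wind speeds (made-up values).
toy = pd.DataFrame(
    {
        "wind": [65, 65, 65, 75, 75, 75],
        "pressure": [990.0, 986.0, 988.0, 980.0, 978.0, 976.0],
    }
)

# Mean and standard error of the mean of pressure within each wind value.
summary = toy.groupby("wind")["pressure"].agg(["mean", "sem"])

# The interval spans one standard error below and above the mean.
summary["lower"] = summary["mean"] - summary["sem"]
summary["upper"] = summary["mean"] + summary["sem"]

print(summary)
```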

*PROMPT: what do you notice about the chart you just made?*