# Lab 04 Functions and Visualization

<i>Elements of Data Science</i><br><br>
Welcome to lab 4!
This week, we will focus on functions and visualization. <br>Functions are described in [Chapter 8](https://inferentialthinking.com/chapters/08/Functions_and_Tables.html) of the Inferential Thinking text. <br>Visualization is covered in [Chapter 7](https://inferentialthinking.com/chapters/07/Visualization.html).
<br>**<center>Learning Goals**
|Area|Concept|
|---|---|
|Tables|Load and analyze data sets. |
|Time Trends|Using EDS module to examine and plot time trends in datascience Tables|
|Visualization|Line plots and scatter plots using matplotlib and `ptrend`|
|Functions|Learn to define your own functions and apply them to arrays and Table columns|

First, set up the tests and imports by running the cell below.

**Enter your name as a string**

In [None]:
name = 

In [None]:
import numpy as np
from datascience import *
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import matplotlib.dates as mdates
from matplotlib import ticker
from gofer.ok import check # This line loads the tests.
import glob
import os
user = os.getenv('JUPYTERHUB_USER')

from EDS_mod.EDS_mod import *
notebooks = glob.glob('*.ipynb')
notebook = max(notebooks, key=os.path.getmtime)

## Part 1: A quick review of tables
### Creating a data table
We will begin by creating a simple data table. Most of the time, we will read the data from a file, but for this demonstration we will create the table from scratch using the `with_columns` method. Recall that a table is made up of columns. Each column has a label (the column header) and an associated array of data (string or numeric), and all of the columns must be the same length.

In [None]:
# Define the data
animal_data = Table().with_columns(
    "Animal", ["Elephant", "Giraffe", "Lion", "Tiger", "Zebra"],
    "Weight (kg)", [6000, 1600, 200, 300, 350],
    "Height (cm)", [300, 550, 120, 100, 140],
    "Lifespan (years)", [70, 25, 15, 20, 25]
)
animal_data

### Some common table methods

|Method|Example|Result|
|-|-|-|
|show()|show(3)| Display the first 3 rows of a table|
|drop()|drop("Height (cm)")| Returns a new table without the "Height (cm)" column|
|select()|select(["Animal", "Weight (kg)"])| Returns a new table with just the two columns specified|
|column()|column("Lifespan (years)")| Returns the data array from the specified column|

All of these methods and more can be found in the [online data tables reference](https://www.data8.org/datascience/tables.html).

A common point of confusion is `select` versus `column`.

In [None]:
# Returns a table of the one or more selected columns
animal_data.select("Lifespan (years)")

<font color=blue> **Question 1. Extracting an array from a table** </font><br />
Extract the array of animal weights from the data table.

In [None]:
weights = ...
weights

In [None]:
check('tests/q1n.py')

We often extract a data column when we want to calculate some value from the data, for example the maximum weight.

In [None]:
max(weights)

We often want to calculate basic statistics on a data table, so there is a method for this.

In [None]:
animal_data.stats()

The `stats()` method gives us the minimum, maximum, median, and sum for each column. For the column containing string data (animal names) the min and max are the first and last entries alphabetically, respectively. The median and sum have no meaning for string data, so they are blank.

There are also attributes for finding the number of rows and columns in a table. These come in handy for large data tables.

In [None]:
animal_data.num_rows

In [None]:
animal_data.num_columns

Do you remember how to sort the table? Let's sort by Lifespan.

In [None]:
animal_data.sort("Lifespan (years)")

<font color=blue> **Question 2. Sort Descending** </font><br />
Sort the data by weight in descending order, from heaviest to lightest.<br>
*Hint*: You need the `descending=True` option in your sort method.

In [None]:
animal_data_heavy_to_light = ...

In [None]:
check('tests/q2n.py')

#### Save our important animal data for later!
Saving data to a file is as important as creating and analyzing data. A `.csv`, or comma separated value, file is the most standard format, since both computer code and spreadsheet programs can read this text format. Below, replace the `...` with your own filename. Note that we add (or "append") a `.csv` file ending so that the file is recognizable as data.
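
To see what a CSV file actually contains, here is a small sketch that uses only Python's standard `csv` module rather than the `datascience` library. The two data rows are copied from the animal table above; writing to an in-memory buffer (`io.StringIO`) instead of a real file is a choice made here only to keep the example self-contained.

```python
import csv
import io

# Write a header row and two data rows to an in-memory text buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Animal", "Weight (kg)"])
writer.writerow(["Elephant", 6000])
writer.writerow(["Giraffe", 1600])

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back splits each line at the commas.
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows[1])
```

Notice that the weight comes back as the text `'6000'` rather than a number: a CSV file stores everything as plain text.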

In [None]:
myanimalfile = "..." + ".csv"
print("File name = ",myanimalfile)
animal_data.to_csv(myanimalfile)

In [None]:
check('tests/q3n.py')

## Part 2: Writing Functions

Until now, you have just been using functions, starting way back in Lab01 when you learned to use the `print()` function. Now it is time to graduate from function user to function creator. You will continue to use functions, but you will also write your own! By breaking down complex problems into smaller, well-defined functions, you enhance code readability and make your code easier to understand, debug, and modify.

Functions will seem complicated at first, but they actually follow a simple pattern. We will start with a super-simple function and gradually add complexity to imprint this pattern into your Python DNA.

### Function 1: A super simple function. No parameters, no return value.

In [None]:
def print_one_dad_joke():
    print("What did one plant say to the other? Aloe! Long thyme no see.")

- `def` is short for "define." You have to define a function before you can use it.
- Then you give the function a name. Function names follow the same rules as Python variables. This function is named `print_one_dad_joke`.
- Right after the function name you put any variables that will be passed to the function inside parentheses. These are called the function's parameters, or arguments. This function has none, so the parentheses are empty: `()`. We'll come back to this later.
- After the parentheses we put a colon to mark the start of the body of the function. This is where the function does its work.
- All the lines that comprise the body of the function must be indented. When you stop indenting, Python knows you have finished defining your function. This function has only one line, a print statement.
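
As a recap of the points above, here is the same definition again with each part labeled in a comment. Nothing new is introduced; the comments simply map the bullets onto the code.

```python
# "def" starts the definition, followed by the function's name.
# The empty parentheses mean this function takes no parameters.
# The colon marks the start of the body.
def print_one_dad_joke():
    # Indented lines form the body; here it is a single print statement.
    print("What did one plant say to the other? Aloe! Long thyme no see.")

# This comment is not indented, so the function definition has ended.
```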

Now that the function has been defined, we can use it.

In [None]:
print_one_dad_joke()

### Function 2: Add a function parameter.

Our super simple function has no flexibility. Every time you call it, it tells the same joke. (Hmmm. That is rather dad-like...)
Let's add an argument to the function to allow it to print whatever dad joke we pass it.

In [None]:
def print_any_dad_joke(joke):
    print("Dad joke: ", joke)

In [None]:
dad_joke1 = "I only seem to get sick on weekdays. I must have a weekend immune system"
dad_joke2 = "What brand of underwear do chemists wear? Kelvin Klein."

print_any_dad_joke(dad_joke1)
print_any_dad_joke(dad_joke2)

This new function takes one parameter, `joke`, and uses it in a print statement. Notice that the name of the variable you pass the function doesn't matter. Whatever you pass it will be renamed 'joke' inside the function. When we called the function with the statement `print_any_dad_joke(dad_joke1)` the variable `dad_joke1` was passed to the function where it became a new variable `joke` that exists only inside the function. If the variable joke is changed inside the function, it doesn't alter any of the variables outside of the function.

Let's test this.

In [None]:
def print_any_dad_joke(joke):
    print("Dad joke: ", joke)
    joke = "I hate my job—all I do is crush cans all day. It’s soda pressing."

joke = "How do cows stay up to date? They read the Moo-spaper."
print_any_dad_joke(joke)
print(joke)

In this version of the function, the variable `joke` was changed inside the function, but not outside of the function, so even after you run the function the variable `joke` contains the first joke. *This is actually very important.*  When you write a function for others to use, you have no idea what variables they might already have in their program; you don't want any variables you define in your function to accidentally change some value in their program. **What happens in a function stays in a function.**

What if you **want** to get something back from the function? Then you need to add a `return` statement.

### Function 3: Return a value from a function.

Let's write a function that accepts a dad joke (well actually, any string) and returns it in all capital letters.

In [None]:
def capitalize_dad_joke(joke):
    return joke.upper()

In [None]:
dad_joke = "Where do pirates get their hooks? Second hand stores."

joke = capitalize_dad_joke(dad_joke)
print(joke)

This function no longer prints the joke; it just returns a capitalized version.
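
The difference between printing and returning is easy to miss, so here is a small sketch. The two function names are made up for this illustration. A function with no `return` statement hands back the special value `None`, so there is nothing useful to store.

```python
def shout_and_print(text):
    # Prints the result but has no return statement.
    print(text.upper())

def shout_and_return(text):
    # Hands the result back to the caller instead of printing it.
    return text.upper()

printed = shout_and_print("aloe")     # displays ALOE on the screen
returned = shout_and_return("aloe")   # displays nothing

print(printed)    # None
print(returned)   # ALOE
```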

### Function 4: A function that takes more than one input parameter.
Your function can be designed to accept more than one input parameter. Let's write a function that accepts two, yes two, dad jokes and returns the total number of characters in the two jokes combined.

In [None]:
def total_joke_length(joke1, joke2):
    length_joke1 = len(joke1)
    length_joke2 = len(joke2)
    return length_joke1 + length_joke2

In [None]:
dad_joke1 = "What do you call a beehive without an exit? Unbelievable."
dad_joke2 = "Did you know that the first french fries weren’t cooked in France? They were cooked in Greece."

total_joke_length(dad_joke1, dad_joke2)

There are a couple of things to notice here:
- First, as mentioned before, the variables inside the function have the names given in the function definition, not the names passed to the function.
- Second, we used the `return` statement to pass back the total number of characters in the two jokes.
- Third, these jokes are killing me!

### Function 5: A function can return more than one value
Just as a function can be defined to accept multiple input parameters, a function can return multiple values. Let's modify this last function to return the length of each joke rather than the total.

In [None]:
def joke_length(joke1, joke2):
    length_joke1 = len(joke1)
    length_joke2 = len(joke2)
    return length_joke1, length_joke2

In [None]:
dad_joke1 = "If prisoners could take their own mug shots…They’d be called cellfies."
dad_joke2 = "I just broke up with my mathematician girlfriend. She was obsessed with an X."

length1, length2 = joke_length(dad_joke1, dad_joke2)
print("Length of joke 1: ", length1)
print("Length of joke 2: ", length2)

The function returns two values, so when calling the function you can put two variables on the left of the `=` to receive the return values.
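
As a sketch of what happens behind the scenes: Python packs the returned values into a single tuple, and the two names on the left of the `=` unpack it. The function `first_and_last` is a hypothetical example, not part of the lab.

```python
def first_and_last(text):
    # Return two values separated by a comma.
    return text[0], text[-1]

# One name on the left receives both values packed in a tuple.
both = first_and_last("hooks")
print(both)

# Two names on the left unpack the tuple into separate variables.
first, last = first_and_last("hooks")
print(first, last)
```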

### Function 6: Keyword parameters
Sometimes you want a function that has options. You want to give the user choices in how to use the function without having to write multiple versions of the same function. This can be accomplished using *keyword parameters*. These parameters have default values that will be used if the user doesn't change them.

Let's write a function that takes a dad joke and returns the length of the joke, but gives the user the option of also printing the joke, or not.

In [None]:
def joke_length_with_optional_print(joke, print_joke=False):
    if print_joke:
        print(joke)
    return len(joke)

In [None]:
dad_joke = "If a pig loses its voice…does it become disgruntled?"
joke_length_with_optional_print(dad_joke)

We called the function without supplying the keyword parameter, so it used the default value of `False`, and did not print the joke. (You will learn about `if` statements and other conditionals in the next lab.)

In [None]:
joke_length_with_optional_print(dad_joke, print_joke=True)

This time we changed the value of the keyword parameter to `True`, so the function printed the joke. You might recognise that you have used keyword parameters before, such as when you used the Table sort method with `descending=True`, which changed the default sort behavior for a table.

**Important point: In Python functions, keyword parameters must come after regular parameters**, so in `joke_length_with_optional_print(joke, print_joke=False):` joke came before print_joke. This is true both when defining and when calling the function.
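
Here is a short sketch of the ordering rule when calling a function. `describe` is a made-up function with one regular parameter and one keyword parameter.

```python
def describe(animal, loud=False):
    # `animal` is a regular parameter; `loud` is a keyword parameter
    # whose default value is False.
    text = "The " + animal + " says hello"
    if loud:
        text = text.upper() + "!"
    return text

quiet_text = describe("zebra")             # uses the default, loud=False
loud_text = describe("zebra", loud=True)   # keyword argument comes last
print(quiet_text)
print(loud_text)

# Writing the keyword argument first, as in describe(loud=True, "zebra"),
# is a SyntaxError, so that call is left as a comment here.
```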

## Functions and algebra

A function returns one or more values for given values of its variables, or arguments. In algebra we develop the concept of functions such as the following:
$$ f(x) = 3 \cdot x-5 $$
If we substitute the value:
$$ x = 3$$
$$f(3) = 3 \cdot 3-5 = 4$$

This function can be coded in Python in the following straightforward way:
```python
def f(x):
    result = 3*x - 5
    return result
```
To compute the value of the function f(x) at x = 3:
```python
f(3)
4
```

<font color=blue> **Question 3. Functions and algebra** </font><br />

Define a Python function for the following algebraic function:
$$ f(x) = 2 \cdot x + 5 $$

In [None]:
def f(x):
    ...
    return ...

Use your function to evaluate the function f(x) at x = 4

In [None]:
f(...) # test function

In [None]:
check('tests/q3a.py')

Note something special. If you call the function with an input parameter that contains multiple values, such as an array, it will compute the function on all of those values automatically. How cool is that?
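
Why does this work? NumPy arrays apply arithmetic to every element at once. As a sketch, here is the example function $f(x) = 3 \cdot x - 5$ from earlier in this section, named `g` here so that it does not overwrite your own `f`. Note that this behavior relies on the input being a NumPy array; a plain Python list does not behave the same way.

```python
import numpy as np

def g(x):
    # The example function from the algebra section: 3x - 5.
    return 3 * x - 5

values = np.array([0, 1, 2, 3])
results = g(values)   # the arithmetic is applied to each element
print(results)

# With a plain list, 3 * x repeats the list instead of multiplying
# its elements, and subtracting 5 then raises a TypeError.
try:
    g([0, 1, 2, 3])
    list_worked = True
except TypeError:
    list_worked = False
    print("A plain list does not work here.")
```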

In [None]:
some_numbers = np.array([1, 3, 5, 7, 9])
f(some_numbers)

## Part 3: Plotting Data

We often want to visualize our data, particularly when we have a data table with a large number of points. There are many data visualization packages available for Python, but in this course we will mainly stick with one of the oldest and best known: `matplotlib`. If you look back at the start of the lab where we import the modules, you will see the line:

`import matplotlib.pyplot as plt`

We aliased the library as `plt` so that instead of typing `matplotlib.pyplot` we can access the plotting commands with just the `plt.` prefix.

The plotting commands in matplotlib expect arrays or lists of data, not tables, so we have to extract the data arrays from the tables to make our plot. For example, let's make a scatter plot of animal weight vs. height.

**Read back the animal data**
The `read_table()` method of datascience Tables is the way to read comma separated value (`.csv`) files into a Table variable.

In [None]:
animal_data = Table().read_table(myanimalfile)
print("Loaded my data Table from the file: ",myanimalfile) 

In [None]:
# Extract the data arrays
height = animal_data.column("Height (cm)")
weight = animal_data.column("Weight (kg)")

# Make the plot
plt.scatter(height, weight)

By default, matplotlib scales the axes to match the data ranges, and chooses the symbol and color for you. All of this can be customized, and you can add axes labels, a graph title, and much more.

In [None]:
# Extract the data arrays
height = animal_data.column("Height (cm)")
weight = animal_data.column("Weight (kg)")

# Change the color of the points
plt.scatter(height, weight, color="blue") 

# Label the axes
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")

# Add a title
plt.title("Animal Weight vs. Height")

Scatter plots are not the only option. You can make a line plot, which connects the points. Line plots are useful for data showing a trend, but here connecting the points doesn't make much sense.

In [None]:
# Extract the data arrays
height = animal_data.column("Height (cm)")
weight = animal_data.column("Weight (kg)")

# Line plot
plt.plot(height, weight)

# Title
plt.title("Connecting the dots makes no sense in this case")

Or you can make a bar chart, which is useful to plot numbers versus a category.

In [None]:
# Extract the arrays
animal = animal_data.column("Animal")
weight = animal_data.column("Weight (kg)")

# Bar chart
plt.bar(animal, weight)

# Label y-axis
plt.ylabel("Weight (kg)")

What if we wanted the bar chart to display the weights in order from lightest to heaviest? Then we would first sort the table before extracting the data columns.

In [None]:
# Sort the data
animals_light2heavy = animal_data.sort("Weight (kg)")

# Extract the arrays
animal = animals_light2heavy.column("Animal")
weight = animals_light2heavy.column("Weight (kg)")

# Bar chart
plt.bar(animal, weight)

# Label y-axis
plt.ylabel("Weight (kg)")

### Table plot commands
Because plotting table data is so common, there are plotting methods built into tables, but you should know that these convenience methods are calling matplotlib behind the scenes.

In [None]:
# Using the table scatter() method, you just pass the column names
animal_data.scatter("Height (cm)", "Weight (kg)")

Notice that the `scatter` method for tables was smart enough to use the column labels to label the axes. 

In [None]:
# This also works for bar charts
animal_data.bar("Animal", "Weight (kg)")

**Why not always use the convenience functions built into tables?** In a word -- flexibility. If you are happy with the defaults, you can use the table plotting functions, but sometimes, to customize a plot, you need to work directly with matplotlib. Matplotlib is also a logical choice if your data is not already in a data table.

You will see many examples of both approaches in subsequent labs.

<font color=blue> **Question 4. Make your own plot** </font><br />
Get creative. Make any type of plot you choose using `animal_data`. 

**Note:** There is no automatic check in this case.

### Putting it all together
Let us, however reluctantly, leave the world of dad jokes, and write a function that illustrates many of these concepts in a mathematical context.

Write a function that:
- Plots a polynomial $ y = ax^2 + bx + c $
- Input parameters are the coefficients a, b, and c.
- Optional input parameters are the starting x and ending x value (two keyword parameters with default values specified)

In [None]:
def poly_wants_a_nomial(a, b, c, xlo=-10, xhi=10):
    """Plot a polynomial"""
    x = np.arange(xlo, xhi, 0.5)
    y = a * x**2 + b * x + c
    plt.plot(x, y, '-*')
    return y

In [None]:
# Call the function accepting the default values for the keyword parameters
y = poly_wants_a_nomial(2, 10, 3)

In [None]:
y = poly_wants_a_nomial(2, 10, 3, xlo=-15)

In [None]:
# Call the function overriding the default values for xlo and xhi
y = poly_wants_a_nomial(2, 10, 3, xlo=-15, xhi=20)

<font color=blue> **Question 5. Ideal gas density function** </font><br />
You will define a function to compute the density of an ideal gas from the parameters pressure, P, temperature, T, and molecular mass, M, and then plot a computed array of densities given an array of temperatures for a fixed pressure and the molecular mass of water.

Write a function which computes the density of an ideal gas from a given molecular mass, temperature, and pressure.
$$ PV = nRT $$
$$ \frac{n}{V} = \frac{P}{RT} $$
To convert to grams from number of moles we use the molecular mass, $ M $. <br>Water has a molecular mass of $ M = 18.0 \frac{g}{mol}$ <br>
Density is given the symbol $\rho $ and has units of $ \frac{g}{L} $ <br><br>
$$  \rho = \frac{M\cdot P}{R\cdot T} $$
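
As a quick check of this formula, the units can be worked through, assuming pressure in atm, temperature in K, molecular mass in $\frac{g}{mol}$, and $R$ in $\frac{L\cdot atm}{K\cdot mol}$ (the units used for the test values later in this question):

$$ \rho = \frac{M\cdot P}{R\cdot T} \;\Rightarrow\; \frac{\frac{g}{mol}\cdot atm}{\frac{L\cdot atm}{K\cdot mol}\cdot K} = \frac{g}{L} $$

The atm, K, and mol units all cancel, leaving grams per liter as expected for a density.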



In [None]:
def density(P, T, M):
    """ Computes the density of a gas in g/L from:
        pressure, P, in atm; temperature, T, in K;
        and molecular mass, M, in g/mol """  
    
    R = 0.082057
    ...
    return ...

Test the function by calculating the density of water vapor (gas, $ M = 18.0 \frac{g}{mol}$ ) at `P = 1` (atm) and 298 K. R is the gas constant.<br> $$ R = 0.082057 \frac{L\cdot atm}{K\cdot mol}$$

In [None]:
# Substitute values for P, T, and M
P = ...
T = ...
M = ...

waterdensity = density(P,...,...)

print(f'Water density at {T}K, Pressure of {P} atm is {waterdensity:.2f} g/L')

In [None]:
check('tests/q5a.py')

Now create an array with temperatures in Kelvin from freezing, 273.15, to 313 (about $40^\circ$ C) in 1.0 degree steps. With this array, create a new array of densities using the `density` function above. Make a scatter plot of these arrays.
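
As a reminder of how `np.arange(start, stop, step)` behaves, here is a sketch with made-up numbers rather than the temperatures asked for here: the array begins at `start`, increases by `step`, and stops before reaching `stop`.

```python
import numpy as np

# Start at 10, count up in steps of 5, and stop before reaching 30.
example = np.arange(10, 30, 5)
print(example)        # the stop value, 30, is not included
print(len(example))
```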

In [None]:
temperatures = np.arange(...,...,...)

Use the same approach to plot the density computed with your function versus the temperature. Hint: you can create an array of temperatures and pass this to your density function to compute densities for given temperatures or use list comprehension like with the ADK peaks in Lab 02.
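
If list comprehensions are hazy, here is a minimal sketch of the pattern using a made-up function, `double`, in place of `density`: the expression before `for` is evaluated once for each item in the sequence.

```python
def double(x):
    # Hypothetical stand-in for any function of one value.
    return 2 * x

inputs = [1, 2, 3]

# Call double() on each item and collect the results in a new list.
outputs = [double(value) for value in inputs]
print(outputs)
```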

In [None]:
densities = [density(P, ..., M) for ... in temperatures]

In [None]:
plt.scatter(temperatures, ...)

plt.xlabel('Temperature [K]')
plt.ylabel('Density [g/L]')
plt.title('Water Vapor Density [g/L]')

### Time Trends and Dates with Data Science Tables

We will use the EDS module to handle dates in Tables. The EDS module is just a collection of predefined functions
to save you writing the same code over and over. 

Date formatting is a challenge when creating a plot of the data over time. We will see below that the standard Table and matplotlib functions cannot correctly interpret dates.
- To plot a time trend using EDS, use `ptrend(tbl_variable, date column label string, data column label string)`

#### Examples
These examples use the 5-year Google Trend search volume for Chemistry, Biology, and Nobel Prize.

#### Like before, we read a CSV file.

In [None]:
Nobel = Table().read_table("data/Nobel_2023.csv") 
Nobel

In [None]:
prize = Nobel.column("Nobel Prize: (United States)")
chem = Nobel.column("Chemistry: (United States)")
bio  = Nobel.column("Biology: (United States)")

plt.plot(prize)

#### <font color=green> We see a peak of Google search interest, but its date is hard to identify: it sits at a bit less than 40 on the horizontal axis. What does 40 mean? It is the 41st data point, because the axis simply counts the data points starting from 0.

#### Use ptrend (short for plot trend) to plot this data.
- This function expects you to provide a table name, and the time column label and column label to be plotted.

We can use the `ptrend` function to plot the Google Trend Level for topics related to the Nobel Prize to see if there is a relationship with the date on which the award notifications are made each Fall and with some topical areas of Nobel Prize awards.<br>
To plot a time trend using EDS, use `ptrend(tbl_variable, date column label string, data column label string)`
- date column label string = "Week"
- data column label string = "Nobel Prize: (United States)"

In [None]:
ptrend(Nobel,"Week","Nobel Prize: (United States)")

#### <font color=green> Now we can see the peak is around October of 2023 when the prize is regularly awarded.

<font color=blue> **Question 6. Plotting Nobel prize and related science topic time trends** </font><br />
In preparing to look at disease trend data including COVID data we will first plot Chemistry and Biology [Google Trend](https://trends.google.com/trends/) search volumes for the period included in the Nobel data above. The Google Trend data gives the relative search volume as a function of day or week over a time period. An example of Google Trend data is searching for the trend of Turkey, Thanksgiving and Football as shown below.


<img src="turkey_trend.png" alt="Turkey Google Trend" style="width: 800px;"/>

Examine this data for the Nobel prize and Biology. Nobel prizes are announced in early October each year and awarded on December 10 at 7:00 AM in honor of Alfred Nobel's death.  Look for the very small peak around this date in the Nobel Prize Google search volume data.

**Chemistry**<br>
Use the column labeled `'Chemistry: (United States)'` to plot Chemistry Nobel prize search volume in the `Nobel` Table. We can overlay two plots by placing the `ptrend` functions on subsequent lines, as below.

In [None]:
# Replace the ... with column name.
# Use the example above as a model.

In [None]:
ptrend(Nobel,"Week", "Nobel Prize: (United States)")
ptrend(Nobel,"Week", ...)

**Biology**

In [None]:
# Now try filling in all of the missing function arguments.
# This time plotting Biology Nobel prize winners.
ptrend(Nobel,"Week", "Nobel Prize: (United States)")
plotcheck = ptrend(...)

In [None]:
check('tests/q6new.py')

## Part 3 Application: Data Tables for Tracking Disease


### Now we are ready to use our datascience toolbox, including Tables, to explore real data!
**Let's explore data from the COVID pandemic**<br>
This data is updated and stored at GitHub: https://github.com/nytimes/covid-19-data <br>
US rolling average: https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us.csv <br>
US States rolling average: https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us-states.csv <br>
US Wastewater Surveillance: [https://covid.cdc.gov/covid-data-tracker/#wastewater-surveillance](https://covid.cdc.gov/covid-data-tracker/#wastewater-surveillance) 

#### Set a variable to contain the path to the data file on the internet.
- `.csv` is a CSV file, which stands for "comma separated values," a common human-readable format for data files, as described above with our animal data.

In [None]:
COVID_data = 'https://raw.githubusercontent.com/nytimes/covid-19-data/master/rolling-averages/us.csv'

#### Read the data file into a data table assigned the name "COVID"

In [None]:
COVID = Table.read_table(COVID_data)

#### Use the `show()` method to display the first three rows of the data table

In [None]:
COVID.show(3)

Now we can sort the data by date. Since the data starts at the beginning of the pandemic we see very few cases.

### Plot
If we attempt to plot using the 'date' column, the bottom axis has strange numbers which are multiplied by 1e9 ($1\cdot 10^9$), as shown in the lower right corner of the plot. These are the number of seconds since the epoch (January 1, 1970).  Another name for this time unit is UNIX time, which has an [interesting history](https://en.wikipedia.org/wiki/Unix_time#:~:text=History,-Learn%20more&text=The%20earliest%20versions%20of%20Unix,two%20and%20a%20quarter%20years.). Storing time as an integer is convenient for computers but not at all convenient for us as data scientists! Using the `ptrend` function from the EDS module will alleviate this problem.
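
To make the epoch idea concrete, here is a small sketch using Python's standard `datetime` module (not the EDS module). The value `1.6e9` is just an illustrative number of the size that appears on the plot axis, not a value taken from the COVID data.

```python
from datetime import datetime, timezone

# 1.6e9 seconds after the epoch (January 1, 1970, 00:00 UTC).
seconds = 1.6e9
moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(moment.date())

# Zero seconds is the epoch itself.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.date())
```

A count of about 1.6 billion seconds lands in September 2020, which is why numbers of this size show up for pandemic-era dates.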

##### This example uses the built in plot() method of data tables,
- the date is in seconds -- not very human-friendly.
- The datascience module wasn't created with time series data in mind.

In [None]:
COVID.plot("date", "cases_avg_per_100k")

#### Now use ptrend

#### The `ptrend()` function described above in Question 6 uses dates -- much nicer!

In [None]:
ptrend(COVID,"date","cases_avg_per_100k")

**Approximately when was the peak date for COVID cases?**

---

ANSWER

In [None]:
# BE SURE TO SAVE YOUR NOTEBOOK BEFORE RUNNING THIS TEST
check("tests/q9a_open_ended.py")

### Histogram
Many of the matplotlib plots are available as methods of data tables. A histogram, for example, is produced by appending `.hist('column name')` to the table name.

**Data tables have a built-in histogram method, `.hist()`**<br>
Let's use this function to look at the average daily US deaths during the pandemic

In [None]:
COVID.hist('deaths_avg')

<font color=blue> **Question 9.** </font>

In order to study the true impact of the COVID pandemic we need a measure of the fraction of cases which result in death. To do this we will define a function which takes the ratio of two numbers or arrays which we will ultimately use to create a new column to study this fraction of cases that result in deaths. First we examine the range of values for cases, cases_avg, deaths, and deaths_avg using `COVID.stats()`

#### First let's check on the range of our data

In [None]:
COVID.stats()

#### After running `.stats()`, what do you notice? Are there some anomalous minimum values? Can we use the deaths_avg and cases_avg instead of deaths and cases?

---

ANSWER

In [None]:
# BE SURE TO SAVE YOUR NOTEBOOK BEFORE RUNNING THIS TEST
check("tests/q9a_open_ended.py")

To examine the COVID data more carefully it is useful to think about which cases lead to death and if the deaths per case changes as better treatments are available.  We will need to compute the ratio of `deaths_avg` to `cases_avg`. Ratio is the same as dividing two numbers. On January 3, 2021, `deaths_avg` was 2606.68 and `cases_avg` was 212412 so the deathrate = 2606.68/212412 = 0.0123.

**Now create a function `ratio()` that returns the ratio of two numbers**

In [None]:
def ratio(x,y):
    """ Returns a ratio of x/y """
    r = ...
    return r

In [None]:
check('tests/q9b.py')

<font color=blue> **Question 10. deathrate** </font> <br>
How many cases were fatal? A better way to look at this is the fraction of cases that were fatal.<br>

Now apply your function to create a new column, *deathrate*. Examine the histogram for deathrate. Now plot the trend for *deathrate* for the entire time period of the dataset.  Discuss the results in the markdown cell below.

### COVID cases leading to bad outcomes
Now we will use the function with our COVID data to explore the deathrate throughout the pandemic. 
1. Create arrays from the `"deaths_avg"` and `"cases_avg"` columns.
2. We will compute a new numpy array which will store the result of using the ratio function on the arrays extracted from the Table.
3. We will use the *with_columns* method of a Table object to place the new `deathrate` array in a new column. 

**First create an array, `deathrate`, that stores the ratio of the `deaths_avg` and the `cases_avg`**

In [None]:
deaths_avg = COVID.column('deaths_avg')
cases_avg = COVID.column('cases_avg')

deathrate = ratio(deaths_avg, ...)

**What are the maximum and minimum deathrate in the Table? What does a deathrate of 1.0 mean?**<br> Use code and then answer below.<br>Hint: use `max()` and `min()` functions<br>code:

Answer below:

---

ANSWER

In [None]:
# BE SURE TO SAVE YOUR NOTEBOOK BEFORE RUNNING THIS TEST
check("tests/q10_open_ended.py")

**Now use `.with_columns()`, to put `deathrate` array into Table as a new column labelled 'deathrate'**

In [None]:
COVID = ...
COVID

This new column of data in the COVID table contains the deathrate, so we can now proceed with the visualization.<br>
**Prepare a time series plot of deathrate using `ptrend`**<br>
Hint: See above for plot of time trend for original COVID data Table and above that for Nobel prize data

In [None]:
...

In [None]:
check('tests/q10a.py')

**When was the COVID deathrate highest? Approximately what was the deathrate on that date?**

---

ANSWER

In [None]:
# BE SURE TO SAVE YOUR NOTEBOOK BEFORE RUNNING THIS TEST
check("tests/q10b_open_ended.py")

## Comparison with recent data based on the Centers for Disease Control (CDC) Wastewater analysis
Wastewater monitoring has become the standard for epidemiological monitoring: 
[CDC Wastewater Program](https://www.cdc.gov/nwss/wastewater-surveillance.html)

In [None]:
COVID_waste_water = 'data/CDC_COVID_Waste_Water.csv'

In [None]:
COVID_waste = Table().read_table(COVID_waste_water)
COVID_waste

**Look at the National trend**

In [None]:
ptrend(COVID_waste,'date','National')

**Now let's overlay the COVID deathrate column with the wastewater data**<br>Wastewater monitoring is the new standard, but it did not start until late in the pandemic, and it does not include data on deaths, since the wastewater signature of COVID is indirect and comes from viral particles in sewage. The two data sets may not overlay well because their scales differ: our maximum deathrate is 1.0, while the wastewater values are larger.

In [None]:
ptrend(COVID_waste,'date','National')
ptrend(COVID,'date','deathrate')

**Create new COVID data column which is %**<br>
To plot both trends on the same scale we can create a column which is the percentage by multiplying our original deathrate column by 100.

In [None]:
COVID = COVID.with_columns('death%', deathrate * ...)

**Now let's overlay the COVID death% column with the wastewater data**

In [None]:
ptrend(COVID_waste,'date','National')
ptrend(COVID,'date','death%')

**What do you notice? Why might the first wastewater peak in January 2022 be earlier than the deathrate peak?**

---

ANSWER

In [None]:
# BE SURE TO SAVE YOUR NOTEBOOK BEFORE RUNNING THIS TEST
check("tests/q10c_open_ended.py")

#### A quick look at newer CDC COVID death data compared to wastewater data

In [None]:
COVID_deaths_CDC = 'data/CDC_COVID_deaths.csv'

In [None]:
COVID_deaths = Table().read_table(COVID_deaths_CDC)
COVID_deaths

In [None]:
ptrend(COVID_deaths,'Date','Weekly Deaths')

In [None]:
ptrend(COVID_deaths,'Date','Weekly Deaths')
ptrend(COVID,"date",'deaths_avg')

In [None]:
ptrend(COVID_deaths,'Date','Weekly Deaths')
ptrend(COVID,"date",'deaths_avg',7) # CDC data is weekly, so we use a feature of ptrend to multiply by 7 to match the amplitude

### RSV monitoring now also carried out

In [None]:
RSV_waste_water = 'data/CDC_RSV_Waste_Water.csv'

In [None]:
RSV_waste = Table.read_table(RSV_waste_water)
RSV_waste

In [None]:
ptrend(RSV_waste,'date','National')

**What can be said about the annual trends in RSV?**<br>
RSV (Respiratory Syncytial Virus) is particularly dangerous for infants and older adults.

---

ANSWER

In [None]:
# BE SURE TO SAVE YOUR NOTEBOOK BEFORE RUNNING THIS TEST
check("tests/q10d_open_ended.py")

#### <font color=blue> Your discussion of results from COVID and wastewater monitoring of diseases:</font>
 What are your overall findings when comparing COVID data from various sources? Is the wastewater data reliable based on your observations? CDC is adding Mpox and Influenza A as well. Is this a good idea?<br>
 Replace the `ANSWER`  with as many lines of text as you need for your answer.

 ---

ANSWER

In [None]:
# BE SURE TO SAVE YOUR NOTEBOOK BEFORE RUNNING THIS TEST
check('tests/q10e_open_ended.py')

### <font color=blue> **Question 11.** </font>

At the end of each lab, please include a reflection. 
* How did this lab go?
* What did you learn?
* What aspects of visualization or functions do you find confusing?
* Were there questions you found especially challenging you would like your instructor to review in class? 
* How long did the lab take you to complete?

Share your feedback so we can continue to improve this class!

**Provide your feedback in the cell below. Replace the ... lines with as many lines of text as you need.**

---

YOUR FEEDBACK...


In [None]:
# BE SURE TO SAVE YOUR NOTEBOOK BEFORE RUNNING THIS TEST
check("tests/q11_open_ended.py")

### <font color=blue> Success! 

Congratulations, you're done with lab 4!  Be sure to 
- **run all the tests and verify that they all pass (the next cell needs to be executed),**
- **and that you have answered any questions requiring a written response.**
- **Save and Export as, HTML** from the `File` menu,
- **Right click on file name to download notebook** 
- **Upload and Submit your files to Canvas** under the corresponding assignment.

In [None]:
import glob
from gofer.ok import check
correct = 0
questions = [
    "1n",
    "2n",
    "3n",
    "3a",
    "5a",
    "6new",
    "9a_open_ended",
    "9b",
    "10_open_ended",
    "10a",
    "10b_open_ended",
    "10c_open_ended",
    "10d_open_ended",
    "10e_open_ended",
    "11_open_ended",
]
for x in questions:
    print("Testing question {}: ".format(x))
    # Run each test once, then display and grade the same result.
    score = check("tests/q{}.py".format(x))
    display(score)
    if score.grade == 1.0:
        correct += 1

In [None]:
perc_correct = correct/len(questions)*100
if perc_correct < 80:
    msg = 'look over your work again, seek help, some errors!!!'
else:
    msg = 'nice work!'
print(f"----\n{name} {msg}\n----\nusername: {user}")
import time
localtime = time.asctime(time.localtime(time.time()))
print("Submitted @ ", localtime)
print(f'Score: {perc_correct:.1f}%')