In [28]:
from datetime import datetime
from datascience import *
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

### Review of Functions
Recall that last week we used a setup similar to this one when building the `average` function. To begin, we will go over the difference between **global** and **local** variables to clear up any confusion.

In [29]:
x = make_array(1,2,3)
array = make_array(4,5,6)

**Build a function that takes in an array and returns the average of that array**

In [30]:
def average(array):
    return sum(array) / len(array)

* The **argument** ``array`` that is defined **within** the function ``average`` is considered a **local** variable


* The **variable** ``array`` that is defined in the second block of code is considered a **global** variable


* When a function takes in an argument and runs its code, e.g. ``sum(array) / len(array)``, Python first looks for a **local** variable named `array`
    * Since the function's argument creates a local variable called `array`, Python uses that argument to evaluate the function, **not** the variable ``array`` defined in the second block of code
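A minimal sketch of the lookup order described above. It uses plain Python lists in place of `make_array` (an assumption made only so the example runs without the `datascience` package); the names `local_result` and `global_result` are for illustration.

```python
# Global variable: defined at the top level of the notebook.
array = [4, 5, 6]

def average(array):
    # Inside the function, `array` refers to the argument (local),
    # not to the global list defined above.
    return sum(array) / len(array)

local_result = average([1, 2, 3])   # the argument is [1, 2, 3]
global_result = average(array)      # the global list is passed in explicitly
print(local_result, global_result)
```

The global `array` only affects the result when it is passed in as the argument.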

Based on the two chunks of code above, what would be wrong with using the function below?

```python
def average(array):
    return sum(x) / len(x)
    
```

Is the function below the same as the one we created above?

```python
def average(x):
    return sum(x) / len(x)
    
```

Does replacing the argument `array` actually change anything?
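One way to explore these questions is to run both versions side by side. The sketch below uses plain lists instead of `make_array`, and the hypothetical names `average_ignores_argument` and `average_renamed` (neither is from the notebook) so that both definitions can exist at once.

```python
x = [1, 2, 3]          # global variable
array = [4, 5, 6]      # global variable

def average_ignores_argument(array):
    # The body never uses the argument `array`. Python finds no local `x`,
    # so it falls back to the global `x`.
    return sum(x) / len(x)

def average_renamed(x):
    # Here `x` is the argument (local), so the global `x` is not used.
    return sum(x) / len(x)

ignored = average_ignores_argument(array)   # average of the global x
renamed = average_renamed(array)            # average of the list passed in
print(ignored, renamed)
```

The first version returns the same value no matter what is passed in, while the renamed version behaves like the original `average`.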

In [31]:
5

5

### Practicing Data Types

| Data Type | Examples |
| ----------- | ----------- |
| int | 5, 10, any whole number |
| Table | Table() |
| array | make_array(1,2,3), np.array([2,3,4]), ... |

1) Imagine we have a table called `tbl` with a bunch of columns and rows and we want to apply `.where` to that table. What **data type** will a call of `tbl.where(...)` return?

2) What **data type** does `tbl.column(...)` return?

### Table Warm Up!

Reconstruct the table below!

| Instructor      | Office Hours |Office Hour Day |Study Group | Study Group Day|
| ----------- | ----------- |-------| --------- |-----|
| Isaac      | 5:00 pm       |Tuesday|6:30 pm|Thursday|
| Alex   | 11:00 am        |Wednesday|11:00 am|Monday|

In [32]:
Table().with_columns("Instructor", ["Isaac", "Alex"], 
                    "Office Hours", ["5:00 pm", "11:00 am"], 
                    "Office Hour Day", ["Tuesday", "Wednesday"], 
                   "Study Group", ["6:30 pm", "11:00 am"], 
                   "Study Group Day", ["Thursday", "Monday"])

Instructor,Office Hours,Office Hour Day,Study Group,Study Group Day
Isaac,5:00 pm,Tuesday,6:30 pm,Thursday
Alex,11:00 am,Wednesday,11:00 am,Monday


### Load Up Data

We begin by loading data on COVID-19 from the early months of the pandemic in 2020.

In [33]:
def datetime_parser(string):
    datetime_object = datetime.strptime(string, '%m/%d/%y')
    return datetime_object
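As a quick check of the format string, here is a small sketch of what `datetime_parser` does to a date written like the ones in the `Date` column (for example `1/22/20`). In the `'%m/%d/%y'` format, `%m` is the month, `%d` is the day, and `%y` is the two-digit year. The name `parsed` is for illustration only.

```python
from datetime import datetime

def datetime_parser(string):
    # Parse a month/day/two-digit-year string into a datetime object.
    return datetime.strptime(string, '%m/%d/%y')

parsed = datetime_parser('1/22/20')
print(parsed)        # 2020-01-22 00:00:00
print(type(parsed))  # <class 'datetime.datetime'>
```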

In [34]:
covid_us = Table.read_table("../covid_data/covid_us.txt")

In [35]:
covid_us = covid_us.with_column("New Date", covid_us.apply(datetime_parser, "Date"))

Let's see what the columns are for this table. **What is the command for accessing the column labels, and what data type does it return?**

In [36]:
covid_us.labels

('Unnamed: 0',
 'UID',
 'iso2',
 'iso3',
 'code3',
 'FIPS',
 'Admin2',
 'Province_State',
 'Country_Region',
 'Lat',
 'Long_',
 'Combined_Key',
 'Date',
 'Confirmed',
 'Deaths',
 'New Date')

##### Discuss with someone near you what each of these columns means. Feel free to play around with the data to figure it out.

In [47]:
covid_us.sort("Province_State")

Unnamed: 0,Admin2,Province_State,Date,Confirmed,Deaths,New Date
82,Autauga,Alabama,1/22/20,0,0,2020-01-22 00:00:00
83,Baldwin,Alabama,1/22/20,0,0,2020-01-22 00:00:00
84,Barbour,Alabama,1/22/20,0,0,2020-01-22 00:00:00
85,Bibb,Alabama,1/22/20,0,0,2020-01-22 00:00:00
86,Blount,Alabama,1/22/20,0,0,2020-01-22 00:00:00
87,Bullock,Alabama,1/22/20,0,0,2020-01-22 00:00:00
88,Butler,Alabama,1/22/20,0,0,2020-01-22 00:00:00
89,Calhoun,Alabama,1/22/20,0,0,2020-01-22 00:00:00
90,Chambers,Alabama,1/22/20,0,0,2020-01-22 00:00:00
91,Cherokee,Alabama,1/22/20,0,0,2020-01-22 00:00:00


### Looking at Alameda County

After taking a look at the columns, let's cut them down to just the columns we need:

**Admin2, Province_State, New Date, Confirmed, Deaths**

In [15]:
covid_us = covid_us.drop('Country_Region','UID','iso2','iso3','code3','FIPS', 'Lat', 'Long_', 'Combined_Key', 'Date')
# Alternatively, use .select(...) to list the columns to keep instead of the ones to drop

In [11]:
# help(covid_us.where)

### TEST QUESTIONS

1) **Let's practice .where! Output a filtered version of covid_us with rows only in Alameda County, California**

In [16]:
# Keep only rows for California, then narrow those down to Alameda County
covid_california = covid_us.where('Province_State', are.equal_to('California'))
covid_alameda_county = covid_california.where('Admin2', are.equal_to('Alameda'))
covid_alameda_county.show(5)

Unnamed: 0,Admin2,Province_State,Confirmed,Deaths,New Date
268,Alameda,California,0,0,2020-01-22 00:00:00
3608,Alameda,California,0,0,2020-01-23 00:00:00
6948,Alameda,California,0,0,2020-01-24 00:00:00
10288,Alameda,California,0,0,2020-01-25 00:00:00
13628,Alameda,California,0,0,2020-01-26 00:00:00


2) **Count the total number of deaths from COVID in Alameda County during the time period of our dataset.**

In [17]:
sum(covid_alameda_county.column("Deaths"))

88.0
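Whether summing the column gives the right total depends on how `Deaths` is recorded, which this notebook does not state. If each row holds the new deaths for that day, `sum` is correct. If each row holds a running total up to that day, the final value is the total and `sum` would overcount. A minimal sketch with made-up numbers (not from the dataset), using plain lists in place of table columns:

```python
# Hypothetical numbers for illustration only; they are not taken from the dataset.
daily_new_deaths = [0, 1, 0, 2, 1]    # new deaths reported each day
cumulative_deaths = [0, 1, 1, 3, 4]   # running total up to each day

total_from_daily = sum(daily_new_deaths)       # summing works for daily counts
total_from_cumulative = cumulative_deaths[-1]  # last value is the total for running totals
overcount = sum(cumulative_deaths)             # summing running totals counts deaths repeatedly

print(total_from_daily, total_from_cumulative, overcount)
```

Checking whether the `Deaths` values ever decrease from one date to the next is one quick way to tell which case applies.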

3) **What is the last date that our Alameda County dataset has information for? You should return a `datetime` data type, e.g. `2023-01-31 6:30:00`**

In [23]:
sorted_time_alameda = covid_alameda_county.sort("New Date", descending=True)
sorted_time_alameda.column("New Date").item(0)

datetime.datetime(2020, 4, 5, 0, 0)
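Sorting in descending order and taking the first item picks out the largest value, which for dates means the latest one. A minimal sketch with a hypothetical list of dates standing in for the `New Date` column (the list and the names `latest_by_sort` and `latest_by_max` are assumptions for illustration):

```python
from datetime import datetime

# Hypothetical dates, deliberately out of order.
dates = [datetime(2020, 1, 23), datetime(2020, 4, 5), datetime(2020, 1, 22)]

# Mirrors sort(..., descending=True) followed by item(0).
latest_by_sort = sorted(dates, reverse=True)[0]

# Same result without sorting: datetime objects compare chronologically.
latest_by_max = max(dates)

print(latest_by_sort, latest_by_max)
```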

### Challenge

**Find the county in California with the highest death totals over the entire time period we have access to.**