COMP 215 - LAB 1
----------------
#### Name: Paniz
#### Date:

This lab exercise is mostly to introduce some of the power in Jupyter Notebooks.
Note that a Notebook is composed of "cells": some are "text", like this one, while others are "code".

We'll also review some basic data types (like `int` and `str`) and data structures (like `list` and `dict`)

**New Python Concepts**:
  * `datetime.date` objects represent a calendar date (these are very powerful)
  * *list comprehension* provides a compact way to represent map and filter algorithms

As will be usual, the first code cell, below, simply imports all the modules we'll be using...

**Hi Joseph, I completed this assignment a while ago, but I forgot to save it and submitted an empty file by accident. I redid the assignment, but unfortunately it no longer runs because of an API error (see the connection error below). The code ran perfectly before this error occurred.**

In [1]:
import datetime, json, requests
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from pprint import pprint    # Pretty Print - built-in python function to nicely format data structures

### API Query

Now, let's fetch some Covid-19 daily case-count data from the Open Covid API:  https://opencovid.ca/api/

Query:
  - `stat=cases`        # the type of data to fetch
  - `loc=BC`            # the location to fetch data for
  - `after=2023-11-01`  # since the 1st of November (note date format:  yyyy-mm-dd)

In [2]:
query = 'https://api.opencovid.ca/timeseries?stat=cases&loc=BC&after=2023-11-01'

response = requests.request("GET", query, headers={}, data={})
print('Response data type:', type(response.text))
response.text[:1000]

ConnectionError: HTTPSConnectionPool(host='api.opencovid.ca', port=443): Max retries exceeded with url: /timeseries?stat=cases&loc=BC&after=2023-11-01 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7983c0156860>: Failed to resolve 'api.opencovid.ca' ([Errno -2] Name or service not known)"))

Notice that the response looks like a dictionary, but is actually just a string of text (most data is exchanged on the web as plain text!).  This particular data format is called "[JSON](https://en.wikipedia.org/wiki/JSON)"

The `json.loads` function "parses" such text and loads the data into a dictionary...
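
Since the API call above may fail, here is a minimal offline sketch of the same parsing step. The `sample_text` string is a hypothetical stand-in shaped like a small response; it is not real data from the API.

```python
import json

# Hypothetical sample text, shaped like a tiny API response (not real data)
sample_text = '{"data": {"cases": [{"date": "2023-11-01", "value_daily": 5}]}}'

parsed = json.loads(sample_text)    # plain text in, nested dict/list out

print(type(sample_text).__name__)   # str
print(type(parsed).__name__)        # dict

# Once parsed, we can index into the structure like any dict or list
first = parsed["data"]["cases"][0]
print(first["date"], first["value_daily"])
```

The text and the parsed dictionary look alike when printed, but only the dictionary supports lookups such as `parsed["data"]`.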

In [None]:
result = json.loads(response.text)
pprint(result)   # pretty-print the entire data structure we got back...

### Extract data items from a list of dictionaries
Next we use "list comprehension" to extract the list of dates and associated cases into "parallel lists"

Notice how we "parse" the date strings, using `strptime`, into real date objects so they are easier to work with (format: yyyy-mm-dd)
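
As a small offline illustration of the "parallel lists" idea, the sketch below runs the same kind of comprehension over a few made-up records. The `sample_cases` list is hypothetical; only the `date` and `value_daily` keys are taken from this lab.

```python
# Hypothetical records, shaped like the entries in the list of daily cases
sample_cases = [
    {"date": "2023-11-01", "value_daily": 12},
    {"date": "2023-11-02", "value_daily": 0},
    {"date": "2023-11-03", "value_daily": 7},
]

# "map": pull one field out of every record
sample_dates = [rec["date"] for rec in sample_cases]
sample_counts = [rec["value_daily"] for rec in sample_cases]

# "filter" + "map": keep only the dates of days with at least one case
nonzero_days = [rec["date"] for rec in sample_cases if rec["value_daily"] > 0]

print(sample_dates)
print(sample_counts)
print(nonzero_days)
```

The two mapped lists are "parallel" because the item at index `i` in each list comes from the same record.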

In [None]:
cases = result['data']['cases']
case_dates = [daily['date'] for daily in cases]     # List Comprehension #1: extract the case date strings
n_cases = [daily['value_daily'] for daily in cases] # List Comprehension #2:  extract the case counts

print('Dates:', case_dates[:10])      # do you recall the "slice" operation?  If not, look it up in the ThinkCsPy textbook!
print('Cases:', n_cases[:10])
print('Zipped:', list(zip(case_dates[:10], n_cases[:10])))  # zip is a very handy function to "zip" 2 lists together like a zipper...

### Datetime.date
Working with date strings is a pain.  So many formats!  Even within Canada, you might see:
"Jan. 9, 2023" or "09-01-2023" or "2023-01-09" or ....
Imagine trying to do a calculation like "how many days between these 2 dates"!!
The built-in `datetime` module makes working with dates much easier.
  * step 1: "parse" the date string data (`strptime` ==  "string parse time", which returns a `datetime` object)
  * step 2: get the date part (i.e., without the time)
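
A minimal sketch of those two steps, using two of the date formats mentioned above (the specific dates are just illustrative):

```python
import datetime

# step 1 + step 2: parse each string with a matching format, then keep only the date part
first = datetime.datetime.strptime("2023-01-09", "%Y-%m-%d").date()
second = datetime.datetime.strptime("09-01-2023", "%d-%m-%Y").date()
print(first == second)          # True: same calendar date, different text format

# Subtracting two date objects gives a timedelta, which knows its length in days
later = datetime.datetime.strptime("2023-11-01", "%Y-%m-%d").date()
print((later - first).days)     # 296
```

Once the strings are real `date` objects, the "how many days between these 2 dates" question becomes a simple subtraction.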

In [3]:
# parse a datetime object from a string by supplying the correct "format" string.
datetime_objects = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in case_dates]  # List Comprehension #3

# but we only need the "date" part...
dates = [dt.date() for dt in datetime_objects]
dates[:10]

NameError: name 'case_dates' is not defined

## Exercise 1

In the code cell below, re-write each of the 3 "List Comprehensions" in the code cells above as a loop so you understand how they work.

Notice that a "list comprehension" is a compact way to write a "list accumulator" algorithm (and more efficient too!)
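
To show the correspondence on a different, deliberately simple example (not one of the three from the exercise), the sketch below builds the same list twice: once with an accumulator loop and once with a comprehension. The `date_strings` values are made up.

```python
date_strings = ["2023-11-01", "2023-11-02", "2023-11-15"]  # hypothetical input

# "List accumulator" pattern: start empty, append inside a loop
days_loop = []
for s in date_strings:
    days_loop.append(int(s[-2:]))   # last two characters are the day of the month

# The same algorithm written as a list comprehension
days_comp = [int(s[-2:]) for s in date_strings]

print(days_loop)
print(days_comp)
```

The expression before `for` in the comprehension is exactly what the loop version passes to `append`.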

In [None]:
# Ex. 1 your code here
date_list = []
for i in cases:
  date_list.append(i['date'])

case_list = []
for c in cases:
  case_list.append(c['value_daily'])

date_time = []
for a in date_list:
  a = datetime.datetime.strptime(a, '%Y-%m-%d')
  date_time.append(a.date())
print(date_time)

### Generating a plot

Finally, we'll plot the (dates,cases) data as a nice x-y line graph.

The code to format the x-axis labels is taken from https://matplotlib.org/stable/gallery/ticks/date_concise_formatter.html

In [None]:
def format_date_axis(ax):
  """ format the dates shown on the x-axis of given axes, ax  """
  locator = mdates.AutoDateLocator(minticks=10, maxticks=20)
  formatter = mdates.ConciseDateFormatter(locator)
  ax.xaxis.set_major_locator(locator)
  ax.xaxis.set_major_formatter(formatter)

fig, ax = plt.subplots()
format_date_axis(ax)
ax.plot(dates, n_cases, label='Daily Cases')  # Plot some data on the axes.
ax.set(
  title="Covid-19 case counts for BC",  # Add a title to the plot.
  xlabel='Date',                        # Add a label to X axes.
  ylabel='confirmed cases'             # Add a label to Y axes.
)
ax.legend();

## Exercise 2

Repeat the analysis above, but this time only for Vancouver Coastal Health Region.

* Make a copy of just the relevant parts of the code above, leaving out all the explanations and extra data dumps.
* You can get the ***hruid*** location code for each health region here:  https://github.com/ccodwg/CovidTimelineCanada/blob/main/geo/hr.csv
* Generalize this code a little to make it easier to repeat the analysis for different locations.  
  If you get that working, also make it easy to run the analysis for different dates?
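
One possible way to generalize the query is sketched below. The helper name `build_query` and its signature are hypothetical; the endpoint and the `stat`, `geo`, `loc`, and `after` parameters are the ones that appear in the queries in this lab. The sketch only builds the URL and does not contact the API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.opencovid.ca/timeseries"  # endpoint used earlier in this lab


def build_query(loc, after, stat="cases", geo=None):
    """Hypothetical helper: build a timeseries query URL for one location and start date."""
    params = {"stat": stat}
    if geo is not None:            # e.g. geo="hr" when loc is a health-region hruid
        params["geo"] = geo
    params["loc"] = loc
    params["after"] = str(after)   # accepts a 'yyyy-mm-dd' string or a datetime.date
    return f"{BASE_URL}?{urlencode(params)}"


print(build_query("BC", "2023-11-01"))
print(build_query("593", "2023-11-01", geo="hr"))
```

With a helper like this, switching location or start date means changing one argument rather than editing the URL by hand.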


In [None]:
# Ex. 2 your code here


url = "https://api.opencovid.ca/timeseries?stat=all&geo=hr&loc=593&after=2023-11-01"
response1 = requests.request("GET", url, headers={}, data={})
result1 = json.loads(response1.text)


caseb20 = result1['data']['cases']
caseb_dates1 = [daily['date'] for daily in caseb20]
num_of_cases1 = [daily['value_daily'] for daily in caseb20]

datetime_case = [datetime.datetime.strptime(date, '%Y-%m-%d') for date in caseb_dates1]
dates_case = [dt.date() for dt in datetime_case]

def format_date_axis(ax):
  """ format the dates shown on the x-axis of given axes, ax  """
  locator = mdates.AutoDateLocator(minticks=10, maxticks=20)
  formatter = mdates.ConciseDateFormatter(locator)
  ax.xaxis.set_major_locator(locator)
  ax.xaxis.set_major_formatter(formatter)

fig, ax = plt.subplots()
format_date_axis(ax)
ax.plot(dates_case, num_of_cases1, label='Daily Cases')  # plot the parsed date objects, not the raw date strings
ax.set(
  title="Covid-19 case counts for Vancouver Coastal Health",
  xlabel='Date',
  ylabel='confirmed cases'
)
ax.legend()


## Challenge Exercise - Take your skills to the next level...

## Exercise 3

Notice that the data plot looks quite erratic.  These swings most likely represent artifacts attributable to the reporting process rather than actual changes in infection rates.

 * One way to fix this is to "smooth" the data with a "7-day rolling average".
Each day, we take the average of the previous 7 days' cases.
 * Add a new code cell below and compute the 7-day rolling average for each day from the cases list.
 * Create a plot to display the rolling average data and compare your plot with the one produced above.

 Hints: you are free to do this however you like, but a quite elegant solution uses list comprehension, range, and slices
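
 One possible sketch along the lines of the hint, run on made-up numbers. The function name `rolling_average` and the `sample` data are hypothetical, and the sketch assumes each window ends on the day being averaged and that days without a full 7-day history are skipped.

```python
def rolling_average(values, window=7):
    """Mean of each full run of `window` consecutive values (window ends on day i)."""
    return [
        sum(values[i - window + 1 : i + 1]) / window   # slice = the `window` days ending at i
        for i in range(window - 1, len(values))        # start at the first day with a full window
    ]


sample = [7, 14, 0, 0, 21, 7, 0, 14, 7]   # hypothetical daily counts
smoothed = rolling_average(sample)
print(smoothed)                            # [7.0, 8.0, 7.0]
```

 Because the first `window - 1` days are skipped, the smoothed list is shorter than the input; when plotting, it would pair with the dates from index `window - 1` onward.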

In [None]:
# Ex. 3 (challenge) your code here