# Gateway Exploration

In this final segment you can take what you have learned and try it yourself. This segment is displayed in "Notebook Mode" rather than "Presentation Mode," so you will need to scroll down as you explore. Notebook Mode lets you see more content at once and makes it easier to compare and contrast cells and visualizations.

Here you are free to explore as much as you want. There are lots of suggestions in the text and in comments in the code cells. Feel free to change attributes, code pieces, etc. If a code cell breaks (e.g., you see an error), then use a search engine to look up the error to see if you can try to solve it yourself. Another way to fix problems is to compare your code to the original code, which you can see here:

https://github.com/hourofci/lessons-dev/blob/master/gateway-lesson/gateway/gateway-exploration.ipynb

Enjoy two explorations to apply what you learned at a deeper level:
1. Data Wrangling - View, Clean, Extract, and Merge Data
2. Data Visualization - Making Maps

So start scrolling down. Explore and try it yourself!

In [2]:
# This code cell starts the necessary setup for Hour of CI lesson notebooks.
# First, it enables users to hide and unhide code by producing a 'Toggle raw code' button below.
# Second, it imports the hourofci package, which is necessary for lessons and interactive Jupyter Widgets.
# Third, it helps hide/control other aspects of Jupyter Notebooks to improve the user experience
# This is an initialization cell
# It is not displayed because the Slide Type is 'Skip'

from IPython.display import HTML, IFrame, Javascript, display
from ipywidgets import interactive
import ipywidgets as widgets
from ipywidgets import Layout

import getpass # This library allows us to get the username (User agent string)

# import package for hourofci project
import sys
sys.path.append('../../supplementary') # relative path (may change depending on the location of the lesson notebook)
import hourofci

# Retrieve the user agent string; it will be passed to the hourofci submit button
agent_js = """
IPython.notebook.kernel.execute("user_agent = " + "'" + navigator.userAgent + "'");
"""
Javascript(agent_js)

# load javascript to initialize/hide cells, get user agent string, and hide output indicator
# hide code by introducing a toggle button "Toggle raw code"
HTML(''' 
    <script type="text/javascript" src=\"../../supplementary/js/custom.js\"></script>
    
    <input id="toggle_code" type="button" value="Toggle raw code">
''')

## Setup
As always, you have to import the specific Python packages you'll need. You'll learn more about these in the other lessons, so for now let's import all of the packages that we will use for the Gateway Exploration component. If you want to dig deeper, feel free to search each package to understand what it does and what it can do for you.

As before, run this code by clicking the Run button left of the code cell. 

Wait for the code to run. While it is running, an asterisk appears inside the brackets of <pre>In [ ]:</pre>. When the asterisk changes to a number and the print output shows up, you're good to go.

In [None]:
# Run this code by clicking the Run button on the left to import all of the packages

from matplotlib import pyplot
import pandas
import geopandas

import os
import pprint
import IPython
from shapely.geometry import Polygon
import numpy as np
from datetime import datetime

print("Modules imported")

## Download COVID-19 Data
This optional code cell will download the US county level data released by the New York Times that we demonstrated earlier. It's found here: https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv.

The code below gets the data from the URL and puts it into a local file called "us-counties.csv"

Skip this step if you already downloaded this data in an earlier segment. You can always come back and re-run it if you need to.

In [None]:
# Run this code cell if you have not yet downloaded the Covid-19 data from the New York Times
!wget https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv -O us-counties.csv
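
If `wget` is not available on your system, the same download could be done in Python. Below is a minimal sketch that uses only the standard library. The helper name `download_if_missing` is made up for this example and is not part of the lesson; it skips the download when the file is already there.

```python
import os
import urllib.request

NYT_URL = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv"

def download_if_missing(url, dest):
    """Download url to dest unless dest already exists.

    Returns True if a download happened and False if it was skipped.
    """
    if os.path.exists(dest):
        return False
    urllib.request.urlretrieve(url, dest)
    return True

# Example usage (uncomment to run; needs an internet connection):
# download_if_missing(NYT_URL, "us-counties.csv")
```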

## Exploration 1: View, Clean, Extract, and Merge Data

### View the data
Once you have downloaded the data file, you should look at it to make sure it is what you want.

To do that, we'll convert the downloaded file into a format that our Python program can use. Here we're going to use the dataframe format provided by the Pandas package. 

Recall that dataframes can be thought of as two-dimensional arrays or spreadsheets.

In [None]:
#Read the data that we downloaded from the NYT into a dataframe
covid_counties = pandas.read_csv('./us-counties.csv')

# And let's see what it looks like!
print(covid_counties)
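
Printing a whole dataframe can be a lot to take in. Here is a small sketch of a few other ways to look at one. It uses a tiny made-up dataframe named `toy` (the values are invented for illustration and are not real NYT data). You could try the same calls on `covid_counties`.

```python
import pandas

# A tiny made-up dataframe with the same column names as the NYT file
toy = pandas.DataFrame({
    'date': ['2020-03-01', '2020-03-02', '2020-03-02'],
    'county': ['Alpha', 'Alpha', 'Beta'],
    'state': ['State A', 'State A', 'State B'],
    'fips': [1.0, 1.0, 2.0],
    'cases': [1, 3, 2],
    'deaths': [0, 0, 1],
})

print(toy.head(2))        # only the first two rows
print(toy.shape)          # (number of rows, number of columns)
print(list(toy.columns))  # the column names
```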

### Clean the Data

In large datasets like this, there are often a few cells scattered around that may cause you problems. Cleaning data is an important and often complex step; it is one part of **data wrangling.** For now, let's just look for the most common problem: empty cells where a value is expected. These are known as null cells, and if a number is expected they will show up as NaN (not a number) in your dataframe.

Let's see if we can find if we have any of these in our data. 

Since we're going to use the "fips" column to group our data, we need to know that there are no null cells in that column. (The "FIPS" code is a unique identifier for geographic places. Google it if you want to know more!)

In [None]:
#Are there NaN cells in the fips column?

covid_counties['fips'].isnull().values.any()

In [None]:
#How many null cells are in the fips column?

count_nan = covid_counties['fips'].isnull().sum()
print('Count of rows with null fips codes: ' + str(count_nan))

Ah ha, we found lots of problems in our data! 

Let's remove the rows containing null cells. Here we'll make a new dataframe that keeps only the rows with non-null fips codes.

In [None]:
# Keep only the rows where the fips code is not null
covid_counties_clean = covid_counties[covid_counties['fips'].notnull()]

print(covid_counties_clean)
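
To see the difference between finding null rows and removing them, here is a minimal sketch on a made-up dataframe. One filter shows the rows with a null fips code, and the other keeps only the rows without one. The data and the names `toy`, `toy_null`, and `toy_clean` are invented for this example.

```python
import numpy as np
import pandas

# Made-up data: the middle row has no fips code
toy = pandas.DataFrame({
    'county': ['Alpha', 'Unknown', 'Beta'],
    'fips': [1.0, np.nan, 2.0],
    'cases': [5, 2, 7],
})

# Rows where fips IS null (the problem rows)
toy_null = toy[toy['fips'].isnull()]

# Rows where fips is NOT null (the cleaned data)
toy_clean = toy[toy['fips'].notnull()]

print(toy_null)
print(toy_clean)
```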

### Extract Data

Since we have a row for each day in the dataset, we will use the **groupby** function to group _daily cases_ by _county_. Since some county names are found in more than one state, we have to group by _county_ and _state_ (as well as the fips code, to be sure). We will add them all up using the **sum** function.


In [None]:
# In our earlier segment we only looked at cases. 
# What if we also wanted to look at deaths? 

# Here we replaced ['cases'] with [['cases', 'deaths']] below.
# (To select more than one column, pass a list of column names inside the brackets.)
# This will group both cases and deaths by fips, county, and state values.

covid_grouped = covid_counties.groupby(['fips','county','state'])[['cases', 'deaths']]

# Second, add up all the Covid-19 cases using sum
covid_total = covid_grouped.sum()

#View the result, which should include the columns "fips, county, state, cases, deaths"
covid_total
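
If grouping is hard to picture, the sketch below runs the same groupby-then-sum pattern on a tiny made-up dataframe. Two counties share the name "Washington" but sit in different (invented) states, so grouping by fips, county, and state keeps them apart. All names and numbers here are assumptions for illustration.

```python
import pandas

# Made-up data: two rows per county, same county name in two states
toy = pandas.DataFrame({
    'fips': [1.0, 1.0, 2.0, 2.0],
    'county': ['Washington', 'Washington', 'Washington', 'Washington'],
    'state': ['State A', 'State A', 'State B', 'State B'],
    'cases': [1, 2, 10, 20],
    'deaths': [0, 1, 2, 3],
})

# Group the rows, select the two columns to add up, then sum each group
toy_total = toy.groupby(['fips', 'county', 'state'])[['cases', 'deaths']].sum()

print(toy_total)
```

The result has one row per group, with the group keys (fips, county, state) as the index.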

Now we could apply some basic arithmetic for the rows using Pandas.

Let's get the number of deaths per case for each county. This is called the Case Fatality Rate (CFR). We multiply by 100.0 to get the percentage at the end.

Before you run the code, make sure you understand that we are dividing deaths by cases for each row.

In [None]:
covid_total['deathpercase']=covid_total['deaths']/covid_total['cases']*100.0

# Print out the new 'covid_total' dataframe with a new 'deathpercase' column
covid_total
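
As a quick check of the arithmetic, here is the same calculation on a made-up table where the answers are easy to work out by hand. For example, 5 deaths out of 200 cases is 2.5 percent. The county names and numbers are invented for this sketch.

```python
import pandas

# Made-up totals for three imaginary counties
toy_total = pandas.DataFrame(
    {'cases': [200, 50, 1000], 'deaths': [5, 1, 20]},
    index=['Alpha', 'Beta', 'Gamma'],
)

# Pandas divides the two columns row by row, then multiplies each result by 100
toy_total['deathpercase'] = toy_total['deaths'] / toy_total['cases'] * 100.0

print(toy_total)
```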

Now that we have our data we can try some basic visualizations. Let's try making a scatter plot of cases on the x-axis and deaths on the y-axis.

In [None]:
covid_total.plot.scatter(x='cases', y='deaths')

Here are a few things you can try adding to the scatter function as parameters (remember to use commas to separate each of them).

```python
# Change the size of the dots
# s=covid_total['deathpercase']
# s=covid_total['deathpercase']*2
```

And, try a hex-bin plot.

In [None]:
covid_total.plot.hexbin(x='cases', y='deaths', gridsize=5)

### Merge data 
Now we'll load "supplementary/counties_geometry.geojson" into a geodataframe. You loaded this same file in an earlier segment on mapping Covid-19. We will (again) use **merge** to merge these two datasets into a **merged** geodataframe.

In [None]:
counties_geojson = geopandas.read_file("./supplementary/counties_geometry.geojson")

# Merge geography (counties_geojson) and covid cases and deaths (covid_total)
merged = pandas.merge(counties_geojson, covid_total, how='left',
                left_on=['NAME','state_name'], right_on = ['county','state'])

# Let's take a quick look at our new merged geodataframe
merged
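
A left merge keeps every row from the left table, even when the right table has no match. The sketch below shows this on two small made-up tables (plain dataframes rather than geodataframes, to keep it short). The `indicator=True` option is an addition that is not used in the lesson; it adds a `_merge` column so you can spot rows that found no match.

```python
import pandas

# Made-up "geography" table: three counties
toy_counties = pandas.DataFrame({
    'NAME': ['Alpha', 'Beta', 'Gamma'],
    'state_name': ['State A', 'State A', 'State B'],
})

# Made-up "covid" table: no row for Beta
toy_covid = pandas.DataFrame({
    'county': ['Alpha', 'Gamma'],
    'state': ['State A', 'State B'],
    'cases': [3, 30],
})

toy_merged = pandas.merge(toy_counties, toy_covid, how='left',
                          left_on=['NAME', 'state_name'],
                          right_on=['county', 'state'],
                          indicator=True)

# Rows from the left table that had no partner in the right table
unmatched = toy_merged[toy_merged['_merge'] == 'left_only']

print(toy_merged)
print(unmatched)
```

Unmatched rows get NaN in the columns that come from the right table, which is one reason some counties may show up without a value on a map.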

## Exploration 2: More Mapping

Now that we have a merged dataset, we can try to create a few different maps. In this Exploration you can try to improve your first map.

Here is the code from your first map. Run this code and then scroll down.

In [None]:
merged.plot(figsize=(15, 15), column='cases', cmap='OrRd', scheme='fisher_jenks', legend=True,
            legend_kwds={'loc': 'lower left', 'title': 'Number of Confirmed Cases'})
pyplot.title("Number of Confirmed Cases")

Below is that code chunk again. Now you can try changing the code to improve the look of your map. There are a lot of options to change. 

<u>If you break something, then just copy and paste the original code above to "reset".</u>

- *column* represents the column that is being mapped. Change what you are mapping by replacing 'cases' with 'deaths' or 'deathpercase'

- *cmap* represents the colormap. You can try any number of these by replacing 'OrRd' with: 'Purples' or 'Greens' or 'gist_gray'. There are a lot of choices that you can see here: https://matplotlib.org/tutorials/colors/colormaps.html. If you want to learn more about color schemes check out: https://colorbrewer2.org

- *scheme* represents the scheme for creating classes. Try a few other options by replacing 'fisher_jenks' with: 'natural_breaks' or 'quantiles'

- *loc* represents the location of your legend. Move your legend by replacing 'lower left' with 'upper right' or 'upper left'

- *title* represents the text in the legend box. If you changed the column that you are mapping, make sure to change the title too.

Want to try more? Check out even more options here:
https://geopandas.org/mapping.html#choropleth-maps
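
To get a feel for what a classification *scheme* does, here is a rough sketch of the idea behind 'quantiles': the values are sorted into classes that each hold about the same number of items. It uses `pandas.qcut` as a stand-in on made-up numbers; this is only an illustration of the idea and is not necessarily how geopandas computes its classes internally.

```python
import pandas

# Made-up values: a few small numbers and a few very large ones
values = pandas.Series([1, 2, 3, 4, 100, 200, 300, 400])

# Split the values into 4 classes with (about) equal counts.
# labels=False returns the class number (0 to 3) for each value.
quantile_class = pandas.qcut(values, q=4, labels=False)

print(quantile_class.tolist())
print(quantile_class.value_counts().sort_index())
```

On a map, each class number would get its own color from the colormap.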

In [None]:
merged.plot(figsize=(15, 15), column='cases', cmap='OrRd', scheme='fisher_jenks', legend=True,
            legend_kwds={'loc': 'lower left', 'title': 'Number of Confirmed Cases'})
pyplot.title("Number of Confirmed Cases")

# Congratulations!


**You have finished an Hour of CI!**


But, before you go ... 

1. Please fill out a very brief questionnaire to provide feedback and help us improve the Hour of CI lessons. It is fast and your feedback is very important to let us know what you learned and how we can improve the lessons in the future.
2. If you would like a certificate, then please type your name below and click "Create Certificate" and you will be presented with a PDF certificate.

<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="https://forms.gle/JUUBm76rLB8iYppN7">Take the questionnaire and provide feedback</a></font>


In [3]:
# This code cell has a tag "Hide" (set by going to Toolbar > View > Cell Toolbar > Tags)
# Code input is hidden when the notebook is loaded and can be hidden or shown using the "Toggle raw code" button at the top

# This code cell loads the Interact Textbox that will ask users for their name
# Once they click "Create Certificate" then it will add their name to the certificate template
# And present them a PDF certificate
from PIL import Image
from PIL import ImageFont
from PIL import ImageDraw

from ipywidgets import interact

def make_cert(learner_name):
    cert_filename = 'hourofci_certificate.pdf'

    img = Image.open("../../supplementary/hci-certificate-template.jpg")
    draw = ImageDraw.Draw(img)

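    # Use Times at size 150 if the font file is available; otherwise fall back to the default font
    try:
        cert_font = ImageFont.truetype('times.ttf', 150)
    except OSError:
        cert_font = ImageFont.load_default()

    # Measure the name so it can be centered on the point (1650, 1100)
    # (getbbox replaces getsize, which was removed in Pillow 10)
    left, top, right, bottom = cert_font.getbbox(learner_name)
    w, h = right - left, bottom - top
    draw.text(xy=(1650 - w/2, 1100 - h/2), text=learner_name, fill=(0, 0, 0), font=cert_font)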
    
    img.save(cert_filename, "PDF", resolution=100.0)   
    return cert_filename


interact_cert=interact.options(manual=True, manual_name="Create Certificate")

@interact_cert(name="Your Name")
def f(name):
    print("Congratulations",name)
    filename = make_cert(name)
    print("Download your certificate by clicking the link below.")

<font size="+1"><a style="background-color:blue;color:white;padding:12px;margin:10px;font-weight:bold;" href="hourofci_certificate.pdf">Download your certificate</a></font>