# HTTP Practice: 

## Web APIs and project advice

Most of this practice is a walk-through. You only need to write code at the end.

## The Golden Snowball?

There is a little-known competition among Upstate NY cities called the "Golden Snowball." It is awarded to the city with the highest recorded snowfall for the winter. You can learn about this competition here:

- (Live Site) https://goldensnowball.com/about-the-snow-contest/
- (Web Archive) https://web.archive.org/web/20231223063219/https://goldensnowball.com/about-the-snow-contest/

The data set is here:

- https://goldensnowball.com/ 

This program will rank the current Golden Snowball cities and place them on a map. First place is the lightest blue, continuing through darker blues down to gray for last place.

Here's a screenshot of the map:

![Screenshot of the Golden Snowball map](https://i.imgur.com/MgR5N78.png)


## 5 Lessons you will learn in this exercise

Throughout this lesson you will learn the following techniques.

1. What to do when `pd.read_html()` fails you!
2. DataFrame cleanup: how to sort, replace headers, slice, and reindex a DataFrame
3. Creating your own data
4. How to merge two dataframes on their index.
5. Using `df.apply()` to output a DataFrame instead of a Series



### Modules

You will need to install some modules for this assignment: `html5lib` for reading local HTML and `openpyxl` for reading Excel files.

In [1]:
!pip install html5lib openpyxl

Collecting html5lib
  Downloading html5lib-1.1-py2.py3-none-any.whl.metadata (16 kB)
Collecting openpyxl
  Downloading openpyxl-3.1.5-py2.py3-none-any.whl.metadata (2.5 kB)
Collecting et-xmlfile (from openpyxl)
  Downloading et_xmlfile-1.1.0-py3-none-any.whl.metadata (1.8 kB)
Downloading html5lib-1.1-py2.py3-none-any.whl (112 kB)
Downloading openpyxl-3.1.5-py2.py3-none-any.whl (250 kB)
Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: html5lib, et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 html5lib-1.1 openpyxl-3.1.5


### CENT IoT API

Log in to the CENT IoT portal and get your API key: [https://cent.ischool-iot.net](https://cent.ischool-iot.net)

Paste your API Key in the variable `APIKEY` below.

In [None]:
import pandas as pd
import folium
import requests
from IPython.display import display, HTML
APIKEY = "todoyourapihere"

### Lesson 1: What to do when pd.read_html() fails you!

Run the following code. It should generate an HTTP 406 error. There are many reasons why you cannot read a "live" webpage with `read_html()`. Some of the more common ones:

- 406 "Not Acceptable": the server cannot produce a response in a format the request says it will accept. In this case we would need to send additional `Accept` headers.
- 429 "Too Many Requests": you are accessing the site or API too often. Be mindful of how frequently you access a site.
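
As a rough sketch of the header idea above, the page could be fetched with `requests` while sending explicit `User-Agent` and `Accept` headers. The header values and the `fetch_html` helper are assumptions for illustration; there is no guarantee that this particular site accepts them.

```python
import requests

# Hypothetical header values; a given site may require different ones.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml",
}

def fetch_html(url, get=requests.get):
    """Return the page text, raising an HTTPError on 4xx/5xx responses."""
    response = get(url, headers=BROWSER_HEADERS, timeout=10)
    response.raise_for_status()  # a 406 or 429 would surface here
    return response.text
```

If the request succeeds, the returned text can be handed to pandas with `pd.read_html(io.StringIO(html))`. Even then, fetch the page once and reuse the result rather than requesting it over and over.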


In [None]:
# Error Help Me!
tables = pd.read_html("https://goldensnowball.com/", storage_options={"User-Agent": "Mozilla/5.0"} )

#### The solution is Simple!

Getting around these errors is simple. **Download the page from your browser and upload it to JupyterHub!**

1) Open the page in Google Chrome (instructions for other browsers are similar).
2) Right-click on the loaded page and select **Save As...** from the menu.
3) Save as type **Webpage, Single File** <== this is important!
4) Give the file a simple, meaningful name like `baseballscores.html` or, in this case, `goldensnowball.html`.
5) Save the file.
6) Upload the saved file to JupyterHub. You can also drag and drop it into the JupyterHub window.

**Look at you! Now you no longer need the website, because you have a local copy of it.** This is not only efficient, but it's also **a respectful practice**, since you are not making continued, frequent, unnecessary requests to someone's website while you are figuring things out!



#### Let's read in the local copy, and see what the table looks like

In [None]:
# storage_options applies to remote URLs, so it is omitted for a local file
tables = pd.read_html("goldensnowball.html")
snow = tables[0]
snow

### Lesson 2: Cleaning up a dataframe

This DataFrame is a **mess**!!! 

- No header
- NaN
- Need to sort by snowfall, highest first
- Need to reindex.


In [None]:
# Set a header 
snow.columns = [ "city", "total_to_date", "avg_to_date", "total_last_season", "normal_season_avg", "all_time_record"]  # new columns
snow

In [None]:
# Drop the `NaN` values.
snow = snow.dropna()
snow

In [None]:
# First row needs to go! DataFrame slice
snow = snow[1:]
snow

In [None]:
# Yes this data is sorted properly, but will it ALWAYS be? EVERY time we load it?
snow = snow.sort_values("total_to_date", ascending=False)
snow
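
One thing worth checking before trusting the sort: a table read without a proper header can leave numeric columns stored as text, and text sorts character by character. The sketch below uses made-up values to show the difference. Whether the real `snow` columns are text depends on the page, so treat this as an assumption to verify with `snow.dtypes`.

```python
import pandas as pd

# Made-up values for illustration only, stored as text.
demo = pd.DataFrame({
    "city": ["A", "B", "C"],
    "total_to_date": ["9.5", "101.2", "45.0"],
})

# Sorting text compares character by character, so "9.5" lands ahead of "101.2".
as_text = demo.sort_values("total_to_date", ascending=False)

# Convert to numbers first; errors="coerce" turns unparseable text into NaN.
demo["total_to_date"] = pd.to_numeric(demo["total_to_date"], errors="coerce")
as_number = demo.sort_values("total_to_date", ascending=False)
```

After the conversion, the largest snowfall sorts first, as intended.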

In [None]:
# Reset the index back to zero-based
snow = snow.reset_index(drop=True)
snow

#### Tip: Save your cleaned dataframe.

When you're done cleaning, it's ALWAYS a good idea to write out the dataframe. That way you can start from the cleaned data next time.

In [None]:
snow.to_csv("golden_snowball_cleaned.csv", header=True, index=False)
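
A small sketch of the full round trip, writing a cleaned dataframe and reading it back. The rows and the file name `cleaned_demo.csv` are made up, and a temporary folder is used so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

# Made-up rows standing in for the cleaned snowfall table.
cleaned = pd.DataFrame({"city": ["A", "B"], "total_to_date": [12.5, 7.0]})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "cleaned_demo.csv")  # hypothetical file name
    cleaned.to_csv(path, header=True, index=False)
    reloaded = pd.read_csv(path)  # a later session can start from here
```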

### Code up to this point

Here's the data cleanup sequence:


In [1]:
# TODO compile this lesson into a series of steps


### Lesson 3: Creating Your own data.

Sometimes, you just don't have the data you need to complete the task. So you create it. **This is a completely acceptable practice.**  

In fact, we do this all the time in code.

For example, this "code" represents each place, 1st through 5th, and the color that represents it: `1st` is assigned the color `"lightblue"`.

In [None]:
colors = [
    { "place": "1st", "color": 'lightblue'},
    { "place": "2nd", "color": 'blue'},
    { "place": "3rd", "color": 'darkblue'},
    { "place": "4th", "color": 'cadetblue'},
    { "place": "5th", "color": 'gray'}
]

cdf = pd.DataFrame(colors)
cdf   ## THIS IS NOT CODE, IT'S DATA!!!

#### Advice: Don't confuse code and data

The less code you write, the fewer bugs you introduce. Don't write code to represent your data! Build datasets and load them into dataframes... LESS CODE IS BETTER.

I made a table of data in Microsoft Excel, saved it, and then uploaded it to JupyterHub. Now I load it in with `pandas`. Easy peasy!


In [None]:
# Better:
colors = pd.read_excel("colors.xlsx")
colors

### Lesson 4: `pd.merge()` using the dataframe index.

If you look at our two data frames:

In [None]:
display(snow, colors)

Observe they **share a common index.**

Meaning:

- in `snow` index 1, the **city** `Syracuse` 
- should match up with `colors` index 1 **place** `2nd`

Syracuse came in 2nd place in the Golden Snowball.

We use `left_index=True, right_index=True` to join two dataframes together on their matching indexes.

In [2]:
# Merge the dataframes on their index.
snow_color = pd.merge(left=snow, right=colors, left_index=True, right_index=True)
snow_color
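
To see what an index merge does, here is a small sketch with made-up rows. It also shows one thing to watch for: `pd.merge` defaults to an inner join, so a row whose index has no partner in the other dataframe is dropped unless `how="left"` is requested.

```python
import pandas as pd

# Made-up data: three cities but only two places.
cities = pd.DataFrame({"city": ["X", "Y", "Z"]})   # index 0, 1, 2
places = pd.DataFrame({"place": ["1st", "2nd"]})   # index 0, 1

# The default inner join keeps only index values found in both dataframes.
inner = pd.merge(left=cities, right=places, left_index=True, right_index=True)

# how="left" keeps every city and fills the missing place with NaN.
keep_all = pd.merge(left=cities, right=places, left_index=True, right_index=True, how="left")
```

Comparing `len(snow)` with `len(snow_color)` after the merge is a quick way to confirm no city was dropped.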


### Geocoding

To put the city on a map, we must first geocode it to a latitude and longitude pair.

We can use the CENT IoT API to do this.

In [None]:
def geocode(location, apikey):
    querystring = {"location": location}
    url = "https://cent.ischool-iot.net/api/google/geocode"
    headers = {'X-API-KEY': apikey}
    response = requests.get(url, params=querystring, headers=headers)
    response.raise_for_status()
    data = response.json()
    latlon = data['results'][0]['geometry']['location']
    return [latlon['lat'], latlon['lng']]

geocode("Buffalo, NY", APIKEY)
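
The parsing step in `geocode` assumes at least one result comes back. Here is a minimal defensive sketch, assuming the response has the same `results` / `geometry` / `location` shape used above. `extract_latlon` is a hypothetical helper name, and the sample values are made up.

```python
def extract_latlon(data):
    """Return [lat, lng] from a geocode response dict, or [None, None] if empty."""
    results = data.get("results", [])
    if not results:
        # No match for the location: return placeholders instead of raising IndexError.
        return [None, None]
    latlon = results[0]["geometry"]["location"]
    return [latlon["lat"], latlon["lng"]]

# Made-up response in the same shape, so no API call is needed to try it.
sample = {"results": [{"geometry": {"location": {"lat": 1.5, "lng": -2.5}}}]}
found = extract_latlon(sample)
missing = extract_latlon({"results": []})
```

Returning a two-item list in both cases keeps the output the same width.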

### Lesson 5: Using `df.apply()` to return a DataFrame

Here we are returning the lat, lng pair as a Python `list`. List-like values are required when returning a **DataFrame** from `df.apply()`.

Normally `df.apply()` returns a **Series** (one column), but here we have two values, so we need `df.apply()` to return a **DataFrame**.

Adding the named argument `result_type="expand"` creates a new dataframe, `coords` in this case.


In [None]:
# Apply geocode to each city in the dataframe, making a new dataframe of coordinates

coords = snow_color.apply(lambda row: geocode(f"{row['city']}, NY", APIKEY), axis=1, result_type="expand")
coords.columns = ["lat","lng"]
coords
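
The difference between the two return types can be seen without calling the API. In this sketch, `fake_geocode` is a made-up stand-in that returns a two-item list, and the city names are placeholders.

```python
import pandas as pd

def fake_geocode(location):
    # Stand-in for geocode(): returns a [lat, lng]-style list without any API call.
    return [float(len(location)), -float(len(location))]

cities = pd.DataFrame({"city": ["Alpha", "Be"]})

# Without result_type, each row's list is stored whole: the result is a Series of lists.
as_series = cities.apply(lambda row: fake_geocode(row["city"]), axis=1)

# With result_type="expand", each list item becomes its own column: a DataFrame.
as_frame = cities.apply(lambda row: fake_geocode(row["city"]), axis=1, result_type="expand")
as_frame.columns = ["lat", "lng"]
```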

#### Once again we merge on index.

The new `coords` dataframe has the same `index` as the `snow_color` dataframe.

In [None]:
display(snow_color)

In [None]:
snow_color_coords = pd.merge(left=snow_color, right=coords, left_index=True, right_index=True)
snow_color_coords

### And we are done! We have everything we need to make our map!

Remember, when making a map, finding the `center` and `zoom_start` is going to take some trial and error. The map does not pick an appropriate center and zoom level for you by default.

In [None]:
center = (43.03996, -76.13364)
m = folium.Map(location=center, zoom_start=9)
for index, row in snow_color_coords.iterrows():
    hover = f"{row['city']} {row['place']} {row['total_to_date']}"
    folium.Marker(location=(row['lat'],row['lng']), tooltip=hover, icon=folium.Icon(color=row['color'], icon="cloud")).add_to(m)
    
display(m)
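
One way to cut down on the trial and error is to derive a starting center from the coordinates themselves. This sketch averages the latitude and longitude columns and also computes the bounding box. The coordinates are made up for illustration.

```python
import pandas as pd

# Made-up coordinates standing in for the lat/lng columns of snow_color_coords.
points = pd.DataFrame({"lat": [43.0, 42.0], "lng": [-76.0, -78.0]})

# The mean of each column gives a reasonable first guess for the map center.
center = (points["lat"].mean(), points["lng"].mean())

# Southwest and northeast corners of the box that contains every point.
bounds = [
    [points["lat"].min(), points["lng"].min()],
    [points["lat"].max(), points["lng"].max()],
]
```

The `center` tuple can be passed as `location=` to `folium.Map`, and folium's `Map.fit_bounds(bounds)` can then adjust the view to fit both corners. Fine-tuning `zoom_start` by hand may still be needed.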

## Putting it all Together

Write a program to read `goldensnowball.html` and create the map above. Here's a summary of the algorithm for the final process. It is an example of what you are expected to create in your project, where you need an algorithm that outlines the steps at a high level.

    INPUT: Golden Snowball Data
    USER INPUT: None
    OUTPUT: Map of golden snowball winners

    1) read in and prepare `goldensnowball.html` as snow
        1.1) extract table from HTML
        1.2) Remove Empty / NA Data and the leftover first row
        1.3) rename columns
        1.4) Sort "total_to_date" descending so most snowfall is first.
        1.5) Reset the dataframe index
    2) Combine Colors / rankings into snow
        2.1) read in colors.xlsx as colors
        2.2) merge snow and colors on matching index as snow_color
    3) Geocoding 
        3.1) apply the geocode function to each city in snow_color to create new dataframe coords of lat/lng
        3.2) merge snow_color and coords on matching index as snow_color_coords
    4) Map
        4.1) Create map of NYS
        4.2) for each row in snow_color_coords
        4.3)     make a pin for the city displaying its rank (1st, 2nd), and color code the pin.
        4.4) show the map
        

### Practice getting it running from a single cell.

Working step by step in separate cells helps us figure out your coding and thought process.

However, there should be **one single code cell**, clearly identified, that will run your entire data story.

The expectation is that if you restart the kernel and run just the submission cell, the program will execute.





In [3]:
# TODO: Write final code here

# imports

# geocode function


# main program

