# What are we doing?
Today we're going over how to:

1. query data from an API endpoint
2. load the JSON payload into a pandas DataFrame
3. inspect the data, learn the data, clean the data, love the data
4. visualize the data

### Things to note:
I've included mini "Roadblocks" wherever I personally had to stop and scratch my head. The goal is not only to introduce you to a very rudimentary example/application of Data Science and the Extraction, Transformation, and Loading (ETL) process, but also to show how to break the habit of googling everything by going straight to the documentation instead.

In [None]:
# Commented-out lines showcase how to install modules and packages within Jupyter
# Note: these lines need `sys` to be imported first

# import sys
# !{sys.executable} -m pip install 'geocoder'
# !{sys.executable} -m pip install 'geonamescache'
# !{sys.executable} -m pip install 'matplotlib'
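
Before uncommenting the install lines, it can help to check which packages are already importable. Here is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name for illustration, not something Jupyter or pip provides.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found for import in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names taken from the install lines above
print(missing_packages(["geocoder", "geonamescache", "matplotlib"]))
```

An empty list means nothing needs installing.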

In [None]:
# Applicable documentation used

# https://geocoder.readthedocs.io/providers/ArcGIS.html
# https://openweathermap.org/api/hourly-forecast
# https://openweathermap.org/current
# https://pandas.pydata.org/docs/
# https://requests.readthedocs.io/en/latest
# https://matplotlib.org/stable/api/index

# Pull current weather from the OpenWeather API for one "city"

In [None]:
# Importing modules

import sys
import requests
import pandas as pd
import numpy as np
import json
import geocoder
import matplotlib.pyplot as plt

In [None]:
API_KEY = 'xxxxxxxxxxxxxxxxxxxxxxxxxxx'
LOCATION = 'Hoboken, NJ'
UNITS = 'imperial'
# Endpoint used below (current weather):
# https://api.openweathermap.org/data/2.5/weather?lat={loclat}&lon={loclng}&units={UNITS}&appid={API_KEY}

In [None]:
# Using geocoder to get lat/lng for use with openweather API
g = geocoder.arcgis(LOCATION)
loclat = g.lat
loclng = g.lng

In [None]:
# Creating the request url format using f-string formatting
request_url = f'https://api.openweathermap.org/data/2.5/weather?lat={loclat}&lon={loclng}&units={UNITS}&appid={API_KEY}'
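
The f-string works fine here. As an alternative, the query string can be built from a dict so that each parameter is escaped for you. This is a minimal sketch using the standard library; `build_weather_url` is a hypothetical helper, and the coordinates and key below are placeholders rather than real values.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_weather_url(lat, lon, units, api_key):
    """Build the current-weather request URL from its query parameters."""
    query = urlencode({"lat": lat, "lon": lon, "units": units, "appid": api_key})
    return f"{BASE_URL}?{query}"


# Placeholder coordinates and API key, for illustration only
example_url = build_weather_url(40.74, -74.03, "imperial", "xxxx")
print(example_url)
```

`requests.get` also accepts a `params=` dict and builds the query string itself, which avoids assembling the URL by hand.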

In [None]:
# Making a request to the url which will return a JSON payload
r = requests.get(request_url)

In [None]:
# Response status
# See more at https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
r
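
Displaying `r` shows the status code, and `r.status_code` gives it to you as a number. To make sense of the codes without leaving the notebook, the standard library's `http.HTTPStatus` maps each code to its standard phrase. Here is a small sketch; `describe_status` and `is_success` are hypothetical helper names.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code together with its standard phrase, e.g. '200 OK'."""
    return f"{code} {HTTPStatus(code).phrase}"


def is_success(code):
    """2xx codes mean the request succeeded."""
    return 200 <= code < 300


for code in (200, 401, 404, 429):
    print(describe_status(code), "| success:", is_success(code))
```

With a real response you would pass in `r.status_code`. Calling `r.raise_for_status()` is another option if you would rather have an exception raised on a 4xx or 5xx response.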

In [None]:
# Returned JSON payload
r.json()

In [None]:
hobokenJson = r.json()
hobokenJson.keys()

In [None]:
hobokenJson['clouds']
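
The payload is just nested dictionaries and lists, so you can keep indexing to drill down. The sketch below uses a made-up payload shaped like the response above (the values are illustrative, not real readings) and shows `.get()` for keys that may not always be present.

```python
# Hypothetical payload shaped like the API response; values are made up
sample_payload = {
    "coord": {"lon": -74.03, "lat": 40.74},
    "weather": [{"id": 800, "main": "Clear", "description": "clear sky"}],
    "main": {"temp": 71.2, "humidity": 40},
    "clouds": {"all": 0},
    "name": "Hoboken",
}

# Dict inside a dict: chain the keys
cloud_cover = sample_payload["clouds"]["all"]

# List of dicts: index into the list first, then use the key
first_condition = sample_payload["weather"][0]["main"]

# A key that is missing here: .get() returns None instead of raising KeyError
gust = sample_payload.get("wind", {}).get("gust")

print(cloud_cover, first_condition, gust)
```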

# Now create a dataframe with multiple cities

In [None]:
# pulled from: https://github.com/agalea91/city_to_state_dictionary/blob/master/city_to_state.py

city_to_state_dict = {"East Rancho Dominguez": "California",
                      "Clinton": "Mississippi",
                      "Nanuet": "New York",
                      "Sand Springs": "Oklahoma",
                      "Middle River": "Maryland",
                      "Carbondale": "Illinois",
                      "Boise": "Idaho",
                      "Las Vegas": "Nevada",
                      "Denver": "Colorado",
                      "Hagerstown": "Maryland",
                      "Venice": "Florida",
                      "Moreno Valley": "California",
                      "Mamaroneck": "New York",
                      "Bartow": "Florida",
                      "Bensonhurst": "New York",
                      "Edgewater": "Florida",
                      "Dallas": "Texas",
                      "Benton": "Arkansas",
                      "Lake Havasu City": "Arizona",
                      "New South Memphis": "Tennessee",
                      "North Glendale": "California",
                      "Santee": "California",
                      "Shawnee": "Oklahoma",
                      "North Augusta": "South Carolina",
                      "Brownwood": "Texas",
                      "Methuen": "Massachusetts",
                        }

#### Cleaning it up and just making it into a list

In [None]:
# Build a list of 'City, State' strings from the dict
cityStateList = []

for k, v in city_to_state_dict.items():
    cityStateList.append(f'{k}, {v}')

In [None]:
cityStateList
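
The same list can be built in one line with a list comprehension. Here is a short sketch on a three-city subset of the dictionary above; the `_sample` names are only there to avoid overwriting the real variables.

```python
# Small subset of city_to_state_dict, for illustration
city_to_state_sample = {"Boise": "Idaho", "Denver": "Colorado", "Dallas": "Texas"}

# One 'City, State' string per dictionary item
city_state_sample = [f"{city}, {state}" for city, state in city_to_state_sample.items()]

print(city_state_sample)
```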

In [None]:
bulkWeatherJson = []

def cityTempPull():
    """Geocode each entry in cityStateList, query the OpenWeather API, and append each JSON payload to bulkWeatherJson"""
    for city in cityStateList:
        # Look up the latitude/longitude for the 'City, State' string
        gloc = geocoder.arcgis(city)
        gLat = gloc.lat
        gLng = gloc.lng
        # Same current-weather endpoint as the single-city request above
        req_url = f'https://api.openweathermap.org/data/2.5/weather?lat={gLat}&lon={gLng}&units={UNITS}&appid={API_KEY}'
        tempR = requests.get(req_url)
        tempCityJson = tempR.json()
        bulkWeatherJson.append(tempCityJson)

        

# First attempt to parse through the dict, using version from above.
# def cityTempPull():
#     """Function to parse through the original city_to_state_dict, don't use"""
#     for k, v in city_to_state_dict.items():
#         location = f'{v}, {k}'
#         gloc = geocoder.arcgis(location)
#         gLat = gloc.lat
#         gLng = gloc.lng
#         req_url = f'https://api.openweathermap.org/data/2.5/weather?lat={gLat}&lon={gLng}&units={UNITS}&appid={API_KEY}'
#         tempR = requests.get(req_url)
#         tempCityJson = tempR.json()
#         bulkWeatherJson.append(tempCityJson)
        
# Call function
cityTempPull()
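
One thing to watch with the function above: it appends to a global list, so running the cell twice doubles the entries. A possible variation is to pass everything in and return a fresh list. The sketch below is hypothetical (`pull_city_weather`, `fake_geocode`, and `fake_fetch_weather` are made-up names). The geocoding and request steps are passed in as functions so the example runs offline with stand-ins.

```python
def pull_city_weather(locations, geocode, fetch_weather):
    """Geocode each location, fetch its weather, and return the payloads in order."""
    payloads = []
    for location in locations:
        lat, lng = geocode(location)
        payloads.append(fetch_weather(lat, lng))
    return payloads


# Stand-ins so the sketch runs without network access; the values are made up
def fake_geocode(location):
    return (40.0, -100.0)


def fake_fetch_weather(lat, lng):
    return {"coord": {"lat": lat, "lon": lng}, "main": {"temp": 70.0}}


demo_payloads = pull_city_weather(
    ["Boise, Idaho", "Denver, Colorado"], fake_geocode, fake_fetch_weather
)
print(len(demo_payloads))
```

In the notebook, the two stand-ins would be replaced by small functions wrapping `geocoder.arcgis` and `requests.get`.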

#### Roadblock
For the following step, I completely forgot how to make a DataFrame out of JSON. On my first few attempts I literally just brute-forced it into a DataFrame, and it was a mess of embedded JSON. I knew there had to be a better way, so I went to the pandas documentation and searched for JSON. That brought me to the io (input/output) docs, and from there I found my way to the pandas.json_normalize page: https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html#pandas.json_normalize

Note: I forgot to mention it earlier, but it's also worth looking at the response parameters in the API documentation: https://openweathermap.org/current#parameter

In [None]:
cityWeatherDF = pd.json_normalize(bulkWeatherJson)
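
To see what `json_normalize` does without hitting the API, here is a small sketch on two made-up payloads (the values are illustrative). Nested dicts become dotted column names, while lists are left as-is in a single column.

```python
import pandas as pd

# Hypothetical payloads shaped like the API response; values are made up
sample_payloads = [
    {"name": "Boise", "coord": {"lon": -116.2, "lat": 43.6},
     "main": {"temp": 65.0}, "weather": [{"main": "Clear"}]},
    {"name": "Denver", "coord": {"lon": -104.99, "lat": 39.74},
     "main": {"temp": 58.0}, "weather": [{"main": "Rain"}]},
]

sample_df = pd.json_normalize(sample_payloads)

# Nested keys such as coord -> lat show up as 'coord.lat'
print(sample_df.columns.tolist())
```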

In [None]:
cityWeatherDF.columns

In [None]:
cityWeatherDF.shape

In [None]:
cityWeatherDF.dtypes

In [None]:
cityWeatherDF.head()

In [None]:
cityWeatherDF.tail()

In [None]:
# The weather column still holds an embedded list of dicts
cityWeatherDF['weather'].iloc[0]
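
If you want that embedded list flattened too, `json_normalize` accepts a `record_path` pointing at the list, plus `meta` for the top-level fields to carry along. Here is a minimal sketch on made-up payloads; the sample values are illustrative.

```python
import pandas as pd

# Hypothetical payloads; only the fields needed for the example are included
sample_payloads = [
    {"name": "Boise", "weather": [{"id": 800, "main": "Clear", "description": "clear sky"}]},
    {"name": "Denver", "weather": [{"id": 500, "main": "Rain", "description": "light rain"}]},
]

# One row per entry in each 'weather' list, with the city name carried along
conditions = pd.json_normalize(
    sample_payloads,
    record_path="weather",
    meta=["name"],
    record_prefix="weather.",
)

print(conditions)
```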

# What else could be done to improve the usability?
Looking at the columns and data, there are a handful of things we could do: rename the columns to something everyone can understand (look at the parameters in the API documentation), drop columns that provide little to no value, and standardize the data type of each column.
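
Those three ideas could look something like the sketch below, shown on a tiny made-up frame. The new column names (`max_temp_f`, `observed_at_utc`) and the choice of columns to drop are my assumptions for illustration. `dt` is treated as a Unix timestamp in seconds, per the API's parameter list.

```python
import pandas as pd

# Hypothetical slice of the weather DataFrame; values are made up
sample_df = pd.DataFrame({
    "name": ["Boise", "Denver"],
    "main.temp_max": [70.1, 65.3],
    "dt": [1700000000, 1700003600],
    "base": ["stations", "stations"],
    "cod": [200, 200],
})

cleaned = (
    sample_df
    .drop(columns=["base", "cod"])  # drop low-value columns
    .rename(columns={"main.temp_max": "max_temp_f", "dt": "observed_at_utc"})
)

# Convert Unix seconds into timezone-aware timestamps
cleaned["observed_at_utc"] = pd.to_datetime(cleaned["observed_at_utc"], unit="s", utc=True)

print(cleaned.dtypes)
```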

#### Cleaning up the name column

In [None]:
# Create a series from the cityStateList from earlier
temp_series = pd.Series(cityStateList, name = 'city-state')

# Add the cityStateList as 'city-state'
cityWeatherDF = pd.concat([cityWeatherDF, temp_series], axis=1)

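# List the current columns (handy for copying the names), then restructure them by hand,
# moving 'city-state' next to 'name'. The hardcoded list below assumes all of these columns exist.
cols = cityWeatherDF.columns.tolist()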

cols = ['weather',
 'base',
 'visibility',
 'dt',
 'timezone',
 'id',
 'city-state',
 'name',
 'cod',
 'coord.lon',
 'coord.lat',
 'main.temp',
 'main.feels_like',
 'main.temp_min',
 'main.temp_max',
 'main.pressure',
 'main.humidity',
 'wind.speed',
 'wind.deg',
 'wind.gust',
 'clouds.all',
 'sys.type',
 'sys.id',
 'sys.country',
 'sys.sunrise',
 'sys.sunset',
 'main.sea_level',
 'main.grnd_level'
 ]

# Apply the new format to cityWeatherDF
cityWeatherDF = cityWeatherDF[cols]

# Get rid of the name column
cityWeatherDF = cityWeatherDF.drop(columns = ['name'])
cityWeatherDF
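
Typing out every column works, but it breaks if the API returns a slightly different set of fields. One alternative is `DataFrame.insert`, which places a new column at a given position. This is a minimal sketch on a made-up three-column frame, not the approach used above.

```python
import pandas as pd

# Hypothetical slice of the weather DataFrame; values are made up
sample_df = pd.DataFrame({
    "id": [1, 2],
    "name": ["Boise", "Denver"],
    "main.temp": [65.0, 58.0],
})
labels = ["Boise, Idaho", "Denver, Colorado"]

# Put 'city-state' where 'name' currently sits, then drop 'name'
name_position = sample_df.columns.get_loc("name")
sample_df.insert(name_position, "city-state", labels)
sample_df = sample_df.drop(columns=["name"])

print(sample_df.columns.tolist())
```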

#### Roadblock
Nothing major, but I completely blanked on how to access dictionary items within a list. Maintaining the belief that "Google bad" and that leaning on it is counterproductive to becoming a better programmer, I went to the Python docs page (https://docs.python.org/3/contents.html), looked up data structures, and went to the dictionaries section, where I came across my answer.

In [None]:
cityWeatherDF['weather'].iloc[0][0]['main']
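
The same indexing can be applied to every row at once to pull the condition out into its own column. Here is a small sketch on made-up data; the `condition` column name is my own choice.

```python
import pandas as pd

# Hypothetical frame with the same list-of-dicts shape in 'weather'
sample_df = pd.DataFrame({
    "city-state": ["Boise, Idaho", "Denver, Colorado"],
    "weather": [
        [{"id": 800, "main": "Clear"}],
        [{"id": 500, "main": "Rain"}],
    ],
})

# For each row: first item in the list, then its 'main' key
sample_df["condition"] = sample_df["weather"].apply(lambda entries: entries[0]["main"])

print(sample_df[["city-state", "condition"]])
```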

#### Preparing the data for visualization
Nothing crazy here, just sorting the values for the max temp in descending order to give our visualization a more palatable/consumable format.

In [None]:
# Sorting the values of the max temp in descending order
cityWeatherDF = cityWeatherDF.sort_values('main.temp_max', ascending=False)

In [None]:
cityWeatherDF[['city-state', 'main.temp_max']]

In [None]:
# Plotting and visualizing the output
cityWeatherDF.plot(x='city-state', y='main.temp_max', kind='bar')
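
`DataFrame.plot` returns a matplotlib Axes object, so the chart can be labeled afterwards. Here is a minimal sketch on made-up temperatures. The label and title text are my own, and Fahrenheit is assumed because `UNITS` is set to 'imperial'.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical, already-sorted data; values are made up
sample_df = pd.DataFrame({
    "city-state": ["Las Vegas, Nevada", "Dallas, Texas", "Boise, Idaho"],
    "main.temp_max": [88.0, 75.5, 70.1],
})

ax = sample_df.plot(x="city-state", y="main.temp_max", kind="bar", legend=False)
ax.set_xlabel("City")
ax.set_ylabel("Max temperature (°F)")
ax.set_title("Current max temperature by city")

# Keep the rotated city labels from being cut off
plt.tight_layout()
```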