In [1]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf #needed for models in this script
import pylab as pl
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

In [2]:
pd.set_option('html', True) #see the dataframe in a more user friendly manner
%matplotlib inline

## Overview of Web APIs

An application programming interface (API) is an interface that allows two computer systems to easily exchange information. A client requests information from a server, the server processes the request, and sends a response to the client. The request can be to access data matching particular criteria, in which case the response will contain the data requested. The request can also be to add data to the server's system. In this case, the response is a simple indication of whether the data was successfully added.

You can think of this like a fast food window. You pull up and make a request at the window. You don't need to know what happens in the kitchen (and probably don't want to know). What you requested is delivered to you, and you go about your business. The same is true for an API.

Most APIs use a standard URL string to carry the information necessary for the request. This usually involves:

* a service endpoint
* query parameters
* authentication tokens

Each API uses a different service endpoint. This will usually be the base URL for the request. Some APIs may only offer a single service, while others may offer many, requiring you to specify the service you wish to query. For example, the service endpoint for the Twitter search is "https://api.twitter.com/1.1/search/tweets.json". The first part ("https://api.twitter.com/") specifies the URL to access the service. The next part ("1.1") specifies the API version being used. The service ("search") is indicated next, followed by the target ("tweets") and the desired return format ("json").
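As a rough illustration of that breakdown, the sketch below splits the Twitter search endpoint into its parts with the standard library's urlparse (imported from urllib.parse in Python 3; in Python 2, as used in this notebook, it lives in the urlparse module). The variable names are my own, chosen to mirror the description above.

```python
from urllib.parse import urlparse

endpoint = "https://api.twitter.com/1.1/search/tweets.json"
parts = urlparse(endpoint)

# Scheme plus host form the base URL of the service.
base_url = parts.scheme + "://" + parts.netloc + "/"

# The path holds the API version, the service, and the target with its format.
version, service, target_file = parts.path.strip("/").split("/")
target, return_format = target_file.split(".")

print(base_url)       # https://api.twitter.com/
print(version)        # 1.1
print(service)        # search
print(target)         # tweets
print(return_format)  # json
```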

Query parameters usually begin at the end of the service endpoint and start with a "?". Parameters could include a particular user or term to be searched. They could also include a location. For example, given a point, you can search for items (such as tweets or venues) located at or near that point. In the Foursquare API, a venue search around a point looks like this: "https://api.foursquare.com/v2/venues/search?intent=browse&v=20140625&ll=40.7021015456,-74.0122357117&radius=100&limit=50". In this case, the service endpoint for the venue search service is "https://api.foursquare.com/v2/venues/search". The parameters include:

* "intent=browse" - indicating we're browsing the venues for a given area
* "v=20140625" - indicating the version of the API we're intending to use
* "ll=40.7021015456,-74.0122357117" - indicating the location we want to query around
* "radius=100" - indicating the radius around the point we want to query
* "limit=50" - indicating the number of results to display (in most cases, there is a preset max number that can be returned per query)
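A minimal sketch of assembling the same Foursquare URL from its parameters instead of typing the query string by hand. It assumes Python 3's urllib.parse (Python 2 keeps urlencode in the urllib module). Passing safe="," is my choice here so the comma in the ll value stays unescaped and the result matches the URL shown above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

endpoint = "https://api.foursquare.com/v2/venues/search"

# A list of pairs keeps the parameters in a predictable order.
params = [
    ("intent", "browse"),
    ("v", "20140625"),
    ("ll", "40.7021015456,-74.0122357117"),
    ("radius", 100),
    ("limit", 50),
]

# urlencode joins key=value pairs with "&"; the "?" separates them from the endpoint.
url = endpoint + "?" + urlencode(params, safe=",")
print(url)

# Going the other way: recover the parameters from a finished URL.
parsed = parse_qs(urlparse(url).query)
print(parsed["radius"])  # ['100']
```

Note that parse_qs returns every value as a list of strings, because a parameter may legally appear more than once in a query string.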

Some APIs have a parameter known as "offset" that allows you to page through results. In this case, after we get the first 50 results, we can set the offset to 50 so we get the next 50 results from the service.
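To make the paging idea concrete, here is a small sketch that generates the limit and offset pairs needed to walk through a result set. The helper page_queries and the total of 120 results are hypothetical; a real API may name these parameters differently or report the total in its response.

```python
def page_queries(total_results, limit):
    """Return one query-string fragment per page needed to cover total_results items."""
    return ["limit={}&offset={}".format(limit, offset)
            for offset in range(0, total_results, limit)]

pages = page_queries(120, 50)
for query in pages:
    print(query)
# limit=50&offset=0
# limit=50&offset=50
# limit=50&offset=100
```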

Notice how the parameters are combined together using the ampersand ("&"). While this tends to be the preferred method for combining search parameters, always check the documentation and find examples in order to construct your own API calls. You may need to prototype the calls in your text editor and run them in your browser before you run them in your code. This allows you to fix minor formatting errors.

Authentication tokens identify the application accessing the service. In some cases, the token is just a simple identifier so the service knows which user is accessing the system. In other cases, there is a more robust authentication system with multiple authentication tokens to not only identify the user but also secure access to the system on that user's behalf. Many APIs are free, but some, especially those run by commercial services, charge a fee for access. The fee may be based on usage or charged as a flat rate.

In the following lesson, you will practice registering for access tokens, composing an API request, and submitting the API request with Python. API requests are often complex, and the responses are full of information that needs to be parsed. While you can parse the data in a text editor, Python is a far more powerful tool. When you need to make multiple requests, the ability to script tasks and run them in a loop makes the operation far easier than handling the requests by hand. Adding a storage component in SQLite makes the task even easier.

Good Overview of APIs: https://zapier.com/learn/apis/

## Acquiring Weather Data from an API

For this assignment, we're going to collect the max temperature in 5 major US cities over the course of a month and find out which city experienced the largest temperature swings.

The five cities I picked to use are:   

* Austin,TX,30.303936,-97.754355
* Boston,MA,42.331960,-71.020173
* Miami,FL,25.775163,-80.208615
* Phoenix,AZ,33.572154,-112.090132
* Seattle,WA,47.620499,-122.350876

How you store the cities and locations is up to you, but a dictionary will allow you to store the values in a way that keeps them associated together:

In [3]:
# create dictionary:
cities = {'Austin':'30.303936,-97.754355',
          'Boston':'42.331960,-71.020173',
          'Miami':'25.775163,-80.208615',
          'Phoenix':'33.572154,-112.090132',
          'Seattle':'47.620499,-122.350876'}

cities

{'Austin': '30.303936,-97.754355',
 'Boston': '42.331960,-71.020173',
 'Miami': '25.775163,-80.208615',
 'Phoenix': '33.572154,-112.090132',
 'Seattle': '47.620499,-122.350876'}

<b>Note:</b> Storing the locations in a string will allow you to easily insert them into your API request.
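A short sketch of why the string form is convenient: the value drops straight into a URL, and it can still be split into numbers when needed. The URL pattern is the forecast.io one used later in this notebook; the api_key value is a placeholder and parse_coords is a hypothetical helper.

```python
cities = {'Austin': '30.303936,-97.754355',
          'Boston': '42.331960,-71.020173'}

api_key = 'YOUR_API_KEY'  # placeholder, not a real key

def parse_coords(value):
    """Split a 'lat,long' string into a (latitude, longitude) tuple of floats."""
    lat, lon = (float(part) for part in value.split(','))
    return lat, lon

# The string can be concatenated into the request URL as-is.
request_url = 'https://api.forecast.io/forecast/' + api_key + '/' + cities['Austin']
print(request_url)

# If numeric values are needed (for plotting, say), split the string.
latitude, longitude = parse_coords(cities['Austin'])
print(latitude)   # 30.303936
print(longitude)  # -97.754355
```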

## Getting Started

We'll be using the forecast.io API to get weather data. This API powers the Dark Sky mobile app and offers an easy-to-use interface. There are a number of weather APIs available online, including one from the National Oceanic and Atmospheric Administration (NOAA), but this one is by far the easiest to access and use. Like some commercial APIs, it offers an initial number of free queries (in this case 1000 a day) before the service charges your account ($1 for the next 10,000). The assignment can be completed using the free service and you won't have to supply a credit card if you don't want to.

We're going to be querying an API using Python. To start off with, you'll need to get keys. Go to the registration page (https://developer.forecast.io/register) and request the keys. You should assign the API key to a variable in your code so you can use it in your request.

As every API is different, it's important to look over the API documentation (https://developer.forecast.io/docs/v2). This will give you the format and parameters for your queries.

Note the URL pattern for the API call:

![](files/weather1.jpg)

Note the format for time (https://developer.forecast.io/docs/v2#time_call), either Unix time or the format "YYYY-MM-DDThh:mm:ss".
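The two time formats can be produced from the same datetime object. The sketch below uses a fixed example moment and, as an assumption, treats it as UTC when converting to Unix time; how the API interprets a time string without a timezone is described in its documentation.

```python
import calendar
import datetime

moment = datetime.datetime(2015, 9, 29, 17, 49, 0)  # fixed example value

# String format: YYYY-MM-DDThh:mm:ss
time_string = moment.strftime('%Y-%m-%dT%H:%M:%S')
print(time_string)  # 2015-09-29T17:49:00

# Unix time: seconds since 1970-01-01 00:00:00 UTC (moment treated as UTC here).
unix_time = calendar.timegm(moment.timetuple())
print(unix_time)  # 1443548940

# Converting back recovers the original moment.
round_trip = datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=unix_time)
print(round_trip == moment)  # True
```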

## Challenge

<b>1.</b> Build the API call by combining the string elements in Python for your first city. You can use the datetime.datetime.now() function from the datetime package to get the current datetime, and the datetime.timedelta() function to subtract or add time to a date. In this case, we'll subtract 30 days from the current date to get our start date and then iterate through until the present day: start_date = datetime.datetime.now() - datetime.timedelta(days=30).
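A minimal sketch of that date arithmetic. A fixed end date stands in for datetime.datetime.now() so the output is reproducible; the names end_date and query_times are my own.

```python
import datetime

end_date = datetime.datetime(2015, 10, 29, 17, 49, 0)  # stand-in for now()
start_date = end_date - datetime.timedelta(days=30)

# Step forward one day at a time, formatting each date for the API call.
query_times = []
current = start_date
while current < end_date:
    query_times.append(current.strftime('%Y-%m-%dT%H:%M:%S'))
    current += datetime.timedelta(days=1)

print(len(query_times))  # 30
print(query_times[0])    # 2015-09-29T17:49:00
print(query_times[-1])   # 2015-10-28T17:49:00
```

Because the loop condition is a strict less-than, the end date itself is not included, which gives exactly 30 query times.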

In [4]:
cities['Austin']

'30.303936,-97.754355'

In [5]:
import datetime
import time

Formatting time for the API call:
* [YYYY]-[MM]-[DD]T[HH]:[MM]:[SS] (ex: 2013-05-06T12:00:00-0400)

In [6]:
api_key = '20643adc99f908dd1da0d3696eac75d8'

current_time = datetime.datetime.now()
sub_days = datetime.timedelta(days=30)

#create API call:
api_call = 'https://api.forecast.io/forecast/'+api_key+'/'+cities['Austin']+','+(current_time - sub_days).strftime('%Y-%m-%dT%H:%M:%S')

In [7]:
print api_call

https://api.forecast.io/forecast/20643adc99f908dd1da0d3696eac75d8/30.303936,-97.754355,2015-09-29T17:49:00


<b>2.</b> Test the call for your first city and make sure you have it formatted properly. You can start by just printing out the URL and pasting it into your browser before you use the requests package to do the call for you. This can help you troubleshoot any errors (you can also use the response's text and status_code attributes for troubleshooting).   
<b>3.</b> Once you have the URL formatted properly, issue the request from your code and inspect the result. How many levels does the data have? Which field do we want to save to get the daily maximum temperature?

In [8]:
import requests

In [9]:
weather_test = requests.get(api_call)

In [10]:
weather_test.json()

{u'currently': {u'apparentTemperature': 89.99,
  u'cloudCover': 0.29,
  u'dewPoint': 61.96,
  u'humidity': 0.4,
  u'icon': u'partly-cloudy-day',
  u'precipIntensity': 0,
  u'precipProbability': 0,
  u'pressure': 1009.64,
  u'summary': u'Partly Cloudy',
  u'temperature': 89.53,
  u'time': 1443566940,
  u'visibility': 10,
  u'windBearing': 48,
  u'windSpeed': 9.16},
 u'daily': {u'data': [{u'apparentTemperatureMax': 93.98,
    u'apparentTemperatureMaxTime': 1443556800,
    u'apparentTemperatureMin': 67.57,
    u'apparentTemperatureMinTime': 1443524400,
    u'cloudCover': 0.13,
    u'dewPoint': 63.77,
    u'humidity': 0.62,
    u'icon': u'partly-cloudy-day',
    u'moonPhase': 0.56,
    u'precipIntensity': 0,
    u'precipIntensityMax': 0,
    u'precipProbability': 0,
    u'pressure': 1010.86,
    u'summary': u'Partly cloudy starting in the afternoon, continuing until evening.',
    u'sunriseTime': 1443529463,
    u'sunsetTime': 1443572452,
    u'temperatureMax': 92.32,
    u'temperatureMaxT

In [11]:
weather_test.json().keys()

[u'hourly',
 u'currently',
 u'longitude',
 u'flags',
 u'daily',
 u'offset',
 u'latitude',
 u'timezone']

In [12]:
print weather_test.json()['daily']['data'][0]

{u'apparentTemperatureMinTime': 1443524400, u'cloudCover': 0.13, u'temperatureMin': 67.57, u'summary': u'Partly cloudy starting in the afternoon, continuing until evening.', u'dewPoint': 63.77, u'apparentTemperatureMax': 93.98, u'temperatureMax': 92.32, u'temperatureMaxTime': 1443556800, u'windBearing': 15, u'moonPhase': 0.56, u'visibility': 10, u'sunsetTime': 1443572452, u'pressure': 1010.86, u'precipProbability': 0, u'apparentTemperatureMin': 67.57, u'precipIntensityMax': 0, u'icon': u'partly-cloudy-day', u'apparentTemperatureMaxTime': 1443556800, u'humidity': 0.62, u'windSpeed': 4.85, u'time': 1443502800, u'precipIntensity': 0, u'sunriseTime': 1443529463, u'temperatureMinTime': 1443524400}


The daily max temperature to save is the 'temperatureMax' field of the first entry in the 'data' list under 'daily'.
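The nesting can be seen on a trimmed copy of the response. The sample below keeps only a few fields, with values taken from the output above; daily_max is a hypothetical helper that returns None instead of raising an error when a level is missing.

```python
# Trimmed copy of the response: dict -> 'daily' dict -> 'data' list -> dict of fields.
sample = {
    'currently': {'temperature': 89.53, 'time': 1443566940},
    'daily': {'data': [{'time': 1443502800,
                        'temperatureMax': 92.32,
                        'temperatureMin': 67.57}]},
}

def daily_max(payload):
    """Return temperatureMax from the first daily entry, or None if it is absent."""
    entries = payload.get('daily', {}).get('data', [])
    if not entries:
        return None
    return entries[0].get('temperatureMax')

print(daily_max(sample))  # 92.32
print(daily_max({}))      # None
```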

<b>4.</b> Based on the data sample, create the table in a SQLite database called "weather.db".

In [13]:
import sqlite3 as lite

con = lite.connect('weather.db')
cur = con.cursor()

Create table in 'weather.db':   
<b>Note:</b> I found the instructions in this lesson very unclear. Only by reading ahead did I work out that the table is supposed to hold a single variable from within 'daily'. I had assumed (incorrectly) that I was supposed to capture more data in the database for each city. 

In [14]:
with con:
    cur.execute('CREATE TABLE daily_max_temp (\
    day TIMESTAMP, Austin REAL, Boston REAL, Miami REAL, Phoenix REAL, Seattle REAL)')

In [15]:
print pd.read_sql_query('SELECT * FROM daily_max_temp', con)
# 'SELECT *' will bring the entire table, 'con' at the end references the connection to the database. 

Empty DataFrame
Columns: [day, Austin, Boston, Miami, Phoenix, Seattle]
Index: []


<b>5.</b> Write a script that takes each city and queries every day for the past 30 days (Hint: You can use datetime.timedelta(days=1) to increment the value by day).  
<b>6.</b> Save the max temperature values to the table, keyed on the date. You can leave the date in Unix time or convert to a string.

In SQL, a row has to be inserted before it can be updated. In order to keep the code clean, we're going to iterate through the values in the range and insert them into the database without any other values:

In [19]:
query_date = current_time - sub_days # start date: 30 days before current_time

with con:
    while query_date < current_time:
        # insert one row per day, leaving the city columns empty for now
        cur.execute('INSERT INTO daily_max_temp(day) VALUES (?)', [query_date.strftime('%Y-%m-%dT%H:%M:%S')])
        query_date += datetime.timedelta(days=1)

In [55]:
print pd.read_sql_query('SELECT * FROM daily_max_temp', con)

                    day Austin Boston Miami Phoenix Seattle
0   2015-09-29T17:49:00   None   None  None    None    None
1   2015-09-30T17:49:00   None   None  None    None    None
2   2015-10-01T17:49:00   None   None  None    None    None
3   2015-10-02T17:49:00   None   None  None    None    None
4   2015-10-03T17:49:00   None   None  None    None    None
5   2015-10-04T17:49:00   None   None  None    None    None
6   2015-10-05T17:49:00   None   None  None    None    None
7   2015-10-06T17:49:00   None   None  None    None    None
8   2015-10-07T17:49:00   None   None  None    None    None
9   2015-10-08T17:49:00   None   None  None    None    None
10  2015-10-09T17:49:00   None   None  None    None    None
11  2015-10-10T17:49:00   None   None  None    None    None
12  2015-10-11T17:49:00   None   None  None    None    None
13  2015-10-12T17:49:00   None   None  None    None    None
14  2015-10-13T17:49:00   None   None  None    None    None
15  2015-10-14T17:49:00   None   None  N
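The reason for inserting the rows first can be checked in a throwaway in-memory database: an UPDATE that matches no row changes nothing and raises no error. This is a sketch with a cut-down, single-city version of the table, not the notebook's weather.db.

```python
import sqlite3

con = sqlite3.connect(':memory:')  # temporary database, discarded on close
cur = con.cursor()
cur.execute('CREATE TABLE daily_max_temp (day TEXT, Austin REAL)')

day = '2015-09-29T17:49:00'
update_sql = 'UPDATE daily_max_temp SET Austin = ? WHERE day = ?'

# Updating before the row exists silently affects zero rows.
cur.execute(update_sql, (92.32, day))
updated_before = cur.rowcount
print(updated_before)  # 0

# Once the day has been inserted, the same UPDATE fills in the value.
cur.execute('INSERT INTO daily_max_temp(day) VALUES (?)', (day,))
cur.execute(update_sql, (92.32, day))
updated_after = cur.rowcount
print(updated_after)  # 1

rows = cur.execute('SELECT day, Austin FROM daily_max_temp').fetchall()
print(rows)  # [('2015-09-29T17:49:00', 92.32)]
con.close()
```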

Now we can loop through our cities and query the API:

In [54]:
for k,v in cities.iteritems(): #for each city and its coordinates (lat,long) in the dict:
    url = 'https://api.forecast.io/forecast/'+api_key
    
    query_date = current_time - sub_days # start date: 30 days before current_time
    while query_date < current_time:
        #query the API for this city and day
        r = requests.get(url + '/' + v + ',' + query_date.strftime('%Y-%m-%dT%H:%M:%S'))

        with con:
            #save the max temperature; the column name (k) comes from our own dict,
            #while the temperature and day are bound as parameters so SQLite handles the quoting
            cur.execute('UPDATE daily_max_temp SET ' + k + ' = ? WHERE day = ?',
                        (r.json()['daily']['data'][0]['temperatureMax'],
                         query_date.strftime('%Y-%m-%dT%H:%M:%S')))

        #increment query_date to the next day for the next pass of the loop
        query_date += datetime.timedelta(days=1)


#con.close() # a good practice to close connection to database

<b>Note:</b> An earlier version of this cell built the WHERE clause by string concatenation and failed with OperationalError: unrecognized token: "29T17", because the unquoted timestamp is not valid SQL. Binding the values with ? placeholders avoids the error.
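The OperationalError quoted above can be reproduced, and avoided, in a throwaway in-memory database. The sketch below uses a cut-down, single-city table and a fixed temperature instead of a live API response. The first UPDATE pastes the timestamp into the SQL text unquoted, so SQLite tries to read it as an expression; the second binds it as a parameter.

```python
import sqlite3

con = sqlite3.connect(':memory:')  # temporary database, discarded on close
cur = con.cursor()
cur.execute('CREATE TABLE daily_max_temp (day TEXT, Austin REAL)')

day = '2015-09-29T17:49:00'
cur.execute('INSERT INTO daily_max_temp(day) VALUES (?)', (day,))

# Concatenation: the SQL ends in "WHERE day = 2015-09-29T17:49:00", which is not valid SQL.
try:
    cur.execute('UPDATE daily_max_temp SET Austin = 92.32 WHERE day = ' + day)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# Placeholders: SQLite receives the values separately from the SQL text.
cur.execute('UPDATE daily_max_temp SET Austin = ? WHERE day = ?', (92.32, day))
updated = cur.rowcount
print(updated)  # 1

stored = cur.execute('SELECT Austin FROM daily_max_temp WHERE day = ?', (day,)).fetchone()
print(stored)  # (92.32,)
con.close()
```

Column names cannot be bound as parameters, so the city name still has to be placed into the SQL text directly. That is reasonable here only because the names come from our own dictionary and not from user input.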