# Weather Analysis Task
In this task, we are given a list of locations (CSV) for which we need to collect daily weather data. Each location has its own range of dates. To obtain the data, we rely on historical weather information from Dark Sky.

In [143]:
import requests
import pandas as pd
import re
import getpass
from datetime import datetime

# read and explore locations.csv
df_locations = pd.read_csv("locations.csv")
df_locations.head()

Unnamed: 0,loc_id,postal_code,date_first,date_last
0,9be000ae23275d57e1273d211a54ffd7,AL3 5DQ,1485907200,1488326400
1,a35e427b4130be7b2a892e286f0ebb91,CB2 3QF,1486944000,1488326400
2,185674a2eb5c14fbdbb1d05a4109ea55,EX1 1GJ,1486425600,1488326400
3,afbddd7f957a1c822293616e95a2d84c,NG2 5GR,1486512000,1488326400
4,81cea1e224ad183b751acce139f4e276,GU21 6XX,1486080000,1488326400


# Google Maps API
As the Dark Sky API requires location data in the form of geo coordinates (longitude and latitude), we use the Google Maps API to convert the postal_code data in the CSV. The Google Maps API is able to detect the origin of the postal_code and return the corresponding geo coordinates. However, there were timeout issues with the Google Maps API, which caused geo coordinates to come back empty. A loop that checks for this was added to ensure that we have complete information.

In [154]:
# get lon and lat geo coordinates via Google Maps API
def get_geo_coordinates(row):
    long, lat, has_no_geoloc = 0, 0, True
    
    while has_no_geoloc:
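        # get geo coordinates via google api (whitespace in the postal code is encoded as %20)
        resp = requests.get("https://maps.googleapis.com/maps/api/geocode/json?address=" + re.sub(r"\s","%20",row['postal_code']) + "&json=1")
        if resp.status_code != 200:
            # This means something went wrong.
            raise Exception("Geocoding request failed with status %s" % resp.status_code)
        else:
            geo_data = resp.json()
            # an empty result list means the lookup came back empty, so the loop tries again
            if len(geo_data['results']) > 0: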
                long = geo_data['results'][0]['geometry']['location']['lng']
                lat = geo_data['results'][0]['geometry']['location']['lat']
                has_no_geoloc = False
    return pd.Series([long, lat], index=['long', 'lat'])

In [155]:
df_locations[["long", "lat"]] = df_locations.apply(get_geo_coordinates, axis=1)
df_locations

Unnamed: 0,loc_id,postal_code,date_first,date_last,long,lat
0,9be000ae23275d57e1273d211a54ffd7,AL3 5DQ,1485907200,1488326400,-0.34051,51.751905
1,a35e427b4130be7b2a892e286f0ebb91,CB2 3QF,1486944000,1488326400,0.120337,52.204212
2,185674a2eb5c14fbdbb1d05a4109ea55,EX1 1GJ,1486425600,1488326400,-3.528028,50.723242
3,afbddd7f957a1c822293616e95a2d84c,NG2 5GR,1486512000,1488326400,-1.127144,52.931983
4,81cea1e224ad183b751acce139f4e276,GU21 6XX,1486080000,1488326400,-0.558755,51.319326
5,5f1ce9b7c8cd32c08d98310540fb6604,BS34 5DG,1487289600,1488326400,-2.595357,51.525159
6,5180af03094779de849ca816c9f5b753,N10 1BY,1486944000,1488326400,-0.143414,51.592149
7,7f2aa8e72612f9130e06b32a0d2a58d7,BR1 1DN,1485907200,1488326400,0.016582,51.403891
8,8b049b660e984912c48da213f2f7c650,LS29 9PA,1486425600,1488326400,-1.827509,53.924762
9,fe5d591b3509247487a917d4e8a33f65,SE1 9FU,1486512000,1488326400,-0.101308,51.507117
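
The loop in `get_geo_coordinates` repeats until a result arrives, so it can spin indefinitely if a postal code never resolves. A minimal sketch of a bounded retry with exponential backoff follows. `build_geocode_url` and `retry_until_result` are hypothetical helper names, and the attempt count and delays are assumptions rather than values used in this notebook. `urllib.parse.urlencode` takes over the URL encoding that the `re.sub` call does by hand.

```python
import time
from urllib.parse import urlencode, quote

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(postal_code):
    """Return the geocoding URL with the postal code URL-encoded."""
    # quote_via=quote encodes spaces as %20 (the default would use '+')
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": postal_code}, quote_via=quote)

def retry_until_result(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-None value or attempts run out.

    fetch is any zero-argument callable (hypothetical here) that returns
    None when the lookup came back empty.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        # wait 1x, 2x, 4x, ... the base delay before the next attempt
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("no result after %d attempts" % max_attempts)

print(build_geocode_url("AL3 5DQ"))
```

Inside `get_geo_coordinates`, `fetch` could wrap the `requests.get` call and return the first result's location, or `None` when `results` is empty. Raising after the last attempt makes a permanently failing postal code visible instead of hanging the notebook.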


# Dark Sky API
The Dark Sky API requires an API key with every request. With the key stored, we explore how the API behaves for a few sample data points.

In [146]:
# get dark sky api key from user
dark_sky_api_key = getpass.getpass()

········


In [147]:
resp = requests.get("https://api.darksky.net/forecast/" + dark_sky_api_key + "/50.903584,-1.407188,1487808000?exclude=currently,minutely,hourly,flags")
if resp.status_code != 200:
    # This means something went wrong.
    raise Exception("Dark Sky request failed with status %s" % resp.status_code)
else:
    weather_data = resp.json()
    print(weather_data, "\n")
    try:
        print(weather_data['daily']['data'][0]['precipProbability'])
    except (KeyError, IndexError):
        # precipProbability is missing from this response
        print(-1, "\n")
    
resp = requests.get("https://api.darksky.net/forecast/" + dark_sky_api_key + "/50.903584,-1.407188,1490252400?exclude=currently,minutely,hourly,flags")
if resp.status_code != 200:
    # This means something went wrong.
    raise Exception("Dark Sky request failed with status %s" % resp.status_code)
else:
    weather_data = resp.json()
    print(weather_data, "\n")
    try:
        print(weather_data['daily']['data'][0]['precipProbability'])
    except (KeyError, IndexError):
        # precipProbability is missing from this response
        print(-1)

{'offset': 0, 'timezone': 'Europe/London', 'daily': {'data': [{'humidity': 0.73, 'time': 1487808000, 'apparentTemperatureMax': 42.39, 'visibility': 6.85, 'dewPoint': 38.42, 'windSpeed': 21.64, 'windBearing': 261, 'temperatureMax': 49.97, 'cloudCover': 0.66, 'precipType': 'rain', 'summary': 'Windy and mostly cloudy until evening.', 'apparentTemperatureMinTime': 1487883600, 'moonPhase': 0.89, 'sunsetTime': 1487871423, 'sunriseTime': 1487833432, 'icon': 'wind', 'temperatureMinTime': 1487887200, 'pressure': 1000.95, 'apparentTemperatureMin': 33.42, 'apparentTemperatureMaxTime': 1487833200, 'temperatureMaxTime': 1487833200, 'temperatureMin': 41.09}]}, 'latitude': 50.903584, 'longitude': -1.407188} 

-1 

{'offset': 0, 'timezone': 'Europe/London', 'daily': {'data': [{'humidity': 0.82, 'time': 1490227200, 'precipIntensity': 0.0082, 'apparentTemperatureMax': 50.79, 'visibility': 6.27, 'dewPoint': 39.17, 'windSpeed': 14.36, 'precipIntensityMaxTime': 1490238000, 'windBearing': 51, 'precipIntensi

# Dark Sky API exploration findings
## Missing daily precipProbability information
Unfortunately, Dark Sky does not keep all historical information for all locations. This can be seen in the two short test cases above: the first has no precipProbability information, while the second does. An excerpt on this topic from the Dark Sky API's [FAQ](https://darksky.net/dev/docs/faq), which explains it in more detail, is shown below.

### Data Availability and Accuracy
#### Are historical reports based on observational data?
> Usually, but we sometimes fill in data from other sources if no observations are available.

#### Why do historical reports sometimes change from the last time I requested them?
> Several of our data sources aren’t real-time, and flow into our system after the fact. As a result, our historical data may continue changing for up to two weeks after a time has occurred. (In most cases, the changes are very minor!)

#### Why are weather conditions sometimes missing from historical responses?
> We omit weather data that we do not have. This may be because there aren't any weather stations near you, or the weather stations that are near you are missing sensors for the kind of weather that you care about, or the weather stations that are near you have those sensors but are producing suspect data (in which case we ignore it).

#### Why isn't precipProbability always 0 or 1 for historical responses?
> We don't have perfect historical data—weather stations don't cover every square inch of the planet!—and so these probabilities reflect our uncertainty about what happened.

#### Can I get your previous forecasts instead of weather station observations?
> Unfortunately, we do not have the storage capacity to store previous forecasts. Sorry!
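
Since the FAQ confirms that fields can simply be absent from a response, every optional field needs a guarded lookup. The sketch below shows one way to read a daily field with a fallback. `daily_field` is a hypothetical helper, and the sample dictionary is trimmed from the first test response above.

```python
def daily_field(weather_data, field, default=None):
    """Return a field from the first daily data block, or default if absent."""
    try:
        return weather_data["daily"]["data"][0][field]
    except (KeyError, IndexError, TypeError):
        return default

# trimmed copy of the first test response, which has no precipProbability
sample = {
    "timezone": "Europe/London",
    "daily": {"data": [{"humidity": 0.73, "time": 1487808000, "temperatureMin": 41.09}]},
}

print(daily_field(sample, "humidity"))               # 0.73
print(daily_field(sample, "precipProbability", -1))  # -1
print(daily_field({}, "humidity", -1))               # -1
```

Catching only the lookup errors, rather than using a bare `except`, keeps unrelated failures such as a keyboard interrupt from being silently turned into a missing value.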


# Clean-up and preparation of data
The Dark Sky API expects a timestamp, longitude, and latitude. As the original locations.csv provides only a date range per location rather than one row per day, we expand each date range into daily rows in a new data frame for processing.

In [169]:
rows_list = []
for index, row in df_locations.iterrows():
    epoch = row['date_first']
    # step through the range one day at a time; date_last itself is not included
    while epoch < row['date_last']:
        s = {
            'loc_id' : row['loc_id'],
            'epoch' : '%.0f' % epoch,
            # note: fromtimestamp() converts using the local timezone of the machine
            'date' : datetime.fromtimestamp(epoch).strftime('%Y-%m-%d'),
            'long' : '%.6f' % row['long'],
            'lat' : '%.6f' % row['lat'],
        }
        rows_list.append(s)
        epoch += (60*60*24)
    
df_locations_cleaned = pd.DataFrame(rows_list)
df_locations_cleaned

Unnamed: 0,date,epoch,lat,loc_id,long
0,2017-01-31,1485907200,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510
1,2017-02-01,1485993600,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510
2,2017-02-02,1486080000,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510
3,2017-02-03,1486166400,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510
4,2017-02-04,1486252800,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510
5,2017-02-05,1486339200,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510
6,2017-02-06,1486425600,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510
7,2017-02-07,1486512000,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510
8,2017-02-08,1486598400,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510
9,2017-02-09,1486684800,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510
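
`datetime.fromtimestamp` without a timezone argument converts using the local timezone of the machine, so the `date` column above depends on where the notebook ran. As a sketch of a timezone-independent alternative, the hypothetical helper `expand_daily` below labels each epoch with its UTC date. Whether UTC is the right reference for this task is an assumption.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 60 * 60 * 24

def expand_daily(date_first, date_last):
    """Return (epoch, 'YYYY-MM-DD') pairs, one per day, excluding date_last."""
    days = []
    for epoch in range(int(date_first), int(date_last), SECONDS_PER_DAY):
        utc_date = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d")
        days.append((epoch, utc_date))
    return days

# first three days of the range for the first location in the table
for epoch, date in expand_daily(1485907200, 1485907200 + 3 * SECONDS_PER_DAY):
    print(epoch, date)
```

In UTC, the epoch 1485907200 falls on 2017-02-01, one day later than the date shown for it in the table above.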


# Obtaining weather data
To address the occurrence of missing data from the Dark Sky API, any API call that returns a valid response without precipProbability data is assigned a value of -1. Otherwise, the precipProbability value is stored as a percentage.

In [176]:
# get weather data via Dark Sky API
def get_weather(row):
    pp = None
    resp = requests.get("https://api.darksky.net/forecast/" + 
                        dark_sky_api_key + "/" + 
                        row['lat'] + "," +
                        row['long'] + "," +
                        row['epoch'] + "?exclude=currently,minutely,hourly,flags")
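    if resp.status_code != 200:
        # This means something went wrong.
        raise Exception("Dark Sky request failed with status %s" % resp.status_code)
    else:
        weather_data = resp.json()
        try:
            pp = weather_data['daily']['data'][0]['precipProbability']
        except (KeyError, IndexError):
            # valid response, but precipProbability is missing
            pp = -1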
    if pp > 0:
        pp = (pp*100)
    print("Getting weather data for %s" % row['loc_id'])
    return pd.Series([
        weather_data['daily']['data'][0]['summary'],
        pp,
        weather_data['daily']['data'][0]['humidity'],
        weather_data['daily']['data'][0]['temperatureMin'],
        weather_data['daily']['data'][0]['temperatureMax']], 
        index=['summary', 'precipProbability', 'humidity', 'temperatureMin', 'temperatureMax'])

In [177]:
df_locations_cleaned[[
    "summary", 
    "precipProbability", 
    "humidity", 
    "temperatureMin", 
    "temperatureMax"]] = df_locations_cleaned.apply(get_weather, axis=1)
df_locations_cleaned

Getting weather data for 9be000ae23275d57e1273d211a54ffd7
Getting weather data for 9be000ae23275d57e1273d211a54ffd7
Getting weather data for 9be000ae23275d57e1273d211a54ffd7
...

Unnamed: 0,date,epoch,lat,loc_id,long,summary,precipProbability,humidity,temperatureMin,temperatureMax
0,2017-01-31,1485907200,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510,Foggy in the morning.,-1.0,0.96,46.65,50.46
1,2017-02-01,1485993600,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510,Mostly cloudy throughout the day.,-1.0,0.90,48.01,51.10
2,2017-02-02,1486080000,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510,Mostly cloudy throughout the day.,-1.0,0.86,39.57,49.72
3,2017-02-03,1486166400,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510,Partly cloudy until afternoon.,-1.0,0.89,35.52,47.23
4,2017-02-04,1486252800,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510,Mostly cloudy throughout the day.,-1.0,0.95,37.03,40.84
5,2017-02-05,1486339200,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510,Mostly cloudy starting in the afternoon.,-1.0,0.90,32.72,44.36
6,2017-02-06,1486425600,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510,Foggy in the morning.,-1.0,0.89,38.09,48.87
7,2017-02-07,1486512000,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510,Mostly cloudy throughout the day.,-1.0,0.90,35.42,39.48
8,2017-02-08,1486598400,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510,Foggy overnight.,-1.0,0.89,33.52,35.11
9,2017-02-09,1486684800,51.751905,9be000ae23275d57e1273d211a54ffd7,-0.340510,Foggy until afternoon.,-1.0,0.94,32.72,34.48
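
Because -1 is a sentinel rather than a measurement, it has to be filtered out before any arithmetic on the column. The sketch below illustrates the difference with a few values in the style of the report. `to_percent` and `mean_available` are hypothetical helpers, and returning `None` for missing readings is an alternative to the -1 convention used in this notebook, not what the code above does.

```python
from statistics import mean

MISSING = -1  # sentinel used in this notebook for absent precipProbability

def to_percent(raw):
    """Convert a 0..1 probability to a percentage; None if absent."""
    if raw is None or raw == MISSING:
        return None
    return raw * 100

def mean_available(values):
    """Average only the real readings; None if there are none."""
    real = [v for v in values if v is not None and v != MISSING]
    return mean(real) if real else None

readings = [-1, 98.0, 0.0, -1]
print(mean(readings))            # sentinel included, which skews the result
print(mean_available(readings))  # sentinel excluded
```

With the sentinel left in, the average of the sample readings drops from 49.0 to 24.0, which is why the -1 values should be excluded (or replaced by a true missing-value marker) before aggregating.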


# Pivot data frame as precipitation report
With the precipitation data appended to the cleaned data frame, a pivot of the data frame produces a simple precipitation report.

In [179]:
df_locations_cleaned.pivot(index='loc_id', columns='date', values='precipProbability')

loc_id,2017-01-31,2017-02-01,2017-02-02,2017-02-03,2017-02-04,2017-02-05,2017-02-06,2017-02-07,2017-02-08,2017-02-09,...,2017-02-18,2017-02-19,2017-02-20,2017-02-21,2017-02-22,2017-02-23,2017-02-24,2017-02-25,2017-02-26,2017-02-27
07d09eab6c837d4b0c0b17aba37e1dcf,,,,,,,,,,,...,,,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
11aeddbd12e79cae8dddb694e200f00d,,,,,,,,,,,...,,,,,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
16bdd6fe40ed85ce99a456af6d7cff93,,,,,,,98.0,0.0,98.0,98.0,...,0.0,0.0,98.0,0.0,97.0,0.0,98.0,97.0,93.0,98.0
185674a2eb5c14fbdbb1d05a4109ea55,,,,,,,-1.0,-1.0,-1.0,-1.0,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
5180af03094779de849ca816c9f5b753,,,,,,,,,,,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
51fa5362e943615c7b31d367b461fd2c,,,,,,,,,,,...,,,,,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
58f7d1d3ce8cc4e808bf840b56714b38,,,,,,,,,,,...,0.0,0.0,98.0,0.0,97.0,0.0,98.0,97.0,93.0,98.0
5f1ce9b7c8cd32c08d98310540fb6604,,,,,,,,,,,...,0.0,0.0,98.0,0.0,97.0,0.0,98.0,97.0,93.0,98.0
7f2aa8e72612f9130e06b32a0d2a58d7,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0
81cea1e224ad183b751acce139f4e276,,,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0


# Analysis of precipitation report
Looking at the list of 20 locations, it is evident that precipitation data is only available for select locations: only 4 of the 20 locations had any precipitation data available from Dark Sky.
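
The number of locations with data can be derived from the long-format rows rather than read off the pivot by eye. A minimal sketch, assuming rows of `(loc_id, precipProbability)` pairs with -1 marking missing values. `locations_with_data` is a hypothetical helper, and the sample rows are a small hand-picked subset of the report above, so the printed count does not reproduce the full result.

```python
def locations_with_data(rows, missing=-1):
    """Return the set of loc_ids that have at least one real reading."""
    found = set()
    for loc_id, pp in rows:
        if pp is not None and pp != missing:
            found.add(loc_id)
    return found

rows = [
    ("16bdd6fe40ed85ce99a456af6d7cff93", 98.0),
    ("16bdd6fe40ed85ce99a456af6d7cff93", 0.0),
    ("185674a2eb5c14fbdbb1d05a4109ea55", -1.0),
    ("7f2aa8e72612f9130e06b32a0d2a58d7", -1.0),
]
all_ids = {loc_id for loc_id, _ in rows}
covered = locations_with_data(rows)
print(len(covered), "of", len(all_ids), "locations have data")
```

Note that a reading of 0.0 counts as available data: it is a real probability of zero, not a missing value.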

# To Do
+ Error checking for weather data
+ Collect more weather data from other sources
+ Improve check for precipitation percentages