# Weather variables from World Weather Online API

Using the date, latitude, and longitude from the scrubbed harvest data, we will request weather variables for each observation from the World Weather Online (WWO) API.

To make requests to the WWO API you must register an account, either a free trial or a paid account. The trial account allows access to the Local, Marine, Ski, Time zone, Location Search, and Past Weather APIs for 60 days. Each account is assigned a key, which is used to make requests. The trial account is limited to 500 requests per day.

Import modules

In [15]:
import urllib3
import json
import csv
import os
import time
import ephem
import datetime
import pandas as pd

### Harvest Records

The path to the harvest records from iNaturalist is stored here as `harvest`. The original data download included over 4,000 harvest records. To reduce the number of requests to the WWO API, the data is divided into multiple csv files containing roughly 350 records each. I simply fed in one subsample (12 total) each time I wanted to make requests. There's probably a better way to govern requests, but I'm a noob.
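
One possible alternative to splitting the file by hand is to keep a single csv and only hand out as many rows per day as the request limit allows. This is a rough sketch, not what I actually ran: `rows_for_today`, `already_done`, and `daily_limit` are names made up for illustration, and the 500 comes from the trial limit mentioned above.

```python
from itertools import islice


def rows_for_today(rows, already_done, daily_limit=500):
    """Return an iterator over the next batch of rows, capped at daily_limit."""
    # skip the rows handled on previous days, then stop after daily_limit rows
    return islice(rows, already_done, already_done + daily_limit)


# usage: 350 records were processed earlier, so records 350..849 are due next
batch = list(rows_for_today(range(4000), already_done=350))
print(len(batch))  # 500
```

In practice `already_done` could be the number of data rows already sitting in the output file, so each run picks up where the last one stopped.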

In [16]:
# input csv file, harvest records of freshwater anglers
harvest = r"C:\Users\steph\Documents\Python\Projects\Silas\Demo\Harvests_Demo_SubSample.csv"

I didn't know pandas until after I finished this part of the project, but I'm going to insert this here because it's fun using new stuff.

In [17]:
# load the harvest records into the dataframe fish
fish = pd.read_csv(harvest)
# read_csv already returns a dataframe; df is the one we trim below
df = pd.DataFrame(fish)

# drop fields from dataframe that we do not need
df = df.drop(['time_observed_at', 'time_zone', 'out_of_range', 'quality_grade', 
'taxon_id', 'id', 'user_id', 'user_login', 'created_at', 'updated_at', 'license', 
'url', 'image_url', 'id_please', 'captive_cultivated', 'coordinates_obscured', 
'taxon_family_name', 'scientific_name', 'place_state_name', 'description'], axis=1)

# view top 5 rows
print(df.head())

   observed_on   latitude   longitude      common_name
0     20060610  44.606907 -110.383544  Cutthroat Trout
1     20120407  47.403435 -121.898910  Cutthroat Trout
2     20120428  39.170806 -123.009836    Rainbow Trout
3     20080316 -53.854745  -68.251376      Brown Trout
4     20121218 -45.111625  167.954295      Brown Trout


### Use harvest data to request API weather data for each harvest record. 

The API requires dates to be `yyyy-mm-dd`, but our harvest records store
dates as `yyyymmdd`. Because CSVs suck balls, we have to break this down the 
hard way -- slice the date field into three variables then combine them 
back together to make the request.
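
For comparison, the same conversion can probably be done without slicing by letting `datetime` parse and re-format the string. This is only a sketch; `to_api_date` is a made-up helper name, not something used elsewhere in this notebook.

```python
from datetime import datetime


def to_api_date(yyyymmdd):
    """Convert a 'yyyymmdd' string to the 'yyyy-mm-dd' form the API expects."""
    # strptime validates the digits; strftime inserts the dashes
    return datetime.strptime(yyyymmdd, "%Y%m%d").strftime("%Y-%m-%d")


print(to_api_date("20120407"))  # 2012-04-07
```

A side benefit is that a malformed date raises a `ValueError` instead of silently producing a bad request.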

We also need the latitude and longitude from the harvest records along with the WWO free trial account key to make requests.

When a request is made, the WWO API returns a response that includes the weather variables in JSON format.
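
To make the pieces of the request easier to see, here is a small sketch that assembles the same URL from named parameters. The parameter names (`key`, `q`, `format`, `date`, `tp`) are taken from the URL used in this notebook; `build_url` and `BASE_URL` are hypothetical names I made up for the example.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

BASE_URL = "http://api.worldweatheronline.com/premium/v1/past-weather.ashx"


def build_url(key, lat, lon, date):
    """Assemble a past-weather request URL for one harvest record."""
    params = [
        ("key", key),
        ("q", "{},{}".format(lat, lon)),  # location as "lat,lon"
        ("format", "json"),
        ("date", date),                   # must already be yyyy-mm-dd
        ("tp", "24"),
    ]
    # urlencode percent-encodes the comma in q as %2C
    return BASE_URL + "?" + urlencode(params)


print(build_url("_YOUR_KEY_HERE_", "47.403435", "-121.898910", "2012-04-07"))
```

Unlike plain string concatenation, `urlencode` escapes any characters that are not URL-safe, so the comma between latitude and longitude shows up as `%2C` in the printed URL.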

In [None]:
with open(harvest, 'rb') as csvfile:
        # read harvest records in the input file for processing
        rcsv = csv.reader(csvfile, delimiter=',')
        # skip first row
        next(rcsv, None)

        for row in rcsv:
            date = row[0]
            lat = row[1]
            lon = row[2]
            print date, lat, lon
            # slicing the string once is enough, so no loop over its characters
            # isolate first four digits >> year
            yyyy = date[0:4]
            # isolate 5th and 6th digit >> month
            mm = date[4:6]
            # isolate 7th and 8th digit >> day
            dd = date[6:8]

            # use PoolManager to make requests, bc thread safety
            http = urllib3.PoolManager()

            # request() returns HTTPResponse object.
            # Add json at end of url to return json data
            r = http.request('GET',
                             'http://api.worldweatheronline.com/premium/v1/past-weather.ashx?key'
                             '=_YOUR_KEY_HERE_&q=' + lat + ',' + lon + '&format=json&date=' + 
                             yyyy + '-' + mm + '-' + dd + '&tp=24')
            
            
            '''To get climate data, we first need to know the keys and which dictionary 
            each key lives in. By printing parsed_json/jsondata we determined where 
            the climate data are stored: the key "data" contains the key "weather", 
            which holds a list [] whose first item is a dictionary that contains 
            the key "date"'''
            
            # response = r.status #<<<---commented out, return http response code
            jsondata = r.data  # return json data
            print jsondata
            print "\n"

            # format json data for parsing
            parsed_json = json.loads(jsondata.decode('utf-8'))

            # return the first key in response body
            body = parsed_json['data']
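

The nesting is easier to follow on a tiny example. The sketch below walks a hand-written sample that mimics the structure described in the cell above; the key names are the ones this notebook reads from the response, but the sample values (and whether the API returns them as strings) are placeholders I assumed. `extract` is a made-up helper name.

```python
import json

# hand-written sample that mimics the nesting; the values are placeholders
sample = json.loads('''
{"data": {"weather": [{"date": "2012-04-28", "uvIndex": "0",
                       "hourly": [{"tempF": "66", "humidity": "79"}]}]}}
''')


def extract(parsed_json):
    """Pull a few weather fields out of a parsed response, or None on error."""
    body = parsed_json["data"]
    if "error" in body:
        # the API reports problems under data -> error instead of data -> weather
        return None
    day = body["weather"][0]   # first (and only) requested date
    hour = day["hourly"][0]    # first entry of the hourly list
    return {
        "date": day["date"],
        "uv_index": day["uvIndex"],
        "temp_f": hour["tempF"],
        "humidity": hour["humidity"],
    }


print(extract(sample)["date"])  # 2012-04-28
```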


## Aggregate Data
The path to the aggregate data is stored as `aggregate`. This is the output csv file, which holds information from the harvest records plus the weather variables that we request from the WWO API for each harvest date. Each time I ran this script, I changed the suffix of `aggregate` to match the `harvest` suffix from above.

In [18]:
# output csv file, aggregate harvest records and climatic variables
aggregate = r"C:\Users\steph\Documents\Python\Projects\Silas\Demo\Harvest_WWO_aggregate_Demo.csv"

### Create template for aggregate data
The `createcsv` function creates our output csv file. The headers were determined by making requests to the WWO API and reviewing the returned json data. 

In [19]:
# set variable to the output file pathway, used later to skip header
# file_exists = os.path.isfile(outputfile)
# start writing/appending
with open(aggregate, 'ab') as f:
    w = csv.writer(f)
    # set column headers of the csv
    header = ("date", "latitude", "longitude", "time", "uv_index", "cloud_cover",
              "temp_f", "humidity", "wind_direction_degrees", "wind_speed_kmph",
              "pressure_mb", "dewpoint_f", "head_index_f", "percipitation_mm",
              "visualbility_km", "moon_phase", "dayofyear", "species", "angler",
              "angler_id", "harvest_id")
    w.writerow(header)

In [20]:
# print the header we just wrote to the newly created template
print header

('date', 'latitude', 'longitude', 'time', 'uv_index', 'cloud_cover', 'temp_f', 'humidity', 'wind_direction_degrees', 'wind_speed_kmph', 'pressure_mb', 'dewpoint_f', 'head_index_f', 'percipitation_mm', 'visualbility_km', 'moon_phase', 'dayofyear', 'species', 'angler', 'angler_id', 'harvest_id')


## Demo: Request data from WWO API

We combine everything here. 
After we run this cell, we then execute the functions in the following cell.

In [21]:
def createcsv(outputfile):
    # check before opening, because mode 'ab' creates the file if it is missing
    file_exists = os.path.isfile(outputfile)
    with open(outputfile, 'ab') as f:
        w = csv.writer(f)
        header = ("date", "latitude", "longitude", "time", "uv_index", "cloud_cover",
                  "temp_f", "humidity", "wind_direction_degrees", "wind_speed_kmph",
                  "pressure_mb", "dewpoint_f", "head_index_f", "percipitation_mm",
                  "visualbility_km", "moon_phase", "dayofyear", "species", "angler",
                  "angler_id", "harvest_id")
        if not file_exists:
            w.writerow(header)
        rows = gethwc(harvest)
        w.writerows(rows)

def gethwc(path):
    with open(path, 'rb') as csvfile:
        rcsv = csv.reader(csvfile, delimiter=',')
        next(rcsv, None)

        for row in rcsv:
            date = row[0]
            lat = row[1]
            lon = row[2]
            utc = row[3]
            spp = row[8] 
            h_id = row[10]
            a_id = row[11]
            angler = row[12]

            '''Calculate the Moon's illumination using utc values and ephem. Earth's moon 
            is a standard body in ephem. We drop the "UTC" string from the utc field 
            (utc[:-3]) and pass the rest to ephem so it can compute the Moon's position 
            for each utc date. Ephem can return the moon's illumination through the 
            moon_phase attribute using the moon's position and utc date'''

            # strip the trailing "UTC"; slicing once is enough, no loop needed
            utcc = utc[:-3]
            m = ephem.Moon()
            m.compute(utcc)
            mp = m.moon_phase
            # utcc still ends with a space, so drop it before parsing
            dt = datetime.datetime.strptime(utcc[:-1], "%Y-%m-%d %H:%M:%S")
            dy = dt.timetuple().tm_yday

            '''Use lat, long, & date to request API weather data for each harvest 
            record. The API requires dates to be yyyy-mm-dd, but our harvest records
            use yyyymmdd. Because CSVs suck balls, we have to break this down the 
            hard way -- slice the date field into three variables then combine them 
            back together to make the request'''

            # slicing the string once is enough, so no loop over its characters
            yyyy = date[0:4]
            mm = date[4:6]
            dd = date[6:8]

            http = urllib3.PoolManager()

            try:
                r = http.request('GET',
                                'http://api.worldweatheronline.com/premium/v1/past-weather.ashx?key'
                                '=_YOUR_KEY_HERE_&q=' + lat + ',' + lon + '&format=json&date=' + 
                                yyyy + '-' + mm + '-' + dd + '&tp=24')
            except urllib3.exceptions.HTTPError as e:
                print "HTTP error:              ", e
                break

            print "Request Time:        ", time.clock()
            
            '''To get climate data, we first need to know the keys and which dictionary 
            each key lives in. By printing parsed_json/jsondata we determined where 
            the climate data are stored: the key "data" contains the key "weather", 
            which holds a list [] whose first item is a dictionary that contains 
            the key "date"'''

            jsondata = r.data
            parsed_json = json.loads(jsondata.decode('utf-8'))
            body = parsed_json['data']

            if "error" in body:
                msg = body['error'][0]['msg']
                print "Warning:              " + msg
                # error message when we reach our request limit/day
                limiterror = "API key has reached calls per day allowed limit."
                # compare against the message text; body is a dict, so testing
                # `limiterror in body` would only look at its keys
                # if we max out our limit, break the loop and end program
                if limiterror in msg:
                    break
                else:
                    # if no data/error, then only yield three fields
                    seq = (date, lat, lon)
                    yield seq
            else:
                date1 = parsed_json['data']['weather'][0]['date']
                print "Harvest Date:        ", yyyy + '-' + mm + '-' + dd
                print "Json Date:           ", date1
                uv = parsed_json['data']['weather'][0]['uvIndex']
                cc = parsed_json['data']['weather'][0]['hourly'][0]['cloudcover']
                hu = parsed_json['data']['weather'][0]['hourly'][0]['humidity']
                precip = parsed_json['data']['weather'][0]['hourly'][0]['precipMM']
                ap = parsed_json['data']['weather'][0]['hourly'][0]['pressure']
                dew = parsed_json['data']['weather'][0]['hourly'][0]['DewPointF']
                hi = parsed_json['data']['weather'][0]['hourly'][0]['HeatIndexF']
                temp = parsed_json['data']['weather'][0]['hourly'][0]['tempF']
                viz = parsed_json['data']['weather'][0]['hourly'][0]['visibility']
                wd = parsed_json['data']['weather'][0]['hourly'][0]['winddirDegree']
                ws = parsed_json['data']['weather'][0]['hourly'][0]['windspeedKmph']

                '''Organize the copied, computed, and retrieved data into the same sequence 
                as the headers of createcsv(), then return them all so they can be used
                to populate a new csv.'''

                seq = (date, lat, lon, utc, uv, cc, temp, hu, wd, ws, ap,
                       dew, hi, precip, viz, mp, dy, spp, angler, a_id, h_id)

                yield seq

            # sleep in order to limit rate of requests to API
            print "\nNap Time:            ", time.clock()
            time.sleep(6)
            print "\n"

    # the with block has already closed the harvest records
    
    print "Valar Morghulis"
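
The day-of-year step inside `gethwc` can be tried on its own. This sketch parses a timestamp in the same `... UTC` form as the harvest records and returns `tm_yday`; `day_of_year` is a made-up helper name, and it removes the label with `replace`/`strip` rather than the fixed-width slices used above.

```python
from datetime import datetime


def day_of_year(utc):
    """Return the day of the year (1-366) for a 'YYYY-MM-DD HH:MM:SS UTC' string."""
    # remove the "UTC" label and the space left in front of it
    stamp = utc.replace("UTC", "").strip()
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return dt.timetuple().tm_yday


print(day_of_year("2012-04-28 21:00:38 UTC"))  # 119
```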

In [22]:
if __name__ == '__main__':
    # createcsv() drives the gethwc() generator itself, so one call is enough
    createcsv(aggregate)

Request Time:         432.295817429

Nap Time:             432.296717308


Request Time:         438.68589266
Harvest Date:         2012-04-07
Json Date:            2012-04-07

Nap Time:             438.686626457


Request Time:         445.073442798
Harvest Date:         2012-04-28
Json Date:            2012-04-28

Nap Time:             445.074836694


Request Time:         451.443584215

Nap Time:             451.444586353


Request Time:         457.938192166
Harvest Date:         2012-12-18
Json Date:            2012-12-18

Nap Time:             457.940054338


Valar Morghulis


In [23]:
# read the aggregate data into the object agg
agg = pd.read_csv(aggregate)
# create dataframe with harvest records and weather variables
df2 = pd.DataFrame(agg)

df2.head()

Unnamed: 0,date,latitude,longitude,time,uv_index,cloud_cover,temp_f,humidity,wind_direction_degrees,wind_speed_kmph,...,dewpoint_f,head_index_f,percipitation_mm,visualbility_km,moon_phase,dayofyear,species,angler,angler_id,harvest_id
0,20060610,44.60690713,-110.383544,,,,,,,,...,,,,,,,,,,
1,20120407,47.40343475,-121.8989105,,,,,,,,...,,,,,,,,,,
2,20120428,39.17080572,-123.0098361,2012-04-28 21:00:38 UTC,0.0,17.0,66.0,79.0,320.0,17.0,...,47.0,53.0,0.0,10.0,0.4446631345059225,119.0,Oncorhynchus mykiss,williamshailey6,5843.0,72221.0
3,20080316,-53.85474539,-68.25137619,,,,,,,,...,,,,,,,,,,
4,20121218,-45.11162517,167.9542947,2012-12-18 06:15:41 UTC,0.0,60.0,65.0,84.0,177.0,4.0,...,49.0,55.0,9.8,7.0,0.301520320742488,353.0,Salmo trutta,mccain,8074.0,172543.0
