# Week 7 Assignment

In this week's assignment you will be working with NOAA's weather API, which lets you retrieve a variety of data from one or more weather stations of your choice.

API Documentation: https://www.ncdc.noaa.gov/cdo-web/webservices/v2#gettingStarted

As the API documentation page states, you will need to register for your own credentials. Follow the instructions at https://www.ncdc.noaa.gov/cdo-web/token to register.

<div class="alert alert-block alert-danger">
<b>Important:</b> You can remove the following cell and use the commented-out cell just below it to load your NOAA credentials. The auth2.csv file will not be provided to you. Note that the individual credential fields are stored as strings.
</div>

In [None]:
### Remove or comment out this cell
import pandas as pd

# loading my specific credentials
data = pd.read_csv('auth2.csv',header=0)
# setting up the NOAA API token variable
my_token = data['token'][0]

In [None]:
my_token

In [None]:
# ### You should uncomment this cell and use your credentials from NOAA

# # my credentials for the NOAA API
#my_token = ''
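Instead of pasting the token into the notebook, one option is to read it from an environment variable so it never ends up in a shared file. A minimal sketch, assuming a variable named `NOAA_TOKEN` (a hypothetical name; use whatever you set in your shell) and a hypothetical helper `load_noaa_token`:

```python
import os


def load_noaa_token(env_var='NOAA_TOKEN'):
    """Return the NOAA CDO token stored in an environment variable.

    NOAA_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, '').strip()
    if not token:
        raise RuntimeError(env_var + ' is not set; request a token at '
                           'https://www.ncdc.noaa.gov/cdo-web/token')
    return token


# usage: my_token = load_noaa_token()
```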

Now we need to choose a weather station to retrieve data for. Use the following link to look up the ID of a NOAA weather station: https://www.ncdc.noaa.gov/cdo-web/datatools/findstation

Fill out all fields based on your preferences. I used:
   * Location: CO
   * Dataset: Daily Summaries
   * Date Range: 2019-11-01 to 2019-11-30
   * Data Category: Air Temperature


<img align="left" style="padding-right:10px;" src="figures_7/NOAA_find_station_query.png" >

#### Click on 'Full Details' to see all the information
<img align="left" style="padding-right:10px;" src="figures_7/NOAA_find_station_result.png" ><br>

From the Find a Station results, we need to capture the following details:
   * The values in the 'Network' and 'Id' fields (second cell from the top, split on ':')
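
If you copy the full identifier from the station details page, splitting it on ':' recovers both pieces. A small sketch (the helper name `split_station_id` is hypothetical):

```python
def split_station_id(full_id):
    """Split a CDO station identifier such as 'GHCND:USW00023066'
    into its network and ID parts."""
    network, sep, station = full_id.partition(':')
    if not sep or not network or not station:
        raise ValueError('expected NETWORK:ID, got ' + repr(full_id))
    return network, station


network, ID = split_station_id('GHCND:USW00023066')
print(network, ID)  # GHCND USW00023066
```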

In [None]:
# variables based on my station search
network = 'GHCND'
ID = 'USW00023066'


# station_id = network:ID
station_id = network + ':' + ID
print(station_id)


### What type of data are we looking for?
At this point we need to decide what type of data we want to retrieve. We can use NOAA's API to find out what is available for this station.

One of the documentation pages https://www.ncdc.noaa.gov/cdo-web/webservices/v2#dataTypes shows us how to query for the available datatypes for the station we have chosen above.

As we saw in the FTE, we can build a dictionary of parameters to be used in our request.
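To see what a parameter dictionary turns into, you can encode it yourself. `requests` builds the query string with the standard library's `urllib.parse.urlencode`, so the sketch below should closely match the URL it sends (the token travels in a request header, not in the URL). The station values are the ones used in this notebook:

```python
from urllib.parse import urlencode

base_url = 'https://www.ncdc.noaa.gov/cdo-web/api/v2/datatypes'
params = {'limit': '1000', 'datasetid': 'GHCND', 'stationid': 'GHCND:USW00023066'}

# urlencode percent-encodes reserved characters, so ':' becomes %3A
query = urlencode(params)
print(base_url + '?' + query)
```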

In [None]:
import requests
import json

# building the parameter dictionary
# 'limit': '1000' --> What does this do? Look at the NOAA API documentation
# the API expects the station filter under the key 'stationid'
data = {'limit': '1000', 'datasetid': network, 'stationid': station_id}

# calling NOAA API to get the available datatypes for this specific station
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/datatypes', params=data, headers={'token': my_token})

Now we need to convert the JSON output from the request into something more readable.
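`json.loads(r.text)` (or equivalently `r.json()`) turns the response body into ordinary Python dictionaries and lists. The sketch below parses a small hand-made string in the assumed shape of a CDO v2 response, a `metadata` block plus a `results` list; the values are placeholders, not real station data:

```python
import json

# hand-made placeholder text in the assumed shape of a CDO v2 response
sample_text = '''
{"metadata": {"resultset": {"offset": 1, "count": 2, "limit": 1000}},
 "results": [{"id": "PRCP", "name": "Precipitation"},
             {"id": "SNOW", "name": "Snowfall"}]}
'''

response = json.loads(sample_text)
print(sorted(response.keys()))                     # ['metadata', 'results']
print(response['metadata']['resultset']['count'])  # 2
for item in response['results']:
    print(item['id'], '-', item['name'])
```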

In [None]:
# JSON to dictionary
datatypes_dict = json.loads(r.text)

# need the keys from this dictionary
datatypes_dict.keys()


I'm going to guess that the information we are after is stored in the 'results' key. Let's look at the first 5 entries and see if we're right.

In [None]:
# look at the first 5 entries of the results list
datatypes_dict['results'][:5]

So, the results appear to be a list of dictionaries. 

<div class="alert alert-block alert-warning">
<b>Note:</b> I'll leave parsing through all of these as an exercise for you. I already did this separately and determined that I will be using the datatype 'TAVG', which is average temperature and is available for 2018.
</div>
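One way to start that exercise is to keep only the datatypes whose date range covers the year you care about. A sketch with a hypothetical helper `datatypes_covering`, assuming each entry in `datatypes_dict['results']` has `id`, `name`, `mindate` and `maxdate` keys (check this against your own output):

```python
def datatypes_covering(results, year):
    """Return (id, name) pairs for datatypes whose date range includes `year`.

    Assumes 'mindate' and 'maxdate' are ISO strings such as '1998-01-01'.
    Only years are compared, so partial coverage within `year` still counts.
    """
    matches = []
    for entry in results:
        first_year = int(entry['mindate'][:4])
        last_year = int(entry['maxdate'][:4])
        if first_year <= year <= last_year:
            matches.append((entry['id'], entry['name']))
    return matches


# made-up placeholder entries, not real station coverage;
# pass datatypes_dict['results'] instead
sample = [
    {'id': 'TAVG', 'name': 'Average temperature', 'mindate': '1998-01-01', 'maxdate': '2019-12-31'},
    {'id': 'WT03', 'name': 'Thunder', 'mindate': '1948-01-01', 'maxdate': '2005-12-31'},
]
print(datatypes_covering(sample, 2018))  # [('TAVG', 'Average temperature')]
```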

In [None]:
data = {'limit': '1000', 'datasetid': network, 'stationid': station_id}

# append additional parameters to the data dictionary
# WT03 => Thunder
# SNOW => Snowfall
data.update({'datatypeid': 'SNOW',
             'startdate': '2018-01-01',
             'enddate': '2018-12-31',
             'units': 'standard'})
data

In [None]:
# make the request to get our year of data
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/data',params = data, headers = {'token':my_token})

# parse the JSON response into a Python dictionary
avg_snow_2018_dict = json.loads(r.text)

In [None]:
avg_snow_2018_dict

In [None]:
# look at the first 10 records of our data
avg_snow_2018_dict['results'][:10]

It looks like we have daily data, and the 'value' key contains a number that seems reasonable for daily snowfall.

Let's verify that we got a record for every day of 2018.

In [None]:
# there were 365 days in 2018
len(avg_snow_2018_dict['results'])

In [None]:
# look at the first and last day
print(avg_snow_2018_dict['results'][0])
print(avg_snow_2018_dict['results'][-1])
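
Counting records only shows that the total matches; comparing against the calendar shows which days, if any, are missing, and it handles leap years too. A sketch with a hypothetical helper `missing_dates`, assuming each record's `date` field begins with `YYYY-MM-DD` (for example `2018-01-01T00:00:00`):

```python
from datetime import date, timedelta


def missing_dates(results, year):
    """Return the calendar days of `year` (as 'YYYY-MM-DD') with no record in `results`."""
    seen = {record['date'][:10] for record in results}
    day = date(year, 1, 1)
    missing = []
    while day.year == year:
        if day.isoformat() not in seen:
            missing.append(day.isoformat())
        day += timedelta(days=1)
    return missing


# usage: missing_dates(avg_snow_2018_dict['results'], 2018)
# an empty list means every day of 2018 has a record
```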

### Requirements for the assignment
Using the NOAA API, retrieve data for a weather station of your choice. Based on the station you pick:
   * Determine an appropriate dataset
   * Determine an appropriate datatype
   * Pull at least 3 years' worth of data.<br>
     Note: if you pick an annual dataset, you will need to pull at least 25 years' worth of data.
   * Organize your results into a meaningful representation
   * Store your results in one of the following formats:
      - csv file
      - json file
      - relational database
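
For the relational database option, Python's built-in `sqlite3` module is enough. A sketch under stated assumptions: the table name `daily_obs`, its schema and the helper `store_records` are hypothetical, and each record is assumed to carry `date`, `datatype`, `station` and `value` keys like the entries in the `results` list from the `/data` endpoint:

```python
import sqlite3


def store_records(conn, records):
    """Insert daily records into a 'daily_obs' table (hypothetical schema).

    The primary key makes re-running a pull replace rows instead of duplicating them.
    """
    conn.execute('''CREATE TABLE IF NOT EXISTS daily_obs (
                        station  TEXT,
                        datatype TEXT,
                        date     TEXT,
                        value    REAL,
                        PRIMARY KEY (station, datatype, date))''')
    conn.executemany(
        'INSERT OR REPLACE INTO daily_obs VALUES (?, ?, ?, ?)',
        [(r['station'], r['datatype'], r['date'][:10], r['value']) for r in records])
    conn.commit()


conn = sqlite3.connect(':memory:')  # use a file name such as 'weather.db' to keep the data
store_records(conn, [
    {'date': '2018-01-01T00:00:00', 'datatype': 'SNOW',
     'station': 'GHCND:USW00023066', 'value': 0.0},  # placeholder record
])
print(conn.execute('SELECT COUNT(*) FROM daily_obs').fetchone()[0])  # 1
```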

<div class="alert alert-block alert-danger">
<b>Important:</b> You MAY NOT reuse the station or datatype that was demonstrated above. This means the following are off limits: 
    
   * ID = 'USW00023129'
   * datatypeid = 'TAVG'

</div>

<div class="alert alert-block alert-warning">
<b>Hint:</b> The NOAA API will only allow you to pull one year of data at a time.
</div>
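Because of this limit, a multi-year pull becomes a loop over one-year windows. A sketch of a small generator that produces the `startdate`/`enddate` pairs (the name `year_windows` is hypothetical):

```python
def year_windows(first_year, last_year):
    """Yield (startdate, enddate) strings for each calendar year,
    including both end years, one window per API request."""
    for year in range(first_year, last_year + 1):
        yield '{}-01-01'.format(year), '{}-12-31'.format(year)


for start, end in year_windows(2016, 2019):
    print(start, end)
    # in the real loop: data.update({'startdate': start, 'enddate': end}), then requests.get(...)
```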

I changed the data above to test out the API workflow and parameters. Now let's start over from scratch!

In [1]:
import pandas as pd
import requests
import json

# loading my specific credentials
data = pd.read_csv('auth2.csv',header=0)

#setting my token to a variable 
my_token = data['token'][0]

#Setting my Station ID => Grand Junction, Mesa County, Colorado
network = 'GHCND'
ID = 'USW00023066'

# station_id = network:ID
station_id = network + ':' + ID

data = {'limit': '1000',
        'datasetid': network,
        'stationid': station_id,
        'datatypeid': 'SNOW',
        'startdate': '2018-01-01',
        'enddate': '2018-12-31',
        'units': 'standard'}

In [2]:
api_data = []
for year in range(2016, 2020):
    print('working on ' + str(year))
    # the API only returns one year per request, so move the date window each loop
    data.update({'startdate': str(year) + '-01-01',
                 'enddate': str(year) + '-12-31'})
    r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/data', params=data, headers={'token': my_token})
    api_data.append(json.loads(r.text))


working on 2016
working on 2017
working on 2018
working on 2019


In [4]:
with open('api_data.txt', 'w') as outfile:
    json.dump(api_data, outfile)
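The saved file keeps one raw response per year. To organize it into one row per day, the responses can be flattened into a CSV. A sketch with a hypothetical helper `write_results_csv`, assuming each record has `date`, `datatype`, `station` and `value` keys; other keys (such as `attributes`) are dropped, and a response without a `results` key (for example, a year with no data) is skipped:

```python
import csv


def write_results_csv(responses, path):
    """Flatten a list of per-year API responses into one CSV file.

    Returns the number of data rows written.
    """
    fields = ['date', 'datatype', 'station', 'value']
    rows = 0
    with open(path, 'w', newline='') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fields, extrasaction='ignore')
        writer.writeheader()
        for response in responses:
            for record in response.get('results', []):
                writer.writerow(record)
                rows += 1
    return rows


# usage, reloading the raw file written above:
# with open('api_data.txt') as infile:
#     write_results_csv(json.load(infile), 'api_data.csv')
```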

In [13]:
def get_api(url, start_year, end_year, datatype):
    '''Return a list of JSON responses (one per year) from the NOAA CDO API.

    Uses the global `data` dictionary, which should already contain
    'limit': '1000', 'units': 'standard', 'datasetid' and 'stationid', and the
    global `my_token`. `datatype` sets 'datatypeid' (e.g. 'SNOW' or 'PRCP').
    Years from `start_year` to `end_year` are requested inclusively, one year
    per call, because the API only allows one year of data per request.
    '''
    params = dict(data)  # copy so the global dictionary is not modified
    params['datatypeid'] = datatype
    api_data = []
    for year in range(start_year, end_year + 1):
        print('working on ' + str(year))
        params['startdate'] = str(year) + '-01-01'
        params['enddate'] = str(year) + '-12-31'
        r = requests.get(url, params=params, headers={'token': my_token})
        api_data.append(json.loads(r.text))
    return api_data

In [14]:
test2 = get_api('https://www.ncdc.noaa.gov/cdo-web/api/v2/data',2016,2019,'PRCP')

working on 2016
working on 2017
working on 2018
working on 2019
