# <center>Data Wrangling: REST and Streaming API - Stacey Sandy</center>

This week's assignment you will be working with National Oceanic and Atmomspheric Administration (NOAAs) weather API. This API will allow you to retrieve a variety of data from a specific weather station(s), of your choice.

API Documentation: https://www.ncdc.noaa.gov/cdo-web/webservices/v2#gettingStarted

As the API documentation page states, you will need to register for your own credentials. Following the instructions at https://www.ncdc.noaa.gov/cdo-web/token to register.

<div class="alert alert-block alert-danger">
<b>Important::</b> You can remove the following cell and use the commented out cell just below to load your NOAA credentials. The auth2.csv will not be provided to you. Please notice that the individual credential fields are stored as strings.
</div>

In [1]:
### Remove or comment out this cell ###THIS IS AN EXAMPLE!
import pandas as pd

# loading my specific credentials
data = pd.read_csv('auth2.csv',header=0)

# setting up some variables for Twitter. 
my_token = data['token'][0]

In [2]:
#THIS IS AN EXAMPLE!!!
my_token

'iPwSshpjoAHyXGpseDijEqmGOZGLOJBO'

In [1]:
#Use your credentials from NOAA
#These are my credentials for NOAA API. 

my_token = 'enter_API_code_here'

Now we need to determine a weather station that we would like to retrieve our data for. Use the following link to get the id for a NOAA weather station. https://www.ncdc.noaa.gov/cdo-web/datatools/findstation

Fill out all field based on your preferences. I used:
   * Location: CO
   * Dataset: Daily Summaries
   * Data Range: 2019-11-01 to 20019-11-30
   * Data Category: Air Temperature


<img align="left" style="padding-right:10px;" src="figures_7/NOAA_find_station_query.png" >

#### Click on 'Full Details' to see all the information
<img align="left" style="padding-right:10px;" src="figures_7/NOAA_find_station_result.png" ><br>

From the Find A Station results, we will need to capture the following details:
   * Capture the values within the 'Network' and 'Id' fields (second cell from top, split on ':')

In [2]:
# variables based on my station search
network = 'GHCND'
ID = 'USW00023129'

# station_id = network:ID
station_id = network + ':' + ID
print(station_id)

GHCND:USW00023129


### What type of data are we looking for?
At this point we need to determine what type of data we want to retrieve. We can actually use the NOAAs API to help determine what is available for this station.

One of the documentation pages https://www.ncdc.noaa.gov/cdo-web/webservices/v2#dataTypes shows us how to query for the available datatypes for the station we have chosen above.

As we saw in the FTE, we can build a dictionary of parameters to be used in our request.

In [3]:
import requests
import json

# building the parameter dictionary
# 'limit = 1000' --> What does this do? Look at the NOAA API documentation
data = {}
data = {'limit':'1000', 'datasetid': network, 'station_id': station_id}

# calling NOAA API to get the available datatypes for this specific station
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/datatypes',params = data, headers = {'token':my_token})

Now we need to convert the JSON output from the request to something more readable

In [4]:
# JSON to dictionary
datatypes_dict = json.loads(r.text)

# need the keys from this dictionary
datatypes_dict.keys()


dict_keys(['metadata', 'results'])

I'm going to guess that the information we are after is stored in the results key.  Let's look at the first 5 and see if we might be right

In [5]:
datatypes_dict['results'][:10]

[{'mindate': '1994-03-19',
  'maxdate': '1996-05-28',
  'name': 'Average cloudiness midnight to midnight from 30-second ceilometer data',
  'datacoverage': 1,
  'id': 'ACMC'},
 {'mindate': '1965-01-01',
  'maxdate': '2005-12-31',
  'name': 'Average cloudiness midnight to midnight from manual observations',
  'datacoverage': 1,
  'id': 'ACMH'},
 {'mindate': '1994-02-01',
  'maxdate': '1996-05-28',
  'name': 'Average cloudiness sunrise to sunset from 30-second ceilometer data',
  'datacoverage': 1,
  'id': 'ACSC'},
 {'mindate': '1965-01-01',
  'maxdate': '2005-12-31',
  'name': 'Average cloudiness sunrise to sunset from manual observations',
  'datacoverage': 1,
  'id': 'ACSH'},
 {'mindate': '1982-01-01',
  'maxdate': '2019-12-24',
  'name': 'Average wind speed',
  'datacoverage': 1,
  'id': 'AWND'},
 {'mindate': '1948-08-02',
  'maxdate': '2012-07-23',
  'name': 'Number of days included in the multiday evaporation total (MDEV)',
  'datacoverage': 1,
  'id': 'DAEV'},
 {'mindate': '1832-0

So, the results appear to be a list of dictionaries. 

<div class="alert alert-block alert-warning">
<b>Note:</b>  I'll leave parsing through all of these as an exercise for you to do.  I already did this seperately and determined I will be using the datatype set of 'TAVG' which is average temp and is available for the year of 2018.
</div>

In [6]:
data = {}
data = {'limit':'1000', 'datasetid': network, 'stationid': station_id}


# append additional parameters to data dictionary
data.update({'datatypeid': 'TAVG'})
data.update({'startdate': '2018-01-01'})
data.update({'enddate': '2018-12-31'})
data.update({'units':'standard'})
data

{'limit': '1000',
 'datasetid': 'GHCND',
 'stationid': 'GHCND:USW00023129',
 'datatypeid': 'TAVG',
 'startdate': '2018-01-01',
 'enddate': '2018-12-31',
 'units': 'standard'}

In [7]:
# make the request to get our year of data
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/data',params = data, headers = {'token':my_token})

#load the api response as a json
avg_temp_2018_dict = json.loads(r.text)

In [8]:
# look at the first record of our data
avg_temp_2018_dict['results'][:10]

[{'date': '2018-01-01T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 56.0},
 {'date': '2018-01-02T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 60.0},
 {'date': '2018-01-03T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 61.0},
 {'date': '2018-01-04T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 62.0},
 {'date': '2018-01-05T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 63.0},
 {'date': '2018-01-06T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 62.0},
 {'date': '2018-01-07T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:USW00023129',
  'attributes': 'H,,S,',
  'value': 63.0},
 {'date': '2018-01-08T00:00:00',
  'datatype': 'TAVG',
  'station': 'GHCND:U

Looks like we have daily data and the 'value' key appears to contain a number that seems reasonable for temperature.

Let's just verify that we got a record for everyday of 2018

In [9]:
# there were 365 days in 2018
len(avg_temp_2018_dict['results'])

365

In [10]:
# look at the first and last day
print(avg_temp_2018_dict['results'][0])
print(avg_temp_2018_dict['results'][364])

{'date': '2018-01-01T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00023129', 'attributes': 'H,,S,', 'value': 56.0}
{'date': '2018-12-31T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:USW00023129', 'attributes': 'H,,S,', 'value': 55.0}


### Requirements for the assignment
Using the NOAA API, retrieve data for a weather station of your choice.  Based on the station you pick, 
   * Determine an appropriate dataset 
   * Determine an appropriatedataset type
   * Pull at least 3 years worth of data.<br>
     Note: if you pick an annual dataset, you will need to pull at least 25 years worth of data.
   * Organize your results into a meaningful representation
   * Store your result in one of the followinf formats:
      - csv file
      - json file
      - relational database






<div class="alert alert-block alert-danger">
<b>Important::</b> You MAY NOT reuse the station or datasettype that was demostrated above. This means the following are off limits: 
    
   * ID = 'USW00023129'
   * datatypeid = 'TAVG'

</div>

<div class="alert alert-block alert-warning">
<b>Hint:</b> The NOAA API will only allow you to pull one year of data at a time.
</div>

## <b>This is where my Week 7 assignment submittal begins:</b><br>

In [1]:
#These are my credentials for NOAA API, again. 

my_token = 'LpPfhXrxDaaDzrGeNpdmsRlpJFmNspKG'

The background story is that I initially wanted to obtain the weather station data from 2012 when my husband and I celebrated our 10 year marriage anniversary. I figured if I was going to conduct an exploratory data analysis on weather, let's make it fun and interesting... I later learned the hard way that no data from the Hana airport or closer weather station in Hana, HI were recorded on the NOAA API website in 2012. Therefore, I settled on the airport that we flew into in April 2012: Kahului Airport (OGG; airport code) of the island of Maui, Hawaii.<br>
<br>
<br>On a side note....interestingly, Paris, France was my second weather station location of interest. I have dreamt of going to Paris since Sophmore year of high school. Even with speaking very little french language, I have always wanted to EAT, SHOP, and SCROLL the city of Paris. Perhaps if time persists...I can expand upon that interestst.

Here is an outline of my Week 7 deliverables within this Jupyter Notebook:
   * Determine an appropriate dataset: Kahului Airport (Maui), HI, US
   * Determine an appropriatedataset type: AWND (Average Wind Speed)
   * Pull at least 3 years worth of data: 11/1/2013 to 12/31/2015<br>
     Note: if you pick an annual dataset, you will need to pull at least 25 years worth of data.
   * Organize your results into a meaningful representation: EDA, dataframes, averages, and etc...
   * Store your result in one of the following format: csv file

In [3]:
# variables based on my station search
network = 'GHCND'
ID = 'USW00022516'

#My Weather Station selection: Kahului Airport in Maui, Hawaii
#Name Kahului Airport (Maui), HI, US
#NetworkID: GHCND:USW00022516
#Latitude/Longitude: 20.89972°, -156.42861° 
#Elevation: 15.5 m 

# station_id = network:ID
station_id = network + ':' + ID
print(station_id)

GHCND:USW00022516


In [4]:
#Build dictionary of parameters in our request
import requests
import json

# building the parameter dictionary
# 'limit = 1000' --> What does this do? Look at the NOAA API documentation
data = {}
data = {'limit':'1000', 'datasetid': network, 'station_id': station_id}

# calling NOAA API to get the available datatypes for this specific station
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/datatypes',params = data, headers = {'token':my_token})

In [5]:
#Convert above JSON output into more readable format

# JSON to dictionary
datatypes_dict = json.loads(r.text)

# need the keys from this dictionary
datatypes_dict.keys()

dict_keys(['metadata', 'results'])

In [6]:
#Look at first few results
datatypes_dict['results'][:10]

[{'mindate': '1994-03-19',
  'maxdate': '1996-05-28',
  'name': 'Average cloudiness midnight to midnight from 30-second ceilometer data',
  'datacoverage': 1,
  'id': 'ACMC'},
 {'mindate': '1965-01-01',
  'maxdate': '2005-12-31',
  'name': 'Average cloudiness midnight to midnight from manual observations',
  'datacoverage': 1,
  'id': 'ACMH'},
 {'mindate': '1994-02-01',
  'maxdate': '1996-05-28',
  'name': 'Average cloudiness sunrise to sunset from 30-second ceilometer data',
  'datacoverage': 1,
  'id': 'ACSC'},
 {'mindate': '1965-01-01',
  'maxdate': '2005-12-31',
  'name': 'Average cloudiness sunrise to sunset from manual observations',
  'datacoverage': 1,
  'id': 'ACSH'},
 {'mindate': '1982-01-01',
  'maxdate': '2019-12-24',
  'name': 'Average wind speed',
  'datacoverage': 1,
  'id': 'AWND'},
 {'mindate': '1948-08-02',
  'maxdate': '2012-07-23',
  'name': 'Number of days included in the multiday evaporation total (MDEV)',
  'datacoverage': 1,
  'id': 'DAEV'},
 {'mindate': '1832-0

In [7]:
#Let's just look at the AWND (Avereage Wind Speed) from the results
results = datatypes_dict['results']
results[4]

{'mindate': '1982-01-01',
 'maxdate': '2019-12-24',
 'name': 'Average wind speed',
 'datacoverage': 1,
 'id': 'AWND'}

In [8]:
for result in results:
    print(f"{result['id']}:     {result['name']}")

ACMC:     Average cloudiness midnight to midnight from 30-second ceilometer data
ACMH:     Average cloudiness midnight to midnight from manual observations
ACSC:     Average cloudiness sunrise to sunset from 30-second ceilometer data
ACSH:     Average cloudiness sunrise to sunset from manual observations
AWND:     Average wind speed
DAEV:     Number of days included in the multiday evaporation total (MDEV)
DAPR:     Number of days included in the multiday precipitation total (MDPR)
DASF:     Number of days included in the multiday snow fall total (MDSF) 
DATN:     Number of days included in the multiday minimum temperature (MDTN)
DATX:     Number of days included in the multiday maximum temperature (MDTX)
DAWM:     Number of days included in the multiday wind movement (MDWM)
DWPR:     Number of days with non-zero precipitation included in multiday precipitation total (MDPR)
EVAP:     Evaporation of water from evaporation pan
FMTM:     Time of fastest mile or fastest 1-minute wind
FRGB:

In [9]:
#Get first year of data (2013) and set parrameters
data = {}
data = {'limit':'1000', 'datasetid': network, 'stationid': station_id}

#I am going to continue to focus on AWND (Average Wind Speed)

# append additional parameters to data dictionary
data.update({'datatypeid': 'AWND'})
data.update({'startdate': '2013-01-01'})
data.update({'enddate': '2013-12-31'})
data.update({'units':'standard'})
data

{'limit': '1000',
 'datasetid': 'GHCND',
 'stationid': 'GHCND:USW00022516',
 'datatypeid': 'AWND',
 'startdate': '2013-01-01',
 'enddate': '2013-12-31',
 'units': 'standard'}

In [10]:
#Request year of data and make the request to get our year of data
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/data',params = data, headers = {'token':my_token})

#load the api response as a json
avg_OGG_Wind_2013_dict = json.loads(r.text)

In [11]:
# look at the first record of our data
avg_OGG_Wind_2013_dict['results'][:10]

[{'date': '2013-01-01T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,X,',
  'value': 13.9},
 {'date': '2013-01-02T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,X,',
  'value': 15.9},
 {'date': '2013-01-03T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,X,',
  'value': 15.2},
 {'date': '2013-01-04T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,X,',
  'value': 19.5},
 {'date': '2013-01-05T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,X,',
  'value': 20.1},
 {'date': '2013-01-06T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,X,',
  'value': 17.0},
 {'date': '2013-01-07T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,X,',
  'value': 18.6},
 {'date': '2013-01-08T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022

In [12]:
# Verify there is a record for each day as there are 365 days in a year
len(avg_OGG_Wind_2013_dict['results'])

365

In [13]:
# look at the first and last day
print(avg_OGG_Wind_2013_dict['results'][0])
print(avg_OGG_Wind_2013_dict['results'][364])

{'date': '2013-01-01T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,X,', 'value': 13.9}
{'date': '2013-12-31T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,W,', 'value': 8.3}


In [14]:
#Check the type to ensure it is a dictionary
type(avg_OGG_Wind_2013_dict)

dict

In [15]:
#Chec the 'results' value type
type(avg_OGG_Wind_2013_dict['results'])

list

In [16]:
#Let's pretty print the dictionary for no reason...just to try it.
from pprint import pprint

In [17]:
def printplus(obj):
    """
    Pretty-prints the object passed in.

    """
    # Dict
    if isinstance(obj, dict):
        for k, v in sorted(obj.items()):
            print (u'{0}: {1}'.format(k, v))

    # List or tuple            
    elif isinstance(obj, list) or isinstance(obj, tuple):
        for x in obj:
            print (x)

    # Other
    else:
        print (obj)

In [18]:
#Now we pretty print the dictionary.
printplus(avg_OGG_Wind_2013_dict)

metadata: {'resultset': {'offset': 1, 'count': 365, 'limit': 1000}}
results: [{'date': '2013-01-01T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,X,', 'value': 13.9}, {'date': '2013-01-02T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,X,', 'value': 15.9}, {'date': '2013-01-03T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,X,', 'value': 15.2}, {'date': '2013-01-04T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,X,', 'value': 19.5}, {'date': '2013-01-05T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,X,', 'value': 20.1}, {'date': '2013-01-06T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,X,', 'value': 17.0}, {'date': '2013-01-07T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,X,', 'value': 18.6}, {'date': '2013-01-08T00:00:00', 'datatype': 'AWND', 'station': 'GHCN

Well...that wasn't very pretty...  :\    , but again, we did that just because we wanted to and for no real reason. <br>
Moving on....<br>
<br>
Let's save the first data set on Average Wind Speed (AWND) for Kahului Airport in Maui Hawaii for 2013 to a dataframe for better representation of the data into columns and rows.

In [19]:
#Import pandas and save AWND results data into a dataframe.
import pandas as pd
data = avg_OGG_Wind_2013_dict

AWND_df = pd.DataFrame(data['results'])

In [20]:
#Look at head of dataframe
AWND_df.head()

Unnamed: 0,date,datatype,station,attributes,value
0,2013-01-01T00:00:00,AWND,GHCND:USW00022516,",,X,",13.9
1,2013-01-02T00:00:00,AWND,GHCND:USW00022516,",,X,",15.9
2,2013-01-03T00:00:00,AWND,GHCND:USW00022516,",,X,",15.2
3,2013-01-04T00:00:00,AWND,GHCND:USW00022516,",,X,",19.5
4,2013-01-05T00:00:00,AWND,GHCND:USW00022516,",,X,",20.1


In [21]:
#Look at tail of dataframe
AWND_df.tail()

Unnamed: 0,date,datatype,station,attributes,value
360,2013-12-27T00:00:00,AWND,GHCND:USW00022516,",,W,",11.0
361,2013-12-28T00:00:00,AWND,GHCND:USW00022516,",,W,",10.1
362,2013-12-29T00:00:00,AWND,GHCND:USW00022516,",,W,",14.3
363,2013-12-30T00:00:00,AWND,GHCND:USW00022516,",,W,",8.7
364,2013-12-31T00:00:00,AWND,GHCND:USW00022516,",,W,",8.3


In [22]:
#Look at dataframe info
AWND_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 5 columns):
date          365 non-null object
datatype      365 non-null object
station       365 non-null object
attributes    365 non-null object
value         365 non-null float64
dtypes: float64(1), object(4)
memory usage: 14.4+ KB


In [23]:
#Find mean or avereage wind speed for entire 2013 dataframe.
AWND_df.mean()

value    11.447397
dtype: float64

RMEMBER: The NOAA API will only allow you to pull one year of data at a time.<br>
So, let's pull the remaing 2 years of data from 2014 and 2015 for Kahului Airport in Hawaii from the NOAA API.

In [24]:
#Get second set of data for 2014 and set parrameters
data2 = {}
data2 = {'limit':'1000', 'datasetid': network, 'stationid': station_id}

# append additional parameters to data dictionary
data2.update({'datatypeid': 'AWND'})
data2.update({'startdate': '2014-01-01'})
data2.update({'enddate': '2014-12-31'})
data2.update({'units':'standard'})
data2

{'limit': '1000',
 'datasetid': 'GHCND',
 'stationid': 'GHCND:USW00022516',
 'datatypeid': 'AWND',
 'startdate': '2014-01-01',
 'enddate': '2014-12-31',
 'units': 'standard'}

In [25]:
#Request year of data and make the request to get our year of data
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/data',params = data2, headers = {'token':my_token})

#load the api response as a json
avg_OGG_Wind_2014_dict = json.loads(r.text)

In [26]:
# look at the first record of our data
avg_OGG_Wind_2014_dict['results'][:10]

[{'date': '2014-01-01T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 3.8},
 {'date': '2014-01-02T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 15.2},
 {'date': '2014-01-03T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 11.2},
 {'date': '2014-01-04T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 11.2},
 {'date': '2014-01-05T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 9.6},
 {'date': '2014-01-06T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 12.5},
 {'date': '2014-01-07T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 15.4},
 {'date': '2014-01-08T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW0002251

In [27]:
# Verify there is a record for each day as there are 365 days in a year
len(avg_OGG_Wind_2014_dict['results'])

365

In [28]:
# look at the first and last day
print(avg_OGG_Wind_2014_dict['results'][0])
print(avg_OGG_Wind_2014_dict['results'][364])

{'date': '2014-01-01T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,W,', 'value': 3.8}
{'date': '2014-12-31T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,W,', 'value': 8.3}


Now, let's get the remaining data for 2015 to meet our <u>three years</u> of data deliverables.<br>

In [29]:
#Get data and set parrameters
data3 = {}
data3 = {'limit':'1000', 'datasetid': network, 'stationid': station_id}

# append additional parameters to data dictionary
data3.update({'datatypeid': 'AWND'})
data3.update({'startdate': '2015-01-01'})
data3.update({'enddate': '2015-12-31'})
data3.update({'units':'standard'})
data3

{'limit': '1000',
 'datasetid': 'GHCND',
 'stationid': 'GHCND:USW00022516',
 'datatypeid': 'AWND',
 'startdate': '2015-01-01',
 'enddate': '2015-12-31',
 'units': 'standard'}

In [30]:
#Request year of data and make the request to get our year of data
r = requests.get('https://www.ncdc.noaa.gov/cdo-web/api/v2/data',params = data3, headers = {'token':my_token})

#load the api response as a json
avg_OGG_Wind_2015_dict = json.loads(r.text)

In [31]:
# look at the first record of our data
avg_OGG_Wind_2015_dict['results'][:10]

[{'date': '2015-01-01T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 7.4},
 {'date': '2015-01-02T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 16.6},
 {'date': '2015-01-03T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 11.0},
 {'date': '2015-01-04T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 7.6},
 {'date': '2015-01-05T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 6.5},
 {'date': '2015-01-06T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 13.9},
 {'date': '2015-01-07T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516',
  'attributes': ',,W,',
  'value': 12.3},
 {'date': '2015-01-08T00:00:00',
  'datatype': 'AWND',
  'station': 'GHCND:USW00022516

In [32]:
# Verify there is a record for each day as there are 365 days in a year
len(avg_OGG_Wind_2015_dict['results'])

365

In [33]:
# look at the first and last day
print(avg_OGG_Wind_2015_dict['results'][0])
print(avg_OGG_Wind_2015_dict['results'][364])

{'date': '2015-01-01T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,W,', 'value': 7.4}
{'date': '2015-12-31T00:00:00', 'datatype': 'AWND', 'station': 'GHCND:USW00022516', 'attributes': ',,W,', 'value': 7.6}


Let's create a dataframe for the AWND 2014 data for Kahului Airport.

In [34]:
#Create dataframe of 2014 AWND results
data2 = avg_OGG_Wind_2014_dict

AWND_df2 = pd.DataFrame(data['results'])

In [35]:
#Look at 2014 AWND_df2 info
AWND_df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 5 columns):
date          365 non-null object
datatype      365 non-null object
station       365 non-null object
attributes    365 non-null object
value         365 non-null float64
dtypes: float64(1), object(4)
memory usage: 14.4+ KB


In [36]:
#Find mean or avereage wind speed for entire 2014 dataframe.
AWND_df2.mean()

value    11.447397
dtype: float64

Now we will create a data frame for the AWND 2015 data for Kahului Airport.

In [37]:
#Create dataframe of 2015 AWND results
data3 = avg_OGG_Wind_2015_dict

AWND_df3 = pd.DataFrame(data['results'])

In [38]:
#Look at 2015 AWND_df3 info
AWND_df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 365 entries, 0 to 364
Data columns (total 5 columns):
date          365 non-null object
datatype      365 non-null object
station       365 non-null object
attributes    365 non-null object
value         365 non-null float64
dtypes: float64(1), object(4)
memory usage: 14.4+ KB


In [39]:
#Find mean or avereage wind speed for entire 2015 dataframe.
AWND_df3.mean()

value    11.447397
dtype: float64

Now we will append the AWND 2014 data frame 2 and AWND 2015 data frame 3 into the first AWND_df 2013 into a final AWND dataframe with all three years of data.

In [40]:
#Append dataframe 2 (2014) with original dataframe (2013)
AWND2_df = AWND_df.append(AWND_df2, ignore_index=True)

In [41]:
#Append dataframe 3 (2015) with new appended dataframe (2013 and 2014)
allAWND_df = AWND2_df.append(AWND_df3, ignore_index=True)

In [42]:
#Check the length of the dataframe (rows)
len(allAWND_df)

1095

In [43]:
#Look at complete dataframe head
allAWND_df.head()

Unnamed: 0,date,datatype,station,attributes,value
0,2013-01-01T00:00:00,AWND,GHCND:USW00022516,",,X,",13.9
1,2013-01-02T00:00:00,AWND,GHCND:USW00022516,",,X,",15.9
2,2013-01-03T00:00:00,AWND,GHCND:USW00022516,",,X,",15.2
3,2013-01-04T00:00:00,AWND,GHCND:USW00022516,",,X,",19.5
4,2013-01-05T00:00:00,AWND,GHCND:USW00022516,",,X,",20.1


In [44]:
#Look at complete dataframe tail
allAWND_df.tail()

Unnamed: 0,date,datatype,station,attributes,value
1090,2013-12-27T00:00:00,AWND,GHCND:USW00022516,",,W,",11.0
1091,2013-12-28T00:00:00,AWND,GHCND:USW00022516,",,W,",10.1
1092,2013-12-29T00:00:00,AWND,GHCND:USW00022516,",,W,",14.3
1093,2013-12-30T00:00:00,AWND,GHCND:USW00022516,",,W,",8.7
1094,2013-12-31T00:00:00,AWND,GHCND:USW00022516,",,W,",8.3


In [45]:
#Look at complete dataframe info
allAWND_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1095 entries, 0 to 1094
Data columns (total 5 columns):
date          1095 non-null object
datatype      1095 non-null object
station       1095 non-null object
attributes    1095 non-null object
value         1095 non-null float64
dtypes: float64(1), object(4)
memory usage: 42.9+ KB


In [46]:
#Now look at the mean/avereage of the value for the complete dataframe.
allAWND_df.mean()

value    11.447397
dtype: float64

Interesting...the mean Average Wind Speed (AWND) for 2013, 2014, and 2015 was the same for each dataframe year and for the completed dataframe with all three years.... <br>
Guess the average wind speed in Kahului, Maui, Hawaii doesn't change much from year to year....

In [47]:
#Let's get a count on null values in the entire AWND data frame.
import numpy as np
print(np.count_nonzero(allAWND_df.isnull()))

0


Okay, <b>great!</b> No Null values to deal with, so yay!<br> 
Let's correct the attributes column with the comma(s) (,,X,) in the data set output.<br>
We will remove those unneeded commas from the data before we finalize the final three years of data.

In [48]:
#Remove comma from attibutes column of data
allAWND_df['attributes'] = allAWND_df['attributes'].str.replace(',', '')
allAWND_df.head()

Unnamed: 0,date,datatype,station,attributes,value
0,2013-01-01T00:00:00,AWND,GHCND:USW00022516,X,13.9
1,2013-01-02T00:00:00,AWND,GHCND:USW00022516,X,15.9
2,2013-01-03T00:00:00,AWND,GHCND:USW00022516,X,15.2
3,2013-01-04T00:00:00,AWND,GHCND:USW00022516,X,19.5
4,2013-01-05T00:00:00,AWND,GHCND:USW00022516,X,20.1


In [49]:
#Look at the data frame tail again to confirm the attributes column.
allAWND_df.tail()

Unnamed: 0,date,datatype,station,attributes,value
1090,2013-12-27T00:00:00,AWND,GHCND:USW00022516,W,11.0
1091,2013-12-28T00:00:00,AWND,GHCND:USW00022516,W,10.1
1092,2013-12-29T00:00:00,AWND,GHCND:USW00022516,W,14.3
1093,2013-12-30T00:00:00,AWND,GHCND:USW00022516,W,8.7
1094,2013-12-31T00:00:00,AWND,GHCND:USW00022516,W,8.3


In [50]:
#Let's look at a pandas summary of the AWND data frame
from pandas_summary import DataFrameSummary

In [51]:
#Create a dfs variable ro output the pandas_summary data
AWND_dfs = DataFrameSummary(allAWND_df)

In [52]:
#Look at column data values based on pandas_summary
AWND_dfs.columns_stats

Unnamed: 0,date,datatype,station,attributes,value
counts,1095,1095,1095,1095,1095
uniques,365,1,1,2,70
missing,0,0,0,0,0
missing_perc,0%,0%,0%,0%,0%
types,categorical,constant,constant,bool,numeric


In [54]:
#This saves the dataframe to a csv file.
allAWND_df.to_csv('2013-2015_OGG_AWND.csv', index=False)

<b>Conclusion and Final Thoughts</b><br>
My initial intent was to look back at a time when my husband and I celebrated our ten year marriage anniversary in Maui. I recall we were there in 2012, but I wanted to see the weather in 2013 to 2015. More importantly because my first son was born in 2013 and my second son was born in 2015. Although the average temperature was not a data type that I could select. The percipitation data type to look at the rain variables durring 2013 to 2015 was also a data type that I wasn't allowed to select. Because the assigment and FTE dialogue of data already examined those data types.<br>
<br>
Therefore, I resulted in looking at the Average Wind Speed (AWND). I figured that with the varying values it could be interesting to look at the wind in Maui...<br>
But as with all lovely island temperatures and almost always consistent weather patterns, Maui, Hawaii proved to have typical average wind speed for all three years of 2013 to 2015. So although it may have resulted to no differeing patterns of average wind speed, I learned a lot. I also found out some intereating ways dealing and working with API's availabel online. That could be of furhter interest in the future. Until next time, thanks for  your patience!