## Fetch Weather Data: API Exploration
### *This notebook fetches weather data from two sources: NOAA and Open Weather*

#### NOAA API
The documentation for the web API for NOAA climate data can be found [here](https://www.ncdc.noaa.gov/cdo-web/webservices/v2).

In [1]:
import requests
import json
import pandas as pd
headers = {"token": "xVEIkLnfHyheHhvoheZSxesUerlyrxGN"}
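The token above is written directly into the notebook. A minimal sketch of one alternative, assuming the token is stored in an environment variable (the name `NOAA_TOKEN` and the helper `build_noaa_headers` are assumptions for illustration, not something NOAA defines):

```python
import os


def build_noaa_headers(env_var="NOAA_TOKEN"):
    """Build the request headers from a token stored in an environment variable."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your NOAA API token.")
    return {"token": token}


# Usage (assumes the variable is set in the shell that started the notebook):
# headers = build_noaa_headers()
```

This keeps the credential out of the saved notebook while leaving every later `requests.get(url=url, headers=headers)` call unchanged.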

#### All Available Datasets

In [3]:
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets"
response = requests.get(url=url, headers=headers)
noaa_all_datasets_json = response.json()
print(noaa_all_datasets_json["metadata"])
noaa_all_datasets_df = pd.DataFrame(noaa_all_datasets_json['results'])
noaa_all_datasets_df

{'resultset': {'offset': 1, 'count': 11, 'limit': 25}}


Unnamed: 0,datacoverage,id,maxdate,mindate,name,uid
0,1.0,GHCND,2018-10-01,1763-01-01,Daily Summaries,gov.noaa.ncdc:C00861
1,1.0,GSOM,2018-08-01,1763-01-01,Global Summary of the Month,gov.noaa.ncdc:C00946
2,1.0,GSOY,2018-01-01,1763-01-01,Global Summary of the Year,gov.noaa.ncdc:C00947
3,0.95,NEXRAD2,2018-10-02,1991-06-05,Weather Radar (Level II),gov.noaa.ncdc:C00345
4,0.95,NEXRAD3,2018-09-30,1994-05-20,Weather Radar (Level III),gov.noaa.ncdc:C00708
5,1.0,NORMAL_ANN,2010-01-01,2010-01-01,Normals Annual/Seasonal,gov.noaa.ncdc:C00821
6,1.0,NORMAL_DLY,2010-12-31,2010-01-01,Normals Daily,gov.noaa.ncdc:C00823
7,1.0,NORMAL_HLY,2010-12-31,2010-01-01,Normals Hourly,gov.noaa.ncdc:C00824
8,1.0,NORMAL_MLY,2010-12-01,2010-01-01,Normals Monthly,gov.noaa.ncdc:C00822
9,0.25,PRECIP_15,2014-01-01,1970-05-12,Precipitation 15 Minute,gov.noaa.ncdc:C00505


#### Daily Summaries Dataset

* For our purposes, we will work with the Daily Summaries data.
* Fetch the metadata for the Daily Summaries dataset (ID `GHCND`).

In [23]:
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets/GHCND"
response = requests.get(url=url, headers=headers)
noaa_daily_summaries_json = response.json()

In [26]:
print(noaa_daily_summaries_json)

{'mindate': '1763-01-01', 'maxdate': '2018-09-28', 'name': 'Daily Summaries', 'datacoverage': 1, 'id': 'GHCND'}


#### Datatype Filter

In [27]:
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets?datatypeid=TOBS"
response = requests.get(url=url, headers=headers)
noaa_tobs_json = response.json()

In [28]:
noaa_tobs_json

{'metadata': {'resultset': {'offset': 1, 'count': 1, 'limit': 25}},
 'results': [{'uid': 'gov.noaa.ncdc:C00861',
   'mindate': '1763-01-01',
   'maxdate': '2018-09-28',
   'name': 'Daily Summaries',
   'datacoverage': 1,
   'id': 'GHCND'}]}
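The filter above is written directly into the URL. As a sketch, the same query string can be assembled from a dictionary with the `params` argument of `requests`; a list value is sent as a repeated parameter, which is the form the station query in the next section uses. The code only prepares the requests and prints the resulting URLs, so nothing is sent to the API.

```python
import requests

base = "https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets"

# Single filter: datasets that contain the TOBS datatype.
tobs_request = requests.Request("GET", base, params={"datatypeid": "TOBS"}).prepare()
print(tobs_request.url)

# A list value becomes a repeated query parameter (one stationid per entry).
stations = ["COOP:310090", "COOP:310184", "COOP:310212"]
stations_request = requests.Request("GET", base, params={"stationid": stations}).prepare()
print(stations_request.url)
```

Note that `requests` percent-encodes the colon in the station IDs, so the second URL will not match the hand-written one character for character.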

#### Set of Stations

In [32]:
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets?stationid=COOP:310090&stationid=COOP:310184&stationid=COOP:310212"
response = requests.get(url=url, headers=headers)
noaa_stations_json = response.json()
print(noaa_stations_json)

Note: The request above currently returns no results because no data is available for the listed stations.
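One way to make this case explicit is to check for the `results` key before building a DataFrame. The sketch below assumes, based on the behavior noted above rather than on the API documentation, that a query with no matches returns a JSON object without a `results` key; `results_to_frame` is a hypothetical helper name.

```python
import pandas as pd


def results_to_frame(payload):
    """Convert a NOAA-style JSON payload into a DataFrame, tolerating empty responses."""
    # Assumption: a query with no matches yields a payload without a 'results' key.
    results = payload.get("results", [])
    return pd.DataFrame(results)


empty = results_to_frame({})
print(empty.empty)

filled = results_to_frame({"results": [{"id": "GHCND", "name": "Daily Summaries"}]})
print(filled.shape)
```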

#### Data Categories

In [40]:
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/datacategories?limit=100"
response = requests.get(url=url, headers=headers)
noaa_data_categories_json = response.json()
print(noaa_data_categories_json["metadata"])
noaa_data_categories_df = pd.DataFrame(noaa_data_categories_json['results'])
print(noaa_data_categories_df.shape)
noaa_data_categories_df

{'resultset': {'offset': 1, 'count': 42, 'limit': 100}}
(42, 2)


Unnamed: 0,id,name
0,ANNAGR,Annual Agricultural
1,ANNDD,Annual Degree Days
2,ANNPRCP,Annual Precipitation
3,ANNTEMP,Annual Temperature
4,AUAGR,Autumn Agricultural
5,AUDD,Autumn Degree Days
6,AUPRCP,Autumn Precipitation
7,AUTEMP,Autumn Temperature
8,COMP,Computed
9,COMPAGR,Computed Agricultural


#### WIND Datacategory

* We will use the WIND data category for the preliminary phase of our analysis.
* Fetch the metadata for the WIND data category specifically.

In [41]:
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/datacategories/WIND"
response = requests.get(url=url, headers=headers)
noaa_wind_json = response.json()

In [43]:
print(noaa_wind_json)

{'name': 'Wind', 'id': 'WIND'}


#### Datatypes

In [44]:
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/datatypes"
response = requests.get(url=url, headers=headers)
noaa_data_types_json = response.json()
print(noaa_data_types_json["metadata"])
noaa_data_types_df = pd.DataFrame(noaa_data_types_json['results'])
print(noaa_data_types_df.shape)
noaa_data_types_df

{'resultset': {'offset': 1, 'count': 1527, 'limit': 25}}
(25, 5)


Unnamed: 0,datacoverage,id,maxdate,mindate,name
0,1.0,ACMC,1996-05-28,1994-03-19,Average cloudiness midnight to midnight from 3...
1,1.0,ACMH,2005-12-31,1965-01-01,Average cloudiness midnight to midnight from m...
2,1.0,ACSC,1996-05-28,1994-02-01,Average cloudiness sunrise to sunset from 30-s...
3,1.0,ACSH,2005-12-31,1965-01-01,Average cloudiness sunrise to sunset from manu...
4,0.95,ALL,2018-10-02,1991-06-05,Base Data
5,1.0,ANN-CLDD-BASE45,2010-01-01,2010-01-01,Long-term averages of annual cooling degree da...
6,1.0,ANN-CLDD-BASE50,2010-01-01,2010-01-01,Long-term averages of annual cooling degree da...
7,1.0,ANN-CLDD-BASE55,2010-01-01,2010-01-01,Long-term averages of annual cooling degree da...
8,1.0,ANN-CLDD-BASE57,2010-01-01,2010-01-01,Long-term averages of annual cooling degree da...
9,1.0,ANN-CLDD-BASE60,2010-01-01,2010-01-01,Long-term averages of annual cooling degree da...
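The metadata reports 1527 datatypes, but only the first 25 are returned because no `limit` was passed. A minimal sketch of paging through the full list with the `offset` and `limit` parameters follows. `fetch_page` is a hypothetical callable that takes `offset` and `limit` and returns the decoded JSON; `fake_fetch_page` stands in for the real request so the loop can run offline.

```python
def fetch_all_results(fetch_page, page_size=1000):
    """Collect every record by requesting successive pages until 'count' is reached."""
    records = []
    offset = 1  # the metadata shown above reports offsets starting at 1
    while True:
        payload = fetch_page(offset=offset, limit=page_size)
        page = payload.get("results", [])
        records.extend(page)
        total = payload.get("metadata", {}).get("resultset", {}).get("count", 0)
        # Stop when the page is empty or all reported records have been collected.
        if not page or len(records) >= total:
            break
        offset += len(page)
    return records


def fake_fetch_page(offset, limit, total=57):
    """Stand-in for a real request, used here only to demonstrate the loop offline."""
    last = min(offset + limit - 1, total)
    results = [{"id": i} for i in range(offset, last + 1)]
    return {"metadata": {"resultset": {"offset": offset, "count": total, "limit": limit}},
            "results": results}


all_records = fetch_all_results(fake_fetch_page, page_size=25)
print(len(all_records))
```

Against the real API, `fetch_page` could be a small function that calls `requests.get(url, headers=headers, params={"offset": offset, "limit": limit}).json()`.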


#### Datatypes in the WIND Category

In [59]:
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/datatypes?datacategoryid=WIND&limit=56"
response = requests.get(url=url, headers=headers)
noaa_wind_data_types_json = response.json()
print(noaa_wind_data_types_json["metadata"])
noaa_wind_data_types_df = pd.DataFrame(noaa_wind_data_types_json['results'])
print(noaa_wind_data_types_df.shape)
noaa_wind_data_types_df

{'resultset': {'offset': 1, 'count': 27, 'limit': 56}}
(27, 5)


Unnamed: 0,datacoverage,id,maxdate,mindate,name
0,1,AWND,2018-09-27,1982-01-01,Average wind speed
1,1,DAWM,2010-06-21,1935-09-23,Number of days included in the multiday wind m...
2,1,FMTM,2013-03-31,1982-01-01,Time of fastest mile or fastest 1-minute wind
3,1,HLY-WIND-1STDIR,2010-12-31,2010-01-01,Prevailing wind direction (1-8)
4,1,HLY-WIND-1STPCT,2010-12-31,2010-01-01,Prevailing wind percentage
5,1,HLY-WIND-2NDDIR,2010-12-31,2010-01-01,Secondary wind direction (1-8)
6,1,HLY-WIND-2NDPCT,2010-12-31,2010-01-01,Secondary wind percentage
7,1,HLY-WIND-AVGSPD,2010-12-31,2010-01-01,Average wind speed
8,1,HLY-WIND-PCTCLM,2010-12-31,2010-01-01,Percentage calm
9,1,HLY-WIND-VCTDIR,2010-12-31,2010-01-01,Mean wind vector direction


#### Location Categories

In [60]:
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/locationcategories"
response = requests.get(url=url, headers=headers)
noaa_location_categories_json = response.json()
print(noaa_location_categories_json["metadata"])
noaa_location_categories_df = pd.DataFrame(noaa_location_categories_json['results'])
print(noaa_location_categories_df.shape)
noaa_location_categories_df

{'resultset': {'offset': 1, 'count': 12, 'limit': 25}}
(12, 2)


Unnamed: 0,id,name
0,CITY,City
1,CLIM_DIV,Climate Division
2,CLIM_REG,Climate Region
3,CNTRY,Country
4,CNTY,County
5,HYD_ACC,Hydrologic Accounting Unit
6,HYD_CAT,Hydrologic Cataloging Unit
7,HYD_REG,Hydrologic Region
8,HYD_SUB,Hydrologic Subregion
9,ST,State


#### Location Category: County Level Information

In [62]:
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/locationcategories/CNTY"
response = requests.get(url=url, headers=headers)
noaa_cnty_json = response.json()

In [63]:
print(noaa_cnty_json)

{'name': 'County', 'id': 'CNTY'}


#### Available Locations for Daily Summaries Data

In [64]:
url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/locations?datasetid=GHCND"
response = requests.get(url=url, headers=headers)
noaa_daily_summaries_locations_json = response.json()
print(noaa_daily_summaries_locations_json["metadata"])
noaa_daily_summaries_locations_df = pd.DataFrame(noaa_daily_summaries_locations_json['results'])
print(noaa_daily_summaries_locations_df.shape)
noaa_daily_summaries_locations_df

{'resultset': {'offset': 1, 'count': 28304, 'limit': 25}}
(25, 5)


Unnamed: 0,datacoverage,id,maxdate,mindate,name
0,0.9977,CITY:AE000001,2018-09-27,1983-01-02,"Abu Dhabi, AE"
1,0.9992,CITY:AE000002,2018-09-27,1944-03-20,"Ajman, AE"
2,0.9992,CITY:AE000003,2018-09-27,1944-03-20,"Dubai, AE"
3,0.9992,CITY:AE000006,2018-09-27,1944-03-20,"Sharjah, AE"
4,0.5542,CITY:AF000007,2018-09-27,1966-03-02,"Kabul, AF"
5,0.3774,CITY:AF000008,2018-09-27,1973-01-02,"Kandahar, AF"
6,1.0,CITY:AG000001,2018-09-27,1877-04-07,"Algiers, AG"
7,0.9252,CITY:AG000002,2018-09-27,1909-11-23,"Annaba, AG"
8,0.8654,CITY:AG000003,2018-09-27,1973-04-03,"Batna, AG"
9,1.0,CITY:AG000004,2018-09-27,1957-01-09,"Bechar, AG"


#### Getting Stations for General Electric Project

In [88]:
import numpy as np

In [90]:
def get_weather_stations(lat_center, long_center, square_diagonal, top_n=5):
    """Return NOAA stations inside a square centered on (lat_center, long_center).

    square_diagonal is the diagonal of the square in decimal degrees.
    Returns None when fewer than top_n stations are found.
    """
    from pyproj import Proj
    from shapely.geometry import shape

    base_url = "https://www.ncdc.noaa.gov/cdo-web/api/v2/stations?limit=1000&extent="
    # Half the side length of the square, in decimal degrees.
    epsilon = round(square_diagonal / np.sqrt(2) / 2, 4)
    lat_min, lat_max = lat_center - epsilon, lat_center + epsilon
    long_min, long_max = long_center - epsilon, long_center + epsilon
    # Extent order: minimum latitude, minimum longitude, maximum latitude, maximum longitude.
    extent = ",".join(str(round(v, 4)) for v in (lat_min, long_min, lat_max, long_max))
    response = requests.get(url=base_url + extent, headers=headers)
    all_stations = response.json()
    # Use .get so a response without results yields an empty DataFrame instead of a KeyError.
    all_stations_results = pd.DataFrame(all_stations.get('results', []))
    print('Summary of Request: ')
    print(all_stations.get('metadata'))
    print('- . - . - . -')
    n_stations = all_stations_results.shape[0]
    # Corners of the queried square as (longitude, latitude) pairs.
    corners = [(long_max, lat_max), (long_max, lat_min), (long_min, lat_min), (long_min, lat_max)]
    lon, lat = zip(*corners)
    # Project the corners with an Albers equal-area projection so the area is in square meters.
    pa = Proj("+proj=aea +lat_1=37.0 +lat_2=41.0 +lat_0=39.0 +lon_0=-106.55")
    x_m, y_m = pa(lon, lat)
    # list(...) is needed because zip returns an iterator in Python 3.
    cop = {"type": "Polygon", "coordinates": [list(zip(x_m, y_m))]}
    final_area = shape(cop).area
    print('Square Meters Area Queried: ')
    print(final_area)
    print('- . - . - . -')
    if n_stations < top_n:
        print('Pass Bigger Area Range')
        return None
    print('Gathered Sufficient Stations')
    return all_stations_results
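To make the geometry in `get_weather_stations` concrete: `square_diagonal` is given in decimal degrees, so the half side length of the square is `square_diagonal / sqrt(2) / 2`. The sketch below isolates that step and builds the `extent` string in the same order the function uses (minimum latitude, minimum longitude, maximum latitude, maximum longitude). `build_extent` is a hypothetical helper, not part of the notebook.

```python
import math


def build_extent(lat_center, long_center, square_diagonal):
    """Return the extent string for a square centered on the given point."""
    # Half the side length of the square, in decimal degrees.
    epsilon = round(square_diagonal / math.sqrt(2) / 2, 4)
    bounds = (
        round(lat_center - epsilon, 4),
        round(long_center - epsilon, 4),
        round(lat_center + epsilon, 4),
        round(long_center + epsilon, 4),
    )
    return ",".join(str(value) for value in bounds)


extent = build_extent(47.5204, -122.2047, square_diagonal=0.2)
print(extent)
```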

In [91]:
lat_center

47.5204

In [92]:
long_center

-122.2047

In [95]:
#print(lat_center)
#print(long_center)
get_weather_stations(lat_center, long_center, square_diagonal = 0.2, top_n = 5)

Summary of Request: 
{'resultset': {'offset': 1, 'count': 14, 'limit': 25}}
- . - . - . -
Square Meters Area Queried: 
167419847.03502843
- . - . - . -
Gathered Sufficient Stations


Unnamed: 0,datacoverage,elevation,elevationUnit,id,latitude,longitude,maxdate,mindate,name
0,1.0,26.5,METERS,GHCND:US1WAKG0005,47.5859,-122.2509,2018-10-01,2008-06-01,"MERCER ISLAND 1.5 NW, WA US"
1,1.0,115.5,METERS,GHCND:US1WAKG0010,47.4814,-122.1641,2013-03-21,2008-06-01,"RENTON 1.5 E, WA US"
2,0.8732,199.6,METERS,GHCND:US1WAKG0016,47.5503,-122.1503,2016-09-18,2008-06-01,"EASTGATE 1.7 SSW, WA US"
3,0.9996,240.8,METERS,GHCND:US1WAKG0024,47.5604,-122.151,2017-09-03,2010-05-01,"EASTGATE 1.1 SW, WA US"
4,1.0,104.2,METERS,GHCND:US1WAKG0042,47.5211,-122.1613,2018-10-01,2008-06-01,"NEWPORT HILLS 1.9 SSE, WA US"
5,0.9998,199.9,METERS,GHCND:US1WAKG0049,47.5465,-122.1435,2009-11-22,2008-06-01,"NEWPORT HILLS 1.4 E, WA US"
6,0.9995,60.4,METERS,GHCND:US1WAKG0074,47.5081,-122.2413,2014-01-06,2008-07-01,"BRYN MAWR SKYWAY 0.9 N, WA US"
7,0.9892,61.6,METERS,GHCND:US1WAKG0077,47.5755,-122.2134,2018-09-25,2008-08-01,"MERCER ISLAND 0.9 ENE, WA US"
8,1.0,64.6,METERS,GHCND:US1WAKG0081,47.4752,-122.2019,2018-10-01,2008-09-01,"RENTON 0.5 SSW, WA US"
9,0.9737,85.6,METERS,GHCND:US1WAKG0136,47.5461,-122.2685,2018-09-24,2010-01-01,"SEATTLE 5.1 SE, WA US"
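As a rough cross-check of the area printed above, the square can be approximated without a map projection by treating one degree of latitude as a fixed distance and scaling longitude by the cosine of the latitude. This is a spherical approximation with an assumed Earth radius of 6,371 km, not the Albers projection the function uses, so the two values are only expected to agree approximately. `approx_square_area_m2` is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius


def approx_square_area_m2(lat_center, square_diagonal):
    """Approximate area in square meters of a lat/long square given its diagonal in degrees."""
    # Full side length in degrees, using the same rounding as get_weather_stations.
    side_deg = 2 * round(square_diagonal / math.sqrt(2) / 2, 4)
    meters_per_degree = math.pi / 180 * EARTH_RADIUS_M
    height_m = side_deg * meters_per_degree
    # Longitude degrees shrink with the cosine of the latitude.
    width_m = height_m * math.cos(math.radians(lat_center))
    return height_m * width_m


area = approx_square_area_m2(47.5204, square_diagonal=0.2)
print(round(area))
```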


#### Query Stations for GE Projects

* This block computes the center latitude and longitude of each General Electric project as the mean of its turbine coordinates.

In [5]:
usgs_data = pd.read_csv('./uswtdbCSV/uswtdb_v1_1_20180710.csv')
usgs_ge_data = usgs_data[usgs_data["t_manu"] == "GE Wind"]
ge_projects_df = pd.pivot_table(usgs_ge_data, values=["xlong", "ylat"], columns="p_name", aggfunc="mean").transpose()
ge_projects_df.reset_index(inplace=True)
ge_projects_df.columns = ['p_name', 'center_long', 'center_lat']
ge_projects_df

Unnamed: 0,p_name,center_long,center_lat
0,6th Space Warning Squadron,-70.543552,41.753341
1,AFCEE MMR Turbines,-70.546550,41.758590
2,AG Land 1,-93.325691,42.206390
3,AG Land 2,-93.428093,42.146091
4,AG Land 3,-93.431992,42.145592
5,AG Land 4,-93.354897,41.904194
6,AG Land 5,-93.632095,42.335491
7,AG Land 6,-93.636795,42.335491
8,ARPA (Prower and Baca County) (Lamar),-102.650894,37.362293
9,Alta I,-118.371622,35.037530


In [29]:
usgs_data = pd.read_csv('./uswtdbCSV/uswtdb_v1_1_20180710.csv')
usgs_ge_data = usgs_data[usgs_data["t_manu"] == "GE Wind"]
ge_by_project_df = usgs_ge_data.groupby("p_name")[["xlong", "ylat"]].agg(["min", "max", "mean"])
ge_by_project_df.reset_index(inplace=True)
ge_by_project_df.columns = ["p_name", "long_min", "long_max", "long_mean", "lat_min", "lat_max", "lat_mean"]
ge_by_project_df["long_range"] = ge_by_project_df["long_max"] - ge_by_project_df["long_min"]
ge_by_project_df["lat_range"] = ge_by_project_df["lat_max"] - ge_by_project_df["lat_min"]
ge_by_project_df.sort_values(by="long_range", axis=0, ascending=False, inplace=True)
ge_by_project_df

Unnamed: 0,p_name,long_min,long_max,long_mean,lat_min,lat_max,lat_mean,long_range,lat_range
186,Klondike (Wasco),-149.432159,-120.545242,-122.249387,45.552494,60.126122,46.421268,28.886917,14.573628
202,Leaning Juniper (Arlington),-120.262535,-99.765419,-119.909113,37.887753,45.673595,45.538273,20.497116,7.785842
373,Thunder Ranch,-97.495476,-97.091446,-97.306297,36.527344,36.632317,36.572365,0.404030,0.104973
287,Peetz Table,-103.529190,-103.137192,-103.322228,40.896992,40.998993,40.961955,0.391998,0.102001
164,Horse Hollow,-100.327034,-99.958275,-100.173965,32.180569,32.305607,32.253115,0.368759,0.125038
39,Brady Wind I,-102.953804,-102.605217,-102.756389,46.632294,46.719784,46.656348,0.348587,0.087490
126,Flat Ridge 2,-98.425491,-98.078690,-98.252666,37.323296,37.410694,37.371157,0.346801,0.087398
303,Prairie Breeze,-98.236794,-97.929291,-98.110132,41.880394,41.981194,41.932171,0.307503,0.100800
280,Panhandle Wind 1,-101.452370,-101.151588,-101.293173,35.377693,35.456196,35.430909,0.300782,0.078503
323,Rush Springs Wind Energy Center,-97.948318,-97.670288,-97.804808,34.655949,34.779400,34.700151,0.278030,0.123451
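The ranges above indicate how large a query square each project would need. As a sketch, the diagonal passed to `get_weather_stations` could be derived from the larger of the two ranges. The `padding` of 0.1 degrees and the column name `square_diagonal` are assumptions for illustration, and the two rows with ranges of many degrees would likely need to be inspected before being used this way.

```python
import numpy as np
import pandas as pd

# Two rows copied from the table above, for illustration only.
ranges_df = pd.DataFrame({
    "p_name": ["Thunder Ranch", "Peetz Table"],
    "long_range": [0.404030, 0.391998],
    "lat_range": [0.104973, 0.102001],
})

padding = 0.1  # assumed margin in degrees around each project
# Side of a square that covers the project's bounding box plus the margin.
side = ranges_df[["long_range", "lat_range"]].max(axis=1) + padding
# get_weather_stations expects the diagonal of the square, so scale the side by sqrt(2).
ranges_df["square_diagonal"] = side * np.sqrt(2)
print(ranges_df)
```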
