# Getting archived weather data for US stations

This script defines a function that downloads daily weather station data from the Global Historical Climatology Network - Daily (GHCND).
The data server that gives us convenient access to the daily data is part of the Applied Climate Information System (ACIS). ACIS is developed, maintained, and operated by the NOAA Regional Climate Centers (RCCs): [http://data.rcc-acis.org](http://data.rcc-acis.org)

Here we demonstrate how to send a data request to the server.
The server responds to our request and sends the data back in a format that we need to 'unpack' before we can put the data into numpy arrays. Fortunately, all that has been solved already, and code exists that we can simply reuse here.

In the code below we have selected the station Albany Airport (KALB). Its GHCN station identifier is

__'USW00014735'__

We can get, for example:
- avgt: daily average temperature (F)
- mint: daily minimum temperature (F)
- maxt: daily maximum temperature (F)

Try it yourself: open the following link, which has the data query string appended, in a browser window:

[http://data.rcc-acis.org/StnData?sid=USW00014735&&sdate=2018-01-1&&edate=2018-12-31&interval=dly&elems=avgt](http://data.rcc-acis.org/StnData?sid=USW00014735&&sdate=2018-01-1&&edate=2018-12-31&interval=dly&elems=avgt)

This will return daily mean temperatures from Albany Airport for the year 2018.
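
The query string in the link above can also be assembled programmatically. The following is a minimal sketch using `urllib.parse.urlencode` from the Python standard library. The helper name `build_query_url` is hypothetical and is not part of the notebook code, which concatenates the string by hand.

```python
from urllib.parse import urlencode

# Base address of the ACIS station-data service (taken from the link above).
HOST = "http://data.rcc-acis.org/StnData"

def build_query_url(sid, elem, sdate, edate):
    """Return the request URL for one station and one element (hypothetical helper)."""
    params = {
        "sid": sid,          # station identifier
        "sdate": sdate,      # start date, YYYY-MM-DD
        "edate": edate,      # end date, YYYY-MM-DD
        "interval": "dly",   # daily data
        "elems": elem,       # variable name, e.g. 'avgt'
    }
    # urlencode joins the key=value pairs with single '&' characters
    return HOST + "?" + urlencode(params)

url = build_query_url("USW00014735", "avgt", "2018-01-01", "2018-12-31")
print(url)
```

Building the parameters as a dictionary keeps each part of the request visible and avoids mistakes with the `&` separators.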


## Code development

__You may wonder: "How can I do such a thing with Python?"__

__The answer lies in the package urllib3! This package allows us to send and receive data via the HTTP protocol directly, without a web browser!__

In addition, the json package helps us deal with the JSON text format that the server uses to return the data. JSON is increasingly used for data exchange over HTTP.
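
To illustrate the unpacking step without contacting the server, here is a small sketch that parses a hand-written JSON string. The payload is made up for illustration: its layout (a `meta` entry plus a `data` list of `[date, value]` pairs, with `'M'` marking a missing value) mirrors what the function in this notebook expects, but the numbers are not real observations.

```python
import json
import datetime as dt

# Hand-written sample in the layout the notebook function expects.
# The values are invented for illustration, not real station data.
sample = '{"meta": {}, "data": [["2018-01-01", "25.5"], ["2018-01-02", "M"]]}'

content = json.loads(sample)   # JSON text -> Python dictionary
dates, values = [], []
for date_str, raw in content["data"]:
    # convert the date string into a datetime object
    dates.append(dt.datetime.strptime(date_str, "%Y-%m-%d"))
    # 'M' marks a missing observation; store it as NaN
    values.append(float("nan") if raw == "M" else float(raw))

print(dates)
print(values)
```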



In [1]:
# request a station time series
# from Applied Climate Information System
# http://www.rcc-acis.org/index.html
# Author: OET
# code designed for ATM315/ENV315 Python introduction


In [None]:
import numpy as np
import urllib3
import json
import datetime as dt
#########################################################################################################
# Define a function so that we can make more than just one specific data request.
# It allows flexibility in terms of station, variable, start and end year.
# It first creates the http-string and uses the urllib3 functions to transmit the request to the server.
# Then it receives the data in the JSON text format and converts it into a nested Python object
# (a dictionary that contains lists).
# That's where we had to put some work in to extract the dates and data values and put them into lists.
# Two lists are returned: one with the dates (NEW object type 'datetime'!) and one with the numerical values.
#########################################################################################################
def get_stationdata(sid,var='avgt',startyear=2017,endyear=2017):
    """Sends request to regional climate center ACIS and gets daily data for one station.
    Input parameters: 
        sid (string): a station id
        var (string): a variable name (e.g. 'avgt', 'mint', 'maxt')
    Optional parameters:
        startyear and endyear (integers): for selecting the year range e.g. 1950 and 2017
    
    Returned objects:
        list with dates (datetime objects)
        list with the data 
    """    
    # the http address of the data server
    host="http://data.rcc-acis.org/StnData"
    # forming the query string for the host server
    # each parameter is written without a leading '&'; the separators are added once below
    sdate='sdate='+str(startyear)+'-01-01'
    edate='edate='+str(endyear)+'-12-31'
    query='?sid='+sid+'&'+sdate+'&'+edate+'&interval=dly&'\
    +'elems='+var
    # try to connect and to get the requested data
    # in format ready to export to a csv file
    print (">send data request to "+host+query)
    print ("> still waiting for response ...")
    try:
        http= urllib3.PoolManager()
        response = http.request('GET',host+query)
        # convert json-string into dictionary
        content =  json.loads(response.data.decode('utf-8'))
        meta=content['meta']
        data=content['data']
        time=[]
        value=[]
        for item in data:
            #print (item)
            time.append(dt.datetime.strptime(item[0],"%Y-%m-%d"))
            if (item[1]!='M'):
                value.append(float(item[1]))
            else:
                # 'M' marks a missing value; np.nan is the spelling supported by all NumPy versions
                value.append(np.nan)
    except Exception as e:
        print ("error occurred:", e)
        return
    print(">... done")
    return time,value

In [None]:
# help() prints the docstring itself; wrapping it in print() would also print 'None'
help(get_stationdata)

In [None]:
x,avgt=get_stationdata("USW00014735",'avgt',startyear=1950,endyear=2018)
x,mint=get_stationdata("USW00014735",'mint',startyear=1950,endyear=2018)
x,maxt=get_stationdata("USW00014735",'maxt',startyear=1950,endyear=2018)
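
Because missing observations are stored as NaN, it can help to check how many there are before exporting the data. The following is a minimal sketch, assuming a list of floats like the ones returned by the calls above. The helpers `count_missing` and `mean_valid` and the short demo list are hypothetical and not part of the original notebook.

```python
import math

def count_missing(values):
    """Count the NaN entries in a list of floats (hypothetical helper)."""
    return sum(1 for v in values if math.isnan(v))

def mean_valid(values):
    """Mean of the non-NaN entries; returns NaN if no entry is valid."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")

# Invented demo values, not station data
demo = [30.0, float("nan"), 34.0]
print(count_missing(demo), mean_valid(demo))
```

The same two calls could be applied to `avgt`, `mint`, or `maxt` once the download has finished.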

In [None]:
# write the data into a spreadsheet table (CSV columns: year, month, day, avgt, mint, maxt)
import pandas as pd
yr=[i.year for i in x]
month=[i.month for i in x]
day=[i.day for i in x]
export_data=np.zeros(shape=[len(avgt),6])
# assign the lists directly: int() cannot convert a whole list,
# and numpy stores the values as floats in the array anyway
export_data[:,0]=yr
export_data[:,1]=month
export_data[:,2]=day
export_data[:,3]=avgt
export_data[:,4]=mint
export_data[:,5]=maxt
np.savetxt("USW00014735_temp_1950-2018_daily.csv",export_data,delimiter=',')
df=pd.DataFrame(export_data, columns=['year','month','day','avgt','mint','maxt'])


In [None]:
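# this overwrites the file written by np.savetxt, now with a header row;
# index=False leaves out the extra row-number column
df.to_csv("USW00014735_temp_1950-2018_daily.csv", index=False)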

### Further References:
- [GHCND](https://www.ncdc.noaa.gov/ghcn-daily-description)
- FTP site with station ids etc: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/