# Example calculation of 95% confidence intervals
## Monthly mean 30yr average temperature climatology

The statistical and programming objectives are:

- Calculate the long-term mean of the monthly average temperatures for the two periods 1951-1980 and 1989-2018, for each month (January, February, March, ..., November, December).

- For each monthly climatological mean, determine the 95% confidence interval using the equations described in the reading assignment.

- Summarize the results in one or two figures that allow us to observe changes in the mean climatologies.

### Please refer to the textbooks for the calculation steps involved in estimating the 95% confidence interval for the population mean!
- Collaborative Statistics textbook Chapter 8.3 (p337)
- Practical Statistics Book Chapter 2 Box 2.2 pages 18-19.
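For quick reference, the interval described in these readings for a population mean with unknown population standard deviation can be written as follows (the notation is chosen here; check it against the textbook symbols). With sample mean $\bar{x}$, sample standard deviation $s$ and sample size $n$:

$$
\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \;<\; \mu \;<\; \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}, \qquad \alpha = 0.05,\quad df = n - 1
$$

Here $t_{\alpha/2}$ is the critical value of the t-distribution with $n-1$ degrees of freedom that leaves an area of $\alpha/2$ in the upper tail. For a 30-year monthly climatology, $n = 30$, so $df = 29$ and $t_{0.025} \approx 2.045$.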




## 1. Code development

### 1.1 Import packages and function definitions

The supporting function below sends a request to the server and retrieves the data. It is similar to the earlier version in which we downloaded GHCN daily temperature data from the ACIS server, but here it requests monthly means.
The function returns two lists, which we assign to the 'standard variables' x and y: x contains the dates (type datetime), and y the temperatures (type float).

We also import all necessary packages here. For the confidence interval calculation we use the SciPy module stats.


```python
# request a station time series
# from Applied Climate Information System
# http://www.rcc-acis.org/index.html
# Author: OET
# code designed for ATM315/ENV315 Python introduction

import numpy as np
import matplotlib.pyplot as plt
import urllib3
import json
import datetime as dt
# for confidence interval calculation
from scipy import stats


#########################################################################################################
# function to get the monthly data from the server
#########################################################################################################
def get_stationdata_monthly(sid, var='avgt', startyear=2017, endyear=2017):
    """Sends a request to the regional climate center ACIS and gets monthly data for one station.
    Input parameters:
        sid (string): a station id
        var (string): a variable name (e.g. 'avgt', 'mint', 'maxt')
    Keyword parameters:
        startyear and endyear (integers): for selecting the year range, e.g. 1950 and 2017

    Returned objects:
        list with dates (datetime objects)
        list with the data (floats; missing values 'M' are replaced by NaN)
        If the request fails, two empty lists are returned.
    """
    # the http address of the data server
    host = "http://data.rcc-acis.org/StnData"
    # forming the query string for the host server
    sdate = '&sdate=' + str(startyear) + '-01-1'
    edate = '&edate=' + str(endyear) + '-12-31'
    query = '?sid=' + sid + '&' + sdate + '&' + edate + '&interval=mly&' \
        + 'elems=' + "mly_mean_" + var
    # try to connect and to get the requested data
    print(">send data request to " + host + query)
    print("station id:", sid)
    print("year range: %4d - %4d" % (startyear, endyear))
    print("> still waiting for response ...")
    time = []
    value = []
    try:
        http = urllib3.PoolManager()
        response = http.request('GET', host + query)
        # convert the JSON string into a dictionary
        content = json.loads(response.data.decode('utf-8'))
        meta = content['meta']   # station metadata
        data = content['data']   # list of [year-month, value] pairs
        for item in data:
            # dates arrive as 'YYYY-MM' strings
            time.append(dt.datetime.strptime(item[0], "%Y-%m"))
            # 'M' flags a missing value
            if item[1] != 'M':
                value.append(float(item[1]))
            else:
                value.append(np.nan)   # np.NAN no longer exists in NumPy 2.x
    except Exception as e:
        print("error occurred:", e)
        # return empty lists so that "x, y = get_stationdata_monthly(...)" still works
        return [], []
    print(">... done")
    return time, value
```
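To see what the function does with the server reply without a network connection, here is a small offline sketch. The JSON string is a hypothetical, hand-made stand-in for an ACIS response: only the `meta`/`data` layout and the 'M' flag for missing values are taken from the function above, the values are invented, and `parse_monthly` is an illustrative helper name, not part of the course code.

```python
import json
import datetime as dt
import numpy as np

# Hypothetical reply in the layout the function expects (values are made up):
# {"meta": {...}, "data": [["YYYY-MM", "value"], ...]}; 'M' marks a missing month
sample_reply = ('{"meta": {"name": "EXAMPLE STATION"}, '
                '"data": [["1951-01", "21.5"], ["1951-02", "M"], ["1951-03", "33.0"]]}')

def parse_monthly(reply_text):
    """Hypothetical helper: same parsing steps as get_stationdata_monthly."""
    content = json.loads(reply_text)
    time, value = [], []
    for date_str, val in content['data']:
        time.append(dt.datetime.strptime(date_str, "%Y-%m"))  # day defaults to 1
        value.append(float(val) if val != 'M' else np.nan)      # 'M' -> NaN
    return time, value

x_demo, y_demo = parse_monthly(sample_reply)
print(x_demo[0], y_demo)
```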

### 1.2 Main section: Downloading a single station data set (monthly mean data)

```python
station_id = "USW00014735"
varname = "avgt"
x, y = get_stationdata_monthly(station_id, varname, startyear=1900, endyear=2018)
```

```
>send data request to http://data.rcc-acis.org/StnData?sid=USW00014735&&sdate=1900-01-1&&edate=2018-12-31&interval=mly&elems=mly_mean_avgt
station id: USW00014735
year range: 1900 - 2018
> still waiting for response ...
>... done
```


### 1.3 Preparing the data for statistical calculations

<P style="background-color:purple;color:gold;font-size:130%">
<BR>
Task 1: Convert variable x from a list into a NumPy array with the function np.array
<BR>
<BR>
</P>

Do the same with variable y.
Plot the time series to check that the data are okay, i.e., that x is a NumPy array of dates (datetime objects) and y an array of floats.
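A minimal sketch of the conversion, using short made-up lists in place of the downloaded x and y (the names `x_list`, `y_list`, `x_arr`, `y_arr` are only for this illustration). An array built from datetime objects has dtype `object`, which matplotlib can still plot as dates.

```python
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt

# made-up stand-ins for the lists returned by get_stationdata_monthly
x_list = [dt.datetime(1900, 1, 1), dt.datetime(1900, 2, 1), dt.datetime(1900, 3, 1)]
y_list = [20.1, 22.4, np.nan]

x_arr = np.array(x_list)   # dtype=object, elements are datetime objects
y_arr = np.array(y_list)   # dtype=float64, NaN marks a missing month

print(x_arr.dtype, y_arr.dtype)   # object float64
print(type(x_arr[0]))             # <class 'datetime.datetime'>

# quick visual check of the series (NaN values show up as gaps)
fig, ax = plt.subplots()
ax.plot(x_arr, y_arr, marker='o')
ax.set_xlabel('date')
ax.set_ylabel('monthly mean temperature')
```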
 

```python
# your code
```

<P style="background-color:purple;color:gold;font-size:130%">
<BR>
Task 2: What are the first year and month in the data?
<BR>
<BR>
</P>

```python
# your code
```

<P style="background-color:purple;color:gold;font-size:130%">
<BR>
Task 3: Select the data from x and y for the two 30-year climate periods 1951-1980 and 1989-2018
<BR>
<BR>
</P>
    
Assign the results to new variables and check the dimensions and size of the arrays with np.shape. Each array should contain 360 values (30 years × 12 months).

Tip: 
- Apply the np.logical_and function to combine two Boolean comparisons on the array x.
- For the comparisons, use the *dt.datetime* function to define reference dates, as shown in the code below:


```python
xtest = np.array([dt.datetime(1981, 7, 17), dt.datetime(1990, 11, 30), dt.datetime(2011, 12, 6)])
testdate = dt.datetime(1990, 12, 31)
after_date = xtest > testdate   # Boolean array: True where the date is later than testdate
print(xtest[after_date])
```
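Building on the test example, here is a sketch of the full selection with np.logical_and. To run on its own it builds a synthetic monthly series for 1900-2018 with made-up temperatures (`x_syn`, `y_syn` and the helper `select_period` are hypothetical names chosen so the downloaded x and y are not overwritten); with the real data you would pass your own arrays.

```python
import datetime as dt
import numpy as np

# synthetic monthly series 1900-01 ... 2018-12 (stand-in for the downloaded data)
x_syn = np.array([dt.datetime(yr, mo, 1) for yr in range(1900, 2019) for mo in range(1, 13)])
rng = np.random.default_rng(0)
y_syn = 50 + 20 * np.sin(2 * np.pi * (np.arange(x_syn.size) % 12) / 12) + rng.normal(0, 2, x_syn.size)

def select_period(x, y, first_year, last_year):
    """Hypothetical helper: keep values from January of first_year to December of last_year."""
    start = dt.datetime(first_year, 1, 1)
    end = dt.datetime(last_year, 12, 31)
    inside = np.logical_and(x >= start, x <= end)   # combine two Boolean comparisons
    return x[inside], y[inside]

x1, y1 = select_period(x_syn, y_syn, 1951, 1980)
x2, y2 = select_period(x_syn, y_syn, 1989, 2018)
print(np.shape(y1), np.shape(y2))   # (360,) (360,)
```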

```python
# your code for 30 year subsampled data arrays
```

<P style="background-color:purple;color:gold;font-size:130%">
<BR>
Task 4: Reshape the arrays, or find another way to process the data separated by month
<BR>
<BR>
</P>
    
Tip: 

- We have used np.reshape in class before to turn a 1-dim array into a 2-dim array.

- Remember the 'filling rule': np.reshape fills the new array row by row, so a series that starts in January and contains only complete years ends up with one year per row and one month per column.

- Once the temperature data are arranged by month (one month per column), you can apply np.mean and np.std with the keyword parameter axis (see help(np.mean) for details). The axis specifies the dimension along which the function operates. For example, for a 2-dim array with 30 rows and 12 columns, *axis=0* takes the rows as samples and returns 12 averages, one per column. See the example below.

- Alternatively, you can work with a loop over the months, building lists with the append method, but this may involve more programming.
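Both routes can be compared on a made-up 30-year series (`dates_demo`, `y30_demo`, `table` are hypothetical names). The reshape route relies on the series starting in January and containing complete years; the Boolean-mask route selects each month by its datetime month and does not depend on the ordering. On this synthetic series both give the same monthly means.

```python
import datetime as dt
import numpy as np

# made-up 30-year monthly series starting in January 1951
dates_demo = np.array([dt.datetime(yr, mo, 1) for yr in range(1951, 1981) for mo in range(1, 13)])
y30_demo = np.arange(dates_demo.size, dtype=float)   # any numbers will do here

# Route 1: reshape; rows are filled first, so each row is one year and
# each column one month (column 0 = January, ..., column 11 = December)
table = np.reshape(y30_demo, (30, 12))
means_reshape = np.mean(table, axis=0)   # 12 values, one per month

# Route 2: one Boolean mask per month, using the month of each datetime object
months = np.array([d.month for d in dates_demo])
means_mask = np.array([np.mean(y30_demo[months == m]) for m in range(1, 13)])

print(np.allclose(means_reshape, means_mask))   # True
```

If the real data contain missing months (NaN), np.nanmean and np.nanstd skip them, but then the sample size n is no longer 30 for every month.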


```python
# example for np.mean() with keyword parameter axis
# first extract the month from each datetime object and form a 30x12 data matrix
m = [d.month for d in x[0:360]]
m = np.array(m)
m = np.reshape(m, (30, 12))   # rows = years, columns = months
# if x starts in January, each column mean equals its month number
print(np.mean(m, axis=0))
```


```python
# your code
```

<P style="background-color:purple;color:gold;font-size:130%">
<BR>
Task 5: Calculate the mean and standard deviation for each month
<BR>
<BR>
</P>
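One detail matters for the confidence interval: np.std divides by n by default (ddof=0), while the t-based interval uses the sample standard deviation s, which divides by n-1 (ddof=1). A short sketch on a made-up 30×12 matrix (`table_demo` is a hypothetical name):

```python
import numpy as np

rng = np.random.default_rng(1)
table_demo = rng.normal(loc=50.0, scale=5.0, size=(30, 12))   # 30 years x 12 months, made up

ymean = np.mean(table_demo, axis=0)          # 12 monthly means
ystd = np.std(table_demo, axis=0, ddof=1)    # 12 sample standard deviations (divide by n-1)
n = table_demo.shape[0]                      # number of years per month, here 30

# the ddof=0 and ddof=1 versions differ by the factor sqrt(n / (n - 1))
ystd0 = np.std(table_demo, axis=0)
print(np.allclose(ystd, ystd0 * np.sqrt(n / (n - 1))))   # True
```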
    


```python
# your code
```

<P style="background-color:purple;color:gold;font-size:130%">
<BR>
Task 6: Finally, compute the 95% confidence intervals using the supporting functions for the t-distribution in the SciPy module stats
<BR>
<BR>
</P>

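```python
# example code
n = 10               # sample size (use n = 30 for the 30-year climatologies)
df = n - 1           # degrees of freedom
confidence = 0.95    # confidence level (1 - alpha)
# stats.t.interval returns a tuple (-t, +t); convert it to an array
tint = np.array(stats.t.interval(confidence, df))
ystd = np.array([6, 5, 4, 3, 2, 1, 1, 2, 3, 4, 5, 6])   # example standard deviations
cf = np.zeros(shape=[12, 2])
for i in range(12):
    # lower and upper offsets from the monthly mean: -t*s/sqrt(n) and +t*s/sqrt(n)
    cf[i, :] = tint * ystd[i] / np.sqrt(n)
    print(i, cf[i, 0], cf[i, 1])
```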
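Because the t-distribution is symmetric, stats.t.interval(0.95, df) returns (-t, +t) with t = stats.t.ppf(0.975, df). The sketch below uses made-up monthly means and standard deviations (`ymean_demo`, `ystd_demo` are hypothetical names) and turns the half-widths into lower and upper limits around the means for n = 30 years.

```python
import numpy as np
from scipy import stats

n = 30                 # years per monthly sample
df = n - 1
confidence = 0.95

# made-up monthly means and sample standard deviations (12 values each)
ymean_demo = np.array([25., 27., 36., 47., 58., 67., 72., 70., 62., 51., 40., 29.])
ystd_demo = np.array([6., 5., 4., 3., 2., 1., 1., 2., 3., 4., 5., 6.])

t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)   # upper critical value, about 2.045 for df = 29
lo_t, hi_t = stats.t.interval(confidence, df)        # (-t_crit, +t_crit)

half_width = t_crit * ystd_demo / np.sqrt(n)         # margin of error for each month
ci_lower = ymean_demo - half_width
ci_upper = ymean_demo + half_width

for month in range(12):
    print(month + 1, round(ci_lower[month], 2), round(ci_upper[month], 2))
```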

## 2. Results

<P style="background-color:purple;color:gold;font-size:130%">
<BR>
Task 7: Present the climatologies in a graph or two that make it easy to see the differences between the two periods, and how much the confidence intervals overlap.
<BR>
<BR>
</P>
    
- Tip: Check the function plt.errorbar
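A minimal plt.errorbar sketch with made-up numbers (all array names are hypothetical placeholders for your own results). The two periods are shifted slightly left and right along the x-axis so that their error bars do not hide each other.

```python
import numpy as np
import matplotlib.pyplot as plt

months = np.arange(1, 13)
# made-up monthly means and 95% CI half-widths for the two periods
mean_1951_1980 = np.array([25., 27., 36., 47., 58., 67., 72., 70., 62., 51., 40., 29.])
mean_1989_2018 = mean_1951_1980 + 1.5
halfwidth_1951_1980 = np.full(12, 1.8)
halfwidth_1989_2018 = np.full(12, 1.6)

fig, ax = plt.subplots(figsize=(8, 4))
ax.errorbar(months - 0.1, mean_1951_1980, yerr=halfwidth_1951_1980,
            fmt='o', capsize=3, label='1951-1980')
ax.errorbar(months + 0.1, mean_1989_2018, yerr=halfwidth_1989_2018,
            fmt='s', capsize=3, label='1989-2018')
ax.set_xticks(months)
ax.set_xticklabels(['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D'])
ax.set_xlabel('month')
ax.set_ylabel('mean temperature')
ax.set_title('Monthly climatology with 95% confidence intervals')
ax.legend()
```

A second panel showing the difference between the two periods can make small changes easier to see than the full annual cycle.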

```python
# your code
```


<P style="background-color:purple;color:gold;font-size:130%">
<BR>
Task 8: Optional: Summary of the quantitative results
<BR>
<BR>
</P>
    
Write all the results in the form of a data table as shown below (illustrated with some random numbers).
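One way to produce such a table with plain Python string formatting is sketched below; the arrays are made-up placeholders and `print_ci_table` is a hypothetical helper name.

```python
import numpy as np

def print_ci_table(mean, lower, upper, label):
    """Hypothetical helper: one row per month with the mean and the 95% confidence limits."""
    month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                   'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    lines = [label, f"{'month':>5} {'mean':>8} {'lower':>8} {'upper':>8}"]
    for name, m, lo, hi in zip(month_names, mean, lower, upper):
        lines.append(f"{name:>5} {m:8.2f} {lo:8.2f} {hi:8.2f}")
    text = "\n".join(lines)
    print(text)
    return text

mean_demo = np.linspace(25, 72, 12)   # made-up values
table_text = print_ci_table(mean_demo, mean_demo - 1.8, mean_demo + 1.8, '1951-1980')
```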


```python
# optional code
```

## 3. Summary and conclusions


### 3.1 Changes in the mean temperatures

### 3.2 Significance of the changes based on the analysis of the confidence intervals
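When comparing the intervals, keep the direction of the logic in mind: if the two 95% intervals do not overlap, the difference between the two means is significant at the 5% level (a conservative check); if they do overlap, the difference may still be significant, so overlap alone is not evidence of "no change". A sketch of the overlap check with made-up limits (`ci_overlap` is a hypothetical helper):

```python
import numpy as np

def ci_overlap(lower_a, upper_a, lower_b, upper_b):
    """Hypothetical helper: elementwise True where [lower_a, upper_a] overlaps [lower_b, upper_b]."""
    return np.logical_and(lower_a <= upper_b, lower_b <= upper_a)

# made-up limits for three months
lower_1, upper_1 = np.array([23.0, 30.0, 60.0]), np.array([27.0, 34.0, 64.0])
lower_2, upper_2 = np.array([28.0, 33.0, 59.0]), np.array([32.0, 37.0, 63.0])

overlap = ci_overlap(lower_1, upper_1, lower_2, upper_2)
print(overlap)   # [False  True  True]
```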

### Further References:
- Scipy stats package: 
    - support for [normal distribution](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.norm.html#scipy.stats.norm)
    - support for [t distribution](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.t.html) (including confidence intervals)
- Matplotlib.pyplot 
    - [plt.errorbar examples](https://matplotlib.org/1.2.1/examples/pylab_examples/errorbar_demo.html) 
- [GHCND](https://www.ncdc.noaa.gov/ghcn-daily-description)
- FTP site with station ids etc: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/