## 3 Application of quantitative descriptive statistics

### 3.1 Creating a 30-yr climatology (temperature climate graph)

We continue our work with the data file that contains daily temperature data observed at Albany Airport (KALB):

*USW00014735_temp_1950-2021_daily.csv*. 

Make sure you know where the file is located in your folder system. You may have to update the local path string in the code cell that reads the data (see variable _local_path_).

 


In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# Import the new package Pandas
import pandas as pd


# Tip: You can change the style of the plots by choosing from
# the matplotlib styles (e.g. 'ggplot' or 'classic').
# plt.style.available lists all styles installed on your system.
from matplotlib import style
style.use('ggplot')  # alternative: style.use('classic')


### 3.2 Reading the data and creating a dataframe object

Many Python programmers use the variable name _df_ for a dataframe object. We apply the Pandas function _read_csv()_ to read the data table from the CSV file and assign the result to the variable _df_.

In [None]:
local_path='../data/'
filename=local_path+'USW00014735_temp_1950-2021_daily.csv'
df=pd.read_csv(filename,delimiter=',',skiprows=0)

In [None]:
df
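
Before querying the dataframe, it helps to check what _read_csv()_ produced: the column names you will use in query strings and the data type of each column. The sketch below is a hypothetical stand-in for the KALB file: it reads a tiny in-memory CSV (via `io.StringIO`) with assumed column names (year, month, day, TMAX, TMIN) and stores it in `df_demo` so that your real `df` is not overwritten. The real file's columns may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for the KALB file; column names and values are assumptions
csv_text = """year,month,day,TMAX,TMIN
1981,1.0,1,-2.2,-12.8
1981,1.0,2,0.6,-9.4
1981,2.0,1,3.3,-6.1
"""
df_demo = pd.read_csv(io.StringIO(csv_text), delimiter=',', skiprows=0)

print(df_demo.shape)    # (number of rows, number of columns)
print(df_demo.columns)  # column names to use inside query() strings
print(df_demo.dtypes)   # data type of each column, e.g. float64 for month
print(df_demo.head())   # first (up to) five rows
```

With your real data, `df.columns` and `df.dtypes` tell you which names and value types to use in the queries that follow.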

<p style="color:gold;background-color:purple;font-size:130%">
    <BR>Task 3.3: Standard statistics for quantitative data samples<BR><BR>

</p>

#### 3.3.1 Select (subsample the data): climate norm period 1981-2010
        
In this exercise we use the dataframe _df_ and its _query()_ method to select the data for the period 1981-2010.
###### (Note: NOAA updated their climate norm period to 1991-2020. Both periods are okay to use here in this exercise.)

- Start working with the dataframe in variable _df_ and apply the query method to select the years 1981-2010 (_df.query()_) 
        
- Assign the result to a new variable (e.g. 'dfclim')

In [None]:
# get all the data that fall into the 30-year climate norm period
dfclim=df.query("year >=1981 and year <= 2010")
dfclim
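
The same subsample can also be selected with a boolean mask instead of a query string. The sketch below uses a small hypothetical dataframe (`df_demo`, with an assumed column `TMAX`) to show that `Series.between()`, which includes both end points by default, gives the same rows as the query above.

```python
import pandas as pd

# Hypothetical stand-in for df: a 'year' column and an assumed 'TMAX' column
df_demo = pd.DataFrame({"year": [1979, 1981, 1995, 2010, 2015],
                        "TMAX": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Selection with a query string (as in the cell above)
clim_query = df_demo.query("year >= 1981 and year <= 2010")

# The same selection with a boolean mask; between() includes both end points by default
clim_mask = df_demo[df_demo["year"].between(1981, 2010)]

print(clim_query.equals(clim_mask))  # True
print(clim_query["year"].tolist())   # [1981, 1995, 2010]
```

Both forms keep the original row index, which is why `equals()` reports the two results as identical.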

#### 3.3.2 Select (subsample the data): month January
        
Remember that the month is stored as a numerical value, so we have to use the value 1.0 to search for the rows that belong to a day in January.

- Start working with the dataframe in variable _dfclim_ and apply the query method to select the month (_dfclim.query()_)
        
- Assign the result to a new variable (e.g. '_dfjan_' or '_dfhelp_')
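
A minimal sketch of the month selection, assuming the month column is named `month` and holds float values (1.0 for January). It works on a hypothetical stand-in `dfclim_demo` so that your real variables are not overwritten; `TMAX` is an assumed column name. The second query shows that `query()` can refer to a Python variable with the `@` prefix.

```python
import pandas as pd

# Hypothetical stand-in for dfclim; 'month' stored as float, 'TMAX' is an assumed name
dfclim_demo = pd.DataFrame({"year": [1981, 1981, 1982, 1982],
                            "month": [1.0, 2.0, 1.0, 12.0],
                            "TMAX": [-1.5, 2.0, -3.0, 1.0]})

# Select all rows that belong to January
dfjan_demo = dfclim_demo.query("month == 1.0")

# Alternative: refer to a Python variable inside the query string with @
target_month = 1.0
dfjan_demo2 = dfclim_demo.query("month == @target_month")

print(dfjan_demo)
print(dfjan_demo.equals(dfjan_demo2))  # True
```

Using a variable such as `target_month` makes it easy to repeat the same selection for every month later on.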

#### 3.3.3 Statistical calculations

We have extracted (subsampled) the data to the year range 1981-2010, and selected a single month (January). Now, calculate the following statistical values:

- mean
- median
- minimum
- maximum
- standard deviation
- variance


(See the reading assignment for more information on the definitions of these statistical properties.)
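
For quick reference, here are the standard definitions for a sample $x_1, \dots, x_n$: the mean $\bar{x}$, the sample variance $s^2$ (divided by $n-1$) with standard deviation $s$, and the variant $s_n^2$ that divides by $n$. The distinction matters in practice: `np.std()` and `np.var()` divide by $n$ by default (`ddof=0`), while the Pandas methods `.std()` and `.var()` divide by $n-1$ by default (`ddof=1`).

$$
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2, \qquad s = \sqrt{s^2} \\
s_n^2 &= \frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\end{aligned}
$$

Worked example with the illustrative values $x = (-2, 0, 1, 5)$, $n = 4$: $\bar{x} = 4/4 = 1$; the squared deviations are $9, 1, 0, 16$ with sum $26$, so $s^2 = 26/3 \approx 8.67$, $s \approx 2.94$ and $s_n^2 = 26/4 = 6.5$. The median of the sorted values $(-2, 0, 1, 5)$ is the average of the two middle values, $(0 + 1)/2 = 0.5$.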

There are several ways to do this. One option is to go back to the functions that the NumPy package provides. The advantage of the NumPy functions is that they work on any numerical array, so you can later apply the methods learned here to all sorts of data.

The second option is to apply Pandas methods directly to the dataframe.


#### (A) The NumPy method

We can convert the data columns into NumPy arrays (e.g. with the Pandas method _to_numpy()_).
Once we have done that, we can apply the following NumPy functions:

- _np.mean()_ 
- _np.std()_
- _np.min()_
- _np.max()_
        
Note that these functions return nan when your data arrays contain np.nan values. Use the related functions _np.nanmean()_, _np.nanstd()_, _np.nanmin()_ and _np.nanmax()_ instead, which ignore np.nan values.
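
A minimal sketch of the NumPy route, using a hypothetical January subset `dfjan_demo` with an assumed column name `TMAX` and one missing value. It converts the column with `to_numpy()` and contrasts `np.mean()` with the NaN-aware functions. The non-missing values are the same as in the worked example of the definitions (-2, 0, 1, 5).

```python
import numpy as np
import pandas as pd

# Hypothetical January subset; 'TMAX' is an assumed column name, one value is missing
dfjan_demo = pd.DataFrame({"TMAX": [-2.0, 0.0, np.nan, 1.0, 5.0]})

# Convert the column to a NumPy array
tmax = dfjan_demo["TMAX"].to_numpy()

print(np.mean(tmax))              # nan, because the array contains a NaN
print(np.nanmean(tmax))           # 1.0, NaN values are ignored
print(np.nanmedian(tmax))         # 0.5
print(np.nanmin(tmax), np.nanmax(tmax))
print(np.nanstd(tmax))            # NumPy default ddof=0: divides by n
print(np.nanvar(tmax, ddof=1))    # ddof=1: sample variance, divides by n-1
```

Passing `ddof=1` makes the NumPy results match the Pandas defaults; leave it out if you want the divide-by-n version.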



#### (B) Direct calculation using the Pandas dataframe method
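
A minimal sketch of the Pandas route on the same hypothetical data (`dfjan_demo`, assumed column `TMAX`). The Series methods skip NaN values by default (`skipna=True`), and `.std()` / `.var()` divide by n-1 by default (`ddof=1`). `describe()` collects several statistics in one call.

```python
import numpy as np
import pandas as pd

# Hypothetical January subset; 'TMAX' is an assumed column name, one value is missing
dfjan_demo = pd.DataFrame({"TMAX": [-2.0, 0.0, np.nan, 1.0, 5.0]})
tmax_series = dfjan_demo["TMAX"]

# NaN values are skipped by default (skipna=True)
print(tmax_series.mean())    # 1.0
print(tmax_series.median())  # 0.5
print(tmax_series.min(), tmax_series.max())

# ddof=1 by default: divides by n-1, unlike np.std() / np.var()
print(tmax_series.std())
print(tmax_series.var())

# count, mean, std, min, quartiles and max in one call
print(tmax_series.describe())
```

`describe()` reports `count` as the number of non-missing values, which is a quick way to check how many days actually entered the statistics.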


#### Calculate the other statistical parameters and assign them to variables. Use print function calls to summarize the results.
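
One possible way to keep the results together is a dictionary with one entry per statistic, followed by a formatted summary. This is a sketch on hypothetical data (`dfjan_demo`, assumed column `TMAX`); the dictionary name `jan_stats` is illustrative only.

```python
import pandas as pd

# Hypothetical January values; 'TMAX' is an assumed column name
dfjan_demo = pd.DataFrame({"TMAX": [-2.0, 0.0, 1.0, 5.0]})
tmax_series = dfjan_demo["TMAX"]

# Collect the statistics in a dictionary (one entry per statistic)
jan_stats = {
    "mean": tmax_series.mean(),
    "median": tmax_series.median(),
    "minimum": tmax_series.min(),
    "maximum": tmax_series.max(),
    "standard deviation": tmax_series.std(),  # ddof=1
    "variance": tmax_series.var(),            # ddof=1
}

# Print a compact summary, right-aligned names and two decimals per value
print("January TMAX statistics (demo data)")
for name, value in jan_stats.items():
    print(f"{name:>18}: {value:7.2f}")
```

Individual variables (e.g. `tmax_mean = tmax_series.mean()`) work just as well; the dictionary simply makes the print loop shorter.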




### 3.4 Creating a climate graph for the climate norm period
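
A sketch of one way to build the climate graph: group the daily data by calendar month, average over all years, and plot the twelve monthly means. Because the real columns are not shown here, the sketch generates synthetic daily data (`daily_demo`) with assumed column names `year`, `month`, `TMAX` and `TMIN`; with the real data you would start from `dfclim` instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic daily data for 1981-1982 (hypothetical; in the exercise use dfclim).
# Column names 'year', 'month', 'TMAX', 'TMIN' are assumptions.
rng = np.random.default_rng(seed=0)
dates = pd.date_range("1981-01-01", "1982-12-31", freq="D")
doy = dates.dayofyear.to_numpy()
seasonal = 10.0 * np.sin(2.0 * np.pi * (doy - 110) / 365.25)  # maximum near day 201 (July)
daily_demo = pd.DataFrame({
    "year": dates.year.to_numpy(),
    "month": dates.month.to_numpy().astype(float),
    "TMAX": 14.0 + seasonal + rng.normal(0.0, 3.0, doy.size),
    "TMIN": 3.0 + seasonal + rng.normal(0.0, 3.0, doy.size),
})

# Climatology: average all days of the same calendar month over all years
monthly = daily_demo.groupby("month")[["TMAX", "TMIN"]].mean()

# Climate graph: mean daily maximum and minimum for each month
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(monthly.index, monthly["TMAX"], marker="o", label="mean daily maximum")
ax.plot(monthly.index, monthly["TMIN"], marker="o", label="mean daily minimum")
ax.set_xticks(range(1, 13))
ax.set_xlabel("month")
ax.set_ylabel("temperature")
ax.set_title("Monthly temperature climatology (synthetic demo data)")
ax.legend()
plt.show()
```

`groupby("month")` replaces twelve separate month queries; the same statistics from section 3.3.3 (e.g. `.median()` or `.std()`) can be applied to the grouped data in the same way.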

---
### Optional code solution (for advanced Python coders):

---
### References:

John Townend “[Practical Statistics for Environmental and Biological Scientists](https://www.wiley.com/en-us/Practical+Statistics+for+Environmental+and+Biological+Scientists-p-9780471496656)” (hereafter “PracStat”)

- Chapter 1: 1.1, 1.2, 1.5
- Chapter 2: 2.1-2.4
- Chapter 5: 5.1-5.7

E-book: [Collaborative Statistics](https://open.umn.edu/opentextbooks/textbooks/11) by Barbara Illowsky and Susan Dean

- Chapter 1: 1.1, 1.3, 1.4, 1.6
- Chapter 2: 2.1-2.9

---
### Appendix:
    

In [None]:
# Example climate graph for Albany, NY (external image, requires internet access)
from IPython.display import Image
Image(url="http://www.albany-ny.climatemps.com/albany-ny-climate-graph.gif", width=600)

---
