# Class activity: Application of descriptive statistics using Python

Download the data file named *USW00014735_temp_1950-2020_daily.csv* from the
[GitHub repository](https://github.com/oet808/ATMENV315/tree/master/data). Upload it to your local data directory on the Jupyter Hub. Make sure the file name is correct and ends with '.csv'.

This text file is in CSV format and can be loaded with the numpy function _np.loadtxt()_.
It contains temperature data from Albany Airport: daily average, minimum, and maximum temperatures from 1950 to 2020. Once loaded, the data are stored in a two-dimensional numpy array (one row per day, one column per variable).
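To see what `np.loadtxt()` does with `delimiter=','` and `skiprows=1`, here is a minimal sketch on a made-up two-day sample read from a string instead of the real file. The header names and the placeholder first column `col0` are assumptions for illustration only; the notebook does not describe the content of the file's first column.

```python
import io
import numpy as np

# Hypothetical two-day sample with a header row. "col0" is a placeholder:
# the real file's first column is not described in this notebook.
csv_text = """col0,year,month,day,avgt,mint,maxt
0,1950,1,1,30.5,25.0,36.0
0,1950,1,2,28.0,20.0,36.0
"""

# skiprows=1 skips the header line; delimiter=',' splits the fields.
# np.loadtxt accepts a file-like object as well as a file name.
dataset = np.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)

print(dataset.shape)   # one row per day, one column per field
year = dataset[:, 1]   # all rows, column index 1
avgt = dataset[:, 4]   # all rows, column index 4
print(year, avgt)
```

The slicing `dataset[:, k]` is the same pattern the cell below uses to pull out each variable.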

In [None]:
# insert code to import the required packages numpy and pyplot and assign the names np and plt to the packages.

import numpy as np
import matplotlib.pyplot as plt

# Optional: change the look of the plots by choosing one of the built-in
# matplotlib styles (plt.style.available lists them), e.g. 'classic'
import matplotlib as mpl
mpl.style.use('ggplot')



local_path='../data/'
filename=local_path+'USW00014735_temp_1950-2020_daily.csv'

dataset=np.loadtxt(filename,delimiter=',',skiprows=1)

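# Extract the columns: all rows, one column each.
# Column 2 (Python index 1) holds the year; the first column is not used here.
year  = dataset[:,1]
month = dataset[:,2]
day   = dataset[:,3]
avgt  = dataset[:,4] # daily average temperature
mint  = dataset[:,5] # daily minimum temperature
maxt  = dataset[:,6] # daily maximum temperature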


In [None]:
print("Some information on the type, length and shape of the data arrays:")
print(type(year))
print(len(year))
print(year.shape)

<p style="color:gold;background-color:purple;font-size:130%">
    <BR>Task 1: Working with Albany Airport daily temperature data<BR><BR>
</p>

### 1.1 First, check that the daily average temperatures fall within the expected range by plotting the full time series
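You can plot the time series against the row index, but a calendar x-axis is easier to read. The sketch below builds numpy `datetime64` dates from the year, month and day columns; the short arrays are made-up stand-ins for the ones loaded above. Recent matplotlib versions accept such date arrays directly as x values, e.g. `plt.plot(dates, avgt)`.

```python
import numpy as np

# Hypothetical stand-ins for the year, month and day columns loaded above
year = np.array([1950., 1950., 1950.])
month = np.array([2., 2., 3.])
day = np.array([27., 28., 1.])

# Build one "YYYY-MM-DD" string per row and convert the strings
# to numpy dates with a resolution of one day
dates = np.array(
    [f"{int(y):04d}-{int(m):02d}-{int(d):02d}" for y, m, d in zip(year, month, day)],
    dtype="datetime64[D]",
)
print(dates)
```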
### 1.2 Create a histogram of the daily Albany temperature data using all March 21 dates from 1950 to 2020

<p style="color:black;background-color:lightgreen;font-size:130%">
    <BR>
    Tip: You have to subsample the temperature data first!
    <BR>
    <BR>
</p>

Create a new empty list, then loop over all rows of the data set. If the month value is 3 (March) and the day value is 21, append the temperature value to your list; otherwise, skip to the next iteration.

Then convert the list into a numpy array (see the tip below) and plot the histogram.
The function is called _plt.hist()_.

NOTE: Students who are comfortable with numpy Boolean operations are encouraged to use numpy functions (e.g. _np.logical_and()_) to select the data from the arrays.
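Both selection strategies described above can be sketched side by side. The arrays below are a made-up six-day stand-in for `month`, `day` and `avgt`. The last step uses `np.histogram()`, which returns bin counts and bin edges without drawing anything; `plt.hist(y1, bins=...)` bins the data the same way and draws the bars.

```python
import numpy as np

# Hypothetical stand-ins for the month, day and avgt arrays loaded above
month = np.array([3, 3, 3, 4, 3, 3])
day = np.array([20, 21, 22, 21, 21, 21])
avgt = np.array([38.0, 41.0, 39.5, 50.0, 35.0, 44.0])

# Option 1: loop over all rows and append matching values to a list
ylist = []
for i in range(len(avgt)):
    if month[i] == 3 and day[i] == 21:
        ylist.append(avgt[i])
y1 = np.array(ylist)

# Option 2: Boolean mask that is True where both conditions hold
mask = np.logical_and(month == 3, day == 21)
y1_mask = avgt[mask]

# Bin counts and edges, the numbers a histogram plot is drawn from
counts, edges = np.histogram(y1, bins=3)
print(y1, y1_mask)
print(counts, edges)
```

The mask version avoids the explicit loop and is much faster on the full 70-year record.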

In [None]:
# remember: year, month, day and avgt are numpy arrays with one entry per day
i=0
index=[]
ylist=[] # use to store the temperatures

# fill in your code 

# converting a list to numpy array to use in the following code cells:
# y1=np.array(ylist)

In [None]:
# histogram plot


### 1.3 For the same data, create a box-and-whisker plot
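To understand what the box and the whiskers represent, the sketch below computes them by hand for a made-up sample, assuming matplotlib's default whisker rule (`whis=1.5`): the box spans the first to third quartile, and each whisker reaches the most extreme data point still within 1.5 times the interquartile range (IQR) of the box. Points beyond the whiskers are drawn individually as outliers.

```python
import numpy as np

# Hypothetical sample; replace with your March 21 array y1
y1 = np.array([30., 32., 35., 36., 38., 40., 41., 44., 70.])

# Quartiles: the box spans q1 to q3, the line inside it is the median
q1, median, q3 = np.percentile(y1, [25, 50, 75])
iqr = q3 - q1

# Whiskers: most extreme data points within 1.5*IQR of the box
lower_whisker = y1[y1 >= q1 - 1.5 * iqr].min()
upper_whisker = y1[y1 <= q3 + 1.5 * iqr].max()

# Values beyond the whiskers are plotted as individual outliers
outliers = y1[(y1 < lower_whisker) | (y1 > upper_whisker)]
print(q1, median, q3, iqr)
print(lower_whisker, upper_whisker, outliers)
```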


In [1]:
# boxplot

In [None]:
# You can also create horizontal box-and-whisker plots.
# With a little more effort you can place several box plots
# next to each other: the trick is to pass a list of two or more
# numpy arrays.

# For more options (e.g. how to fill the IQR box with color)
# see https://matplotlib.org/gallery/statistics/boxplot_color.html



# Make sure your data array is stored in variable y1 for this example!
y2=1.5*y1-5   # an artificial second sample derived from y1, for illustration
plt.boxplot([y1,y2],vert=False)
plt.yticks([1,2],["sample 1","sample 2"])
plt.title("Boxplot example: compare two (or more) data sets")
plt.show()

<p style="color:gold;background-color:purple;font-size:130%">
    <BR>Task 2: Quantitative statistics with numpy<BR><BR>

</p>

### 2.1 Copy your code into the cell below and modify it: select all daily data from the March months of 1950-2020 and apply the following numpy functions:


- _np.mean()_ 
- _np.std()_
- _np.min()_
- _np.max()_
        
Note that these functions return nan if your data array contains any np.nan values.
In that case use the NaN-aware versions _np.nanmean()_, _np.nanstd()_, _np.nanmin()_ and _np.nanmax()_ instead, which ignore the missing values.
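A small made-up array with one missing value shows the difference between the plain and the NaN-aware functions. It also counts how many valid values actually entered the statistics, which is worth reporting alongside them.

```python
import numpy as np

# Small example array with one missing value (np.nan)
y = np.array([41.0, np.nan, 35.0, 44.0])

print(np.mean(y))      # a single nan propagates: the result is nan
print(np.nanmean(y))   # mean of the three valid values only
print(np.nanstd(y), np.nanmin(y), np.nanmax(y))

# Number of valid (non-nan) values used by the nan-functions
n_valid = np.count_nonzero(~np.isnan(y))
print(n_valid)
```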

<p style="color:black;background-color:lightgreen;font-size:130%">
    <BR>
    Tip: When doing calculations with numpy functions, convert your list into a numpy array first.
    <BR>
    <BR>
</p>
        
Assign the array to a new variable, e.g. if *ylist* is your list, use
     
     y=np.array(ylist)

Keep track of the results and create a summary table in your notebook.
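One way to fill in the print statements of the next cell is with f-strings and a fixed number of decimals. The sketch below uses a made-up four-value sample in place of your March array `y`, and also builds a Markdown table row you could paste into a markdown cell. Note that `np.std()` uses `ddof=0` (population standard deviation) by default; pass `ddof=1` for the sample standard deviation.

```python
import numpy as np

# Hypothetical March sample; replace with your own array y
y = np.array([28.0, 35.5, 41.0, 30.5])
unit = "F"

# One decimal place, right-aligned in a 6-character field
print("Summary statistics for the temperature sample:")
print(50 * "-")
print(f"mean    {np.mean(y):6.1f} {unit}")
print(f"stddev  {np.std(y):6.1f} {unit}")
print(f"min     {np.min(y):6.1f} {unit}")
print(f"max     {np.max(y):6.1f} {unit}")
print(50 * "-")

# The same numbers as one Markdown table row
row = f"| Mar | {np.mean(y):.1f} | {np.std(y):.1f} | {np.min(y):.1f} | {np.max(y):.1f} |"
print(row)
```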



In [2]:
i=0
index=[] 
ylist=[] # use to store the temperatures

unit="F"

# Adjust the print functions to print the numerical results!
print("Summary statistics for the temperature sample:")
print(50*"-")
print("mean    ")
print("stddev  ")
print("min     ")
print("max     ")
print(50*"-")


Summary statistics for the temperature sample:
--------------------------------------------------
mean    
stddev  
min     
max     
--------------------------------------------------


### Optional: You can use markdown cells to create tables with data.

Table 1: Summary statistics of the daily average temperature for all March days, 1950-2020 (in Fahrenheit)

| Month | Mean (°F) | Std (°F) | Min (°F) | Max (°F) |
|-------|-----------|----------|----------|----------|
| Mar   | xx        | xx       | xx       | xx       |



<p style="color:gold;background-color:purple;font-size:130%">
    <BR>Task 3: Apply your Python skills and create a climatology plot<BR><BR>
</p>

The figure below contains more information than we need. Let's make a cleaner version that shows only temperatures.
        
Find a way to produce a summary graph of the Albany temperature climatology (daily minimum, maximum and mean temperatures) **using the years 1981-2010**.

Tip: Use if-statements to check whether the year value is in the range 1981-2010. Use the month value to branch the code and create a list that contains all daily average temperature values for one calendar month (e.g. January). Then average these temperatures to obtain the monthly mean climatology. Repeat for all 12 months and save the monthly averages in a numpy array.
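As an alternative to the if-statement loop in the tip, the same monthly climatology can be sketched with Boolean masks. The short record below is made up (only January and July, plus one year on each side of the reference period that must be excluded). The result array starts filled with `np.nan` rather than zeros, so a month without data does not show up as 0 °F in the plot.

```python
import numpy as np

# Hypothetical short record: January and July only
year = np.array([1980, 1981, 1981, 1981, 1981, 2010, 2010, 2011])
month = np.array([1, 1, 1, 7, 7, 1, 7, 1])
avgt = np.array([10.0, 20.0, 24.0, 70.0, 74.0, 22.0, 72.0, 99.0])

# Keep only the 1981-2010 reference period
in_period = np.logical_and(year >= 1981, year <= 2010)

monthlist = np.arange(1, 13)
avgtmean = np.full(12, np.nan)   # nan marks months without data
for m in monthlist:
    sel = np.logical_and(in_period, month == m)
    if np.any(sel):
        avgtmean[m - 1] = np.mean(avgt[sel])   # month m goes to index m-1

print(avgtmean)
```

Repeating the loop with `mint` and `maxt` gives the other two climatology curves.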



In [None]:
# Example climate graph for Albany (external image, for comparison only)
from IPython.display import Image
Image(url="http://www.albany-ny.climatemps.com/albany-ny-climate-graph.gif", width=600)

In [None]:
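# A second loop is needed: one pass per calendar month.
# Array to store the results (one value per month)
monthlist=[1,2,3,4,5,6,7,8,9,10,11,12]
avgtmean=np.zeros(12)
for m in monthlist:
    # insert code for the monthly average calculation,
    # e.g. store the result for month m in avgtmean[m-1]
    pass  # placeholder so the cell runs; replace it with your code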


In [None]:
# example code for creating a climatology plot
# 
#plt.plot(monthlist,avgtmean,color='black',label='daily mean')
#plt.title("Albany monthly mean climatology 1981-2010")
#plt.ylabel("temperature in F")
#plt.xlabel("month")
#plt.xticks([1,3,5,7,9,11], ['Jan','Mar','May','Jul','Sep','Nov'])
#plt.legend()
#plt.show()

---


## Optional Tasks:


- Extract the minimum temperature values for all days within ±3 days of March 21 (that is, March 18 through 24).
- Sort these values using _np.sort()_.
- How likely is it to have temperatures higher than 70F around this time of the year? Use only the years 1981-2010 for this exercise.
- Then repeat the procedure using the years 1951-1980. What can you infer from comparing the two periods?

- Look up the functions _np.argmax()_ and _np.argmin()_. Can you print the dates of the lowest and highest recorded minimum and maximum temperatures?
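The optional steps can be sketched as follows on a made-up six-day record standing in for `year`, `month`, `day` and `mint`. The exceedance probability is estimated empirically as the fraction of values above the threshold. The threshold of 30 F is only chosen to suit the made-up numbers; the task itself asks about 70 F. `np.argmin()` returns the index of the smallest value (the first one if there are ties), which is then used to look up the date; `np.argmax(maxt)` works the same way for the highest maximum.

```python
import numpy as np

# Hypothetical stand-in record (replace with the arrays loaded above)
year = np.array([1985, 1985, 1990, 1990, 1995, 2000])
month = np.array([3, 3, 3, 3, 3, 3])
day = np.array([17, 18, 21, 24, 25, 20])
mint = np.array([20.0, 25.0, 31.0, 28.0, 40.0, 33.0])

# March 18-24 (March 21 +/- 3 days) within the years 1981-2010
window = (month == 3) & (day >= 18) & (day <= 24)
period = (year >= 1981) & (year <= 2010)
sample = np.sort(mint[window & period])

# Empirical exceedance probability: fraction of values above the threshold
threshold = 30.0   # illustrative only; the task uses 70 F
p_exceed = np.mean(sample > threshold)

# Index of the lowest minimum temperature, used to look up its date
i_low = np.argmin(mint)
print(sample, p_exceed)
print(f"lowest minimum: {mint[i_low]} on "
      f"{int(year[i_low])}-{int(month[i_low]):02d}-{int(day[i_low]):02d}")
```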

 
    