# Lab 2: Characterizing UBC Weather Station Climate

## *Lab Overview*

In this lab, you will learn several pre-processing and processing steps that can be applied to a climate 
time-series. We will use temperature data from the UBC weather station in order to characterize
the climate of the Vancouver area. You will plot the daily mean, monthly mean, and monthly 
mean anomaly temperature time-series, as well as the seasonal cycle. The seasonal cycle will 
be characterized by monthly means, standard deviations, and minimum and maximum values 
for the daily mean temperature.

You should have the data downloaded and ready to go from the pre-lab.

## *Learning Goals*

After completing this lab successfully, you will be able to:

- Inspect a data set to ensure the data has been imported correctly and identify some expected and unexpected signals
- Generate a monthly mean annual cycle from a larger time series, including the monthly standard deviation, and the minimum and maximum values
- Reduce a daily time series to a monthly one by calculating monthly averages
- Construct an anomaly time series

## *To hand in*

1. One figure containing plots of the daily air temperature, the monthly air temperature, and the monthly air temperature anomaly.
2. A figure showing the monthly mean, the mean plus and minus one standard deviation, and the minimum and maximum of daily temperature.
3. Captions for each of the figures
4. The worksheet for this lab

## 1. Load the data

Look at the time series you downloaded during the pre-lab in a text editor: it should start 
sometime in the late 1950s or early 1960s and have data recorded once per day. If this is not 
the case, you likely downloaded the data incorrectly.

Load the time series using the pandas <code>.read_csv()</code> function. Remember that the dates are 
read into Python as strings and need to be converted to the 'datetime' format. The easiest way to check that your data has loaded correctly is to plot it.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

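#loading data
#the separator regex (whitespace followed by a space) is kept as in the original to match the file's spacing;
#the raw string avoids an invalid-escape warning, and regex separators need the Python parsing engine
#using parse_dates to save the first column as datetime
data = pd.read_csv('DailyAirTempUBC19592018.txt', sep=r'\s+ ', header=None,
                   parse_dates=[0], names=['date','temp'], engine='python')

#checking data: first few rows, then a summary of the whole frame
print(data.head())
print(data)
#checking parsing worked for dates (the date column should be datetime64)
print(data.dtypes)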


#plotting data
plt.plot(data['date'],data['temp'])
plt.xlabel('Date')
plt.ylabel('Temperature (Celsius)')
plt.title('Daily Temperature at UBC from 1959-2018')
plt.show()
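
Beyond plotting, a few quick checks can confirm that the import worked. The sketch below is only an illustration: it builds a small synthetic daily series (the `demo` frame and its values are made up, standing in for the station file) and checks that the dates were parsed, that the record has one value per day, and that nothing is missing.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the station file: one value per day for three years.
# (Hypothetical data; replace `demo` with your own dataframe.)
dates = pd.date_range("1959-01-01", "1961-12-31", freq="D")
doy = dates.dayofyear.to_numpy()
rng = np.random.default_rng(0)
temps = 10 + 7 * np.sin(2 * np.pi * (doy - 110) / 365.25) + rng.normal(0, 2, len(dates))
demo = pd.DataFrame({"date": dates, "temp": temps})

# 1) Were the dates parsed as datetime rather than left as strings?
dates_parsed = pd.api.types.is_datetime64_any_dtype(demo["date"])

# 2) What period does the record cover?
first_day, last_day = demo["date"].min(), demo["date"].max()

# 3) Is there exactly one record per day? Every gap should be one day.
gaps = demo["date"].diff().dropna()
n_irregular = int((gaps != pd.Timedelta(days=1)).sum())

# 4) Are any temperatures missing?
n_missing = int(demo["temp"].isna().sum())

print(f"dates parsed: {dates_parsed}")
print(f"record runs from {first_day.date()} to {last_day.date()}")
print(f"irregular gaps: {n_irregular}, missing values: {n_missing}")
```

With real station data, a nonzero count of irregular gaps or missing values is not necessarily an import error, but it is worth knowing about before averaging.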

## 2. Monthly averages, standard deviations, minimums and maximums.

You will now generate the monthly average of the air temperature for each of the 12 months over the 
entire period. We are asking you to do the following: 
**1) for each of the twelve months of the year, calculate the average of all the temperatures recorded during that month for the entire ~60 years for which you have data.**
**2) for all the records taken during that specific month, calculate the standard deviation of those values, as well as the minimum recorded value and maximum recorded value.**

What you are doing is collapsing roughly sixty years' worth of daily measurements into twelve data points,
one for each month. The standard deviation will give you a measure of the variability around 
each of those data points, and the min and max values will give you an upper and lower bound.

On a single figure, plot the following five curves characterizing the seasonal cycle: the average 
temperature, the average temperature minus one standard deviation, the average temperature plus 
one standard deviation, the minimum temperature, and the maximum temperature. Each curve should 
have 12 data points (one for each month). Include a legend with the function <code>legend()</code> to specify what you are plotting. In the
Lab 2 folder on Connect, you will find an example of what the plot might look like. The example 
figure uses soil temperature measurements at UBC from the years 2000 to 2011. Don’t forget a 
caption for your figure - even though we haven’t included one in the example plot.

When you are done, please hand in your final figure. ***Make sure you spend a few minutes 
thinking about your figure:*** does it look correct? Why? Do the results make sense given that 
the temperature data is recorded in Vancouver?

A few hints that will help you accomplish your task:
1. You will need to use the <code>groupby()</code> function. This is where having our date column saved as a datetime datatype comes in handy. To get you started, here's an example of how to calculate the average by month:

In [None]:
mean_by_month=data.groupby(data.date.dt.month)['temp'].mean()
print(mean_by_month)

For more examples of how to use the groupby functions, visit the pandas documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html

2. Similar to the function above, make use of the <code>.std()</code>, <code>.min()</code>, and <code>.max()</code> functions.

Extrapolate from the example above and plot the rest of the data below.

In [None]:
std_by_month = data.groupby(data.date.dt.month)['temp'].std()
min_by_month = data.groupby(data.date.dt.month)['temp'].min()
max_by_month = data.groupby(data.date.dt.month)['temp'].max()

plt.plot(mean_by_month,'b-s',label="mean")
plt.plot(mean_by_month+std_by_month,'r--',label="mean+std")
plt.plot(mean_by_month-std_by_month,'r--',label="mean-std")
plt.plot(min_by_month,'k:',label="min")
plt.plot(max_by_month,'k:',label="max")
plt.title('UBC Totem Station Mean Monthly Air Temperature 1959-2018')
plt.ylabel(r'Air Temperature ($^\circ$C)')
plt.xlabel('Month of Year')
plt.legend()
plt.show()

Environment Canada publishes normal climate values for Vancouver International Airport 
(VANCOUVER INT’L A) on their website; you can use this data to check if your seasonal 
values look correct.
http://climate.weather.gc.ca/climate_normals/index_e.html
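
The four monthly statistics from this section can also be computed in a single call with the pandas <code>.agg()</code> method. The sketch below is one possible alternative, not the required approach; it runs on a made-up synthetic series (`demo`) so that it is self-contained.

```python
import numpy as np
import pandas as pd

# Synthetic daily temperatures for four years (hypothetical values).
dates = pd.date_range("2000-01-01", "2003-12-31", freq="D")
doy = dates.dayofyear.to_numpy()
rng = np.random.default_rng(1)
temps = 10 + 7 * np.sin(2 * np.pi * (doy - 110) / 365.25) + rng.normal(0, 2, len(dates))
demo = pd.DataFrame({"date": dates, "temp": temps})

# Group by calendar month (1-12) and compute all four statistics at once.
# The result has one row per month and one column per statistic.
seasonal = demo.groupby(demo["date"].dt.month)["temp"].agg(["mean", "std", "min", "max"])
seasonal.index.name = "month"

print(seasonal.round(1))
```

Each column of `seasonal` can then be plotted against its index, for example `seasonal["mean"] + seasonal["std"]` for the upper one-standard-deviation curve.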

## 3. Create a monthly time-series

Daily time series are of course useful if one wants to learn about weather variability, but 
characterizing climate generally involves a look at what happens at longer timescales, typically 
from seasonal timescales to hundreds of millions of years. We will thus calculate the average air 
temperature for each month of each year for which we have daily measurements available. The 
next step will be to calculate monthly temperature anomalies, i.e., calculate the difference 
between this monthly mean time-series and the monthly average, or normal, calculated across 
all years for which we have data in part 2.

To calculate the average temperature for each month of each year and save it in a new dataframe, we can use the <code>groupby()</code> method again with slightly different parameters:

In [None]:
lab2_data = pd.DataFrame(data.groupby(data.date.dt.to_period("M"))['temp'].mean())

#next two lines are essential for dealing with datatype issues and plotting.
lab2_data.index = lab2_data.index.astype(str)
lab2_data.index = pd.to_datetime(lab2_data.index)

print(lab2_data)

Again, the easiest way to check that your data has loaded correctly is to plot it. Can you think of
a way to verify that the plot you see now is correct? What are the main differences between the 
plot of the daily mean and the monthly mean temperature at the UBC weather station? Based on 
these time series, would you say that there is any trend in surface temperature recorded at 
UBC? Try to discuss any feature you find surprising in this plot with your classmates and/or with 
me.

In [None]:
lab2_data.plot()
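
One way to check a monthly series is to recompute a single month by hand and compare. The sketch below assumes a synthetic daily series (`demo`, made-up values) and picks July 2000 arbitrarily; the same comparison can be run on your own dataframe for any month.

```python
import numpy as np
import pandas as pd

# Synthetic daily temperatures for two years (hypothetical values).
dates = pd.date_range("2000-01-01", "2001-12-31", freq="D")
doy = dates.dayofyear.to_numpy()
rng = np.random.default_rng(2)
temps = 10 + 7 * np.sin(2 * np.pi * (doy - 110) / 365.25) + rng.normal(0, 2, len(dates))
demo = pd.DataFrame({"date": dates, "temp": temps})

# Monthly means, one value per month of each year (same grouping as in the lab).
monthly = demo.groupby(demo["date"].dt.to_period("M"))["temp"].mean()

# Hand check: select the daily values of July 2000 directly and average them.
in_july_2000 = (demo["date"] >= "2000-07-01") & (demo["date"] < "2000-08-01")
by_hand = demo.loc[in_july_2000, "temp"].mean()
from_groupby = monthly.loc[pd.Period("2000-07", freq="M")]

print(f"by hand: {by_hand:.3f}, from groupby: {from_groupby:.3f}")
print(f"number of monthly values: {len(monthly)}")
```

The number of monthly values is a second check: it should equal the number of distinct year and month combinations in the daily record.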

## 4. Generate anomaly time series

Temperature variations in the daily mean and monthly mean temperature time series generated 
in parts 1 and 3 are largely dominated by the seasonal cycle, characterized in part 2. Thus, 
these two time-series do not tell much about climate variations. They simply indicate that
temperature variations, in Vancouver, are dominated by the orbit of the Earth, which repeats 
every year in (almost) the same way. Therefore, to examine variability in the climate, we are 
now going to remove the seasonal cycle. A long time series of data which has the annual cycle 
removed is known as an anomaly time series, since it measures how anomalous a given month 
is compared to the average annual cycle.

To remove the mean annual cycle from a monthly time series, you have to first calculate the mean annual cycle -- you did this in part 2 using the groupby functions. Once you have the mean annual cycle, we want to subtract it from the monthly data to find the temperature anomaly by month.

The easiest way to accomplish this might be to create an annual cycle list as long as your monthly mean temperature time-series, in which the 12 month annual cycle repeats as many times as the number of years in the monthly mean time-series.

This can be done using a for loop, but first we need to turn our mean annual cycle time series into a list:

In [None]:
mean_by_month_list = mean_by_month.to_list()

Now that we have our values as a list, we want to lengthen the list to the size of our new dataframe. You will need a nested for loop to do so. Remember, we can add values to a list with the <code>.append()</code> function. Lengthen the list in the cell below:

In [None]:
length_monthly_mean_time_series=len(lab2_data['temp'])
years=length_monthly_mean_time_series//12
#lengthen mean_by_month_list until it is at least as long as lab2_data
for i in range(years):
    for j in range(12):
        mean_by_month_list.append(mean_by_month_list[j])

Once you have this annual cycle list, add it as a column to your lab2_data dataframe. Since this new column and the existing column are the same length, you can simply create a third column of temperature anomalies by subtracting one from the other. Do so in the cell below:

In [None]:
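#add monthly mean to data frame, trimming the repeated cycle to the number of months
#(the cycle lines up with the data only if the monthly series starts in January)
lab2_data["monthly mean"]= mean_by_month_list[:len(lab2_data)]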

#add difference to data frame
lab2_data["diff"]=lab2_data["temp"]-lab2_data["monthly mean"]
print(lab2_data)
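
The repeated-list approach relies on the monthly series starting in January. As a hedged alternative, the sketch below looks up the mean of each row's calendar month with <code>groupby().transform()</code>, so the starting month does not matter. The `monthly` frame is synthetic and deliberately starts in March; its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series that starts in March (hypothetical values).
idx = pd.date_range("2000-03-01", periods=34, freq="MS")
rng = np.random.default_rng(3)
cycle = 10 + 7 * np.sin(2 * np.pi * (idx.month.to_numpy() - 4) / 12)
monthly = pd.DataFrame({"temp": cycle + rng.normal(0, 1, len(idx))}, index=idx)

# Mean annual cycle: one value per calendar month (index 1-12).
climatology = monthly.groupby(monthly.index.month)["temp"].mean()

# transform("mean") returns a series as long as the original data, in which
# every row holds the mean of its own calendar month.
monthly["monthly mean"] = monthly.groupby(monthly.index.month)["temp"].transform("mean")

# Anomaly: each monthly value minus the mean for that calendar month.
monthly["diff"] = monthly["temp"] - monthly["monthly mean"]

print(monthly.head())
```

A useful property to check: the anomalies of any one calendar month average to zero, because the mean that was subtracted was computed from those same values.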


Once again, plot the monthly mean temperature anomaly time-series that you obtained and 
verify that the data looks correct. What are the main differences between the plot of the monthly 
mean temperature and the monthly mean temperature anomaly at the UBC weather station? Based 
on the temperature anomaly time-series, is there any trend in monthly mean temperature 
anomalies recorded at UBC? Try to discuss any feature you find surprising in this plot with your 
classmates and/or with me.

In [None]:
#plot the anomaly column ("diff"), not the repeated annual cycle
plt.plot(lab2_data.index.values,lab2_data["diff"])
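
If you want a number to support the discussion of a trend, one simple option is a straight-line least-squares fit to the anomaly series. The sketch below is only an illustration on synthetic anomalies with a made-up trend (`true_trend_per_year` is an assumed value, not a result for UBC), and a linear fit is just one of several ways to describe a trend.

```python
import numpy as np
import pandas as pd

# Synthetic monthly anomalies: a small imposed trend plus random noise.
idx = pd.date_range("1960-01-01", periods=600, freq="MS")
rng = np.random.default_rng(5)
true_trend_per_year = 0.02  # made-up value in degrees C per year
years_elapsed = np.arange(len(idx)) / 12.0
anomaly = pd.Series(true_trend_per_year * years_elapsed + rng.normal(0, 1, len(idx)),
                    index=idx)

# Fit a first-degree polynomial (a straight line) to anomaly versus time.
# np.polyfit returns the coefficients from highest degree to lowest.
slope_per_year, intercept = np.polyfit(years_elapsed, anomaly.to_numpy(), 1)

print(f"fitted trend: {slope_per_year * 10:.2f} degrees C per decade")
```

Plotting the fitted line `intercept + slope_per_year * years_elapsed` on top of the anomalies shows how small a trend can be relative to the month-to-month scatter.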

## 5. Plot the data using subplots=True
During this lab, you generated 3 different temperature time series: the daily mean temperature, the monthly mean temperature, and the monthly mean temperature anomaly. You will now produce the three-panel figure required for this lab by plotting each of the three time series in a separate subplot of one figure. Use the <code>subplots=True</code> parameter to do this. You should make a proper figure for submission, with appropriate axis labels, a title, easy-to-read font size, etc. You will have to decide whether it is more logical to place the individual subplots side-by-side (horizontally) or one on top of the other (vertically). Remember to include a figure caption. 

Explore the pandas documentation to see how you can style subplots made from a dataframe: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html

Hint: use the name of your dataframe instead of "DataFrame" as used in the link above.

In [None]:

#plotting everything together as subplots
lab2_data.plot(subplots=True)
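
Because the daily series has a different length from the monthly ones, the three series do not fit in a single dataframe without extra work. One possible alternative, sketched below on synthetic data (all values are made up), is to create the panels with matplotlib's <code>plt.subplots()</code> and draw each series on its own axis with a shared time axis.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic daily temperatures for five years (hypothetical values).
dates = pd.date_range("2000-01-01", "2004-12-31", freq="D")
doy = dates.dayofyear.to_numpy()
rng = np.random.default_rng(4)
daily = pd.Series(10 + 7 * np.sin(2 * np.pi * (doy - 110) / 365.25)
                  + rng.normal(0, 2, len(dates)), index=dates)

# Monthly means (labelled by the first day of each month) and their anomalies.
monthly = daily.resample("MS").mean()
anomaly = monthly - monthly.groupby(monthly.index.month).transform("mean")

# Three stacked panels sharing the same time axis.
fig, axes = plt.subplots(3, 1, sharex=True, figsize=(8, 8))
panels = [(daily, "Daily mean"), (monthly, "Monthly mean"), (anomaly, "Monthly mean anomaly")]
for ax, (series, title) in zip(axes, panels):
    ax.plot(series.index, series.to_numpy())
    ax.set_title(title)
    ax.set_ylabel("Temperature (°C)")
axes[-1].set_xlabel("Date")
fig.tight_layout()
# In a script (outside a notebook), call plt.show() here to display the figure.
```

Stacking the panels vertically with a shared x-axis makes it easy to compare the same dates across the three series; a side-by-side layout would instead favour comparing the temperature ranges.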
