# Exploring AWS data with Python and pandas

<img src="http://acinn.uibk.ac.at/sites/all/themes/imgi/images/acinn_logo.png" width="20%"  align="right">

This is a simple [Jupyter notebook](http://jupyter.org/) created to illustrate the enormous potential of notebooks for interactive data exploration and teaching. In this example we explore meteorological data recorded at Zhadang Glacier, on the Tibetan Plateau.

**Author**: [Fabien Maussion](http://fabienmaussion.info/)

**Date**: 17.06.2016



## The automatic weather station (AWS)

<img src="https://dl.dropboxusercontent.com/u/20930277/do_not_delete/aws_tibet.jpg?raw=1" width="50%"  align="right">

The station has been installed on the surface of Zhadang Glacier since 2009.

- Location: 30.47153°N, 90.64534°E
- Altitude: 5665 m a.s.l.
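- Variables: incoming and outgoing shortwave radiation (SWin, SWOut), outgoing longwave radiation (LWOut), net radiation (NetRad), air temperature (Temp), relative humidity (RH), wind speed & direction, surface height changes (SR50 sonic ranger), pressure, precipitation, ice temperature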


Related publications: 

[Maussion et al. (2011)](https://dl.dropboxusercontent.com/u/20930277/do_not_delete/Maussion_etal_2011.pdf), [Mölg et al. (2012)](http://www.the-cryosphere.net/6/1445/2012/tc-6-1445-2012.pdf), [Zhang et al. (2013)](http://www.cryoscience.net/pub/pdf/2013jg_zhang.pdf), [Huintjes et al. (2015)](http://www.bioone.org/doi/abs/10.1657/AAAR0014-073).


## Exploring the data 

We are going to use the [pandas](http://pandas.pydata.org/) library for input/output (IO) and data crunching, and [matplotlib](http://matplotlib.org/) and [seaborn](https://web.stanford.edu/~mwaskom/software/seaborn/) for visualisation:

In [None]:
# import the modules we need
import numpy as np
import pandas as pd  
import matplotlib.pyplot as plt
import seaborn as sns
# some cosmetic defaults
sns.set_style('ticks')
sns.set_context('talk')
pd.options.display.max_rows = 14

In [None]:
# Read the data
df = pd.read_csv('aws_data_zhadang_UTC+6.csv', index_col=0, parse_dates=True)

`df` is a new variable we just created. Its name is short for [DataFrame](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html). A DataFrame is a table, very similar to the data model of an Excel sheet (but much more flexible and powerful). Let's simply print it:

In [None]:
df
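The `read_csv` call above expects a file whose first column holds the timestamps. The exact layout of the Zhadang file is not shown here, so the following is only a sketch with a hypothetical three-column layout and invented numbers (not AWS measurements): it runs the same `read_csv` options on an in-memory string and inspects the result with `head`, `index` and `describe`.

```python
import io

import pandas as pd

# Hypothetical CSV layout: first column = timestamps, other columns = variables.
# The numbers are invented for illustration only.
csv_text = """TIME,TEMP,SWIN,SWOUT
2010-07-01 00:00,-3.1,0.0,0.0
2010-07-01 01:00,-3.4,0.0,0.0
2010-07-01 02:00,-3.6,0.0,0.0
"""

# Same options as in the notebook: first column as index, parsed as dates
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(df.head())       # first rows of the table
print(df.index.dtype)  # a datetime64 dtype: the index holds real timestamps
print(df.describe())   # count, mean, std, min, quartiles and max per column
```

Parsing the index as dates is what makes the time-based selections and resampling used later in the notebook possible.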

The columns of a DataFrame can be extracted and plotted very easily: 

In [None]:
df['TEMP'].plot();
plt.ylabel('°C');
plt.title('2m air temperature');
plt.show();
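Since the index of `df` holds timestamps, rows can also be selected with plain date strings (pandas' partial string indexing). A minimal sketch on a synthetic hourly series with invented values; only the column name `TEMP` is taken from the notebook:

```python
import numpy as np
import pandas as pd

# Synthetic hourly record over 16 days (invented values, not AWS data)
idx = pd.date_range('2010-06-25', periods=16 * 24, freq='60min')
df = pd.DataFrame({'TEMP': np.arange(len(idx), dtype=float)}, index=idx)

# All hours of one week, selected with date strings (both ends inclusive)
week = df.loc['2010-07-01':'2010-07-07', 'TEMP']
print(len(week))  # 168 (7 days x 24 hours)

# A single day can be selected with a partial date string
one_day = df.loc['2010-07-03']
print(one_day.shape)  # (24, 1)
```

With the real data, the same selection can be followed by `.plot()` to zoom into a particular period.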

In [None]:
df[['SWIN', 'SWOUT']].plot();
plt.ylabel('W m$^{-2}$');
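The two cells above use slightly different selections: single brackets return one column as a pandas `Series`, while a list of column names returns a smaller `DataFrame`. A short sketch on a tiny table with invented values:

```python
import pandas as pd

# Tiny table with the notebook's column names (invented values)
df = pd.DataFrame({'TEMP': [-2.0, -1.5],
                   'SWIN': [300.0, 450.0],
                   'SWOUT': [240.0, 360.0]})

temp = df['TEMP']               # one column name  -> Series
fluxes = df[['SWIN', 'SWOUT']]  # list of names    -> DataFrame

print(type(temp).__name__)      # Series
print(type(fluxes).__name__)    # DataFrame
print(fluxes.columns.tolist())  # ['SWIN', 'SWOUT']
```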

It is also possible to make more elaborate plots:

In [None]:
f, ax = plt.subplots(figsize=(5, 5));
df.plot(x='SWIN', y='SWOUT', kind='scatter', ax=ax);
ax.set_xlim([0, 1500]);
ax.set_ylim([0, 1500]);
ax.set_aspect('equal');

Note the clear clusters corresponding to the albedos of fresh snow and of ice. The scatter also has a clearly defined physical limit: SWOUT cannot exceed SWIN, since the albedo (SWOUT/SWIN) is at most 1. The few outliers above this limit occur when snowfall covers the incoming SW sensor, which lowers the measured SWIN.
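Records above the 1:1 line (SWOUT larger than SWIN, i.e. an apparent albedo above 1) can be flagged automatically. A minimal sketch with invented values, assuming the notebook's `SWIN`/`SWOUT` column names; the suspicious records are set to NaN rather than dropped, so the time axis stays intact:

```python
import numpy as np
import pandas as pd

# Invented values; the last row mimics a snow-covered incoming SW sensor
df = pd.DataFrame({
    'SWIN':  [800.0, 600.0, 950.0, 120.0],
    'SWOUT': [640.0, 250.0, 780.0, 400.0],
})

# Apparent albedo; night-time 0/0 gives NaN, which compares as False below
albedo = df['SWOUT'] / df['SWIN']
suspect = albedo > 1
print(suspect.sum())  # 1

# Mask the suspicious records instead of deleting them
df_clean = df.copy()
df_clean.loc[suspect, ['SWIN', 'SWOUT']] = np.nan
```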

With the help of seaborn, making even more sophisticated plots is very easy:

In [None]:
# joint scatter plot of wind direction vs. speed, with marginal histograms
g = sns.jointplot(x='WINDDIR', y='WINDSPEED', data=df, xlim=[0, 360], ylim=[0, 17]);

We detect at least two preferred wind directions: north-westerly (blowing from the north-west towards the south-east) and southerly (blowing from the south towards the north, i.e. down-glacier). The highest wind speeds occur with north-westerly winds.
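The preferred directions can also be quantified by binning `WINDDIR` into compass sectors. A minimal sketch with invented values, assuming directions in degrees following the meteorological convention (the direction the wind blows from); the eight 45° sectors are an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

# Invented wind records: direction in degrees, speed in m/s
df = pd.DataFrame({
    'WINDDIR':   [310.0, 300.0, 180.0, 175.0, 190.0, 45.0, 350.0],
    'WINDSPEED': [9.0, 11.0, 3.0, 4.0, 2.5, 1.0, 6.0],
})

labels = ['N', 'NE', 'E', 'SE', 'S', 'SW', 'W', 'NW']
# Rotate by half a sector so that 'N' covers 337.5 to 22.5 degrees
shifted = (df['WINDDIR'] + 22.5) % 360
df['SECTOR'] = pd.cut(shifted, bins=np.arange(0, 361, 45),
                      labels=labels, right=False)

# Number of records and mean speed per sector (empty sectors are kept)
stats = df.groupby('SECTOR', observed=False)['WINDSPEED'].agg(['count', 'mean'])
print(stats)
```

The shift by half a sector handles the wrap-around at north, so that 350° and 10° end up in the same bin.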

## Data crunching

Pandas excels at selecting, grouping, and analysing data. Let's start with building daily averages of our hourly records:

In [None]:
df_avg = df.resample('D').mean();
# Compute the daily albedo and filter out spurious values
df_avg['ALBEDO'] = df_avg['SWOUT'] / df_avg['SWIN']
df_avg['ALBEDO'] = np.where(df_avg['ALBEDO'] < 0.9, df_avg['ALBEDO'], np.nan)
df_avg[['SWIN', 'SWOUT', 'ALBEDO']].plot(secondary_y='ALBEDO');
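`resample` is not limited to averages. As a sketch on two days of synthetic hourly temperatures (invented values), several daily statistics can be computed in a single call with `agg`:

```python
import numpy as np
import pandas as pd

# Two days of synthetic hourly temperatures (invented values: -5 to 18 each day)
idx = pd.date_range('2010-07-01', periods=48, freq='60min')
temp = pd.Series(np.tile(np.arange(24, dtype=float), 2) - 5.0,
                 index=idx, name='TEMP')

# One row per day, one column per statistic
daily = temp.resample('D').agg(['min', 'max', 'mean'])
print(daily)
```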

Computing monthly averages is just as easy:

In [None]:
df_avg = df.resample('MS').mean();
df_avg['NETRAD'].plot();
plt.title('Net radiation')
plt.ylabel('W m$^{-2}$');
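A monthly mean is only as good as the data behind it. A minimal sketch, assuming hourly records as in the notebook and using synthetic values with an artificial 10-day gap, of counting the valid samples per month to estimate data coverage before interpreting the means:

```python
import numpy as np
import pandas as pd

# Synthetic hourly net radiation for June and July (invented constant value)
idx = pd.date_range('2010-06-01', '2010-07-31 23:00', freq='60min')
netrad = pd.Series(50.0, index=idx, name='NETRAD')
# Artificial data gap: the first 10 days of July are missing
netrad.loc['2010-07-01':'2010-07-10'] = np.nan

# 'MS' labels each month by its first day; count ignores NaNs
monthly = netrad.resample('MS').agg(['mean', 'count'])
# Expected number of hourly samples in each month
monthly['expected'] = monthly.index.days_in_month * 24
monthly['coverage'] = monthly['count'] / monthly['expected']
print(monthly)
```

The coverage threshold below which a monthly mean should be discarded is a judgement call and is not prescribed here.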

And so is the computation of the daily cycle for a specific month of the year:

In [None]:
df_mon = df.loc[df.index.month == 7]
df_cycle = df_mon.groupby(df_mon.index.hour).mean()
df_cycle[['SWIN', 'SWOUT', 'TEMP']].plot(secondary_y='TEMP');
plt.title('Daily cycle of SW fluxes and Temperature in July');
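The same approach extends to all months at once by grouping on two keys, month and hour, and reshaping the result with `unstack`. A sketch on synthetic data with invented values. Note that the hours refer to the time zone of the index, presumably UTC+6 for the notebook's file judging by its name.

```python
import numpy as np
import pandas as pd

# Synthetic hourly temperature for July and August (invented values):
# a simple daily cycle peaking at 14:00, shifted by +2 degrees in August
idx = pd.date_range('2010-07-01', '2010-08-31 23:00', freq='60min')
hours = idx.hour.to_numpy()
months = idx.month.to_numpy()
temp = -np.abs(hours - 14.0) + np.where(months == 8, 2.0, 0.0)
df = pd.DataFrame({'TEMP': temp}, index=idx)

# Mean for each (month, hour) pair, then one column per month
cycle = df.groupby([df.index.month, df.index.hour])['TEMP'].mean().unstack(level=0)
print(cycle.shape)  # (24, 2)
```

With the real data, `cycle.plot()` would then draw one daily cycle per month on the same axes.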

## More examples

You will find many more examples and tutorials in the [repository](https://github.com/fmaussion/teaching/tree/master/ss_2016) of the "Cryosphere and the Climate System" lecture.