# Exercises 03: playing around with station data (corrections)

Here I provide only the technical solutions to the exercises. The analysis will be discussed in class.

### Exercise 1

Unfortunately, the person who prepared the data forgot to provide the *metadata* for this file (this happens more often than you think). The only thing you know is that the station is located at [30.47153N, 90.64534E], on the Zhadang glacier in Tibet.

**Read the data with pandas and "explore" them. How long are the time series? What is their temporal resolution? Try to find out which variable is which. Discuss their units. Once you have identified them all, rename the columns accordingly.**

In [None]:
# imports and defaults
import pandas as pd  
%matplotlib inline 
import matplotlib.pyplot as plt
import numpy as np
pd.options.display.max_rows = 14
import seaborn as sns
sns.set_style('ticks')
sns.set_context('talk')

In [None]:
# read the data
df = pd.read_csv('DataGame.csv', index_col=0, parse_dates=True)

In [None]:
# plot all vars
for c in df.columns:
    # empty figure and plot
    plt.figure()
    df[c].plot(title=c)
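
The plots help to guess the variables, but the length and temporal resolution of the series can also be read directly from the index. Below is a minimal sketch on a synthetic stand-in (the frame `demo` and its column `a` are made up for illustration, they are not the station data); the same calls should work on `df`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the station file: 48 hourly time steps
# (the real resolution has to be read from DataGame.csv itself)
idx = pd.date_range('2011-08-18', periods=48, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({'a': np.arange(48.0)}, index=idx)

# Length of the record and period covered
n_steps = len(demo)
period = demo.index[-1] - demo.index[0]

# Temporal resolution: the most common spacing between two timestamps
step = demo.index.to_series().diff().mode()[0]

# Value ranges (min, max, mean) are a good hint for the units
summary = demo.describe()

print(n_steps, period, step)
```

Looking at the minimum and maximum of each column in `summary` is often enough to tell, for example, a relative humidity (0 to 100) from a wind direction (0 to 360).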

In [None]:
# rename the columns
df.columns = ['PRESSURE', 'ANGLE', 'WINDSPEED', 'NETRAD', 'RH', 'SWIN', 'SR50', 'TEMP', 'SWOUT', 'WINDDIR', 'SURFTEMP']

### Exercise 2

Now that you've got all this sorted out, take the incoming shortwave radiation variable. Plot one summer day of data (any summer day). If it's not a clear sky day, pick another day with clear sky conditions.

**Discuss the daily cycle of radiation. Does it make sense? What is the probable time zone of the data? By how many hours should you shift the data?**

Once you have an idea, check whether it corresponds to the actual solar time at this location, for example by using an online solar time calculator: http://www.esrl.noaa.gov/gmd/grad/solcalc/. Compare this time with China Standard Time (CST).

**Redefine the index with a new time shifted with the right number of hours, so that the solar noon matches the data (approximately).**

In [None]:
# any day
df['SWIN'].loc['2011-08-18'].plot();

In [None]:
# approx 6 hours, confirmed by the webpage
df.index = df.index + pd.DateOffset(hours=6)
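
The 6 hour shift can also be cross-checked with simple arithmetic: the Earth rotates 15° per hour, so mean solar time is ahead of UTC by the longitude divided by 15. The sketch below assumes that the logger runs on UTC (a guess, since the metadata is missing) and ignores the equation of time, which can move true solar noon by up to about a quarter of an hour.

```python
# Station longitude from the exercise text (degrees east)
lon = 90.64534

# The Earth rotates 15 degrees per hour, so mean solar time
# is ahead of UTC by this many hours
solar_offset_hours = lon / 15.0

# China Standard Time is UTC+8
cst_offset_hours = 8.0

# Clock time (in CST, decimal hours) at which mean solar noon occurs
solar_noon_cst = 12.0 + (cst_offset_hours - solar_offset_hours)

print(round(solar_offset_hours, 2))  # about 6.04 hours
print(round(solar_noon_cst, 2))      # about 13.96, i.e. shortly before 14:00 CST
```

Under these assumptions, local solar time is roughly 6 hours ahead of UTC and about 2 hours behind CST.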

In [None]:
# check if it's better
df['SWIN'].loc['2011-08-18'].plot();

### Exercise 3

We now have plenty of data at hand (this data cost German taxpayers quite a lot of money). Let's focus on one of the most complicated variables first: wind. The wind direction is given as an angle, which reports the direction **from where the wind is blowing**:
- North: 0°
- South: 180°
- West: 270°
- East: 90°

For example, if the wind direction is 45 degrees, the winds are coming out of the northeast and blowing towards the southwest. This would be called a north-easterly wind.

**Discuss the possible implications that a naive averaging of the wind direction would have. Does it actually make sense to plot the wind direction as a connected time series? Plot the wind direction as a histogram. Choose the number of bins so that each bin is 10° wide.**

By entering the coordinates of the station in http://www.bing.com/maps/ you can have a look at the geographical situation of the glacier. Analyse the dominant wind directions in this context. From which directions does the wind almost never blow?

**Plot the wind speed and direction as a scatter plot, with the wind-direction as x-axis and the wind-speed as y-axis.**

Where are the strongest winds coming from? Explain why this could be the case.

**Now reproduce the scatter plot from above, but once with data from August only, and once with data from January only. Discuss.**

Bonus question: plot the average diurnal cycle of wind speed in August and in January. Discuss.

In [None]:
# histogram plot: 36 bins over a 0-360° range gives bins of roughly 10°
df['WINDDIR'].plot(kind='hist', bins=36);
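
Regarding the averaging question: an arithmetic mean of angles breaks down around north, because 340° and 30° are neighbours on the compass but far apart as numbers. A common alternative is to average the unit vectors instead. The sketch below illustrates this with two made-up directions; the helper name `circular_mean_deg` is hypothetical and not part of the exercise.

```python
import numpy as np


def circular_mean_deg(angles_deg):
    """Mean direction in degrees (0-360), computed from averaged unit vectors."""
    rad = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mean_sin = np.sin(rad).mean()
    mean_cos = np.cos(rad).mean()
    # arctan2 returns an angle in (-180, 180], the modulo maps it to [0, 360)
    return np.rad2deg(np.arctan2(mean_sin, mean_cos)) % 360


# Two made-up wind directions on either side of north
directions = [340, 30]

naive = np.mean(directions)               # arithmetic mean of the numbers
circular = circular_mean_deg(directions)  # mean of the directions

print(naive, circular)
```

The arithmetic mean points roughly south (185°), while the vector mean stays close to north (about 5°). This simple version weights all observations equally and does not account for wind speed.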

In [None]:
# scatter plots
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(12, 16))
df.plot(kind='scatter', x='WINDDIR', y='WINDSPEED', title='All times', ax=ax1);
ax1.set_ylim([0, 20]);
df.loc[df.index.month == 1].plot(kind='scatter', x='WINDDIR', y='WINDSPEED', title='January', ax=ax2);
ax2.set_ylim([0, 20]);
df.loc[df.index.month == 8].plot(kind='scatter', x='WINDDIR', y='WINDSPEED', title='August', ax=ax3);
ax3.set_ylim([0, 20]);
plt.tight_layout()

In [None]:
# bonus
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
dfsel = df['WINDSPEED'].loc[df.index.month == 1]
dfsel.groupby(dfsel.index.hour).mean().plot(ax=ax1, title='January');
ax1.set_ylim([2, 6]);
dfsel = df['WINDSPEED'].loc[df.index.month == 8]
dfsel.groupby(dfsel.index.hour).mean().plot(ax=ax2, title='August');
ax2.set_ylim([2, 6]);

### Exercise 4

One of the oldest (and still widely used) models of ice and snow melt is the [degree day model](http://www.antarcticglaciers.org/glaciers-and-climate/numerical-ice-sheet-models/modelling-glacier-melt/). It relies on the assumption that melt occurs when air temperature is above the melting point.

**Compute the daily averages of air temperature and select the days with temperature above zero. When do they occur? Count the number of days with average temperature above zero. Discuss.**

In [None]:
# Averages
dailytemp = df['TEMP'].resample('D').mean()

In [None]:
# For the plot it's better to crop than select (you couldn't know that)
dailytemp.loc[dailytemp < 0] = np.nan
dailytemp.plot();

In [None]:
# count
ndays = len(dailytemp.loc[dailytemp > 0])
totdays = len(dailytemp)
print('{} days out of {} have a daily average above 0°C'.format(ndays, totdays))
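
To illustrate where the degree day model goes from here: melt is usually taken as proportional to the sum of positive daily temperatures. The sketch below uses made-up temperatures and an assumed degree day factor `ddf` (an illustrative value, not one calibrated for Zhadang glacier).

```python
import pandas as pd

# Hypothetical daily mean temperatures in degrees C (not the station data)
days = pd.date_range('2011-08-01', periods=6, freq=pd.Timedelta(days=1))
temp = pd.Series([-2.0, 0.5, 1.5, 3.0, -1.0, 2.0], index=days)

# Assumed degree day factor in mm w.e. per degree C per day (illustrative only)
ddf = 5.0

# Positive degree days: days below zero contribute nothing
pdd = temp.clip(lower=0)

# Daily and total melt in mm water equivalent
melt = ddf * pdd
total_melt = melt.sum()

print(pdd.sum(), total_melt)
```

With the station data, `temp` would be replaced by the daily means computed above, taken before the negative values are set to NaN.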