# Python 101 @ SzISz X.
---

## Today: Time series

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
try:
    import seaborn as sns
except ImportError:
    # seaborn is optional; the plots fall back to pandas without it
    pass

---
### Act I: Read OTP stock prices (lack of creativity, I know)

Read data from the xls file.

In [None]:
BASE = '../data/'

In [None]:
#                  xls file uri          sheet name
df = pd.read_excel(BASE+'otp_stock.xls', 'Sheet 1')
# df stands for [d]ata[f]rame
df.head()

Remove the empty column.

In [None]:
del df['Dividend']
df.head()

Convert the date strings into dates and set them as the index (also sort the index).

In [None]:
#               convert a column into date
df['Date'] = pd.to_datetime(df['Date'])
#       set it as index   sort the index
df = df.set_index('Date').sort_index()
df.head()
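
The same three steps can be tried on a tiny frame first. This is a minimal sketch on made-up numbers (the `toy` frame is hypothetical, not the OTP data); it only shows that the string column becomes a sorted `DatetimeIndex`.

```python
import pandas as pd

# Hypothetical toy data, not the OTP file: dates arrive as unsorted strings.
toy = pd.DataFrame({
    'Date': ['2011-07-04', '2011-07-01', '2011-07-05'],
    'Close': [101.0, 100.0, 99.5],
})
toy['Date'] = pd.to_datetime(toy['Date'])   # strings -> Timestamps
toy = toy.set_index('Date').sort_index()    # DatetimeIndex, oldest row first
print(type(toy.index).__name__)
print(toy)
```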

Get the basic statistics about the data.

Minimum

In [None]:
df.min()

Maximum

In [None]:
df.max()

Mean

In [None]:
df.mean()

Get everything at once

In [None]:
df.describe()

Let's focus on closing values!

In [None]:
df['Close'].plot()

In [None]:
try:
    sns.boxplot(df['Close'])
except Exception:
    # seaborn is missing or failed: fall back to the pandas boxplot
    df.boxplot('Close')

Create a new column called 'diff' which is the difference between the opening and the closing prices. Plot it!

In [None]:
df['diff'] = df['Close'] - df['Open'] 
df['diff'].plot()

Let's have a closer look at that huge drop in summer 2011!

In [None]:
df.loc['2011-07':'2011-09', 'Close'].plot()
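
Slicing with month strings works because the index holds dates. A small sketch, assuming a made-up daily series for 2011 (the `toy` frame is hypothetical), shows that both endpoint months are included in full:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering all of 2011 (not the OTP data).
idx = pd.date_range('2011-01-01', '2011-12-31', freq='D')
toy = pd.DataFrame({'Close': np.arange(len(idx), dtype=float)}, index=idx)

# Month-level strings select whole months; both endpoints are included.
summer = toy.loc['2011-07':'2011-09']
print(summer.index.min(), summer.index.max(), len(summer))
```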

In [None]:
try:
    sns.boxplot(df['2011-07':'2011-09']['diff'])
except:
    df['2011-07':'2011-09'].boxplot('diff')

In [None]:
try:
    sns.boxplot(df['2011-07':'2011-09']['Close'])
except:
    df['2011-07':'2011-09'].boxplot('Close')

---
### Act II: Advanced operations

Save the closing prices to a new variable!

In [None]:
# [c]losing [p]rice [f]rame (strictly speaking a Series)
cpf = df['Close']

#### Subact: Moving statistics

The most common operation is the moving average (rolling mean). Feel free to experiment with the window size!

In [None]:
# pd.rolling_mean was removed from pandas; use the .rolling accessor instead
#      column  window size
mavg = cpf.rolling(40).mean()
# show the last five rows
mavg.tail()

We show the last five rows because the first five look like this:

In [None]:
mavg.head()

Why?

---

After the answer, plot the moving average! 

In [None]:
mavg.plot()

Put the two plots on the same axes.

In [None]:
cpf.plot(label='Close')
mavg.plot(label='Close AVG')
plt.legend()

We can apply several other functions over a moving window; the complete list can be found [here](http://pandas.pydata.org/pandas-docs/stable/computation.html#moving-rolling-statistics-moments).
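
As a quick illustration of a few of them, here is a sketch on made-up numbers, assuming a pandas version that provides the `.rolling` accessor (0.18 or later). The window object is created once and several statistics are read from it.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0])  # made-up numbers
win = s.rolling(3)                          # window of three observations
stats = pd.DataFrame({
    'mean': win.mean(),
    'max': win.max(),
    'std': win.std(),   # sample standard deviation (ddof=1)
})
print(stats)
```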

---

#### Subact: Shifted computations

How can we compute something based on yesterday's data?  
Use the `shift` function!  
Let's compute the difference between consecutive days:
$${d_t} = {p_t} - {p_{t-1}}$$

In [None]:
df['d'] = cpf - cpf.shift(1)
df['d'].head()
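
To see what `shift` does, it may help to run it on four made-up prices. This sketch also compares the result with `Series.diff`, which is a shortcut for the same subtraction:

```python
import pandas as pd

p = pd.Series([10.0, 12.0, 11.0, 15.0])  # made-up prices
d = p - p.shift(1)   # first value is NaN: there is no previous day
print(d.tolist())
print(d.equals(p.diff()))  # diff() computes the same thing
```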

In [None]:
df['d'].describe()

In [None]:
df['d'].plot()

In [None]:
df[df['d'] > 0]['Close'].describe()

---
### Act III: Resampling

You can resample data to a different time frequency. The two main choices are the time period you resample to and the aggregation you use. In current pandas versions the aggregation is applied by calling a method such as `.mean()` or `.median()` on the result of `resample`; older versions took it as a `how` argument, with mean as the default. The list of time frames is available [here](http://pandas.pydata.org/pandas-docs/dev/timeseries.html#offset-aliases).

In [None]:
# Monthly mean
cpf.resample('M').mean().plot()

In [None]:
# Weekly median
cpf.resample('W').median().plot()

In [None]:
# Minimum over 3-week periods
cpf.resample('3W').min().plot()

In [None]:
# Quarterly maximum
cpf.resample('Q').max().plot()
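
A minimal sketch on made-up daily data, assuming a pandas version where `resample` returns a lazy object and the aggregation is called as a method on it. The series `s` is hypothetical; weekly bins end on Sunday by default.

```python
import pandas as pd

# 14 made-up daily values starting on Saturday, 2011-01-01.
idx = pd.date_range('2011-01-01', periods=14, freq='D')
s = pd.Series(range(14), index=idx, dtype=float)

weekly = s.resample('W')   # Resampler object: nothing is computed yet
print(weekly.mean())
print(weekly.median())
print(weekly.max())
```

The same grouping is reused for every statistic, so only the final method call changes.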

---
### Final Act: your turn

Compute the moving sum on the "growing" days (when the price of the stock went up)!

Compute the moving standard deviation
$$\mathrm{sd} = \sqrt{E[(x-\mu)^2]}$$
of the dataset with a window size of 60! Plot it, and plot the resampled (quarterly mean) data on the same axes (use the `style='--g'` argument of the `plot` method)!

In [None]:
def msd(x):
    return np.sqrt(np.square(x - x.mean()).mean())
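
A hint rather than a full solution: the sketch below checks `msd` on made-up numbers and shows one way a custom function might be applied over a window, assuming a pandas version where `rolling(...).apply` accepts `raw=True`. The window size of 4 is only for the toy data. Note that `msd` is the population standard deviation, while the built-in rolling `.std()` uses the sample version, so the two differ slightly.

```python
import numpy as np
import pandas as pd

def msd(x):
    # same definition as above, repeated so the snippet runs on its own
    return np.sqrt(np.square(x - x.mean()).mean())

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up numbers
print(msd(x))
print(np.isclose(msd(x), np.std(x)))  # np.std defaults to ddof=0

# raw=True passes each window to msd as a plain NumPy array
rolled = pd.Series(x).rolling(4).apply(msd, raw=True)
print(rolled)
```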