## Part 02: Time-series properties of data and currency baskets

### In this section you will:


*   Read in data from the file you saved in Part 01.
*   Create a pandas DataFrame and add rows and columns to the data.
*   Use Seaborn to plot the data to check it has loaded correctly.

### Before you begin:

*   Make sure you have saved the file "returns.pkl" to your working directory.


### Import necessary libraries and open saved pickle file

In [None]:
import cloudpickle as cp
import numpy as np, pandas as pd
from datetime import datetime, timedelta
import pickle
import urllib.request, urllib.parse, urllib.error
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from pandas.plotting import autocorrelation_plot
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

In [None]:
# The with-block closes the file automatically, so no explicit close() is needed
with open('./returns.pkl', 'rb') as f:
    returns = pickle.load(f)

### Data Exploration
Let's get a sense of the dataset, check for missing values and generate plots to see interactions between features.

### Step 1 - General info on dataframe and check for missing data (good practice)

In [None]:
# Get a sense of the data you're working with by running the describe() and info() functions in pandas.
# You could get the same result graphically with a boxplot for each feature, but due to the
# size of the dataset that takes longer to run, so we skip it for this tutorial.
returns.describe()

In [None]:
# Checking for missing values, and extracting count if applicable
returns.info()
returns.isnull().sum() #returns count of missing value per feature(column)

### Step 2 - Calculate autocorrelations with multiple lags using basic functions

#### Calculate autocorrelation (using tips from this Stack Overflow post: https://stackoverflow.com/questions/26083293/calculating-autocorrelation-of-pandas-dataframe-along-each-column)

In [None]:
def df_autocorr(df, lag=1, axis=0):
    """Compute full-sample column-wise autocorrelation for a DataFrame."""
    return df.apply(lambda col: col.autocorr(lag), axis=axis)

def df_rolling_autocorr(df, window, lag=1):
    """Compute rolling column-wise autocorrelation for a DataFrame."""

    return (df.rolling(window=window)
        .corr(df.shift(lag))) # could .dropna() here

In [None]:
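# Autocorrelation of squared returns for lags 1..20:
# one row per lag, one column per column of `returns`
autocorr_vec = lambda lag: df_autocorr(returns**2, lag=lag)
returns_autocorr = pd.DataFrame(list(map(autocorr_vec, range(1, 21))), index=range(1, 21))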
returns_autocorr
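
As a sanity check on `df_autocorr`, the lag-k autocorrelation reported by `Series.autocorr` is the Pearson correlation between a series and a copy of itself shifted by k steps. The sketch below illustrates this on a synthetic AR(1) series; the data, the coefficient 0.7, and the column name `x` are made up for illustration and are not part of `returns.pkl`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical AR(1) series x_t = 0.7 * x_{t-1} + noise (illustration only)
n = 5000
noise = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + noise[t]
demo = pd.DataFrame({"x": x})

lag = 1
via_autocorr = demo["x"].autocorr(lag)
# shift() introduces NaN at the start; corr() drops incomplete pairs
via_shift = demo["x"].corr(demo["x"].shift(lag))

print(via_autocorr, via_shift)
```

Both numbers should agree and sit close to the AR coefficient used to generate the series.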

### Step 3 - Calculate the DXY index

The DXY index is calculated by taking a weighted average of currencies with the following weights:

*   Euro (EUR), 57.6% weight
*   Japanese yen (JPY), 13.6% weight
*   Pound sterling (GBP), 11.9% weight
*   Canadian dollar (CAD), 9.1% weight
*   Swedish krona (SEK), 4.2% weight
*   Swiss franc (CHF), 3.6% weight


In [None]:
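# Weights are positional: each entry must line up with the corresponding column of `returns`.
# Note: there is no 0.042 (SEK) entry in this vector, so the weights sum to 0.958.
dxy_weight = [0, 0.119, 0.036, 0, 0.136, 0.576, 0, 0, 0.091]
dxy = returns.dot(dxy_weight)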
# print(dxy)
returns['DXY'] = dxy.copy()
print(returns.head())
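
A positional weight list breaks silently if the column order changes. One alternative, sketched here under assumptions, is to key the weights by currency code and align them to the DataFrame's columns. The column labels and the synthetic `demo_returns` frame are hypothetical; substitute the actual column names in `returns`.

```python
import numpy as np
import pandas as pd

# Basket weights keyed by currency code (labels are hypothetical column names)
weights = pd.Series({"EUR": 0.576, "JPY": 0.136, "GBP": 0.119,
                     "CAD": 0.091, "SEK": 0.042, "CHF": 0.036})

# Synthetic stand-in for `returns`, with a made-up column order
rng = np.random.default_rng(1)
cols = ["AUD", "GBP", "CHF", "CNY", "JPY", "EUR", "CAD", "SEK"]
demo_returns = pd.DataFrame(rng.normal(scale=0.005, size=(5, len(cols))), columns=cols)

# Align weights to the columns; currencies outside the basket get weight 0
aligned = weights.reindex(demo_returns.columns).fillna(0.0)
basket = demo_returns.dot(aligned)

# Same weighted sum written out explicitly, as a cross-check
manual = sum(demo_returns[c] * w for c, w in weights.items())
print(basket)
```

Because the weights are matched by label, reordering or adding columns does not change the result.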

### Step 4 - Calculate autocorrelation and partial autocorrelation functions with the statsmodels package to get error bounds on plots

#### An autocorrelation function (ACF) plots the correlation of the time series observations with values of the same series at previous times (lags). The ACF indicates how strongly a value relates to its lagged values, through both direct and indirect dependence.

In [None]:
plot_acf(returns['DXY']**2, lags=20)
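
To make the error bounds concrete, here is a minimal sketch that computes the sample ACF by hand and compares it with the approximate 95% band of ±1.96/√N that applies when the series is white noise. The helper `sample_acf` and the synthetic series are assumptions for illustration; the shaded region drawn by `plot_acf` may be computed differently (for example, widening with lag), so treat the flat band as a simplification.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag, using the full-sample mean and variance."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(2)
white = rng.standard_normal(2000)      # synthetic i.i.d. series, a stand-in for real data
acf_vals = sample_acf(white, max_lag=20)
bound = 1.96 / np.sqrt(len(white))     # approximate 95% band under the white-noise null

outside = int(np.sum(np.abs(acf_vals[1:]) > bound))
print(f"bound = {bound:.4f}, lags outside band: {outside} of 20")
```

For white noise, roughly 1 lag in 20 is expected to fall outside the band by chance; many more than that suggests genuine serial dependence.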

#### A partial autocorrelation function (PACF) summarizes the relationship between an observation in a time series and observations at prior time steps, with the relationships of intervening observations removed. The PACF indicates how strongly a value relates to a given lagged value through direct dependence only.

In [None]:
plot_pacf(returns['DXY']**2, lags=20)
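
One way to see what "intervening observations removed" means: the partial autocorrelation at lag k can be obtained as the coefficient on the lag-k term when the series is regressed on its first k lags. The sketch below does this with ordinary least squares on a synthetic AR(1) series. The helper `pacf_ols` and the data are assumptions for illustration, and `plot_pacf` may use a different estimation method internally.

```python
import numpy as np

def pacf_ols(x, max_lag):
    """PACF at lag k = coefficient on x_{t-k} when regressing x_t on x_{t-1}, ..., x_{t-k}."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    out = [1.0]
    for k in range(1, max_lag + 1):
        y = x[k:]
        # Column j-1 holds the series lagged by j steps, for j = 1..k
        lags = np.column_stack([x[k - j: n - j] for j in range(1, k + 1)])
        coef, *_ = np.linalg.lstsq(lags, y, rcond=None)
        out.append(coef[-1])
    return np.array(out)

# Hypothetical AR(1) series with coefficient 0.6 (illustration only)
rng = np.random.default_rng(3)
n = 3000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + eps[t]

pacf_vals = pacf_ols(x, max_lag=5)
acf_lag2 = np.corrcoef(x[:-2], x[2:])[0, 1]   # plain autocorrelation at lag 2
print(pacf_vals.round(3), round(acf_lag2, 3))
```

For an AR(1) process the lag-2 autocorrelation is clearly non-zero (indirect dependence through lag 1), while the lag-2 partial autocorrelation is close to zero.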

### Step 5 - More exploration. Let's see how the currencies correlate with one another.
#### Run a seaborn pair plot and look at the marginal distributions (histograms) and joint distributions (scatter plots) to see the correlation between features. If you plot with a regression line, you will see the strength of the correlation between features.

In [None]:
sns.pairplot(returns, kind = "reg")
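
A numeric companion to the pair plot is the pairwise correlation matrix from `DataFrame.corr()`. The sketch below uses a small synthetic frame whose columns `A`, `B`, and `C` are made up for illustration; on the real data the call would be `returns.corr()`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 2000
a = rng.standard_normal(n)
demo = pd.DataFrame({
    "A": a,
    "B": 0.8 * a + 0.6 * rng.standard_normal(n),  # built to correlate about 0.8 with A
    "C": rng.standard_normal(n),                  # generated independently of A and B
})

corr = demo.corr()   # Pearson correlation between every pair of columns
print(corr.round(2))
```

Pairs with a tight, sloped scatter in the pair plot show up here as entries close to 1 or -1, and unrelated pairs as entries near 0.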

In [None]:
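# Compare DXY returns, their cumulative sum, and squared returns on one chart
X = pd.concat([dxy, dxy.cumsum(), dxy**2], axis=1)
X.columns = ['DXY', 'DXY (Cumulative)', 'DXY squared']
X /= X.max()             # scale each column by its own maximum
X['DXY squared'] -= 1    # shift the squared series down by 1 to offset it from the others
X.plot(figsize=(10, 8))
plt.axis('off')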