## Python Brno Project - Working with Open Financial Data

### Python Brno Project - Part 1

The first part of the project will involve reading about and collecting the data we will use for the project.

- Install the quandl package
  - https://www.quandl.com/tools/python
- Create a new jupyter notebook called 'Python Brno Project'
- Read through the quandl documentation in the link above to find out how to download time series into pandas dataframes
- Download and store the following timeseries in new variables
  - S&P 500 index: `YAHOO/INDEX_GSPC`
  - Effective federal funds rate: `FRED/FEDFUNDS`
- Read about what these series are
- Run a `head()` to see the structure of the data
- Run an `info()` on each dataframe
  - What time period do these series cover?
  - What are the maximum and minimum data points?
  - Is there any missing data?
- Plot these dataframes
- Run the `describe()` function on each dataframe
- Do a Google search to determine the pandas function for 'percentage change'
  - Run this function on the S&P 500 index Close value
    - What is the average daily return and standard deviation for the index Close?
    - What is the average daily return and standard deviation for the index Open?
- Look at the index values for each dataframe
  - What is the type of the index values?
  - Can we easily compare these two data series in their current form?
  - What do we need to do to the index values to make them comparable?
  - What should we do when modifying the indices to ensure we avoid lookahead bias?

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

import pandas as pd
pd.options.display.max_rows = 12

import quandl

In [None]:
# fed_funds = quandl.get('FRED/FEDFUNDS')
# sp500_index = quandl.get('YAHOO/INDEX_GSPC')
# fed_funds.to_hdf('data.h5', 'fed_funds')
# sp500_index.to_hdf('data.h5', 'sp500_index')

In [None]:
fed_funds = pd.read_hdf('data.h5','fed_funds')
sp500_index = pd.read_hdf('data.h5','sp500_index')

In [None]:
fed_funds

In [None]:
sp500_index

In [None]:
sp500_index.Open.pct_change().dropna().mean()*100

In [None]:
sp500_index.Close.pct_change().dropna().mean()*100
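
The two cells above compute only the mean, while the task list also asks for the standard deviation. A minimal sketch of both statistics follows; the prices and the variable names `close`, `daily_returns`, `mean_pct` and `std_pct` are hypothetical and only illustrate the calls, they are not values from the S&P 500 series.

```python
import pandas as pd

# Hypothetical closing prices, used only to illustrate the calls
close = pd.Series(
    [100.0, 102.0, 101.0, 103.0],
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
    name="Close",
)

# pct_change() gives p_t / p_{t-1} - 1; the first row has no predecessor and is NaN
daily_returns = close.pct_change().dropna()

mean_pct = daily_returns.mean() * 100  # average daily return, in percent
std_pct = daily_returns.std() * 100    # sample standard deviation (ddof=1), in percent

print(f"mean: {mean_pct:.4f}%  std: {std_pct:.4f}%")
```

The same two lines should work on `sp500_index.Close` and `sp500_index.Open` once the data is loaded.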

In [None]:
fed_funds.index

In [None]:
sp500_index.index

In [None]:
len(fed_funds.index)

In [None]:
len(sp500_index.index)

### Python Brno Project - Part 2

### Replication

https://blogs.cfainstitute.org/investor/2015/11/16/how-does-monetary-policy-impact-market-performance/

- Normalizing: interest rates are rising, companies have to pay higher rates to take out loans
- Accommodating: interest rates are falling, companies can take out loans more cheaply


We want to test a strategy:

- invest in the stock market when interest rates are falling
- keep your money in cash when interest rates are rising

### Lookahead Bias

- a common cause of trading strategy returns or machine learning model performance that look too good to be true
- information from the future which would not have been known at decision time is accidentally used
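
To make the idea concrete, here is a small sketch on hypothetical monthly data (the numbers and the column names `rate`, `close`, `long_biased` and `long` are made up for illustration). It compares a signal that is applied to the same row it was observed on with one that is shifted by one period, so that each trade only uses information available before the period starts.

```python
import pandas as pd

# Hypothetical monthly data, for illustration only
idx = pd.date_range("2020-01-01", periods=5, freq="MS")
df = pd.DataFrame(
    {
        "rate": [2.0, 1.5, 1.5, 1.0, 1.25],
        "close": [100.0, 110.0, 99.0, 108.9, 98.01],
    },
    index=idx,
)

# Gross return over the period that ENDS at each row
df["period_return"] = df["close"].pct_change() + 1

# Signal observed AT each row: did the rate fall over the last period?
df["rate_decreasing"] = df["rate"].diff() < 0

# Biased: the signal on row t is only known at the END of the period
# whose return sits on row t, so using it here peeks into the future
df["long_biased"] = df["rate_decreasing"]

# Unbiased: use the signal that was known at the START of the period
df["long"] = df["rate_decreasing"].shift(1, fill_value=False)

# Stay in cash (gross return 1.0) whenever the signal is False
biased = df["period_return"].where(df["long_biased"], 1.0).prod()
unbiased = df["period_return"].where(df["long"], 1.0).prod()

print(f"biased: {biased:.4f}  unbiased: {unbiased:.4f}")
```

The two results differ only because of the one-row shift, which is why the alignment of signal and return deserves an explicit check.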


In our case, we are going to use the interest rate data as our trading signal. So we must ensure that our trades are executed after the date specified in the interest rate series.

We want to reindex the S&P 500 data by the fed_funds index plus a 1 day offset.

In [None]:
fed_funds.index

In [None]:
fed_funds.index + pd.Timedelta('1 day')

In [None]:
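fed_index_plus_1d = fed_funds.index + pd.Timedelta('1 day')
# reindex returns NaN rows for dates that are missing from the index;
# recent pandas versions raise a KeyError when .loc is given missing labels
sp500_index.reindex(fed_index_plus_1d)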

When we try to reindex the S&P 500 data, we get some null values because the market was closed on these days.


Luckily pandas has a beautiful `reindex` method which we can use to backfill future values to this date, so each missing date takes the next available observation. Remember, we are only able to backfill because we have specifically offset the trade date to a day after our interest rate observation.
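
A toy example may help before running this on the real data. The sketch below uses hypothetical prices with a weekend gap and hypothetical signal dates; the names `prices`, `trade_dates`, `plain` and `filled` are illustrative only.

```python
import pandas as pd

# Hypothetical prices on trading days only (3-4 Oct 2020 is a weekend)
prices = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.to_datetime(["2020-10-01", "2020-10-02", "2020-10-05"]),
)

# Hypothetical signal dates, offset by one day as described in the text
trade_dates = pd.to_datetime(["2020-10-01", "2020-10-03"]) + pd.Timedelta("1 day")

# Without a fill method, dates missing from `prices` come back as NaN
plain = prices.reindex(trade_dates)

# With backfill, each missing date takes the NEXT available observation
filled = prices.reindex(trade_dates, method="backfill")

print(plain)
print(filled)
```

Here 2020-10-04 falls on a Sunday, so the plain reindex leaves it null, while the backfilled version picks up the price from Monday 2020-10-05.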


Let's try it and check our work.

In [None]:
fed_index_plus_1d = fed_funds.index + pd.Timedelta('1 day')
sp500_index_by_fed_funds_index = sp500_index.reindex(fed_index_plus_1d, method='backfill')
sp500_index_by_fed_funds_index

Check that the rows for `1954-10-02` and `2016-10-02`, which were null before backfilling, now hold the values we'd expect from the next trading day.

In [None]:
# sp500_index.loc['1954-10']
# sp500_index.loc['2016-10']

Now we can join the tables together. First check that the indexes are equal.

In [None]:
# assert all(fed_funds.index == sp500_index_by_fed_funds_index.index)

The indexes do not match yet because we forced the sp500 index values 1 day ahead, so we need to shift that index back by 1 day.

In [None]:
sp500_index_by_fed_funds_index.index = (sp500_index_by_fed_funds_index.index - pd.Timedelta('1 day'))

In [None]:
sp500_index_by_fed_funds_index

Check again

In [None]:
assert all(fed_funds.index == sp500_index_by_fed_funds_index.index)

In [None]:
assert len(fed_funds) == len(sp500_index_by_fed_funds_index)

data_merged = fed_funds.join(sp500_index_by_fed_funds_index)
data_merged
# assert len(fed_funds) == len(sp500_index_by_fed_funds_index) == len(data_merged)

In [None]:
data_merged = data_merged.rename(columns={'VALUE' : 'fed_rate'})
data_merged

### Homework

1. In the original sp500 table create a new column called `ma200` and set its value to the 200 day moving average.
2. Drop null values from the table
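3. Assert that the new table is 199 rows shorter than the old one (a 200 day window leaves the first 199 rows null, assuming there are no other nulls in the table)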
4. Repeat the steps above to merge the table with the moving average and the fed rate series
5. Create a new column called `rate_decreasing` which is true if the change in the rate over the last period is negative
6. Create a new column called `long` which has the same value as `rate_decreasing` but shifted up one row (use `df.shift`)
7. Create a new column called `period_return` and set its value equal to one plus the percentage change in the value of `Close`
8. Create a new column called `portfolio_return` and set it to `period_return` if `long` is true. Otherwise set it to 1.
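
Before applying steps 1 to 3 to the full table, it can be worth checking the expected row count on a small window. The sketch below uses hypothetical prices and a 3-row window standing in for the 200 day window; the column name `ma` and the variables `window` and `trimmed` are illustrative.

```python
import pandas as pd

# Hypothetical prices; a 3-row window stands in for the 200 day window
df = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]},
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
)

window = 3
# rolling(window).mean() is NaN until a full window of rows is available
df["ma"] = df["Close"].rolling(window).mean()

# Dropping nulls removes the rows that lack a full window
trimmed = df.dropna()

# A window of n rows leaves the first n - 1 rows without a value
assert len(df) - len(trimmed) == window - 1
print(trimmed)
```

With the default `min_periods`, the number of rows removed is the window length minus one, provided the table has no other null values.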

#### Bonus:
- combine the ma200 indicator with the `rate_decreasing` indicator
- if the S&P 500 is above its 200d moving average, go long, otherwise stay out of the market
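
One possible way to combine two boolean indicators is sketched below on hypothetical values. It assumes that "combine" means both conditions must hold (a logical AND); the columns `ma`, `above_ma` and `long_combined` are illustrative names, and `rate_decreasing` stands in for the rate indicator from the homework.

```python
import pandas as pd

# Hypothetical values; `ma` stands in for the ma200 column
df = pd.DataFrame(
    {
        "Close": [100.0, 105.0, 95.0, 110.0],
        "ma": [98.0, 100.0, 101.0, 102.0],
        "rate_decreasing": [True, False, True, True],
    }
)

# True when the index closes above its moving average
df["above_ma"] = df["Close"] > df["ma"]

# Assumption: go long only when BOTH indicators agree (element-wise AND)
df["long_combined"] = df["above_ma"] & df["rate_decreasing"]

print(df)
```

Replacing `&` with `|` would instead go long when either indicator is true, which is a different strategy and worth testing separately.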

### Python Brno Project - Part 3