<img src="http://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br>

# Python for Finance

**Online Bootcamp**

&copy; Dr. Yves J. Hilpisch | The Python Quants GmbH

http://tpq.io | [training@tpq.io](mailto:training@tpq.io) | [@dyjh](http://twitter.com/dyjh)

## Financial Data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')
%config InlineBackend.figure_format = 'svg'

In [None]:
url = 'https://certificate.tpq.io/findata.csv'

In [None]:
print(pd.read_csv(url))

In [None]:
raw = pd.read_csv(url, index_col=0)

In [None]:
raw.info()

In [None]:
raw = pd.read_csv(url, index_col=0, parse_dates=True)

In [None]:
raw.info()
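
The two `read_csv` calls above differ only in `parse_dates=True`. A minimal sketch of what that changes, using a tiny in-memory CSV instead of the course file (the column name `Date` and the two price rows are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical two-row CSV; the values are NOT taken from findata.csv.
csv_text = "Date,AAPL.O\n2020-01-02,75.09\n2020-01-03,74.36\n"

# Without parse_dates the index holds the dates as plain text labels.
plain = pd.read_csv(io.StringIO(csv_text), index_col=0)

# With parse_dates=True the index becomes a DatetimeIndex.
dated = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(plain.index).__name__)
print(type(dated.index).__name__)
```

A `DatetimeIndex` is what makes time-aware plotting and date-based selection work on `raw` later on.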

In [None]:
raw['AAPL.O'].plot();

## Market Prediction with Machine Learning

### Data Preprocessing

In [None]:
symbol = 'EUR='

In [None]:
data = pd.DataFrame(raw[symbol])

In [None]:
data['r'] = np.log(data[symbol] / data[symbol].shift(1))  # daily log returns

In [None]:
data['d'] = np.sign(data['r'])  # direction label: +1 up, -1 down (0 if the return is exactly zero)

In [None]:
data.tail()

In [None]:
lags = 7
cols = list()
for lag in range(1, lags + 1):
    col = f'lag_{lag}'
    data[col] = data['r'].shift(lag)
    cols.append(col)

In [None]:
cols

In [None]:
data.head(8)

In [None]:
data.dropna(inplace=True)

In [None]:
data.head()
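
To make the effect of the lag loop and of `dropna` easier to see, here is a small sketch on a toy return series. The five return values and the choice of two lags are assumptions made only for readability.

```python
import pandas as pd

# Hypothetical toy returns, chosen so the shifts are easy to follow.
toy = pd.DataFrame({'r': [0.01, -0.02, 0.03, -0.01, 0.02]})

lags = 2
cols = []
for lag in range(1, lags + 1):
    col = f'lag_{lag}'
    toy[col] = toy['r'].shift(lag)  # the return observed `lag` rows earlier
    cols.append(col)

# The first `lags` rows lack a complete history and are dropped.
clean = toy.dropna()
print(clean)
```

Each remaining row pairs the current return `r` with the returns that preceded it, which is the feature layout the classifier is fitted on.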

### Prediction based on Supervised Learning (Binary Classification)

In [None]:
# %conda install scikit-learn -y

In [None]:
from sklearn.naive_bayes import GaussianNB  # supervised classification

In [None]:
model = GaussianNB()  # step 1: model instantiation

In [None]:
model.fit(data[cols], data['d'])  # step 2: model fitting

In [None]:
model.predict(data[cols])  # step 3: label prediction

In [None]:
data['p'] = model.predict(data[cols])

In [None]:
from sklearn.metrics import accuracy_score

In [None]:
accuracy_score(data['d'], data['p'])

## Vectorized Backtesting

In [None]:
data[symbol].plot();

In [None]:
data['s'] = data['p'] * data['r']  # strategy returns under simplifying assumptions

In [None]:
data[['r', 's']].sum()  # sum of log returns

In [None]:
data[['r', 's']].sum().apply(np.exp)  # gross returns
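
Summing log returns and then applying `np.exp` works because log returns are additive over time. A minimal check on a made-up price path (the four prices are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical price path, for illustration only.
prices = pd.Series([100.0, 102.0, 99.0, 105.0])

# Log returns, computed the same way as the 'r' column.
log_ret = np.log(prices / prices.shift(1)).dropna()

# exp(sum of log returns) recovers the gross return over the whole period.
gross = np.exp(log_ret.sum())
direct = prices.iloc[-1] / prices.iloc[0]
print(gross, direct)
```

Both numbers agree up to floating-point error, which is why the notebook can aggregate performance with a plain `sum()` or `cumsum()`.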

In [None]:
data[['r', 's']].cumsum().apply(np.exp).plot();  # gross returns over time
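
The strategy returns above are computed without any trading costs. One possible way to account for them, sketched on made-up numbers, is to subtract a proportional cost whenever the position changes. The cost level `ptc`, the toy returns, and the toy positions are all assumptions, and subtracting the cost directly from the log return is itself an approximation.

```python
import pandas as pd

# Hypothetical log returns and positions (+1 long, -1 short).
demo = pd.DataFrame({
    'r': [0.010, -0.005, 0.002, -0.008, 0.004],
    'p': [1, 1, -1, -1, 1],
})
ptc = 0.0005  # assumed proportional cost per unit of position traded

demo['s'] = demo['p'] * demo['r']  # strategy returns before costs

# Size of each position change; a flip from long to short counts as 2.
# The cost of entering the very first position is ignored here.
trades = demo['p'].diff().abs().fillna(0)

demo['s_tc'] = demo['s'] - trades * ptc  # strategy returns after costs
print(demo[['s', 's_tc']].sum())
```

The more often the predicted direction flips, the larger the gap between the two sums becomes.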

**REMARK: This is a deliberately simplistic, illustrative example. Among other things, it ignores transaction costs and is an in-sample analysis only (no train-test split), so the accuracy and performance figures say little about out-of-sample results.**
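
A train-test split could look roughly like the following sketch. It uses synthetic random returns rather than the course data, and the 70/30 split, the seed, and the three lags are arbitrary assumptions. The split is chronological (no shuffling), so the model is only evaluated on observations that come after the ones it was fitted on.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB

# Synthetic returns (assumption): pure noise, NOT the EUR= series.
rng = np.random.default_rng(100)
demo = pd.DataFrame({'r': rng.normal(0.0, 0.01, 500)})
demo['d'] = np.sign(demo['r'])

lags = 3
cols = []
for lag in range(1, lags + 1):
    col = f'lag_{lag}'
    demo[col] = demo['r'].shift(lag)
    cols.append(col)
demo.dropna(inplace=True)

# Chronological split: first 70% for fitting, last 30% for testing.
split = int(len(demo) * 0.7)
train = demo.iloc[:split]
test = demo.iloc[split:]

model = GaussianNB()
model.fit(train[cols], train['d'])  # fit on the training data only

acc_in = accuracy_score(train['d'], model.predict(train[cols]))
acc_out = accuracy_score(test['d'], model.predict(test[cols]))
print(f'in-sample accuracy:     {acc_in:.3f}')
print(f'out-of-sample accuracy: {acc_out:.3f}')
```

Because the synthetic returns are random, neither figure should be read as evidence of predictability; the point is only the mechanics of keeping fitting and evaluation data apart.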
