<img src="http://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br>

# Python for Algorithmic Trading

**Appendix A &mdash; Python, NumPy, pandas**

# Python Basics

## Data Types

In [None]:
a = 3  # defining an integer object

In [None]:
type(a)

In [None]:
a.bit_length()  # number of bits used

In [None]:
b = 5.  # defining a float object

In [None]:
type(b)

In [None]:
c = 10 ** 100  # googol number

In [None]:
c  # arbitrarily large integer object (Python 3 has no separate long type)

In [None]:
c.bit_length()  # number of bits used

In [None]:
3 / 5.  # division

In [None]:
a * b  # multiplication

In [None]:
a - b  # difference

In [None]:
b + a  # addition

In [None]:
a ** b  # power

In [None]:
import math  # importing the library into the namespace

In [None]:
math.log(a)  # natural logarithm

In [None]:
math.exp(a)  # exponential function

In [None]:
math.sin(b)  # sine function

Another important basic data type is the string object.

In [None]:
s = 'Python for Algorithmic Trading.'

In [None]:
type(s)

This object type has multiple methods attached.

In [None]:
s.lower()  # converting to lower case characters

In [None]:
s.upper()  # converting to upper case characters

In [None]:
s[0:6]

In [None]:
st = s[0:6] + s[-9:-1]

In [None]:
print(st)

In [None]:
repl = 'My name is %s, I am %d years old and %4.2f m tall.'

In [None]:
# replace %s by a string, %d by an integer and
# %4.2f by a float showing 2 decimal values
print(repl % ('Peter', 35, 1.88))

In [None]:
repl = "My name is {:s}, I am {:d} years old and {:4.2f} m tall."

In [None]:
print(repl.format('Peter', 35, 1.88))
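
Since Python 3.6, formatted string literals (f-strings) offer a third way to express the same replacement. The following is a minimal sketch; the variable names `name`, `age` and `height` are illustrative assumptions and not part of the original example.

```python
# illustrative values, matching the ones used above
name, age, height = 'Peter', 35, 1.88

# the format specifiers after the colon work as in str.format()
msg = f'My name is {name}, I am {age:d} years old and {height:4.2f} m tall.'
print(msg)
```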

## Data Structures

In [None]:
t1 = (a, b, st)

In [None]:
t1

In [None]:
type(t1)

In [None]:
t2 = st, b, a

In [None]:
t2

In [None]:
type(t2)

In [None]:
t = (t1, t2)

In [None]:
t

In [None]:
t[0][2]  # take 3rd element of 1st element

In [None]:
l = [a, b, st]

In [None]:
l

In [None]:
type(l)

In [None]:
l.append(s.split()[3])  # append 4th word of string

In [None]:
l

In [None]:
l = list(('Z', 'Q', 'D', 'J', 'E', 'H', '5.', 'a'))

In [None]:
l

In [None]:
l.sort()  # in-place sorting

In [None]:
l

In [None]:
d = {'int_obj': a, 'float_obj': b, 'string_obj': st}

In [None]:
type(d)

In [None]:
d

In [None]:
d['float_obj']  # look-up of value given key

In [None]:
d['long_obj'] = 10 ** 20  # adding new key value pair

In [None]:
d

Keys and values of a dictionary object can be retrieved as view objects, which can be converted to list objects with `list()`.

In [None]:
d.keys()

In [None]:
d.values()
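
In Python 3, `keys()` and `values()` return view objects that reflect later changes to the dictionary. A small sketch with a stand-alone dictionary (the contents are illustrative assumptions):

```python
d = {'int_obj': 3, 'float_obj': 5.0}

keys = d.keys()  # a view object, not a list
d['string_obj'] = 'Python'  # add a key after creating the view

print(keys)  # the view already contains the new key

key_list = list(d.keys())  # convert to a list to allow indexing
print(key_list[0])
```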

## Control Structures

Iteration is a very important operation in programming in general and in financial analytics in particular. Many Python objects are iterable, which proves rather convenient in many circumstances. Consider the special iterable object type `range`.

In [None]:
range(5)  # all integers from zero to 5 excluded

In [None]:
range(3, 15, 2)  # start at 3, step with 2 until 15 excluded

Such a `range` object is often used in the context of a `for` loop.

In [None]:
for i in range(5):
    print(i ** 2, end=' ')

In [None]:
for i in range(3, 15, 2):
    print(i, end=' ')
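
In Python 3, `range` yields a lazy sequence object rather than a list: the values are generated on demand. The following sketch illustrates this; the variable names are chosen for illustration only.

```python
import sys

r = range(3, 15, 2)
print(r)  # shows the range object, not its values
print(list(r))  # materialize the values as a list

# a range object supports len(), indexing and membership tests
print(len(r), r[1], 9 in r)

# memory usage does not depend on the number of values
small = sys.getsizeof(range(10))
large = sys.getsizeof(range(10 ** 9))
print(small == large)
```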

However, you can iterate over any sequence.

In [None]:
# iteration over list object
l = ['a', 'b', 'c', 'd', 'e']

In [None]:
for _ in l:
    print(_)

In [None]:
s = 'Python Trading'

In [None]:
# iteration over string object
for c in s:
    print(c + '|', end='')

In [None]:
i = 0  # initialize counter

In [None]:
while i < 5:
    print(i ** 0.5, end=' ')  # output
    i += 1  # increase counter by 1

## Special Python Idioms 

Python relies in many places on a number of special idioms. Let us start with a rather popular one, the list comprehension.

In [None]:
lc = [i ** 2 for i in range(10)]

In [None]:
lc

In [None]:
type(lc)
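
A list comprehension is a compact form of a `for` loop that appends to a list, optionally with a filter condition. A minimal sketch comparing both forms (the names `squares_loop` and `squares_lc` are illustrative):

```python
# explicit loop: squares of the even numbers below 10
squares_loop = []
for i in range(10):
    if i % 2 == 0:
        squares_loop.append(i ** 2)

# equivalent list comprehension with a filter condition
squares_lc = [i ** 2 for i in range(10) if i % 2 == 0]

print(squares_lc)
print(squares_loop == squares_lc)
```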

In [None]:
f = lambda x: math.cos(x) # returns cos of x

In [None]:
f(5)

In [None]:
list(map(lambda x: math.cos(x), range(10)))

In [None]:
def f(x):
    return math.exp(x)

In [None]:
f(5)

In [None]:
def f(*args):  # multiple arguments
    for arg in args:
        print(arg)
        # do something with arguments
    return None  # return result(s) (not necessary)

In [None]:
f(l)
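
Note that `f(l)` passes the list object as a single argument. To pass the elements as separate arguments, the list can be unpacked with the `*` operator. A small sketch, using a hypothetical helper `count_args` that is not part of the original code:

```python
def count_args(*args):
    # args is a tuple holding all positional arguments
    return len(args)

letters = ['a', 'b', 'c']

print(count_args(letters))  # the list counts as one argument
print(count_args(*letters))  # unpacked: one argument per element
```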

In [None]:
import random  # import random number library

In [None]:
a = random.randint(0, 1000)  # draw random number between 0 and 1000

In [None]:
print("Random number is %d" % a)

In [None]:
def number_decide(number):
    # use the function argument, not the global variable a
    if number < 10:
        return "Number is single digit."
    elif 10 <= number < 100:
        return "Number is double digit."
    else:
        return "Number has three or more digits."

In [None]:
number_decide(a)

# NumPy

## Regular ndarray Objects

In [None]:
import numpy as np

In [None]:
a = np.array(range(24))

In [None]:
a

In [None]:
b = a.reshape((4, 6))

In [None]:
b

In [None]:
c = a.reshape((2, 3, 4))

In [None]:
c

In [None]:
b = np.array(b, dtype=float)  # np.float was removed in NumPy 1.24

In [None]:
b

## Vectorized Operations

In [None]:
2 * b

In [None]:
b ** 2

You can also pass `ndarray` objects to lambda or standard Python functions.

In [None]:
f = lambda x: x ** 2 - 2 * x + 0.5

In [None]:
f(a)
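
This works because the arithmetic operators used in the function are defined element-wise for `ndarray` objects. Functions from the `math` module, by contrast, expect scalars. A minimal sketch with a small array (the names `g` and `x` are illustrative assumptions):

```python
import math
import numpy as np

def g(x):
    # only uses operators that ndarray objects support element-wise
    return x ** 2 - 2 * x + 0.5

x = np.arange(4)
print(g(x))  # evaluated for every element

try:
    math.sin(x)  # math functions do not accept multi-element arrays
except TypeError as err:
    print('TypeError:', err)

print(np.sin(x))  # the NumPy universal function works element-wise
```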

In [None]:
a[2:6]  # 3rd to 6th element

In [None]:
b[2, 4]  # 3rd row, 5th column

In [None]:
b[1:3, 2:4]  # middle square of numbers

## Boolean Operations

In [None]:
# which numbers are larger than 10?
b > 10

In [None]:
# only those numbers (flat) that are larger than 10
b[b > 10]

## ndarray Methods and Universal Functions

In [None]:
a.sum()  # sum of all elements

In [None]:
b.mean()  # mean of all elements

In [None]:
b.mean(axis=0)  # mean along 1st axis

In [None]:
b.mean(axis=1)  # mean along 2nd axis

In [None]:
c.std()  # standard deviation for all elements

In [None]:
np.sum(a)  # sum of all elements

In [None]:
np.mean(b, axis=0)  # mean along 1st axis

In [None]:
np.sin(b).round(2)  # sine of all elements (rounded)

In [None]:
np.sin(4.5)  # sine of Python float object

In [None]:
%time l = [np.sin(x) for x in range(1000000)]

In [None]:
import math

In [None]:
%time l = [math.sin(x) for x in range(1000000)]

In [None]:
%time a = np.sin(np.arange(1000000))

In [None]:
import sys

In [None]:
sys.getsizeof(a)

In [None]:
a.nbytes

## ndarray Creation

Here, we use the `ndarray` object constructor `arange`, which yields an `ndarray` object of evenly spaced values (integers by default). A simple example follows.

In [None]:
ai = np.arange(10)

In [None]:
ai

In [None]:
ai.dtype

In [None]:
af = np.arange(0.5, 9.5, 0.5)  # start, end, step size

In [None]:
af

In [None]:
af.dtype

In [None]:
np.linspace(0, 10, 12)  # start, end, number of elements
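
The two constructors differ in how the grid is specified: `arange` takes a step size and excludes the end value, while `linspace` takes the number of elements and includes the end value by default. A short sketch that builds the same grid both ways (the variable names are illustrative):

```python
import numpy as np

af = np.arange(0.5, 9.5, 0.5)  # step size given, end value excluded
ls = np.linspace(0.5, 9.0, 18)  # number of elements given, end value included

print(len(af), af[-1])
print(np.allclose(af, ls))  # both describe the same grid
```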

## Random Numbers

In [None]:
np.random.standard_normal(10) 

In [None]:
np.random.poisson(0.5, 10)

In [None]:
np.random.seed(1000)  # fix the rng seed value

In [None]:
data = np.random.standard_normal((5, 100))

In [None]:
data[:, :3]

In [None]:
data.mean()  # should be close to 0.0

In [None]:
data.std()  # should be close to 1.0

In [None]:
data = data - data.mean()  # correction for the 1st moment

In [None]:
data.mean()  # now really close to 0.0

In [None]:
data = data / data.std()  # correction for the 2nd moment

In [None]:
data.std()  # now really close to 1.0
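
The two corrections can be combined into a single step, often called moment matching. The following sketch assumes NumPy's newer `default_rng` generator interface instead of the legacy `np.random.seed` call used above; the names `raw` and `matched` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1000)  # seeded generator (assumed alternative)
raw = rng.standard_normal((5, 100))

# correct the 1st and 2nd moment in one step
matched = (raw - raw.mean()) / raw.std()

print(matched.mean())  # zero up to floating point error
print(matched.std())  # one up to floating point error
```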

# matplotlib

In [None]:
import matplotlib.pyplot as plt  # import main plotting library

In [None]:
plt.style.use('seaborn-v0_8')  # set seaborn standards ('seaborn' for matplotlib < 3.6)

In [None]:
import matplotlib

In [None]:
matplotlib.rcParams['font.family'] = 'serif'
%matplotlib inline

In [None]:
data = np.random.standard_normal((5, 100))

In [None]:
plt.figure(figsize=(10, 6))  # size of figure
plt.plot(data.cumsum())  # cumulative sum over all elements
# plt.savefig('../../images/chA/plot_01.png');

In [None]:
plt.figure(figsize=(10, 6));  # size of figure
# plotting five cumulative sums as lines
plt.plot(data.T.cumsum(axis=0), label='line');
plt.legend(loc=0);  # legend in best location
plt.xlabel('data point');  # x axis label
plt.ylabel('value');  # y axis label
plt.title('random series');  # figure title
# plt.savefig('../../images/chA/plot_02.png');

In [None]:
plt.figure(figsize=(10, 6))  # size of figure
plt.hist(data.flatten(), bins=30)
# plt.savefig('../../images/chA/plot_03.png');

In [None]:
plt.figure(figsize=(10, 6))  # size of figure
plt.bar(np.arange(1, 12) - 0.25, data[0, :11], width=0.5)
# plt.savefig('../../images/chA/plot_04.png');

In [None]:
x = np.arange(len(data.cumsum()))

In [None]:
y = data.cumsum()

In [None]:
rg1 = np.polyfit(x, y, 1)  # linear OLS

In [None]:
rg2 = np.polyfit(x, y, 2)  # quadratic OLS

In [None]:
rg3 = np.polyfit(x, y, 3)  # cubic OLS

In [None]:
plt.figure(figsize=(10, 6))
plt.plot(x, y, 'r', label='data')
plt.plot(x, np.polyval(rg1, x), 'b--', label='linear')
plt.plot(x, np.polyval(rg2, x), 'b-.', label='quadratic')
plt.plot(x, np.polyval(rg3, x), 'b:', label='cubic')
plt.legend(loc=0)
# plt.savefig('../../images/chA/plot_05.png');

# pandas

## pandas DataFrame class

In [None]:
np.random.seed(1000)

In [None]:
raw = np.random.standard_normal((10, 3)).cumsum(axis=0)

In [None]:
index = ['2017-1-31', '2017-2-28', '2017-3-31',
         '2017-4-30', '2017-5-31', '2017-6-30',
         '2017-7-31', '2017-8-31', '2017-9-30',
         '2017-10-31']

In [None]:
columns = ['no1', 'no2', 'no3']

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame(raw, index=index, columns=columns)

In [None]:
df

`DataFrame` objects come with a multitude of built-in basic, advanced and convenience methods, a few of which are illustrated below without much commentary.

In [None]:
df.head()  # first five rows

In [None]:
df.tail()  # last five rows

In [None]:
df.index  # index object

In [None]:
df.columns  # column names

In [None]:
df.info()  # meta information

In [None]:
df.describe()  # typical statistics

## Numerical Operations

In [None]:
df * 2  # vectorized multiplication

In [None]:
df.std()  # standard deviation by column

In [None]:
df.mean()

In [None]:
df.mean(axis=1)  # mean by index value

In [None]:
np.mean(df, axis=0)  # column means via NumPy function

## Data Selection

In [None]:
df['no2']  # 2nd column

In [None]:
df.iloc[0]  # 1st row

In [None]:
df.iloc[2:4]  # 3rd & 4th row

In [None]:
df.iloc[2:4, 1]  # 3rd & 4th row, 2nd column

In [None]:
df.no3.iloc[3:7]  # dot look-up for column name

In [None]:
df.loc['2017-3-31']  # row given index value

In [None]:
df.loc['2017-5-31', 'no3']  # single data point

In [None]:
df['no1'] + 3 * df['no3']  # vectorized arithmetic operations

## Boolean Operations

In [None]:
df['no3'] > 0.5

In [None]:
df[df['no3'] > 0.5]

In [None]:
df[(df.no3 > 0.5) & (df.no2 > -0.25)]

In [None]:
df[df.index > '2017-4-30']  # string index, so the comparison is lexicographic

In [None]:
df.plot(figsize=(10, 6))
# plt.savefig('../../images/chA/plot_06.png');

In [None]:
df.index = pd.DatetimeIndex(df.index)

In [None]:
df.index
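
Converting the index matters for selections as well. As long as the index holds strings, a comparison such as `df.index > '2017-4-30'` is lexicographic; with a `DatetimeIndex` it is chronological. A minimal sketch on a small stand-alone `Series` (the data is an illustrative assumption):

```python
import pandas as pd

dates = ['2017-3-31', '2017-4-30', '2017-5-31', '2017-10-31']
s = pd.Series(range(4), index=dates)

# string index: '2017-10-31' sorts before '2017-4-30'
as_strings = s[s.index > '2017-4-30']
print(len(as_strings))

# datetime index: the comparison is chronological
s.index = pd.DatetimeIndex(s.index)
as_dates = s[s.index > '2017-4-30']
print(len(as_dates))
```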

In [None]:
df.plot(figsize=(10, 6))
# plt.savefig('../../images/chA/plot_07.png');

In [None]:
df.hist(figsize=(10, 6))
# plt.savefig('../../images/chA/plot_08.png');

## Input-Output Operations

In [None]:
df.to_csv('data.csv')  # exports to CSV file

In [None]:
with open('data.csv') as f:  # open file
    for line in f:  # iterate over all lines
        print(line, end='')  # print line (it already ends with a newline)

In [None]:
from_csv = pd.read_csv('data.csv',  # filename
                      index_col=0,  # index column
                      parse_dates=True)  # date index

In [None]:
from_csv.head()

In [None]:
h5 = pd.HDFStore('data.h5', 'w')  # open for writing

In [None]:
h5['df'] = df  # write object to database

In [None]:
h5

In [None]:
from_h5 = h5['df']  # reading from database

In [None]:
h5.close()  # closing the database

In [None]:
from_h5.tail()

In [None]:
!rm data.csv data.h5 # remove the objects from disk

## Financial Analytics Examples

In [None]:
h5 = pd.HDFStore('../data/equities.h5', 'r')

In [None]:
spx = pd.DataFrame(h5['data']['^GSPC'])

In [None]:
spx = spx[(spx.index > '2010-1-1') & (spx.index < '2017-1-1')]

In [None]:
spx.info()

In [None]:
vix = pd.DataFrame(h5['data']['^VIX'])

In [None]:
vix = vix[(vix.index > '2010-1-1') & (vix.index < '2017-1-1')]

In [None]:
vix.info()

Let us combine the respective `Close` columns into a single `DataFrame` object. There are multiple ways to accomplish this.

In [None]:
# construction via join
spxvix = pd.DataFrame(spx['^GSPC']).join(vix['^VIX'])

In [None]:
spxvix.info()

In [None]:
# construction via merge
spxvix = pd.merge(pd.DataFrame(spx['^GSPC']),
                  pd.DataFrame(vix['^VIX']),
                  left_index=True,  # merge on left index
                  right_index=True,  # merge on right index
                 )

In [None]:
spxvix.info()

In [None]:
# construction via dictionary object
spxvix = pd.DataFrame({'SPX': spx['^GSPC'],
                       'VIX': vix['^VIX']},
                       index=spx.index)

In [None]:
spxvix.info()
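
Since the data file used above may not be at hand, the following self-contained sketch applies the same three approaches, plus `pd.concat` as a fourth option, to small synthetic data. The values and the column names `SPX` and `VIX` are assumptions for illustration; with identical indices, all approaches should yield the same values.

```python
import numpy as np
import pandas as pd

# synthetic stand-ins for the two time series
idx = pd.date_range('2017-1-1', periods=5, freq='D')
spx = pd.DataFrame({'SPX': np.arange(5.0) + 100}, index=idx)
vix = pd.DataFrame({'VIX': np.arange(5.0) + 10}, index=idx)

via_join = spx.join(vix)  # join on the index
via_merge = pd.merge(spx, vix, left_index=True, right_index=True)
via_dict = pd.DataFrame({'SPX': spx['SPX'], 'VIX': vix['VIX']},
                        index=spx.index)
via_concat = pd.concat([spx, vix], axis=1)  # column-wise concatenation

for frame in (via_merge, via_dict, via_concat):
    print(np.array_equal(via_join.values, frame.values))
```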

In [None]:
spxvix.plot(figsize=(10, 6), subplots=True)
# plt.savefig('../../images/chA/example_01.png');

In [None]:
rets = np.log(spxvix / spxvix.shift(1))

In [None]:
rets = rets.dropna()

In [None]:
rets.head()

In [None]:
rg = np.polyfit(rets['SPX'], rets['VIX'], 1)

In [None]:
rets.plot(kind='scatter', x='SPX', y='VIX',
          style='.', figsize=(10, 6));
plt.plot(rets['SPX'], np.polyval(rg, rets['SPX']), 'r.-')
# plt.savefig('../../images/chA/example_02.png');

In [None]:
ret = rets.mean() * 252  # annualized return

In [None]:
ret

In [None]:
vol = rets.std() * math.sqrt(252)  # annualized volatility

In [None]:
vol

In [None]:
(ret - 0.01) / vol  # Sharpe ratio with rf = 0.01
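
The same three statistics can be reproduced on synthetic data. The sketch below assumes 252 trading days per year and a risk-free rate of 0.01, as above; the simulated price series and its parameters are purely illustrative.

```python
import math
import numpy as np
import pandas as pd

rng = np.random.default_rng(1000)
# simulated daily prices (illustrative drift and volatility)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, 252))))

rets = np.log(prices / prices.shift(1)).dropna()  # daily log returns

ret = rets.mean() * 252  # annualized return
vol = rets.std() * math.sqrt(252)  # annualized volatility
rf = 0.01  # risk-free rate, as assumed above
sharpe = (ret - rf) / vol  # Sharpe ratio

print(ret, vol, sharpe)
```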

In [None]:
plt.figure(figsize=(10, 6))
spxvix['SPX'].plot(label='S&P 500')
spxvix['SPX'].cummax().plot(label='running maximum')
plt.legend(loc=0)
# plt.savefig('../../images/chA/example_03.png');

In [None]:
adrawdown = spxvix['SPX'].cummax() - spxvix['SPX']

In [None]:
adrawdown.max()

In [None]:
rdrawdown = ((spxvix['SPX'].cummax() - spxvix['SPX']) /
              spxvix['SPX'].cummax())

In [None]:
rdrawdown.max()

In [None]:
temp = adrawdown[adrawdown == 0]

In [None]:
periods_spx = (temp.index[1:].to_pydatetime() -
               temp.index[:-1].to_pydatetime())

In [None]:
periods_spx[50:60]  # some selected data points

In [None]:
max(periods_spx)
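
The drawdown calculations are easier to verify by hand on a small example. The following sketch uses a hypothetical six-day price series; all values are made up for illustration.

```python
import pandas as pd

idx = pd.date_range('2017-1-2', periods=6, freq='D')
prices = pd.Series([100., 110., 105., 95., 120., 90.], index=idx)

running_max = prices.cummax()  # highest price seen so far
adrawdown = running_max - prices  # absolute drawdown
rdrawdown = adrawdown / running_max  # relative drawdown

print(adrawdown.max())  # 30.0 (from 120 down to 90)
print(rdrawdown.max())  # 0.25 (30 / 120)

# dates at which the series marks a new high (drawdown of zero)
temp = adrawdown[adrawdown == 0]
periods = temp.index[1:] - temp.index[:-1]  # time between new highs
print(periods.max())  # longest period between two highs
```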


<a href="http://tpq.io" target="_blank">http://tpq.io</a> | <a href="http://twitter.com/dyjh" target="_blank">@dyjh</a> | <a href="mailto:training@tpq.io">training@tpq.io</a>

**Python Quant Platform** |
<a href="http://quant-platform.com">http://quant-platform.com</a>

**Python for Finance** |
<a href="http://python-for-finance.com" target="_blank">Python for Finance @ O'Reilly</a>

**Derivatives Analytics with Python** |
<a href="http://derivatives-analytics-with-python.com" target="_blank">Derivatives Analytics @ Wiley Finance</a>

**Listed Volatility and Variance Derivatives** |
<a href="http://lvvd.tpq.io" target="_blank">Listed VV Derivatives @ Wiley Finance</a>

**Python Training** |
<a href="http://training.tpq.io" target="_blank">Python for Finance University Certificate</a>