## Vectorized Backtesting: Vectorization with NumPy and pandas

*[Coding along with Python for Algorithmic Trading, Yves Hilpisch, O'Reilly, 1st edition November 2020, ISBN-13: 978-1492053354]*

Testing ideas and hypotheses for an algorithmic trading program is a highly technical part of developing a trading strategy.

> *Vectorized backtesting is a method of testing trading strategies where all trades are calculated simultaneously using arrays/matrices of historical market data, rather than processing trades one by one in a loop. This approach is typically much faster than event-driven backtesting since it leverages efficient array operations, though it may not capture some real-world trading complexities as accurately (claude.ai).*

### Vectorization with NumPy

The NumPy package for numerical computing brings vectorization into the Python ecosystem. According to its [website](https://numpy.org/), *"the NumPy vectorization, indexing, and broadcasting concepts are the de-facto standards of array computing today."* NumPy allows for vectorization techniques based on the regular array class `ndarray`.

In [40]:
import numpy as np

In [41]:
v = [1, 2, 3, 4, 5] # python list object

In [42]:
a = np.array(v) # instantiating an ndarray object from a list
a

array([1, 2, 3, 4, 5])

In [43]:
type(a) # checking the type of the object

numpy.ndarray

In [44]:
# scalar multiplication in vectorized fashion
2 * a

array([ 2,  4,  6,  8, 10])

In [45]:
0.5 * a + 2 # linear transformation in vectorized fashion

array([2.5, 3. , 3.5, 4. , 4.5])
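
To make the contrast with plain Python concrete, the following minimal sketch applies the same linear transformation once with a list comprehension and once as a vectorized expression. The names `looped` and `vectorized` are only illustrative and do not come from the book.

```python
import numpy as np

v = [1, 2, 3, 4, 5]  # plain Python list
a = np.array(v)      # the same data as an ndarray

# element-by-element computation in a Python-level loop
looped = [0.5 * x + 2 for x in v]

# the same computation as a single vectorized expression
vectorized = 0.5 * a + 2

# both approaches yield the same numbers
print(np.allclose(looped, vectorized))
```

On five elements the difference in speed is negligible; the vectorized form tends to pay off as the arrays grow.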

__Transition from a one-dimensional array (a vector) to a higher-dimensional structure (a matrix)__

In [46]:
# arange() creates a one-dimensional ndarray object
# reshape() reshapes it to two dimensions
a = np.arange(12).reshape([3,4])
a

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [47]:
# scalar multiplication in vectorized fashion
2 * a

array([[ 0,  2,  4,  6],
       [ 8, 10, 12, 14],
       [16, 18, 20, 22]])

In [48]:
# calculating the square in vectorized fashion
a ** 2

array([[  0,   1,   4,   9],
       [ 16,  25,  36,  49],
       [ 64,  81, 100, 121]])

__Methods and NumPy-level functions that allow vectorized operations__

In [49]:
a.mean() # the mean of all elements by method call

np.float64(5.5)

In [50]:
np.mean(a) # the mean of all elements by NumPy function

np.float64(5.5)

In [51]:
a.mean(axis=0) # the mean along the first axis by method call

array([4., 5., 6., 7.])

In [52]:
np.mean(a, axis=1) # the mean along the second axis by NumPy function

array([1.5, 5.5, 9.5])
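
The `axis` argument combines naturally with broadcasting, which the NumPy quote above mentions. As a small sketch (the variable names are illustrative), the means along each axis can be subtracted from the array without writing a loop:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

col_means = a.mean(axis=0)                 # shape (4,): one mean per column
row_means = a.mean(axis=1, keepdims=True)  # shape (3, 1): one mean per row

# broadcasting stretches the means across the matching axis
demeaned_cols = a - col_means  # every column now averages to zero
demeaned_rows = a - row_means  # every row now averages to zero

print(demeaned_cols.mean(axis=0))
print(demeaned_rows.mean(axis=1))
```

`keepdims=True` keeps the row means as a column vector of shape `(3, 1)`, so that they line up with the rows of `a` during the subtraction.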

### Vectorization with pandas

In [53]:
import pandas as pd

__pandas allows vectorization over time series data.__

> Vectorization in pandas leverages NumPy's efficient array operations under the hood to perform calculations on entire DataFrame columns simultaneously, rather than using slower element-by-element operations through Python loops. The DataFrame class is designed to take advantage of this vectorized approach, allowing operations like mathematical calculations, filtering, and aggregations to be performed rapidly across large datasets by treating columns as NumPy arrays and applying operations to them as a unit (claude.ai).
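
As a rough illustration of what vectorization over time series data can look like, the sketch below computes log returns for a made-up price series, once as a vectorized expression and once with an explicit loop. The prices and the names `price`, `log_returns` and `looped` are assumptions for this example only.

```python
import numpy as np
import pandas as pd

index = pd.date_range('2021-7-1', periods=5, freq='B')
# made-up prices, for illustration only
price = pd.Series([100.0, 101.0, 99.5, 102.0, 103.5], index=index)

# vectorized: divide each price by the previous one, then take logs
log_returns = np.log(price / price.shift(1))

# the same calculation as an explicit Python loop, for comparison
looped = [np.nan] + [float(np.log(price.iloc[i] / price.iloc[i - 1]))
                     for i in range(1, len(price))]

print(log_returns)
```

The first value is `NaN` in both versions because there is no previous price to compare against.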

In [54]:
# example, beginning with a two-dimensional array object
a = np.arange(15).reshape(5, 3)
a

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [55]:
columns = list('abc') # generating a list object with column names
columns

['a', 'b', 'c']

In [56]:
# pandas DatetimeIndex object with a business-day frequency
index = pd.date_range('2021-7-1', periods=5, freq='B')
index

DatetimeIndex(['2021-07-01', '2021-07-02', '2021-07-05', '2021-07-06',
               '2021-07-07'],
              dtype='datetime64[ns]', freq='B')

In [57]:
# stitching together the DataFrame,
# which means instantiating a DataFrame object based on a (the ndarray),
# with columns as column labels and index as index values
df = pd.DataFrame(a, columns=columns, index=index)
df

             a   b   c
2021-07-01   0   1   2
2021-07-02   3   4   5
2021-07-05   6   7   8
2021-07-06   9  10  11
2021-07-07  12  13  14


__Examples of vectorization; aggregation operations default to column-wise results__

In [58]:
2 * df

             a   b   c
2021-07-01   0   2   4
2021-07-02   6   8  10
2021-07-05  12  14  16
2021-07-06  18  20  22
2021-07-07  24  26  28


In [59]:
df.sum() # column-wise results

a    30
b    35
c    40
dtype: int64

In [60]:
df.mean() # mean per column

a    6.0
b    7.0
c    8.0
dtype: float64

In [61]:
np.mean(df) # mean over all data with numpy

np.float64(7.0)
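
Note that, as far as I know, older pandas versions may return column-wise means for `np.mean(df)` instead of a single number. The following sketch computes the overall mean in two ways that should not depend on that behavior; the names `overall` and `also_overall` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=list('abc'))

# mean over all elements, computed on the underlying ndarray
overall = df.to_numpy().mean()

# mean of the column means; equal to the overall mean here because
# all columns have the same length and there are no missing values
also_overall = df.mean().mean()

print(overall, also_overall)
```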

__Column-wise operations__

In [62]:
# column-wise operations can be implemented by referencing the respective column names
df['a'], df['c']

(2021-07-01     0
 2021-07-02     3
 2021-07-05     6
 2021-07-06     9
 2021-07-07    12
 Freq: B, Name: a, dtype: int64,
 2021-07-01     2
 2021-07-02     5
 2021-07-05     8
 2021-07-06    11
 2021-07-07    14
 Freq: B, Name: c, dtype: int64)

In [63]:
df['a'] + df['c'] # bracket notation

2021-07-01     2
2021-07-02     8
2021-07-05    14
2021-07-06    20
2021-07-07    26
Freq: B, dtype: int64

In [64]:
0.5 * df.a + 2 * df.b + df.c # dot notation

2021-07-01     4.0
2021-07-02    14.5
2021-07-05    25.0
2021-07-06    35.5
2021-07-07    46.0
Freq: B, dtype: float64

In [65]:
# conditions yielding Boolean result vectors
df['a'] > 5

2021-07-01    False
2021-07-02    False
2021-07-05     True
2021-07-06     True
2021-07-07     True
Freq: B, Name: a, dtype: bool

In [66]:
# conditions yielding SQL-like selections
df[df['a'] > 5] # select all rows where the element in column a is greater than five

             a   b   c
2021-07-05   6   7   8
2021-07-06   9  10  11
2021-07-07  12  13  14
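
Selections like the one above can also combine several conditions. A minimal sketch, rebuilding the same `DataFrame` so that it runs on its own (the names `both` and `either` are illustrative):

```python
import numpy as np
import pandas as pd

index = pd.date_range('2021-7-1', periods=5, freq='B')
df = pd.DataFrame(np.arange(15).reshape(5, 3),
                  columns=list('abc'), index=index)

# each condition needs its own parentheses;
# & means "and", | means "or", ~ means "not"
both = df[(df['a'] > 5) & (df['c'] < 14)]     # rows meeting both conditions
either = df[(df['a'] < 3) | (df['b'] > 10)]   # rows meeting at least one

print(both)
print(either)
```

The Python keywords `and` and `or` do not work element-wise on `Series` objects, which is why the operators `&` and `|` are used here.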


__Important for vectorized backtesting: comparisons between two columns__

In [67]:
df['c'] > df['b']

2021-07-01    True
2021-07-02    True
2021-07-05    True
2021-07-06    True
2021-07-07    True
Freq: B, dtype: bool

In [68]:
0.15 * df.a + df.b > df.c

2021-07-01    False
2021-07-02    False
2021-07-05    False
2021-07-06     True
2021-07-07     True
Freq: B, dtype: bool