# NumPy Operations
NumPy is a Python package that provides much of the underlying functionality of pandas. In fact, we encounter NumPy every time we see a `NaN` value: in float columns, pandas represents missing data with NumPy's `np.nan`. Pandas also uses NumPy under the hood to optimise many of its internal computations.
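To see this connection directly, here is a small sketch. It shows that a missing entry in a float Series is stored as NumPy's `np.nan`, and that `.to_numpy()` returns the underlying NumPy array. The Series `s` is a made-up example, not part of the dataset.

```python
import numpy as np
import pandas as pd

# A made-up float Series with one missing entry
s = pd.Series([1.0, None, 3.0])

# pandas stores the missing value as NumPy's NaN in a float64 column
print(s.dtype)              # float64
print(np.isnan(s.iloc[1]))  # True

# The values themselves live in a NumPy array
arr = s.to_numpy()
print(type(arr))            # <class 'numpy.ndarray'>
```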

Before we start, let's load `pandas`, `numpy` and our dataset. Notice that NumPy also has a conventional short alias, `np`, just as pandas has `pd`.

```python
import pandas as pd
import numpy as np
```
```python
# Load the daily AAPL price data
df = pd.read_csv("https://raw.githubusercontent.com/ImperialCollegeLondon/efds-ta-python/refs/heads/main/data/AAPL_2024_clean.csv")
# Convert the Date strings to datetimes and use them as the index
df.Date = pd.to_datetime(df.Date)
df.set_index("Date", inplace=True)
df
```

| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2024-01-02 | 187.149994 | 188.440002 | 183.889999 | 185.639999 | 185.403412 | 82488700 |
| 2024-01-03 | 184.220001 | 185.880005 | 183.429993 | 184.250000 | 184.015198 | 58414500 |
| 2024-01-04 | 182.149994 | 183.089996 | 180.880005 | 181.910004 | 181.678177 | 71983600 |
| 2024-01-05 | 181.990005 | 182.759995 | 180.169998 | 181.179993 | 180.949097 | 62303300 |
| 2024-01-08 | 182.089996 | 185.600006 | 181.500000 | 185.559998 | 185.323517 | 59144500 |
| ... | ... | ... | ... | ... | ... | ... |
| 2024-05-23 | 190.979996 | 191.000000 | 186.630005 | 186.880005 | 186.880005 | 51005900 |
| 2024-05-24 | 188.820007 | 189.979996 | 188.039993 | 189.979996 | 189.979996 | 36294600 |
| 2024-05-28 | 191.509995 | 193.000000 | 189.100006 | 189.990005 | 189.990005 | 52280100 |
| 2024-05-29 | 189.610001 | 192.250000 | 189.509995 | 190.289993 | 190.289993 | 53068000 |
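As an aside, the date conversion and indexing steps can also be folded into the `read_csv` call through its `parse_dates` and `index_col` parameters. This is a minimal sketch that uses an in-memory CSV (`csv_text`, built from the first two rows shown above) in place of the real URL, so it runs without a network connection; `df_small` is a hypothetical name.

```python
import io

import pandas as pd

# Two rows copied from the output above, standing in for the downloaded file
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2024-01-02,187.149994,188.440002,183.889999,185.639999,185.403412,82488700
2024-01-03,184.220001,185.880005,183.429993,184.250000,184.015198,58414500
"""

# Parse the Date column as datetimes and make it the index in one call
df_small = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
print(df_small)
```

The result has the same shape as `df` after the three separate steps: a `DatetimeIndex` named `Date` and the six price and volume columns.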


## Log Returns

Logarithmic returns are often used in finance because of their convenient statistical properties. In particular, they are additive over time, which makes them well suited to measuring returns over multiple periods. Consider the following case:

- You invest £100
- On the first day the simple daily return is 10%. Your investment is now worth £110
- On the second day your simple daily return is -10%. But now your investment is only worth £99

If you only looked at the simple returns (+10% and -10%), you might assume the net change was zero. Log returns account for this compounding: adding the two log returns in this example gives ln(1.1) + ln(0.9) = ln(0.99) ≈ -0.01005, or about -1.005%, and exponentiating that sum and subtracting 1 recovers the actual -1% loss. The formula for calculating log returns is below.

$$ \ln\left(\frac{\text{price}_{\text{current}}}{\text{price}_{\text{original}}}\right) $$
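Applying this formula to the £100 example above confirms the arithmetic. This is a minimal sketch; the array `prices` is a hypothetical name holding the three portfolio values from that example.

```python
import numpy as np

# Portfolio value at the start, after day 1 and after day 2
prices = np.array([100.0, 110.0, 99.0])

# Daily simple returns: +10% then -10%
simple_returns = prices[1:] / prices[:-1] - 1

# Daily log returns: ln(price_current / price_original)
log_returns = np.log(prices[1:] / prices[:-1])

print(simple_returns.sum())           # about 0 (tiny floating-point residue), hiding the loss
print(log_returns.sum())              # about -0.01005, i.e. ln(99 / 100)
print(np.exp(log_returns.sum()) - 1)  # about -0.01, the true -1% change
```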

For daily returns, we'll use `shift()` to get the *original* price, i.e. the previous day's adjusted close. We'll store it in a new column to make the process easier to follow. Then we'll use NumPy's `log` function to calculate the log returns.
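Before applying `shift(1)` to the full dataset, here is a small sketch of what it does. It uses the first three adjusted closes from the table above as a stand-in Series (`close` and `prev_close` are hypothetical names).

```python
import numpy as np
import pandas as pd

# First three adjusted closes from the table above
close = pd.Series(
    [185.403412, 184.015198, 181.678177],
    index=pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04"]),
    name="Adj Close",
)

# shift(1) moves every value down one row; the first row has no
# previous day, so it becomes NaN
prev_close = close.shift(1)

log_returns = np.log(close / prev_close)
print(pd.DataFrame({"Adj Close": close, "Prev_Close": prev_close, "LogReturns": log_returns}))
```

The NaN in the first row is why the first log return is also missing in the full dataset.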

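```python
# Shift the adjusted close down one row so each row holds the previous day's price
df["Prev_Close"] = df["Adj Close"].shift(1)

# Three equivalent ways to compute daily log returns:
# 1. log of the price ratio
np.log(df["Adj Close"] / df.Prev_Close)
# 2. difference of the log prices
np.log(df["Adj Close"]) - np.log(df.Prev_Close)
# 3. log of (1 + simple return); this is the version we store
df["LogReturns"] = np.log(1 + df["Adj Close"].pct_change())
df
```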


| Date | Open | High | Low | Close | Adj Close | Volume | Prev_Close | LogReturns |
|---|---|---|---|---|---|---|---|---|
| 2024-01-02 | 187.149994 | 188.440002 | 183.889999 | 185.639999 | 185.403412 | 82488700 | NaN | NaN |
| 2024-01-03 | 184.220001 | 185.880005 | 183.429993 | 184.250000 | 184.015198 | 58414500 | 185.403412 | -0.007516 |
| 2024-01-04 | 182.149994 | 183.089996 | 180.880005 | 181.910004 | 181.678177 | 71983600 | 184.015198 | -0.012781 |
| 2024-01-05 | 181.990005 | 182.759995 | 180.169998 | 181.179993 | 180.949097 | 62303300 | 181.678177 | -0.004021 |
| 2024-01-08 | 182.089996 | 185.600006 | 181.500000 | 185.559998 | 185.323517 | 59144500 | 180.949097 | 0.023887 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-05-23 | 190.979996 | 191.000000 | 186.630005 | 186.880005 | 186.880005 | 51005900 | 189.615005 | -0.014529 |
| 2024-05-24 | 188.820007 | 189.979996 | 188.039993 | 189.979996 | 189.979996 | 36294600 | 186.880005 | 0.016452 |
| 2024-05-28 | 191.509995 | 193.000000 | 189.100006 | 189.990005 | 189.990005 | 52280100 | 189.979996 | 0.000053 |
| 2024-05-29 | 189.610001 | 192.250000 | 189.509995 | 190.289993 | 190.289993 | 53068000 | 189.990005 | 0.001578 |
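The three expressions for the log return are mathematically identical, since ln(a/b) = ln(a) - ln(b) and a/b = 1 + (a/b - 1). Here is a quick sketch to confirm they also agree numerically, using the last four adjusted closes shown in the table as stand-in data (`adj_close` and the `*_method` names are hypothetical).

```python
import numpy as np
import pandas as pd

# Last four adjusted closes from the output above (2024-05-23 to 2024-05-29)
adj_close = pd.Series([186.880005, 189.979996, 189.990005, 190.289993])

ratio_method = np.log(adj_close / adj_close.shift(1))
diff_method = np.log(adj_close) - np.log(adj_close.shift(1))
pct_method = np.log(1 + adj_close.pct_change())

# Drop the leading NaN, then compare element-wise within floating-point tolerance
print(np.allclose(ratio_method.dropna(), diff_method.dropna()))  # True
print(np.allclose(ratio_method.dropna(), pct_method.dropna()))   # True
```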


### Exercise: Cumulatively Comparing

The sum of the log returns is the natural logarithm of the gross cumulative return, i.e. ln(1 + cumulative return), which equals ln(final price / initial price). To recover the cumulative return from the log returns, sum them over the period, exponentiate the sum (NumPy has an `exp` function for this) and subtract 1.

Calculate the cumulative return from the daily simple returns of the adjusted close, and compare it with the cumulative return calculated from the log returns.

```python
# Cumulative return from log returns: sum them, exponentiate, subtract 1
# (.sum() skips the NaN in the first row; use .cumsum() to see the running total for each day)
log_cumulative = np.exp(df.LogReturns.sum()) - 1

# Cumulative return from simple returns: compound the daily growth factors (1 + r)
simple_cumulative = (1 + df["Adj Close"].pct_change()).prod() - 1

# The two figures should match up to floating-point error
log_cumulative, simple_cumulative
```
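Both approaches should give the same number, because summing log returns and compounding simple returns both reduce to the ratio of the last price to the first. A sketch checking this on the first five adjusted closes from the table, including the running (`cumsum`/`cumprod`) versions; all variable names here are hypothetical.

```python
import numpy as np
import pandas as pd

# First five adjusted closes from the table (2024-01-02 to 2024-01-08)
adj_close = pd.Series([185.403412, 184.015198, 181.678177, 180.949097, 185.323517])

simple_returns = adj_close.pct_change()
log_returns = np.log(1 + simple_returns)

# Whole-period cumulative return, three ways (the first-row NaN is skipped)
from_logs = np.exp(log_returns.sum()) - 1
from_simple = (1 + simple_returns).prod() - 1
direct = adj_close.iloc[-1] / adj_close.iloc[0] - 1

# Running cumulative return as of each day
running_logs = np.exp(log_returns.cumsum()) - 1
running_simple = (1 + simple_returns).cumprod() - 1

print(from_logs, from_simple, direct)
print(np.allclose(running_logs.dropna(), running_simple.dropna()))  # True
```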


## Other useful functions

Another useful NumPy function is `np.where()`, often used to populate a column with a signal or indicator depending on whether a condition is met. Let's create a column to colour-code our trading days: a day is green if the market closes higher than it opened, and red otherwise.

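```python
# np.where(condition, value_if_true, value_if_false)
# Days where Close equals Open also fail the test and are labelled "red"
df["Colour"] = np.where(df.Close > df.Open, "green", "red")
```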
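A small sketch of how `np.where` labels each row, using a made-up three-day frame `toy` (hypothetical prices, not from the dataset). Note that the third day closes exactly at its open, fails the `>` test, and is labelled red.

```python
import numpy as np
import pandas as pd

# Hypothetical open/close prices for three trading days
toy = pd.DataFrame({"Open": [100.0, 105.0, 102.0], "Close": [104.0, 101.0, 102.0]})

# np.where evaluates the condition element-wise and picks a label per row
toy["Colour"] = np.where(toy.Close > toy.Open, "green", "red")
print(toy["Colour"].tolist())  # ['green', 'red', 'red']
```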

There is more to NumPy that is well worth exploring on your own!