# NumPy Operations
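NumPy is a Python package for fast numerical computing with arrays, and it provides much of the underlying functionality of pandas. In fact, we encounter NumPy every time we see a `NaN` value: by default, pandas marks missing data with NumPy's floating-point `np.nan`. Pandas also relies on NumPy under the hood to optimise many of its internal computations.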

Before we start, let's load `pandas`, `numpy` and our dataset. Notice that, like pandas (`pd`), NumPy has a conventional short alias: `np`.

In [None]:
import pandas as pd
import numpy as np

In [None]:
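# Load the cleaned 2024 AAPL prices and parse the dates
df = pd.read_csv("data/AAPL_2024_clean.csv")
df["Date"] = pd.to_datetime(df["Date"])
# drop_duplicates() ignores the index, so remove fully duplicated rows
# (date included) before making Date the index
df = df.drop_duplicates().set_index("Date").sort_index()
df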

## Log Returns

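Logarithmic (log) returns are often used in finance because of their convenient mathematical properties. In particular, they are additive over time: the log return over several periods is simply the sum of the log returns for each period, which makes them well suited to analysing historical returns over multiple periods. Consider the following case: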

- You invest £100
- On the first day the simple daily return is 10%. Your investment is now worth £110
- On the second day your simple daily return is -10%. But now your investment is only worth £99

If you just added the simple returns (10% + (-10%)), you might assume there had been no net change. Log returns account for this compounding: adding the two log returns in this example gives $\ln(1.1) + \ln(0.9) = \ln(0.99) \approx -0.01005$ (about -1.005%), and exponentiating that sum recovers the actual -1% loss (£100 down to £99). The formula for calculating log returns is below.

$$ r = \ln\left(\frac{\text{price}_{\text{current}}}{\text{price}_{\text{original}}}\right) $$
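As a quick check of the £100 example, here is a minimal sketch that applies the formula to a made-up three-day price series (£100, £110, £99) rather than the AAPL data. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for the £100 example: start, after +10%, after -10%
prices = pd.Series([100.0, 110.0, 99.0])

# Simple daily returns: +10% then -10% (the first value is NaN, as there is no previous day)
simple = prices.pct_change()

# Log daily returns using the formula ln(current / original)
log_returns = np.log(prices / prices.shift(1))

print(simple.sum())                     # the simple returns cancel out to roughly 0
print(log_returns.sum())                # ln(0.99), roughly -0.01005
print(100 * np.exp(log_returns.sum()))  # recovers the final value, roughly 99
```

The two simple returns appear to cancel out, while the summed log returns, once exponentiated, give back the real end value.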

For daily returns, we'll use `shift()` to get the *original* price (i.e. the previous day's close). We'll store it in a new column so each step is easy to see, and then use NumPy's `log` function to calculate the log returns.

In [None]:
# Shift to get the previous day's close on the same row
df["Prev Close"] = df["Adj Close"].shift(1)
df

In [None]:
# Apply the formula: ln(current / previous)
df["Log Returns"] = np.log(df["Adj Close"] / df["Prev Close"])
df

In [None]:
# Because log(a/b) = log(a) - log(b)
df["Simpler Log Returns"] = np.log(df["Adj Close"]) - np.log(df["Prev Close"])
df

In [None]:
# Finding the difference between consecutive rows is so common that pandas has diff()
df["Even Simpler Log Returns"] = np.log(df["Adj Close"]).diff()
df

In [None]:
# We can also take the log of 1 + the simple daily return
df["From simple to log"] = np.log(1 + df["Adj Close"].pct_change())
df
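All four log-return columns should hold the same values, up to floating-point rounding. The sketch below checks this on a small made-up price series (not the AAPL data) using `np.allclose`, which compares arrays element-wise within a tolerance; `equal_nan=True` makes the leading `NaN` in each result count as a match.

```python
import numpy as np
import pandas as pd

# Hypothetical adjusted close prices, for illustration only
adj_close = pd.Series([100.0, 102.5, 101.0, 105.0, 104.2])

# Four equivalent ways of computing daily log returns
via_ratio = np.log(adj_close / adj_close.shift(1))
via_difference = np.log(adj_close) - np.log(adj_close.shift(1))
via_diff = np.log(adj_close).diff()
via_pct_change = np.log(1 + adj_close.pct_change())

# Compare each against the ratio version, tolerating rounding noise and NaN == NaN
for other in (via_difference, via_diff, via_pct_change):
    print(np.allclose(via_ratio, other, equal_nan=True))  # True
```

On the real data, the same check can be run against the columns created above.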

### Exercise: Cumulatively Comparing

The sum of the log returns over a period is the natural logarithm of the cumulative *gross* return (the final price divided by the starting price). To get the cumulative return from the log returns, sum the log returns over the period, exponentiate the sum (NumPy has an `exp` function for this) and subtract 1.

Calculate the cumulative return from the simple daily returns of the adjusted close, and compare it with the cumulative return calculated from the log returns.
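To see why this works, write $P_t$ for the adjusted close on day $t$, $r_t$ for that day's log return and $R_t = P_t / P_{t-1} - 1$ for its simple return (notation introduced here just for the derivation). The sum of the log returns telescopes:

$$ \sum_{t=1}^{T} r_t = \sum_{t=1}^{T} \ln\left(\frac{P_t}{P_{t-1}}\right) = \sum_{t=1}^{T} \left(\ln P_t - \ln P_{t-1}\right) = \ln P_T - \ln P_0 = \ln\left(\frac{P_T}{P_0}\right) $$

so

$$ \exp\left(\sum_{t=1}^{T} r_t\right) - 1 = \frac{P_T}{P_0} - 1 = \prod_{t=1}^{T} \left(1 + R_t\right) - 1 $$

which is the same cumulative return you get by compounding the simple daily returns.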

## Other useful functions

Another useful NumPy function is `np.where()`, often used to populate columns with a signal or indicator depending on whether a condition is met. Let's create a column to colour-code our trading days. Each day gets a different colour depending on whether the market closes higher (green) or lower (red) than it opened.

In [None]:
df["Colour"] = np.where(df["Close"] > df["Open"], "green", "red")
df
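`np.where(condition, value_if_true, value_if_false)` chooses between two values element by element. Because the comparison above is strict, a day that closes exactly at its open is coloured red. If you wanted such days marked separately (an assumption about what you might want, not part of the original analysis), one option is to nest a second `np.where`. The sketch below uses a tiny made-up DataFrame and a hypothetical `"grey"` label.

```python
import numpy as np
import pandas as pd

# Hypothetical open/close prices for three trading days
days = pd.DataFrame({"Open": [100.0, 101.0, 102.0], "Close": [101.5, 101.0, 100.0]})

# Green if the close is above the open; otherwise grey if unchanged, else red
days["Colour"] = np.where(
    days["Close"] > days["Open"],
    "green",
    np.where(days["Close"] == days["Open"], "grey", "red"),
)
print(days["Colour"].tolist())  # ['green', 'grey', 'red']
```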

There is more to NumPy that is well worth exploring on your own!