In [35]:
import pandas as pd
import numpy as np

Historical volatility is commonly taken to be the standard deviation of daily returns for the last calendar month.  There are other definitions one could use, however.  We will use the common definition and take one month to be 20 trading days.

If you look up the formula for standard deviation, you’ll find two forms given.

The first:
$$\sqrt{ \frac{1}{N} \sum_{i=1}^N(x_i - \mu)^2}$$
where $N$ is the number in the population, and $\mu$ is the population mean.

And:
$$\sqrt{ \frac{1}{n-1} \sum_{i=1}^n(x_i - \bar{x})^2}$$
where $n$ is the number of samples, and $\bar{x}$ is the sample mean.

Notice that one form divides by the number of points and the other by one less than the number of points.  I don’t want to get into the reason for this now, but unfortunately, Numpy and Pandas default to different divisors.  The default behavior of both Pandas and Numpy can be overridden by setting the <I>ddof</I> option: the divisor is the number of points minus <I>ddof</I>.  Setting it to zero gives a divisor of $N$.  Setting it to one gives $N-1$, etc.  We will use <I>ddof</I> = 1.
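
To make the effect of the divisor concrete, here is a small sketch that computes both forms by hand and checks them against `np.std`.  The array `x` holds made-up numbers chosen only for illustration; it is not SPY data.

```python
import numpy as np

# Illustrative values only, not taken from the SPY data
x = np.array([0.01, -0.02, 0.015, 0.005, -0.01])
n = x.size

# Sum of squared deviations from the mean
ss = np.sum((x - x.mean()) ** 2)

std_pop = np.sqrt(ss / n)            # divisor N      (ddof = 0)
std_sample = np.sqrt(ss / (n - 1))   # divisor n - 1  (ddof = 1)

# Compare the hand calculation with np.std for each ddof
pop_matches = np.isclose(std_pop, np.std(x, ddof=0))
sample_matches = np.isclose(std_sample, np.std(x, ddof=1))

# The two forms differ by a factor of sqrt(n / (n - 1))
ratio = std_sample / std_pop

print(pop_matches, sample_matches, ratio)
```

For a 20-day window the same factor is $\sqrt{20/19}$, so the two conventions differ by roughly 2.6%.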

#  Pandas

The std function in Pandas sets ddof to one by default.

In [36]:
#  Import SPY data as a pandas data frame
SPY = pd.read_csv('SPY.csv')

#  Calculate the percent change of the price
SPY['PCT'] = SPY['Adj Close'].pct_change()

#  Remove the first value which is NaN
SPY_pct = SPY['PCT'].iloc[1:]

#  Calculate the volatility using the rolling function of pandas
#  ddof defaults to a value of 1 for pandas so we don't technically need it
vol_pandas = SPY_pct.rolling(20).std(ddof = 1).dropna()

#  Print the results
print(vol_pandas)

20      0.006299
21      0.006525
22      0.006515
23      0.006465
24      0.006674
          ...   
1253    0.007308
1254    0.007318
1255    0.007272
1256    0.007342
1257    0.007588
Name: PCT, Length: 1238, dtype: float64


#  Numpy

By default, Numpy's std function sets ddof to zero.
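
As a quick sanity check on the two defaults, the sketch below runs both libraries on the same made-up values (the list `values` is illustrative only).  The default results should differ, and they should agree once ddof is set to one in Numpy.

```python
import numpy as np
import pandas as pd

# Illustrative values only, not taken from the SPY data
values = [0.01, -0.02, 0.015, 0.005, -0.01]
s = pd.Series(values)
a = np.array(values)

pandas_default = s.std()    # pandas default: ddof = 1
numpy_default = np.std(a)   # Numpy default:  ddof = 0

# The defaults disagree, but match once Numpy is told to use ddof = 1
defaults_differ = not np.isclose(pandas_default, numpy_default)
agree_with_ddof = np.isclose(pandas_default, np.std(a, ddof=1))

print(defaults_differ, agree_with_ddof)
```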

In [37]:
#  Turn the pandas pct frame into a numpy array
spy_pct = SPY_pct.to_numpy()

#  Calculate the vol for the first 20 points.  We need to set ddof to one here as it defaults to zero
np.std( spy_pct[0:20], ddof = 1)

0.00629862186666375

We can use a loop to get the rest of the points.

In [38]:
period = 20
vol_numpy = np.zeros( (spy_pct.size - period + 1, ))

for i in range(spy_pct.size - period + 1):
    vol_numpy[i] = np.std( spy_pct[i:i + period], ddof = 1)
    
print(vol_numpy)

[0.00629862 0.00652467 0.00651489 ... 0.00727162 0.00734156 0.0075885 ]
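
The loop above can also be written without an explicit Python loop.  One possible sketch uses `sliding_window_view` from `numpy.lib.stride_tricks`, which assumes NumPy 1.20 or later.  The array `returns` is a random stand-in for `spy_pct` so the example runs on its own; it is not the SPY data.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Random stand-in for spy_pct, used only so the sketch is self-contained
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=100)
period = 20

# Each row of `windows` is one 20-point window;
# the shape is (returns.size - period + 1, period)
windows = sliding_window_view(returns, period)

# Sample standard deviation of every window at once
vol_vectorized = windows.std(axis=1, ddof=1)

# The explicit loop, for comparison
vol_loop = np.array([np.std(returns[i:i + period], ddof=1)
                     for i in range(returns.size - period + 1)])

print(np.allclose(vol_vectorized, vol_loop))
```

The windowed view does not copy the data, so this is mainly a convenience for longer series; the loop gives the same values.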


In [39]:
#  Are both pandas and numpy methods the same?
np.allclose(vol_pandas.to_numpy(), vol_numpy)

True