
## Library installation

Install the runtime dependencies so this notebook runs anywhere without manual setup. We include yfinance, pandas, numpy, and matplotlib for data, arrays, and plots.

In [None]:
!pip install yfinance pandas numpy matplotlib

We install pandas even if we do not import it directly because yfinance returns pandas DataFrames and Series. A single cell like this removes environment friction and makes the notebook portable. No special system packages are typically required for these libraries.
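To confirm what the install cell actually resolved, here is a minimal sketch that reads package metadata from the standard library. The helper name installed_versions is ours for illustration, not part of any of these libraries, and it reports a missing package as None rather than raising.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("yfinance", "pandas", "numpy", "matplotlib")):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Hypothetical policy: record the gap instead of failing the cell.
            found[name] = None
    return found


print(installed_versions())
```

Recording the versions next to your results makes it easier to explain differences when the notebook is rerun in another environment.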

## Imports and setup

We import math for constants and square roots, numpy for fast elementwise log operations, yfinance to fetch OHLCV bars, and matplotlib.pyplot to plot the close and our volatility estimates.

In [None]:
import math

In [None]:
import numpy as np
import yfinance as yf
import matplotlib.pyplot as plt

Keeping imports minimal helps focus on measurement rather than tooling. yfinance returns pandas objects, so we can use rolling windows without importing pandas explicitly. These are the only modules we need to build, compare, and visualize the estimators.

## Download OHLCV data for testing

Download daily OHLCV for AAPL over a multi-year span to feed each estimator with realistic bar information. The Open, High, Low, and Close fields capture the intraday ranges and overnight gaps we care about.

In [None]:
data = yf.download("AAPL", start="2017-01-01", end="2022-06-30")

Bar data avoids the close-only trap by retaining the range inside each session and the discontinuity between sessions. A longer sample improves stability of rolling statistics, but we will still see small-sample effects in short windows. You can swap the ticker or dates later to pressure-test how robust your sizing metric is across regimes.
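Before feeding bars to the estimators, it can help to check that they are usable for log ratios. The sketch below is an assumption-laden helper, not part of yfinance. The name prepare_ohlc is hypothetical, it assumes that any two-level columns carry the field names (Open, High, ...) in level 0, and it runs on a small synthetic frame rather than a live download.

```python
import pandas as pd


def prepare_ohlc(frame):
    """Flatten (field, ticker) columns if present and check OHLC consistency."""
    out = frame.copy()
    if isinstance(out.columns, pd.MultiIndex):
        # Assumption: level 0 holds the field names, level 1 the ticker.
        out.columns = out.columns.get_level_values(0)
    ohlc = out[["Open", "High", "Low", "Close"]]
    if (ohlc <= 0).any().any():
        raise ValueError("non-positive prices break log ratios")
    high_ok = out["High"] >= ohlc.max(axis=1)
    low_ok = out["Low"] <= ohlc.min(axis=1)
    if not (high_ok & low_ok).all():
        raise ValueError("High/Low do not bound Open/Close on every bar")
    return out


# Synthetic three-bar frame with two-level columns, for illustration only.
idx = pd.bdate_range("2022-01-03", periods=3)
cols = pd.MultiIndex.from_product([["Open", "High", "Low", "Close"], ["AAPL"]])
demo = pd.DataFrame(
    [[100.0, 102.0, 99.0, 101.0],
     [101.5, 103.0, 100.5, 102.0],
     [102.0, 102.5, 100.0, 100.5]],
    index=idx,
    columns=cols,
)
clean = prepare_ohlc(demo)
print(list(clean.columns))
```

If your download already has flat columns, the helper leaves them unchanged and only runs the checks.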

## Define realized volatility estimators clearly

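Implement six realized volatility estimators on OHLC bars. Each one is annualized by trading_periods (252 by default) and computed over a trailing rolling window, so every value uses only the current bar and earlier ones. The one exception is the scale factor in hodges_tompkins, which depends on the total number of returns in the sample. Each function returns a Series indexed like the input and drops the warm-up NaNs when clean=True.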
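For reference, these are the formulas the functions below compute, transcribed from the code rather than from the original papers. The notation is ours: $N$ is window, $P$ is trading_periods, sums run over the $N$ most recent bars, and $T$ is the number of non-missing log returns in the whole sample.

$$
r_t = \ln\frac{C_t}{C_{t-1}}, \quad
o_t = \ln\frac{O_t}{C_{t-1}}, \quad
u_t = \ln\frac{H_t}{O_t}, \quad
d_t = \ln\frac{L_t}{O_t}, \quad
c_t = \ln\frac{C_t}{O_t}
$$

$$
\sigma_{\text{CC}} = \sqrt{P}\,\sqrt{\frac{1}{N-1}\sum_t \left(r_t - \bar{r}\right)^2},
\qquad
\sigma_{\text{HT}} = \frac{\sigma_{\text{CC}}}{1 - \frac{h}{n} + \frac{h^2 - 1}{3n^2}},
\quad h = N,\; n = T - h + 1
$$

$$
\sigma_{\text{P}} = \sqrt{\frac{P}{N}\sum_t \frac{1}{4\ln 2}\left(\ln\frac{H_t}{L_t}\right)^2},
\qquad
\sigma_{\text{GK}} = \sqrt{\frac{P}{N}\sum_t \left[\frac{1}{2}\left(\ln\frac{H_t}{L_t}\right)^2 - (2\ln 2 - 1)\,c_t^2\right]}
$$

$$
\sigma_{\text{RS}} = \sqrt{\frac{P}{N}\sum_t \left[u_t(u_t - c_t) + d_t(d_t - c_t)\right]}
$$

$$
\sigma_{\text{YZ}} = \sqrt{P\left(V_o + k\,V_c + (1-k)\,V_{rs}\right)},
\qquad
k = \frac{0.34}{1.34 + \frac{N+1}{N-1}}
$$

$$
V_o = \frac{1}{N-1}\sum_t o_t^2, \quad
V_c = \frac{1}{N-1}\sum_t r_t^2, \quad
V_{rs} = \frac{1}{N-1}\sum_t \left[u_t(u_t - c_t) + d_t(d_t - c_t)\right]
$$

As coded, $V_c$ is built from close-to-close returns, and none of the three Yang-Zhang terms subtracts a mean.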

In [None]:
def standard_deviation(price_data, window=30, trading_periods=252, clean=True):
    log_return = (price_data["Close"] / price_data["Close"].shift(1)).apply(np.log)

    result = (
        log_return.rolling(window=window, center=False).std()
        * math.sqrt(trading_periods)
    )

    if clean:
        return result.dropna()
    else:
        return result

In [None]:
def parkinson(price_data, window=30, trading_periods=252, clean=True):
    rs = (1.0 / (4.0 * math.log(2.0))) * (
        (price_data["High"] / price_data["Low"]).apply(np.log)
    ) ** 2.0

    def f(v):
        return (trading_periods * v.mean()) ** 0.5

    result = rs.rolling(window=window, center=False).apply(func=f)

    if clean:
        return result.dropna()
    else:
        return result

In [None]:
def garman_klass(price_data, window=30, trading_periods=252, clean=True):
    log_hl = (price_data["High"] / price_data["Low"]).apply(np.log)
    log_co = (price_data["Close"] / price_data["Open"]).apply(np.log)

    rs = 0.5 * log_hl ** 2 - (2 * math.log(2) - 1) * log_co ** 2

    def f(v):
        return (trading_periods * v.mean()) ** 0.5

    result = rs.rolling(window=window, center=False).apply(func=f)

    if clean:
        return result.dropna()
    else:
        return result

In [None]:
def hodges_tompkins(price_data, window=30, trading_periods=252, clean=True):
    log_return = (price_data["Close"] / price_data["Close"].shift(1)).apply(np.log)

    vol = (
        log_return.rolling(window=window, center=False).std()
        * math.sqrt(trading_periods)
    )

    h = window
    n = (log_return.count() - h) + 1

    adj_factor = 1.0 / (
        1.0 - (h / n) + ((h ** 2 - 1) / (3 * (n ** 2)))
    )

    result = vol * adj_factor

    if clean:
        return result.dropna()
    else:
        return result
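
The adjustment in hodges_tompkins is a single constant applied to the whole rolling series, and it depends on the full sample length through n. A minimal sketch of that factor in isolation follows. The function name hodges_tompkins_factor and the sample sizes are illustrative assumptions; 1381 is only a rough stand-in for the number of daily returns in a sample of about five and a half years.

```python
def hodges_tompkins_factor(window, n_returns):
    """Scale factor that hodges_tompkins applies to rolling close-to-close vol."""
    h = window
    n = n_returns - h + 1  # number of overlapping windows in the sample
    return 1.0 / (1.0 - h / n + (h ** 2 - 1) / (3 * n ** 2))


# The factor shrinks toward 1 as the sample grows relative to the window.
for n_returns in (250, 500, 1381):
    print(n_returns, round(hodges_tompkins_factor(30, n_returns), 4))
```

Because the factor uses the total count of returns, extending or shortening the download window rescales every historical value of this estimator, which is worth keeping in mind when comparing runs.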

In [None]:
def rogers_satchell(price_data, window=30, trading_periods=252, clean=True):
    log_ho = (price_data["High"] / price_data["Open"]).apply(np.log)
    log_lo = (price_data["Low"] / price_data["Open"]).apply(np.log)
    log_co = (price_data["Close"] / price_data["Open"]).apply(np.log)

    rs = log_ho * (log_ho - log_co) + log_lo * (log_lo - log_co)

    def f(v):
        return (trading_periods * v.mean()) ** 0.5

    result = rs.rolling(window=window, center=False).apply(func=f)

    if clean:
        return result.dropna()
    else:
        return result

In [None]:
def yang_zhang(price_data, window=30, trading_periods=252, clean=True):
    log_ho = (price_data["High"] / price_data["Open"]).apply(np.log)
    log_lo = (price_data["Low"] / price_data["Open"]).apply(np.log)
    log_co = (price_data["Close"] / price_data["Open"]).apply(np.log)

    log_oc = (price_data["Open"] / price_data["Close"].shift(1)).apply(np.log)
    log_oc_sq = log_oc ** 2

    log_cc = (price_data["Close"] / price_data["Close"].shift(1)).apply(np.log)
    log_cc_sq = log_cc ** 2

    rs = log_ho * (log_ho - log_co) + log_lo * (log_lo - log_co)

    close_vol = (
        log_cc_sq.rolling(window=window, center=False).sum()
        * (1.0 / (window - 1.0))
    )
    open_vol = (
        log_oc_sq.rolling(window=window, center=False).sum()
        * (1.0 / (window - 1.0))
    )
    window_rs = (
        rs.rolling(window=window, center=False).sum()
        * (1.0 / (window - 1.0))
    )

    k = 0.34 / (1.34 + (window + 1) / (window - 1))
    result = (
        (open_vol + k * close_vol + (1 - k) * window_rs).apply(np.sqrt)
        * math.sqrt(trading_periods)
    )

    if clean:
        return result.dropna()
    else:
        return result

standard_deviation is the close-to-close baseline; hodges_tompkins scales it up to correct the downward bias that overlapping windows introduce. parkinson and garman_klass use the high–low range (garman_klass also uses the open-to-close move) for higher statistical efficiency; rogers_satchell is drift-robust, and yang_zhang blends the overnight gap with the intraday range to handle close-to-open discontinuities. These options let us see how bar-based measures react to spikes and gaps that closes miss, then pick a stable sizing input.
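To see in isolation how a range-based measure can react when closes barely move, here is a minimal sketch on synthetic bars. Everything in it is an assumption made for illustration: the closes drift by one cent a day, there are no overnight gaps, and the intraday range widens sharply in the second half. It recomputes the close-to-close and Parkinson formulas inline so that it runs on its own.

```python
import math

import numpy as np
import pandas as pd

n_bars = 60
idx = pd.bdate_range("2022-01-03", periods=n_bars)
close = pd.Series(100.0 + 0.01 * np.arange(n_bars), index=idx)  # nearly flat closes
open_ = close.shift(1).fillna(100.0)                             # no overnight gaps
half_range = np.where(np.arange(n_bars) < 30, 0.2, 3.0)          # ranges widen at bar 30

bars = pd.DataFrame({
    "Open": open_,
    "High": np.maximum(open_, close) + half_range,
    "Low": np.minimum(open_, close) - half_range,
    "Close": close,
})

window, periods = 20, 252

# Close-to-close volatility: rolling sample std of log returns, annualized.
log_ret = np.log(bars["Close"] / bars["Close"].shift(1))
close_vol = log_ret.rolling(window).std() * math.sqrt(periods)

# Parkinson volatility: rolling mean of scaled squared log ranges, annualized.
park = np.log(bars["High"] / bars["Low"]) ** 2 / (4.0 * math.log(2.0))
range_vol = np.sqrt(periods * park.rolling(window).mean())

# Compare a calm-range bar (25) with a wide-range bar (59).
print(pd.DataFrame({"close_vol": close_vol, "parkinson": range_vol}).iloc[[25, 59]])
```

In this toy setup the close-to-close estimate stays near zero in both halves, while the Parkinson estimate rises with the wider ranges.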

## Visualize and compare volatility estimates

Plot the close and all six estimators together to inspect when they agree and when they diverge around gaps, trend days, and stress. We care about relative timing and spikes more than exact levels on this axis.

In [None]:
data["Close"].plot()
standard_deviation(data).plot()
parkinson(data).plot()
garman_klass(data).plot()
hodges_tompkins(data).plot()
rogers_satchell(data).plot()
yang_zhang(data).plot()
plt.show()

The estimators are annualized volatilities while the close is a price, so on a shared axis the volatility curves sit compressed near zero. Move them to a secondary axis or a separate panel if you need to read their levels, and focus on how each estimator jumps and mean-reverts when regimes shift. In practice we would validate a candidate metric by comparing it to realized next-day moves or high–low ranges and then size positions off the most stable forecast. Watch for cases where close-to-close volatility looks calm while range-based measures surge; that mismatch is the failure mode we want to eliminate.
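One way to start the validation step described above is to line up each day's estimate with the absolute move on the following day. The sketch below does this on a synthetic return series. The regime-switching process, the seed, and the 30-day window are assumptions for illustration, and a plain correlation is only a first check, not a full forecast evaluation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Assumed toy process: daily vol switches between 1% and 3% every 250 bars.
daily_vol = np.where((np.arange(n) // 250) % 2 == 0, 0.01, 0.03)
log_ret = pd.Series(
    rng.normal(0.0, daily_vol),
    index=pd.bdate_range("2018-01-01", periods=n),
)

estimate = log_ret.rolling(30).std() * np.sqrt(252)  # known at the close of day t
next_abs_move = log_ret.abs().shift(-1)              # realized on day t + 1

# Keep only the days where both the estimate and the next-day move exist.
pairs = pd.concat(
    {"estimate": estimate, "next_abs_move": next_abs_move}, axis=1
).dropna()
corr = pairs["estimate"].corr(pairs["next_abs_move"])
print(f"bars compared: {len(pairs)}, correlation: {corr:.2f}")
```

The shift(-1) is what keeps the comparison honest: the estimate on day t is paired only with a move it could not have seen. To compare estimators on real bars, replace estimate with the output of any function defined earlier, called with clean=False so the index stays aligned.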

<a href="https://pyquantnews.com/">PyQuant News</a> is where finance practitioners level up with Python for quant finance, algorithmic trading, and market data analysis. Looking to get started? Check out the fastest growing, top-selling course to <a href="https://gettingstartedwithpythonforquantfinance.com/">get started with Python for quant finance</a>. For educational purposes. Not investment advice. Use at your own risk.