# Skew

The options-on-futures notebook already plotted implied vols
against strike, i.e., the option skew. This notebook focuses on
how to model those implied vols.

## Why do we need skew models?

Skew models are functions from strike or delta space to implied volatility.
They depend on the other option pricing parameters as well,
plus additional skew modeling parameters.

Why should we add parameters when we already produced skew curves like this without parameters?

In [1]:
import databento as db
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

from finm37000 import (
    OptionType,
    add_underlying,
    add_vol_plot,
    add_vol_range,
    aggregate_ohlcv,
    calc_black,
    calc_call_price_implied_density,
    calculate_option_vols,
    filter_otm,
    fit_polynomial_skew,
    fit_raw_svi,
    fit_spline_skew,
    fit_weighted_piecewise_polynomial_skew,
    get_databento_api_key,
    get_options_chain,
    get_top_of_book,
    layout_total_variance,
    layout_vol,
    temp_env,
    tz_chicago,
)

with temp_env(DATABENTO_API_KEY=get_databento_api_key()):
    client = db.Historical()



In [2]:
interest_rate = 0.04
parent_option = "LO"
underlying_symbol = "CLZ5"
option_label = "LOZ5"

In [3]:
time_range = (
    (
        pd.Timestamp("2025-10-22T08:59:00", tz=tz_chicago),
        pd.Timestamp("2025-10-22T09:00:00", tz=tz_chicago),
    ),
    (
        pd.Timestamp("2025-10-23T12:59:00", tz=tz_chicago),
        pd.Timestamp("2025-10-23T13:00:00", tz=tz_chicago),
    ),
)
volume_start = pd.Timestamp("2025-10-22T07:00:00", tz=tz_chicago)
volume_end = pd.Timestamp("2025-10-22T16:00:00", tz=tz_chicago)

In [4]:
options_chain = get_options_chain(
    parent=parent_option,
    underlying=underlying_symbol,
    start=time_range[0][0],
    client=client,
)
top_prices = {}
for start, end in time_range:
    top_prices[end] = get_top_of_book(
        symbols=[underlying_symbol, *options_chain["raw_symbol"]],
        start=start,
        end=end,
        client=client,
    )

trades = client.timeseries.get_range(
    dataset=db.Dataset.GLBX_MDP3,
    symbols=[underlying_symbol, *options_chain["raw_symbol"]],
    start=volume_start,
    end=volume_end,
    schema="trades",
).to_df()
trade_volume = aggregate_ohlcv(trades)

In [5]:
def get_underlying_high_low(df, underlying_symbol):
    underlying_ohlcv = df.loc[underlying_symbol]
    return underlying_ohlcv["high"], underlying_ohlcv["low"]

In [6]:
underlying_price = {}
otm_options = {}
for timestamp, top_price in top_prices.items():
    with_vol, underlying_price[timestamp] = calculate_option_vols(
        top_price, underlying_symbol, options_chain, interest_rate
    )
    otm_options[timestamp] = filter_otm(with_vol, underlying_price[timestamp])

Cannot find OptionType.PUT vol between lb=1e-05 and ub=4 at strike 0.5: lower_vol=-0.0 target=0.02 upper_vol=4.610428701669767e-06
  F=58.724999999999994 T=0.07186263318112633 r=0.04 mid=0.02
Cannot find OptionType.PUT vol between lb=1e-05 and ub=4 at strike 1.0: lower_vol=-0.0 target=0.02 upper_vol=0.00012341543103482975
  F=58.724999999999994 T=0.07186263318112633 r=0.04 mid=0.02
Cannot find OptionType.PUT vol between lb=1e-05 and ub=4 at strike 1.5: lower_vol=-0.0 target=0.02 upper_vol=0.0007077330711198853
  F=58.724999999999994 T=0.07186263318112633 r=0.04 mid=0.02
Cannot find OptionType.PUT vol between lb=1e-05 and ub=4 at strike 2.0: lower_vol=-0.0 target=0.02 upper_vol=0.0022610414669471984
  F=58.724999999999994 T=0.07186263318112633 r=0.04 mid=0.02
Cannot find OptionType.PUT vol between lb=1e-05 and ub=4 at strike 2.5: lower_vol=-0.0 target=0.02 upper_vol=0.005327964160344435
  F=58.724999999999994 T=0.07186263318112633 r=0.04 mid=0.02
Cannot find OptionType.PUT vol between l

In [7]:
def format_t(ts):
    return ts.strftime("%m/%d %H:%M")


near_atm = (40.0, 150.0)
fig = go.Figure()
for timestamp, otm_df in otm_options.items():
    add_vol_plot(
        fig=fig,
        vol_df=otm_df,
        name=f"OTM vol - {format_t(timestamp)}",
        y_col="iv_midprice",
        strike_range=near_atm,
        mode="lines",
    )
    add_vol_range(fig, vol_df=otm_df, strike_range=near_atm)

add_underlying(
    fig=fig,
    underlying_price=underlying_price[time_range[0][1]],
    text=format_t(time_range[0][1]),
    position="top left",
)
add_underlying(
    fig=fig,
    underlying_price=underlying_price[time_range[1][1]],
    text=format_t(time_range[1][1]),
)
layout_vol(fig=fig, label=option_label, detail="by time")
fig.show()

### Risk assessment

When `CLZ5` moves from \$58.72/58.73 (bid/ask) on 10/22 to \$61.68/61.69 on 10/23, the skew moves as well.


In [8]:
near_atm = (40.0, 150.0)
fig = go.Figure()
for timestamp, otm_df in otm_options.items():
    add_vol_plot(
        fig=fig,
        vol_df=otm_df,
        name=f"OTM vol - {format_t(timestamp)}",
        y_col="iv_midprice",
        strike_range=near_atm,
        mode="lines",
    )

t0 = time_range[0][1]
underlying_change = underlying_price[time_range[1][1]] - underlying_price[t0]
otm_options[t0]["shifted_strike"] = otm_options[t0]["strike_price"] + underlying_change
add_vol_plot(
    fig=fig,
    vol_df=otm_options[t0],
    name=f"{format_t(t0)} - shifted",
    x_col="shifted_strike",
    y_col="iv_midprice",
    strike_range=near_atm,
    mode="lines",
)

add_underlying(
    fig=fig,
    underlying_price=underlying_price[time_range[0][1]],
    text=format_t(time_range[0][1]),
    position="top left",
)
add_underlying(
    fig=fig,
    underlying_price=underlying_price[time_range[1][1]],
    text=format_t(time_range[1][1]),
)
layout_vol(fig=fig, label=option_label, detail="by time")
fig.show()

#### Forward-looking P&L
For all option positions in your book, calculate prospective P&L for hypothetical underlying price moves.
If the skew moves with the underlying price and your scenarios hold it fixed, these prospective P&Ls lose accuracy.
A model of the skew movement will not be perfectly accurate, but it will make these
estimates more accurate.
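
To make the point concrete, here is a minimal sketch that compares scenario P&L for one call when the vol is held fixed versus when it is re-read from a skew that moves with the underlying. The `toy_skew` function and all of its coefficients are hypothetical (a parabola in log-strike that recenters on the new futures price); it is not one of the fitted models in this notebook.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black76_call(F: float, K: float, vol: float, T: float, r: float) -> float:
    """Black (1976) price of a European call on a future."""
    d1 = (math.log(F / K) + 0.5 * vol**2 * T) / (vol * math.sqrt(T))
    d2 = d1 - vol * math.sqrt(T)
    return math.exp(-r * T) * (F * norm_cdf(d1) - K * norm_cdf(d2))


def toy_skew(F: float, K: float) -> float:
    """Hypothetical skew: a parabola in log-strike that recenters on F."""
    k = math.log(K / F)
    return 0.35 - 0.30 * k + 0.50 * k**2


# Illustrative inputs only: one out-of-the-money call.
F0, K, T, r = 58.725, 65.0, 0.0719, 0.04
vol0 = toy_skew(F0, K)
price0 = black76_call(F0, K, vol0, T, r)

scenario_pnl = {}
for move in (-3.0, -1.0, 1.0, 3.0):
    F1 = F0 + move
    # Scenario 1: reprice with the original vol for this strike.
    fixed_vol_pnl = black76_call(F1, K, vol0, T, r) - price0
    # Scenario 2: reprice with the vol read from the moved skew.
    moving_skew_pnl = black76_call(F1, K, toy_skew(F1, K), T, r) - price0
    scenario_pnl[move] = (fixed_vol_pnl, moving_skew_pnl)
    print(f"{move:+.1f}: fixed vol {fixed_vol_pnl:+.4f}, moving skew {moving_skew_pnl:+.4f}")
```

The gap between the two columns is the part of the prospective P&L that a fixed-vol scenario misses under this assumed skew dynamic.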

#### Greek estimation

Traders measure risk with greeks, but greeks from option pricing algorithms like Black or trees
hold implied vol fixed, so they do not account for the change in skew.

If you have a model for skew $\sigma(S, K)$ that is plugged into your option pricing function $C(S, K, \sigma(S, K), T, r)$, then the chain rule gives the skew-adjusted delta:
$$
\frac{d C}{d S} = \frac{\partial C}{\partial S} + \frac{\partial C}{\partial \sigma} \frac{\partial \sigma}{\partial S}
$$
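
As a rough numerical check of this chain rule, the sketch below computes the Black (1976) delta and vega analytically, differentiates a hypothetical skew function `toy_skew` by finite differences, and compares the sum with a direct bump of the price with the skew plugged in. The skew function, its coefficients, and the inputs are assumptions for illustration only.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def d1(S: float, K: float, vol: float, T: float) -> float:
    return (math.log(S / K) + 0.5 * vol**2 * T) / (vol * math.sqrt(T))


def call_price(S: float, K: float, vol: float, T: float, r: float) -> float:
    """Black (1976) call on a future with price S."""
    a = d1(S, K, vol, T)
    return math.exp(-r * T) * (S * norm_cdf(a) - K * norm_cdf(a - vol * math.sqrt(T)))


def toy_skew(S: float, K: float) -> float:
    """Hypothetical sigma(S, K): a parabola in log-strike centered on S."""
    k = math.log(K / S)
    return 0.35 - 0.30 * k + 0.50 * k**2


S, K, T, r, h = 58.725, 65.0, 0.0719, 0.04, 1e-4
vol = toy_skew(S, K)

# Partial derivatives of C with the other argument held fixed.
partial_delta = math.exp(-r * T) * norm_cdf(d1(S, K, vol, T))
vega = math.exp(-r * T) * S * norm_pdf(d1(S, K, vol, T)) * math.sqrt(T)

# Sensitivity of the skew itself to the underlying (central difference).
dvol_dS = (toy_skew(S + h, K) - toy_skew(S - h, K)) / (2 * h)

chain_rule_delta = partial_delta + vega * dvol_dS

# Direct check: bump S and let the skew move with it.
bumped_delta = (
    call_price(S + h, K, toy_skew(S + h, K), T, r)
    - call_price(S - h, K, toy_skew(S - h, K), T, r)
) / (2 * h)

print(partial_delta, chain_rule_delta, bumped_delta)
```

With this toy skew the second term is positive for the chosen strike, so the skew-adjusted delta is larger than the fixed-vol delta; a different skew dynamic could flip the sign.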

### Dimensional reduction

Using implied volatility at each strike gives a parameter per strike.
Reducing this to a smaller set of parameters can achieve
* data compression
* interpretability
* trader control


### Pricing

Pricing is intentionally last: a skew model is not required to generate prices, because market prices
are typically the input to the skew model. Theoretical option prices for trading
are often generated from a skew model, but the role of the skew model is not to figure out where
the market price should be.

By passing prices through the skew generation process, fitted prices are generated that may
differ from the targets used to imply vol.

The modeling step should be distinct from the trading logic/implementation,
and the trading implementation should play a significant role in deciding
how much to quote or trade and when. Trading software should be agnostic
as to whether its theoretical value comes
from a skew model or simply reflects current market prices.

Prices from skew models do have some advantages:
* Avoiding overfitting (assuming you are OK with theoretical values that are not always on the market).
* Using the same prices and software pipeline for prospective risk as for trading.
* Extrapolating to strikes without markets.


## Common skew models

### Curve-fitting style models

Skew models actually used in trading often have few of the theoretical niceties discussed later in this notebook.
They typically use some curve-fitting procedure to approximate implied
vol targets.

#### Polynomials

A common approach is to fit one or more polynomials to the skew curve.
This is not based on principle, but two ways to make it more palatable are
using different basis functions for the polynomials and splitting
the fit into pieces, e.g., calls and puts.
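
For orientation, here is a minimal sketch of a least-squares polynomial skew fit using `numpy.polynomial.Polynomial.fit`, which rescales the strikes to a standard window before fitting. The smile, the noise, and the strike grid are synthetic assumptions, and this is not necessarily how `fit_polynomial_skew` is implemented.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic, hypothetical data: a smile in log-strike plus a little noise.
rng = np.random.default_rng(0)
strikes = np.arange(40.0, 90.5, 0.5)
log_strike = np.log(strikes / 58.725)
noisy_vol = 0.33 - 0.30 * log_strike + 0.90 * log_strike**2
noisy_vol = noisy_vol + rng.normal(scale=0.002, size=strikes.size)

# One least-squares fit per degree; each fit is a callable polynomial.
fits = {degree: Polynomial.fit(strikes, noisy_vol, deg=degree) for degree in (2, 4, 11)}

# In-sample root-mean-square error can only improve as the degree grows.
rmse = {
    degree: float(np.sqrt(np.mean((fit(strikes) - noisy_vol) ** 2)))
    for degree, fit in fits.items()
}

for degree, fit in fits.items():
    # Evaluating outside the fitted strikes gives a feel for extrapolation.
    print(degree, rmse[degree], float(fit(120.0)))
```

A lower in-sample error at higher degree is not evidence of a better skew; it is the overfitting trade-off the plots in this section illustrate. Swapping in `numpy.polynomial.Chebyshev.fit` is one way to try a different basis.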

In [9]:
vol_target = "iv_midprice"
selected_time = time_range[0][1]
df = otm_options[selected_time]
poly_skew = {}
degrees = range(2, 15)
for degree in degrees:
    poly_skew[degree] = fit_polynomial_skew(
        df["strike_price"], df[vol_target], degree=degree
    )
    df[f"poly_skew_deg{degree}"] = poly_skew[degree](df["strike_price"])

#### Degree 2

In [10]:
degree = 2
near_atm = (40.0, 150.0)
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name="OTM midprice implied vol",
    y_col="iv_midprice",
    strike_range=near_atm,
)
add_vol_plot(
    fig=fig,
    vol_df=df,
    name=f"Degree {degree} Polynomial Skew",
    y_col=f"poly_skew_deg{degree}",
    strike_range=near_atm,
)
add_vol_range(fig, vol_df=df, strike_range=near_atm)

layout_vol(fig=fig, label=option_label, detail="Quadratic polynomial skew")
fig.show()

#### Degree 4

In [11]:
degree = 4
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name="OTM midprice implied vol",
    y_col="iv_midprice",
    strike_range=near_atm,
)
add_vol_plot(
    fig=fig,
    vol_df=df,
    name=f"Degree {degree} Polynomial Skew",
    y_col=f"poly_skew_deg{degree}",
    strike_range=near_atm,
)
add_vol_range(fig, vol_df=df, strike_range=near_atm)

layout_vol(fig=fig, label=option_label, detail="Low degree polynomial skew")
fig.show()

#### Degree 11

In [12]:
degree = 11
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name="OTM midprice implied vol",
    y_col="iv_midprice",
    strike_range=near_atm,
    mode="lines",
)
add_vol_plot(
    fig=fig,
    vol_df=df,
    name=f"Degree {degree} Polynomial Skew",
    y_col=f"poly_skew_deg{degree}",
    strike_range=near_atm,
    mode="lines",
)
add_vol_range(fig, vol_df=df, strike_range=near_atm)

layout_vol(fig=fig, label=option_label, detail="High degree polynomial skew")
fig.show()

An alternative to increasing the degree is to use two smaller degree polynomials
joined ATM. There are at least two ways to force the join:
* polynomial regression with a fixed point
* weighted polynomial regression.
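
The weighted approach can be sketched as follows: fit the put side and the call side separately, and append the ATM point to both sides with a large weight so that both polynomials pass (almost exactly) through it. The helper `fit_side`, the weight of `1e3`, and the synthetic data are assumptions for illustration; `fit_weighted_piecewise_polynomial_skew` may work differently.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_side(strikes, vols, atm, atm_vol, degree, atm_weight=1e3):
    """Fit one side of the skew, pinning the ATM point with a heavy weight."""
    x = np.append(strikes, atm)
    y = np.append(vols, atm_vol)
    w = np.append(np.ones(len(strikes)), atm_weight)
    return Polynomial.fit(x, y, deg=degree, w=w)


# Synthetic, hypothetical skew with different curvature on each side.
atm = 58.725
strikes = np.arange(40.0, 90.5, 0.5)
k = np.log(strikes / atm)
vols = 0.33 - 0.10 * k + np.where(strikes < atm, 0.9, 0.4) * k**2
atm_vol = float(np.interp(atm, strikes, vols))

is_put_side = strikes < atm
put_fit = fit_side(strikes[is_put_side], vols[is_put_side], atm, atm_vol, degree=4)
call_fit = fit_side(strikes[~is_put_side], vols[~is_put_side], atm, atm_vol, degree=4)


def piecewise_skew(strike):
    """Evaluate the put-side fit below ATM and the call-side fit at or above it."""
    strike = np.asarray(strike, dtype=float)
    return np.where(strike < atm, put_fit(strike), call_fit(strike))


gap = abs(float(put_fit(atm)) - float(call_fit(atm)))
print(f"gap between the two pieces at ATM: {gap:.2e}")
```

Note that this construction only makes the level continuous at the join; nothing here forces the slopes of the two pieces to agree at ATM.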

In [13]:
vol_target = "iv_midprice"
atm = underlying_price[selected_time]
degrees = [4, 5]
piecewise_skew = {}
for degree in degrees:
    piecewise_skew[degree] = fit_weighted_piecewise_polynomial_skew(
        df["strike_price"], df[vol_target], atm=atm, degree=degree
    )
    df[f"piecewise_skew_deg{degree}"] = piecewise_skew[degree](df["strike_price"])

#### Degree 4: Piecewise vs. Not

In [14]:
degree = 4
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name="OTM midprice implied vol",
    y_col="iv_midprice",
    strike_range=near_atm,
    mode="lines",
)
add_vol_plot(
    fig=fig,
    vol_df=df,
    name=f"Degree {degree} Piecewise Polynomial Skew",
    y_col=f"piecewise_skew_deg{degree}",
    strike_range=near_atm,
    mode="lines",
)
add_vol_plot(
    fig=fig,
    vol_df=df,
    name=f"Degree {degree} Polynomial Skew",
    y_col=f"poly_skew_deg{degree}",
    strike_range=near_atm,
    mode="lines",
)
add_vol_range(fig, vol_df=df, strike_range=near_atm)

layout_vol(fig=fig, label=option_label, detail="Polynomial vs. Piecewise Skew")
fig.show()

#### Piecewise Degree 4 vs. 5

In [15]:
degrees = [4, 5]
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name="OTM midprice implied vol",
    y_col="iv_midprice",
    strike_range=near_atm,
    mode="lines",
)
for degree in degrees:
    add_vol_plot(
        fig=fig,
        vol_df=df,
        name=f"Degree {degree} Piecewise Polynomial Skew",
        y_col=f"piecewise_skew_deg{degree}",
        strike_range=near_atm,
        mode="lines",
    )
add_vol_range(fig, vol_df=df, strike_range=near_atm)

layout_vol(fig=fig, label=option_label, detail="Piecewise Polynomial Skews")
fig.show()

#### Splines

Along similar lines, curve fitting implied vol is often done with splines.
By default, the splines will go through all the points, typically overfitting.
Different approaches are taken to reduce the overfitting.

In [16]:
percents = (10, 20, 30)
extrapolates = (True, False)
for percent in percents:
    for extrapolate in extrapolates:
        spline_skew = fit_spline_skew(
            df["strike_price"],
            df["iv_midprice"],
            pct=percent / 100,
            extrapolate=extrapolate,
        )
        df[f"spline_skew_{percent}_{extrapolate}"] = spline_skew(df["strike_price"])

In [17]:
percent, extrapolate = 10, False
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name="OTM midprice implied vol",
    y_col="iv_midprice",
    strike_range=near_atm,
    mode="lines",
)
add_vol_plot(
    fig=fig,
    vol_df=df,
    name=f"Cubic Spline Skew - {percent}%",
    y_col=f"spline_skew_{percent}_{extrapolate}",
    strike_range=near_atm,
    mode="lines",
)
add_vol_range(fig, vol_df=df, strike_range=near_atm)

layout_vol(fig=fig, label=option_label, detail=f"Spline Skew - {percent}%")
fig.show()

Extrapolation is very sensitive. There are methods to improve extrapolation,
but naively letting the spline do it is a recipe for disaster.

In [18]:
percent, extrapolate = 10, True
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name="OTM midprice implied vol",
    y_col="iv_midprice",
    strike_range=near_atm,
)
add_vol_plot(
    fig=fig,
    vol_df=df,
    name=f"Cubic Spline Skew - {percent}%",
    y_col=f"spline_skew_{percent}_{extrapolate}",
    strike_range=near_atm,
)
add_vol_range(fig, vol_df=df, strike_range=near_atm)

layout_vol(fig=fig, label=option_label, detail=f"Spline Skew - {percent}%")
fig.show()

### SABR

Professor Hendricks covers this in FINM 37500 Fixed Income Derivatives, so I won't cover it here
beyond a one-line summary: SABR is a CEV model plus stochastic volatility.
Parameters:
* $\alpha$: vol of vol
* $\beta$: elasticity of variance
* $\rho$: correlation of two Wiener processes driving underlying price and volatility respectively.


### SVI

The stochastic volatility inspired (SVI) model combines an ability to fit many situations with strong theoretical underpinnings.
Jim Gatheral credits its birth to Merrill Lynch, but he is the best resource on it, from *The Volatility Surface* through
"Arbitrage-free SVI Volatility Surfaces" with Jacquier: https://arxiv.org/pdf/1204.0646
There is a large literature on it. One particular note is that
it has been extended by Timothy Klassen of VolaDynamics to fit even more difficult skew shapes:
https://voladynamics.com/examples/commodity-futures-options

I have not used SVI for production trading (not to imply you can't!),
but it provides useful lessons for whatever model you use.

#### De-Americanization

SVI theory is primarily based on European option pricing theory, which leads to a common practice of de-americanizing vol surfaces.

That is, given option prices on American options, imply the volatility for each strike, or more often, some tree parameter,
then use that tree parameter to generate a corresponding European price.
Proceed to the analysis as if these generated European prices are the target.

This is an important step for equity options with dividends where the dividend handling in American pricing
is especially tricky. For options on futures, this is less complicated, but it's important to note
because the theory of SVI is primarily based on European pricing.

#### Strike parametrization

How should we parametrize a skew model's strike?

* **Strike**: Strikes are a natural input to a skew model, but they do not always lend themselves to the
clearest theoretical understanding.
* **Delta**: Many traders like to think of options in terms of delta rather than strike. This introduces
two difficulties:
    1. The delta is typically output from the model and cannot be pre-specified.
    1. The delta is not a natural input to option pricing.
* **Moneyness**: $F/K$ removes the level of the prices, translating to relative price.
* **Log-moneyness**: $\log(F/K)$ takes that a step further.
* **Time-normalized log-moneyness**: $\log(F/K)/\sqrt{T}$ removes the most expected time-dependence.
* **Standardized log-moneyness**: $\log(F/K)/ (\sigma \sqrt{T})$ converts to a $z$-score-like quantity, assuming $\sigma$ is the log-normal volatility.
* **Log-strike**: $k = \log(K/F)$

SVI uses log-strike $k$.
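
The coordinates in the list above are simple transformations of one another. A small sketch, with a hypothetical helper name `strike_coordinates` and illustrative inputs:

```python
import math


def strike_coordinates(F: float, K: float, T: float, sigma: float) -> dict[str, float]:
    """Express one strike in each of the coordinate systems listed above."""
    log_moneyness = math.log(F / K)
    return {
        "moneyness": F / K,
        "log_moneyness": log_moneyness,
        "time_normalized_log_moneyness": log_moneyness / math.sqrt(T),
        "standardized_log_moneyness": log_moneyness / (sigma * math.sqrt(T)),
        "log_strike": math.log(K / F),  # the SVI coordinate k
    }


coords = strike_coordinates(F=58.725, K=65.0, T=0.0719, sigma=0.33)
for name, value in coords.items():
    print(f"{name}: {value:.4f}")
```

Note the sign convention: log-strike $k = \log(K/F)$ is the negative of log-moneyness $\log(F/K)$, so out-of-the-money calls have positive $k$.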

#### Volatility parametrization

We have computed Black(-Scholes) implied volatility $\sigma_B(k,t)$, and that is the de facto standard of discussion
for options markets, but SVI literature shows that there are some more natural parametrizations of the volatility surface:

* **Total implied variance**: $w(k,t) = \sigma_B^2(k,t)t$
* **Implied variance (annualized)**: $v(k, t) = \sigma_B^2(k,t) = w(k,t)/t$


#### Raw SVI Parametrization
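
For reference, the raw SVI parametrization in Gatheral and Jacquier's paper (linked above) writes total implied variance for a single expiration as

$$
w(k) = a + b \left( \rho (k - m) + \sqrt{(k - m)^2 + \sigma^2} \right)
$$

with $b \ge 0$, $|\rho| < 1$, $\sigma > 0$, and $a + b \sigma \sqrt{1 - \rho^2} \ge 0$ so that $w$ stays nonnegative. The sketch below evaluates that formula. The class name `RawSVI`, its methods, and the parameter values are hypothetical; the object returned by `fit_raw_svi` may be organized differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RawSVI:
    """Raw SVI parameters for one expiration (illustrative container)."""

    a: float
    b: float
    rho: float
    m: float
    sigma: float

    def total_variance(self, k: float) -> float:
        """w(k) = a + b * (rho * (k - m) + sqrt((k - m)^2 + sigma^2))."""
        shifted = k - self.m
        return self.a + self.b * (self.rho * shifted + math.sqrt(shifted**2 + self.sigma**2))

    def implied_vol(self, k: float, t: float) -> float:
        """Convert total variance back to an annualized Black implied vol."""
        return math.sqrt(self.total_variance(k) / t)

    def min_total_variance(self) -> float:
        """Smallest value of w(k); it must be nonnegative."""
        return self.a + self.b * self.sigma * math.sqrt(1.0 - self.rho**2)


# Made-up parameters, not a fit to the data in this notebook.
svi = RawSVI(a=0.002, b=0.05, rho=-0.3, m=0.01, sigma=0.1)
for k in (-0.3, -0.1, 0.0, 0.1, 0.3):
    print(f"k={k:+.1f}  w={svi.total_variance(k):.5f}  vol={svi.implied_vol(k, 0.0719):.4f}")
```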



In [19]:
df["total_variance"] = df["iv_midprice"] ** 2 * df["years_to_expiration"]
df["log_strike"] = np.log(df["strike_price"] / atm)
raw_svi = fit_raw_svi(df["log_strike"], df["total_variance"])
df["raw_svi"] = raw_svi.calc(df["log_strike"])
df["raw_svi_vol"] = (df["raw_svi"] / df["years_to_expiration"]) ** 0.5

In [20]:
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name="OTM midprice implied vol",
    y_col="iv_midprice",
    strike_range=near_atm,
    mode="lines",
)
add_vol_plot(
    fig=fig,
    vol_df=df,
    name="Raw SVI fit",
    y_col="raw_svi_vol",
    strike_range=near_atm,
    mode="lines",
)
add_vol_range(fig, vol_df=df, strike_range=near_atm)

layout_vol(fig=fig, label=option_label, detail="Raw SVI fit")
fig.show()

## Properties of good skew models
### No static arbitrage

This is covered in Gatheral's work, or a shorter explication of this point by
Carr and Madan https://engineering.nyu.edu/sites/default/files/2018-09/CarrFinResearchLetters2005.pdf

These arguments rest on European exercise, so if you want to check them for American prices,
you have to de-americanize the prices first.

For the SVI model(s), there are nice parametric formulas for these no-arbitrage conditions,
but the concept applies to any skew you might use.

#### Call spread arbitrage

If $K_1 < K_2$, then $0 \le C(K_1, T) - C(K_2, T) \le K_2 - K_1$. If your theoretical call prices violate those inequalities,
there is _call spread arbitrage_.
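
A minimal sketch of this check on a strike-sorted array of theoretical call prices follows. Dividing each spread by its strike width turns the inequality into "the ratio must lie in $[0, 1]$". The function name `call_spread_violations` and the sample prices are made up for illustration.

```python
import numpy as np


def call_spread_violations(strikes, calls, tol=1e-12):
    """Flag adjacent call spreads whose price per unit of strike width is outside [0, 1].

    Assumes strikes are sorted in increasing order.
    """
    strikes = np.asarray(strikes, dtype=float)
    calls = np.asarray(calls, dtype=float)
    # C(K_1) - C(K_2) for each adjacent pair, scaled by K_2 - K_1.
    ratio = (calls[:-1] - calls[1:]) / np.diff(strikes)
    return (ratio < -tol) | (ratio > 1.0 + tol)


strikes = [55.0, 60.0, 65.0, 70.0]
calls = [5.2, 2.1, 2.3, 0.1]  # the 65 call is priced above the 60 call
violations = call_spread_violations(strikes, calls)
print(violations)
```

Here only the 60/65 spread is flagged, because the higher strike is more expensive than the lower one.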


#### Butterfly arbitrage

For equidistant strikes $K_1$, $K_2$, $K_3$, a call butterfly is $C(K_1,T) - 2 C(K_2,T) + C(K_3, T)$.

The terminal payoff is nonnegative, so the price must be also, otherwise your skew has _butterfly arbitrage_.

(Full definition allows non-equidistant strikes with appropriate ratios.)
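
One way to sketch the butterfly check, including non-equidistant strikes, is to test that call prices are convex in strike: the slope between adjacent strikes must not decrease. Scaling the change in slope by the local strike spacing (and undoing the discounting) gives a finite-difference estimate of the implied density. The function names and sample prices are hypothetical, and `calc_call_price_implied_density` may use a different discretization.

```python
import numpy as np


def butterfly_values(strikes, calls):
    """Change in call-price slope across each interior strike.

    Nonnegative values mean the call prices are convex there, i.e. the
    (ratio-weighted) butterfly centered on that strike has a nonnegative price.
    Assumes strikes are sorted in increasing order.
    """
    strikes = np.asarray(strikes, dtype=float)
    calls = np.asarray(calls, dtype=float)
    slopes = np.diff(calls) / np.diff(strikes)
    return np.diff(slopes)


def implied_density(strikes, calls, r=0.0, T=0.0):
    """Finite-difference estimate of the density implied by call prices."""
    strikes = np.asarray(strikes, dtype=float)
    half_width = 0.5 * (strikes[2:] - strikes[:-2])
    return np.exp(r * T) * butterfly_values(strikes, calls) / half_width


strikes = [55.0, 60.0, 65.0, 70.0]
good_calls = [6.0, 3.0, 1.5, 0.2]
bad_calls = [6.0, 2.0, 1.5, 0.2]  # the 60 call is too cheap relative to its neighbors

print(butterfly_values(strikes, good_calls))
print(butterfly_values(strikes, bad_calls))
print(implied_density(strikes, good_calls))
```

A negative entry corresponds to a negative implied density at that strike, which is how butterfly arbitrage shows up in the density plots.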


In [21]:
df["iv_mid_density"] = calc_call_price_implied_density(
    df["strike_price"],
    calc_black(
        vol=df["iv_midprice"],
        F=atm,
        K=df["strike_price"],
        r=interest_rate,
        T=df["years_to_expiration"],
        option_type=OptionType.CALL,
    ),
)
df["svi_mid_density"] = calc_call_price_implied_density(
    df["strike_price"],
    calc_black(
        vol=df["raw_svi_vol"],
        F=atm,
        K=df["strike_price"],
        r=interest_rate,
        T=df["years_to_expiration"],
        option_type=OptionType.CALL,
    ),
)
degree = 4
df[f"poly_density_deg{degree}"] = calc_call_price_implied_density(
    df["strike_price"],
    calc_black(
        vol=df[f"poly_skew_deg{degree}"],
        F=atm,
        K=df["strike_price"],
        r=interest_rate,
        T=df["years_to_expiration"],
        option_type=OptionType.CALL,
    ),
)
df[f"piecewise_density_deg{degree}"] = calc_call_price_implied_density(
    df["strike_price"],
    calc_black(
        vol=df[f"piecewise_skew_deg{degree}"],
        F=atm,
        K=df["strike_price"],
        r=interest_rate,
        T=df["years_to_expiration"],
        option_type=OptionType.CALL,
    ),
)

for extrapolate in extrapolates:
    for percent in percents:
        df[f"spline_density_{percent}_{extrapolate}"] = calc_call_price_implied_density(
            df["strike_price"],
            calc_black(
                vol=df[f"spline_skew_{percent}_{extrapolate}"],
                F=atm,
                K=df["strike_price"],
                r=interest_rate,
                T=df["years_to_expiration"],
                option_type=OptionType.CALL,
            ),
        )


Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`

38 call spreads are outside of [0, 1]

1 call spreads are outside of [0, 1]

In [22]:
iv_densities = (
    "iv_mid_density",
    "svi_mid_density",
    f"poly_density_deg{degree}",
    f"piecewise_density_deg{degree}",
    "spline_density_10_False",
)
plot_df = pd.melt(
    df,
    id_vars=["strike_price"],
    value_vars=iv_densities,
    var_name="skew",
    value_name="density",
)
px.line(
    plot_df, x="strike_price", y="density", color="skew", title="Skew-implied density"
)

In [23]:
iv_densities = (
    "iv_mid_density",
    "svi_mid_density",
)
plot_df = pd.melt(
    df,
    id_vars=["strike_price"],
    value_vars=iv_densities,
    var_name="skew",
    value_name="density",
)
px.line(
    plot_df,
    x="strike_price",
    y="density",
    color="skew",
    title="Skew-implied density: Mid-market vs. Raw SVI",
)

In [24]:
iv_densities = (
    "svi_mid_density",
    f"poly_density_deg{degree}",
    f"piecewise_density_deg{degree}",
    "spline_density_10_False",
)
plot_df = pd.melt(
    df,
    id_vars=["strike_price"],
    value_vars=iv_densities,
    var_name="skew",
    value_name="density",
)
px.line(
    plot_df,
    x="strike_price",
    y="density",
    color="skew",
    title="Skew-implied density - SVI vs. Curve-fitting",
)

In [25]:
iv_densities = (
    "spline_density_10_False",
    "spline_density_10_True",
)
plot_df = pd.melt(
    df,
    id_vars=["strike_price"],
    value_vars=iv_densities,
    var_name="skew",
    value_name="density",
)
px.line(
    plot_df,
    x="strike_price",
    y="density",
    color="skew",
    title="Skew-implied density - Problems with Spline extrapolation",
)

#### Calendar arbitrage
If $T_1 < T_2$ and $C(K, T_1) > C(K, T_2)$, then there is a calendar arbitrage.

Note that this is typically stated for equity options where the underlying for two different expirations is the same.
For options on futures, it is often not the case that different option expirations have the same underlying,
but the prices use the forward price of the underlying. Under typical assumptions, these will be assumed to
be compatible.




**Theorem**

A volatility surface $w$ admits no calendar arbitrage if and only if $\partial_t w(k, t) \ge 0$ for all $k$ and $t > 0$.

Again, this assumes European prices for the no-arbitrage part of the argument.

Nonnegative derivative of the total variance with respect to time means that
the total variance curves at different times should not cross each other.
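
With only two expirations, the derivative condition reduces to checking that the longer-dated total variance is at least the shorter-dated one at every log-strike. A minimal sketch, assuming both curves are given on increasing log-strike grids and comparing them on the overlapping range via linear interpolation (the function name and the two toy curves are made up):

```python
import numpy as np


def calendar_violations(k_near, w_near, k_far, w_far):
    """Return the near-expiration log-strikes where w_far(k) < w_near(k).

    Both log-strike grids are assumed to be sorted in increasing order.
    Only log-strikes covered by both expirations are compared.
    """
    k_near, w_near = np.asarray(k_near, float), np.asarray(w_near, float)
    k_far, w_far = np.asarray(k_far, float), np.asarray(w_far, float)
    lo = max(k_near.min(), k_far.min())
    hi = min(k_near.max(), k_far.max())
    overlap = (k_near >= lo) & (k_near <= hi)
    # Put the far curve on the near grid so the two can be compared pointwise.
    w_far_on_near = np.interp(k_near[overlap], k_far, w_far)
    return k_near[overlap][w_far_on_near < w_near[overlap]]


# Toy curves: the near expiration has steeper wings, so the curves cross.
k = np.linspace(-0.5, 0.5, 11)
w_near = 0.010 + 0.05 * k**2
w_far = 0.012 + 0.02 * k**2

crossing = calendar_violations(k, w_near, k, w_far)
print(crossing)
```

In this toy example the curves are correctly ordered near the money and cross only in the wings, so a check restricted to near-the-money strikes would miss the violation.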

In [26]:
underlying_symbol2 = "CLF6"
option_label2 = "LOF6"
options_chain = get_options_chain(
    parent=parent_option,
    underlying=underlying_symbol2,
    start=time_range[0][0],
    client=client,
)
top_prices = {}
for start, end in time_range:
    top_prices[end] = get_top_of_book(
        symbols=[underlying_symbol2, *options_chain["raw_symbol"]],
        start=start,
        end=end,
        client=client,
    )

trades = client.timeseries.get_range(
    dataset=db.Dataset.GLBX_MDP3,
    symbols=[underlying_symbol2, *options_chain["raw_symbol"]],
    start=volume_start,
    end=volume_end,
    schema="trades",
).to_df()
trade_volume = aggregate_ohlcv(trades)

In [27]:
for timestamp, top_price in top_prices.items():
    with_vol, underlying_price[(underlying_symbol2, timestamp)] = calculate_option_vols(
        top_price, underlying_symbol2, options_chain, interest_rate
    )
    otm_options[(option_label2, timestamp)] = filter_otm(
        with_vol, underlying_price[(underlying_symbol2, timestamp)]
    )

In [28]:
df2 = otm_options[(option_label2, selected_time)]
atm2 = underlying_price[(underlying_symbol2, selected_time)]

In [29]:
df2["log_strike"] = np.log(df2["strike_price"] / atm2)
df2["total_variance"] = df2["iv_midprice"] ** 2 * df2["years_to_expiration"]
raw_svi = fit_raw_svi(df2["log_strike"], df2["total_variance"])
df2["raw_svi"] = raw_svi.calc(df2["log_strike"])
df2["raw_svi_vol"] = (df2["raw_svi"] / df2["years_to_expiration"]) ** 0.5

piecewise_skew2 = fit_weighted_piecewise_polynomial_skew(
    df2["strike_price"], df2[vol_target], atm=atm2, degree=4
)
df2["piecewise_skew_deg4"] = piecewise_skew2(df2["strike_price"])

df["total_midprice_variance"] = df["iv_midprice"] ** 2 * df["years_to_expiration"]
df["piecewise_total_variance_deg4"] = (
    df["piecewise_skew_deg4"] ** 2 * df["years_to_expiration"]
)
df2["total_midprice_variance"] = df2["iv_midprice"] ** 2 * df2["years_to_expiration"]
df2["piecewise_total_variance_deg4"] = (
    df2["piecewise_skew_deg4"] ** 2 * df2["years_to_expiration"]
)

In [30]:
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name=f"Raw SVI {option_label}",
    y_col="raw_svi",
    strike_range=near_atm,
)
add_vol_plot(
    fig=fig,
    vol_df=df2,
    name=f"Raw SVI {option_label2}",
    y_col="raw_svi",
    strike_range=near_atm,
)

layout_total_variance(
    fig=fig, label=f"{option_label} vs. {option_label2}", detail="Raw SVI fit"
)
fig.show()

Zooming out reveals that there is, in fact, calendar arbitrage.

In [31]:
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name=f"Raw SVI {option_label}",
    y_col="raw_svi",
)
add_vol_plot(
    fig=fig,
    vol_df=df2,
    name=f"Raw SVI {option_label2}",
    y_col="raw_svi",
)

layout_total_variance(
    fig=fig, label=f"{option_label} vs. {option_label2}", detail="Raw SVI fit"
)
fig.show()

In [32]:
mid_range = (10, 180)
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name=f"Mid Price {option_label}",
    y_col="total_midprice_variance",
    strike_range=mid_range,
)
add_vol_plot(
    fig=fig,
    vol_df=df2,
    name=f"Mid Price {option_label2}",
    y_col="total_midprice_variance",
    strike_range=mid_range,
)

layout_total_variance(
    fig=fig, label=f"{option_label} vs. {option_label2}", detail="Mid-market IV"
)
fig.show()

In [33]:
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df,
    name=f"Piecewise {option_label}",
    y_col="piecewise_total_variance_deg4",
    strike_range=mid_range,
)
add_vol_plot(
    fig=fig,
    vol_df=df2,
    name=f"Piecewise {option_label2}",
    y_col="piecewise_total_variance_deg4",
    strike_range=mid_range,
)

layout_total_variance(
    fig=fig, label=f"{option_label} vs. {option_label2}", detail="Piecewise Skew"
)
fig.show()

### Smoothness

Besides looking nice, why is smoothness important?


#### POLL

[What is so bad about the following piecewise polynomial skew?](https://www.polleverywhere.com/free_text_polls/al8faAJvCriW4BdCXwJqR)


In [34]:
def approximate_slope(x, y):
    return np.diff(y, prepend=np.nan) / np.diff(x, prepend=np.nan)


df["piecewise_slope"] = approximate_slope(
    y=df["piecewise_skew_deg4"], x=df["strike_price"]
)
df2["piecewise_slope"] = approximate_slope(
    y=df2["piecewise_skew_deg4"], x=df2["strike_price"]
)
df2["svi_slope"] = approximate_slope(y=df2["raw_svi_vol"], x=df2["strike_price"])

In [35]:
fig = go.Figure()
add_vol_plot(
    fig=fig,
    vol_df=df2,
    name=f"SVI Slope {option_label2}",
    y_col="svi_slope",
    strike_range=mid_range,
)
add_vol_plot(
    fig=fig,
    vol_df=df2,
    name=f"Piecewise Slope {option_label2}",
    y_col="piecewise_slope",
    strike_range=mid_range,
)

fig.update_layout(
    title="Skew slope",
    xaxis_title="Strike Price",
    yaxis_title="Slope",
    template="plotly_white",
)
fig.show()

### Extrapolation

How does your model work beyond the strikes used to fit?

We have seen problems with both curve-fitting techniques like splines and SVI parametrizations.

The polynomial fits can be even worse.

### Fitting/Overfitting markets

Going from price to vol and back to price would ideally generate a price between the bid and the ask.

Often it does not.

How can you manage this situation?

#### Trader-focused calibration/validation

It is fairly common that traders want some level of manual theoretical
value override to adapt to particular market conditions like
specific strikes being targeted by customers for which they would
like to quote more or less aggressively than a theoretical skew model
would imply.

In [36]:
def calc_percent_within_bid_ask(
    vol_df, vol_col, bid_col="iv_bid", ask_col="iv_ask"
) -> float:
    """Return the fraction (0 to 1) of rows whose vol lies within [bid, ask] implied vol."""
    return float(
        (
            (vol_df[vol_col] >= vol_df[bid_col]) & (vol_df[vol_col] <= vol_df[ask_col])
        ).mean()
    )


# Compare each fitted skew (and the mid-market vol itself) against the bid/ask vols.
vol_cols = (
    "raw_svi_vol",
    "piecewise_skew_deg4",
    "poly_skew_deg4",
    "poly_skew_deg11",
    "spline_skew_10_False",
    "iv_midprice",
)
with_market = {col: calc_percent_within_bid_ask(df, vol_col=col) for col in vol_cols}
pd.Series(with_market)

raw_svi_vol             0.393333
piecewise_skew_deg4     0.453333
poly_skew_deg4          0.320000
poly_skew_deg11         0.606667
spline_skew_10_False    0.726667
iv_midprice             0.766667
dtype: float64

These numbers are just to give a flavor for how well we are doing so far.
They are not very good representations of how things would look in practice
with more work and more refinement. For example,
* We are counting strikes as misses when an implied vol cannot be found for one side of the market, which is presumably why even `iv_midprice` scores below 100%.
* Raw SVI is in its most basic form. There are really good refinements of this method in the literature and in proprietary systems.
* All cases need non-theo parts of the system to handle how to trade when theos are not on the market.