In this note we will take a look at the data provided for the Stochastic Modelling project. 

Following the information provided in the project documentation, we take today to be 1-Dec-2020. Let us load the option data and look at the data structure.

# Part2:
> 1. Collect the option data from SPX_options.csv and SPY_options.csv
> 2. Compute the mid price of each option as the average of the best bid and best offer
> 3. Discount factors: built from the zero rates provided in zero_rates_20201201.csv
> 4. Calibrate the Displaced-Diffusion model (volatility $\sigma$ and shift factor $\beta$) and the SABR model (initial volatility $\alpha$, correlation $\rho$, volatility of volatility $\nu$)
> 5. Goal: find the optimal parameters
> 6. Use the calibrated parameters to compute the model implied volatilities, and plot the resulting 'smile curve' against the market implied volatilities.

### step1.

In [48]:
import datetime as dt
import pandas as pd
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq
from scipy.optimize import minimize

from pprint import pprint


spx_df = pd.read_csv('SPX_options.csv')
spx_df.head()

Unnamed: 0,date,exdate,cp_flag,strike_price,best_bid,best_offer,exercise_style
0,20201201,20201218,C,100000,3547.6,3570.5,E
1,20201201,20201218,C,200000,3447.6,3470.5,E
2,20201201,20201218,C,300000,3347.7,3370.6,E
3,20201201,20201218,C,400000,3247.7,3270.6,E
4,20201201,20201218,C,500000,3147.7,3170.6,E


- 'date' is the current date as mentioned above
- 'exdate' is the expiry date of the option
- 'cp_flag' will be 'C' for call options and 'P' for put options
- 'strike_price' is the strike price multiplied by 1000 (*why do this?*)
- 'exercise_style' is 'E' for European, and 'A' for American
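
As a small illustration of the column conventions above, the sketch below converts the scaled strikes back to index points and turns the integer dates into day counts. The scaling by 1000 presumably lets fractional strikes be stored as integers, though the data file itself does not say so. The two rows are copied from the `head()` output shown earlier; the column names `strike`, `expiry` and `days_to_expiry` are illustrative additions, not part of the file.

```python
import pandas as pd

# Two rows copied from the head() output above, using the file's column names
sample = pd.DataFrame({
    'date': [20201201, 20201201],
    'exdate': [20201218, 20201218],
    'cp_flag': ['C', 'C'],
    'strike_price': [100000, 200000],
    'best_bid': [3547.6, 3447.6],
    'best_offer': [3570.5, 3470.5],
})

# Undo the x1000 scaling to get strikes in index points
sample['strike'] = sample['strike_price'] / 1000.0

# Parse the YYYYMMDD integers and count calendar days from the valuation date
sample['expiry'] = pd.to_datetime(sample['exdate'].astype(str), format='%Y%m%d')
sample['days_to_expiry'] = (sample['expiry'] - pd.Timestamp('2020-12-01')).dt.days

print(sample[['strike', 'expiry', 'days_to_expiry']])
```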

### step2.

For each strike, the mid price is calculated as

\begin{equation*}
\begin{split}
\mbox{Mid Price} = \frac{\mbox{Best Bid} + \mbox{Best Ask}}{2}
\end{split}
\end{equation*}



In [31]:
spx_df['exdate'].unique()  # list the distinct expiry dates

array([20201218, 20210115, 20210219])

In [32]:
spx_df['cp_flag'].value_counts()

cp_flag
C    1036
P    1036
Name: count, dtype: int64

In [62]:
# compute the mid price of every option
spx_df['mid_price'] = (spx_df['best_bid'] + spx_df['best_offer']) / 2
mid_prices = spx_df['mid_price'].values  # extract as a numpy array
print(mid_prices)  # confirm that mid_prices is an array

[3559.05 3459.05 3359.15 ... 1543.15 1642.95 1741.85]


We can calculate the time-to-maturity $T$ in Python as follows:

In [61]:
today = dt.date(2020, 12, 1)  # valuation date
expiries = [pd.Timestamp(str(x)).date() for x in spx_df['exdate'].unique()]
T = [(exdate-today).days/365.0 for exdate in expiries]  # time to expiry in years
# pprint(dict(zip(expiries, T)))
# print(T)


The file "zero_rates_20201201.csv" contains information about the "zero rates" to be used for discounting.


In [34]:
strike_prices = spx_df['strike_price']  # strike prices (still multiplied by 1000)
strike_prices

0        100000
1        200000
2        300000
3        400000
4        500000
         ...   
2067    5000000
2068    5100000
2069    5200000
2070    5300000
2071    5400000
Name: strike_price, Length: 2072, dtype: int64

In [35]:
rates_df = pd.read_csv('zero_rates_20201201.csv')
rates_df.head()

Unnamed: 0,date,days,rate
0,20201201,7,0.10228
1,20201201,13,0.114128
2,20201201,49,0.21648
3,20201201,77,0.220707
4,20201201,104,0.219996


In [36]:
rates_df.tail()

Unnamed: 0,date,days,rate
40,20201201,3212,0.878441
41,20201201,3303,0.898843
42,20201201,3394,0.918827
43,20201201,3485,0.938031
44,20201201,3576,0.956515


### step3.

Note that the interest rates provided in the 'rate' column are quoted in percent. For instance, to discount a cashflow paid 49 days from today, the zero rate is 0.21648% = 0.0021648 and the discount factor is

\begin{equation*}
\begin{split}
D(0,T) = e^{-0.00216480 \times \frac{49}{365}}
\end{split}
\end{equation*}


If the payment date is not provided in the dataframe, you can perform linear interpolation for the corresponding zero rate.
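
The discounting and interpolation rules can be cross-checked with a few lines of NumPy. This is only a sketch: the helper name `discount_factor` is hypothetical, the curve points are the first five rows of the rates table shown above, and `np.interp` is used here in place of `interp1d` purely to keep the example short.

```python
import numpy as np

# First five rows of zero_rates_20201201.csv as shown above (rates in percent)
curve_days = np.array([7, 13, 49, 77, 104])
curve_rates_pct = np.array([0.10228, 0.114128, 0.21648, 0.220707, 0.219996])

def discount_factor(days, curve_days, curve_rates_pct):
    """Continuously compounded discount factor D(0, T) with T = days / 365."""
    # Linear interpolation of the zero rate, then percent -> decimal
    rate = np.interp(days, curve_days, curve_rates_pct) / 100.0
    return np.exp(-rate * np.asarray(days) / 365.0)

rate_17 = np.interp(17, curve_days, curve_rates_pct)       # interpolated rate in percent
df_49 = discount_factor(49, curve_days, curve_rates_pct)   # 49 days is a curve node
df_17 = discount_factor(17, curve_days, curve_rates_pct)   # 17 days lies between nodes
print(rate_17, df_49, df_17)
```

One design difference worth knowing: `np.interp` silently holds the end values flat outside the curve range, whereas `interp1d` with its default settings raises an error for such inputs.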

In [37]:
from scipy.interpolate import interp1d

days_to_expiry1 = (expiries[0] - today).days  # number of days to the first expiry
print(days_to_expiry1)

17


In [38]:
zero_rate_curve = interp1d(rates_df['days'], rates_df['rate'])  # build the linear interpolation function
rate1 = zero_rate_curve(days_to_expiry1)  # zero rate (in percent) at the first expiry
print(rate1)

0.12550044444444444


In [39]:
# calculate the discount factors for all expiry dates
days_to_expiries = [(ex - today).days for ex in expiries]
rates = zero_rate_curve(days_to_expiries)  # interpolated zero rates, in percent
discount_factors = np.exp(-rates / 100 * np.array(days_to_expiries) / 365)  # convert percent to decimal before discounting

In [40]:
discount_factors

array([0.99994155, 0.99974716, 0.99951655])

### step4. Displaced-Diffusion model (calibrated parameters: $\sigma, \beta$)

In [50]:
# Black-Scholes pricing function
def BlackScholesCall(S, K, r, sigma, T):
    d1 = (np.log(S/K)+(r+sigma**2/2)*T) / (sigma*np.sqrt(T))
    d2 = d1 - sigma*np.sqrt(T)
    return S*norm.cdf(d1) - K*np.exp(-r*T)*norm.cdf(d2)

In [51]:
# S = 100.0
# r = 0.05
# T = 2.0
# K = S * np.exp(r*T)
# sigma = 0.4

In [52]:
# print('Call price: %.4f' % BlackScholesCall(S, K, r, sigma, T))

##### Modifications relative to the Black-Scholes pricing function:
> 1. Added the parameter $\beta$\
> 2. $d_1$ and $d_2$ are computed from the shifted underlying price instead of $S$\
> 3. The structure of the returned option price formula is unchanged

In [63]:
def displaced_diffusion_price(S, K, T, r, sigma, beta, option_type='call'):
    # Shifted underlying price: a beta-weighted blend of spot and strike replaces S.
    # Note: the textbook displaced-diffusion price instead applies Black's formula
    # to F/beta with strike K + (1-beta)/beta*F and volatility sigma*beta.
    S_shifted = beta * S + (1 - beta) * K
    d1 = (np.log(S_shifted / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)

    # Option price
    if option_type == 'call':
        return S_shifted * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
    elif option_type == 'put':
        return K * np.exp(-r * T) * norm.cdf(-d2) - S_shifted * norm.cdf(-d1)
    raise ValueError(f"unknown option_type: {option_type!r}")
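
For comparison, the displaced-diffusion model is usually written as Black's formula applied to a displaced forward: forward $F/\beta$, strike $K + \frac{1-\beta}{\beta}F$ and volatility $\sigma\beta$. The sketch below follows that form under two assumptions that are not taken from the project data: there are no dividends, so $F = S e^{rT}$, and $\beta$ lies in $(0, 1]$. The function names `black76_call`, `dd_call` and `dd_put` are illustrative only.

```python
import numpy as np
from scipy.stats import norm

def black76_call(F, K, sigma, T, df):
    """Black (1976) call price on a forward F, discounted by df."""
    d1 = (np.log(F / K) + 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return df * (F * norm.cdf(d1) - K * norm.cdf(d2))

def dd_call(S, K, r, sigma, T, beta):
    """Displaced-diffusion call: Black76 with displaced forward, strike and vol."""
    F = S * np.exp(r * T)          # assumption: no dividends
    df = np.exp(-r * T)
    return black76_call(F / beta, K + (1 - beta) / beta * F, sigma * beta, T, df)

def dd_put(S, K, r, sigma, T, beta):
    """Displaced-diffusion put obtained from the call by put-call parity."""
    F = S * np.exp(r * T)
    return dd_call(S, K, r, sigma, T, beta) - np.exp(-r * T) * (F - K)

# With beta = 1 the displacement vanishes and the price equals Black-Scholes
S0, K0, r0, sigma0, T0 = 100.0, 105.0, 0.01, 0.2, 1.0
d1 = (np.log(S0 / K0) + (r0 + 0.5 * sigma0**2) * T0) / (sigma0 * np.sqrt(T0))
bs_call = S0 * norm.cdf(d1) - K0 * np.exp(-r0 * T0) * norm.cdf(d1 - sigma0 * np.sqrt(T0))
print(dd_call(S0, K0, r0, sigma0, T0, beta=1.0), bs_call)
```

Setting $\beta = 1$ recovers the lognormal (Black-Scholes) price, while smaller $\beta$ moves the model toward normal dynamics and produces a downward-sloping skew.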


#### Minimize the error between the market price and the model price:
> 1. Use the displaced_diffusion_price function to calculate the model price
> 2. Compare the model price with the market price and return the sum of squared errors
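
The least-squares idea above can be checked on synthetic data before touching market quotes. The sketch below is illustrative only: it copies the call formula used by `displaced_diffusion_price` in this note into a hypothetical helper `shifted_call`, generates prices from assumed parameters ($\sigma = 0.3$, $\beta = 0.7$), and then tries to recover them. It uses `scipy.optimize.least_squares` rather than `minimize`, because that routine works on the residual vector and accepts bounds directly. The spot, strikes, rate and maturity are made-up numbers.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import norm

def shifted_call(S, K, T, r, sigma, beta):
    # Same call formula as displaced_diffusion_price in this note (vectorised in K)
    S_shifted = beta * S + (1 - beta) * K
    d1 = (np.log(S_shifted / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S_shifted * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# Made-up market: one maturity, 13 strikes around the spot
S, T, r = 100.0, 0.5, 0.01
strikes = np.linspace(70.0, 130.0, 13)
true_sigma, true_beta = 0.3, 0.7
synthetic_prices = shifted_call(S, strikes, T, r, true_sigma, true_beta)

def residuals(params):
    # Model price minus target price for every strike
    sigma, beta = params
    return shifted_call(S, strikes, T, r, sigma, beta) - synthetic_prices

# Bounds keep sigma positive and beta inside [0, 1]
fit = least_squares(residuals, x0=[0.2, 0.5], bounds=([1e-4, 0.0], [5.0, 1.0]))
fit_sigma, fit_beta = fit.x
print("recovered sigma, beta:", fit_sigma, fit_beta)
```

If the recovered values match the assumed ones, the objective and optimizer are wired correctly, and any remaining misfit on real data comes from the model rather than the code.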

In [64]:
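def calibration_objective(params, market_prices, S, K, T, r, option_type):
    # K and T must be equal-length sequences (one strike and one maturity per option); r is a scalar rate
    sigma, beta = params  # parameters being calibrated
    model_prices = [
        displaced_diffusion_price(S, k, t, r, sigma, beta, option_type) for k, t in zip(K, T)
    ]
    # sum of squared errors between model and market prices
    return np.sum((np.array(model_prices) - np.array(market_prices))**2)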

### step5. Find the optimal parameters

In [65]:
strike_prices

0        100000
1        200000
2        300000
3        400000
4        500000
         ...   
2067    5000000
2068    5100000
2069    5200000
2070    5300000
2071    5400000
Name: strike_price, Length: 2072, dtype: int64

In [66]:
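initial_guess = [0.2, 0.5]  # starting values for (sigma, beta)

# Market data: calls of the first expiry only, so each option has one strike, one maturity and one rate
S = 3662.45  # current price of the underlying (example value)
first_exdate = spx_df['exdate'].unique()[0]
calls = spx_df[(spx_df['exdate'] == first_exdate) & (spx_df['cp_flag'] == 'C')]
K = calls['strike_price'].values / 1000.0  # strikes are stored multiplied by 1000
days = (expiries[0] - today).days
T = np.full(len(K), days / 365.0)  # time to expiry in years, one entry per option
r = float(zero_rate_curve(days)) / 100  # zero rate in decimal (the file quotes percent)
market_prices = ((calls['best_bid'] + calls['best_offer']) / 2).values  # market mid prices

# Calibration: the bounds keep sigma positive and beta inside [0, 1]
result = minimize(calibration_objective, initial_guess,
                  args=(market_prices, S, K, T, r, 'call'),
                  bounds=[(1e-4, 5.0), (0.0, 1.0)])
optimal_sigma, optimal_beta = result.x

print("Optimal sigma:", optimal_sigma)
print("Optimal beta:", optimal_beta)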

### step6. Verification and plotting
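
The plotting cell in this step calls `impliedCallVolatility`, which is not defined anywhere in this note. A minimal sketch of such a function is given here, assuming the argument order `(S, K, r, price, T)` used in that cell and a Black-Scholes call on a non-dividend-paying underlying. The search interval of 0.0001% to 500% volatility and the choice to return `nan` when no root exists are assumptions, not part of the project specification.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def BlackScholesCall(S, K, r, sigma, T):
    d1 = (np.log(S / K) + (r + sigma**2 / 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def impliedCallVolatility(S, K, r, price, T):
    """Volatility at which the Black-Scholes call price matches `price`."""
    def objective(vol):
        return BlackScholesCall(S, K, r, vol, T) - price

    lo, hi = 1e-6, 5.0  # assumed search interval for the volatility
    # brentq needs a sign change; without one the price has no implied vol in range
    if objective(lo) * objective(hi) > 0:
        return np.nan
    return brentq(objective, lo, hi)

# Round trip: price an option at 25% vol, then recover that vol from the price
test_price = BlackScholesCall(100.0, 100.0, 0.01, 0.25, 0.5)
recovered_vol = impliedCallVolatility(100.0, 100.0, 0.01, test_price, 0.5)

# A quote below intrinsic value has no implied volatility
no_vol = impliedCallVolatility(100.0, 50.0, 0.0, 0.5, 1.0)
print(recovered_vol, no_vol)
```

Returning `nan` instead of raising lets the list comprehensions in the plotting cell run over all strikes; matplotlib simply leaves gaps where the value is `nan`.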

In [None]:
# compute the model prices with the optimal parameters
model_prices = [
    displaced_diffusion_price(S, k, t, r, optimal_sigma, optimal_beta, 'call') for k, t in zip(K, T)
]

# compare the model prices with the market prices
print("Market Prices:", market_prices)
print("Model Prices:", model_prices)

In [None]:
import matplotlib.pyplot as plt

# implied volatilities (using the impliedCallVolatility function defined earlier)
implied_vols = [
    impliedCallVolatility(S, k, r, price, t) for k, price, t in zip(K, market_prices, T)
]
model_vols = [
    impliedCallVolatility(S, k, r, price, t) for k, price, t in zip(K, model_prices, T)
]

# plot
plt.figure(figsize=(10, 6))
plt.plot(K, implied_vols, label='Market Implied Volatility', marker='o')
plt.plot(K, model_vols, label='Model Implied Volatility (Displaced-Diffusion)', linestyle='--')
plt.xlabel('Strike Price')
plt.ylabel('Implied Volatility')
plt.legend()
plt.title('Implied Volatility Smile')
plt.show()
