# Math 164 Project 1 Report - Option Pricing

### We want to analyze the dynamics of the underlying security price.
### We will use the Black-Scholes formula to estimate European call/put option prices, first with historical volatility and then with implied volatility.

## Part I. Black-Scholes Formula

The Black-Scholes formula estimates an option price from the maturity ($T$), strike price ($K$), current stock price ($S_0$), risk-free interest rate ($r$), dividend rate ($\delta$) and volatility ($\sigma$). In this project, we assume the dividend rate is $\delta=0$ for simplicity.

Thus, we will use the following Black-Scholes formulas in this project to estimate option prices:

Call Option:
\begin{equation}
C=S_0N(d_1)-Ke^{-rT}N(d_2)
\end{equation}

Put Option:
\begin{equation}
P=Ke^{-rT}N(-d_2)-S_0N(-d_1)
\end{equation}

where
\begin{equation}
d_1=\frac{\ln\left(\frac{S_0}{K}\right)+\left(r+\frac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}\\
d_2=d_1-\sigma\sqrt{T}
\end{equation}
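To make the formulas concrete, here is a minimal sketch that evaluates them for one option and checks put-call parity, $C-P=S_0-Ke^{-rT}$, which follows directly from the two formulas above. The function names `norm_cdf` and `black_scholes` and the input values are illustrative assumptions, not part of the project code.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal CDF N(x), written with the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes(s0, k, r, t, sigma):
    """Return (call, put) prices for a stock that pays no dividend."""
    d1 = (log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    call = s0 * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)
    put = k * exp(-r * t) * norm_cdf(-d2) - s0 * norm_cdf(-d1)
    return call, put

# Hypothetical inputs, chosen only for illustration
call, put = black_scholes(s0=175.0, k=180.0, r=0.0301, t=0.25, sigma=0.25)

# Put-call parity: C - P should equal S_0 - K * exp(-r * T)
parity_gap = (call - put) - (175.0 - 180.0 * exp(-0.0301 * 0.25))
print(call, put, parity_gap)
```

The parity gap should vanish up to floating-point error, which is a quick sanity check on any implementation of the two formulas.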

From Yahoo Finance, we can obtain all of the required option and stock data; the only input we cannot observe is volatility. Thus, the goal of this project is to assess the accuracy of option price estimation under different volatility estimation methods.

## Part II. Preparation

We use the options of Apple Inc. traded on March 20, 2018 for this project. We obtain this dataset from Yahoo Finance.

From treasury.gov, we use the value of the Treasury 20-Yr CMT on March 20, 2018 as the interest rate, that is, $r=3.01\%$.

### Preparation

In [None]:
import numpy as np
import scipy.stats
from math import *
import csv
from datetime import datetime, timedelta   # timedelta is used in Part IV
import matplotlib.pyplot as plt
import sys
from scipy.optimize import curve_fit
import statistics
import time
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import learning_curve
from sklearn.kernel_ridge import KernelRidge

To start the project, we import all the dependencies we need. We will use scikit-learn to fit curves in Part IV. Implied Volatility.

In [None]:
### GLOBAL VARIABLES ###
# We set risk-free rate at 3.01%. Data obtained from treasury.gov, using Treasury 20-yr CMT
RATE = 0.0301
DAY = 365
BEST = 1
DATE = '0-0-0'
DATE = '3/20/2018'   ### DEBUG ###
PT_NUM = 10

We introduce some global variables. We assume the risk-free rate is constant throughout. We set the number of days in a year to 365. We could also use 261, the number of working days in 2018; this choice does not noticeably influence our results. DATE is initialized for convenience. PT_NUM is the minimum number of points required for implied volatility estimation in Part IV.

In [None]:
# Load historical option data on 2018/3/20
with open("../data/AAPL_032018.csv") as data:
    data_reader = csv.reader(data)
    raw_data = list(data_reader)

# Load historical price
with open("../data/AAPL_HP.csv") as data:
    data_reader = csv.reader(data)
    raw_hp = list(data_reader)

AAPL_032018.csv stores the details of options traded on 3/20/2018. 

AAPL_HP.csv stores the historical stock prices of Apple Inc from 2/4/2010 to present. We import these 2 tables.

In [None]:
logg = open("../log/log.txt", "w")

# Write log info into local log.txt
def write_log(*args, sep=' ', end='\n', file=None):
    # Echo to stdout, honoring the caller's sep and end
    print(*args, sep=sep, end=end)
    # Write the same text to the log file (the file argument is unused)
    print(*args, sep=sep, end=end, file=logg)

# Write report into local file
def write_report_into_file(report_list, filename="report.csv"):
    with open("../report/" + filename, 'w') as book:
        wr = csv.writer(book, delimiter=',', lineterminator='\n')
        for row in report_list:
            wr.writerow(row, )

We define two helper functions here. Function write_log prints its arguments and also writes them to the log file at "../log/log.txt". Function write_report_into_file writes the input list into a report table, by default "../report/report.csv".

## Part III. Historical Volatility

In this section, we assume volatility is constant throughout the day for all options. We will estimate the volatility on March 20, 2018 from the historical stock prices of Apple Inc.

We use the maximum likelihood estimator (MLE) of the variance of the daily log returns:
\begin{equation}
\bar{\sigma}^2=\frac{1}{n}\sum_{t=1}^n(R_t-\bar{R})^2
\end{equation}
where
\begin{equation}
R_t=\log\left(\frac{S_t}{S_{t-1}}\right)
\end{equation}
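As a small worked illustration of this estimator, the sketch below computes the variance of log returns from a list of closing prices and annualizes it. The function name `historical_variance`, the default of 260 periods per year and the sample prices are assumptions made for this example only.

```python
from math import log

def historical_variance(closes, periods_per_year=260):
    """Annualized MLE variance of log returns for a list of closing prices."""
    returns = [log(closes[i] / closes[i - 1]) for i in range(1, len(closes))]
    n = len(returns)
    mean = sum(returns) / n
    # The MLE divides by n, not n - 1
    daily_var = sum((r - mean) ** 2 for r in returns) / n
    return periods_per_year * daily_var

closes = [100.0, 101.0, 99.5, 100.5, 102.0]   # made-up prices
annual_var = historical_variance(closes)
print(annual_var, annual_var ** 0.5)           # variance and volatility
```

A price series that grows by a constant factor each day has identical log returns, so its estimated variance is zero; this is a convenient check of the implementation.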

### Estimate Historical Volatility (HV)

In [None]:
# Calculate average R_t in the given data
def average_volatility(raw, start=1, end=0):
    # parameter:
        # raw - matrix of history price
    r = len(raw)
    count = 0
    # print("Number of records = " + str(r))
    sum = 0
    for i in range (start + 1, r):
        # if i <= start:
        #     continue
        if end != 0 and i > end:
            break
        p1 = float(raw[i-1][4])
        p2 = float(raw[i][4])
        temp1 = log(p2/p1)
        # print(temp1)
        sum = sum + temp1
        count = count + 1
    avg = sum/count
    # print("Sum is " + str(sum))
    # print("Avg is " + str(avg))
    return avg

Despite its name, function average_volatility returns the mean log return $\bar{R}$, which is required to calculate the historical volatility.

In [None]:
# Calculate estimated variance - volatility
def get_volatility(raw, start=1, end=0):
    # parameter:
        # raw - matrix of history price (including header)
    r = len(raw)
    sum = 0
    count = 0
    avg = average_volatility(raw, start=start, end=end)
    for i in range(start+1, r):
        if end != 0 and i > end:
            break
        p1 = float(raw[i-1][4])
        p2 = float(raw[i][4])
        temp1 = log(p2/p1)
        sum = sum + temp1 * temp1
        count = count + 1

    # Use the identity (1/n) * sum (X - E[X])^2 = E[X^2] - E[X]^2
    E_X2 = sum/count
    var = E_X2 - avg * avg
    # Annualize: multiply the one-day variance by the number of business days in a year
    var = 260 * var
    print("Estimated volatility^2 is " + str(var))
    return var

Function get_volatility returns the annualized variance ($260\,\bar{\sigma}^2$). Its square root is the historical volatility (HV) that we want.

### Evaluation of HV

After obtaining HV, we want to assess its accuracy in predicting option prices. We plug the calculated HV into the Black-Scholes formula to obtain the estimated put/call option prices, then compare these estimates with the actual traded prices.

In [None]:
def get_maturity(raw_trade, raw_expire):
    # parameter:
        # raw_trade - trade date of the option
        # raw_expire - expiration date of the option
    # print("GET_MATURITY CALLED", raw_trade, raw_expire)
    trade_date = datetime.strptime(raw_trade, '%m/%d/%Y')
    trade_days = float(datetime.strftime(trade_date, '%j')) * DAY / 365
    trade_year = float(datetime.strftime(trade_date, '%Y'))
    expire_date = datetime.strptime(raw_expire, '%m/%d/%Y')
    expire_days = float(datetime.strftime(expire_date, '%j')) * DAY / 365
    expire_year = float(datetime.strftime(expire_date, '%Y'))
    expire_days = expire_days + (expire_year - trade_year) * DAY
    day_difference = expire_days - trade_days
    h = day_difference / DAY        # Length of maturity
    return h

Function get_maturity will return maturity in number of years.
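For comparison, here is a shorter sketch of the same idea that subtracts the two dates directly. The name `maturity_in_years` is hypothetical, and the result can differ slightly from get_maturity when the interval spans a leap day, because this version counts actual calendar days.

```python
from datetime import datetime

def maturity_in_years(trade, expire, days_per_year=365):
    """Calendar days between two 'm/d/Y' date strings, expressed in years."""
    fmt = "%m/%d/%Y"
    delta = datetime.strptime(expire, fmt) - datetime.strptime(trade, fmt)
    return delta.days / days_per_year

# One month and one year after the trade date used in this project
one_month = maturity_in_years("3/20/2018", "4/20/2018")
one_year = maturity_in_years("3/20/2018", "3/20/2019")
print(one_month, one_year)
```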

In [None]:
# Calculate d1 and d2
def get_d1_d2(sd, h, s, k):
    # parameter:
        # sd - Volatility
        # h - Maturity
        # s - Stock Price, S_0
        # k - Strike Price, K
    d1 = (1/(sd * sqrt(h))) * ((log(s/k)) + (RATE + ((sd * sd)/2)) * h)
    # Use d2 = d1 - sqrt(vol) * sqrt(maturity)
    d2 = d1 - ( sd * sqrt(h) )
    # print(d1, d2)
    return d1, d2

Function get_d1_d2 will return values of d1 and d2 for Black-Scholes formula.

In [None]:
# Estimate option price using BS formula
def bs_call_put(h, s, k, sd):
    # parameter:
        # h - Maturity
        # s - Stock Price, S_0
        # k - Strike Price, K
        # sd - Volatility
    d1, d2 = get_d1_d2(sd, h, s, k)
    # We ignore dividends here
    N1 = scipy.stats.norm.cdf(d1)
    N2 = scipy.stats.norm.cdf(d2)
    call = (N1 * s) - (N2 * k * exp(- RATE * h))
    N_1 = 1 - N1
    N_2 = 1 - N2
    put = (N_2 * k * exp(- RATE * h)) - (s * N_1)
    return call, put

Function bs_call_put will return call and put option prices with the given parameters.

In [None]:
# Calculate estimated call and put option prices
def estimate_call_put(raw, vol, start=1, end=0):
    # parameters:
        # raw - raw_data (including header)
        # vol - estimated volatility^2
        # start - starting row
        # end - ending row
    r = len(raw)
    # Calculate Standard Deviation (Vol)
    sd = sqrt(vol)
    # print("Standard Deviation = Volatility is " + str(sd))
    # Obtain maturity of option

    if sd == 0:
        # If volatility is 0, d1 and d2 are undefined (division by zero),
        # so we return placeholder prices of 0 for every row.
        return [[0,0]]*r

    output = []

    for i in range(r):
        # Get T-t
        if i < start:
            continue
        if i >= end and end != 0:
            break
        raw_trade = raw[i][3]
        raw_expire = raw[i][10]
        year_difference = get_maturity(raw_trade, raw_expire)
        # print("Maturity is " + str(year_difference))

        # Get Initial Stock price
        S = float(raw[i][26])
        # print("Stock price is " + str(S))

        # Get Strike price
        K = float(raw[i][11])
        # print("Strike price is " + str(K))

        # Get call, put option premium
        call, put = bs_call_put(year_difference, S, K, sd)
        output.append([call, put])
        # print((call, put))
        
    return output

Function estimate_call_put takes the raw option dataset and the HV and estimates option prices. It returns a list of estimated call and put prices.

In [None]:
# Evaluate how accurate my estimation is
def check_estimation(est, raw, start=1, end=0, plot=0):
    # parameters:
        # est - estimated call & put prices
        # raw - raw_data (including header)
        # start - starting row
        # end - ending row
    r = len(raw)
    diff_table = []
    perc_table = []
    est_l = []
    act_l = []
    for i in range(start, r):
        if i >= end and end != 0:
            break
        act_i = float(raw[i][5])
        act_l.append(act_i)
        if raw[i][12] == 'C':
            est_i = float(est[i-start][0])
            abs_diff = abs(act_i-est_i)
            diff_table.append(abs_diff)
            perc_table.append(abs_diff/act_i)
            est_l.append(est_i)
        elif raw[i][12] == 'P':
            est_i = float(est[i-start][1])
            abs_diff = abs(est_i - act_i)
            diff_table.append(abs_diff)
            perc_table.append(abs_diff/act_i)
            est_l.append(est_i)
        else:
            print("Neither C or P detected at i = " + str(i))

    # Print average absolute difference and percentage difference
    num = 0
    if (end == 0):
        num = r - start
    else:
        num = end - start
    avg_abs_diff = sum(diff_table)/num
    avg_perc_diff = sum(perc_table)/num
    print("Average absolute error is " + str(avg_abs_diff))
    print("Average percentage error is " + str(avg_perc_diff))
    global BEST
    if BEST > avg_perc_diff:
        BEST = avg_perc_diff
    print("Best result: " + str(BEST))

    # Plot estimated vs. actual prices with a y = x reference line
    if plot:
        plt.scatter(est_l, act_l)
        lim = max(act_l + est_l)
        plt.plot([0, lim], [0, lim])
        plt.xlabel('estimated data')
        plt.ylabel('actual data')
        plt.show()

    return avg_abs_diff, avg_perc_diff

Function check_estimation reports the average absolute error and the average percentage error of the option prices estimated using HV. It can also draw a scatter plot of the estimated option prices against the actual prices.
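The two error measures can be illustrated in isolation. The sketch below assumes two equal-length lists of estimated and actual prices; the name `error_summary` and the numbers are made up for the example.

```python
def error_summary(estimated, actual):
    """Return (mean absolute error, mean absolute percentage error)."""
    abs_errors = [abs(e - a) for e, a in zip(estimated, actual)]
    pct_errors = [err / a for err, a in zip(abs_errors, actual)]
    n = len(abs_errors)
    return sum(abs_errors) / n, sum(pct_errors) / n

# Three made-up options: two are off by 0.5, one is exact
mae, mape = error_summary([2.0, 5.5, 1.0], [2.5, 5.0, 1.0])
print(mae, mape)
```

Note that the percentage error weights cheap options heavily: the same 0.5 miss counts as 20% on a 2.5 option but only 10% on a 5.0 option.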

In [None]:
report = [['First Period', 'last Period', 'Risk Free Rate', 'Estimated Volatility', 'Avg Abs Err', 'Avg Perc Err']]

len_hp = len(raw_hp)
rate = RATE

for i in range(1, len_hp, 20):
    k = 2045   # fix to the last day
    # print(i, k)
    # Estimate volatility using data from i to k
    print("Using rate = " + str(RATE))
    print("Using data from " + raw_hp[i][0] + " to " + raw_hp[k][0])
    volatility = get_volatility(raw_hp, start=i, end=k)
    call_put_table = estimate_call_put(raw_data, volatility)
    avg_abs_err, avg_perc_err = check_estimation(call_put_table, raw_data)
    report.append([raw_hp[i][0], raw_hp[k][0], RATE, volatility, avg_abs_err, avg_perc_err])

for i in range(1995, 2045):
    # k = len_hp - j - 1
    k = 2045   # fix to the last day
    # Estimate volatility using data from i to k
    print("Using rate = " + str(RATE))
    print("Using data from " + raw_hp[i][0] + " to " + raw_hp[k][0])
    volatility = get_volatility(raw_hp, start=i, end=k)
    call_put_table = estimate_call_put(raw_data, volatility)

    avg_abs_err, avg_perc_err = check_estimation(call_put_table, raw_data)
    report.append([raw_hp[i][0], raw_hp[k][0], RATE, volatility, avg_abs_err, avg_perc_err])

We check HV of different periods. The results of HV estimation and evaluation are stored in [../report/report_HV_assessment.csv](../report/report_HV_assessment.csv).

In [None]:
write_report_into_file(report, "report_HV_assessment.csv")

We write the report generated above into a CSV file.

### Conclusion of Historical Volatility

Among all the HV estimates we obtained, the best one gives an average percentage error of 23.5%, which is not a satisfactory result.

HV is not as accurate as we hoped because volatility is itself very volatile. It can change from one second to the next and behaves like a stochastic quantity. Thus, data from past days or months is a poor basis for approximating the current volatility.

## Part IV. Implied Volatility

Part III showed that volatility changes too quickly for data from a long period in the past to estimate it well. We therefore expect that data from a short period very close to the trade time will give a more accurate estimate of volatility and thus a more accurate estimate of option prices.

The idea of the Implied Volatility (IV) method is to take existing option trades, obtain the volatility implied by each one by inverting the Black-Scholes formula, and use these values to estimate the volatility and calculate the option price. To estimate the price of an option, we select existing options with the same maturity (the ideal case), obtain their IVs and interpolate a curve of IV against K. We then read off the volatility at the K of this particular option, plug it into the Black-Scholes formula and calculate the option price. Since volatility is not constant across strike prices or over time, we need to interpolate a separate curve for each option.
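Stated as an equation (a sketch for call options, where $C_{mkt}$ is our notation for the observed trade price and $C_{BS}$ denotes the call formula from Part I viewed as a function of $\sigma$), the implied volatility $\sigma_{IV}$ of a trade is the value that solves:
\begin{equation}
C_{BS}(S_0,K,r,T,\sigma_{IV})=C_{mkt}
\end{equation}
Because the Black-Scholes call price is strictly increasing in $\sigma$ for $T>0$, this equation has at most one solution, so a one-dimensional search is enough to find it. The put case is analogous with $P$ in place of $C$.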

### Establish IV Table

The ideal procedure for estimating an option price using IV is to first select the suitable existing options and then calculate their IVs. That way, we can estimate the price of a desired option in the shortest time. However, to save time in the later mass performance assessment, we calculate the IV of all options in advance, since each option will be used in estimating multiple "future" options.

In [None]:
# Returns a table of implied volatility
def create_iv_csv(raw):
    # parameters:
        # raw - raw_data (including header)
    output = [[0]]
    r = len(raw)
    for i in range(1, r):
        # Option type
        opt_t = raw[i][12]
        oc = 1

        # Set up constants
        # Option price
        opt_p = float(raw[i][5])
        # Strike price
        k = float(raw[i][11])
        # Initial stock price
        s = float(raw[i][26])
        # Period
        raw_trade = raw[i][3]
        raw_expire = raw[i][10]
        h = get_maturity(raw_trade, raw_expire)

        # # Set up BS to obtain N(d1) and N(d2)
        # # Fact1: N(d1) and N(d2) takes value from 0 to 1
        # # Fact2: N(d1) > N(d2)
        # # Thus, I will initialize N(d1) as 1 and lessen it accordingly to see what's the pattern
        # Nd1 = 0.92
        if (opt_t == 'P'):
            oc = 0

        # Estimate volatility by brute force: step sd upward until the BS price
        # overshoots, then back up one step and shrink the step size tenfold
        i_sd = 0.01
        gap = 0.1
        count = 1
        while 1:
            i_c, i_p = bs_call_put(h, s, k, i_sd)
            if oc:
                if i_c > opt_p:
                    i_sd = i_sd - gap
                    gap = gap * 0.1
                    count = count + 1
            else:
                if i_p > opt_p:
                    i_sd = i_sd - gap
                    gap = gap * 0.1
                    count = count + 1
            if count == 16:
                output.append([i_sd])
                break
            i_sd = i_sd + gap
    return output

Function create_iv_csv returns a list with the IV of each option. The resulting IV table is saved at ["../data/IV.csv"](../data/IV.csv). The existing version of the IV table has 3 extra columns, which will no longer be used.
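The search for the volatility that reproduces a traded price can also be done by bisection, since the Black-Scholes call price increases with volatility. The sketch below is an alternative to the step search in create_iv_csv, not the method used in this project; `bs_call`, `implied_vol_call` and the search bounds are assumptions made for illustration.

```python
from math import log, sqrt, exp, erf

def bs_call(s0, k, r, t, sigma):
    """Black-Scholes call price for a stock that pays no dividend."""
    cdf = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    d1 = (log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return s0 * cdf(d1) - k * exp(-r * t) * cdf(d2)

def implied_vol_call(price, s0, k, r, t, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on sigma; returns None if the price cannot be matched."""
    if not bs_call(s0, k, r, t, lo) <= price <= bs_call(s0, k, r, t, hi):
        return None   # price lies outside what the model can reproduce
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(s0, k, r, t, mid) < price:
            lo = mid   # model price too low, so sigma must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: price an option at sigma = 0.30, then recover sigma from the price
price = bs_call(175.0, 180.0, 0.0301, 0.25, 0.30)
recovered = implied_vol_call(price, 175.0, 180.0, 0.0301, 0.25)
print(recovered)
```

Returning None for prices the model cannot reproduce makes the abnormal cases explicit instead of leaving a negative or meaningless volatility in the table.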

### Estimate an Option Price using IV

Now we have obtained IV of each option. In this section, we will show how to use this IV table to produce an estimation of option price. Due to the lack of data, we will not estimate the price of options traded at 9:30.

In [None]:
def convert_input(time='-1', k='-1', p='-1'):
    # parameters:
        # time - time string
        # k - strike price string
        # p - expiration date string
    
    # Validate input
    try:
        k = float(k)
    except:
        k = -1
    try:
        tm = datetime.strptime(time,'%H:%M:%S.%f')
    except:
        print ("Error (Time):", sys.exc_info()[0])
        tm = -1
    try:
        p = get_maturity(DATE, p)
    except:
        p = -1
    return tm, k, p

Function convert_input serves to take in raw inputs from data and return easy-to-use data. It converts time string into datetime format in Python. It converts K from string to float. It also calculates the maturity period in number of years.

In [None]:
def build_database(raw, iv, tm, interval=1):
    # parameters:
        # raw - raw_data (including header)
        # iv - iv_table (including header)
        # tm - time of option trade - interval
        # interval - the interval (in minutes) which data in the past will be added to database
        
    tm2 = tm + timedelta(minutes=interval)

    # Handle overflow
    start_time = datetime.strptime("9:30", "%H:%M")
    if (tm - start_time).total_seconds() < 0:
        tm = start_time
    if (tm2 - start_time).total_seconds() < 0:
        # Time given before 9:30AM. Not valid
        return [[-1,-1,-1,-1,-1,-1,-1]]

    # Extract useful data based on input to build a temporary database
    database = []
    # Format of database:
        # 0. Option Trade Price
        # 1. Option Maturity/Period (Expiration Date - Trade Date) (in term of year)
        # 2. Strike Price
        # 3. Call or Put
        # 4. Price of Underlying Asset
        # 5. Implied Volatility (Standard Deviation)
        # 6. Time in datetime
    size = len(iv)
    for i in range(1, size):
        this_time = datetime.strptime(raw[i][27], "%H:%M:%S.%f")
        if (this_time-tm).total_seconds() >= 0:
            if (tm2-this_time).total_seconds() > 0:
                if float(iv[i][0]) > 0.05:
                    database.append([float(raw[i][5]),                      # [0]Option Trade Price
                                     get_maturity(raw[i][3], raw[i][10]),   # [1]Maturity
                                     float(raw[i][11]),                     # [2]Strike Price
                                     raw[i][12],                            # [3]Put or Call
                                     float(raw[i][26]),                     # [4]Stock Price
                                     float(iv[i][0]),                       # [5]Implied Volatility (sd)
                                     this_time                              # [6]Time in datetime
                                    ])
                else:
                    continue
            else:
                break

    return database

Function build_database returns a small database containing only the option trades inside the given time window, which are the ones relevant to the target option.

In [None]:
def single_iv_estimation_svr(raw, iv, tm, kk, pp, plot=1, method="svr"):
    # parameters:
        # raw - raw_data (including header)
        # iv - iv_table (including header)
        # tm - time of the option trade
        # kk - strike price of the option
        # pp - maturity of the option
        # method - use svr or kr
        
    # Extract database - clean data
    timer_start = time.time()
    database_f = []
    left_bound_ind = 0
    right_bound_ind = 0
    while len(database_f) < PT_NUM or not left_bound_ind or not right_bound_ind:
        if len(database_f) > 200:
            break
        tm = tm - timedelta(minutes=2)
        database_5 = build_database(raw, iv, tm, 2)

        if len(database_5) == 0:
            continue
        if database_5[0][0] == -1:
            break

        # Fix Underlying Asset Price
        uap_list = []
        MAX = 0
        MIN = 10000000
        for row in database_5:
            uap_list.append(row[4])
        current_price = statistics.median(uap_list)
        
        # Filter data based on maturity
        current_maturity = pp
        for row in database_5:
            if row[1] == current_maturity:
                database_f.append(row)
                if row[2] < kk:
                    left_bound_ind = 1
                elif row[2] > kk:
                    right_bound_ind = 1
    write_log("database_f size: ", len(database_f))
    if len(database_f) < PT_NUM:
        return -1, -1

    # Fit a smile curve based on prepared database_f using SVR/KR
    if method=="svr":
        svr = GridSearchCV(SVR(kernel='rbf'), cv=3,
                       param_grid={
                                    "C": [1e4],
                                   'epsilon':[0.0005],
                                   "gamma": [1e-4, 1e-5, 1e-6]
                                   })

    ##### Parameters for KR are not tuned #####
    if method=="kr":
        kr = GridSearchCV(KernelRidge(kernel='rbf'), cv=3,
                      param_grid={
                                  "alpha": [5e-2, 1e-2, 5e-3, 1e-3, 5e-4],
                                  "gamma": [1e-5, 5e-5, 1e-4]
                                  })
    
    # Form X - list of strike prices & y - list of squared IVs (variances)
    X = []
    y = []
    for row in database_f:
        X.append([row[2]])
        y.append(row[5]*row[5])
        if row[2] > MAX:
            MAX = row[2]
        if row[2] < MIN:
            MIN = row[2]

    if method=="svr":
        svr.fit(X, y)
    elif method=="kr":
        kr.fit(X, y)
    if method == 'svr':
        write_log("Estimation Model: SVR")
        write_log("Best parameters:", svr.best_params_)
        write_log("Best score:", svr.best_score_)
    elif method == 'kr':
        write_log("Estimation Model: KR")
        write_log("Best parameters:", kr.best_params_)
        write_log("Best score:", kr.best_score_)

    # Plot
    if plot:
        X_plot = np.linspace(80, 340, 260)[:, None]
        if method=="svr":
            y_svr = svr.predict(X_plot)
            plt.plot(X_plot, y_svr, c='r', label='fit: SVR')
        elif method=="kr":
            y_kr = kr.predict(X_plot)
            plt.plot(X_plot, y_kr, c='grey', label='fit: KR')
        plt.scatter(X, y, s=5, label='Actual Data')
        plt.xlabel('Strike Price (K)')
        plt.ylabel('Implied Volatility (σ^2)')
        plt.legend()
        plt.show()

    if method=="svr":
        est_vol = svr.predict([[kk]])
    elif method=="kr":
        est_vol = kr.predict([[kk]])
    if est_vol[0] < 0:
        return -2, -2
    est_sd = sqrt(est_vol[0])

    timer_stop = time.time()

    est_call, est_put = bs_call_put(current_maturity, current_price, kk, est_sd)
    write_log("Time used:", str(timer_stop-timer_start), "seconds")
    write_log("The estimated implied volatility = ", est_vol)
    write_log("The estimated standard deviation = ", est_sd)
    write_log("The estimated Call option price = ", est_call)
    write_log("The estimated Put option price  = ", est_put)
    return est_call, est_put

Function single_iv_estimation_svr estimates the option price (both call and put) for the given input. It applies the IV method in the following steps:
1. Build a database from the trades in the past 2 minutes using function build_database.
2. From this database, copy the trades with the same maturity into database_f. If database_f does not yet have enough data, or lacks strikes on both sides of the target strike, build another database from the previous 2 minutes. Iterate until database_f has enough data or we run out of data.
3. Use database_f to interpolate the "smile curve". We use Support Vector Regression (SVR) with a Radial Basis Function (RBF) kernel to fit the curve to the data points (K, $\sigma^2$). We also use GridSearchCV with 3-fold cross-validation to select the best-fitting parameters from the grid.
4. Plug the strike price of the target option into the interpolated "smile curve" to obtain the volatility of that option. Then plug this volatility into the Black-Scholes formula, using the function bs_call_put defined earlier, to obtain the estimated option prices.
5. A timer records the time required to estimate the option price using the IV method.
6. Plotting is optional. We can enable it through the plot argument of the function.

(We can also use Kernel Ridge (KR) to fit the curve, but I have not found parameters that interpolate a good curve. Currently SVR gives a better interpolation.)
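Steps 1 and 2 above can be sketched on their own as an expanding look-back search. This is a simplified illustration, not the project code: the quotes are plain dictionaries with hypothetical keys `time`, `strike` and `maturity`, the function name `collect_quotes` is made up, and the 200-row cap of the original loop is omitted.

```python
from datetime import datetime, timedelta

def collect_quotes(quotes, trade_time, strike, maturity, min_points=10,
                   window=timedelta(minutes=2),
                   market_open=datetime(1900, 1, 1, 9, 30)):
    """Walk backwards in fixed windows until enough same-maturity quotes
    bracket the target strike, or until the market open is reached."""
    selected = []
    has_lower = has_higher = False
    end = trade_time
    while end > market_open:
        start = max(end - window, market_open)
        for q in quotes:
            if start <= q["time"] < end and q["maturity"] == maturity:
                selected.append(q)
                has_lower = has_lower or q["strike"] < strike
                has_higher = has_higher or q["strike"] > strike
        if len(selected) >= min_points and has_lower and has_higher:
            return selected
        end = start   # not enough yet: look at the previous window
    return []         # ran out of data before the criteria were met

# Made-up quotes: one per minute from 9:50 to 9:59, strikes 170 to 179
quotes = [{"time": datetime(1900, 1, 1, 9, m), "strike": 120.0 + m, "maturity": 0.25}
          for m in range(50, 60)]
found = collect_quotes(quotes, datetime(1900, 1, 1, 10, 0), 174.5, 0.25, min_points=4)
print(len(found))
```

The default `market_open` uses the year 1900 because `datetime.strptime` fills in January 1, 1900 when the format contains only a time, which is how the trade times are parsed in this notebook.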

### Evaluation of IV

We just showed how to use the IV method to estimate the option price. Now we want to assess the performance of this method. We will use the function single_iv_estimation_svr to estimate the option price for each option traded from 9:31 onwards on 3/20/2018 and compare the estimated price with the actual price.

In [None]:
def mass_iv_assessment_svr(raw, iv, plot=0, specific=-1, method="svr"):
    # parameters:
        # raw - raw_data (including header)
        # iv - iv_table (including header)
        # plot - plot=1, default=0
        # specific - assess performance for a specific option, usually used in handling abnormal data and large difference data
        # method - svr or kr
        
    diff_table = []
    perc_table = []
    diff_call_table = []
    diff_put_table = []
    large_diff_table = [["Index", "Percentage Error", "Type", "Estimated Option Price", "Actual Option Price"]]
    abnormal_table = []
    perc_err_table = [["index", "Percentage Error"]]
    cumm_perc_err_table = [["Index", "Cumulative Percentage Error"]]

    for i in range(1, len(iv)):   # size is only assigned after the loop
        if specific != -1:
            i = specific
            plot = 1
        write_log(i, iv[i])
        write_log(raw[i])
        if iv[i][1] == '9' and iv[i][2] == '30':
            write_log("9:30 detected")
            continue
        # Ignore those with volatility < 0 (Abnormal data)
        if float(iv[i][0]) < 0:
            continue

        tm, kk, pp = convert_input(raw[i][25], raw[i][11], raw[i][10])
        option_type = raw[i][12]

        est_call, est_put = single_iv_estimation_svr(raw, iv, tm, kk, pp, plot=plot, method=method)
        if est_call == -1 and est_put == -1:
            write_log("Not enough data to predict option price.")
            abnormal_table.append(i)
            continue
        elif est_call == -2 and est_put == -2:
            write_log("Negative iv detected")
            abnormal_table.append(i)
            continue

        actual_price = float(raw[i][5])
        write_log("Actual Option:", option_type, actual_price)
        diff = 0
        if option_type == 'C':
            diff = abs(est_call - actual_price)
            # Track the signed error so the reported average shows bias
            diff_call = est_call - actual_price
            diff_call_table.append(diff_call)
        elif option_type == 'P':
            diff = abs(est_put - actual_price)
            diff_put = est_put - actual_price
            diff_put_table.append(diff_put)
        else:
            write_log("Option Type Error: ", option_type)
            continue
        diff_table.append(diff)
        perc_table.append(diff/actual_price)
        if diff/actual_price > 0.10:
            if option_type == 'C':
                large_diff_table.append([i, diff/actual_price, option_type, est_call, actual_price])
            if option_type == 'P':
                large_diff_table.append([i, diff/actual_price, option_type, est_put, actual_price])

        write_log("Perc diff is ", diff/actual_price)
        perc_err_table.append([i, diff/actual_price])
        write_log("Cumulative Avg Perc Diff =", sum(perc_table)/len(diff_table))
        cumm_perc_err_table.append([i, sum(perc_table)/len(diff_table)])

        if specific != -1:
            break

    size = len(diff_table)
    avg_abs_diff = sum(diff_table)/size
    avg_perc_diff = sum(perc_table)/size
    avg_diff_call = 0
    avg_diff_put = 0
    if len(diff_call_table) != 0:
        avg_diff_call = sum(diff_call_table)/len(diff_call_table)
    if len(diff_put_table) != 0:
        avg_diff_put = sum(diff_put_table)/len(diff_put_table)
    write_log("Large Difference Table")
    write_log(large_diff_table)
    write_report_into_file(large_diff_table, "large_diff_table.csv")
    write_log("\n\n\nAbnormal Table")
    abnormal_table = list(map(lambda x: [x], abnormal_table))
    write_log(abnormal_table)
    write_report_into_file(abnormal_table, "abnormal_table.csv")
    write_log("Average absolute error is " + str(avg_abs_diff))
    write_log("Average percentage error is " + str(avg_perc_diff))
    write_log("Average (est_call_5 - actual_price) = ", avg_diff_call)
    write_log("Average (est_put_5 - actual_price) = ", avg_diff_put)

    write_report_into_file(perc_err_table, "perc_error_table.csv")
    write_report_into_file(cumm_perc_err_table, "cumm_perc_err_table.csv")

We then set the parameters for this function and run it on all the data we have.

In [None]:
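# Assess the accuracy of current IV method (SVR/KR)
METHOD = "svr"
DATE = raw_data[1][3]
write_log("Date is ", DATE)
# Load the IV table (assumed to be the file saved at ../data/IV.csv, including its header row)
with open("../data/IV.csv") as data:
    iv_list = list(csv.reader(data))
mass_iv_assessment_svr(raw_data, iv_list, plot=0, method=METHOD)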

For some options we cannot obtain a valid IV. This is expected, since the Black-Scholes formula is only a model of option prices and will be wrong whenever its assumptions fail in the real world. We store these options in the [Abnormal Table](../report-Backup/abnormal_table.csv).

We also take note of options whose estimated price deviates from the actual price by more than 10%. We store these options in the [Large Difference Table](../report-Backup/large_diff_table.csv).

The full log of this evaluation process can be viewed at [../log/log-Backup.txt](../log/log-Backup.txt).

The bars represent the percentage error of each option. The red dotted line represents the trend of cumulative percentage error.
![alt text](../report-Backup/Figure_1.png)
From the graph, we can tell that options traded at the start of trading hours are estimated poorly. The subsequent estimations are more accurate.

### Conclusion of IV

Overall, we obtain an average percentage error of 4.53%, which is much better than the best result from the evaluation of HV (23.5%).

Estimating through IV is also fast. Most option pricing results can be obtained within 1 second.

On this dataset, IV is therefore a much stronger estimation method than HV.

## Conclusion

We presented two ways of estimating option prices. On our data, IV is clearly more accurate than HV. This supports the view that volatility is not constant across strike prices or over time, and that old volatility data has little value in estimating current volatility.

The "Fear Index" (the CBOE Volatility Index, VIX), which tracks implied volatility, points the same way: it fluctuates substantially over time.

In [None]:
# Clean up
write_log("Task done. Exiting main function...")
logg.close()

Finally, we close the log file and exit gracefully.