## Exploratory Data Analysis and Feature Engineering
Here we work with the adjusted options data in the `data/adjusted_options` directory. The goal is to produce
a novel technical indicator for use in model training.

**Importing required packages & changing working directory**

In [6]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import os

# Move up to the project root when the notebook is launched from the src directory
if os.path.basename(os.getcwd()) == "src":
    os.chdir(os.path.dirname(os.getcwd()))

**User defined parameters**

In [7]:
stock_of_interest = "CVX"
option_data_path = "data/adjusted_options/"
stock_data_path = "data/adjusted_daily_closing/"
num_bars = 100


**Load adjusted options data**


In [8]:
# Build the CSV path once so the error message reports exactly the file we tried to read
options_path = os.path.abspath(os.path.join(option_data_path, stock_of_interest + ".csv"))
try:
    options_df = pd.read_csv(options_path)
except FileNotFoundError:
    raise SystemExit("Option data for " + stock_of_interest + " not found in path: " + options_path)

# Keep only the calendar date (no time component) for both date columns
options_df["date"] = pd.to_datetime(options_df["date"]).dt.date
options_df["expiration date"] = pd.to_datetime(options_df["expiration date"]).dt.date


First, note that we use only `bid price` or `last price` as a measure of an option's value.
This is because we are focused on selling options, and `ask price` is an exaggerated representation of demand.

We therefore first remove options that have neither a bid price nor a last price, since we cannot determine demand
for them. In addition, we remove options that have neither volume nor open interest. I made this decision because
I believe that options with no holders and no liquidity are not representative of the market: their `last price`
may be outdated and their `bid price` undervalued.


In [9]:
options_df = options_df[(options_df[["bid price", "last price"]].max(axis=1) > 0) &
                        (options_df[["volume", "open interest"]].max(axis=1) > 0)]
options_df.head()

Unnamed: 0,date,expiration date,type,strike price,ask price,ask size,bid price,bid size,last price,volume,open interest,closing price,exp date closing price,date div,exp date div
9,2016-01-04,2016-01-08,put,65.0,0.25,173.0,0.0,0.0,0.01,0.0,13.0,88.85,82.13,0.578852,0.649016
10,2016-01-04,2016-01-08,call,70.0,19.15,27.0,17.45,344.0,10.83,0.0,1.0,88.85,82.13,0.578852,0.649016
11,2016-01-04,2016-01-08,put,70.0,0.03,28.0,0.0,0.0,0.04,0.0,173.0,88.85,82.13,0.578852,0.649016
13,2016-01-04,2016-01-08,put,75.0,0.05,418.0,0.01,36.0,0.02,162.0,1506.0,88.85,82.13,0.578852,0.649016
15,2016-01-04,2016-01-08,put,76.5,0.04,117.0,0.02,25.0,0.02,291.0,11.0,88.85,82.13,0.578852,0.649016
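
To make the filter above concrete, here is a minimal sketch on a toy frame. The values are made up for illustration and are not taken from the CVX data; the names `toy`, `has_price`, `has_activity` and `kept` are hypothetical. A row is kept only when the larger of its bid and last price is positive and the larger of its volume and open interest is positive. A row with missing values in both price columns has a row-wise maximum of `NaN`, and `NaN > 0` is `False`, so that row is dropped too.

```python
import numpy as np
import pandas as pd

# Toy rows (illustrative values only, not real option quotes)
toy = pd.DataFrame({
    "bid price":     [0.00, 0.00, 1.20, np.nan],
    "last price":    [0.01, 0.00, 1.10, np.nan],
    "volume":        [0.0,  5.0,  0.0,  10.0],
    "open interest": [13.0, 2.0,  0.0,  4.0],
})

# At least one of bid/last price must be positive
has_price = toy[["bid price", "last price"]].max(axis=1) > 0
# At least one of volume/open interest must be positive
has_activity = toy[["volume", "open interest"]].max(axis=1) > 0

kept = toy[has_price & has_activity]
print(kept)
```

Only the first toy row survives: the second has no price, the third has no volume or open interest, and the fourth has no usable price information.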


Since the option strike price and the closing prices on the current and expiration dates all contain priced-in
dividends, we remove them using the pre-calculated dividend contributions (`date div` and `exp date div`).
These contributions are pre-calculated in the script `scrape_and_preprocess.py`.

In [10]:
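# Remove the priced-in dividend from the closing price on the quote date
options_df["adj closing"] = options_df["closing price"] - options_df["date div"]
# The strike and the expiration-date closing price use the dividend contribution at expiration
options_df["adj strike"] = options_df["strike price"] - options_df["exp date div"]
options_df["adj exp closing"] = options_df["exp date closing price"] - options_df["exp date div"]
# Business days from the quote date (inclusive) to the expiration date (exclusive)
options_df["days till exp"] = np.busday_count(begindates=options_df["date"],
                                              enddates=options_df["expiration date"])
options_df = options_df.drop(columns=["date div", "exp date div"])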

options_df.head()

Unnamed: 0,date,expiration date,type,strike price,ask price,ask size,bid price,bid size,last price,volume,open interest,closing price,exp date closing price,adj closing,adj strike,adj exp closing,days till exp
9,2016-01-04,2016-01-08,put,65.0,0.25,173.0,0.0,0.0,0.01,0.0,13.0,88.85,82.13,88.271148,64.350984,81.480984,4
10,2016-01-04,2016-01-08,call,70.0,19.15,27.0,17.45,344.0,10.83,0.0,1.0,88.85,82.13,88.271148,69.350984,81.480984,4
11,2016-01-04,2016-01-08,put,70.0,0.03,28.0,0.0,0.0,0.04,0.0,173.0,88.85,82.13,88.271148,69.350984,81.480984,4
13,2016-01-04,2016-01-08,put,75.0,0.05,418.0,0.01,36.0,0.02,162.0,1506.0,88.85,82.13,88.271148,74.350984,81.480984,4
15,2016-01-04,2016-01-08,put,76.5,0.04,117.0,0.02,25.0,0.02,291.0,11.0,88.85,82.13,88.271148,75.850984,81.480984,4
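
As a sanity check on the adjustment, the sketch below recomputes the first row of the output by hand from the values shown earlier. The names `toy`, `begin` and `end` are hypothetical, and the explicit cast to `datetime64[D]` is a precaution added here rather than part of the notebook. The last two lines show the optional `holidays` argument of `np.busday_count`; the notebook does not use it, and treating 2016-01-18 as a market holiday is an assumption made only for this illustration.

```python
import datetime as dt

import numpy as np
import pandas as pd

# First row of the filtered CVX data, copied from the output shown earlier
toy = pd.DataFrame({
    "date": [dt.date(2016, 1, 4)],
    "expiration date": [dt.date(2016, 1, 8)],
    "strike price": [65.0],
    "closing price": [88.85],
    "exp date closing price": [82.13],
    "date div": [0.578852],
    "exp date div": [0.649016],
})

# Subtract the priced-in dividend contributions
toy["adj closing"] = toy["closing price"] - toy["date div"]
toy["adj strike"] = toy["strike price"] - toy["exp date div"]
toy["adj exp closing"] = toy["exp date closing price"] - toy["exp date div"]

# np.busday_count counts weekdays in [begin, end): the end date is excluded
begin = pd.to_datetime(toy["date"]).to_numpy().astype("datetime64[D]")
end = pd.to_datetime(toy["expiration date"]).to_numpy().astype("datetime64[D]")
toy["days till exp"] = np.busday_count(begin, end)

print(toy[["adj closing", "adj strike", "adj exp closing", "days till exp"]])

# Optional: exclude a holiday (assumed here for illustration) from the count
plain = np.busday_count("2016-01-15", "2016-01-22")
with_holiday = np.busday_count("2016-01-15", "2016-01-22", holidays=["2016-01-18"])
```

Monday 2016-01-04 to Friday 2016-01-08 gives 4 because the expiration date itself is not counted. Without a `holidays` list, exchange holidays that fall on weekdays are counted as business days.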
