# Optiver Realized Volatility EDA
![](https://media.istockphoto.com/vectors/stockmarketconcept-vector-id1262967772?k=6&m=1262967772&s=612x612&w=0&h=cfo-mSbEeMI8VY1BpgQUNuUEJoOaXK28OBjgxpV68xc=)


#### A stock market is a constant tug of war between the bears (who pull the price down) and the bulls (who pull the price up). At any point in time, a stock's price is determined by supply and demand.
#### A stock price may be affected by factors such as economic releases, company news, a recommendation from a well-known analyst, a popular initial public offering (IPO), or unexpected earnings results.
#### Because of these factors, the price may be volatile, i.e. it may rise or fall sharply within a short interval of time, depending on the sentiment of market participants (stakeholders/investors).
#### An order book is an electronic list of the buy orders and sell orders for a security, organized by price level: each level shows how many buy (or sell) orders are resting at that price.
![](https://zerodha.com/z-connect/wp-content/uploads/2017/12/Floating-MarketDepth.png)
#### Above is a simple order book for Reliance Industries, a stock listed on the NSE (National Stock Exchange of India). The bid column shows the buy orders at each price level (925.5, 925.45, etc.) together with the quantity and the number of orders. In the same way, the offer column shows the sell orders at each price level.
#### Volatility is important because it helps in assigning a risk level to a particular stock, index, or security.
#### Order book data provides the most detailed information about a stock, index, or security, at the lowest level of aggregation.
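
To make the order book description concrete, here is a minimal sketch of a toy book. The bid prices echo the levels mentioned above, but the ask prices and all quantities are made up for illustration, and the variable names are hypothetical.

```python
# Toy order book: price level -> total quantity resting at that level.
# Bid prices follow the levels quoted in the text; the ask prices and
# every quantity are invented for illustration only.
bids = {925.50: 120, 925.45: 300, 925.40: 75}   # buy side
asks = {925.60: 90, 925.65: 210, 925.70: 400}   # sell side

best_bid = max(bids)                    # highest price a buyer will pay
best_ask = min(asks)                    # lowest price a seller will accept
spread = best_ask - best_bid            # gap between the two sides
mid_price = (best_bid + best_ask) / 2   # simple midpoint of the best quotes

print(f"best bid {best_bid:.2f} | best ask {best_ask:.2f} | "
      f"spread {spread:.2f} | mid {mid_price:.2f}")
```

The best bid and best ask are the two "level 1" prices; they correspond to the `bid_price1` and `ask_price1` columns used later in this notebook.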

### In this notebook, we analyse the volatility of stocks using ideas from different research papers. It is an extension of the starter notebook (https://www.kaggle.com/jiashenliu/introduction-to-financial-concepts-and-data).

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
import matplotlib.pyplot as plt
import plotnine as p9
from plotnine import *
import warnings
warnings.filterwarnings('ignore')

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        

In [None]:
# Read Train Data
train = pd.read_csv('../input/optiver-realized-volatility-prediction/train.csv')
book_df = pd.read_parquet('../input/optiver-realized-volatility-prediction/book_train.parquet')
trade_df =  pd.read_parquet('../input/optiver-realized-volatility-prediction/trade_train.parquet')
# Keep the preview as the last expression so the notebook actually displays it
train.head()

In [None]:
# Specify Stock ID to analyse volatility
stock_id = 0
time_id = 5

In [None]:
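# Select one (stock_id, time_id) bucket; .copy() avoids chained-assignment warnings on later writes
book_sample = book_df[(book_df['stock_id']==stock_id)&(book_df['time_id']==time_id)].copy()
book_sample.loc[:,'stock_id'] = stock_id
# Build the trade mask from trade_df itself: a mask built from book_df does not align with trade_df's rows
trade_sample = trade_df[(trade_df['stock_id']==stock_id)&(trade_df['time_id']==time_id)].copy()
trade_sample.loc[:,'stock_id'] = stock_id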

In [None]:
book_sample

In [None]:
trade_sample.head()

## Bid/Ask/Trade 

In [None]:
plot_df = pd.merge(book_sample, trade_sample, on = ['stock_id','time_id','seconds_in_bucket'], how = 'left')
plot_df['price'] = plot_df['price'].ffill().bfill().rolling(10).mean()
plot_df = plot_df[['seconds_in_bucket', 'bid_price1','ask_price1','bid_price2','ask_price2', 'price']].melt(id_vars=["seconds_in_bucket"], 
        var_name="Variable", 
        value_name="Value")
colors = {'bid_price1':'royalblue', 'ask_price1':'maroon', 'bid_price2' : 'lightsteelblue', 'ask_price2' : 'indianred', 'price' : 'cyan'}  
# Create a time series plot
(
    ggplot(data = plot_df)+
    geom_line(aes(x='seconds_in_bucket', y = 'Value', color = 'Variable'), size = 1.5)+
    
    labs(title = 'Various OrderBook Prices of stock_id : ' + str(stock_id) + ', time_id : ' + str(time_id))+
    xlab('seconds_in_bucket') 
    + theme(figure_size=(16, 8)) + scale_color_manual(values = colors)
)

## Realized Volatility

In [None]:
book_sample['wap'] = (book_sample['bid_price1'] * book_sample['ask_size1'] +
                                book_sample['ask_price1'] * book_sample['bid_size1']) / (
                                       book_sample['bid_size1']+ book_sample['ask_size1'])
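
The weighted average price (WAP) formula above can be checked by hand on a toy book. This is a minimal sketch: `toy_book` and `wap1` are hypothetical names, and the prices and sizes are invented so the arithmetic is easy to follow.

```python
import pandas as pd

# Two snapshots with the same quotes but opposite size imbalance (made-up numbers).
toy_book = pd.DataFrame({
    "bid_price1": [100.0, 100.0],
    "ask_price1": [102.0, 102.0],
    "bid_size1":  [10, 30],
    "ask_size1":  [30, 10],
})

def wap1(df):
    """Level-1 WAP: each price is weighted by the size on the opposite side."""
    return (df["bid_price1"] * df["ask_size1"]
            + df["ask_price1"] * df["bid_size1"]) / (df["bid_size1"] + df["ask_size1"])

toy_book["wap"] = wap1(toy_book)
print(toy_book)
```

In the first row the ask size dominates, so the WAP (100.5) sits closer to the bid; in the second row the bid size dominates, so the WAP (101.5) sits closer to the ask. In other words, the WAP leans toward the side the price is more likely to move to.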

In [None]:
# Create a time series plot
(
    ggplot(data = book_sample)+
    geom_line(aes(x = 'seconds_in_bucket',
                  y = 'wap',
                  group = 1),
              size = 1.5,
              color = 'navy')+
    labs(title = 'WAP of stock_id : ' + str(stock_id) + ', time_id : ' + str(time_id))+
    xlab('seconds_in_bucket')+
    ylab('wap') + theme(figure_size=(16, 8))
)

## Log Returns

In [None]:
def log_return(list_stock_prices):
    return np.log(list_stock_prices).diff() 
book_sample.loc[:,'log_return'] = log_return(book_sample['wap'])
book_sample = book_sample[~book_sample['log_return'].isnull()]
# Create a time series plot
(
    ggplot(data = book_sample)+
    geom_line(aes(x = 'seconds_in_bucket',
                  y = 'log_return',
                  group = 1),
              size = 1.5,
              color = 'cadetblue')+
    labs(title = 'log_return of stock_id : ' + str(stock_id) + ', time_id : ' + str(time_id))+
    xlab('seconds_in_bucket')+
    ylab('log_return') + theme(figure_size=(16, 8))
)
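
With log returns available, realized volatility for a bucket is commonly computed as the square root of the sum of squared log returns, $\sigma = \sqrt{\sum_t r_t^2}$. The following is a minimal sketch under that assumption; `realized_volatility` and the `toy_*` names are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def realized_volatility(log_returns):
    """Square root of the sum of squared log returns (NaNs are dropped)."""
    r = pd.Series(log_returns).dropna()
    return float(np.sqrt(np.sum(r ** 2)))

# Made-up WAP series, used only to exercise the function.
toy_wap = pd.Series([1.000, 1.001, 0.999, 1.002])
toy_log_returns = np.log(toy_wap).diff()   # first value is NaN by construction
toy_rv = realized_volatility(toy_log_returns)
print(toy_rv)
```

Applied to this notebook's data, the same function could be called as `realized_volatility(book_sample['log_return'])` to get a single number for the selected `stock_id` and `time_id`.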

## Complex Plot combining all info

In [None]:
fig = go.Figure(data=go.Heatmap(
        z=book_sample["bid_size1"],
        x=book_sample["seconds_in_bucket"],
        y=book_sample["bid_price1"],
        colorscale='Viridis'))

fig.update_layout(
    title='Bidding Price and Order Size',
    xaxis_nticks=36)

fig.show()

In [None]:
fig = go.Figure(data=go.Heatmap(
        z=book_sample["ask_size1"],
        x=book_sample["seconds_in_bucket"],
        y=book_sample["ask_price1"],
        colorscale='Viridis'))

fig.update_layout(
    title='Asking Price and Order Size',
    xaxis_nticks=36)

fig.show()

In [None]:
# Best bid/ask prices at the moments a trade was executed; marker size is the traded size
plot_df = pd.merge(book_sample, trade_sample, on = ['stock_id','time_id','seconds_in_bucket'], how = 'left').dropna(subset = ['size'])
fig = px.scatter(plot_df, x="ask_price1", y="bid_price1", size="size", size_max=60)
fig.update_layout(
    title='Volume of Trades Executed at what bid/ask price')
fig.show()

In [None]:
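# Take explicit copies so that adding the 'side' column does not write into a view of book_sample
buy_df = book_sample[['seconds_in_bucket','bid_price1','bid_size1']].copy()
buy_df['side'] = 'buy'
sell_df = book_sample[['seconds_in_bucket','ask_price1','ask_size1']].copy()
sell_df['side'] = 'sell'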
plot_df = pd.concat([buy_df, sell_df], axis = 0)
plot_df['quantity'] = plot_df['bid_size1'].fillna(0) + plot_df['ask_size1'].fillna(0)
plot_df['price'] = plot_df['bid_price1'].fillna(0) + plot_df['ask_price1'].fillna(0)
plot_df = plot_df.groupby(['price','side']).quantity.sum().reset_index()
plot_df

In [None]:
import matplotlib.style as style
style.use('fivethirtyeight')
fig, ax = plt.subplots(figsize = (20,10))

ax.set_title('Weighted ECDF Plot of stock_id : ' + str(stock_id) + ', time_id : ' + str(time_id))

sns.ecdfplot(x="price", weights="quantity", stat="count", complementary=True, data=plot_df[plot_df.side == "buy"], ax=ax)
sns.ecdfplot(x="price", weights="quantity", stat="count", data=plot_df[plot_df.side == "sell"], ax=ax)
# sns.scatterplot(x="price", y="quantity", hue="side", data=data, ax=ax)

ax.set_xlabel("Price")
ax.set_ylabel("Quantity")

plt.show()
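
The weighted ECDF plot is essentially a depth chart: cumulative quantity as you move away from the best price on each side. As a rough sketch of the numbers behind such a plot, the cumulative sums can be computed directly with pandas. The `toy_depth` frame and its values are hypothetical, and the buy side here includes the current price level, whereas a complementary ECDF counts only strictly higher prices.

```python
import pandas as pd

# Made-up aggregated depth in the same shape as plot_df (price, side, quantity).
toy_depth = pd.DataFrame({
    "price":    [99.8, 99.9, 100.0, 100.1, 100.2, 100.3],
    "side":     ["buy", "buy", "buy", "sell", "sell", "sell"],
    "quantity": [50, 30, 20, 10, 40, 60],
})

# Buy side: accumulate from the highest bid downwards.
buy = toy_depth[toy_depth["side"] == "buy"].sort_values("price", ascending=False).copy()
buy["cum_quantity"] = buy["quantity"].cumsum()

# Sell side: accumulate from the lowest ask upwards.
sell = toy_depth[toy_depth["side"] == "sell"].sort_values("price").copy()
sell["cum_quantity"] = sell["quantity"].cumsum()

print(buy[["price", "cum_quantity"]])
print(sell[["price", "cum_quantity"]])
```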

# Work In Progress

* More Financial Concepts
* Insights from Above plots