# Chpt 2: Financial Data Structures

## Standard Bars

### Time Bars - sampled at fixed time intervals (minute, hour, day, etc.):
* Timestamp 
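* Volume-weighted average price (VWAP): the average price the security traded at over the bar (e.g., over the day for daily bars), weighting each trade's price by its volume: $ \text{VWAP} = \frac{\sum_{t} p_{t} v_{t}}{\sum_{t} v_{t}} $, where both sums run over every transaction in the bar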
* Open price
* Close price
* High price
* Low price
* Volume traded
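A minimal sketch of how time bars could be built from raw ticks in plain Python. It assumes ticks arrive as time-sorted `(timestamp, price, volume)` tuples with timestamps in seconds and positive volumes; the names `Bar`, `make_bar`, and `time_bars` are hypothetical, not from any library. VWAP uses the formula above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Tick = Tuple[float, float, float]  # (timestamp in seconds, price, volume)


@dataclass
class Bar:
    timestamp: float  # start of the bar's time interval
    open: float
    high: float
    low: float
    close: float
    volume: float
    vwap: float


def make_bar(start: float, bucket: List[Tick]) -> Bar:
    """Summarize one bucket of ticks; assumes its total volume is > 0."""
    prices = [p for _, p, _ in bucket]
    total_volume = sum(v for _, _, v in bucket)
    vwap = sum(p * v for _, p, v in bucket) / total_volume
    return Bar(start, prices[0], max(prices), min(prices), prices[-1], total_volume, vwap)


def time_bars(ticks: List[Tick], interval: float) -> List[Bar]:
    """Group time-sorted ticks into fixed-length time bars (empty intervals yield no bar)."""
    bars: List[Bar] = []
    bucket: List[Tick] = []
    current: Optional[int] = None
    for tick in ticks:
        key = int(tick[0] // interval)  # index of the interval this tick falls in
        if current is not None and key != current:
            bars.append(make_bar(current * interval, bucket))
            bucket = []
        current = key
        bucket.append(tick)
    if bucket and current is not None:
        bars.append(make_bar(current * interval, bucket))
    return bars


bars = time_bars([(0, 10.0, 100), (30, 11.0, 300), (65, 9.0, 200), (90, 10.0, 200)], interval=60)
```

With a 60-second interval the example yields two bars: one covering [0, 60) with VWAP 10.75 and one covering [60, 120) with VWAP 9.5.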

Pros:
* Extremely easy to obtain

Cons: 
* Markets don't process information at a constant rate over time: some periods (e.g., mornings) are far more active than others
* Time bars therefore oversample low-activity periods and undersample high-activity periods
* Poor statistical properties:
    * Serial correlation: errors in one period are correlated with errors in the following periods. Ex: overestimating dividends in one year tends to carry over into an overestimate the next year
    * Heteroscedasticity: the variance of the error term differs across values of an independent variable
    * Non-normality of returns 
    
    

### Tick Bars - sampled every X trades
* Form a bar (timestamp, VWAP, open price, etc.) each time X **transactions (trades)** have taken place
    * Ex: build a bar every 1,000 trades (ticks)

Pros:
* Sampling as a function of the number of transactions yields returns that are closer to IID normal (independent and identically distributed), which makes Gaussian assumptions more defensible than with time bars
* Sampling is driven by trading activity (e.g., every 1,000 trades) rather than by the clock, which better matches how markets process information

Cons:
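* The tick count is somewhat arbitrary because of order fragmentation
* Ex: buying 10 lots against one resting offer of size 10 is recorded as one tick, but buying the same 10 lots against ten resting offers of size 1 is recorded as 10 ticks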
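A sketch of tick bars using the same hypothetical `(timestamp, price, volume)` tick format: every `ticks_per_bar` trades become one OHLCV bar. Stamping each bar with its last trade's timestamp and dropping a trailing partial bar are simplifying choices made here, not part of the definition.

```python
from typing import List, Tuple

Tick = Tuple[float, float, float]  # (timestamp, price, volume)
OHLCV = Tuple[float, float, float, float, float, float]  # (timestamp, open, high, low, close, volume)


def tick_bars(ticks: List[Tick], ticks_per_bar: int) -> List[OHLCV]:
    """Emit one bar every `ticks_per_bar` trades.

    Each bar is stamped with the timestamp of its last trade, and a trailing
    partial bar (fewer than `ticks_per_bar` trades) is dropped.
    """
    bars: List[OHLCV] = []
    for start in range(0, len(ticks) - ticks_per_bar + 1, ticks_per_bar):
        chunk = ticks[start:start + ticks_per_bar]
        prices = [p for _, p, _ in chunk]
        volume = sum(v for _, _, v in chunk)
        bars.append((chunk[-1][0], prices[0], max(prices), min(prices), prices[-1], volume))
    return bars
```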

### Volume Bars - sampled every X amount of volume
* Same as tick bars, but a bar is sampled every time X units of volume (shares or contracts) have been traded

Pros:
* Naturally describes relationship between volume and price (foundation for market microstructure theories)

### Dollar Bars - sample every X amount of dollars exchanged
* Accounts for the fact that the number of shares traded is a function of the actual value exchanged: if a stock's price rises 100% after you buy \\$1,000 of it, selling \\$1,000 of it requires only half as many shares as you bought
* When the analysis involves significant price fluctuations, use dollar bars
* More robust than volume bars to corporate actions (splits, reverse splits, buybacks, etc.) that change the number of shares outstanding
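Volume and dollar bars differ from tick bars only in what is accumulated per trade. The hypothetical helper below sums a per-trade metric (1 for tick bars, $v_t$ for volume bars, $p_t v_t$ for dollar bars) over `(price, volume)` trades and closes a bar once the running total reaches a threshold. Discarding the overshoot of the closing trade is a simplifying assumption; an implementation could instead carry it into the next bar.

```python
from typing import List, Tuple


def threshold_bars(ticks: List[Tuple[float, float]], threshold: float,
                   mode: str = "volume") -> List[Tuple[int, int]]:
    """Split (price, volume) trades into bars; returns inclusive (start, end) indices.

    mode="tick" counts trades, "volume" sums v_t, "dollar" sums p_t * v_t.
    A bar closes on the trade that brings the running total to at least
    `threshold`; the overshoot is discarded, and trailing trades that never
    reach the threshold are left out.
    """
    bars: List[Tuple[int, int]] = []
    start, total = 0, 0.0
    for i, (price, volume) in enumerate(ticks):
        if mode == "tick":
            total += 1
        elif mode == "volume":
            total += volume
        elif mode == "dollar":
            total += price * volume
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if total >= threshold:
            bars.append((start, i))
            start, total = i + 1, 0.0
    return bars


trades = [(10.0, 50), (10.0, 60), (20.0, 30), (20.0, 80)]
print(threshold_bars(trades, 100, "volume"))   # [(0, 1), (2, 3)]
print(threshold_bars(trades, 1500, "dollar"))  # [(0, 2), (3, 3)]
```

In the example the price doubles from 10 to 20 partway through, so the second dollar bar closes after a single 80-share trade, while each volume bar still needs at least 100 shares.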


# Information-Driven Bars

## Tick Imbalance Bars - TIBs
* Consider a sequence of ticks $ \left\{(p_{t} , v_{t}) \right\}_{t=1,...,T}$, where $ p_{t} $ is the price and $ v_{t} $ the volume associated with tick $ t $
* Here a tick means a single transaction (trade), not the minimum price increment of the security

#### Signing Ticks with the Tick Rule:
* **Tick Rule** - how to sign ticks to find imbalance:
    * A sequence $ \left\{b_{t}\right\}_{t=1,...,T}$ where:

$$
b_{t} = \begin{cases}
    b_{t-1} & \text{if } \Delta p_{t} = 0 \\
    \frac{|\Delta p_{t}|}{\Delta p_{t}} & \text{if } \Delta p_{t} \neq 0
\end{cases}
$$

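* Here $ \Delta p_{t} = p_{t} - p_{t-1} $. This rule assigns each tick a value $ b_{t} \in \left\{ -1,1 \right\} $: +1 after an uptick, -1 after a downtick, and the previous sign when the price is unchanged. The boundary condition $ b_{0} $ is set to the terminal value $ b_{T} $ of the immediately preceding bar
* This is "signing" the ticks based on change in price
* The goal is to measure **tick imbalance** and build Tick Imbalance Bars (**TIBs**) that sample whenever the tick imbalance exceeds expectations
* The tick imbalance is the accumulation (sum) of signed ticks up to tick index $ T $. It can be positive or negative; when its magnitude exceeds what we expect, it carries information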
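The tick rule and the running imbalance $\theta_T$ translate directly into code. This sketch assumes the prices are given in trade order and that the sign carried over from the previous bar, $b_0$, is passed in as `b0`; defaulting it to +1 is an arbitrary choice made here, and `tick_rule` is a hypothetical name.

```python
from itertools import accumulate
from typing import List


def tick_rule(prices: List[float], b0: int = 1) -> List[int]:
    """Sign ticks: +1 if the price rose, -1 if it fell, previous sign if unchanged.

    `b0` is the sign carried over from the previous bar's last tick. It is used
    for the first tick (which has no prior price here) and for any unchanged
    prices at the start.
    """
    signs: List[int] = []
    prev_price = None
    sign = b0
    for price in prices:
        if prev_price is not None and price != prev_price:
            sign = 1 if price > prev_price else -1
        signs.append(sign)
        prev_price = price
    return signs


signs = tick_rule([10.0, 10.5, 10.5, 10.25, 10.25, 10.5])
imbalance = list(accumulate(signs))  # theta_T for T = 1, 2, ...
print(signs)      # [1, 1, 1, -1, -1, 1]
print(imbalance)  # [1, 2, 3, 2, 1, 2]
```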

##### To Find the Tick Imbalance at $ T $:
* Sum the tick signs over the first $ T $ ticks of the bar: $\theta_{T} $ is the tick imbalance at tick $ T $ (basically, how positive (bullish) or negative (bearish) the sum of the signs is over those ticks)
$$
\theta_{T} = \sum_{t=1}^{T} b_{t}$$
How to determine the bar size $ T^{*} $:

1. Define Tick Imbalance at time T as $ \theta_{T} = \sum_{t=1}^{T} b_{t} $
2. Compute the expected value of the tick imbalance (how positive or negative it is expected to be) at the beginning of the bar:
    * $ E_{0}[\theta_{T}]= E_{0}[T](P[b_{t}=1] - P[b_{t}=-1]) = E_{0}[T](2P[b_{t}=1] - 1)$
    * ^ In words: expected tick imbalance = expected bar length in ticks * (probability a tick is a buy - probability it is a sell)
    * In practice: $ E_{0}[T] $ is estimated as an exponentially weighted moving average (EWMA) of the $ T $ values of prior bars, and $ 2P[b_{t}=1] - 1 $ as an EWMA of the $ b_{t} $ values (tick signs) from prior bars
    
3. Define a tick imbalance bar (TIB) as a $ T^{*} $-contiguous subset of ticks such that the following condition is met:
    * $ T^{*} = \underset{T}{\operatorname{argmin}} \left\{ |\theta_{T}|  \geq  E_{0}[T]|2P[b_{t}=1] - 1| \right\}$ 
    * Basically, take the smallest $ T $ at which the absolute tick imbalance $ |\theta_{T}| $ of the current bar reaches the absolute expected tick imbalance estimated from past bars
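A minimal sketch of steps 1-3, assuming the ticks are already signed with the tick rule. The starting guesses for $E_0[T]$ and $2P[b_t=1]-1$ (`exp_T`, `exp_b`) and the EWMA weights `alpha_T`, `alpha_b` are hypothetical parameters that these notes do not specify. Here $E_0[T]$ is updated from the lengths of closed bars and $2P[b_t=1]-1$ from the individual signs $b_t$ in closed bars.

```python
from typing import List, Tuple


def tick_imbalance_bars(signs: List[int], exp_T: float, exp_b: float,
                        alpha_T: float = 0.1, alpha_b: float = 0.1) -> List[Tuple[int, int]]:
    """Build TIBs from signed ticks; returns inclusive (start, end) tick indices.

    exp_T: initial guess for E0[T] (expected bar length, in ticks).
    exp_b: initial guess for 2P[b_t = 1] - 1 (the expected tick sign).
    Both are updated as EWMAs after each bar closes: exp_T over bar lengths
    (weight alpha_T), exp_b over the individual signs b_t (weight alpha_b).
    """
    bars: List[Tuple[int, int]] = []
    start, theta = 0, 0
    for i, b in enumerate(signs):
        theta += b  # running tick imbalance theta_T of the current bar
        if abs(theta) >= exp_T * abs(exp_b):
            bars.append((start, i))
            exp_T = alpha_T * (i - start + 1) + (1 - alpha_T) * exp_T
            for s in signs[start:i + 1]:
                exp_b = alpha_b * s + (1 - alpha_b) * exp_b
            start, theta = i + 1, 0
    return bars


signs = [1, 1, 1, 1, -1, 1, 1, 1, -1, -1, -1, -1]
print(tick_imbalance_bars(signs, exp_T=4, exp_b=0.5))  # [(0, 1), (2, 6), (7, 11)]
```

If the EWMA of $b_t$ drifts toward 0, the threshold $E_0[T]|2P[b_t=1]-1|$ collapses and bars shrink to a single tick, so you may want to bound the threshold in practice; that safeguard is left out of this sketch.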

#### Summary:
* **Process**
    1. Take in Raw ticks
    2. Sign all the raw ticks with the tick rule
    3. Find the smallest size $ T $ at which the tick imbalance reaches its expected value, and close the bar there
* **Result**
    * A **TIB (tick imbalance bar)** is a group of $ T $ ticks whose absolute tick imbalance (how positive or negative the sum of the signs is) reaches the expected imbalance for a bar of that size, estimated from past averages. Basically, **close a bar at the smallest size $ T $ at which the tick imbalance is greater than expected**
    * Lower T = more samples
    * A TIB can be thought of as a bucket of trades containing roughly equal amounts of information (regardless of volume, price, etc.)
    * When $ \theta_{T} $ is more imbalanced, a smaller bucket (smaller $ T $) is needed, so TIBs are generated more frequently. Hence TIBs are produced more frequently in the presence of informed trading (one-sided trading by traders with asymmetric information), and data sampling is synchronized with the arrival of informed traders
 

## Volume Imbalance Bars and Dollar Imbalance Bars (VIBs and DIBs)
* Extends the concept of TIBs to volume and dollar imbalance
* Sample Bars when volume or dollar imbalance diverge from expectations

### Calculate VIB or DIB
* After applying the tick rule to sign ticks, define the imbalance as $$\theta_{T} = \sum_{t=1}^{T} b_{t}v_{t}$$
* $ v_{t} $ can represent the number of securities traded (VIB) or dollar amount exchanged (DIB)
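* Calculate the expected value of the (volume or dollar) imbalance at the beginning of the bar:
    * $ E_{0}[\theta_{T}]= E_{0}[T](v^{+}-v^{-}) = E_{0}[T](2v^{+} - E_{0}[v_{t}])$, where $ v^{+} = P[b_{t}=1]E_{0}[v_{t}|b_{t}=1] $ and $ v^{-} = P[b_{t}=-1]E_{0}[v_{t}|b_{t}=-1] $, so that $ v^{+} + v^{-} = E_{0}[v_{t}] $
        * ^ In practice, $ E_{0}[T] $ is estimated as an EWMA of the $ T $ values of prior bars, and $ 2v^{+} - E_{0}[v_{t}] $ as an EWMA of the $ b_{t}v_{t} $ values from prior bars
* **Result**: A VIB or DIB is a $ T^{*} $-contiguous subset of ticks such that:
    * $ T^{*} = \underset{T}{\operatorname{argmin}} \left\{ |\theta_{T}|  \geq  E_{0}[T]|2v^{+} - E_{0}[v_{t}]| \right\}$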
* **Conclusion**:
    * Like TIBs, a VIB or DIB is a dynamically sized group of ticks with size T, where each bar is sampled when the dollar or volume imbalance is greater than the expected dollar or volume imbalance
    * They are the information-based analogue of volume and dollar bars: like them, they address tick fragmentation (one order of 10 lots is one tick, but 10 orders of 1 lot are 10 ticks)
    * Robust to corporate actions because information-driven bars do not rely on a constant bar size; the threshold adjusts automatically
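The TIB loop carries over to VIBs and DIBs once each sign is weighted by $v_t$: pass volumes for VIBs or `price * volume` for DIBs. As in the TIB case, the initial guesses (`exp_T`, `exp_bv`) and EWMA weights are hypothetical parameters, and $2v^{+} - E_{0}[v_{t}]$ is estimated as an EWMA of $b_t v_t$ over the ticks of closed bars.

```python
from typing import List, Tuple


def value_imbalance_bars(signs: List[int], values: List[float], exp_T: float,
                         exp_bv: float, alpha_T: float = 0.1,
                         alpha_bv: float = 0.1) -> List[Tuple[int, int]]:
    """Build VIBs (values = volumes) or DIBs (values = price * volume).

    exp_T:  initial guess for E0[T], updated as an EWMA of bar lengths.
    exp_bv: initial guess for 2v+ - E0[v_t], updated as an EWMA of b_t * v_t.
    Returns inclusive (start, end) tick indices of each bar.
    """
    bars: List[Tuple[int, int]] = []
    start, theta = 0, 0.0
    for i, (b, v) in enumerate(zip(signs, values)):
        theta += b * v  # signed volume (or dollar) imbalance of the current bar
        if abs(theta) >= exp_T * abs(exp_bv):
            bars.append((start, i))
            exp_T = alpha_T * (i - start + 1) + (1 - alpha_T) * exp_T
            for s, x in zip(signs[start:i + 1], values[start:i + 1]):
                exp_bv = alpha_bv * s * x + (1 - alpha_bv) * exp_bv
            start, theta = i + 1, 0.0
    return bars


signs = [1, 1, -1, 1, -1, -1]
volumes = [10, 25, 5, 40, 30, 50]
print(value_imbalance_bars(signs, volumes, exp_T=3, exp_bv=10))  # [(0, 1), (2, 3), (4, 5)]
```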

## Tick Runs Bars - TRBs
* TIBs, VIBs, and DIBs monitor order flow imbalance as measured in terms of ticks, volumes, and dollars exchanged
* However, large traders will create runs using execution strategies like:
    * Sweeping the order book: aggressively soak up all available liquidity. The trade is split into chunks, where each chunk takes all the liquidity available at the best price, then the next chunk takes all the liquidity at the next best price, and so on
    * Iceberg orders: large orders that display only a small part of their size at a time (as small limit orders) to disguise the full trade
    * Other tactics to split parent order into multiple children
* Because of these strategies, it can be useful to monitor the **sequence** of buys in the overall volume and take samples when that sequence diverges from our expectations

#### Process:
1. Sign ticks using tick rule
2. **Define the length** of the current run as:

    $$
    \theta_{T} = \max \left\{ \sum_{t|b_{t}=1}^{T} b_{t} , -\sum_{t|b_{t}=-1}^{T} b_{t}  \right\}
    $$

    * Basically: over the first $ T $ ticks of the bar, count the +1 ticks (buys) and the -1 ticks (sells) and take the larger count, so $ \theta_{T} $ measures the dominant buying or selling run
    * Note: this definition allows for sequence breaks. Instead of measuring the longest uninterrupted sequence, we count the number of ticks on each side (+1 or -1), which is a more useful definition
3. **Compute the expected value** of $ \theta_{T} $:
    * $ E_{0}[\theta_{T}] = E_{0}[T] \max\left\{P[b_{t}=1], 1-P[b_{t}=1]\right\} $
    * In practice: $ E_{0}[T] $ is an EWMA of the $ T $ values from prior bars, and $ P[b_{t}=1] $ is an EWMA of the proportion of buy ticks in prior bars
4. **Define the TRB** as the smallest sample of $ T $ ticks whose current run length reaches the expected run length:
    * $ T^{*} = \underset{T}{\operatorname{argmin}} \left\{ \theta_{T} \geq E_{0}[T] \max\left\{P[b_{t} = 1], 1-P[b_{t} = 1]\right\} \right\}$



* Note: As with all information-driven bars, a greater imbalance (when $ \theta_{T} $ exhibits longer runs than expected) means a lower $ T $ will satisfy the condition. So more information means more frequent samples
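A sketch of the TRB procedure on pre-signed ticks. $\theta_T$ is the larger of the buy and sell tick counts in the current bar, $E_0[T]$ is an EWMA of bar lengths, and $P[b_t=1]$ is an EWMA of each closed bar's buy-tick proportion; the initial guesses (`exp_T`, `exp_p_buy`) and weights (`alpha_T`, `alpha_p`) are hypothetical parameters.

```python
from typing import List, Tuple


def tick_run_bars(signs: List[int], exp_T: float, exp_p_buy: float,
                  alpha_T: float = 0.1, alpha_p: float = 0.1) -> List[Tuple[int, int]]:
    """Build TRBs from signed ticks; returns inclusive (start, end) tick indices.

    theta_T = max(#buy ticks, #sell ticks) in the current bar (breaks allowed).
    exp_T is updated as an EWMA of bar lengths and exp_p_buy as an EWMA of
    each closed bar's proportion of buy ticks.
    """
    bars: List[Tuple[int, int]] = []
    start, n_buy, n_sell = 0, 0, 0
    for i, b in enumerate(signs):
        if b == 1:
            n_buy += 1
        else:
            n_sell += 1
        theta = max(n_buy, n_sell)
        if theta >= exp_T * max(exp_p_buy, 1 - exp_p_buy):
            bars.append((start, i))
            length = i - start + 1
            exp_T = alpha_T * length + (1 - alpha_T) * exp_T
            exp_p_buy = alpha_p * (n_buy / length) + (1 - alpha_p) * exp_p_buy
            start, n_buy, n_sell = i + 1, 0, 0
    return bars


signs = [1, -1, 1, 1, -1, -1, -1, 1, 1, 1]
print(tick_run_bars(signs, exp_T=4, exp_p_buy=0.5))  # [(0, 2), (3, 6), (7, 8)]
```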

### Volume / Dollar Runs Bars - VRBs and DRBs
* Same as tick runs bars, but the idea is extended to volumes or dollars exchanged.
* **Goal:** to sample bars whenever the volumes or dollars traded by one side exceed expectations

**Process**:
1. After using the tick rule to sign ticks, define the volumes or dollars associated with a run as follows ($ v_{t} $ can represent the number of securities traded (VRB) or the dollar amount exchanged (DRB)):
$$
\theta_{T} = max \left\{ \sum_{t|b_{t}=1}^{T} b_{t}v_{t} , -\sum_{t|b_{t}=-1}^{T} b_{t}v_{t}  \right\}
$$
2. Compute Expected Value of $ \theta_{T} $:
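    * $ E_{0}[\theta_{T}] = E_{0}[T] \max\left\{P[b_{t}=1]E_{0}[v_{t}|b_{t}=1], (1-P[b_{t}=1])E_{0}[v_{t}|b_{t}=-1]\right\} $
    * In practice: EWMA of the $ T $ values from prior bars * max(EWMA proportion of buy ticks * EWMA volume of buy ticks, (1 - EWMA proportion of buy ticks) * EWMA volume of sell ticks), all estimated from prior bars
3. Define a VRB (or DRB) as a $ T^{*} $-contiguous subset of ticks such that the following condition is met:
    * $ T^{*} = \underset{T}{\operatorname{argmin}} \left\{ \theta_{T} \geq E_{0}[T] \max\left\{P[b_{t}=1]E_{0}[v_{t}|b_{t}=1], (1-P[b_{t}=1])E_{0}[v_{t}|b_{t}=-1]\right\} \right\}$
    * In words: the bar closes as soon as the volume (or dollars) traded by the dominant side of the run reaches the expected amount for the run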
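A sketch of VRBs/DRBs: the TRB logic, but each side accumulates $v_t$ instead of counting ticks. Pass volumes for VRBs or `price * volume` for DRBs. The initial guesses, the single EWMA weight `alpha`, and the rule of leaving a side's mean size unchanged when that side did not trade in a bar are assumptions of this sketch.

```python
from typing import List, Tuple


def value_run_bars(signs: List[int], values: List[float], exp_T: float,
                   exp_p_buy: float, exp_v_buy: float, exp_v_sell: float,
                   alpha: float = 0.1) -> List[Tuple[int, int]]:
    """Build VRBs (values = volumes) or DRBs (values = price * volume).

    theta_T = max(buy-side total, sell-side total) in the current bar.
    Expectation: E0[T] * max(P[b=1] * E0[v|b=1], (1 - P[b=1]) * E0[v|b=-1]),
    with every estimate updated as an EWMA (weight alpha) after each bar.
    """
    bars: List[Tuple[int, int]] = []
    start, buy_total, sell_total = 0, 0.0, 0.0
    for i, (b, v) in enumerate(zip(signs, values)):
        if b == 1:
            buy_total += v
        else:
            sell_total += v
        expected = exp_T * max(exp_p_buy * exp_v_buy, (1 - exp_p_buy) * exp_v_sell)
        if max(buy_total, sell_total) >= expected:
            bars.append((start, i))
            bar_signs = signs[start:i + 1]
            bar_values = values[start:i + 1]
            buys = [x for s, x in zip(bar_signs, bar_values) if s == 1]
            sells = [x for s, x in zip(bar_signs, bar_values) if s != 1]
            exp_T = alpha * len(bar_signs) + (1 - alpha) * exp_T
            exp_p_buy = alpha * len(buys) / len(bar_signs) + (1 - alpha) * exp_p_buy
            if buys:  # only update a side's mean size if that side traded
                exp_v_buy = alpha * sum(buys) / len(buys) + (1 - alpha) * exp_v_buy
            if sells:
                exp_v_sell = alpha * sum(sells) / len(sells) + (1 - alpha) * exp_v_sell
            start, buy_total, sell_total = i + 1, 0.0, 0.0
    return bars


signs = [1, 1, -1, -1, 1]
volumes = [10, 20, 5, 5, 30]
print(value_run_bars(signs, volumes, exp_T=3, exp_p_buy=0.5, exp_v_buy=10, exp_v_sell=10))
# [(0, 1), (2, 4)]
```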

## The ETF Trick
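* What if we want to model a spread of instruments rather than a single instrument? Example: a spread of futures contracts
    * **Problems**:
        * The spread is characterized by a vector of weights that changes over time, not a single weight
        * The spread may converge even if prices do not change, solely because the weights change
        * Spreads can take negative values because they are not a price, which is problematic since most models assume positive prices
        * Trading times do not align exactly for all constituents, so the spread is not always tradeable at the last level published (liquidity and latency risk)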
    * **Solution**:
        * Produce a time series that reflects the value of \\$1 invested in the spread
        * This series can be used to model, generate signals, and trade as if it were an ETF
* **Step 1: Data**:
    * Using historical data in the form of bars, created from any of the above methods, each bar will contain:
        * $ o_{i,t} $ : raw open price of instrument $ i = 1, \ldots, I $ at bar $ t = 1, \ldots, T $
        * $ p_{i,t} $ : raw close price of instrument $ i $ at bar $ t $
        * $ \varphi_{i,t} $ : USD value of one point of instrument $ i $ at bar $ t $ (including the foreign exchange rate)
            * Cost of 1 point = (Contract * (Price + One_Point)) - (Contract * Price), i.e. Contract * One_Point in the quote currency
                * Contract: the contract size
                * Price: the current quote of the instrument
                * One_Point: the size of 1 point (usually 0.0001)
        * $ v_{i,t} $ : volume of instrument $ i $ at bar $ t $
        * $ d_{i,t} $ : carry (e.g., the interest-rate differential in FX), dividend, or coupon paid by instrument $ i $ at bar $ t $. This variable can also be used to charge margin costs or costs of funding
        * All instruments ($ i = 1, \ldots, I $) are tradeable at every bar ($ t = 1, \ldots, T $). Even if some instruments were not tradeable over the entire interval between two bars, they were tradeable at the bar times themselves


* **Step 2: Calculate $ K_{t} $**
* $ K_{t} $ is the value of a \\$1 investment in the "ETF"
* For a basket of futures characterized by allocation vectors $ \omega_{t} $ rebalanced on bars $ B\subseteq \left\{1,...,T \right\} $, the \\$1 investment value $ K_{t} $ is derived as:
