# Introduction

> **Underlying Literature**: The following module was inspired by the ideas put forward in Chapter 18 of [Advances in Financial Machine Learning](https://www.wiley.com/en-us/Advances+in+Financial+Machine+Learning-p-9781119482086) by Marcos Lopez de Prado.

The classes below build time, volume, dollar, imbalance and run bars from tick data, either as a live stream or in batches.
In streaming mode each new tick is processed as it arrives, so bars are produced on the go.
The same functionality is offered here as explained in the Data Structures section in MlFinLab.

### Table of Contents
- [Time Bars](#time)
- [Volume Bars](#volume)
- [Dollar Bars](#dollar)
- [Imbalance Bars](#imbalance)
- [Run Bars](#run)

<div class="alert alert-block alert-warning">
    
<b>Note:</b> The classes for streaming tick data return a dictionary whereas the classes for tick data used in batches return a pandas DataFrame.

</div>
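
Because the streaming classes return dictionaries, it can be convenient to collect them into a DataFrame for comparison with the batch output. The snippet below is a minimal sketch using plain pandas; the two bars are written by hand (copied from the time bar output further down, trimmed to a few fields) rather than produced by MlFinLab, and the names `streamed_bars` and `bars_df` are illustrative.

```python
import pandas as pd

# Two hand-written bars in the same dictionary layout as the streaming output
# (a subset of the fields, for brevity).
streamed_bars = [
    {'timestamp': pd.Timestamp('2011-07-31 22:38:46'), 'open': 1205.0, 'high': 1205.0,
     'low': 1005.0, 'close': 1005.0, 'volume': 2},
    {'timestamp': pd.Timestamp('2011-07-31 22:38:48'), 'open': 1304.75, 'high': 1304.75,
     'low': 1304.75, 'close': 1304.75, 'volume': 6},
]

# A list of dictionaries converts directly; index the frame by bar end time.
bars_df = pd.DataFrame(streamed_bars).set_index('timestamp')
print(bars_df)
```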

Before starting, we must first import our data from the Sample Data Folder.

In [1]:
# Import packages
import pandas as pd

# Import data 
data = pd.read_csv('https://raw.githubusercontent.com/hudson-and-thames/example-data/main/test_tick_data.csv')

# Preview the tick data
print(data.head())

             Date and Time    Price  Volume
0  2011/07/31 22:38:45.108  1205.00       1
1  2011/07/31 22:38:45.934  1005.00       1
2  2011/07/31 22:38:47.008  1304.75       6
3  2011/07/31 22:38:48.944  1904.75       1
4  2011/07/31 22:38:52.951  1304.75      20
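
The `Date and Time` column is read as text, which is why the streaming examples convert each value with `pd.to_datetime`. As a small sketch, the first two timestamps from the preview can be parsed with an explicit format string (the format is inferred from the preview above, so treat it as an assumption about the rest of the file):

```python
import pandas as pd

# First two raw timestamps from the preview above.
raw = pd.Series(['2011/07/31 22:38:45.108', '2011/07/31 22:38:45.934'])

# An explicit format avoids ambiguity between day-first and month-first dates.
parsed = pd.to_datetime(raw, format='%Y/%m/%d %H:%M:%S.%f')

# Time elapsed between the two ticks.
gap = parsed.diff().iloc[-1]
print(parsed.dtype, gap)
```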


Each bar generated from streaming tick data contains the following:

- timestamp            | the timestamp at which the bar ends
- start_timestamp      | the timestamp of the first tick in the bar
- open                 | the price of the first tick in the bar
- high                 | the highest price among the ticks in the bar
- low                  | the lowest price among the ticks in the bar
- close                | the price of the last tick in the bar
- volume               | the total volume of the ticks in the bar
- cum_buy_volume       | cumulative buy volume of ticks in the bar
- cum_dollar_value     | cumulative dollar value of ticks in the bar
- tick_rule_buy_volume | amount of buy volume estimated by the Tick Rule
- num_ticks            | number of ticks in the bar
- ticker               | chosen ticker for the class
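
To make the field definitions concrete, here is a minimal pure-Python sketch that aggregates the first three ticks of the sample data into bars. It is not MlFinLab's code: `tick_rule_signs` and `build_bar` are hypothetical helpers, the bar end is simply taken as the last tick's timestamp, and `cum_buy_volume` and `ticker` are omitted.

```python
def tick_rule_signs(prices):
    """Tick rule: +1 after an uptick, -1 after a downtick.
    An unchanged price repeats the previous sign (0 until the first price change)."""
    signs, prev_price, sign = [], None, 0
    for price in prices:
        if prev_price is not None and price != prev_price:
            sign = 1 if price > prev_price else -1
        signs.append(sign)
        prev_price = price
    return signs


def build_bar(ticks):
    """Aggregate a list of tick dictionaries into a single bar dictionary."""
    prices = [t['price'] for t in ticks]
    return {
        'timestamp': ticks[-1]['timestamp'],        # assumption: bar ends at its last tick
        'start_timestamp': ticks[0]['timestamp'],
        'open': prices[0],
        'high': max(prices),
        'low': min(prices),
        'close': prices[-1],
        'volume': sum(t['volume'] for t in ticks),
        'tick_rule_buy_volume': sum(t['volume'] for t in ticks if t['sign'] > 0),
        'cum_dollar_value': sum(t['price'] * t['volume'] for t in ticks),
        'num_ticks': len(ticks),
    }


# First three ticks of the sample data.
prices = [1205.00, 1005.00, 1304.75]
volumes = [1, 1, 6]
stamps = ['22:38:45.108', '22:38:45.934', '22:38:47.008']
signs = tick_rule_signs(prices)

ticks = [{'timestamp': s, 'price': p, 'volume': v, 'sign': b}
         for s, p, v, b in zip(stamps, prices, volumes, signs)]

first_bar = build_bar(ticks[:2])   # two ticks, price falls
second_bar = build_bar(ticks[2:])  # one tick, price rises
print(first_bar)
print(second_bar)
```

The first bar has no buy volume because its only price change is a downtick, while the single uptick in the second bar counts all 6 contracts as buy volume.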

## Time bars <a class="anchor" id="time"></a>

Time bars are sampled after a fixed interval of time has passed.
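
The sampling rule can be sketched with plain pandas: each tick is assigned to a fixed-width time bin, and every non-empty bin becomes one bar labelled by the end of its interval. This is only an illustration on the five previewed ticks, not the MlFinLab implementation; `threshold_seconds` and `time_bars_sketch` are illustrative names.

```python
import pandas as pd

# The five ticks shown in the data preview.
ticks = pd.DataFrame({
    'ts': pd.to_datetime(['2011-07-31 22:38:45.108', '2011-07-31 22:38:45.934',
                          '2011-07-31 22:38:47.008', '2011-07-31 22:38:48.944',
                          '2011-07-31 22:38:52.951']),
    'price': [1205.00, 1005.00, 1304.75, 1904.75, 1304.75],
    'volume': [1, 1, 6, 1, 20],
})

threshold_seconds = 2

# Floor each timestamp to the start of its bin, then shift to the bin's end.
bar_end = (ticks['ts'].dt.floor(f'{threshold_seconds}s')
           + pd.Timedelta(seconds=threshold_seconds))

# One row per non-empty bin; empty bins produce no bar.
time_bars_sketch = ticks.groupby(bar_end).agg(
    open=('price', 'first'),
    high=('price', 'max'),
    low=('price', 'min'),
    close=('price', 'last'),
    volume=('volume', 'sum'),
    num_ticks=('price', 'count'),
)
print(time_bars_sketch)
```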

### Tick data

In [2]:
# Import tools
from mlfinlab.online_data_structures import time_bars

# Generate time bars according to threshold
# Threshold in seconds 
# Threshold can be changed according to a specified condition
bars_time = time_bars.TimeBarGenerator(threshold=2,
                                               tick_fields_mapping={'timestamp': 'Date and Time', 'price': 'Price',
                                                                    'volume': 'Volume', 'ticker': 'ticker',
                                                                    'aggressor_side': 'aggressor_side'})

# Get the time bars from the ticks
# Range specifies the number of incoming ticks to iterate through
for i in range(10):
    agg_side = bars_time.apply_tick_rule(data['Price'][i])[0]
    new_bar = bars_time.process_tick(
        {'Date and Time': pd.to_datetime(data['Date and Time'][i]), 'Price': data['Price'][i],
         'Volume': data['Volume'][i], 'aggressor_side': agg_side, 'ticker': 'S&P'})
    # If new_bar is True then a new bar is generated
    if new_bar:
        print('New bar has been generated')
        print(bars_time.bars[-1])
    # User-specified condition to change the threshold
    # For example, once the tick at index 5 has been processed, the threshold is raised to 10
    if i == 5:
        bars_time.set_threshold(10)
        
# Access all the generated bars
generated_bars = bars_time.bars

New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:38:46'), 'start_timestamp': Timestamp('2011-07-31 22:38:45.108000040'), 'open': 1205.0, 'high': 1205.0, 'low': 1005.0, 'close': 1005.0, 'volume': 2, 'cum_buy_volume': 0, 'tick_rule_buy_volume': 0, 'cum_dollar_value': 2210.0, 'num_ticks': 2, 'ticker': 'S&P'}
New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:38:48'), 'start_timestamp': Timestamp('2011-07-31 22:38:47.007999897'), 'open': 1304.75, 'high': 1304.75, 'low': 1304.75, 'close': 1304.75, 'volume': 6, 'cum_buy_volume': 6, 'tick_rule_buy_volume': 6, 'cum_dollar_value': 7828.5, 'num_ticks': 1, 'ticker': 'S&P'}
New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:38:50'), 'start_timestamp': Timestamp('2011-07-31 22:38:48.944000006'), 'open': 1904.75, 'high': 1904.75, 'low': 1904.75, 'close': 1904.75, 'volume': 1, 'cum_buy_volume': 1, 'tick_rule_buy_volume': 1, 'cum_dollar_value': 1904.75, 'num_ticks': 1, 'ticker': 'S&P'}
New bar has been ge

### Batch data

In [3]:
# Import tools
from mlfinlab.data_structures import time_data_structures

# Time Bars
# Stored under a new name so the imported time_bars module is not overwritten
time_bars_batch = time_data_structures.get_time_bars('https://raw.githubusercontent.com/hudson-and-thames/example-data/main/test_tick_data.csv', resolution='D', verbose=False)

## Volume bars<a class="anchor" id="volume"></a>

Volume bars are sampled after a fixed number of contracts (volume) has been traded.
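
The streaming pattern behind this rule can be sketched as a small accumulator: add each tick's volume, emit a bar once the running total reaches the threshold, then reset. The class below is a hypothetical stand-in written for illustration (`VolumeBarSketch` is not part of MlFinLab); it only mirrors the idea of a `process_tick` call that reports whether a bar was closed and a `set_threshold` call that changes the rule mid-stream.

```python
class VolumeBarSketch:
    """Illustrative volume bar accumulator (not the MlFinLab class)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.bars = []          # closed bars, oldest first
        self._prices = []       # prices of ticks in the open bar
        self._cum_volume = 0    # volume accumulated in the open bar

    def set_threshold(self, threshold):
        """Change the volume needed to close the next bar."""
        self.threshold = threshold

    def process_tick(self, price, volume):
        """Add one tick; return True if this tick closed a bar."""
        self._prices.append(price)
        self._cum_volume += volume
        if self._cum_volume < self.threshold:
            return False
        self.bars.append({
            'open': self._prices[0],
            'high': max(self._prices),
            'low': min(self._prices),
            'close': self._prices[-1],
            'volume': self._cum_volume,
            'num_ticks': len(self._prices),
        })
        self._prices, self._cum_volume = [], 0
        return True


# The five ticks from the data preview, as (price, volume) pairs.
sample_ticks = [(1205.00, 1), (1005.00, 1), (1304.75, 6), (1904.75, 1), (1304.75, 20)]

sketch = VolumeBarSketch(threshold=2)
closed = [sketch.process_tick(price, volume) for price, volume in sample_ticks]
print(closed)
print([bar['volume'] for bar in sketch.bars])
```

Note that a bar can overshoot the threshold considerably: the last bar closes with a volume of 21 because the tick that crosses the threshold is included in full.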

### Tick data

In [4]:
# Import tools
from mlfinlab.online_data_structures import volume_bars

# Generate volume bars according to threshold
# Threshold in units of traded volume
# Threshold can be changed according to a specified condition
bars_volume = volume_bars.VolumeBarGenerator(threshold=2,
                                               tick_fields_mapping={'timestamp': 'Date and Time', 'price': 'Price',
                                                                    'volume': 'Volume', 'ticker': 'ticker',
                                                                    'aggressor_side': 'aggressor_side'})

# Get the volume bars from the ticks
# Range specifies the number of incoming ticks to iterate through
for i in range(10):
    agg_side = bars_volume.apply_tick_rule(data['Price'][i])[0]
    new_bar = bars_volume.process_tick(
        {'Date and Time': pd.to_datetime(data['Date and Time'][i]), 'Price': data['Price'][i],
         'Volume': data['Volume'][i], 'aggressor_side': agg_side, 'ticker': 'S&P'})
    # If new_bar is True then a new bar is generated
    if new_bar:
        print('New bar has been generated')
        print(bars_volume.bars[-1])
    # User-specified condition to change the threshold
    # For example, once the tick at index 5 has been processed, the threshold is raised to 10
    if i == 5:
        bars_volume.set_threshold(10)
        
# Access all the generated bars
generated_bars = bars_volume.bars

New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:38:45.934000'), 'start_timestamp': Timestamp('2011-07-31 22:38:45.108000'), 'open': 1205.0, 'high': 1205.0, 'low': 1005.0, 'close': 1005.0, 'volume': 2, 'cum_buy_volume': 0, 'tick_rule_buy_volume': 0, 'cum_dollar_value': 2210.0, 'num_ticks': 2, 'ticker': 'S&P'}
New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:38:47.008000'), 'start_timestamp': Timestamp('2011-07-31 22:38:47.008000'), 'open': 1304.75, 'high': 1304.75, 'low': 1304.75, 'close': 1304.75, 'volume': 6, 'cum_buy_volume': 6, 'tick_rule_buy_volume': 6, 'cum_dollar_value': 7828.5, 'num_ticks': 1, 'ticker': 'S&P'}
New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:38:52.951000'), 'start_timestamp': Timestamp('2011-07-31 22:38:48.944000'), 'open': 1904.75, 'high': 1904.75, 'low': 1304.75, 'close': 1304.75, 'volume': 21, 'cum_buy_volume': 1, 'tick_rule_buy_volume': 1, 'cum_dollar_value': 27999.75, 'num_ticks': 2, 'ticker': 'S&P'}


### Batch data

In [5]:
# Import tools
from mlfinlab.data_structures import standard_data_structures

# Volume Bars
volume = standard_data_structures.get_volume_bars('https://raw.githubusercontent.com/hudson-and-thames/example-data/main/test_tick_data.csv', threshold=28000,
                                                  batch_size=1000000, verbose=False)


## Dollar bars<a class="anchor" id="dollar"></a>

Dollar bars are sampled after a fixed monetary amount has been traded.
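
The dollar value of a tick is its price multiplied by its volume, so the threshold has to be chosen on that scale. The sketch below (a hypothetical helper, `dollar_bar_ends`, not an MlFinLab function) returns the index of the tick that closes each bar and shows how strongly the result depends on the threshold for the five previewed ticks.

```python
def dollar_bar_ends(prices, volumes, threshold):
    """Return the indices of the ticks that close a dollar bar."""
    ends, cum_dollar = [], 0.0
    for i, (price, volume) in enumerate(zip(prices, volumes)):
        cum_dollar += price * volume      # dollar value traded in this tick
        if cum_dollar >= threshold:
            ends.append(i)
            cum_dollar = 0.0              # start a new bar
    return ends


prices = [1205.00, 1005.00, 1304.75, 1904.75, 1304.75]
volumes = [1, 1, 6, 1, 20]

# A threshold of 2 dollars is exceeded by every single tick.
tiny_threshold_ends = dollar_bar_ends(prices, volumes, threshold=2)

# A threshold of 3000 dollars groups several ticks into one bar.
larger_threshold_ends = dollar_bar_ends(prices, volumes, threshold=3000)

print(tiny_threshold_ends, larger_threshold_ends)
```

With a threshold far below the typical tick value, every tick becomes its own bar, so the bars carry no more structure than the raw ticks.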

### Tick data

In [6]:
# Import tools
from mlfinlab.online_data_structures import dollar_bars

# Generate dollar bars according to threshold
# Threshold in dollar value traded
# Threshold can be changed according to a specified condition
bars_dollar = dollar_bars.DollarBarGenerator(threshold=2,
                                               tick_fields_mapping={'timestamp': 'Date and Time', 'price': 'Price',
                                                                    'volume': 'Volume', 'ticker': 'ticker',
                                                                    'aggressor_side': 'aggressor_side'})

# Get the dollar bars from the ticks
# Range specifies the number of incoming ticks to iterate through
for i in range(10):
    agg_side = bars_dollar.apply_tick_rule(data['Price'][i])[0]
    new_bar = bars_dollar.process_tick(
        {'Date and Time': pd.to_datetime(data['Date and Time'][i]), 'Price': data['Price'][i],
         'Volume': data['Volume'][i], 'aggressor_side': agg_side, 'ticker': 'S&P'})
    # If new_bar is True then a new bar is generated
    if new_bar:
        print('New bar has been generated')
        print(bars_dollar.bars[-1])
    # User-specified condition to change the threshold
    # For example, once the tick at index 5 has been processed, the threshold is raised to 10
    if i == 5:
        bars_dollar.set_threshold(10)
        
# Access all the generated bars
generated_bars = bars_dollar.bars

New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:38:45.108000'), 'start_timestamp': Timestamp('2011-07-31 22:38:45.108000'), 'open': 1205.0, 'high': 1205.0, 'low': 1205.0, 'close': 1205.0, 'volume': 1, 'cum_buy_volume': 0, 'tick_rule_buy_volume': 0, 'cum_dollar_value': 1205.0, 'num_ticks': 1, 'ticker': 'S&P'}
New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:38:45.934000'), 'start_timestamp': Timestamp('2011-07-31 22:38:45.934000'), 'open': 1005.0, 'high': 1005.0, 'low': 1005.0, 'close': 1005.0, 'volume': 1, 'cum_buy_volume': 0, 'tick_rule_buy_volume': 0, 'cum_dollar_value': 1005.0, 'num_ticks': 1, 'ticker': 'S&P'}
New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:38:47.008000'), 'start_timestamp': Timestamp('2011-07-31 22:38:47.008000'), 'open': 1304.75, 'high': 1304.75, 'low': 1304.75, 'close': 1304.75, 'volume': 6, 'cum_buy_volume': 6, 'tick_rule_buy_volume': 6, 'cum_dollar_value': 7828.5, 'num_ticks': 1, 'ticker': 'S&P'}
New bar has 

### Batch data

In [7]:
# Import tools
from mlfinlab.data_structures import standard_data_structures

# Dollar Bars
dollar = standard_data_structures.get_dollar_bars('https://raw.githubusercontent.com/hudson-and-thames/example-data/main/test_tick_data.csv', threshold=70000000,
                                                  batch_size=1000000, verbose=True)

Reading data in batches:
Batch number: 0
Returning bars 



## Imbalance bars<a class="anchor" id="imbalance"></a>

Imbalance bars form part of information-driven bars.
Information-driven bars are based on the notion of sampling a bar when new information arrives to the market.

Two types of imbalance bars are implemented in MlFinLab:

- EMA imbalance bars, where the expected number of ticks is an exponential moving average (EMA), as in the book implementation.
- Constant imbalance bars, where the expected number of ticks is held constant.
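
As a rough illustration of the EMA variant, the expected number of ticks for the next bar can be estimated as an exponentially weighted average of the tick counts of previous bars. The sketch uses pandas' `ewm`, which may weight observations differently from MlFinLab's own routine, and the names `num_ticks_per_bar` and `num_prev_bars` are used here only for illustration. The two tick counts are taken from the streaming imbalance bar output shown in this notebook.

```python
import pandas as pd

# Tick counts of the bars generated so far.
num_ticks_per_bar = pd.Series([30, 13])

# Span of the exponential window, in bars.
num_prev_bars = 3

# The most recent bar gets the largest weight.
expected_num_ticks = num_ticks_per_bar.ewm(span=num_prev_bars).mean().iloc[-1]
print(round(expected_num_ticks, 2))
```

In the constant variant this update step is skipped and the initial expectation is kept for every bar.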


The EMA method is used to calculate tick, dollar and volume imbalance bars for streaming tick data.

Both the EMA and constant methods are used to calculate tick, dollar and volume imbalance bars for batch data.
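
The core idea can be sketched in a few lines: sign each tick with the tick rule, accumulate the signed dollar flow, and close a bar when its absolute value reaches a threshold. To keep the example short, the threshold here is a fixed number chosen for illustration, whereas the library derives it from expected values; the helper `imbalance_bar_ends` and the toy prices are assumptions, not MlFinLab code or sample data.

```python
def imbalance_bar_ends(prices, volumes, threshold):
    """Return indices of ticks that close a dollar imbalance bar (fixed threshold)."""
    ends, theta = [], 0.0
    prev_price, sign = None, 0
    for i, (price, volume) in enumerate(zip(prices, volumes)):
        # Tick rule: an unchanged price keeps the previous sign.
        if prev_price is not None and price != prev_price:
            sign = 1 if price > prev_price else -1
        prev_price = price
        theta += sign * price * volume    # signed dollar flow
        if abs(theta) >= threshold:
            ends.append(i)
            theta = 0.0                   # start a new bar
    return ends


# Toy series: three rising prices followed by three falling prices.
toy_prices = [100, 101, 102, 101, 100, 99]
toy_volumes = [1, 1, 1, 1, 1, 1]

imbalance_ends = imbalance_bar_ends(toy_prices, toy_volumes, threshold=150)
print(imbalance_ends)
```

The first bar closes on sustained buying and the second on sustained selling, which is the sense in which these bars react to one-sided order flow rather than to elapsed time or traded volume.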

### Tick data

In [8]:
# Import tools
from mlfinlab.online_data_structures import imbalance_bars

# Generate dollar imbalance bars according to expected imbalance window and expected number of initial ticks per bar
bars_imbalance_dollar = imbalance_bars.ImbalanceBarGenerator(imbalance_type='dollar_imbalance',
                                                             expected_imbalance_window=3,
                                                             exp_num_ticks_init=30,
                                                             tick_fields_mapping={'timestamp': 'Date and Time',
                                                                                'price': 'Price',
                                                                                'volume': 'Volume', 
                                                                                'ticker': 'ticker',
                                                                                'aggressor_side': 'aggressor_side'})
# Get the imbalance dollar bars from the ticks
# Range specifies the number of incoming ticks to iterate through
for i in range(50):
    agg_side = bars_imbalance_dollar.apply_tick_rule(data['Price'][i])[0]
    new_bar = bars_imbalance_dollar.process_tick(
        {'Date and Time': pd.to_datetime(data['Date and Time'][i]), 'Price': data['Price'][i],
         'Volume': data['Volume'][i], 'aggressor_side': agg_side, 'ticker': 'S&P'})
    # If new_bar is True then a new bar is generated
    if new_bar:
        print('New bar has been generated')
        print(bars_imbalance_dollar.bars[-1])

# Access the generated bars
generated_bars = bars_imbalance_dollar.bars

New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:39:38.991000'), 'start_timestamp': Timestamp('2011-07-31 22:38:45.108000'), 'open': 1205.0, 'high': 1904.75, 'low': 1005.0, 'close': 1304.5, 'volume': 88, 'cum_buy_volume': 8, 'tick_rule_buy_volume': 8, 'cum_dollar_value': 115005.5, 'num_ticks': 30, 'ticker': 'S&P', 'imbalance': -91724.5}
New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:39:59.606000'), 'start_timestamp': Timestamp('2011-07-31 22:39:40.423000'), 'open': 1304.5, 'high': 1304.5, 'low': 1304.5, 'close': 1304.5, 'volume': 33, 'cum_buy_volume': 0, 'tick_rule_buy_volume': 0, 'cum_dollar_value': 43048.5, 'num_ticks': 13, 'ticker': 'S&P', 'imbalance': -43048.5}


### Batch data

In [9]:
# Import tools
from mlfinlab.data_structures.imbalance_data_structures import (get_ema_dollar_imbalance_bars,
                                      get_const_dollar_imbalance_bars)

# EMA Dollar Imbalance Bars
dollar_imbalance_ema = get_ema_dollar_imbalance_bars('https://raw.githubusercontent.com/hudson-and-thames/example-data/main/test_tick_data.csv', num_prev_bars=3,
                                                     exp_num_ticks_init=100000,
                                                     exp_num_ticks_constraints=[100, 1000],
                                                     expected_imbalance_window=10000)
# Const Dollar Imbalance Bars
dollar_imbalance_const = get_const_dollar_imbalance_bars('https://raw.githubusercontent.com/hudson-and-thames/example-data/main/test_tick_data.csv',
                                                         exp_num_ticks_init=100000,
                                                         expected_imbalance_window=10000)

Reading data in batches:
Batch number: 0
Returning bars 

Reading data in batches:
Batch number: 0
Returning bars 



## Run Bars <a class="anchor" id="run"></a>

Run bars share the same mathematical structure as imbalance bars; however, instead of looking at each individual trade, we look at sequences of trades in the same direction.

Two types of run bars are implemented in MlFinLab:

- EMA run bars, where the expected number of ticks is an exponential moving average (EMA), as in the book implementation.
- Constant run bars, where the expected number of ticks is held constant.

The EMA method is used to calculate tick, dollar and volume run bars for streaming tick data.

Both the EMA and constant methods are used to calculate tick, dollar and volume run bars for batch data.

### Tick data

In [10]:
# Import tools
from mlfinlab.online_data_structures import runs_bars

# Generate dollar run bars according to expected imbalance window and expected number of initial ticks per bar
# This example calculates dollar run bars
bars_run = runs_bars.RunsBarGenerator(runs_type='dollar_runs',
                                      expected_imbalance_window=3,
                                      exp_num_ticks_init=30,
                                      tick_fields_mapping={'timestamp': 'Date and Time',
                                                            'price': 'Price',
                                                            'volume': 'Volume', 
                                                            'ticker': 'ticker',
                                                            'aggressor_side': 'aggressor_side'})

# Get the dollar runs bars from the ticks
# Range specifies the number of incoming ticks to iterate through
for i in range(50):
    agg_side = bars_run.apply_tick_rule(data['Price'][i])[0]
    new_bar = bars_run.process_tick(
        {'Date and Time': pd.to_datetime(data['Date and Time'][i]), 'Price': data['Price'][i],
         'Volume': data['Volume'][i], 'aggressor_side': agg_side, 'ticker': 'S&P'})
    # If new_bar is True then a new bar is generated
    if new_bar:
        print('New bar has been generated')
        print(bars_run.bars[-1])

# Access the generated bars
generated_bars = bars_run.bars

New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:39:38.991000'), 'start_timestamp': Timestamp('2011-07-31 22:38:45.108000'), 'open': 1205.0, 'high': 1904.75, 'low': 1005.0, 'close': 1304.5, 'volume': 88, 'cum_buy_volume': 8, 'tick_rule_buy_volume': 8, 'cum_dollar_value': 115005.5, 'num_ticks': 30, 'ticker': 'S&P'}
New bar has been generated
{'timestamp': Timestamp('2011-07-31 22:40:11.168000'), 'start_timestamp': Timestamp('2011-07-31 22:39:40.423000'), 'open': 1304.5, 'high': 1304.75, 'low': 1304.5, 'close': 1304.75, 'volume': 49, 'cum_buy_volume': 10, 'tick_rule_buy_volume': 10, 'cum_dollar_value': 63923.0, 'num_ticks': 18, 'ticker': 'S&P'}


### Batch data

In [11]:
# Import tools
from mlfinlab.data_structures.run_data_structures import (get_ema_dollar_run_bars, get_const_dollar_run_bars)

# EMA Run Bars
dollar_run_ema = get_ema_dollar_run_bars('https://raw.githubusercontent.com/hudson-and-thames/example-data/main/test_tick_data.csv', num_prev_bars=3,
                                         exp_num_ticks_init=100000,
                                         exp_num_ticks_constraints=[100, 1000],
                                         expected_imbalance_window=10000)

# Constant Run Bars
dollar_run_const = get_const_dollar_run_bars('https://raw.githubusercontent.com/hudson-and-thames/example-data/main/test_tick_data.csv', num_prev_bars=3,
                                             exp_num_ticks_init=100000,
                                             expected_imbalance_window=10000)

Reading data in batches:
Batch number: 0
Returning bars 

Reading data in batches:
Batch number: 0
Returning bars 



## Conclusion

This notebook describes two different ways of using tick data in the MlFinLab package: streaming live ticks and processing ticks in batches.

Key takeaways from the notebook:

* Time, volume, dollar, imbalance and run bars are created by streaming live tick data.

* Time, volume, dollar, imbalance and run bars are created by using tick data in batches.

## Reference

* Lopez de Prado, M. (2018) Advances in Financial Machine Learning. New York, NY: John Wiley & Sons.