# Creating Financial Data Structures

Most people look at charts built from time bars. Besides time bars, we can create other useful and informative types of bars: standard bars (tick, volume, and dollar bars) and information-driven bars (imbalance and run bars).

In [1]:
import mlfinlab as ml

import numpy as np
import pandas as pd



If you need sample data, you can use the S&P 500 E-mini futures tick sample, [ES_Sample.zip](https://s3-us-west-2.amazonaws.com/tick-data-s3/downloads/ES_Sample.zip). In this notebook we use only the first million rows of that data to keep the processing time short.
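The notebook reads its ticks from `raw_tick_data.csv`, presumably a truncated copy of the sample. One way to produce such a file with pandas is sketched below. It assumes the unzipped sample is a CSV with a header row, and both the function `write_first_rows` and the source file name in the usage line are hypothetical.

```python
import pandas as pd


def write_first_rows(src_path: str, dst_path: str, n_rows: int = 1_000_000) -> int:
    """Copy the header and the first n_rows data rows of src_path to dst_path.

    Returns the number of data rows written (fewer than n_rows if the
    source file is shorter).
    """
    # nrows stops parsing after n_rows data rows, so the full file is never loaded
    head = pd.read_csv(src_path, nrows=n_rows)
    head.to_csv(dst_path, index=False)
    return len(head)


# Usage (hypothetical file names):
# write_first_rows("ES_Sample.csv", "raw_tick_data.csv")
```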

In [2]:
FILE_PATH = 'raw_tick_data.csv'

In [3]:
raw_tick_df = pd.read_csv(FILE_PATH)
raw_tick_df.head()

|   | date | price | volume |
|---|------|-------|--------|
| 0 | 09/01/2013 17:00:00.083 | 1640.25 | 8 |
| 1 | 09/01/2013 17:00:00.083 | 1640.25 | 1 |
| 2 | 09/01/2013 17:00:00.083 | 1640.25 | 2 |
| 3 | 09/01/2013 17:00:00.083 | 1640.25 | 1 |
| 4 | 09/01/2013 17:00:00.083 | 1640.25 | 1 |


## Time Bars

In [4]:
time_bars = ml.data_structures.get_time_bars(FILE_PATH,
                                             resolution='D' # Day
                                            )
time_bars

Reading data in batches:
Batch number: 0
Returning bars 



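|   | date_time | tick_num | open | high | low | close | volume | cum_buy_volume | cum_ticks | cum_dollar_value |
|---|-----------|----------|------|------|-----|-------|--------|----------------|-----------|------------------|
| 0 | 1378080000 | 13642 | 1640.25 | 1644.0 | 1639.0 | 1643.0 | 49208 | 25816 | 13641 | 80769570.0 |
| 1 | 1378166400 | 66322 | 1643.0 | 1648.5 | 1642.0 | 1646.75 | 171378 | 94591 | 52680 | 282128100.0 |
| 2 | 1378252800 | 606535 | 1646.75 | 1650.0 | 1630.75 | 1641.75 | 2100640 | 1056010 | 540213 | 3445759000.0 |
| 3 | 1378339200 | 994036 | 1641.5 | 1655.5 | 1635.0 | 1655.25 | 1596430 | 813116 | 387501 | 2630749000.0 |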
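For comparison, daily bars of the same kind can be approximated directly in pandas. This is a sketch, not mlfinlab's implementation. It assumes the raw `date` strings follow the `MM/DD/YYYY HH:MM:SS.fff` layout shown earlier and share one time zone. It labels each bar with the end of its day (`label="right"`), which appears to match the output above: 1378080000 is 2013-09-02 00:00:00 when read as UTC, yet the data begin on 2013-09-01. The function name `daily_time_bars` is hypothetical.

```python
import pandas as pd


def daily_time_bars(ticks: pd.DataFrame) -> pd.DataFrame:
    """Aggregate ticks with columns date, price, volume into daily OHLCV bars.

    Each bar is labelled with the end of its day (label="right"), so ticks
    from 2013-09-01 land in the bar stamped 2013-09-02 00:00:00.
    """
    ts = pd.to_datetime(ticks["date"], format="%m/%d/%Y %H:%M:%S.%f")
    frame = ticks.assign(date_time=ts, dollar=ticks["price"] * ticks["volume"])
    resampler = frame.set_index("date_time").resample("D", label="right")
    bars = resampler["price"].ohlc()               # open, high, low, close
    bars["volume"] = resampler["volume"].sum()
    bars["cum_ticks"] = resampler["price"].count()
    bars["cum_dollar_value"] = resampler["dollar"].sum()
    # Days without any trades produce NaN prices; drop them
    return bars.dropna(subset=["open"])
```

The integer `date_time` column returned by mlfinlab can be turned into readable timestamps with `pd.to_datetime(time_bars["date_time"], unit="s")`.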


## Standard Bars

In [5]:
from mlfinlab.data_structures import get_dollar_bars, get_volume_bars, get_tick_bars

In [6]:
dollar = get_dollar_bars(FILE_PATH, threshold=70000000, batch_size=1000000, verbose=False)
volume = get_volume_bars(FILE_PATH, threshold=28000, batch_size=1000000, verbose=False)
tick = get_tick_bars(FILE_PATH, threshold=5500, batch_size=1000000, verbose=False)
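The thresholds above are not derived in this notebook. One common rule of thumb (an assumption here, not necessarily how these values were chosen) is to divide the average daily dollar value, volume, or tick count by the number of bars you want per day. The helper `threshold_for_bars_per_day` below is hypothetical.

```python
import pandas as pd


def threshold_for_bars_per_day(daily_totals: pd.Series, bars_per_day: float) -> float:
    """Rule-of-thumb bar threshold: mean daily activity divided by the target bar count.

    daily_totals holds one value per day, e.g. the cum_dollar_value (dollar bars),
    volume (volume bars) or cum_ticks (tick bars) column of daily time bars.
    """
    if bars_per_day <= 0:
        raise ValueError("bars_per_day must be positive")
    return float(daily_totals.mean()) / bars_per_day


# Usage with the daily time bars computed earlier (23 bars per day is an arbitrary target):
# threshold_for_bars_per_day(time_bars["cum_dollar_value"], 23)
```

With the four daily `cum_dollar_value` figures shown above, the mean is about 1.61e9 dollars, so roughly 23 bars per day gives a threshold close to 70,000,000. The first two daily bars cover a partial Sunday session and the 2 September 2013 Labor Day holiday, so they pull the average down.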

In [7]:
dollar

|   | date_time | tick_num | open | high | low | close | volume | cum_buy_volume | cum_ticks | cum_dollar_value |
|---|-----------|----------|------|------|-----|-------|--------|----------------|-----------|------------------|
| 0 | 2013-09-01 21:34:39.298 | 11207 | 1640.25 | 1643.50 | 1639.00 | 1640.75 | 42862 | 21896 | 11207 | 70347610.00 |
| 1 | 2013-09-02 02:56:24.209 | 26547 | 1640.75 | 1646.00 | 1640.25 | 1644.50 | 42585 | 24320 | 15340 | 70000546.50 |
| 2 | 2013-09-02 06:37:33.128 | 40473 | 1644.50 | 1647.50 | 1644.25 | 1647.50 | 42580 | 23167 | 13926 | 70095794.25 |
| 3 | 2013-09-02 09:34:46.141 | 51328 | 1647.50 | 1648.50 | 1645.25 | 1647.00 | 42535 | 23904 | 10855 | 70053015.75 |
| 4 | 2013-09-02 22:55:20.297 | 64261 | 1647.00 | 1648.50 | 1645.25 | 1648.00 | 42512 | 23884 | 12933 | 70024910.50 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 87 | 2013-09-04 14:54:36.713 | 964221 | 1650.25 | 1652.75 | 1650.25 | 1651.00 | 42382 | 22128 | 11014 | 70002345.75 |
| 88 | 2013-09-04 14:59:51.015 | 972703 | 1651.00 | 1651.75 | 1649.50 | 1651.75 | 42410 | 25430 | 8482 | 70005355.25 |
| 89 | 2013-09-04 15:00:32.804 | 977274 | 1651.50 | 1652.50 | 1651.25 | 1652.00 | 42455 | 21459 | 4571 | 70127698.50 |
| 90 | 2013-09-04 15:13:38.243 | 984600 | 1652.00 | 1654.00 | 1651.75 | 1653.25 | 42364 | 22046 | 7326 | 70019786.25 |
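Each standard bar closes as soon as a running total reaches the threshold: the number of ticks for tick bars, traded contracts for volume bars, and price times volume for dollar bars. The output above is consistent with this, since every `cum_dollar_value` is at or just above 70,000,000. A minimal in-memory sketch of that rule follows. It assumes the running total restarts from zero after each bar and that the final incomplete bar is dropped; `threshold_bars` and its output columns are illustrative, not mlfinlab's API.

```python
import pandas as pd

BAR_COLUMNS = ["date_time", "open", "high", "low", "close",
               "volume", "cum_ticks", "cum_dollar_value"]


def threshold_bars(ticks: pd.DataFrame, metric: str, threshold: float) -> pd.DataFrame:
    """Sample tick, volume or dollar bars from ticks with columns date, price, volume.

    metric: "tick" (count ticks), "volume" (sum volume) or "dollar" (sum price * volume).
    A bar closes on the first tick at which the running total reaches threshold.
    """
    if metric not in ("tick", "volume", "dollar"):
        raise ValueError(f"unknown metric: {metric!r}")
    bars = []
    open_ = high = low = None
    n_ticks = volume = 0
    dollar = 0.0
    for date, price, vol in ticks[["date", "price", "volume"]].itertuples(index=False, name=None):
        if open_ is None:  # this tick opens a new bar
            open_ = high = low = price
        high, low = max(high, price), min(low, price)
        n_ticks += 1
        volume += vol
        dollar += price * vol
        # Running total that this bar type compares against the threshold
        total = {"tick": n_ticks, "volume": volume, "dollar": dollar}[metric]
        if total >= threshold:
            bars.append([date, open_, high, low, price, volume, n_ticks, dollar])
            # Reset: the next tick starts a new bar and any overshoot is discarded
            open_ = None
            n_ticks = volume = 0
            dollar = 0.0
    # Ticks after the last completed bar form an incomplete bar and are left out
    return pd.DataFrame(bars, columns=BAR_COLUMNS)
```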


## Information-Driven Bars

In addition to the dollar imbalance bars below, you can build volume and tick imbalance bars, just as with the standard bars. Simply replace `dollar` in the function name with `volume` or `tick` (e.g. `get_ema_dollar_imbalance_bars` -> `get_ema_volume_imbalance_bars`).
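If you want to build every variant in a loop, the naming pattern can be wrapped in a small helper. This sketch assumes the pattern holds for all three kinds (`tick`, `volume`, `dollar`) and both methods (`ema`, `const`), and it assumes the run-bar functions used later follow the same pattern. `bar_function_name` and `load_bar_function` are hypothetical helpers; `load_bar_function` raises if the assumed name is missing from your installed mlfinlab version.

```python
import importlib

KINDS = ("tick", "volume", "dollar")
METHODS = ("ema", "const")
FAMILIES = ("imbalance", "run")


def bar_function_name(method: str, kind: str, family: str = "imbalance") -> str:
    """Build a name following the pattern get_<method>_<kind>_<family>_bars."""
    if method not in METHODS or kind not in KINDS or family not in FAMILIES:
        raise ValueError(f"unsupported combination: {method}, {kind}, {family}")
    return f"get_{method}_{kind}_{family}_bars"


def load_bar_function(method: str, kind: str, family: str = "imbalance",
                      module_name: str = "mlfinlab.data_structures"):
    """Look the function up at runtime and fail clearly if the assumed name is missing."""
    module = importlib.import_module(module_name)
    name = bar_function_name(method, kind, family)
    if not hasattr(module, name):
        raise AttributeError(f"{module_name} has no attribute {name!r}")
    return getattr(module, name)


# Usage: pass the same keyword arguments as for get_ema_dollar_imbalance_bars
# get_bars = load_bar_function("ema", "volume")
```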

In [8]:
from mlfinlab.data_structures import get_ema_dollar_imbalance_bars, get_const_dollar_imbalance_bars

In [9]:
dollar_imbalance_ema, thresholds = get_ema_dollar_imbalance_bars(FILE_PATH,
                                                                 num_prev_bars=3,
                                                                 exp_num_ticks_init=100000,
                                                                 exp_num_ticks_constraints=[100, 1000],
                                                                 expected_imbalance_window=10000,
                                                                 analyse_thresholds=True,
                                                                 verbose=False
                                                                )

dollar_imbalance_const, _ = get_const_dollar_imbalance_bars(FILE_PATH,
                                                            exp_num_ticks_init=100000,
                                                            expected_imbalance_window=10000,
                                                            verbose=False
                                                           )

In [10]:
dollar_imbalance_ema

|   | date_time | tick_num | open | high | low | close | volume | cum_buy_volume | cum_ticks | cum_dollar_value |
|---|-----------|----------|------|------|-----|-------|--------|----------------|-----------|------------------|
| 0 | 2013-09-04 10:40:22.835 | 807274 | 1640.25 | 1650.50 | 1630.75 | 1650.50 | 3089748 | 1565287 | 807274 | 5.071516e+09 |
| 1 | 2013-09-04 10:40:34.020 | 808203 | 1650.50 | 1650.50 | 1650.00 | 1650.50 | 3741 | 2360 | 929 | 6.173741e+06 |
| 2 | 2013-09-04 10:40:46.226 | 808755 | 1650.50 | 1650.75 | 1650.25 | 1650.50 | 2713 | 1848 | 552 | 4.477849e+06 |
| 3 | 2013-09-04 10:40:50.260 | 809013 | 1650.50 | 1650.75 | 1650.50 | 1650.75 | 1338 | 1192 | 258 | 2.208562e+06 |
| 4 | 2013-09-04 10:40:50.280 | 809136 | 1650.75 | 1650.75 | 1650.75 | 1650.75 | 531 | 531 | 123 | 8.765482e+05 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 21658 | 2013-09-05 02:22:23.874 | 999982 | 1654.50 | 1654.50 | 1654.50 | 1654.50 | 11 | 11 | 5 | 1.819950e+04 |
| 21659 | 2013-09-05 02:22:24.986 | 999988 | 1654.50 | 1654.50 | 1654.50 | 1654.50 | 10 | 10 | 6 | 1.654500e+04 |
| 21660 | 2013-09-05 02:22:25.001 | 999991 | 1654.50 | 1654.50 | 1654.50 | 1654.50 | 13 | 13 | 3 | 2.150850e+04 |
| 21661 | 2013-09-05 02:22:25.001 | 999995 | 1654.50 | 1654.50 | 1654.50 | 1654.50 | 9 | 9 | 4 | 1.489050e+04 |


In [11]:
thresholds

|   | cum_theta | expected_imbalance | exp_num_ticks | timestamp |
|---|-----------|--------------------|---------------|-----------|
| 0 | 0.0 | NaN | 100000.0 | 2013-09-01 17:00:00.083 |
| 1 | 0.0 | NaN | 100000.0 | 2013-09-01 17:00:00.083 |
| 2 | 0.0 | NaN | 100000.0 | 2013-09-01 17:00:00.083 |
| 3 | 0.0 | NaN | 100000.0 | 2013-09-01 17:00:00.083 |
| 4 | 0.0 | NaN | 100000.0 | 2013-09-01 17:00:00.083 |
| ... | ... | ... | ... | ... |
| 999995 | 8272.5 | -133.800352 | 100.0 | 2013-09-05 02:22:25.001 |
| 999996 | 16545.0 | -133.800352 | 100.0 | 2013-09-05 02:22:25.001 |
| 999997 | 1654.5 | -129.661946 | 100.0 | 2013-09-05 02:22:25.001 |
| 999998 | 3309.0 | -129.661946 | 100.0 | 2013-09-05 02:22:25.001 |
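The `thresholds` table records the running imbalance (`cum_theta`) tick by tick. In the dollar imbalance bars defined in López de Prado's *Advances in Financial Machine Learning* (ch. 2), each tick is signed with the tick rule (+1 after an up-tick, -1 after a down-tick, otherwise the previous sign), theta accumulates sign times price times volume, and a bar closes once |theta| reaches the expected number of ticks times the absolute expected imbalance. The last rows above fit this reading: `cum_theta` reaches 16545.0, which exceeds 100 × 133.8 ≈ 13,380, and then restarts at 1654.5. The sketch below holds both expectations fixed for clarity, whereas mlfinlab re-estimates them as bars form (note the changing `expected_imbalance` and `exp_num_ticks` columns), so its bars will not reproduce the output above. Starting the sign at 0 until the first price change is an assumption, consistent with `cum_theta` staying at 0.0 in the first rows. Both function names are hypothetical.

```python
import numpy as np
import pandas as pd


def tick_rule(prices: np.ndarray) -> np.ndarray:
    """Sign each tick: +1 after an up-tick, -1 after a down-tick, else repeat the last sign.

    Ticks before the first price change get 0.
    """
    signs = np.zeros(len(prices))
    prev = 0.0
    for i in range(1, len(prices)):
        diff = prices[i] - prices[i - 1]
        if diff != 0:
            prev = np.sign(diff)
        signs[i] = prev
    return signs


def dollar_imbalance_bar_ends(ticks: pd.DataFrame, exp_num_ticks: float,
                              exp_imbalance: float) -> list:
    """Return the row positions at which a dollar imbalance bar closes.

    Simplification: exp_num_ticks and exp_imbalance are fixed inputs instead of
    being re-estimated after each bar.
    """
    prices = ticks["price"].to_numpy(dtype=float)
    volumes = ticks["volume"].to_numpy(dtype=float)
    signed_dollars = tick_rule(prices) * prices * volumes
    threshold = exp_num_ticks * abs(exp_imbalance)
    ends, theta = [], 0.0
    for i, x in enumerate(signed_dollars):
        theta += x
        if abs(theta) >= threshold:
            ends.append(i)
            theta = 0.0  # start accumulating the next bar
    return ends
```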


In [12]:
dollar_imbalance_const

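|   | date_time | tick_num | open | high | low | close | volume | cum_buy_volume | cum_ticks | cum_dollar_value |
|---|-----------|----------|------|------|-----|-------|--------|----------------|-----------|------------------|
| 0 | 2013-09-04 10:40:22.835 | 807274 | 1640.25 | 1650.5 | 1630.75 | 1650.5 | 3089748 | 1565287 | 807274 | 5071516000.0 |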


## Run Bars

In [13]:
from mlfinlab.data_structures import get_ema_dollar_run_bars, get_const_dollar_run_bars


In [14]:
# EMA Dollar Run Bars (stored in the same variable names as the imbalance bars above)
dollar_imbalance_ema, thresholds = get_ema_dollar_run_bars(FILE_PATH,
                                                           num_prev_bars=3,
                                                           exp_num_ticks_init=100000,
                                                           exp_num_ticks_constraints=[100, 1000],
                                                           expected_imbalance_window=10000,
                                                           analyse_thresholds=True,
                                                           verbose=False
                                                          )
# Const Dollar Run Bars
dollar_imbalance_const, _ = get_const_dollar_run_bars(FILE_PATH,
                                                      num_prev_bars=3,
                                                      exp_num_ticks_init=100000,
                                                      expected_imbalance_window=1000,
                                                      verbose=False
                                                     )

In [15]:
dollar_imbalance_ema

|   | date_time | tick_num | open | high | low | close | volume | cum_buy_volume | cum_ticks | cum_dollar_value |
|---|-----------|----------|------|------|-----|-------|--------|----------------|-----------|------------------|
| 0 | 2013-09-03 09:01:18.249 | 202180 | 1640.25 | 1650.00 | 1639.00 | 1645.25 | 695515 | 356057 | 202180 | 1.144997e+09 |
| 1 | 2013-09-03 09:01:29.446 | 202756 | 1645.25 | 1646.00 | 1645.25 | 1646.00 | 2938 | 2274 | 576 | 4.835396e+06 |
| 2 | 2013-09-03 09:01:36.991 | 203344 | 1646.00 | 1646.25 | 1645.25 | 1645.25 | 2770 | 523 | 588 | 4.558979e+06 |
| 3 | 2013-09-03 09:01:50.826 | 204387 | 1645.25 | 1645.75 | 1644.50 | 1645.25 | 4352 | 2071 | 1043 | 7.159457e+06 |
| 4 | 2013-09-03 09:02:06.890 | 205037 | 1645.25 | 1645.25 | 1644.50 | 1644.50 | 2809 | 1048 | 650 | 4.620606e+06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4834 | 2013-09-05 00:00:16.610 | 994090 | 1653.75 | 1655.50 | 1653.25 | 1655.50 | 5222 | 3149 | 1912 | 8.638504e+06 |
| 4835 | 2013-09-05 00:39:34.579 | 995320 | 1655.50 | 1655.75 | 1653.75 | 1653.75 | 3500 | 1135 | 1230 | 5.792073e+06 |
| 4836 | 2013-09-05 01:25:12.236 | 996808 | 1653.75 | 1654.50 | 1653.50 | 1653.50 | 4545 | 2006 | 1488 | 7.517411e+06 |
| 4837 | 2013-09-05 01:56:13.281 | 997845 | 1653.50 | 1653.75 | 1652.25 | 1652.25 | 2995 | 765 | 1037 | 4.950406e+06 |


In [16]:
thresholds

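|   | cum_theta_buy | cum_theta_sell | exp_imbalance_buy | exp_imbalance_sell | exp_num_ticks | exp_buy_ticks_proportion | buy_ticks_num | timestamp |
|---|---------------|----------------|-------------------|--------------------|---------------|--------------------------|---------------|-----------|
| 0 | 0.0 | 0.00 | NaN | NaN | 100000.0 | NaN | 0 | 2013-09-01 17:00:00.083 |
| 1 | 0.0 | 0.00 | NaN | NaN | 100000.0 | NaN | 0 | 2013-09-01 17:00:00.083 |
| 2 | 0.0 | 0.00 | NaN | NaN | 100000.0 | NaN | 0 | 2013-09-01 17:00:00.083 |
| 3 | 0.0 | 0.00 | NaN | NaN | 100000.0 | NaN | 0 | 2013-09-01 17:00:00.083 |
| 4 | 0.0 | 0.00 | NaN | NaN | 100000.0 | NaN | 0 | 2013-09-01 17:00:00.083 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 999995 | 2183255.0 | 980756.75 | 5755.549257 | 6085.558713 | 1000.0 | 0.520541 | 423 | 2013-09-05 02:22:25.001 |
| 999996 | 2191527.5 | 980756.75 | 5755.549257 | 6085.558713 | 1000.0 | 0.520541 | 424 | 2013-09-05 02:22:25.001 |
| 999997 | 2193182.0 | 980756.75 | 5755.549257 | 6085.558713 | 1000.0 | 0.520541 | 425 | 2013-09-05 02:22:25.001 |
| 999998 | 2194836.5 | 980756.75 | 5755.549257 | 6085.558713 | 1000.0 | 0.520541 | 426 | 2013-09-05 02:22:25.001 |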
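Run bars track buy and sell flow separately. Judging by the column names above, `cum_theta_buy` and `cum_theta_sell` accumulate the dollar value of buy-signed and sell-signed ticks (the sell side as a positive amount). Following the run-bar definition in López de Prado's *Advances in Financial Machine Learning* (ch. 2), a bar closes when the larger of the two reaches E[T] × max(P[b=1] × E[buy value], (1 - P[b=1]) × E[sell value]). For the last row shown, that bound is about 1000 × max(0.5205 × 5755.5, 0.4795 × 6085.6) ≈ 2,996,000, which is above the 2,194,836.5 accumulated on the buy side, so the bar is still open. The sketch below holds the expectations fixed and takes the tick-rule signs as input; it illustrates the closing rule only, and `run_bar_threshold` and `dollar_run_bar_ends` are hypothetical names, not mlfinlab's implementation.

```python
def run_bar_threshold(exp_num_ticks, p_buy, exp_buy, exp_sell):
    """E[T] * max(P[b=1] * E[buy dollar value], (1 - P[b=1]) * E[sell dollar value])."""
    return exp_num_ticks * max(p_buy * exp_buy, (1 - p_buy) * exp_sell)


def dollar_run_bar_ends(signs, prices, volumes, exp_num_ticks, p_buy, exp_buy, exp_sell):
    """Row positions where a dollar run bar closes (expectations held fixed).

    signs: tick-rule sign of each tick (+1 buy, -1 sell, 0 unknown).
    """
    threshold = run_bar_threshold(exp_num_ticks, p_buy, exp_buy, exp_sell)
    ends = []
    theta_buy = theta_sell = 0.0
    for i, (b, p, v) in enumerate(zip(signs, prices, volumes)):
        if b > 0:
            theta_buy += p * v
        elif b < 0:
            theta_sell += p * v  # kept positive, like cum_theta_sell above
        if max(theta_buy, theta_sell) >= threshold:
            ends.append(i)
            theta_buy = theta_sell = 0.0  # the next tick starts a new bar
    return ends
```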


In [17]:
dollar_imbalance_const

|   | date_time | tick_num | open | high | low | close | volume | cum_buy_volume | cum_ticks | cum_dollar_value |
|---|-----------|----------|------|------|-----|-------|--------|----------------|-----------|------------------|
| 0 | 2013-09-03 09:01:18.249 | 202180 | 1640.25 | 1650.0 | 1639.0 | 1645.25 | 695515 | 356057 | 202180 | 1144997000.0 |
| 1 | 2013-09-03 10:23:41.889 | 302746 | 1645.25 | 1648.75 | 1640.25 | 1640.5 | 376083 | 180734 | 100566 | 618836900.0 |
| 2 | 2013-09-03 12:00:32.007 | 406364 | 1640.5 | 1641.5 | 1633.25 | 1633.25 | 427482 | 204658 | 103618 | 700119300.0 |
| 3 | 2013-09-03 14:04:22.049 | 513261 | 1633.25 | 1636.0 | 1630.75 | 1636.0 | 398845 | 210584 | 106897 | 651527400.0 |
| 4 | 2013-09-03 15:31:41.400 | 595147 | 1636.0 | 1640.0 | 1635.0 | 1638.75 | 388199 | 204234 | 81886 | 635665500.0 |
| 5 | 2013-09-04 09:55:10.377 | 770606 | 1638.5 | 1648.0 | 1635.0 | 1647.5 | 635087 | 318220 | 175459 | 1042495000.0 |
| 6 | 2013-09-04 13:17:18.167 | 894858 | 1647.5 | 1654.75 | 1646.75 | 1651.25 | 568024 | 285570 | 124252 | 938043900.0 |
| 7 | 2013-09-04 15:00:00.885 | 975162 | 1651.25 | 1654.25 | 1649.5 | 1652.0 | 327195 | 176753 | 80304 | 540382500.0 |


## Out of Memory

Processing tick data can exhaust memory. Every function that builds bars accepts the parameters _batch_size_ and _output_path_: _batch_size_ sets how many rows are read and processed at once, and _output_path_ (used together with _to_csv=True_) writes the resulting bars to a CSV file on local disk.

In [18]:
OUTPUT_PATH = "result.csv"
time_bars = ml.data_structures.get_time_bars(FILE_PATH,
                                             batch_size=1000000,
                                             resolution='D', # Day
                                             to_csv=True,
                                             output_path=OUTPUT_PATH
                                            )

Reading data in batches:
Batch number: 0
Returning bars 
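Batching needs care because a bar can straddle two batches. A minimal sketch of the idea for dollar bars, assuming a CSV with `date`, `price` and `volume` columns: pandas reads `batch_size` rows at a time (`chunksize`), the partially filled bar is carried over to the next batch, and completed bars are appended to the output CSV. The function name `dollar_bars_to_csv` and its output columns are hypothetical; this is not mlfinlab's code.

```python
import pandas as pd

COLUMNS = ["date_time", "open", "high", "low", "close", "volume", "cum_dollar_value"]


def dollar_bars_to_csv(csv_path, threshold, batch_size, output_path):
    """Stream dollar bars from csv_path to output_path, batch_size rows at a time.

    Returns the number of bars written. The final incomplete bar is not written.
    """
    pd.DataFrame(columns=COLUMNS).to_csv(output_path, index=False)  # header only
    bar = None  # the open bar, carried across batch boundaries
    written = 0
    for batch in pd.read_csv(csv_path, chunksize=batch_size):
        finished = []
        for date, price, volume in batch[["date", "price", "volume"]].itertuples(index=False, name=None):
            if bar is None:
                bar = {"open": price, "high": price, "low": price, "volume": 0, "dollar": 0.0}
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["volume"] += volume
            bar["dollar"] += price * volume
            if bar["dollar"] >= threshold:
                finished.append([date, bar["open"], bar["high"], bar["low"], price,
                                 bar["volume"], bar["dollar"]])
                bar = None
        if finished:
            # mode="a" appends, so bars from earlier batches are kept on disk, not in memory
            pd.DataFrame(finished, columns=COLUMNS).to_csv(output_path, mode="a",
                                                           header=False, index=False)
            written += len(finished)
    return written
```

Because the open bar survives between batches, the result does not depend on `batch_size`; only peak memory does. The CSV written by mlfinlab can likewise be loaded later with `pd.read_csv(OUTPUT_PATH)`.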

