
This notebook shows how to obtain BTCUSD tick data for free and how to use the Python package mlfinlab (https://github.com/hudson-and-thames/mlfinlab) to build different types of bars from it.

# **Obtaining the Data**

## Data from BitcoinChartsAPI


From the raw tick data, we will construct different types of bars that will hopefully show more favorable statistical properties than those known to be prevalent in time bars.

```
# todo: research validity of dataset
```

The bitcoincharts API provides this dataset for free, so hope to the data gods that it is accurate.
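
Until that research is done, a few cheap sanity checks can at least catch obvious problems. The sketch below uses a hypothetical helper, `check_ticks` (not part of pandas or mlfinlab), on a tiny made-up frame with the same three columns. The checks themselves are assumptions about what clean tick data should look like, so treat this as a starting point rather than a validation of the bitcoincharts feed.

```python
import pandas as pd

def check_ticks(df):
    """Count suspicious rows in a (timestamp, price, volume) frame.

    Hypothetical helper for illustration only.
    """
    return {
        "rows": len(df),
        "missing_values": int(df.isna().any(axis=1).sum()),
        "non_positive_price": int((df["price"] <= 0).sum()),
        "non_positive_volume": int((df["volume"] <= 0).sum()),
        # A timestamp earlier than the previous row's means the feed is not sorted
        "out_of_order": int((df["timestamp"].diff() < pd.Timedelta(0)).sum()),
    }

# Made-up sample with one zero price and one out-of-order timestamp
sample = pd.DataFrame({
    "timestamp": pd.to_datetime([1513631617, 1513631622, 1513631620], unit="s"),
    "price": [18624.0, 18646.2, 0.0],
    "volume": [0.01386, 0.06835, 0.1],
})
report = check_ticks(sample)
print(report)
```

On the real frame this would be `check_ticks(data)`; any non-zero count is a reason to look closer before building bars.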


In [0]:
#Packages
import numpy as np
import pandas as pd
from datetime import datetime

In [156]:
data = pd.read_csv("https://api.bitcoincharts.com/v1/csv/krakenUSD.csv.gz", header = None)
data.columns = ['timestamp', 'price', 'volume']
data['timestamp'] = pd.to_datetime(data['timestamp'],unit='s')
ticks = len(data)  # total number of ticks (rows) in the dataset
print(ticks)

19313882


In [168]:
data[-14000000:].head()

Unnamed: 0,timestamp,price,volume
5313882,2017-12-18 21:13:37,18624.0,0.01386
5313883,2017-12-18 21:13:42,18646.2,0.06835
5313884,2017-12-18 21:13:42,18670.8,0.010326
5313885,2017-12-18 21:13:42,18671.0,0.675907
5313886,2017-12-18 21:14:04,18668.0,0.1


## Live Data from Kraken Exchange

Later we will be using this same analysis for a live trading bot on the Kraken API, so let's make sure the historical data matches the live data as closely as possible to avoid future headaches.

In [0]:
#install libraries for Kraken API wrappers
pip install pykrakenapi krakenex

In [0]:
import krakenex
from pykrakenapi import KrakenAPI
api = krakenex.API()
k = KrakenAPI(api)

In [65]:
trades, last = k.get_recent_trades("XXBTZUSD")
trades.head()

dtime,price,volume,time,buy_sell,market_limit,misc
2020-01-12 04:06:14.160300016,8093.0,0.920147,1578802000.0,sell,market,
2020-01-12 04:06:14.158799887,8093.5,0.842,1578802000.0,sell,market,
2020-01-12 04:06:14.156500101,8093.8,0.1,1578802000.0,sell,market,
2020-01-12 04:06:14.146800041,8097.7,0.137853,1578802000.0,sell,market,
2020-01-12 04:03:46.114500046,8089.6,0.01,1578802000.0,buy,limit,


In [60]:
trades = trades[['time', 'price', 'volume']]
trades.head()

dtime,time,price,volume
2020-01-12 04:02:41.498399973,1578802000.0,8097.8,0.339986
2020-01-12 04:02:41.496900082,1578802000.0,8097.8,0.04
2020-01-12 04:02:41.495500088,1578802000.0,8097.8,0.098756
2020-01-12 04:02:41.473200083,1578802000.0,8097.8,0.039258
2020-01-12 04:02:20.259799957,1578802000.0,8098.7,0.000607


We similarly get price, volume and timestamp, but we also get the side and type of each order, which will come in useful. Unfortunately each call only returns about 1,000 ticks, but we can make repeated calls, so there's just some math we need to do to see how much data we can actually use in real time.
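
Most of that math is pagination: each call returns a batch plus a `last` cursor, and passing the cursor back in should return the next batch. Below is a minimal sketch of that loop. `fetch_all` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for the Kraken call so the example runs offline, and its batch size of 3 and integer trade ids are made up.

```python
import pandas as pd

def fetch_all(fetch, since, max_calls):
    """Call fetch(since) repeatedly, following the cursor it returns."""
    batches = []
    for _ in range(max_calls):
        trades, last = fetch(since)
        if trades.empty:
            break  # nothing newer than the cursor
        batches.append(trades)
        since = last
    if not batches:
        return pd.DataFrame()
    # Stitch the batches together, oldest first
    return pd.concat(batches).sort_index()

# Offline stand-in for the exchange: 8 made-up trades keyed by an integer id
ALL_TRADES = pd.DataFrame(
    {"price": [8090.0 + i for i in range(8)], "volume": [0.1] * 8},
    index=pd.RangeIndex(1, 9, name="trade_id"),
)

def fake_fetch(since, batch_size=3):
    """Return up to batch_size trades after `since`, plus the new cursor."""
    batch = ALL_TRADES[ALL_TRADES.index > since].head(batch_size)
    last = batch.index.max() if not batch.empty else since
    return batch, last

history = fetch_all(fake_fetch, since=0, max_calls=10)
print(len(history))
```

With the real wrapper, `fetch` could be something like `lambda since: k.get_recent_trades("XXBTZUSD", since=since)`, assuming the wrapper accepts a `since` argument and returns the same `(trades, last)` pair. A real loop would also need to pause between calls to stay within Kraken's rate limits.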



Let's see what time bars look like when we obtain them from Kraken.

### Timebars

In [16]:
ohlc, last = k.get_ohlc_data("XXBTZUSD",  interval= 1440)
ohlc.tail(10)

dtime,time,open,high,low,close,vwap,volume,count
2018-02-01,1517443200,10101.6,10180.9,8525.3,9037.6,9230.8,14656.681513,60114
2018-01-31,1517356800,9979.8,10331.0,9513.1,10101.6,9927.9,8867.80881,56940
2018-01-30,1517270400,11150.0,11200.0,9750.3,9979.8,10359.5,10679.554378,60551
2018-01-29,1517184000,11633.9,11708.3,11015.0,11150.0,11263.9,4209.311271,38648
2018-01-28,1517097600,11380.2,11932.7,11100.0,11629.9,11619.8,5972.084313,46256
2018-01-27,1517011200,11080.8,11595.0,10850.0,11386.0,11278.7,5943.538378,45372
2018-01-26,1516924800,11159.3,11634.2,10320.0,11080.8,10914.8,7680.88569,59173
2018-01-25,1516838400,11419.8,11760.1,10899.0,11159.3,11290.7,4607.940407,38733
2018-01-24,1516752000,10850.0,11500.0,10529.0,11419.8,11043.8,4144.017683,40683
2018-01-23,1516665600,10811.3,11456.0,10001.0,10849.9,10796.3,5067.033053,46533


In [17]:
ohlc.size

5760

Note that `ohlc.size` counts cells, not rows: 5,760 cells across 8 columns is 720 daily bars, so you get almost 2 years of daily data. This will give us a good idea of when to sample from our ticks.
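
One way to check that the historical ticks line up with Kraken's own daily bars is to rebuild daily OHLCV from ticks and compare the two. The sketch below does the aggregation with plain pandas `resample` on a made-up tick frame with the same columns as `data`. The helper name `daily_ohlcv` is hypothetical, and it assumes both sources cut days at midnight UTC.

```python
import pandas as pd

def daily_ohlcv(ticks):
    """Aggregate (timestamp, price, volume) ticks into daily OHLCV bars."""
    indexed = ticks.set_index("timestamp").sort_index()
    bars = indexed["price"].resample("1D").ohlc()
    bars["volume"] = indexed["volume"].resample("1D").sum()
    # vwap = traded dollar value divided by traded volume for the day
    dollar_value = (indexed["price"] * indexed["volume"]).resample("1D").sum()
    bars["vwap"] = dollar_value / bars["volume"]
    return bars

# Made-up ticks spanning two days
ticks = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-10 09:00", "2020-01-10 15:00", "2020-01-10 23:59",
        "2020-01-11 00:01", "2020-01-11 12:00",
    ]),
    "price": [8000.0, 8100.0, 8050.0, 8060.0, 8020.0],
    "volume": [1.0, 2.0, 1.0, 0.5, 0.5],
})
bars = daily_ohlcv(ticks)
print(bars)
```

Joining the result against `ohlc` on the date index and looking at the differences in `close`, `volume` and `vwap` would show how closely the two sources agree.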

# ***Creating Different Types of Bar Charts***

## Using mlfinlab

Installing with a quick pip

In [0]:
pip install mlfinlab

These are the various data structures that are available in the package. An explanation can be found in Marcos López de Prado's book *Advances in Financial Machine Learning*, or in the Medium post that I'm going to eventually write (I promise!).
```
Logic regarding the various sampling techniques, in particular:
* Time Bars
* Tick Bars
* Volume Bars
* Dollar Bars
* Tick Imbalance Bars (EMA and Const)
* Volume Imbalance Bars (EMA and Const)
* Dollar Imbalance Bars (EMA and Const)
* Tick Run Bars (EMA and Const)
* Volume Run Bars (EMA and Const)
* Dollar Run Bars (EMA and Const)
```
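
To make the first few entries concrete: a tick bar closes after a fixed number of trades, a volume bar after a fixed amount of traded volume, and a dollar bar after a fixed traded value (price times volume). The sketch below builds all three in plain pandas on made-up ticks. `simple_bars` is a hypothetical helper written for illustration; it is not mlfinlab's implementation, which may differ in details.

```python
import pandas as pd

def simple_bars(ticks, threshold, kind="tick"):
    """Group ticks into bars that close once a running total reaches threshold.

    kind: "tick" counts trades, "volume" sums volume,
          "dollar" sums price * volume.
    """
    if kind == "tick":
        increments = pd.Series(1.0, index=ticks.index)
    elif kind == "volume":
        increments = ticks["volume"]
    elif kind == "dollar":
        increments = ticks["price"] * ticks["volume"]
    else:
        raise ValueError(f"unknown kind: {kind}")

    rows, running, start = [], 0.0, 0
    for i, inc in enumerate(increments):
        running += inc
        if running >= threshold:
            chunk = ticks.iloc[start:i + 1]
            rows.append({
                "open": chunk["price"].iloc[0],
                "high": chunk["price"].max(),
                "low": chunk["price"].min(),
                "close": chunk["price"].iloc[-1],
                "volume": chunk["volume"].sum(),
                "ticks": len(chunk),
            })
            running, start = 0.0, i + 1  # start the next bar
    return pd.DataFrame(rows)

# Made-up ticks, oldest first
demo_ticks = pd.DataFrame({
    "price": [100.0, 101.0, 99.0, 102.0, 103.0, 101.0],
    "volume": [1.0, 0.5, 2.0, 0.5, 1.5, 0.5],
})
demo_tick_bars = simple_bars(demo_ticks, threshold=2, kind="tick")
demo_volume_bars = simple_bars(demo_ticks, threshold=2.0, kind="volume")
demo_dollar_bars = simple_bars(demo_ticks, threshold=250.0, kind="dollar")
print(demo_tick_bars, demo_volume_bars, demo_dollar_bars, sep="\n\n")
```

Ticks left over after the last completed bar are dropped here, which keeps the sketch short; a production version would need to carry them into the next batch.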

In [0]:
from mlfinlab.data_structures.standard_data_structures import get_tick_bars, get_dollar_bars, get_volume_bars

**Tick Bars**

In [169]:
tick_bars = get_tick_bars(trades, threshold=100,batch_size=1000)
print(tick_bars.shape)
tick_bars.head()

Reading data in batches:
Batch number: 0
Returning bars 

(10, 10)


Unnamed: 0,date_time,tick_num,open,high,low,close,volume,cum_buy_volume,cum_ticks,cum_dollar_value
0,2020-01-12 03:36:06.051199913,100,8093.0,8100.5,8078.4,8085.0,29.41228,7.74862,100,237782.402849
1,2020-01-12 03:08:18.612699986,200,8083.3,8092.1,8056.9,8092.1,16.393723,9.220056,100,132472.413655
2,2020-01-12 02:52:25.030800104,300,8092.1,8100.5,8075.6,8090.0,52.253461,33.954036,100,422541.64044
3,2020-01-12 02:42:13.652199984,400,8088.4,8102.0,8084.5,8098.0,27.636726,13.881671,100,223740.686601
4,2020-01-12 02:26:25.944099903,500,8098.0,8098.0,8070.0,8079.2,40.374385,2.095649,100,326613.883068
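
Note that the bar timestamps above run backwards, because the rows of `trades` arrive newest first. Assuming the bars are built by walking the rows top to bottom, `open` is then the latest trade in each bar rather than the earliest. A minimal fix is to sort the frame by its datetime index before building bars, sketched here on made-up trades shaped like the Kraken output:

```python
import pandas as pd

# Made-up trades ordered newest first, like the Kraken output
trades_desc = pd.DataFrame(
    {
        "time": [1578802000.3, 1578802000.2, 1578802000.1],
        "price": [8093.0, 8093.5, 8093.8],
        "volume": [0.92, 0.84, 0.10],
    },
    index=pd.to_datetime([
        "2020-01-12 04:06:14.16",
        "2020-01-12 04:06:14.15",
        "2020-01-12 04:03:46.11",
    ]).rename("dtime"),
)

# Sort oldest first so each bar's open is its earliest trade
trades_asc = trades_desc.sort_index()
print(trades_asc)
```

In the notebook this would be `trades = trades.sort_index()` before the call to `get_tick_bars`.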


**Volume Bars**

In [170]:
volume_bars = get_volume_bars(trades, threshold = 1, batch_size = 1000)
print(volume_bars.shape)
volume_bars.head()

Reading data in batches:
Batch number: 0
Returning bars 

(190, 10)


Unnamed: 0,date_time,tick_num,open,high,low,close,volume,cum_buy_volume,cum_ticks,cum_dollar_value
0,2020-01-12 04:06:14.158799887,2,8093.0,8093.5,8093.0,8093.5,1.762147,0.842,2,14261.472948
1,2020-01-12 04:02:03.626199961,13,8093.8,8100.5,8089.6,8100.5,1.05156,1.04156,11,8515.213775
2,2020-01-12 04:00:51.234499931,18,8100.4,8100.4,8094.9,8094.9,1.703438,0.0,5,13792.8061
3,2020-01-12 03:59:28.093300104,24,8094.9,8095.0,8082.7,8082.7,3.925607,0.088,6,31730.703028
4,2020-01-12 03:59:28.088099957,27,8082.4,8082.4,8082.2,8082.2,1.475462,0.0,3,11924.984565


**Dollar Bars**

In [191]:
dollar_bars = get_dollar_bars(data[-10000:],threshold=10000, batch_size = 100000)
print(dollar_bars.shape)
dollar_bars.tail()

Reading data in batches:
Batch number: 0
Returning bars 

(1645, 10)


Unnamed: 0,date_time,tick_num,open,high,low,close,volume,cum_buy_volume,cum_ticks,cum_dollar_value
1640,2020-01-11 23:18:56,9975,8049.9,8050.0,8049.9,8050.0,1.623856,1.623856,4,13071.939236
1641,2020-01-11 23:19:50,9981,8051.6,8057.0,8051.6,8057.0,2.122263,2.122263,6,17093.495467
1642,2020-01-11 23:19:50,9984,8057.2,8057.4,8057.2,8057.4,2.6807,2.6807,3,21599.42418
1643,2020-01-11 23:19:50,9985,8057.4,8057.4,8057.4,8057.4,3.709,3.709,1,29884.8966
1644,2020-01-11 23:19:50,9987,8058.4,8059.4,8058.4,8059.4,1.488037,1.488037,2,11992.103544
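
The threshold of 10,000 above is hand-picked. One simple heuristic, which is an assumption here and not something mlfinlab prescribes, is to divide the total traded dollar value of the sample by the number of bars you would like to end up with. `dollar_threshold` is a hypothetical helper and the ticks are made up.

```python
import pandas as pd

def dollar_threshold(ticks, target_bars):
    """Dollar value per bar that yields roughly target_bars bars."""
    total_dollar_value = (ticks["price"] * ticks["volume"]).sum()
    return total_dollar_value / target_bars

# Made-up ticks
demo = pd.DataFrame({
    "price": [8000.0, 8010.0, 8020.0, 8030.0],
    "volume": [1.0, 2.0, 1.0, 1.0],
})
threshold = dollar_threshold(demo, target_bars=2)
print(threshold)
```

The actual bar count will usually come out a little lower than the target, because each bar overshoots the threshold by part of its last trade.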


Finally, let's build time bars for comparison.

In [0]:
from mlfinlab.data_structures.time_data_structures import get_time_bars

In [185]:
time_bars = get_time_bars(data[-60000:], resolution='S', num_units=60, batch_size=2000000, verbose=True, to_csv=False)
time_bars['date_time'] = pd.to_datetime(time_bars['date_time'],unit='s')
print(time_bars.shape)
time_bars.tail()


Reading data in batches:
Batch number: 0
Returning bars 

(4377, 10)


Unnamed: 0,date_time,tick_num,open,high,low,close,volume,cum_buy_volume,cum_ticks,cum_dollar_value
4372,2020-01-11 23:20:00,59988,8051.6,8059.4,8051.6,8055.0,10.0,10.0,12,80569.919791
4373,2020-01-11 23:21:00,59990,8055.0,8055.0,8054.5,8054.6,0.015499,0.0,2,124.840253
4374,2020-01-11 23:22:00,59992,8054.6,8054.6,8052.8,8052.8,0.005308,0.005298,2,42.750899
4375,2020-01-11 23:23:00,59996,8052.8,8052.9,8052.6,8053.3,0.232454,0.226979,4,1871.925666
4376,2020-01-11 23:24:00,60000,8053.3,8053.3,8052.3,8052.3,0.352124,0.01,4,2835.478211
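
To compare the bar types on their statistical properties, one option is to look at the log returns of each bar series' close. The sketch below computes a few summary statistics with pandas. `return_stats` is a hypothetical helper, and the input is a random made-up price path, so the printed numbers say nothing about BTC.

```python
import numpy as np
import pandas as pd

def return_stats(close):
    """Summary statistics of log returns for a series of bar closes."""
    returns = np.log(close).diff().dropna()
    return {
        "n": len(returns),
        "std": returns.std(),
        "skew": returns.skew(),
        # pandas reports excess kurtosis, so a normal distribution gives ~0
        "excess_kurtosis": returns.kurt(),
        "autocorr_lag1": returns.autocorr(lag=1),
    }

# Made-up price path with normally distributed log returns
rng = np.random.default_rng(0)
fake_close = pd.Series(8000.0 * np.exp(np.cumsum(rng.normal(0, 0.001, 500))))
summary = return_stats(fake_close)
print(summary)
```

With the real frames this would be called on `tick_bars['close']`, `volume_bars['close']`, `dollar_bars['close']` and `time_bars['close']`; skew and excess kurtosis closer to zero and weaker autocorrelation would suggest returns closer to the usual modelling assumptions.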


# Plotting Stuff

In [192]:
import altair as alt
open_close_color = alt.condition("datum.open < datum.close",
                                 alt.value("#06982d"),
                                 alt.value("#ae1325"))

rule = alt.Chart(dollar_bars[:]).mark_rule().encode(
    alt.X('date_time:T',
        scale=alt.Scale(domain=[{"month": 1, "date": 1, "year": 2019},
                                {"month": 2, "date": 1, "year": 2020}]),
        #axis=alt.Axis(format='%m/%d', title='Date')    
        ),
    alt.Y('low', title='Price', scale=alt.Scale(zero=False) ),
    alt.Y2('high'), color=open_close_color)

bar = alt.Chart(dollar_bars[:]).mark_bar().encode(
    x='date_time:T',
    y='open',
    y2='close',
    color=open_close_color
).interactive(bind_y=False)

rule + bar