In [19]:
from IPython.display import HTML
HTML('''<script>
code_show=false; 
function code_toggle() {
 if (code_show){
 $('div.input').show();
 } else {
 $('div.input').hide();
 }
 code_show =! code_show
} 
$( document ).ready(code_toggle);
</script>

<font> The Python code is hidden by default. To see it, click <a href="javascript:code_toggle()">here</a>.</font>
''')

In [1]:
## Setup - libraries
import quandl
import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
import numpy as np
import credentials # keep my quandl and plot.ly api keys private

import plotly.plotly as py
import plotly.graph_objs as go

#import cufflinks as cf
#cf.set_config_file(world_readable=True,offline=False)

In [2]:
## Setup - appearance
# add a ; at the end to suppress function output (when plotting)
# matplotlib.style.use('ggplot')

# get rid of the annoying 'SettingWithCopy' warning
pd.options.mode.chained_assignment = None # default='warn'

# more than one print of an unassigned variable
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Display plot results inline, not in a separate window
#%matplotlib inline

color1 = '#137a28' # dark green
color2 = '#b3d1b9' # light green

## Number of transactions awaiting confirmation (Mempool size)

In [3]:
# import mempool data downloaded from https://blockchain.info/charts/mempool-count?timespan=all
mempool = pd.read_csv(
    'mempool-count.csv', 
    header=None, 
    names=['DateTime','size'], 
    parse_dates=[0], 
    infer_datetime_format=True
    )

# split the datetime to date and time
temp = pd.DatetimeIndex(mempool['DateTime'])
mempool['Date'] = temp.date
mempool['Time'] = temp.time
del mempool['DateTime']

# reorder the columns
cols = ['Date','Time','size']
mempool = mempool[cols]
mempool.set_index('Date', inplace=True)

In [4]:
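# the csv holds several readings per day (3 in this download); average them to one value per day.
# select only the numeric 'size' column, as recent pandas versions raise an error when averaging the 'Time' column
mempool = mempool.groupby(level='Date')[['size']].mean()
# store the dates as a DatetimeIndex so the series lines up with the Quandl data (also indexed by 'Date')
mempool.index = pd.DatetimeIndex(mempool.index, name='Date')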
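If the `DateTime` column is kept as a datetime index, pandas can compute the daily average directly with `resample`, without splitting the date and time into separate columns. A minimal sketch on a few made-up readings (the values are illustrative, not blockchain.info data):

```python
import pandas as pd

# illustrative readings: three per day, in the same two-column layout as mempool-count.csv
raw = pd.DataFrame({
    'DateTime': pd.to_datetime([
        '2017-05-17 00:00', '2017-05-17 08:00', '2017-05-17 16:00',
        '2017-05-18 00:00', '2017-05-18 08:00', '2017-05-18 16:00',
    ]),
    'size': [150000, 160000, 170000, 175000, 176000, 177000],
})

# index by timestamp, then average all readings that fall on the same calendar day
daily = raw.set_index('DateTime')['size'].resample('D').mean()
print(daily)
```

Here `daily` holds one value per calendar day (160000.0 and 176000.0), indexed by a `DatetimeIndex`.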

In [5]:
# mempool.loc[mempool['size'].idxmax()]

During April and May 2017 the number of transactions waiting to be confirmed on the Bitcoin blockchain increased to an all-time maximum of 175,978 on May 18th, compared with an average of fewer than 10,000 in 2016.  

The number of unconfirmed transactions fell as quickly as it rose, and by mid-July was generally below 10,000 again. 

The figure below shows the erratic changes in the number of unconfirmed transactions. **This post seeks to consider what may have caused it.**

In [6]:
mempool['size_av']=mempool['size'].rolling(window=7).mean();

series1 = go.Scatter( 
    x=mempool.index.get_level_values('Date'), 
    y=mempool['size'], 
    name='Daily average',
    line = dict(
        color = (color2),
        width = 2,))

series2 = go.Scatter( 
    x=mempool.index.get_level_values('Date'), 
    y=mempool['size_av'], 
    name='Moving weekly average',
    line = dict(
        color = (color1),
        width = 3,))

data = [series1, series2]

layout = go.Layout(
    title='Unconfirmed transactions',
    yaxis=dict(title='Number of transactions'),
    legend=dict(orientation="h", yanchor='top', y=1, xanchor='center', x=0.5)
    )

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='bitcoin_mempool')

## Median time taken for transactions to be accepted into a block (Minutes)

One would expect the time taken to confirm a transaction to increase roughly in proportion to the size of the pool of unconfirmed transactions. 
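Queueing theory makes this expectation concrete. Little's law states that, in a steady state, the average number of items waiting (L) equals the arrival rate (λ) times the average waiting time (W), so W = L / λ; in a steady state transactions leave the mempool as fast as they arrive. A sketch of the implied average wait, assuming a steady state and a hypothetical throughput figure:

```python
def expected_wait_minutes(mempool_size, tx_per_block, minutes_per_block=10.0):
    """Average wait implied by Little's law, W = L / throughput.

    Assumes a steady state in which transactions leave the mempool at the
    same rate they arrive (tx_per_block every minutes_per_block minutes).
    """
    throughput_per_minute = tx_per_block / minutes_per_block
    return mempool_size / throughput_per_minute

# hypothetical figures: 100,000 waiting transactions, 2,000 transactions per block
print(expected_wait_minutes(100_000, 2_000))  # 500.0
```

Little's law describes the average wait over all transactions; it says nothing about the order in which they are served. Miners do not work first-in-first-out, so high-fee transactions can jump the queue while low-fee ones wait much longer.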

In [7]:
# The Daily Median time taken for transactions to be accepted into a block, presumably in minutes
ATRCT = quandl.get("BCHAIN/ATRCT")
ATRCT = ATRCT.loc['2016-04-24':]

In [8]:
ATRCT['artct_av']=ATRCT['Value'].rolling(window=7).mean();

series1 = go.Scatter( 
    x=ATRCT.index,  # use this series' own dates so x and y line up
    y=ATRCT['Value'], 
    name='Daily median time to acceptance',
    line = dict(
        color = (color2),
        width = 2))

series2 = go.Scatter( 
    x=ATRCT.index,  # use this series' own dates so x and y line up
    y=ATRCT['artct_av'], 
    name='Moving weekly average',
    line = dict(
        color = (color1),
        width = 3))

data = [series1, series2]

layout = go.Layout(
    title='Daily median time taken for transactions to be accepted into a block',
    yaxis=dict(title='Time (minutes)'),
    legend=dict(orientation="h", yanchor='top', y=1.1, xanchor='center', x=0.5)
    )

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='btc_acceptance_time')

In [9]:
ATRCT['Value'].corr(mempool['size'], method='pearson', min_periods=None)

0.37360977521220956

The median confirmation time does not increase noticeably when the pool of unconfirmed transactions grows; the two series have only a weak Pearson correlation of 0.37. This is surprising, as intuition suggests that the larger the pool of transactions waiting to be confirmed, the longer it should take to confirm a transaction.

However, it is worth noting that we are looking at the median, which is largely insensitive to the outlier transactions that took a very long time to be accepted. Another possibility is that the average transaction size decreased as the number of unconfirmed transactions increased, but that does not appear to be the case.
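A small illustration of why the median hides slow outliers: in the made-up confirmation times below, one transaction waits ten hours, which drags the mean up but leaves the median almost unchanged.

```python
import statistics

# illustrative confirmation times in minutes; the last transaction is an outlier
times = [10, 12, 11, 9, 600]

print(statistics.median(times))  # 11
print(statistics.mean(times))    # 128.4
```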

The reason for the weak correlation therefore remains unclear. 

## Average block size (Daily average, MB) 

Until the SegWit upgrade activated in August 2017 and replaced the size limit with a block weight limit, each block in the Bitcoin network was limited to 1MB. As the network grew and transaction volume increased, the block size limit became a key constraint on the number of transactions the network could support. Was the increase in mempool size caused by blocks being "filled up" to their maximum 1MB size?
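A rough sketch of what a 1MB limit means for throughput. The average transaction size used here (500 bytes) is a hypothetical round number, not a measured value, and the 10-minute block interval is the protocol's target rather than a guarantee, so the result is only an approximation:

```python
BLOCK_LIMIT_BYTES = 1_000_000   # the pre-SegWit 1MB block size limit
BLOCKS_PER_DAY = 24 * 60 // 10  # one block every ~10 minutes on average -> 144

def daily_capacity(avg_tx_bytes):
    """Approximate maximum number of transactions per block and per day."""
    per_block = BLOCK_LIMIT_BYTES // avg_tx_bytes
    return per_block, per_block * BLOCKS_PER_DAY

per_block, per_day = daily_capacity(500)  # hypothetical 500-byte average transaction
print(per_block, per_day)  # 2000 288000
```

If transactions arrive faster than this for a sustained period, the surplus has nowhere to go but the mempool.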

In [10]:
# The Average block size in MB
AVBLS = quandl.get("BCHAIN/AVBLS")

In [11]:
av_bs = AVBLS.loc['2016-04-24':]
av_bs['Size']=av_bs['Value']
del av_bs['Value']

In [12]:
av_bs['Size_av']=av_bs['Size'].rolling(window=7).mean();

series1 = go.Scatter( 
    x=mempool.index.get_level_values('Date'), 
    y=av_bs['Size'], 
    name='Average block size',
    line = dict(
        color = (color2),
        width = 2))

series2 = go.Scatter( 
    x=av_bs.index,  # use this series' own dates so x and y line up
    y=av_bs['Size_av'], 
    name='Moving weekly average',
    line = dict(
        color = (color1),
        width = 3))

data = [series1, series2]

layout = go.Layout(
    title='Average block size',
    yaxis=dict(title='Block size (MB)'),
    legend=dict(orientation="h", yanchor='top', y=1.1, xanchor='center', x=0.5)
    )

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='btc_block_size')

From March through June, block sizes seem to have frequently hit their maximum, suggesting that the Bitcoin network was processing as much data as the block size limit allowed. 

The average block size began to fall sharply on July 2nd, and at the same time the median transaction confirmation time also began to drop quickly. By July 2nd the number of unconfirmed transactions had already fallen back to "normal" levels. 

Not all transactions are the same size: a transaction can have any number of inputs and outputs, and a transaction with many outputs takes up more data than one with a single output. Let's check whether the number of transactions per block changed over the same period:
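To illustrate how size grows with inputs and outputs, a commonly quoted rule of thumb for legacy (pre-SegWit, pay-to-public-key-hash) transactions is roughly 148 bytes per input, 34 bytes per output and about 10 bytes of overhead. The sketch below uses that approximation (real sizes vary with script type and signature length) to compare paying 100 recipients in one batched transaction against 100 separate transactions; the scenario is hypothetical:

```python
def approx_p2pkh_tx_bytes(n_inputs, n_outputs):
    """Rule-of-thumb size of a legacy P2PKH transaction, in bytes (approximate)."""
    return 148 * n_inputs + 34 * n_outputs + 10

# 100 payments, each as its own transaction with 1 input and 2 outputs (payment + change)
separate = 100 * approx_p2pkh_tx_bytes(1, 2)
# the same 100 payments batched into one transaction: 1 input, 100 payments + 1 change output
batched = approx_p2pkh_tx_bytes(1, 101)

print(separate, batched)  # 22600 3592
```

Under this approximation a batched transaction is far larger than any single payment, but much smaller than the same payments sent separately, so batching would show up as fewer, larger transactions per block.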

## Average number of transactions per (1MB) block

In [13]:
# The average number of transactions per block, one value per day
NTRBL = quandl.get("BCHAIN/NTRBL")
NTRBL = NTRBL.loc['2016-04-24':]

In [14]:
NTRBL['Size_av']=NTRBL['Value'].rolling(window=7).mean();

series1 = go.Scatter( 
    x=NTRBL.index,  # use this series' own dates so x and y line up
    y=NTRBL['Value'], 
    name='Average transactions per block',
    line = dict(
        color = (color2),
        width = 2))

series2 = go.Scatter( 
    x=NTRBL.index,  # use this series' own dates so x and y line up
    y=NTRBL['Size_av'], 
    name='Moving weekly average',
    line = dict(
        color = (color1),
        width = 3))

data = [series1, series2]

layout = go.Layout(
    title='Average number of transactions per block',
    yaxis=dict(title='Number of transactions'),
    legend=dict(orientation="h", yanchor='top', y=1.1, xanchor='center', x=0.5)
    )

fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='btc_tnsx_per_block')

The average number of transactions per block hit a peak at the end of May 2017, and fell sharply at the beginning of June and then again at the beginning of July. 

In June block sizes remained close to their maximum, but in July the average block size had fallen well below it. 

The difference in average block sizes at these two times suggests that in early June the number of transactions per block fell because the average transaction had become larger, whereas in July it fell because fewer transactions were being created.  
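The reasoning above can be made explicit: dividing the average block size by the average number of transactions per block gives the implied average transaction size. A minimal sketch, assuming two daily pandas Series in MB and in transactions per block (such as `av_bs['Size']` and `NTRBL['Value']`) and treating 1MB as 1,000,000 bytes; the toy values are made up:

```python
import pandas as pd

def implied_tx_bytes(block_size_mb, tx_per_block):
    """Average transaction size in bytes implied by block size / transaction count.

    Both arguments are pandas Series indexed by date; they are aligned on the index.
    Assumes 1MB = 1,000,000 bytes.
    """
    return block_size_mb * 1_000_000 / tx_per_block

# made-up values: near-full blocks on both days, but fewer transactions on the second
dates = pd.to_datetime(['2017-05-31', '2017-06-05'])
size_mb = pd.Series([0.98, 0.98], index=dates)
n_tx = pd.Series([2000, 1600], index=dates)

print(implied_tx_bytes(size_mb, n_tx))
```

With block size unchanged, the drop from 2,000 to 1,600 transactions per block raises the implied average size from about 490 to about 612 bytes.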

Perhaps bitcoin exchanges and other organisations with high transaction volumes had changed their behaviour and begun posting larger transactions with many outputs, instead of posting more transactions each with fewer outputs. 

## Total value (BTC) of transaction fees earned by miners per day

Transaction fees are paid by users sending bitcoin and collected by miners. Miners collect unconfirmed transactions, check that they are valid, and perform the proof-of-work required to build these transactions into a new block of the blockchain. 

To give miners an incentive to process and confirm new transactions, and to compensate them for the equipment and energy costs of doing so, a fee is attached to each transaction. Because block space is limited, the fee effectively has to be proportional to the size (in bytes) of the transaction; otherwise miners would prefer smaller transactions, as they could fit more of them into each block and earn more from mining it. 

A user making a transaction can offer to pay a higher or lower transaction fee depending on how quickly they want the transaction processed/confirmed. This is because miners (who collect unconfirmed transactions to process into the next block) will choose transactions with higher fees over those with lower fees. 
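A simplified sketch of that selection: sort the waiting transactions by fee rate (fee per byte) and fill the block until it is full. This greedy approach is written for illustration only; real mining software such as Bitcoin Core also has to handle transactions that depend on one another, which this ignores. The transactions and the tiny block size are made up:

```python
from typing import List, NamedTuple

class Tx(NamedTuple):
    txid: str
    size_bytes: int
    fee_satoshi: int

def select_for_block(mempool: List[Tx], max_block_bytes: int = 1_000_000) -> List[Tx]:
    """Greedily pick the highest fee-per-byte transactions that fit in one block."""
    by_fee_rate = sorted(mempool, key=lambda tx: tx.fee_satoshi / tx.size_bytes, reverse=True)
    chosen, used = [], 0
    for tx in by_fee_rate:
        if used + tx.size_bytes <= max_block_bytes:
            chosen.append(tx)
            used += tx.size_bytes
    return chosen

# made-up mempool and a tiny 1,000-byte "block" so the competition is visible
waiting = [
    Tx('a', 400, 8_000),   # 20 sat/byte
    Tx('b', 250, 10_000),  # 40 sat/byte
    Tx('c', 500, 5_000),   # 10 sat/byte
    Tx('d', 300, 9_000),   # 30 sat/byte
]
print([tx.txid for tx in select_for_block(waiting, max_block_bytes=1_000)])  # ['b', 'd', 'a']
```

Transaction `c`, with the lowest fee rate, stays in the mempool; as the backlog grows, a sender has to outbid more waiting transactions to get into the next block.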

Therefore it is expected that as the backlog of unprocessed transactions increases, the fees paid to ensure a transaction gets processed will also increase. This is shown in the figure below:


In [15]:
# transaction fees - the total BTC value of transaction fees miners earn per day.
TRFEE = quandl.get("BCHAIN/TRFEE")

In [16]:
tn_fee = TRFEE.loc['2016-04-24':]
tn_fee['Fee']=tn_fee['Value']
del tn_fee['Value']

In [17]:
# the daily fee totals are plotted as-is (a rolling window of 1 would leave them unchanged)
trace1 = go.Scatter(
    x=mempool.index.get_level_values('Date'),
    y=mempool['size'],
    name='Unconfirmed transactions')

trace2 = go.Scatter(
    x=tn_fee.index.get_level_values('Date'),
    y=tn_fee['Fee'],
    name='Total transaction fees (BTC)',
    yaxis='y2')  # draw this series against the right-hand y axis

data2 = [trace1, trace2]

layout = go.Layout(
    title='Unconfirmed transactions and daily transaction fees',
    yaxis=dict(title='Number of transactions'),
    yaxis2=dict(
        title='Fees per day (BTC)',
        titlefont=dict(color='rgb(148, 103, 189)'),
        tickfont=dict(color='rgb(148, 103, 189)'),
        overlaying='y',  # share the plot area and x axis with the left-hand y axis
        side='right'
    )
)
fig = go.Figure(data=data2, layout=layout)
py.iplot(fig, filename='btc_mempool_fees')


In [18]:
tseries = [mempool['size'],ATRCT['Value'],av_bs['Size'],NTRBL['Value'],tn_fee['Fee']]
cols = ['unconf_tnsx','conf_time','block_size','tnsx_per_block','tn_fee']
tbl = np.zeros((len(tseries), len(tseries))) 

for i in range(len(tseries)):
    for j in range(len(tseries)):
        tbl[i,j] = tseries[i].corr(tseries[j], method='pearson', min_periods=None)

pd.DataFrame(data=tbl,      # pairwise Pearson correlation coefficients
             index=cols,    # row labels
             columns=cols)  # column labels



Unnamed: 0,unconf_tnsx,conf_time,block_size,tnsx_per_block,tn_fee
unconf_tnsx,1.0,0.37361,0.434051,0.485007,0.659499
conf_time,0.37361,1.0,0.642878,0.623867,0.485486
block_size,0.434051,0.642878,1.0,0.83488,0.624348
tnsx_per_block,0.485007,0.623867,0.83488,1.0,0.494258
tn_fee,0.659499,0.485486,0.624348,0.494258,1.0


The figure and the correlation table show a fairly strong correlation (0.66) between transaction fees and the size of the mempool, as expected. We can see that changes in the size of fees on the Bitcoin network lag behind changes in the mempool size by X days on average. 
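One simple way to estimate such a lag, sketched here as an assumption rather than the method behind this post, is to shift the fee series back by k days for a range of k and keep the shift with the highest Pearson correlation. The synthetic series below is constructed so that "fees" trail the "mempool" by exactly 3 days:

```python
import numpy as np
import pandas as pd

def best_lag(leader: pd.Series, follower: pd.Series, max_lag: int = 14) -> int:
    """Return the shift k (in days) that maximises corr(leader[t], follower[t + k])."""
    corrs = {k: leader.corr(follower.shift(-k)) for k in range(max_lag + 1)}
    return max(corrs, key=corrs.get)

# synthetic daily data: the "fee" series copies the "mempool" series 3 days later
dates = pd.date_range('2017-03-01', periods=120, freq='D')
rng = np.random.default_rng(0)
mempool_size = pd.Series(rng.normal(size=120).cumsum(), index=dates)
fees = mempool_size.shift(3)

print(best_lag(mempool_size, fees))  # 3
```

With the real data the equivalent call would be `best_lag(mempool['size'], tn_fee['Fee'])`, provided both series share a `DatetimeIndex` so that `Series.corr` can align them by date.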

This quantifies some of the effects of changes in the mempool size, but does not explain what caused the mempool to grow.

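The **number of transactions per day** should be roughly proportional to the **number of transactions per block**, as the number of blocks per day is approximately constant (the difficulty adjustment targets one block every 10 minutes, about 144 per day), although it varies from day to day.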
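This assumption is easy to check: dividing transactions per day by transactions per block gives the implied number of blocks mined per day, which should stay close to the 10-minute target. A sketch with made-up daily values:

```python
import pandas as pd

def implied_blocks_per_day(tx_per_day: pd.Series, tx_per_block: pd.Series) -> pd.Series:
    """Blocks mined per day implied by the two daily averages (aligned on date)."""
    return tx_per_day / tx_per_block

dates = pd.date_range('2017-05-01', periods=3, freq='D')
tx_per_day = pd.Series([288_000, 302_400, 273_600], index=dates)  # made-up values
tx_per_block = pd.Series([2_000, 2_000, 2_000], index=dates)      # made-up values

print(implied_blocks_per_day(tx_per_day, tx_per_block).tolist())  # [144.0, 151.2, 136.8]
```

Day-to-day values scatter around 144 because block discovery is random and the difficulty is only adjusted every 2016 blocks, so the proportionality holds on average rather than exactly.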

The **number of transactions per block** rose steadily until May 2017, when it suddenly fell, suggesting that the **average transaction size** increased suddenly. The falling **number of unique BTC addresses** and **number of transactions per day** appear to show the same effect, for the same reasons. 

A possible proxy for **exchange activity** (e.g. bitcoin being sent to exchanges): **number of transactions per day** minus **number of transactions per day excluding popular addresses (exchanges)**.
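A sketch of that proxy, taking all transactions minus those excluding popular addresses so that the result is the (non-negative) number of transactions involving popular addresses, together with its share of the total. It assumes both daily counts are available as pandas Series indexed by date; the values below are made up:

```python
import pandas as pd

def popular_address_activity(n_tx: pd.Series, n_tx_excl_popular: pd.Series) -> pd.DataFrame:
    """Transactions involving popular addresses, as a count and as a share of all transactions."""
    popular = n_tx - n_tx_excl_popular
    return pd.DataFrame({'popular_tx': popular, 'popular_share': popular / n_tx})

dates = pd.date_range('2017-05-01', periods=2, freq='D')
n_tx = pd.Series([300_000, 320_000], index=dates)       # made-up daily totals
n_tx_excl = pd.Series([240_000, 240_000], index=dates)  # made-up "excluding popular" counts

print(popular_address_activity(n_tx, n_tx_excl))
```

Whether "popular addresses" really correspond to exchanges is an open assumption; the share column makes the proxy comparable across days with different overall volume.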


Are **outputs to other addresses** the same as **all transaction outputs**? Probably not, but this needs checking.

**Estimated transaction volume** (which ignores the sender's “change”) and **total output volume** are not correlated. Why?

How cheap is the Bitcoin network for a user?
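A first answer is the average fee per transaction: total fees paid per day divided by the number of transactions that day. A sketch assuming daily pandas Series for total fees in BTC (such as `tn_fee['Fee']`), the transaction count, and a BTC/USD price; the numbers are made up:

```python
import pandas as pd

def fee_per_transaction(total_fees_btc: pd.Series, n_tx: pd.Series, btc_usd: pd.Series) -> pd.DataFrame:
    """Average fee paid per transaction, in BTC and converted to USD."""
    fee_btc = total_fees_btc / n_tx
    return pd.DataFrame({'fee_btc': fee_btc, 'fee_usd': fee_btc * btc_usd})

dates = pd.date_range('2017-05-18', periods=1, freq='D')
result = fee_per_transaction(
    pd.Series([600.0], index=dates),    # made-up total fees (BTC)
    pd.Series([300_000], index=dates),  # made-up transaction count
    pd.Series([2_000.0], index=dates),  # made-up BTC/USD price
)
print(result)
```

This is an average: the fee an individual user needs at a given moment depends on the fee rates offered by the competing transactions in the mempool.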