## Fetch Historical Blocks

Loop over each day since the Bitcoin genesis block and fetch the blocks mined on that day using the Blockchain Data API.

The URL format for the Blocks endpoint looks like the following:
* Blocks for one day: https://blockchain.info/blocks/$time_in_milliseconds?format=json
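As a minimal sketch, the per-day URLs can be generated by stepping a millisecond timestamp forward one day at a time. The helper name `day_urls` is hypothetical and not part of the API; only the URL pattern comes from above.

```python
from typing import Iterator

ONE_DAY_MS = 24 * 60 * 60 * 1000  # milliseconds in one day


def day_urls(start_ms: int, end_ms: int) -> Iterator[str]:
    """Yield one Blocks-endpoint URL per day in the range [start_ms, end_ms)."""
    for timestamp in range(start_ms, end_ms, ONE_DAY_MS):
        yield f"https://blockchain.info/blocks/{timestamp}?format=json"


# Example: the first two days starting at 1231300800000 (Jan 7, 2009 04:00 UTC)
for url in day_urls(1231300800000, 1231300800000 + 2 * ONE_DAY_MS):
    print(url)
```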

The data will be saved into a blocks.pkl pickle file so that it can be processed later.
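Because the loop below calls `pickle.dump` once per day on the same open file, the file ends up holding a sequence of separate pickle records rather than a single list. A minimal sketch of writing and reading back such a file, using an in-memory buffer in place of `./data/blocks.pkl` and made-up records; `append_records` and `read_records` are hypothetical helper names.

```python
import io
import pickle


def append_records(stream, batches):
    """Write each batch as its own pickle record on the same stream."""
    for batch in batches:
        pickle.dump(batch, stream, pickle.HIGHEST_PROTOCOL)


def read_records(stream):
    """Read pickle records until EOF and concatenate the lists they contain."""
    items = []
    while True:
        try:
            items += pickle.load(stream)
        except EOFError:
            break
    return items


# Made-up example data: two "days" of block records
buffer = io.BytesIO()
append_records(buffer, [[{"block_index": 0}], [{"block_index": 1}, {"block_index": 2}]])
buffer.seek(0)
print(read_records(buffer))
```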

In [13]:
import os
import time
import pickle
import requests

# Start at Jan 7, 2009 04:00 UTC (a few days after the genesis block), in milliseconds
start_time = 1231300800 * 1000

# Get the current timestamp in milliseconds
end_time = int(time.time() * 1000)

# One day in milliseconds
one_day = 24 * 60 * 60 * 1000

# Create the data dir if needed
os.makedirs("./data", exist_ok=True)

# Save our output so we can re-use it. Each day's results are appended as a
# separate pickle record, so the file is read back with repeated pickle.load calls
with open('./data/blocks.pkl', 'wb') as output:

    # Loop through each day fetching the blocks
    for timestamp in range(start_time, end_time, one_day):

        # Print progress
        print(timestamp)

        # We will try up to 5 times, after that, we give up for this date.
        # The missing blocks can be fetched later
        for tries in range(5):
            try:
                # Fetch the blocks mined on the current day
                # (the 30-second timeout is an arbitrary choice so a request cannot hang forever)
                response = requests.get(f"https://blockchain.info/blocks/{timestamp}?format=json", timeout=30)
            except requests.RequestException as error:
                print(f"Request failed: {error}")
                time.sleep(0.1)
                continue
            if response.status_code == 200:
                # Append this day's blocks to the pickle file
                pickle.dump(response.json(), output, pickle.HIGHEST_PROTOCOL)
                break
            # Wait 100ms and try again
            print(f"Failed the request with code: {response.status_code}")
            time.sleep(0.1)
    

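The loop above retries each day up to five times with a fixed 100 ms pause. A common alternative is exponential backoff, where the pause doubles after every failure. A minimal stdlib-only sketch, assuming the request is wrapped in a callable that returns the parsed result or raises on failure; `retry_with_backoff` is a hypothetical helper, not part of `requests`.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(fetch: Callable[[], T], attempts: int = 5,
                       base_delay: float = 0.1) -> Optional[T]:
    """Call fetch() up to `attempts` times, doubling the pause after each failure.

    Returns the first successful result, or None if every attempt raised.
    """
    delay = base_delay
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as error:  # in practice, catch narrower exceptions
            print(f"Attempt {attempt + 1} failed: {error}")
            if attempt < attempts - 1:
                time.sleep(delay)
                delay *= 2
    return None


# Usage with a fake fetch that fails twice before succeeding
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return ["block data"]


print(retry_with_backoff(flaky_fetch, base_delay=0.01))
```

With `requests`, the callable could wrap `requests.get(url, timeout=...)` followed by `response.raise_for_status()` and `response.json()`, so that both network errors and non-200 responses count as failures.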

## Fetch any missing blocks

Because the loop above may not have retrieved every block (for example, when all five requests for a day failed), let's check the data for gaps. If any blocks are missing, fetch them with the Block Height endpoint of the Blockchain Data API.

We'll update the blocks.pkl file with the missing blocks (if there are any) so that we have everything we need.

The URL format for the Block Height endpoint looks like the following:
* https://blockchain.info/block-height/$block_height?format=json
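The gap check amounts to a set difference between every index from 0 up to the highest one seen and the indexes actually present. A small sketch, assuming (as the code below does) that the indexes are contiguous block heights starting at 0; `missing_heights` and `block_height_url` are hypothetical helper names.

```python
from typing import List, Set


def missing_heights(seen: Set[int]) -> List[int]:
    """Return the sorted heights in 0..max(seen) that are absent from `seen`."""
    if not seen:
        return []
    return sorted(set(range(max(seen) + 1)) - seen)


def block_height_url(height: int) -> str:
    """Build the Block Height endpoint URL for one height."""
    return f"https://blockchain.info/block-height/{height}?format=json"


# Example with made-up data: heights 2 and 4 are missing
for height in missing_heights({0, 1, 3, 5}):
    print(block_height_url(height))
```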

In [34]:
# Read in the saved block data (the file may hold one pickled list per fetched day)
blocks = []
with open('./data/blocks.pkl', 'rb') as infile:
    try:
        while True:
            blocks += pickle.load(infile)
    except EOFError:
        pass

# Get the set of block indexes we already have
block_indexes = set(block['block_index'] for block in blocks)

# What is the highest block we've seen?
max_index = max(block_indexes)

# Get the full range of indexes so that we can determine which, if any, are missing.
# Note: this treats block_index as the block height, starting at 0 (genesis)
block_list = set(range(0, max_index + 1))

# Get the missing block indexes
missing_blocks = block_list - block_indexes

print(len(blocks))
print(len(missing_blocks))
print(missing_blocks)

blocks_to_add = []

# Run through the missing blocks
for block in missing_blocks:

    # We will try up to 5 times, after that, we give up for this block.
    # This cell can be re-run to try and get any missing blocks again
    for tries in range(5):
        try:
            # Fetch the block(s) at this height
            response = requests.get(f"https://blockchain.info/block-height/{block}?format=json", timeout=30)
        except requests.RequestException as error:
            print(f"Request failed: {error}")
            time.sleep(0.1)
            continue
        if response.status_code == 200:
            blocks_to_add += response.json()['blocks']
            break
        # Wait 100ms and try again
        print(f"Failed the request with code: {response.status_code}")
        time.sleep(0.1)

# Keep only the fields we need and append these missing blocks to our list
blocks += [
    {'hash': record['hash'], 'height': record['height'], 'time': record['time'], 'block_index': record['block_index']}
    for record in blocks_to_add
]

# Save the updated list as a single pickle record so we can re-use it
with open('./data/blocks.pkl', 'wb') as output:
    pickle.dump(blocks, output, pickle.HIGHEST_PROTOCOL)

747747
0
set()


## Prepare the data for use

Read in our full list of blocks, sort them by their index, and compute the time delta in minutes between consecutive blocks.
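A minimal sketch of the delta computation on made-up timestamps, pairing each block time (Unix seconds) with the next one via `zip`. Bitcoin block timestamps are not required to increase strictly from one block to the next, so some deltas in the real data can be negative. The helper name `minute_deltas` is hypothetical.

```python
from typing import List


def minute_deltas(times: List[int]) -> List[float]:
    """Minutes between each block time and the one before it (times in seconds)."""
    return [(later - earlier) / 60 for earlier, later in zip(times, times[1:])]


# Made-up timestamps: the third block is stamped earlier than the second
print(minute_deltas([1000, 1600, 1480, 8680]))  # [10.0, -2.0, 120.0]
```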

In [72]:
# Read in the saved block data
blocks = []
with open('./data/blocks.pkl', 'rb') as infile:
    try:
        while True:
            blocks += pickle.load(infile)
    except EOFError:
        pass

# Sort the list by block_index
sorted_blocks = sorted(blocks, key=lambda block: block['block_index'])

# Get the times (Unix timestamps in seconds) for each block
block_times = [block['time'] for block in sorted_blocks]

# Calculate the number of minutes between consecutive blocks.
# Block timestamps are not strictly increasing, so some deltas can be negative
time_diff = [(block_times[i] - block_times[i-1]) / 60 for i in range(1, len(block_times))]

## Time Delta Statistics

With this data we can estimate how often we would expect a block to arrive two hours or more after the previous block if the time between blocks followed a normal distribution.

We are then going to compare that with what actually happened.
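Under the normal model, the chance of a gap of at least 120 minutes is the upper tail 1 - Φ((120 - μ) / σ), which can also be written as erfc((120 - μ) / (σ√2)) / 2, where μ and σ are the mean and standard deviation of the deltas. Computing it with `math.erfc` avoids subtracting two nearly equal numbers. That appears to matter here: the count 375299968947541 printed below equals 2**53 / 24 rounded to an integer, which suggests `1 - cdf` came out as exactly 24 steps of the float spacing just below 1.0 (2**-53), so that figure is likely accurate only to within a few percent. A minimal sketch; `normal_tail` is a hypothetical helper, and `statistics.NormalDist` (Python 3.8+) is used only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_tail(threshold: float, mu: float, sigma: float) -> float:
    """P(X >= threshold) for X ~ Normal(mu, sigma), computed via erfc."""
    return 0.5 * math.erfc((threshold - mu) / (sigma * math.sqrt(2)))


# Moderate tail: both approaches agree
print(normal_tail(20.0, 10.0, 5.0))
print(1 - NormalDist(10.0, 5.0).cdf(20.0))

# Far in the tail, 1 - CDF can round to 0.0 while erfc keeps precision
print(normal_tail(9.0, 0.0, 1.0))
print(1 - NormalDist(0.0, 1.0).cdf(9.0))
```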

In [90]:
import math
from statistics import mean, stdev

# Determine the probability of a gap of 2 hours (120 minutes) or more if the deltas were normally distributed
avg = mean(time_diff)
cdf = (1 + math.erf((120 - avg) / stdev(time_diff) / math.sqrt(2))) / 2
prob = 1 - cdf

# Get our actual probability based on the historical data
actual = sum(1 for delta in time_diff if delta >= 120) / len(time_diff)

# avg / actual is the average number of minutes between long gaps;
# divide by 60 * 24 minutes per day to express it in days
days_between_long_blocks = round(avg / actual / (60 * 24))

print(f"If the time between blocks were normally distributed, we'd expect to see two hours or more between blocks one in every {int(round(1/prob))} blocks.")
print()
print(f"However, we've actually seen 2 hours or more between blocks on average one in every {int(round(1/actual))} blocks, or once every {days_between_long_blocks} days")

If the time between blocks were normally distributed, we'd expect to see two hours or more between blocks one in every 375299968947541 blocks.

However, we've actually seen 2 hours or more between blocks on average one in every 4919 blocks, or once every 33 days
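The conversion from "one long gap every N blocks" to days can be checked with a small sketch: the mean gap in minutes times the number of blocks per long gap gives the minutes between long gaps, and dividing by 60 * 24 converts that to days (dividing by 60 alone would give hours). The helper name and the numbers below are illustrative only.

```python
MINUTES_PER_DAY = 60 * 24


def days_between_events(mean_minutes_per_block: float, event_fraction: float) -> float:
    """Average days between events that occur in `event_fraction` of block intervals."""
    blocks_per_event = 1 / event_fraction
    return mean_minutes_per_block * blocks_per_event / MINUTES_PER_DAY


# Illustrative numbers: 10-minute blocks with one long gap every 144 blocks
print(days_between_events(10.0, 1 / 144))  # roughly 1 day
```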


## Determine the number of blocks with 2 hours or more between them

Filter the list of time deltas down to those of 2 hours or more.
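Beyond the count, it can help to see where the long gaps fall. A minimal sketch that returns the position and length of each delta at or above the threshold; because `time_diff[i]` is measured between sorted blocks `i` and `i + 1`, the returned position identifies the later block of each pair. `long_gaps` is a hypothetical helper, shown on made-up deltas.

```python
from typing import List, Tuple


def long_gaps(deltas: List[float], threshold: float = 120.0) -> List[Tuple[int, float]]:
    """Return (later_block_position, minutes) for each delta >= threshold."""
    return [(i + 1, minutes) for i, minutes in enumerate(deltas) if minutes >= threshold]


# Made-up deltas in minutes
sample = [9.5, 130.0, 12.0, 120.0, 3.0]
gaps = long_gaps(sample)
print(len(gaps), gaps)  # 2 [(2, 130.0), (4, 120.0)]
```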

In [87]:
# Count how many times consecutive blocks were separated by 2 hours or more
long_block_deltas = sum(1 for delta in time_diff if delta >= 120)

print(f"There were {long_block_deltas} times where consecutive blocks were separated by 2 hours or more.")

There were 152 times where consecutive blocks were separated by 2 hours or more.
