In [None]:
import pandas as pd
import plotly.express as px
import polars as pl
import numpy as np

pd.options.plotting.backend = "plotly"
pd.set_option('display.precision', 17)

df = pd.read_csv('block-times-oct-27-2023.zip', index_col=0)

We want to see how much node timestamps on block signatures have varied historically.

In [None]:
min_time = df.min(axis=1)
max_time = df.max(axis=1)
delta = max_time - min_time
fig = px.line(delta)
fig.update_layout(
    xaxis_title='Block',
    yaxis_title='Timestamp delta(s)',
    showlegend=False,
)

fig.show()


While there was a significant delta of 400s in one of the initial blocks, it wasn't repeated, so it is probably safe to
ignore for now. There appears to be a somewhat consistently larger delta between roughly block 800K and block 1.2M,
though the graph is so dense that this impression may be misleading.

Let's see if we can zoom in on that 800K to 1.2M block range.

In [None]:
slow_area = px.line(delta[800_000:1_200_000])
slow_area.update_layout(
    xaxis_title='Block',
    yaxis_title='Timestamp delta(s)',
    showlegend=False,
)

slow_area.show()


This still isn't clear enough, so we'll zoom in further on the point where the delta peaks above 200s.

First we'll print the date range to know when this extra variance occurred.

In [None]:
print(df.iloc[800_000].min().astype(dtype="datetime64[s]"))
print(df.iloc[1_200_000].min().astype(dtype="datetime64[s]"))
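
As a side note, the same conversion can be written with `pd.to_datetime`, which makes the unit and time zone explicit. This is only a sketch: it assumes the values in `df` are Unix epoch seconds (as the `datetime64[s]` cast suggests), and `row` with its node names is a made-up stand-in for one row of `df`.

```python
import pandas as pd

# Hypothetical stand-in for one row of `df`: epoch-second timestamps per node.
row = pd.Series({"node_a": 1_650_000_000.0, "node_b": 1_650_000_004.0})

# Earliest timestamp in the row, converted from epoch seconds to a UTC datetime.
earliest = pd.to_datetime(row.min(), unit="s", utc=True)
print(earliest)
```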

In [None]:
zoomed_slow_area = px.line(delta[979_200:979_400])
zoomed_slow_area.update_layout(
    xaxis_title='Block',
    yaxis_title='Timestamp delta(s)',
    showlegend=False,
)

zoomed_slow_area.show()


Zooming in to a range of 200 blocks gives us a better idea. It looks like there was an issue where ~30 consecutive
blocks had a noticeably larger timestamp delta.
If we look at the timestamps per node, perhaps we can better understand what's going on.

In [None]:
node_times = px.line(df[979270:979300].astype(dtype="datetime64[s]"))

node_times.update_layout(
    xaxis_title='Block',
    yaxis_title='Time (UTC)',
    legend=dict(
        orientation="h",
        yanchor="top",
        y=4,
        xanchor="center",
        x=0.5
    )
)

node_times.show()

From that output it appears that, for whatever reason, the IdeasBeyondBorders node was delayed noticeably more
than the other nodes.
Hint: you can mouse over the graph to see the exact values and node name.

Let's spot check one other hotspot.

In [None]:
zoomed_slow_tail = px.line(delta[1_168_900:1_169_000])
zoomed_slow_tail.update_layout(
    xaxis_title='Block',
    yaxis_title='Timestamp delta(s)',
    showlegend=False,
)

zoomed_slow_tail.show()


In [None]:
node_times_2 = px.line(df[1_168_940:1_168_950].astype(dtype="datetime64[s]"))

node_times_2.update_layout(
    xaxis_title='Block',
    yaxis_title='Time (UTC)',
    legend=dict(
        orientation="h",
        yanchor="top",
        y=4,
        xanchor="center",
        x=0.5
    )
)

node_times_2.show()

It looks like the LongNowFoundation node was delayed in this instance. Just
based on these two examples it seems that we can't claim one node is delayed 
more than the others.

What we can do is plot the delta per node based on the median or mode of the timestamps.

In [None]:
# Switched to polars here because pandas was slow to do the median and deviation
# I need to learn how to plot with polars
median = df.median(axis=1)
deviation = pl.from_pandas(df) - pl.from_pandas(median)
pandas_deviation = deviation.to_pandas()
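
For comparison, the same deviation-from-median table can be sketched in pandas alone with `DataFrame.sub`. The `toy` frame and its node names below are made up for illustration, and whether this is fast enough on the full data set is not something measured here.

```python
import pandas as pd

# Hypothetical toy frame: rows are blocks, columns are nodes, values are epoch seconds.
toy = pd.DataFrame(
    {"node_a": [100.0, 200.0], "node_b": [101.0, 200.0], "node_c": [105.0, 203.0]}
)

# Row-wise median, then subtract it from every column (axis=0 aligns on the row index).
toy_median = toy.median(axis=1)
toy_deviation = toy.sub(toy_median, axis=0)
print(toy_deviation)
```

Unlike the round trip through polars, this keeps the original block index on the result.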

In [None]:
deviations = px.line(pandas_deviation[800_000:1_200_000])
deviations.update_layout(
    xaxis_title='Block',
    yaxis_title='Timestamp delta(s)',
    legend=dict(
        orientation="h",
        yanchor="top",
        y=4,
        xanchor="center",
        x=0.5
    ),
)

deviations.show()


It can be a bit hard to tell the nodes apart in this plot. Since there are only 10 nodes we can plot them all individually.

In [None]:
per_node = px.line(pandas_deviation[800_000:1_200_000], facet_col="variable")
per_node.for_each_annotation(lambda a: a.update(text=''))
per_node.update_layout(
    yaxis_title='Timestamp delta(s)',
    legend=dict(
        orientation="h",
        yanchor="top",
        y=4,
        xanchor="center",
        x=0.5
    ),
)
per_node.show()

It's a little dense, but it appears that nodes for BlockDaemon, Ideas
Beyond Borders, and The Long Now Foundation were all facing delays during that time. The other nodes were fairly close
to each other. 

Another interesting thing we can look at is the distribution of the deltas.

In [None]:
histogram = px.histogram(delta)
histogram.update_layout(
    yaxis_title='Number of Blocks',
    xaxis_title='Timestamp delta(s)',
    showlegend=False,
)
histogram.show()

This is a bit dense, but one can see that a significant number of blocks have a timestamp delta smaller than 10 seconds.

Perhaps this data is better shown as quantiles.

In [None]:
quantiles = delta.quantile([0.99, 0.95, 0.9, 0.75, 0.50, 0.25])
quantiles

Looking here we can see that for 99% of blocks the node timestamps are within 17 seconds of each other, for 95% they are within 7 seconds, etc.
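
To make the reading of a quantile concrete, here is a small sketch on made-up numbers: the 0.9 quantile is the delta that 90% of blocks stay at or below, and the reverse question (what share of blocks stays under a fixed threshold) is the mean of a boolean mask. `toy_delta` is hypothetical and stands in for `delta`.

```python
import pandas as pd

# Hypothetical per-block deltas in seconds, standing in for `delta`.
toy_delta = pd.Series([1, 2, 2, 3, 3, 3, 4, 5, 9, 40], dtype=float)

# The delta that 90% of blocks do not exceed (pandas interpolates linearly by default).
q90 = toy_delta.quantile(0.9)

# The reverse view: the share of blocks at or below a chosen threshold.
share_within_q90 = (toy_delta <= q90).mean()
share_under_10s = (toy_delta < 10).mean()
print(q90, share_within_q90, share_under_10s)
```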

However, this data keeps the most delayed node in the set. How do these values look if we remove the most delayed
node? To do this we'll remove outliers using each node's mean absolute deviation from the per-block median (called
MAD here; note that the code takes the mean, not the median, of the absolute deviations). This mimics a node not
consenting on the block because the time is out of the acceptable range. Because of this, we'll also want to see how
many blocks would now be rejected.

In [None]:
consensed_nodes = df.count(axis=1)
(consensed_nodes < 8).sum()

The first 10000 or so blocks only had 7 nodes. We need to take these into account when we count how many blocks might be rejected for failing to reach consensus because the timestamp delta is too great.

In [None]:
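df_copy = df.copy()
# Per-node mean absolute deviation from the per-block median.
mad = np.abs(pandas_deviation).mean()
# Blank out any timestamp that deviates by 3x that node's MAD or more.
df_copy[np.abs(pandas_deviation) >= 3 * mad] = np.nan
time_consensed_nodes = df_copy.count(axis=1)
# Blocks left with fewer than 8 timestamps after outlier removal.
print((time_consensed_nodes < 8).sum())
# Fraction of all blocks newly below 8, discounting those that already were.
print(((time_consensed_nodes < 8).sum() - (consensed_nodes < 8).sum()) / len(time_consensed_nodes))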

It looks like throwing out the outliers results in about 80,000 blocks that would have been rejected. After discounting
the blocks that already had fewer than 8 nodes, this works out to roughly a 4% failure rate. Let's plot the new time
delta and see what it looks like.

In [None]:
new_max = df_copy.max(axis=1)
new_min = df_copy.min(axis=1)
new_delta = new_max - new_min
new_fig = px.line(new_delta)
new_fig.update_layout(
    xaxis_title='Block',
    yaxis_title='Timestamp delta(s)',
    showlegend=False,
)

new_fig.show()


We can see now that the nodes are all within 3 seconds of each other.

The logic to determine which outliers to remove used the magic number `3`:
```python
df_copy[np.abs(pandas_deviation) >= 3 * mad] = np.nan
```
We can increase this value to increase the allowed time delta, to get a better idea of how many blocks would be rejected in each instance.
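
One way to explore this is to sweep the multiplier and count rejected blocks for each value. The sketch below runs on synthetic data, not the real block times: `rejected_blocks`, `quorum`, and `toy_times` are hypothetical names, the threshold of 8 is taken from the cells above, and the function counts every block below the threshold without discounting blocks that started with fewer nodes.

```python
import numpy as np
import pandas as pd


def rejected_blocks(times: pd.DataFrame, multiplier: float, quorum: int = 8) -> int:
    """Count blocks with fewer than `quorum` timestamps left after outlier removal."""
    deviation = times.sub(times.median(axis=1), axis=0)
    mad = deviation.abs().mean()  # per-node mean absolute deviation from the median
    kept = times.mask(deviation.abs() >= multiplier * mad)
    return int((kept.count(axis=1) < quorum).sum())


# Synthetic stand-in for `df`: 1,000 blocks, 10 nodes, made-up 5 s block spacing.
rng = np.random.default_rng(0)
base = np.arange(1_000, dtype=float)[:, None] * 5.0
noise = rng.exponential(scale=1.0, size=(1_000, 10))
toy_times = pd.DataFrame(base + noise, columns=[f"node_{i}" for i in range(10)])

# A larger multiplier tolerates larger deviations, so it can only reject fewer blocks.
counts = {m: rejected_blocks(toy_times, m) for m in (2, 3, 5, 10)}
print(counts)
```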

Another curiosity is whether node timestamps have always increased from block to block, or whether some have stayed
the same or gone backward. Fog reports the minimum timestamp, so we can diff all of those.

In [None]:
subsequent_timestamp_differences = min_time.diff()
print(subsequent_timestamp_differences.min())
print(subsequent_timestamp_differences[subsequent_timestamp_differences <= 0].count())
subsequent_fig = px.line(subsequent_timestamp_differences)
subsequent_fig.update_layout(
    xaxis_title='Block',
    yaxis_title='Timestamp delta(s)',
    showlegend=False,
)

subsequent_fig.show()


It looks like we've never gone backwards in time, but we've had some blocks that occurred in the same second.
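
The two cases can also be counted separately. The sketch below uses a made-up series in place of `min_time`; `Series.is_monotonic_increasing` is non-strict, so repeated values still count as increasing.

```python
import pandas as pd

# Hypothetical minimum timestamps for four consecutive blocks.
toy_min_time = pd.Series([100, 105, 105, 111])

steps = toy_min_time.diff()  # first entry is NaN because it has no predecessor
backward = int((steps < 0).sum())   # blocks whose timestamp went back in time
repeated = int((steps == 0).sum())  # blocks sharing a second with the previous block
never_backward = bool(toy_min_time.is_monotonic_increasing)
print(backward, repeated, never_backward)
```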