# ANOVA on network revenue


In this notebook we present a simple ANalysis Of VAriance (ANOVA) of network revenue, as impacted by (i) hyperdrive ([FIP-0013](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0013.md)) and (ii) the subsequent upgrade in [FIP-0024](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0024.md). In short, we perform a simple statistical test to check whether FIP-0013 or FIP-0024 had any effect on network revenue.


As can be seen in the [lotus releases](https://github.com/filecoin-project/lotus/releases), FIP-0013 was introduced in Lotus v1.10.0, released on June 23rd, 2021, while FIP-0024 was introduced in Lotus v1.12.0, released on October 12th, 2021.


We begin by loading the required packages and a dataset of daily network revenue. 

In [7]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

df=pd.read_csv('gasDaily.csv')
# cutoff days: release dates of FIP-0013 (hyperdrive) and FIP-0024;
# the strict comparisons below leave the cutoff days themselves out
cutoff_13='2021-06-23'
cutoff_24='2021-10-12'
# 'stat_date' is compared as a string, which assumes ISO (YYYY-MM-DD) dates
df_pre=df[df['stat_date']<cutoff_13].copy()
df_post=df[df['stat_date']>cutoff_13].copy()

df_post_13=df[(df['stat_date']>cutoff_13)&(df['stat_date']<cutoff_24)].copy()
df_post_24=df[df['stat_date']>cutoff_24].copy()

# label each period and rebuild df from the three labelled pieces
df_pre['time']='pre-hyperdrive'
df_post_13['time']='post FIP-0013'
df_post_24['time']='post FIP-0024'
df=pd.concat([df_pre,df_post_13,df_post_24])
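
The same three-way split can also be written as a single labelling step. The sketch below is a minimal, hypothetical illustration on a handful of made-up dates (not `gasDaily.csv`). It assumes `stat_date` holds ISO-formatted date strings and parses them with `pd.to_datetime`, so the comparisons are made on real dates rather than on strings. The `'cutoff day'` label is an assumption added here to make visible that the strict inequalities leave both release days out of every group.

```python
import numpy as np
import pandas as pd

# Made-up dates for illustration only; the notebook reads them from gasDaily.csv.
toy = pd.DataFrame({'stat_date': ['2021-06-01', '2021-06-23', '2021-07-15',
                                  '2021-10-12', '2021-11-01']})
dates = pd.to_datetime(toy['stat_date'])
cutoff_13 = pd.Timestamp('2021-06-23')
cutoff_24 = pd.Timestamp('2021-10-12')

# One boolean condition per period, mirroring the strict inequalities above.
conditions = [dates < cutoff_13,
              (dates > cutoff_13) & (dates < cutoff_24),
              dates > cutoff_24]
labels = ['pre-hyperdrive', 'post FIP-0013', 'post FIP-0024']

# Rows matching no condition (the two cutoff days) get the default label.
toy['time'] = np.select(conditions, labels, default='cutoff day')
print(toy)
```

Whether the release days should be dropped, or assigned to one of the neighbouring periods, is a modelling choice; the notebook drops them.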

In [8]:
r=df_post['protocol_revenue'].mean()/df_pre['protocol_revenue'].mean()
ri=round(100*(1-r),2)
print('We can see that there was a decrease of {} %'.format(ri))

We can see that there was a decrease of 87.95 %


In [9]:
r1=df_pre['protocol_revenue'].std()/df_pre['protocol_revenue'].mean()
r2=df_post['protocol_revenue'].std()/df_post['protocol_revenue'].mean()



print('ratio std/mean for pre-hyperdrive {:.2f}'.format(r1))
print('ratio std/mean for post-hyperdrive {:.2f}'.format(r2))





ratio std/mean for pre-hyperdrive 0.70
ratio std/mean for post-hyperdrive 0.90


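The ratio std/mean (the coefficient of variation) rose from 0.70 to 0.90. This means that, relative to its mean, day-to-day protocol revenue became more dispersed after hyperdrive, even though its absolute variance fell sharply (see the per-period variances in the next section).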
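
The ratio std/mean is the coefficient of variation, a relative measure of dispersion, and it can move in the opposite direction to the absolute variance when the mean drops. The sketch below illustrates this on two small, made-up samples (the numbers are hypothetical and are not taken from `gasDaily.csv`); `coefficient_of_variation` is a helper introduced here for illustration only.

```python
import numpy as np

def coefficient_of_variation(x):
    """Sample standard deviation (ddof=1, as in pandas) divided by the mean."""
    x = np.asarray(x, dtype=float)
    return x.std(ddof=1) / x.mean()

# Hypothetical daily revenues: the "post" sample has a much lower mean.
pre = np.array([100.0, 150.0, 50.0, 120.0, 80.0])
post = np.array([10.0, 25.0, 2.0, 12.0, 1.0])

var_pre, var_post = pre.var(ddof=1), post.var(ddof=1)
cv_pre, cv_post = coefficient_of_variation(pre), coefficient_of_variation(post)

print('variance  pre: {:.1f}  post: {:.1f}'.format(var_pre, var_post))
print('std/mean  pre: {:.2f}  post: {:.2f}'.format(cv_pre, cv_post))
```

In this toy example the variance falls while std/mean rises, which is the same qualitative pattern as in the notebook's data.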

# Visualisation


Let's visualise the data as a box-and-whiskers plot and a histogram, and compute the variance of each period:

In [10]:
import plotly.express as px
fig = px.box(df, x="time", y="protocol_revenue",color='time')
fig.show()
fig = px.histogram(df,  x="protocol_revenue",color='time')
fig.show()

df.groupby(by='time')['protocol_revenue'].var()


time
post FIP-0013     4.007571e+07
post FIP-0024     1.190309e+08
pre-hyperdrive    4.190421e+09
Name: protocol_revenue, dtype: float64

As we can see, there is a clear difference between the revenue in the pre-hyperdrive days and the revenue afterwards. This is confirmed by the following Kruskal-Wallis test, a rank-based, non-parametric analogue of one-way ANOVA. As a reminder, such a test compares the following two hypotheses:


$H_0$: all samples come from the same distribution.

$H_a$: Not $H_0$.
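
To make the test less of a black box, here is a minimal sketch of how the statistic can be computed by hand. It pools all observations, ranks them, and evaluates $H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1)$, where $R_i$ is the rank sum of group $i$, $n_i$ its size and $N$ the total sample size; the p-value comes from a $\chi^2$ distribution with $k-1$ degrees of freedom. The helper `kruskal_h` and the synthetic log-normal samples are assumptions made for illustration. The helper omits the tie correction that `scipy.stats.kruskal` applies, so the two agree only when the data have no ties.

```python
import numpy as np
from scipy import stats

def kruskal_h(*groups):
    """Kruskal-Wallis H statistic and p-value, without tie correction."""
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    # Rank the pooled observations, then split the ranks back per group.
    ranks = stats.rankdata(np.concatenate(groups))
    rank_groups = np.split(ranks, np.cumsum(sizes)[:-1])
    h = (12.0 / (n_total * (n_total + 1))
         * sum(r.sum() ** 2 / len(r) for r in rank_groups)
         - 3 * (n_total + 1))
    p = stats.chi2.sf(h, df=len(groups) - 1)
    return h, p

# Synthetic samples (not the notebook's data): one group with a higher level.
rng = np.random.default_rng(0)
a = rng.lognormal(mean=3.0, sigma=0.5, size=40)
b = rng.lognormal(mean=1.0, sigma=0.5, size=30)
c = rng.lognormal(mean=1.0, sigma=0.5, size=30)

h, p = kruskal_h(a, b, c)
ref = stats.kruskal(a, b, c)
print(h, p)
print(ref.statistic, ref.pvalue)
```

Because the test only uses ranks, it does not require the revenue to be normally distributed, which suits heavily skewed data.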

In [12]:
from scipy import stats
stats.kruskal(df_pre['protocol_revenue'], df_post_13['protocol_revenue'], df_post_24['protocol_revenue'])



KruskalResult(statistic=293.2963573040406, pvalue=2.0488267240975284e-64)

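The p-value is essentially zero, so we reject $H_0$: revenue is not distributed identically across the three periods. Performing the same test on the two post-hyperdrive periods only (after FIP-0013 but before FIP-0024, and after FIP-0024) yields the result below.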

In [13]:
from scipy import stats
stats.kruskal(df_post_13['protocol_revenue'], df_post_24['protocol_revenue'])


KruskalResult(statistic=1.4204887234095622, pvalue=0.23332297835940247)

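Given that the p-value of the test is fairly large (about 0.23), we cannot reject the null hypothesis in favor of the alternative at any confidence level above roughly 77%, and in particular not at the usual 90%, 95% or 99% levels. In other words, the data show no statistically significant evidence that FIP-0024 changed network revenue.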
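
The arithmetic behind the 77% figure can be sketched as follows. The p-value is copied from the output above; the list of confidence levels is an illustrative choice, not something the notebook prescribes.

```python
p_value = 0.23332297835940247  # Kruskal-Wallis p-value reported above

# H0 is rejected at confidence level c only if p_value < 1 - c.
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    print('{:.0%} confidence: reject H0 = {}'.format(confidence, p_value < alpha))

# Highest confidence level at which H0 would still be rejected.
max_confidence = 1 - p_value
print('H0 can only be rejected below {:.1%} confidence'.format(max_confidence))
```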