*Import the data and the packages*

In [5]:

import numpy as np
import pandas as pd
from scipy.stats import binomtest
from data_personal_example import transaction_data_file, block_data_file,  \
large_pre_gas_prices_file, suite_spot_txn, suite_spot_blx


#Read in the data; adjust the file paths for your own machine as needed
transaction_data=pd.read_csv(transaction_data_file)
block_data=pd.read_csv(block_data_file)
#read in the data and convert it to a list for better calculation speed
large_pre_gas_prices=list(pd.read_csv(large_pre_gas_prices_file).gas_price)

suite_txn=pd.read_csv(suite_spot_txn)
suite_blx=pd.read_csv(suite_spot_blx)

*Adding the gas limits into the dataframe*

First we look up the gas limit and base fee of each transaction's block, then add the base fee to the transaction dataframe as a new column.

In [7]:
#Get the block number and gas limits and base fee from the dataset
my_block_number=list(block_data.block_number)
my_gas_limit=list(block_data.gas_limit)
my_base_fee=list(block_data.base_fee_per_gas)
#initialize a dictionary to assign gas limits and base fee to the transaction data
gas_limit_tracker={}
base_fee_tracker={}


#makes a dictionary with the key being the block number and the 
#value being the gas limit (or base fee), because this will allow us to 
#easily assign a gas limit and base fee to each transaction going forward
for i in range(len(my_block_number)):
    gas_limit_tracker[my_block_number[i]]=my_gas_limit[i]
    base_fee_tracker[my_block_number[i]]=my_base_fee[i]
    
    
##get the block numbers from the transaction data
transaction_block_numbers=list(transaction_data.block_number)


#initialize a list for the purpose of saving the gas limits that will 
#be assigned to the transaction data and assign the correct information
gas_limits_for_transaction_data=[gas_limit_tracker[x] for x in transaction_block_numbers]
base_fee_for_transaction_data=[base_fee_tracker[x] for x in transaction_block_numbers]

    
#add the column into the dataframe
#transaction_data['gas_limit']=gas_limits_for_transaction_data
transaction_data['base_fee']=base_fee_for_transaction_data


# The final step is to remove the NA's from the dataframe. From testing the 
# dataset, I have found that the max priority fee per gas and the max fee per gas
# have the same number of NA's (this can be observed with the line 
# np.sum(transaction_data.isna())), so we can remove the NA's with the line...
transaction_data=transaction_data[pd.notnull(transaction_data.max_fee_per_gas)]
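
The dictionary lookup above can also be written with pandas directly. The following is a minimal sketch, assuming two small made-up dataframes in place of the real files; only the column names `block_number`, `gas_limit`, `base_fee_per_gas`, and `max_fee_per_gas` come from the code above, and every value is illustrative.

```python
import pandas as pd

# Toy stand-ins for block_data and transaction_data (all values are made up).
block_data = pd.DataFrame({
    "block_number": [100, 101, 102],
    "gas_limit": [15_000_000, 14_985_000, 15_000_000],
    "base_fee_per_gas": [30, 32, 31],
})
transaction_data = pd.DataFrame({
    "block_number": [100, 100, 102, 101],
    "max_fee_per_gas": [50.0, None, 45.0, 60.0],
})

# Index the block table by block number so each column becomes a lookup Series.
blocks = block_data.set_index("block_number")

# map() looks up every transaction's block number in the indexed Series.
transaction_data["base_fee"] = transaction_data["block_number"].map(blocks["base_fee_per_gas"])
transaction_data["gas_limit"] = transaction_data["block_number"].map(blocks["gas_limit"])

# Drop the rows whose max_fee_per_gas is missing.
transaction_data = transaction_data.dropna(subset=["max_fee_per_gas"])
print(transaction_data)
```

Unlike the dictionary version, `map` yields NaN for a block number that is missing from the block table instead of raising a `KeyError`, so a check for unexpected NaN values may be worth adding.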

*Rescaling the gas prices*

Now we need to add another column that converts the gas price into a metric that we 
can compare to the pre EIP 1559 data. To do this, we work under the assumption that 
gas limits represent the same metric that they do in the pre EIP 1559 network (an 
assumption that the previous paper made and that we continue in this proposal). In 
the post EIP 1559 setting the user bid is min(base fee + tip, max fee), while in the 
pre EIP setting the user bid is equal to gas price * gas limit. Setting these two 
bids equal and solving for the gas price gives the post EIP equivalent of the pre EIP 
gas price: $\frac{\min(base \: fee \: + \: tip, \: max \: fee)}{gas \: limit} \: = \: pre \: EIP \: gas \: price$.
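
As a small worked example of this formula, the sketch below evaluates it for two made-up transactions: one where base fee plus tip is under the fee cap, and one where the cap binds. The function name `rescaled_gas_price` and all numbers are assumptions chosen for illustration.

```python
def rescaled_gas_price(base_fee, tip, max_fee, gas_limit):
    """Effective bid min(base_fee + tip, max_fee) divided by the gas limit."""
    return min(base_fee + tip, max_fee) / gas_limit

# Case 1: base fee + tip (30 + 2 = 32) is below the cap of 50, so the bid is 32.
uncapped = rescaled_gas_price(base_fee=30, tip=2, max_fee=50, gas_limit=16)
print(uncapped)  # 32 / 16 = 2.0

# Case 2: base fee + tip (45 + 5 = 50) exceeds the cap of 40, so the bid is 40.
capped = rescaled_gas_price(base_fee=45, tip=5, max_fee=40, gas_limit=10)
print(capped)  # 40 / 10 = 4.0
```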

In [8]:
#get all the values...
b_fee=list(transaction_data.base_fee)
g_limit=list(transaction_data.gas)
m_fee=list(transaction_data.max_fee_per_gas)
tip=list(transaction_data.max_priority_fee_per_gas)


#Evaluate and store the rescaled gas prices
rescaled_gas_prices=[min(b_fee[x]+tip[x],m_fee[x])/g_limit[x] for \
                    x in range(len(b_fee))]

#transaction_data['rescaled_gas_prices']=rescaled_gas_prices

*Comparing variance*

We will compare the variance in two ways. First, we simply take the variance of each entire dataset. Then we run a simulation that repeatedly draws random samples of 500 observations from both the pre and post EIP 1559 data, compares their variances, and reports how often the post EIP variance is smaller.
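
The resampling comparison can be sketched as follows. Note one deliberate difference from the notebook's `my_normalizer`, which divides by the square of the mean: this sketch divides each sample by its mean, so the compared quantity is the squared coefficient of variation, which has no units. The function name, the synthetic gamma-distributed data, and the seeds are all assumptions made for illustration.

```python
import numpy as np

def fraction_post_less_variable(pre, post, sample_size=500, trials=1000, seed=0):
    """Share of trials in which the post sample has the smaller relative variance."""
    rng = np.random.default_rng(seed)
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    wins = 0
    for _ in range(trials):
        # Draw one random sample (with replacement) from each dataset.
        pre_sample = rng.choice(pre, size=sample_size)
        post_sample = rng.choice(post, size=sample_size)
        # Dividing by the sample mean removes the units, so datasets on
        # very different scales become comparable.
        pre_rel_var = np.var(pre_sample / pre_sample.mean())
        post_rel_var = np.var(post_sample / post_sample.mean())
        wins += int(post_rel_var < pre_rel_var)
    return wins / trials

# Synthetic data on very different scales (made up, not the gas price data):
# "pre" has a relative spread of about 0.5, "post" of about 0.1.
data_rng = np.random.default_rng(1)
pre = data_rng.gamma(shape=4, scale=1e10, size=5000)
post = data_rng.gamma(shape=100, scale=1e3, size=5000)

share = fraction_post_less_variable(pre, post, sample_size=500, trials=200)
print(share)
```

Because each sample is divided by its own mean, multiplying either dataset by a constant leaves the result unchanged, which is the property a unit conversion between the two datasets requires.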

*Clean up the data for comparison, remove outliers*

I will be using the "03_22_03_26.csv" dataset in the CAMCOS google drive, since it is the largest portion of data available. For both the 03_22_03_26.csv dataset and the suite spot dataset, I'm going to use 40,000 results for a more appropriate comparison of variance.

*First we will clean up the pre EIP dataset*

In [9]:
#randomly generate 40000 indexes for the larger dataset (sampling with replacement);
#randint draws every index from 0 to len-1 with equal probability
pre_index=np.random.randint(0,len(large_pre_gas_prices),40000)


#assign values with the random indexes
pre_gas_prices=[large_pre_gas_prices[x] for x in pre_index]
    

#gets 10% quantile and 90% quantile for both pre and post 
#for later use in removing outliers
pre_up_lim=np.quantile(pre_gas_prices,0.9)
pre_lo_lim=np.quantile(pre_gas_prices,0.1)
post_up_lim=np.quantile(rescaled_gas_prices,0.9)
post_lo_lim=np.quantile(rescaled_gas_prices,0.1)


#Remove the outliers, save the results in two variables that
#will be our final variables
pre_gas=[x for x in pre_gas_prices if pre_lo_lim<x<pre_up_lim]
post_gas=[x for x in rescaled_gas_prices if post_lo_lim<x<post_up_lim]
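
The same 10% to 90% trimming can be done in one vectorized step with NumPy. This is a minimal sketch on made-up data; the helper name `trim_to_quantile_band` is hypothetical and not part of the notebook.

```python
import numpy as np

def trim_to_quantile_band(values, lower=0.1, upper=0.9):
    """Keep only the values strictly between the lower and upper quantiles."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.quantile(values, [lower, upper])
    # Boolean mask: True where a value lies strictly inside (lo, hi).
    return values[(values > lo) & (values < hi)]

# Made-up data: the integers 0..99, whose 10% and 90% quantiles are 9.9 and 89.1.
data = np.arange(100)
trimmed = trim_to_quantile_band(data)
print(len(trimmed), trimmed.min(), trimmed.max())
```

Trimming both tails narrows the range of each dataset, so any variance computed afterwards describes only the central 80% of the observations.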

*Now to clean up the suite spot data*

In [12]:
#get gas prices

#randomly generate 40000 indexes for the larger dataset
suite_index=np.random.uniform(0,len(large_pre_gas_prices)-2,40000)
suite_index=[round(x) for x in suite_index]

display(suite_txn)

Unnamed: 0,block_number,block_size,gas_limit,gas_used,timestamp
0,12475800,67620,14985280,14975121,1621574513
1,12475801,51571,14999911,14980854,1621574556
2,12475802,69401,14985264,14976410,1621574591
3,12475803,52812,14999897,14995444,1621574615
4,12475804,59044,15000000,14998997,1621574670
...,...,...,...,...,...
16097,12491295,55316,15000000,14984404,1621782641
16098,12491296,56221,14985353,14967308,1621782657
16099,12491297,47457,14999986,14972551,1621782667
16100,12491298,64960,15000000,14987757,1621782672


In [None]:



        
#######################################################################################################################
## Simulation 1: with non-ideal data
##
##
## NOTE!!! the data was of notably different scales, so in order to compare variance
## with any sort of accuracy, we must normalize the data in our simulation
#######################################################################################################################


##a function to normalize the gas price data via dividing by the square of the mean
def my_normalizer(my_list):
    my_mean=np.mean(my_list)**2
    return [x/my_mean for x in my_list]





#a function designed to take two lists, pre and post EIP respectively,
#and return False if post is bigger and True if post is smaller
def variance_checker(pre,post):
    if np.var(pre)<np.var(post):
        return False
    else:
        return True
    
    
#a function designed to take two lists, along with a specified integer, and then
#generate that number of random indexes into each of the two lists
def random_index_generator(list1,list2,number):
    #randint draws every index from 0 to len-1 with equal probability
    result1=list(np.random.randint(0,len(list1),number))
    result2=list(np.random.randint(0,len(list2),number))
    return [result1,result2]


#declare a variable to represent the number of trials to take place in the simulation 
trials=10000
#initialize a list to represent the output of the simulation
results=[]


#this code runs a simulation that randomly takes 500 observations from each dataset and 
#records the percentage of times the variance is smaller in the post EIP dataset
for i in range(trials):
    my_index=random_index_generator(pre_gas,post_gas,500)
    index_1=my_index[0]
    index_2=my_index[1]
    my_pre_gas=my_normalizer([pre_gas[x] for x in index_1])
    my_post_gas=my_normalizer([post_gas[x] for x in index_2])
    results.append(variance_checker(my_pre_gas,my_post_gas))

    
#output results of simulation and simple variance of the two datasets
print("the variance in the post EIP-1559 data is " +  str(np.var(my_normalizer(post_gas))) + \
      " and the variance in the pre EIP-1559 data is " + str(np.var(my_normalizer(pre_gas))) + \
      ". the percentage of times the variance was lower in post EIP-1559 data " + \
      "durring our simulation after normalizing was " +
      str(int(round((sum(results)/len(results))*100))) + '%. Note, the data had ' + \
      "to be normalized to make up for the discrepency of size in the units")


#output summary stats of pre and post EIP gas prices
print('\n Some summary stats: \n \t Pre-EIP: \n')
print('\t Max: ' + str(np.max(pre_gas)))
print('\n \t Min: ' + str(np.min(pre_gas)))
print('\n \t Mean: ' + str(np.mean(pre_gas)))
print('\n \t Variance: ' + str(np.var(pre_gas)))
print('\n \t Quartile 25,50,75: ' + str(np.quantile(pre_gas,0.25)) + "," + \
      str(np.quantile(pre_gas,0.5)) + ',' +  str(np.quantile(pre_gas,0.75)))
print('\n \n \t Post-EIP:')
print('\t Max: ' + str(np.max(post_gas)))
print('\n \t Min: ' + str(np.min(post_gas)))
print('\n \t Mean: ' + str(np.mean(post_gas)))
print('\n \t Variance: ' + str(np.var(post_gas)))
print('\n \t Quartile 25,50,75: ' + str(np.quantile(post_gas,0.25)) + "," + \
      str(np.quantile(post_gas,0.5)) + ',' +  str(np.quantile(post_gas,0.75)))


##############################################################################################################################
##Implementation of the c-test
##############################################################################################################################
##
##This method works by approximating the joint probability distribution of X_1 and X_2 as 
##a binomial distribution, where in the binomial distribution, the x parameter is lambda_1 
##(from X_1), the n parameter is lambda_1+lambda_2, and the p parameter is n_1/(n_1+n_2)
##
##This test finds the p-value corresponding to the ratio lambda_1/lambda_2, the reasoning
##being that if the ratio is large, then lambda_1 is larger than lambda_2 (and 
##thus the variance of the pre EIP gas price is larger than the variance of the post EIP
##gas price) to a statistically significant degree. Hence, we perform a "greater than" 
##binomial test to determine whether the variance is smaller in the post EIP framework
##
##This method was retrieved from the following sources: 
##
##Reference-1:https://stats.stackexchange.com/questions/109402/c-test-for-comparing-poisson-means-in-scipy
##
##Reference-2:https://cran.r-project.org/web/packages/rateratio.test/vignettes/rateratio.test.pdf
##############################################################################################################################


##grab the parameters for the test
my_x=int(np.mean(pre_gas))
my_n=int(np.mean(pre_gas))+int(np.mean(post_gas))
my_p=len(pre_gas)/(len(pre_gas)+len(post_gas))


#print the results of the test to the user
print('\n \n') 
print(binomtest((my_x),my_n,my_p,alternative='greater'))
print('\n \n' + 'This result means that we reject the null hypothesis, meaning that the' + \
      ' variance in the pre EIP framework \n' + 'is larger than the post EIP to a ' +\
      'statistically significant degree' )

the variance in the post EIP-1559 data is 2.032898487421631e-12 and the variance in the pre EIP-1559 data is 1.8730212638091124e-24. the percentage of times the variance was lower in post EIP-1559 data durring our simulation after normalizing was 0%. Note, the data had to be normalized to make up for the discrepency of size in the units

 Some summary stats: 
 	 Pre-EIP: 

	 Max: 256300001605

 	 Min: 108000000001

 	 Mean: 164944277381.53662

 	 Variance: 1.386410115119848e+21

 	 Quartile 25,50,75: 134400000383.0,158000000000.0,189000000000.0

 
 	 Post-EIP:
	 Max: 2496463.8959047617

 	 Min: 132685.97152251133

 	 Mean: 617233.1705626977

 	 Variance: 295061619654.1383

 	 Quartile 25,50,75: 178221.58082857143,413566.0087105776,828456.6181150794

 

BinomTestResult(k=164944277381, n=164944894614, alternative='greater', proportion_estimate=0.9999962579441974, pvalue=0.0)

 
This result means that we reject the null hypothesis, meaning that the variance in the pre EIP framework 
is larger than the post EIP to a statistically significant degree

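
To make the mechanics of the c-test concrete, here is a minimal sketch on small made-up counts, using only the standard library. It assumes `k1` and `k2` are Poisson counts observed over exposures `n1` and `n2`; the function name `c_test_greater` and the example counts are hypothetical. The returned tail probability P(X >= k1) is the quantity that a one-sided "greater" binomial test reports.

```python
from math import comb

def c_test_greater(k1, k2, n1=1.0, n2=1.0):
    """One-sided conditional test that rate 1 exceeds rate 2.

    k1, k2 are observed counts; n1, n2 are the exposures (sample sizes).
    Under equal rates, k1 given k1 + k2 is Binomial(k1 + k2, n1 / (n1 + n2)).
    """
    n = k1 + k2
    p = n1 / (n1 + n2)
    # P(X >= k1) for X ~ Binomial(n, p): the "greater" alternative.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k1, n + 1))

# Made-up counts for illustration: 12 events versus 5 events over equal exposure.
p_value = c_test_greater(12, 5)
print(p_value)
```

The test is designed for counts. When `k1` and `k2` are replaced by very large values on different scales, as with the gas price means above, the p-value is driven almost entirely by the difference in units.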
