## Data partitioning

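In this notebook we take the conditioned data and split it into fixed-length windows ("datalets"). The input should be a two-column CSV with the block number on the left and the MEV quantity on the right, sorted into the categories none = 0, low = 1, medium = 2, and high = 3. As saved, the file also has a header row and a leading index column, which are removed after loading.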

In [3]:
import numpy as np 
import pandas as pd
from numpy import genfromtxt
import multiprocessing as mp
from multiprocessing import Pool
from numba import jit
import json
import csv

In [4]:
# Load the data so that we can check how big the file is.

my_data = genfromtxt('../Ethdata/MEVcategory.csv', delimiter=',')  # forward slashes avoid invalid "\E"/"\M" escapes
print(my_data)
print(np.shape(my_data))
my_data = np.delete(my_data, 0, 0)  # drop the first row (the text header, parsed as NaN)
my_data = np.delete(my_data, 0, 1)  # drop the first column (the index)
print(my_data)


[[          nan           nan           nan]
 [0.0000000e+00 1.1834049e+07 4.0000000e+00]
 [1.0000000e+00 1.1834050e+07 0.0000000e+00]
 ...
 [3.1529040e+06 1.4986953e+07 0.0000000e+00]
 [3.1529050e+06 1.4986954e+07 4.0000000e+00]
 [3.1529060e+06 1.4986955e+07 4.0000000e+00]]
(3152908, 3)
[[1.1834049e+07 4.0000000e+00]
 [1.1834050e+07 0.0000000e+00]
 [1.1834051e+07 0.0000000e+00]
 ...
 [1.4986953e+07 0.0000000e+00]
 [1.4986954e+07 4.0000000e+00]
 [1.4986955e+07 4.0000000e+00]]
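The two `np.delete` calls above can also be folded into the load itself. A minimal sketch, assuming the layout shown in the printed output (a text header row, then an index column, the block number and the MEV category); the tiny in-memory sample and its column names are hypothetical stand-ins for `MEVcategory.csv`:

```python
import io

import numpy as np

# Hypothetical sample mimicking the layout printed above:
# a header row, then index, block number, MEV category.
sample_csv = io.StringIO(
    ",block,category\n"
    "0,11834049,4\n"
    "1,11834050,0\n"
    "2,11834051,0\n"
)

# skip_header=1 skips the text header (otherwise parsed as a row of NaN)
# usecols=(1, 2) keeps only block number and category, dropping the index column
data = np.genfromtxt(sample_csv, delimiter=",", skip_header=1, usecols=(1, 2))
print(data.shape)  # (3, 2)
```

With the real file the equivalent call would be `genfromtxt('../Ethdata/MEVcategory.csv', delimiter=',', skip_header=1, usecols=(1, 2))`, assuming its header is a single line.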


In [5]:
# This cell takes each data point together with the points that follow it. For a window of three, the
# first two points form the x part of the data and the third is the y value: the question is whether the
# next block's MEV quantity can be predicted from the two before it. Doing this entirely with NumPy array
# operations looks a little arduous, but it is much faster than a Python for loop over the ~3.15 million
# rows in the data.


# These two parameters set the minimum and maximum datalet length, i.e. how many blocks are in each window (the previous blocks used for prediction plus the block whose MEV level is predicted).

startrange = 2

endrange = 10

#start by making a dictionary to hold all this data

datalets = {}

# Loop over the window lengths (2 to 10 by default). Each window has i entries, so the x vector has i - 1.

for i in range(startrange,endrange+1):
    
    # Start with an all-zeros array with i columns, one for each block in the window.
    
    datalet = np.zeros((np.shape(my_data)[0],i))
    
    # Column j is a copy of the MEV category column with its top j entries deleted (shifting the series up by j),
    # padded at the bottom with j placeholder values so that every column has the same length.
    
    for j in range(i):
        data = np.delete(my_data[:,1],range(j),0)
        data = np.append(data,range(j))
        data = data.astype(int)
        datalet[:,j] = data
        
    # So the placeholder values don't corrupt the results, drop the bottom rows. This deletes the last i rows;
    # only the last i - 1 contain padding, so the final valid window is discarded as well.
        
    datalet = datalet.astype(int)
    datalet = np.delete(datalet,range(i*(-1),0),0)
        
    datalets["datalet{}".format(i)] = datalet

print(datalets)  

{'datalet2': array([[4, 0],
       [0, 0],
       [0, 0],
       ...,
       [0, 4],
       [4, 0],
       [0, 4]]), 'datalet3': array([[4, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       ...,
       [0, 0, 4],
       [0, 4, 0],
       [4, 0, 4]]), 'datalet4': array([[4, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       ...,
       [4, 0, 0, 4],
       [0, 0, 4, 0],
       [0, 4, 0, 4]]), 'datalet5': array([[4, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       ...,
       [4, 4, 0, 0, 4],
       [4, 0, 0, 4, 0],
       [0, 0, 4, 0, 4]]), 'datalet6': array([[4, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       ...,
       [0, 4, 4, 0, 0, 4],
       [4, 4, 0, 0, 4, 0],
       [4, 0, 0, 4, 0, 4]]), 'datalet7': array([[4, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 4, ..., 0, 0, 4],
       [0, 4, 4, ..., 0, 4, 0],
       [4, 4, 0, ..., 4, 0, 4]]), 'datalet8': array([[4
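The shift-and-stack loop above builds, for each window length `i`, rows of `i` consecutive MEV categories. NumPy's `sliding_window_view` (NumPy 1.20 or later) produces the same windows without the padding step. A minimal sketch on a short hypothetical sequence; `shift_and_stack` is a hypothetical helper that reproduces the notebook's loop for comparison, and the X/y split assumes the last column is the target, as described in the cell comments:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def shift_and_stack(categories, i):
    """Hypothetical helper reproducing the notebook's loop: column j is the series shifted up by j."""
    n = categories.shape[0]
    datalet = np.zeros((n, i), dtype=int)
    for j in range(i):
        shifted = np.delete(categories, range(j), 0)
        shifted = np.append(shifted, range(j))  # placeholder padding at the bottom
        datalet[:, j] = shifted.astype(int)
    return np.delete(datalet, range(-i, 0), 0)  # drops the last i rows


categories = np.array([4, 0, 0, 4, 0, 4, 4])  # hypothetical MEV categories
i = 3

windows = sliding_window_view(categories, i)  # shape (n - i + 1, i), a read-only view
print(windows)

# The loop keeps one window fewer, because it deletes i rows but pads only i - 1
print(np.array_equal(shift_and_stack(categories, i), windows[:-1]))  # True

# Split into features (the previous i - 1 blocks) and target (the next block)
X, y = windows[:, :-1], windows[:, -1]
```

Because `sliding_window_view` returns a view rather than a copy, call `.copy()` on the result before modifying it.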

## Saving these datalets as CSVs

In [6]:
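for name, datalet in datalets.items():
    # One header per column: period0, period1, ...
    headers = ['period{}'.format(i) for i in range(np.shape(datalet)[1])]
    headerstr = ','.join(headers)
    # Forward slashes avoid the invalid "\d" and "\{" escape sequences and still work on Windows;
    # fmt='%d' writes plain integers instead of the default '%.18e' floats
    np.savetxt("../datalets/{}.csv".format(name), datalet, delimiter=",", header=headerstr, comments='', fmt='%d')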
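Each saved file has a `period0,period1,...` header followed by one window per row. A minimal sketch of writing one datalet and reading it back into features and target, assuming the last `periodN` column is the block being predicted; the temporary directory and the small array are hypothetical stand-ins for `../datalets` and a real datalet:

```python
import os
import tempfile

import numpy as np

datalet = np.array([[4, 0, 0], [0, 0, 4], [0, 4, 0]])  # hypothetical datalet3
header = ",".join("period{}".format(k) for k in range(datalet.shape[1]))

with tempfile.TemporaryDirectory() as out_dir:
    path = os.path.join(out_dir, "datalet3.csv")
    # comments="" stops savetxt from prefixing the header with "# "
    np.savetxt(path, datalet, delimiter=",", header=header, comments="", fmt="%d")

    with open(path) as f:
        first_line = f.readline().strip()  # the header row

    # skiprows=1 skips the header; ndmin=2 keeps a 2-D shape even for a single row
    loaded = np.loadtxt(path, delimiter=",", skiprows=1, dtype=int, ndmin=2)

# Features are the earlier periods, the target is the last one
X, y = loaded[:, :-1], loaded[:, -1]
```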