## Data partitioning

In this notebook we take the conditioned data and partition it into short sequences (datalets). The input should be a two-column CSV with the block number on the left and the MEV quantity on the right, binned into none = 0, low = 1, medium = 2, and high = 3.

In [2]:
import numpy as np 
import pandas as pd
from numpy import genfromtxt
import multiprocessing as mp
from multiprocessing import Pool
from numba import jit
import json
import csv

In [4]:
# Pull the data in so that we can check how big the file is.

my_data = genfromtxt('../Ethdata/randomdata.csv', delimiter=',')
print(my_data)
print(np.shape(my_data))


[[0.000000e+00 3.000000e+00]
 [1.000000e+00 2.000000e+00]
 [2.000000e+00 0.000000e+00]
 ...
 [9.999997e+06 3.000000e+00]
 [9.999998e+06 2.000000e+00]
 [9.999999e+06 2.000000e+00]]
(10000000, 2)
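Before partitioning, it can help to confirm that the loaded array really has the two-column layout described at the top of the notebook. The following is a minimal sketch, not part of the original pipeline: `check_mev_table` is a hypothetical helper name, and `sample` is a small synthetic stand-in for `my_data`.

```python
import numpy as np

def check_mev_table(table):
    """Return True if `table` looks like (block number, MEV class in 0..3)."""
    table = np.asarray(table)
    # The input is expected to be 2-D with exactly two columns.
    if table.ndim != 2 or table.shape[1] != 2:
        return False
    classes = table[:, 1]
    # Every class label must be one of the four categories 0, 1, 2, 3.
    return bool(np.isin(classes, [0, 1, 2, 3]).all())

# Synthetic stand-in for my_data (hypothetical values).
sample = np.array([[0, 3], [1, 2], [2, 0], [3, 2]], dtype=float)
print(check_mev_table(sample))
```

The same call could be run on `my_data` itself once it is loaded.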


In [5]:
# This cell takes each data point together with the two that follow it. The first two values form the x part of the data
# and the third is the y value. The question we are asking is: can you predict the next MEV quantity from the two before it?
# This might look a little arduous, but doing it all in NumPy is much faster than a Python for loop over 10,000,000 entries.

datalets = np.zeros((np.shape(my_data)[0],3))

datalets[:,0] = my_data[:,1]

my_data2 = np.delete(my_data[:,1],0,0)

my_data2 = np.append(my_data2,[0])

datalets[:,1] = my_data2

my_data3 = np.delete(my_data[:,1],[0,1],0)

my_data3 = np.append(my_data3,[0,0])

datalets[:,2] = my_data3

# Lastly, delete the trailing rows, which hold the zeros appended above to keep every column the same length.
# Only the last two rows contain padding, so deleting three rows also drops one complete window.

datalets = np.delete(datalets,[-1,-2,-3],0)

print(datalets)


[[3. 2. 0.]
 [2. 0. 2.]
 [0. 2. 1.]
 ...
 [3. 2. 0.]
 [2. 0. 3.]
 [0. 3. 2.]]
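The shift-and-pad approach above can also be written with NumPy's `sliding_window_view` (available in NumPy 1.20 and later). This is a sketch of an alternative, not the method used in this notebook; `labels` is a small synthetic stand-in for `my_data[:, 1]`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Synthetic stand-in for the MEV class column my_data[:, 1].
labels = np.array([3, 2, 0, 2, 1, 1, 0], dtype=float)

# Each row is one window of three consecutive values.
# The result is a read-only view, so no data is copied here.
windows = sliding_window_view(labels, 3)

X = windows[:, :2]  # the two earlier values
y = windows[:, 2]   # the value to predict

print(windows)
print(windows.shape)
```

For an input of length N this yields N - 2 rows, since it keeps every complete window and needs no padding. Call `.copy()` on the result if the array has to be modified later.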


In [10]:
# OK, now let's automate that so we aren't repeating it by hand for every datalet length.

# Start by making a dictionary to hold all of this data.

datalets = {}

# Then we loop over the datalet lengths. range(2, 10) covers lengths 2 through 9; the x vector will be one shorter than the datalet.

for i in range(2,10):
    
    # Start with an all-zeros np array; it has i columns, one for each period in the datalet.
    
    datalet = np.zeros((np.shape(my_data)[0],i))
    
    for j in range(i):
        data = np.delete(my_data[:,1],range(j),0)
        data = np.append(data,range(j))
        data = data.astype(int)
        datalet[:,j] = data
        
    datalet = datalet.astype(int)
    datalet = np.delete(datalet,range(i*(-1),0),0)
        
    datalets["datalet{}".format(i)] = datalet

print(datalets)  

{'datalet2': array([[3, 2],
       [2, 0],
       [0, 2],
       ...,
       [2, 0],
       [0, 3],
       [3, 2]]), 'datalet3': array([[3, 2, 0],
       [2, 0, 2],
       [0, 2, 1],
       ...,
       [3, 2, 0],
       [2, 0, 3],
       [0, 3, 2]]), 'datalet4': array([[3, 2, 0, 2],
       [2, 0, 2, 1],
       [0, 2, 1, 1],
       ...,
       [3, 3, 2, 0],
       [3, 2, 0, 3],
       [2, 0, 3, 2]]), 'datalet5': array([[3, 2, 0, 2, 1],
       [2, 0, 2, 1, 1],
       [0, 2, 1, 1, 0],
       ...,
       [3, 3, 3, 2, 0],
       [3, 3, 2, 0, 3],
       [3, 2, 0, 3, 2]]), 'datalet6': array([[3, 2, 0, 2, 1, 1],
       [2, 0, 2, 1, 1, 0],
       [0, 2, 1, 1, 0, 1],
       ...,
       [0, 3, 3, 3, 2, 0],
       [3, 3, 3, 2, 0, 3],
       [3, 3, 2, 0, 3, 2]]), 'datalet7': array([[3, 2, 0, ..., 1, 1, 0],
       [2, 0, 2, ..., 1, 0, 1],
       [0, 2, 1, ..., 0, 1, 3],
       ...,
       [3, 0, 3, ..., 3, 2, 0],
       [0, 3, 3, ..., 2, 0, 3],
       [3, 3, 3, ..., 0, 3, 2]]), 'datalet8': array([[3
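Since the x vector is one shorter than the datalet, each array in the dictionary can be split into inputs and a target by column. A minimal sketch, assuming the last column is the value to predict; `split_features_target` is a hypothetical helper name, and `example` reuses the first three rows of `datalet4` printed above.

```python
import numpy as np

def split_features_target(datalet):
    """Split a (rows, length) datalet into X (all but the last column) and y (the last column)."""
    datalet = np.asarray(datalet)
    return datalet[:, :-1], datalet[:, -1]

# First three rows of datalet4 from the output above.
example = np.array([[3, 2, 0, 2],
                    [2, 0, 2, 1],
                    [0, 2, 1, 1]])

X, y = split_features_target(example)
print(X)
print(y)
```

The same function could be applied to each value in `datalets`, for example in a dictionary comprehension over `datalets.items()`.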

## Saving these datalets as CSVs

In [11]:
for x,y in datalets.items():
    headers = []
    for i in range(np.shape(y)[1]):
        headers.append('period{}'.format(i))
    headerstr = ','.join(map(str,headers))
    # Forward slashes avoid invalid backslash escapes in the path string and also work on Windows.
    np.savetxt("../datelets/{}.csv".format(x), y, delimiter=",", header=headerstr, comments='')
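To check that a saved file can be read back unchanged, a round trip along the following lines may be useful. This is a sketch under stated assumptions: it writes to a temporary directory rather than the notebook's output folder, uses a small synthetic `datalet`, and passes `fmt="%d"` so the integers are written as plain digits (the cell above keeps `np.savetxt`'s default format, which writes scientific notation).

```python
import os
import tempfile
import numpy as np

# Synthetic stand-in for one entry of the datalets dictionary.
datalet = np.array([[3, 2, 0], [2, 0, 2], [0, 2, 1]])
header = ",".join("period{}".format(i) for i in range(datalet.shape[1]))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datalet3.csv")
    # comments="" keeps the header line free of the default "# " prefix.
    np.savetxt(path, datalet, delimiter=",", header=header, comments="", fmt="%d")

    with open(path) as f:
        first_line = f.readline().strip()

    # skiprows=1 skips the header line when loading the values back.
    restored = np.loadtxt(path, delimiter=",", skiprows=1, dtype=int)

print(first_line)
print(np.array_equal(restored, datalet))
```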