## AlphaMEV
### The goal of AlphaMEV is to automate and generalise the most common forms of MEV extraction on EVM-compatible blockchains. You can read more about how this project started on our Twitter.
### While the true North Star of this project is still far away and this is only the beginning, we've decided to host an ML competition to gather ideas from the community and compare them against benchmarks.

## Competition Information
### The goal of this competition is to predict which transactions are back-runnable and the cumulative miner profit each such transaction would generate. Many kinds of transactions open MEV opportunities after them:
1) Oracle updates, which enable liquidations.
2) Large AMM swaps, which enable cross-DEX arbitrage.
3) Accepted governance proposals, which change pool parameters.
And many others.

## Each row of the training dataset contains the following columns:
1) txHash - transaction hash on Ethereum blockchain
2) txData - dictionary representing all basic transaction information
3) txTrace - Geth-style transaction trace
4) Label0 - Binary label indicating whether this transaction is back-runnable.
5) Label1 - Total amount of ETH sent to miners as bribes via MEV-bundles due to this transaction.
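
As a minimal sketch of how a `txData` cell might be read, assume (as the baseline solution does) that it is stored as the string form of a Python dictionary whose numeric fields are hex strings. The sample row below is made up for illustration, and `parse_tx_data` is a hypothetical helper rather than part of the competition materials.

```python
import ast

def parse_tx_data(blob):
    """Parse a txData cell into a dict with integer gas, gasPrice and nonce.

    Assumes the cell is a Python-literal dict with hex-string numeric fields.
    """
    tx = ast.literal_eval(blob)
    parsed = dict(tx)
    for key in ("gas", "gasPrice", "nonce"):
        parsed[key] = int(tx[key], 0)  # base 0 accepts the '0x' prefix
    # For ABI-encoded contract calls, the first 4 bytes of the input
    # are the function selector; plain transfers carry an empty '0x' input.
    parsed["selector"] = tx["input"][:10] if tx["input"] != "0x" else None
    return parsed

# Made-up example row (not taken from the dataset).
sample = str({
    "from": "0x00000000000000000000000000000000000000aa",
    "to": None,  # the baseline treats a missing recipient as None
    "gas": "0x5208",
    "gasPrice": "0x3b9aca00",
    "input": "0xa9059cbb" + "00" * 64,
    "nonce": "0x1",
})
tx = parse_tx_data(sample)
print(tx["gas"], tx["gasPrice"], tx["nonce"], tx["selector"])
```

`ast.literal_eval` is used instead of `eval` because it only accepts Python literals, which is safer for data read from a file.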

## You can find a link to the dataset below. It is a zip archive containing two files: "train.csv" and "test.csv".
## For each row in "test.csv" you're expected to generate two predictions separated by a comma:
1) P[Label0 == 1]
2) E[Label1 | Label0 == 1]
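
The expected output format can be sketched as follows. This is only an illustration: `write_submission` and `check_submission` are hypothetical helpers, and the checks (one row per test row, two numeric fields, a probability in [0, 1]) are assumptions about what a well-formed file looks like, not official validation rules.

```python
import csv
import io

def write_submission(handle, probabilities, expected_profits):
    """Write one 'P[Label0 == 1],E[Label1 | Label0 == 1]' row per test row."""
    if len(probabilities) != len(expected_profits):
        raise ValueError("both prediction lists must have the same length")
    writer = csv.writer(handle, lineterminator="\n")
    for p, e in zip(probabilities, expected_profits):
        writer.writerow([float(p), float(e)])

def check_submission(handle, n_test_rows):
    """Return True if the file has n_test_rows rows of two parseable numbers."""
    rows = list(csv.reader(handle))
    if len(rows) != n_test_rows:
        return False
    for row in rows:
        if len(row) != 2:
            return False
        try:
            p = float(row[0])
            float(row[1])
        except ValueError:
            return False
        if not 0.0 <= p <= 1.0:  # first field is a probability
            return False
    return True

# In-memory demo with made-up predictions for a two-row test set.
buffer = io.StringIO()
write_submission(buffer, [0.9, 0.05], [0.31, 0.0])
text = buffer.getvalue()
print(text)
print(check_submission(io.StringIO(text), n_test_rows=2))
```

To write a real file, pass a handle opened with `open('submission.csv', 'w', newline='')` in place of the in-memory buffer.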
## Using the link below, you can also find a very basic Python solution that generates the required predictions in the correct format.



```python
import ast
import csv

import numpy as np
import pandas
import xgboost as xgb

# Solution is kept trivial and highly inefficient on purpose as it's provided
# purely as an example which should be straightforward to beat by anyone
def convert_dataset(dataset):
  """Turn each txData blob into a small numeric feature vector."""
  examples = []
  for blob in dataset['txData']:
    txData = ast.literal_eval(blob)
    examples.append([
      # Addresses and the function selector are reduced modulo 2 ** 30
      int(txData['from'], 0) % (2 ** 30),
      (int(txData['to'], 0) if txData['to'] is not None else 0) % (2 ** 30),
      int(txData['gas'], 0),
      int(txData['gasPrice'], 0),
      (int(txData['input'][:10], 0) if txData['input'] != '0x' else 0) % (2 ** 30),
      int(txData['nonce'], 0),
    ])
  return np.array(examples)

train = pandas.read_csv('train.csv')
test = pandas.read_csv('test.csv')
testFeatures = convert_dataset(test)

# Classifier for P[Label0 == 1]
binaryModel = xgb.XGBClassifier(n_estimators=50)
binaryModel.fit(convert_dataset(train), train['Label0'])
binaryPredictions = binaryModel.predict_proba(testFeatures)[:, 1]

# Regressor for E[Label1 | Label0 == 1], trained on back-runnable rows only
backrunnable = train[train['Label0'] == True]
regressionModel = xgb.XGBRegressor(n_estimators=50)
regressionModel.fit(convert_dataset(backrunnable), backrunnable['Label1'])
regressionPredictions = regressionModel.predict(testFeatures)

# Write one "probability,expected profit" row per test row;
# the with-block makes sure the file is flushed and closed
with open('submission.csv', 'w', encoding='UTF8', newline='') as f:
  submission = csv.writer(f)
  for x, y in zip(binaryPredictions, regressionPredictions):
    submission.writerow([x, y])
```



