# 4.6 MEV Risk Insights

In this notebook, we consolidate all MEV-related signals from:

- **4.1**: Time-based MEV activity patterns  
- **4.2**: Router frequency analysis  
- **4.3**: Centrality-based MEV indicators  
- **4.4**: Community detection  
- **4.5**: Router-path arbitrage signatures (no swap events)

Our goal is to derive a **combined MEV risk score** and highlight:

- High-risk MEV bot candidates  
- Router-driven transaction clusters  
- Same-block arbitrage bundles  
- Repeated router path signatures  
- Community-level MEV structures

This brings all the pieces together for a comprehensive MEV risk overview.
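One simple way to turn several per-address signals into a combined score is to scale each signal to [0, 1] and take a weighted average. The sketch below is illustrative only: the function name `combine_risk_signals`, the column `router_tx_count`, and the weights are assumptions made for the example, not the scoring scheme this notebook necessarily uses.

```python
import pandas as pd

def combine_risk_signals(features: pd.DataFrame, weights: dict) -> pd.Series:
    """Min-max scale each signal to [0, 1], then take a weighted average per row."""
    cols = list(weights)
    x = features[cols].astype(float).fillna(0.0)
    span = x.max() - x.min()
    # A constant column cannot rank addresses; divide by 1 so it scales to 0
    # instead of producing a division by zero.
    scaled = (x - x.min()) / span.where(span > 0, 1.0)
    w = pd.Series(weights, dtype=float)
    return scaled.mul(w, axis=1).sum(axis=1) / w.sum()

# Toy data with hypothetical column names and weights (assumptions for illustration).
demo = pd.DataFrame(
    {"active_hours_count": [1, 12, 24], "router_tx_count": [0, 5, 10]},
    index=["addr_a", "addr_b", "addr_c"],
)
demo_score = combine_risk_signals(
    demo, {"active_hours_count": 1.0, "router_tx_count": 2.0}
)
print(demo_score.sort_values(ascending=False))
```

Min-max scaling is sensitive to outliers, so a rank-based or quantile transform may be a better fit when a few addresses dominate a signal.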

## 1. Imports + Load Processed Intermediate Features



In [1]:
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (10,5)
plt.rcParams["axes.grid"] = True

PROJECT_ROOT = os.path.abspath(os.path.join(os.getcwd(), "..", ".."))
sys.path.append(PROJECT_ROOT)

from src.data.load_data import load_clean_transactions

In [2]:
# Load base tx
tx = load_clean_transactions()
tx["datetime"] = pd.to_datetime(tx["block_timestamp"])
print("Loaded tx:", len(tx))

Loaded tx: 13268


### 1.1 Load MEV Path Features from 4.5

In [4]:
# Note: despite the variable name, the 4.5 outputs are read from data/processed.
INTERIM = os.path.join(PROJECT_ROOT, "data", "processed")

path_global_path = os.path.join(INTERIM, "mev_router_paths_global.csv")
addr_paths_path  = os.path.join(INTERIM, "mev_router_paths_per_address.csv")
susp_bundles_path = os.path.join(INTERIM, "mev_suspicious_bundles.csv")

df_path_global = pd.read_csv(path_global_path) if os.path.exists(path_global_path) else None
df_addr_paths = pd.read_csv(addr_paths_path) if os.path.exists(addr_paths_path) else None
df_suspicious = pd.read_csv(susp_bundles_path) if os.path.exists(susp_bundles_path) else None

print("Global paths:", None if df_path_global is None else len(df_path_global))
print("Address router paths:", None if df_addr_paths is None else len(df_addr_paths))
print("Suspicious bundles:", None if df_suspicious is None else len(df_suspicious))


Global paths: 2
Address router paths: 6
Suspicious bundles: 0
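In the output above, `Suspicious bundles: 0` means the file exists but contains no rows, which is different from the file being absent (`None`). A small helper can make that distinction explicit before any joins. This is a minimal sketch: `load_optional_csv` and the demo column names are hypothetical and are not part of the project's `src` package.

```python
import os
import tempfile
import pandas as pd

def load_optional_csv(path):
    """Return (DataFrame or None, status); status is 'missing', 'empty', or 'ok'."""
    if not os.path.exists(path):
        return None, "missing"
    try:
        df = pd.read_csv(path)
    except pd.errors.EmptyDataError:
        # A zero-byte file has no header row for pandas to parse.
        return pd.DataFrame(), "empty"
    return df, ("empty" if df.empty else "ok")

# Demo on throwaway files (column names are assumptions for illustration).
with tempfile.TemporaryDirectory() as tmp:
    header_only = os.path.join(tmp, "header_only.csv")
    with open(header_only, "w") as fh:
        fh.write("block_number,from_address\n")
    df_demo, status_demo = load_optional_csv(header_only)
    _, status_missing = load_optional_csv(os.path.join(tmp, "does_not_exist.csv"))

print(status_demo, status_missing)
```

Downstream cells can then skip a signal when its status is `missing` and fill it with zeros when it is `empty`.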


## 2. Build MEV Risk Feature Table (Per Address)

We construct a unified MEV feature table where each row corresponds to an **address**,
and each column is one MEV-related signal.

In [5]:
# Base set of addresses: every distinct sender (from_address) in tx.
unique_addrs = tx["from_address"].dropna().unique()

# Empty feature table indexed by address; signal columns are joined in below.
risk_df = pd.DataFrame(index=pd.Index(unique_addrs, name="address"))

risk_df.head()

0xa9264494a92ced04747ac84fc9ca5a0b9549b491
0xc0ffeebabe5d496b2dde509f9fa189c25cf29671
0xe50008c1d110da8e56982f46a9188a292ee90a7b
0xe40d548eb4fa4d9188fd21723f2fd377456c0876
0x0eb1665de6473c624dcd087fdeee27418d65ed59


### 2.1 Time-Based MEV Indicators (from 4.1)

We reuse key signals:

- Active hours distribution  
- High-activity hours  
- Burstiness (transactions when gas price/lifetime is high)
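The first two signals in the list above can be derived from timestamps alone, as in the sketch below. It assumes a table with `from_address` and `block_timestamp` columns like `tx`; the function name `time_activity_features`, the `top_k` cutoff, and the `peak_hour_share` column are assumptions for illustration. It does not attempt the gas-price-based burstiness signal, whose exact definition lives in 4.1.

```python
import pandas as pd

def time_activity_features(tx: pd.DataFrame, top_k: int = 3) -> pd.DataFrame:
    """Per-address hour-of-day features derived from block timestamps."""
    hours = pd.to_datetime(tx["block_timestamp"]).dt.hour
    # Globally busiest hours of the day, by transaction count.
    busiest = hours.value_counts().nlargest(top_k).index
    by_addr = tx["from_address"]
    return pd.DataFrame({
        # Number of distinct hours in which the address was active.
        "active_hours_count": hours.groupby(by_addr).nunique(),
        "tx_count": hours.groupby(by_addr).size(),
        # Fraction of the address's transactions that fall in the busiest hours.
        "peak_hour_share": hours.isin(busiest).groupby(by_addr).mean(),
    })

# Toy transactions (addresses and timestamps are made up for the example).
demo_tx = pd.DataFrame({
    "from_address": ["0xaaa", "0xaaa", "0xaaa", "0xbbb"],
    "block_timestamp": [
        "2024-01-01 01:00:00", "2024-01-01 01:30:00",
        "2024-01-01 05:00:00", "2024-01-01 01:10:00",
    ],
})
feats = time_activity_features(demo_tx, top_k=1)
print(feats)
```

With `top_k=1` the busiest hour in the toy data is hour 1, so `0xbbb` gets a `peak_hour_share` of 1.0 while `0xaaa` gets 2/3.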

In [None]:
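# Number of distinct hours of the day (0-23) in which each address sent a transaction
tx["hour"] = tx["datetime"].dt.hour

hour_counts = tx.groupby("from_address")["hour"].nunique().rename("active_hours_count")

# Merge into the risk table. Dropping any existing copy of the column first keeps
# this cell safe to re-run; joining a second time would otherwise raise
# "ValueError: columns overlap but no suffix specified".
risk_df = risk_df.drop(columns=["active_hours_count"], errors="ignore")
risk_df = risk_df.join(hour_counts, how="left")
risk_df["active_hours_count"] = risk_df["active_hours_count"].fillna(0)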