# Fetcher for debug trace data

#### Maria Silva, April 2025

In this notebook, we show an example of how to extract and process debug traces for a single block, using our custom tracer. If you wish to process more blocks, we advise using the runners `tracer_raw_run.py` and `tracer_agg_run.py` instead. The runners store the raw data as an intermediate step and are more robust.

## 1. Imports and directories

In [1]:
import os
import sys
import json
import pandas as pd

In [2]:
# Main directories and files
current_path = os.getcwd()
repo_dir = os.path.abspath(os.path.join(current_path, ".."))
src_dir = os.path.join(repo_dir, "src")

In [3]:
# import internal packages
sys.path.append(src_dir)
from data.rpc import XatuClickhouse, ErigonRPC
from data.block_processor import BlockProcessor
from data.gas_cost import compute_gas_cost_for_chunk, aggregate_gas_cost_data

## 2. Setup

In [4]:
# Secrets for accessing Xatu's ClickHouse and Erigon
with open(os.path.join(repo_dir, "secrets.json"), "r") as file:
    secrets_dict = json.load(file)

# Erigon RPC
erigon_rpc_url = "https://rpc-mainnet-teku-erigon-001.utility.production.platform.ethpandaops.io"
erigon_username = secrets_dict["erigon_username"]
erigon_password = secrets_dict["erigon_password"]
erigon_rpc_response_max_size = int(1e9)
erigon_rpc = ErigonRPC(
    erigon_rpc_url, erigon_username, erigon_password, erigon_rpc_response_max_size
)
    
# Xatu's clickhouse fetcher
xatu_username = secrets_dict["xatu_username"]
xatu_password = secrets_dict["xatu_password"]
db_url = f"clickhouse+http://{xatu_username}:{xatu_password}@clickhouse.xatu.ethpandaops.io:443/default?protocol=https"
xatu_clickhouse_fetcher = XatuClickhouse(
    db_url,
    pool_size=5,
    max_overflow=10,
    pool_timeout=30,
)

# Block processor
raw_data_dir = ""  # this path won't be needed
block_processor = BlockProcessor(
    raw_data_dir, xatu_clickhouse_fetcher, erigon_rpc, thread_pool_size=8
)
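
The setup cell reads four keys from `secrets.json`, so a missing key only shows up as a `KeyError` partway through. As a minimal sketch, the file could be checked before any client is built. The helper name `load_secrets` and the placeholder values are hypothetical and not part of the repository; only the four key names come from the cell above.

```python
import json
import os
import tempfile

# Keys read from secrets.json in the setup cell above.
REQUIRED_KEYS = ("erigon_username", "erigon_password", "xatu_username", "xatu_password")


def load_secrets(path):
    """Hypothetical helper: load a secrets file and fail early on missing or empty keys."""
    with open(path, "r") as file:
        secrets = json.load(file)
    missing = [key for key in REQUIRED_KEYS if not secrets.get(key)]
    if missing:
        raise KeyError(f"{path} is missing required keys: {', '.join(missing)}")
    return secrets


# Demo with a throwaway file holding placeholder values (not real credentials).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "secrets.json")
    with open(demo_path, "w") as file:
        json.dump({key: "placeholder" for key in REQUIRED_KEYS}, file)
    demo_secrets = load_secrets(demo_path)

print(sorted(demo_secrets))
```

In the notebook, the same helper would be called with `os.path.join(repo_dir, "secrets.json")` in place of the temporary file.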

## 3. Fetch and process data for a single block

In [5]:
# Fetch raw data from debug traces
block_height = 22000000
raw_df = block_processor.fetch_block(block_height)
raw_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 334002 entries, 0 to 334001
Data columns (total 10 columns):
 #   Column            Non-Null Count   Dtype 
---  ------            --------------   ----- 
 0   op                334002 non-null  object
 1   gas               334002 non-null  int64 
 2   gas_cost          334002 non-null  int64 
 3   depth             334002 non-null  int64 
 4   memory_expansion  334002 non-null  int64 
 5   memory_size       334002 non-null  int64 
 6   cum_refund        334002 non-null  int64 
 7   call_address      334002 non-null  object
 8   file_row_number   334002 non-null  int64 
 9   tx_hash           334002 non-null  object
dtypes: int64(7), object(3)
memory usage: 25.5+ MB


In [6]:
# Fix issues with gas costs
clean_df = compute_gas_cost_for_chunk(raw_df)
clean_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 334002 entries, 0 to 334001
Data columns (total 11 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   op                334002 non-null  object 
 1   gas               334002 non-null  int64  
 2   gas_cost          334002 non-null  int64  
 3   depth             334002 non-null  int64  
 4   memory_expansion  334002 non-null  int64  
 5   memory_size       334002 non-null  int64  
 6   cum_refund        334002 non-null  int64  
 7   call_address      334002 non-null  object 
 8   file_row_number   334002 non-null  int64  
 9   tx_hash           334002 non-null  object 
 10  op_gas_cost       334002 non-null  float64
dtypes: float64(1), int64(7), object(3)
memory usage: 28.0+ MB


In [7]:
# Aggregate data for memory efficiency
df = aggregate_gas_cost_data(clean_df)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5698 entries, 0 to 5697
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   tx_hash            5698 non-null   object 
 1   op                 5698 non-null   object 
 2   op_gas_cost        5698 non-null   float64
 3   memory_expansion   5698 non-null   int64  
 4   memory_size        5698 non-null   int64  
 5   cum_refund         5698 non-null   int64  
 6   call_address       5698 non-null   object 
 7   op_gas_pair_count  5698 non-null   int64  
dtypes: float64(1), int64(4), object(3)
memory usage: 356.3+ KB
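
The aggregation reduces 334,002 trace rows to 5,698 rows and adds an `op_gas_pair_count` column, which suggests that identical rows are collapsed into groups with a count. The sketch below illustrates that kind of aggregation on invented toy data. It is not the implementation of `aggregate_gas_cost_data`: the grouping keys are an assumption based on the output columns, and only three of them are used here.

```python
import pandas as pd

# Toy trace rows (invented values), using a subset of the cleaned column names.
toy_df = pd.DataFrame(
    {
        "tx_hash": ["0xaa", "0xaa", "0xaa", "0xbb", "0xbb"],
        "op": ["ADD", "ADD", "PUSH1", "ADD", "ADD"],
        "op_gas_cost": [3.0, 3.0, 3.0, 3.0, 3.0],
    }
)

# Assumed grouping keys: one output row per distinct (tx_hash, op, op_gas_cost).
group_cols = ["tx_hash", "op", "op_gas_cost"]
agg_df = (
    toy_df.groupby(group_cols, sort=True)
    .size()  # number of raw rows in each group
    .reset_index(name="op_gas_pair_count")
)
print(agg_df)
```

The counts always sum back to the number of raw rows, so no opcode execution is lost by the compression.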


In [8]:
df.head()

| | tx_hash | op | op_gas_cost | memory_expansion | memory_size | cum_refund | call_address | op_gas_pair_count |
|---|---|---|---|---|---|---|---|---|
| 0 | 0x02dba2a7974424be6778984c2f5594189af0d7b42bc5... | ADD | 3.0 | 0 | 0 | 0 | | 287 |
| 1 | 0x02dba2a7974424be6778984c2f5594189af0d7b42bc5... | ADDRESS | 2.0 | 0 | 0 | 0 | | 4 |
| 2 | 0x02dba2a7974424be6778984c2f5594189af0d7b42bc5... | AND | 3.0 | 0 | 0 | 0 | | 241 |
| 3 | 0x02dba2a7974424be6778984c2f5594189af0d7b42bc5... | CALL | 100.0 | 0 | 100 | 0 | 0x88909d489678dd17aa6d9609f89b0419bf78fd9a | 1 |
| 4 | 0x02dba2a7974424be6778984c2f5594189af0d7b42bc5... | CALL | 100.0 | 0 | 132 | 0 | 0x40aa958dd87fc8305b97f2ba922cddca374bcd7f | 1 |
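
One way to use the aggregated frame is to estimate how much gas each opcode consumed. The sketch below rebuilds the five rows shown above and assumes that `op_gas_pair_count` is the number of executions at the given `op_gas_cost`, so that a row's total gas is the product of the two. That interpretation is inferred from the column names rather than confirmed by the source.

```python
import pandas as pd

# The five rows shown by df.head() above (tx_hash and other columns omitted).
head_df = pd.DataFrame(
    {
        "op": ["ADD", "ADDRESS", "AND", "CALL", "CALL"],
        "op_gas_cost": [3.0, 2.0, 3.0, 100.0, 100.0],
        "op_gas_pair_count": [287, 4, 241, 1, 1],
    }
)

# Assumption: total gas for a row = cost per execution * number of executions.
head_df["total_gas"] = head_df["op_gas_cost"] * head_df["op_gas_pair_count"]

# Sum per opcode and rank from most to least gas consumed.
gas_by_op = head_df.groupby("op")["total_gas"].sum().sort_values(ascending=False)
print(gas_by_op)
```

Applied to the full `df`, the same two lines would rank opcodes by gas across the whole block instead of this five-row sample.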
