# 01 - Data Exploration: Understanding Ethereum Transactions

**Goal**: Explore the structure of Ethereum transactions and understand how different transaction types encode their operations.

In this notebook, you'll learn:
- The anatomy of an Ethereum transaction
- How to identify different transaction types
- What events and logs represent in smart contract interactions
- How to visualize transaction data for analysis

**Prerequisites**: Basic understanding of blockchain concepts

## Setup

First, let's configure our environment with automatic module reloading (helpful during development).

In [None]:
%load_ext autoreload
%autoreload 2

import json
import sys
from pathlib import Path
from typing import Any

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from web3 import Web3

# Set plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

# Add project root to path
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

print(f"✓ Notebook setup complete")
print(f"✓ Project root: {project_root}")

## Understanding Ethereum Transactions

### What is a Transaction?

An Ethereum transaction is a signed message that can:
- **Transfer value** (ETH) from one account to another
- **Invoke smart contracts** by calling their functions
- **Change blockchain state** in a permanent, immutable way

### Transaction Structure

Every transaction contains:

| Field | Description | Example |
|-------|-------------|--------|
| `hash` | Unique transaction identifier | `0xabc123...` |
| `from` | Sender address | `0x742d35...` |
| `to` | Recipient account or contract address (empty for contract creation) | `0x1f9840...` |
| `value` | Amount of ETH transferred (in Wei) | `1000000000000000000` (1 ETH) |
| `input` | Calldata: the encoded function call (`0x` for a plain ETH transfer) | `0xa9059cbb...` |
| `gas` | Gas limit for execution | `21000` |
| `gasPrice` | Price per gas unit (in Wei) | `20000000000` (20 Gwei) |
| `nonce` | Number of transactions the sender has sent before this one | `42` |
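
The numeric fields combine with plain integer arithmetic. The following is a minimal sketch using only the standard library; the helper names `wei_to_eth` and `max_fee_wei` are illustrative and not part of web3.py or this project. It uses the example values from the table to show the upper bound on the fee implied by `gas` and `gasPrice`.

```python
from decimal import Decimal

WEI_PER_ETH = 10**18


def wei_to_eth(wei: int) -> Decimal:
    """Convert Wei to ETH exactly, avoiding float rounding."""
    return Decimal(wei) / Decimal(WEI_PER_ETH)


def max_fee_wei(gas_limit: int, gas_price_wei: int) -> int:
    """Upper bound on the fee: at most `gas_limit` units are charged."""
    return gas_limit * gas_price_wei


# Example values from the table above
tx = {"value": 1_000_000_000_000_000_000, "gas": 21_000, "gasPrice": 20_000_000_000}

fee_cap = max_fee_wei(tx["gas"], tx["gasPrice"])
print(f"value:   {wei_to_eth(tx['value'])} ETH")  # value:   1 ETH
print(f"fee cap: {wei_to_eth(fee_cap)} ETH")      # fee cap: 0.00042 ETH
```

Keeping amounts as integers until the final display step avoids the rounding errors that floats introduce at 18 decimal places.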

### Transaction Receipt

After execution, a transaction produces a **receipt** containing:
- `status`: Success (1) or failure (0)
- `logs`: Events emitted by smart contracts
- `gasUsed`: Actual gas consumed
- `blockNumber`: Block where transaction was included
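
A receipt is most useful when read together with its transaction. The sketch below is a hypothetical helper (`summarize_receipt` is not part of this project) that derives the fee actually charged and the share of the gas limit that was consumed. It assumes the receipt may carry an `effectiveGasPrice` field and falls back to the transaction's `gasPrice` otherwise; the sample values are made up.

```python
def summarize_receipt(tx: dict, receipt: dict) -> dict:
    """Combine a transaction and its receipt into a small summary."""
    # Assumption: use the receipt's effectiveGasPrice when present,
    # otherwise fall back to the transaction's gasPrice.
    price = receipt.get("effectiveGasPrice", tx["gasPrice"])
    gas_used = receipt["gasUsed"]
    return {
        "succeeded": receipt["status"] == 1,
        "fee_wei": gas_used * price,
        "gas_utilisation": gas_used / tx["gas"],
        "num_logs": len(receipt.get("logs", [])),
    }


# Hypothetical values for a plain ETH transfer
tx = {"gas": 21_000, "gasPrice": 20_000_000_000}
receipt = {"status": 1, "gasUsed": 21_000, "logs": [], "blockNumber": 1}

summary = summarize_receipt(tx, receipt)
print(summary)
```

Note the distinction this makes explicit: `gas` on the transaction is a limit, while `gasUsed` on the receipt is what execution actually consumed.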

## Loading Sample Transaction Data

Let's load some real transaction examples from our fixtures.

In [None]:
# Load sample transactions
fixtures_path = project_root / "tests" / "fixtures" / "sample_transactions.json"

with open(fixtures_path, 'r') as f:
    sample_transactions = json.load(f)

print(f"✓ Loaded {len(sample_transactions)} sample transactions")
print("\nTransaction hashes available:")
for tx_hash, tx_data in sample_transactions.items():
    print(f"  - {tx_hash[:10]}...")

## Exploring Transaction Types

### Type 1: Simple ETH Transfer

The simplest transaction type: sending ETH from one address to another.

In [None]:
# Get the first transaction (ETH transfer)
eth_transfer_hash = list(sample_transactions.keys())[0]
eth_transfer = sample_transactions[eth_transfer_hash]

print("=" * 80)
print("ETH TRANSFER TRANSACTION")
print("=" * 80)
print(f"\nTransaction Hash: {eth_transfer_hash}")
print(f"From:             {eth_transfer.get('from', 'N/A')}")
print(f"To:               {eth_transfer.get('to', 'N/A')}")
print(f"Value (Wei):      {eth_transfer.get('value', 0)}")
print(f"Value (ETH):      {Web3.from_wei(eth_transfer.get('value', 0), 'ether')}")
print(f"Block Number:     {eth_transfer.get('blockNumber', 'N/A')}")
print(f"Gas Limit:        {eth_transfer.get('gas', 'N/A')}")
print(f"\nInput Data:       {eth_transfer.get('input', '0x')[:20]}...")
print(f"  (Empty input = simple ETH transfer)")
print(f"\nStatus:           {'✓ Success' if eth_transfer.get('status') == 1 else '✗ Failed'}")

**Key Observations for ETH Transfers:**
- `value` field is non-zero (contains ETH amount in Wei)
- `input` field is typically `0x` (empty) or very short
- `to` field is usually an externally owned account, although contracts can also receive plain ETH
- No logs are emitted when the recipient is an externally owned account

### Type 2: ERC-20 Token Transfer

Token transfers interact with smart contracts and emit events.

In [None]:
# Get ERC-20 transaction (if available)
if len(sample_transactions) > 1:
    erc20_hash = list(sample_transactions.keys())[1]
    erc20_tx = sample_transactions[erc20_hash]
    
    print("=" * 80)
    print("ERC-20 TOKEN TRANSFER TRANSACTION")
    print("=" * 80)
    print(f"\nTransaction Hash: {erc20_hash}")
    print(f"From (sender):    {erc20_tx.get('from', 'N/A')}")
    print(f"To (contract):    {erc20_tx.get('to', 'N/A')}")
    print(f"ETH Value:        {Web3.from_wei(erc20_tx.get('value', 0), 'ether')} ETH")
    print(f"  (Note: ETH value is often 0 for token transfers)")
    print(f"\nInput Data:       {erc20_tx.get('input', '0x')[:66]}...")
    print(f"  First 4 bytes (method ID): {erc20_tx.get('input', '0x')[:10]}")
    print("  (0xa9059cbb = function selector for transfer(address,uint256))")
    
    # Check for logs
    logs = erc20_tx.get('logs', [])
    print(f"\nEvents Emitted:   {len(logs)} log(s)")
    
    if logs:
        print(f"\nFirst Event:")
        first_log = logs[0]
        print(f"  Contract:       {first_log.get('address', 'N/A')}")
        print(f"  Topics:         {len(first_log.get('topics', []))} topic(s)")
        if first_log.get('topics'):
            print(f"    Topic[0]:     {first_log['topics'][0][:20]}...")
            print(f"    (Event signature hash - identifies Transfer event)")
else:
    print("No ERC-20 transaction in sample data")

**Key Observations for ERC-20 Transfers:**
- `to` field points to the token contract address
- `value` is typically 0 (no ETH transferred)
- `input` contains the encoded function call:
  - First 4 bytes: function selector (e.g., `0xa9059cbb` for `transfer(address,uint256)`)
  - Remaining bytes: ABI-encoded parameters (recipient, amount), each padded to 32 bytes
- Transaction emits a `Transfer` event in logs
- Token recipient and amount appear in the `input` data and the event logs, never in the transaction's `value` field
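
Because both parameters of `transfer(address,uint256)` are fixed-size, the calldata layout described above can be decoded by slicing the hex string. This is a minimal sketch under that assumption, not a general ABI decoder; the function name `decode_erc20_transfer_input` and the placeholder recipient are hypothetical.

```python
TRANSFER_SELECTOR = "a9059cbb"  # transfer(address,uint256)


def decode_erc20_transfer_input(input_hex: str) -> dict:
    """Decode transfer(address,uint256) calldata by slicing 32-byte words."""
    payload = input_hex[2:] if input_hex.startswith("0x") else input_hex
    selector, args = payload[:8], payload[8:]
    if selector.lower() != TRANSFER_SELECTOR:
        raise ValueError(f"not a transfer() call: selector 0x{selector}")
    if len(args) != 128:  # two 32-byte words = 128 hex characters
        raise ValueError("expected exactly two 32-byte arguments")
    recipient_word, amount_word = args[:64], args[64:]
    return {
        "function": "transfer",
        # An address occupies the low 20 bytes of its 32-byte word
        "recipient": "0x" + recipient_word[-40:],
        "amount": int(amount_word, 16),
    }


# Hypothetical calldata: placeholder recipient, amount of 10**18 base units
recipient = "0x" + "11" * 20
calldata = "0x" + TRANSFER_SELECTOR + recipient[2:].rjust(64, "0") + f"{10**18:064x}"

decoded = decode_erc20_transfer_input(calldata)
print(decoded)
```

The strict length check is a design choice for this sketch: it rejects calldata with trailing bytes instead of silently ignoring them.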

### Type 3: Uniswap Swap (DEX Transaction)

Complex DeFi interactions involve multiple events and state changes.

In [None]:
# Get Uniswap transaction (if available)
if len(sample_transactions) > 2:
    uniswap_hash = list(sample_transactions.keys())[2]
    uniswap_tx = sample_transactions[uniswap_hash]
    
    print("=" * 80)
    print("UNISWAP SWAP TRANSACTION")
    print("=" * 80)
    print(f"\nTransaction Hash: {uniswap_hash}")
    print(f"From (trader):    {uniswap_tx.get('from', 'N/A')}")
    print(f"To (router):      {uniswap_tx.get('to', 'N/A')}")
    print(f"ETH Value:        {Web3.from_wei(uniswap_tx.get('value', 0), 'ether')} ETH")
    
    # Check for multiple logs (swaps involve several events)
    logs = uniswap_tx.get('logs', [])
    print(f"\nEvents Emitted:   {len(logs)} log(s)")
    print(f"  (Swaps typically emit multiple events: Transfer, Sync, Swap)")
    
    if logs:
        print(f"\nEvent Breakdown:")
        for i, log in enumerate(logs[:5]):  # Show first 5 events
            print(f"  Event {i+1}:")
            print(f"    From Contract: {log.get('address', 'N/A')[:10]}...")
            print(f"    Topics:        {len(log.get('topics', []))} topic(s)")
            if log.get('topics'):
                topic_hash = log['topics'][0]
                # Identify common event types by topic hash
                if topic_hash == '0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef':
                    print(f"    Type:          Transfer event")
                elif topic_hash == '0xd78ad95fa46c994b6551d0da85fc275fe613ce37657fb8d5e3d130840159d822':
                    print(f"    Type:          Swap event (Uniswap V2)")
                elif topic_hash == '0xc42079f94a6350d7e6235f29174924f928cc2ac818eb64fed8004e115fbcca67':
                    print(f"    Type:          Swap event (Uniswap V3)")
                else:
                    print(f"    Type:          Other event")
else:
    print("No Uniswap transaction in sample data")

**Key Observations for Uniswap Swaps:**
- Multiple events emitted in sequence
- Events come from several contracts (typically the pool and the token contracts)
- Typical event sequence:
  1. `Transfer` events (tokens moving)
  2. `Sync` event (pool reserves updated)
  3. `Swap` event (swap details: amounts, direction)
- Decoding requires understanding event signatures and ABI
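
The event sequence can be summarised by mapping each log's `topics[0]` to a name. The sketch below reuses the topic hashes that appear elsewhere in this notebook; the helper `label_logs` and the sample logs are hypothetical and only mimic the shape of a V2 swap.

```python
from collections import Counter

TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
SYNC_TOPIC = "0x1c411e9a96e071241c2f21f7726b17ae89e3cab4c78be50e062b03a9fffbbad1"
SWAP_V2_TOPIC = "0xd78ad95fa46c994b6551d0da85fc275fe613ce37657fb8d5e3d130840159d822"
SWAP_V3_TOPIC = "0xc42079f94a6350d7e6235f29174924f928cc2ac818eb64fed8004e115fbcca67"

KNOWN_TOPICS = {
    TRANSFER_TOPIC: "Transfer",
    SYNC_TOPIC: "Sync",
    SWAP_V2_TOPIC: "Swap (Uniswap V2)",
    SWAP_V3_TOPIC: "Swap (Uniswap V3)",
}


def label_logs(logs: list) -> list:
    """Return one event name per log, in emission order."""
    labels = []
    for log in logs:
        topics = log.get("topics") or []
        # Logs without topics cannot be matched by signature hash
        topic0 = topics[0].lower() if topics else None
        labels.append(KNOWN_TOPICS.get(topic0, "Other"))
    return labels


# Hypothetical logs shaped like a Uniswap V2 swap
logs = [
    {"topics": [TRANSFER_TOPIC]},
    {"topics": [TRANSFER_TOPIC]},
    {"topics": [SYNC_TOPIC]},
    {"topics": [SWAP_V2_TOPIC]},
    {"topics": []},
]

labels = label_logs(logs)
print(labels)
print(Counter(labels))
```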

## Understanding Logs and Events

### What are Logs?

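Logs are the records a smart contract writes into the transaction receipt when it emits an event. Off-chain applications read them to follow state changes; contracts themselves cannot read logs.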

### Log Structure

Each log entry contains:
- **`address`**: Contract that emitted the event
- **`topics`**: Up to 4 entries of 32 bytes each
  - `topics[0]`: Event signature hash (keccak256 of the canonical signature, e.g. `Transfer(address,address,uint256)`)
  - `topics[1-3]`: Indexed event parameters (if any)
- **`data`**: Non-indexed parameters (ABI-encoded)

### Example: ERC-20 Transfer Event

```solidity
event Transfer(address indexed from, address indexed to, uint256 value);
```

When decoded:
- `topics[0]` = Event signature hash
- `topics[1]` = `from` address (left-padded to 32 bytes)
- `topics[2]` = `to` address (left-padded to 32 bytes)
- `data` = `value` (token amount)
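
The mapping above translates directly into code. This is a minimal sketch that assumes topics and data arrive as `0x`-prefixed hex strings, as they would in a JSON fixture; `decode_transfer_log` and the sample log are hypothetical.

```python
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def decode_transfer_log(log: dict) -> dict:
    """Decode an ERC-20 Transfer log from its topics and data."""
    topics = log["topics"]
    # ERC-20 Transfer has exactly three topics: the signature hash plus two
    # indexed addresses. ERC-721 shares the signature but indexes a third value.
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        raise ValueError("not an ERC-20 Transfer log")
    return {
        "token": log["address"],
        # Indexed addresses are left-padded to 32 bytes; keep the last 20
        "from": "0x" + topics[1][-40:],
        "to": "0x" + topics[2][-40:],
        "value": int(log["data"], 16),
    }


def pad_address(address: str) -> str:
    """Left-pad a 20-byte address to a 32-byte topic (helper for the sample)."""
    return "0x" + address[2:].rjust(64, "0")


# Hypothetical log with placeholder addresses
sender = "0x" + "11" * 20
receiver = "0x" + "22" * 20
sample_log = {
    "address": "0x" + "aa" * 20,
    "topics": [TRANSFER_TOPIC, pad_address(sender), pad_address(receiver)],
    "data": "0x" + f"{5 * 10**17:064x}",
}

transfer = decode_transfer_log(sample_log)
print(transfer)
```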

In [None]:
# List the signature hashes of common events
print("EVENT SIGNATURE HASHES (Topic[0])")
print("=" * 80)
print("\nCommon ERC-20 and DEX events:\n")

events = {
    "Transfer(address,address,uint256)": "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
    "Approval(address,address,uint256)": "0x8c5be1e5ebec7d5bd14f71427d1e84f3dd0314c0f7b2291e5b200ac8c7c3b925",
    "Swap(address,uint256,uint256,uint256,uint256,address)": "0xd78ad95fa46c994b6551d0da85fc275fe613ce37657fb8d5e3d130840159d822",
    "Sync(uint112,uint112)": "0x1c411e9a96e071241c2f21f7726b17ae89e3cab4c78be50e062b03a9fffbbad1",
}

for event_sig, topic_hash in events.items():
    print(f"Event: {event_sig}")
    print(f"Hash:  {topic_hash}")
    print()

## Data Visualization

Let's visualize the characteristics of our sample transactions.

In [None]:
# Create DataFrame for analysis
tx_data = []
for tx_hash, tx in sample_transactions.items():
    tx_data.append({
        'hash': tx_hash[:10] + '...',
        'value_eth': float(Web3.from_wei(tx.get('value', 0), 'ether')),
        # Prefer the receipt's gasUsed; fall back to the gas limit if it is absent
        'gas_used': tx.get('gasUsed', tx.get('gas', 0)),
        'num_logs': len(tx.get('logs', [])),
        'input_length': len(tx.get('input', '0x')),
        'status': 'Success' if tx.get('status') == 1 else 'Failed',
    })

df = pd.DataFrame(tx_data)
print("Transaction Summary:")
print(df.to_string(index=False))

In [None]:
# Visualize transaction characteristics
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: ETH Value
axes[0, 0].bar(df['hash'], df['value_eth'], color='steelblue')
axes[0, 0].set_title('ETH Value per Transaction', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel('Transaction')
axes[0, 0].set_ylabel('ETH Value')
axes[0, 0].tick_params(axis='x', rotation=45)

# Plot 2: Number of Events/Logs
axes[0, 1].bar(df['hash'], df['num_logs'], color='coral')
axes[0, 1].set_title('Events Emitted per Transaction', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('Transaction')
axes[0, 1].set_ylabel('Number of Logs')
axes[0, 1].tick_params(axis='x', rotation=45)

# Plot 3: Gas Used
axes[1, 0].bar(df['hash'], df['gas_used'], color='lightgreen')
axes[1, 0].set_title('Gas Used per Transaction', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel('Transaction')
axes[1, 0].set_ylabel('Gas Units')
axes[1, 0].tick_params(axis='x', rotation=45)

# Plot 4: Input Data Size
axes[1, 1].bar(df['hash'], df['input_length'], color='mediumpurple')
axes[1, 1].set_title('Input Data Size per Transaction', fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel('Transaction')
axes[1, 1].set_ylabel('Hex characters (incl. 0x prefix)')
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

print("\n📊 Key Insights:")
print(f"  • Average gas used: {df['gas_used'].mean():.0f} units")
print(f"  • Average events per tx: {df['num_logs'].mean():.1f}")
print(f"  • Transactions with ETH value: {(df['value_eth'] > 0).sum()}")
print(f"  • Transactions with contract calls: {(df['input_length'] > 2).sum()}")

## Transaction Type Identification

Now let's create a simple classifier to identify transaction types based on characteristics.

In [None]:
def identify_transaction_type(tx: dict) -> str:
    """
    Identify transaction type based on characteristics.
    
    This is a simplified heuristic approach - real decoding
    requires ABI decoding of input data and logs.
    """
    value = tx.get('value', 0)
    input_data = tx.get('input', '0x')
    logs = tx.get('logs', [])
    
    # Check for simple ETH transfer
    if value > 0 and input_data in ['0x', '0x00']:
        return "ETH Transfer"
    
    # Check for ERC-20 transfer (transfer function signature)
    if input_data.startswith('0xa9059cbb'):
        return "ERC-20 Transfer"
    
    # Check for DEX swap (look for Swap event in logs)
    swap_topics = [
        '0xd78ad95fa46c994b6551d0da85fc275fe613ce37657fb8d5e3d130840159d822',  # Uniswap V2
        '0xc42079f94a6350d7e6235f29174924f928cc2ac818eb64fed8004e115fbcca67',  # Uniswap V3
    ]
    for log in logs:
        if log.get('topics') and log['topics'][0] in swap_topics:
            return "DEX Swap (Uniswap)"
    
    # Check for any contract interaction:
    # '0x' plus a 4-byte selector is 10 characters, so a bare selector counts too
    if len(input_data) >= 10:
        return "Contract Interaction"
    
    return "Unknown"

# Classify all transactions
print("TRANSACTION CLASSIFICATION")
print("=" * 80)
for tx_hash, tx in sample_transactions.items():
    tx_type = identify_transaction_type(tx)
    print(f"\n{tx_hash[:10]}... → {tx_type}")
    print(f"  Value: {Web3.from_wei(tx.get('value', 0), 'ether')} ETH")
    print(f"  Logs:  {len(tx.get('logs', []))} events")
    print(f"  Input: {(len(tx.get('input', '0x')) - 2) // 2} bytes")

## Decoding Concepts

### Why We Need Decoders

Raw transaction data is encoded for efficiency:
- **Function calls**: Encoded as 4-byte selector + ABI-encoded parameters
- **Event logs**: Indexed topics + ABI-encoded data
- **Addresses**: 20-byte values, left-padded to 32 bytes when ABI-encoded
- **Amounts**: 256-bit integers (Wei for ETH, smallest unit for tokens)

### Decoding Process

1. **Identify transaction type** (by method signature or events)
2. **Load appropriate ABI** (Application Binary Interface)
3. **Decode input data** (extract function parameters)
4. **Decode event logs** (extract event parameters)
5. **Convert to human-readable format** (Wei → ETH, addresses → checksummed)
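
Step 5 is the one most often done incorrectly, because token amounts are raw integers whose scale depends on the token's `decimals`. The sketch below shows one way to scale them exactly; `format_units` is a hypothetical helper name, and the decimals values in the usage lines are assumptions for illustration.

```python
from decimal import Decimal, localcontext


def format_units(raw_amount: int, decimals: int) -> Decimal:
    """Scale a raw integer amount by the token's `decimals`."""
    with localcontext() as ctx:
        # A uint256 has at most 78 decimal digits; the default precision
        # of 28 would silently round very large amounts.
        ctx.prec = 80
        return Decimal(raw_amount) / (Decimal(10) ** decimals)


print(format_units(10**18, 18))    # 1
print(format_units(1_500_000, 6))  # 1.5
```

The same raw integer means very different amounts for tokens with different `decimals`, so the scale should always come from the token contract rather than being assumed to be 18.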

### Example: ERC-20 Transfer Decoding

```python
# Raw input (illustrative: the recipient is shortened, so it is not a full 20-byte address):
# 0xa9059cbb000000000000000000000000742d35cc6634c0532925a3b844bc9e7595f0000000000000000000000000000000000000000000000000de0b6b3a7640000

# Decoded:
{
    "function": "transfer",
    "recipient": "0x742d35Cc6634C0532925a3b844Bc9e7595f0",
    "amount": "1000000000000000000"  # 1 token (18 decimals)
}
```
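
Building the calldata in the other direction makes the layout concrete: 4 bytes of selector followed by two 32-byte words, 68 bytes in total. This is a minimal sketch with a hypothetical helper name (`encode_erc20_transfer`) and a placeholder recipient; it validates that the address really is 20 bytes.

```python
def encode_erc20_transfer(recipient: str, amount: int) -> str:
    """Build calldata for transfer(address,uint256) as a hex string."""
    addr = recipient[2:] if recipient.startswith("0x") else recipient
    if len(addr) != 40:
        raise ValueError("an address must be exactly 20 bytes (40 hex characters)")
    int(addr, 16)  # raises ValueError if the address contains non-hex characters
    if not 0 <= amount < 2**256:
        raise ValueError("amount must fit in uint256")
    selector = "a9059cbb"
    # Each argument is left-padded with zeros to a full 32-byte word
    return "0x" + selector + addr.lower().rjust(64, "0") + f"{amount:064x}"


# Placeholder recipient and an amount of 10**18 base units
calldata = encode_erc20_transfer("0x" + "22" * 20, 10**18)
print(calldata)
print((len(calldata) - 2) // 2, "bytes")  # 68 bytes
```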

## Key Takeaways

✓ **Transaction Structure**: Every transaction has standard fields (`from`, `to`, `value`, `input`, `gas`)

✓ **Transaction Types**: Can be identified by examining `value`, `input` data, and `logs`

✓ **Events/Logs**: Smart contracts emit events to signal state changes

✓ **Decoding Required**: Raw data must be decoded using ABIs to extract meaningful information

✓ **Complexity Varies**: Simple ETH transfers vs. complex DeFi operations

## Troubleshooting Tips

**Issue**: Transaction data looks like gibberish
- **Solution**: You're looking at encoded data. Use ABI decoders (covered in the next notebook)

**Issue**: Can't find token transfer amount in transaction `value` field
- **Solution**: Token amounts are in event logs, not transaction value (which is for ETH only)

**Issue**: Don't know which event is which in the logs
- **Solution**: Check `topics[0]` against known event signature hashes

**Issue**: Address format issues (checksum errors)
- **Solution**: Use `Web3.to_checksum_address()` to normalize addresses (web3.py v6 and later; older releases used `Web3.toChecksumAddress()`)

## Next Steps

In the next notebook (**02-data-extraction.ipynb**), we'll learn how to:
- Connect to Ethereum RPC nodes
- Fetch transactions programmatically
- Use our custom decoders to extract structured data
- Handle errors and edge cases

---

**Ready to continue?** → `notebooks/02-data-extraction.ipynb`