# DeFiLlama Data Backfills

This notebook backfills DeFiLlama datasources over a configurable lookback window and writes the results to GCS.

## Available Datasources:

1. **Volume, Fees, Revenue** - DEX trading volume, protocol fees, and revenue data by chain and protocol
2. **Protocols TVL** - Total Value Locked for individual protocols with detailed breakdowns
3. **Chain TVL** - Historical TVL data aggregated by blockchain 
4. **Stablecoins** - Stablecoin circulation and bridging data by chain
5. **Yield Pools** - Yield farming pool data and APY information
6. **Lend/Borrow Pools** - Lending protocol data including rates and volumes

## Configuration Options:

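- **BACKFILL_DAYS**: Number of days to backfill (the first cell sets `365*5`, i.e. 1825 days)
- **SPECIFIC_CHAIN**: Chain to filter to (e.g. "optimism", "base"), or None for all chains
- **SPECIFIC_PROTOCOL**: Protocol slug to filter to, or None for all protocols
- **FORCE_COMPLETE_OVERWRITE**: Protocols TVL only; treat every protocol as pending by bypassing the buffer check
- **CLEAR_EXISTING_MARKERS**: Ignore existing completion markers so that already-completed data is pulled and written again (applies to every datasource)

Note: none of the backfill cells currently passes `SPECIFIC_CHAIN` or `SPECIFIC_PROTOCOL` to `execute_pull()`, so as written these two settings are printed but do not filter the pull.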

## Usage:

1. Modify the configuration variables in the first cell
2. Run the cells for the datasources you want to backfill
3. Comment/uncomment sections as needed
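Each backfill cell follows the same pattern: it temporarily overrides the datasource's module-level lookback constant (for example `TABLE_LAST_N_DAYS`) with `unittest.mock.patch.object`, calls `execute_pull()`, and relies on the patch to restore the original value on exit. The sketch below reproduces that mechanism against a stand-in module (`fake_execute` is hypothetical), so it runs without op_analytics installed. It assumes the real `execute_pull()` reads the constant at call time; if a constant were instead bound as a default argument when the function is defined, patching the module attribute would have no effect.

```python
import types
from unittest.mock import patch

# Hypothetical stand-in for an op_analytics execute module: it reads its
# lookback window from a module-level constant.
fake_execute = types.ModuleType("fake_execute")
fake_execute.TABLE_LAST_N_DAYS = 7


def _execute_pull():
    # The constant is looked up at call time, so a patch applied before the call is seen.
    return {"days_requested": fake_execute.TABLE_LAST_N_DAYS}


fake_execute.execute_pull = _execute_pull

BACKFILL_DAYS = 365 * 5

with patch.object(fake_execute, "TABLE_LAST_N_DAYS", BACKFILL_DAYS):
    patched_result = fake_execute.execute_pull()

# Outside the context manager the original value is restored.
restored_result = fake_execute.execute_pull()

print(patched_result)   # {'days_requested': 1825}
print(restored_result)  # {'days_requested': 7}
```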


In [1]:
import os
from unittest.mock import patch

from op_analytics.coreutils.partitioned import dailydatawrite
from op_analytics.coreutils.partitioned.location import DataLocation


def mock_location():
    # Replacement for dailydatawrite.determine_location: always write to GCS.
    return DataLocation.GCS

# Configuration
os.environ["ALLOW_WRITE"] = "true"

# Backfill Configuration
BACKFILL_DAYS = 365*5  # Number of days to backfill
SPECIFIC_CHAIN = None  # Set to chain name (e.g. "optimism") to filter, or None for all chains
SPECIFIC_PROTOCOL = None  # Set to protocol slug to filter, or None for all protocols

# Force Complete Overwrite Settings
FORCE_COMPLETE_OVERWRITE = False  # Protocols TVL only: set to True to treat every protocol as pending (bypass buffer)
CLEAR_EXISTING_MARKERS = False    # Set to True to ignore existing completion markers (all datasources)

print("Configuration:")
print(f"  Backfill Days: {BACKFILL_DAYS}")
print(f"  Specific Chain: {SPECIFIC_CHAIN or 'All chains'}")
print(f"  Specific Protocol: {SPECIFIC_PROTOCOL or 'All protocols'}")
print("  Data Location: GCS")
print(f"  Force Complete Overwrite: {FORCE_COMPLETE_OVERWRITE}")
print(f"  Clear Existing Markers: {CLEAR_EXISTING_MARKERS}")
print()


Configuration:
  Backfill Days: 1825
  Specific Chain: All chains
  Specific Protocol: All protocols
  Data Location: GCS
  Force Complete Overwrite: False
  Clear Existing Markers: False



## 1. Volume, Fees, Revenue (VFR) Data

This datasource pulls DEX volume, protocol fees, and revenue data from DeFiLlama.


In [None]:
# Volume, Fees, Revenue Backfill
print("Starting Volume, Fees, Revenue backfill...")
print(f"  Force overwrite: {FORCE_COMPLETE_OVERWRITE}")

def mock_marker_exists(*args, **kwargs):
    """Return False when CLEAR_EXISTING_MARKERS is set (forcing a rewrite); otherwise defer to the original check."""
    if CLEAR_EXISTING_MARKERS:
        return False
    return original_marker_exists(*args, **kwargs)

with patch.object(dailydatawrite, "determine_location", mock_location):
    # Import required modules
    from op_analytics.datasources.defillama.volumefeesrevenue import execute as vfr_execute
    from op_analytics.coreutils.partitioned import dataaccess
    
    # Store original function
    original_marker_exists = dataaccess.PartitionedDataAccess.marker_exists
    
    # Patch the constant to use our backfill days and optionally bypass markers
    with patch.object(vfr_execute, 'TABLE_LAST_N_DAYS', BACKFILL_DAYS):
        if CLEAR_EXISTING_MARKERS:
            with patch.object(dataaccess.PartitionedDataAccess, 'marker_exists', mock_marker_exists):
                result = vfr_execute.execute_pull()
        else:
            result = vfr_execute.execute_pull()
        
print("Volume, Fees, Revenue backfill completed!")
print(f"Result summary: {result}")
print()


Starting Volume, Fees, Revenue backfill...
  Force overwrite: False
2025-09-03 16:59:27 [info     ] Fetched from https://api.llama.fi/overview/dexs?excludeTotalDataChart=false&excludeTotalDataChartBreakdown=true&dataType=dailyVolume: 0.10 seconds filename=request.py lineno=103 process=60317
2025-09-03 16:59:27 [info     ] Fetched from https://api.llama.fi/overview/dexs/Ethereum?excludeTotalDataChart=false&excludeTotalDataChartBreakdown=false&dataType=dailyVolume: 0.07 seconds counter=001/201 eta=None filename=request.py lineno=103 process=60317
2025-09-03 16:59:27 [info     ] Fetched from https://api.llama.fi/overview/dexs/Base?excludeTotalDataChart=false&excludeTotalDataChartBreakdown=false&dataType=dailyVolume: 0.10 seconds counter=004/201

## 2. Protocols TVL Data

This datasource pulls detailed TVL data for individual protocols with token-level breakdowns.


In [None]:
# Protocols TVL Backfill
print("Starting Protocols TVL backfill...")
print(f"  Force overwrite: {FORCE_COMPLETE_OVERWRITE}")

def mock_get_buffered(*args, **kwargs):
    """Return an empty set under FORCE_COMPLETE_OVERWRITE so every protocol is pending; otherwise defer to the original."""
    if FORCE_COMPLETE_OVERWRITE:
        return set()  # Return empty set so all protocols are considered "pending"
    return original_get_buffered(*args, **kwargs)

def mock_marker_exists(*args, **kwargs):
    """Return False when CLEAR_EXISTING_MARKERS is set (forcing a rewrite); otherwise defer to the original check."""
    if CLEAR_EXISTING_MARKERS:
        return False
    return original_marker_exists(*args, **kwargs)

with patch.object(dailydatawrite, "determine_location", mock_location):
    # Import required modules
    from op_analytics.datasources.defillama.protocolstvl import execute as protocols_execute
    from op_analytics.datasources.defillama.protocolstvl import buffereval
    from op_analytics.coreutils.partitioned import dataaccess
    
    # Store original functions
    original_get_buffered = buffereval.get_buffered
    original_marker_exists = dataaccess.PartitionedDataAccess.marker_exists
    
    # Patch the constant and optionally bypass buffer/marker checks
    with patch.object(protocols_execute, 'TVL_TABLE_LAST_N_DAYS', BACKFILL_DAYS):
        # Enter the optional patches on one ExitStack; every patch is undone
        # when the block exits, even if execute_pull() raises.
        from contextlib import ExitStack

        with ExitStack() as stack:
            # Force all protocols to be considered "pending" (not buffered)
            if FORCE_COMPLETE_OVERWRITE:
                stack.enter_context(patch.object(buffereval, 'get_buffered', mock_get_buffered))

            # Bypass existing markers
            if CLEAR_EXISTING_MARKERS:
                stack.enter_context(
                    patch.object(dataaccess.PartitionedDataAccess, 'marker_exists', mock_marker_exists)
                )

            result = protocols_execute.execute_pull()
        
print("Protocols TVL backfill completed!")
print(f"Result summary: {result}")
print()


In [None]:
result
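The Protocols TVL cell forces a full refetch by making `mock_get_buffered` return an empty set, on the reasoning (stated in its comment) that protocols missing from the buffer are "pending". Assuming pending simply means "all protocols minus buffered ones", the effect can be sketched with a hypothetical helper; `pending_protocols` and the slugs are illustrative only.

```python
def pending_protocols(all_slugs, buffered):
    """Return slugs that still need fetching (hypothetical helper).

    Assumes a protocol is pending exactly when it is not already buffered.
    """
    return sorted(set(all_slugs) - set(buffered))


all_slugs = ["protocol-a", "protocol-b", "uniswap"]

# Normal run: buffered protocols are skipped.
print(pending_protocols(all_slugs, {"protocol-a"}))  # ['protocol-b', 'uniswap']

# Under FORCE_COMPLETE_OVERWRITE the mock returns an empty set,
# so every protocol becomes pending.
print(pending_protocols(all_slugs, set()))  # ['protocol-a', 'protocol-b', 'uniswap']
```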

## 3. Chain TVL Data

This datasource pulls historical TVL data aggregated by blockchain.


In [None]:
# Chain TVL Backfill
print("Starting Chain TVL backfill...")
print(f"  Force overwrite: {FORCE_COMPLETE_OVERWRITE}")

def mock_marker_exists(*args, **kwargs):
    """Return False when CLEAR_EXISTING_MARKERS is set (forcing a rewrite); otherwise defer to the original check."""
    if CLEAR_EXISTING_MARKERS:
        return False
    return original_marker_exists(*args, **kwargs)

with patch.object(dailydatawrite, "determine_location", mock_location):
    # Import required modules
    from op_analytics.datasources.defillama.chaintvl import execute as chain_execute
    from op_analytics.coreutils.partitioned import dataaccess
    
    # Store original function
    original_marker_exists = dataaccess.PartitionedDataAccess.marker_exists
    
    # Patch the constant and optionally bypass markers
    with patch.object(chain_execute, 'TVL_TABLE_LAST_N_DAYS', BACKFILL_DAYS):
        if CLEAR_EXISTING_MARKERS:
            with patch.object(dataaccess.PartitionedDataAccess, 'marker_exists', mock_marker_exists):
                result = chain_execute.execute_pull()
        else:
            result = chain_execute.execute_pull()
        
print("Chain TVL backfill completed!")
print(f"Result summary: {result}")
print()


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

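# Assumes `result` holds row-level records (a DataFrame or a list of dicts)
# with 'dt' and 'protocol' fields. `result` is whichever backfill cell ran
# last (Chain TVL, if the cells are run in order).
if isinstance(result, pd.DataFrame):
    df = result.copy()  # copy so the date conversion below doesn't mutate `result`
else:
    df = pd.DataFrame(result)

missing = {'dt', 'protocol'} - set(df.columns)
if missing:
    raise ValueError(f"`result` is missing expected columns: {sorted(missing)}")

# Ensure 'dt' is datetime for proper sorting/plotting
df['dt'] = pd.to_datetime(df['dt'])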

# Group by day and count distinct protocols
protocols_by_day = df.groupby('dt')['protocol'].nunique().reset_index()
protocols_by_day = protocols_by_day.sort_values('dt')

# Plot
plt.figure(figsize=(10, 5))
plt.plot(protocols_by_day['dt'], protocols_by_day['protocol'], marker='o')
plt.title('Distinct Protocols by Day')
plt.xlabel('Date')
plt.ylabel('Number of Distinct Protocols')
plt.grid(True)
plt.tight_layout()
plt.show()


## 4. Stablecoins Data

This datasource pulls stablecoin circulation and bridging data by chain.


In [None]:
# Stablecoins Backfill
print("Starting Stablecoins backfill...")
print(f"  Force overwrite: {FORCE_COMPLETE_OVERWRITE}")

def mock_marker_exists(*args, **kwargs):
    """Return False when CLEAR_EXISTING_MARKERS is set (forcing a rewrite); otherwise defer to the original check."""
    if CLEAR_EXISTING_MARKERS:
        return False
    return original_marker_exists(*args, **kwargs)

with patch.object(dailydatawrite, "determine_location", mock_location):
    # Import required modules
    from op_analytics.datasources.defillama.stablecoins import execute as stablecoins_execute
    from op_analytics.coreutils.partitioned import dataaccess
    
    # Store original function
    original_marker_exists = dataaccess.PartitionedDataAccess.marker_exists
    
    # Patch the constant and optionally bypass markers
    with patch.object(stablecoins_execute, 'BALANCES_TABLE_LAST_N_DAYS', BACKFILL_DAYS):
        if CLEAR_EXISTING_MARKERS:
            with patch.object(dataaccess.PartitionedDataAccess, 'marker_exists', mock_marker_exists):
                result = stablecoins_execute.execute_pull()
        else:
            result = stablecoins_execute.execute_pull()
        
print("Stablecoins backfill completed!")
print(f"Result summary: {result}")
print()


## 5. Yield Pools Data

This datasource pulls yield farming pool data and APY information.


In [None]:
# Yield Pools Backfill
print("Starting Yield Pools backfill...")
print(f"  Force overwrite: {FORCE_COMPLETE_OVERWRITE}")

def mock_marker_exists(*args, **kwargs):
    """Return False when CLEAR_EXISTING_MARKERS is set (forcing a rewrite); otherwise defer to the original check."""
    if CLEAR_EXISTING_MARKERS:
        return False
    return original_marker_exists(*args, **kwargs)

with patch.object(dailydatawrite, "determine_location", mock_location):
    # Import required modules
    from op_analytics.datasources.defillama.yieldpools import execute as yield_execute
    from op_analytics.coreutils.partitioned import dataaccess
    
    # Store original function
    original_marker_exists = dataaccess.PartitionedDataAccess.marker_exists
    
    # Patch the constant and optionally bypass markers
    with patch.object(yield_execute, 'YIELD_TABLE_LAST_N_DAYS', BACKFILL_DAYS):
        if CLEAR_EXISTING_MARKERS:
            with patch.object(dataaccess.PartitionedDataAccess, 'marker_exists', mock_marker_exists):
                result = yield_execute.execute_pull()
        else:
            result = yield_execute.execute_pull()
        
print("Yield Pools backfill completed!")
print(f"Result summary: {result}")
print()


## 6. Lend/Borrow Pools Data

This datasource pulls lending protocol data including rates and volumes.


In [None]:
# Lend/Borrow Pools Backfill
print("Starting Lend/Borrow Pools backfill...")
print(f"  Force overwrite: {FORCE_COMPLETE_OVERWRITE}")

def mock_marker_exists(*args, **kwargs):
    """Return False when CLEAR_EXISTING_MARKERS is set (forcing a rewrite); otherwise defer to the original check."""
    if CLEAR_EXISTING_MARKERS:
        return False
    return original_marker_exists(*args, **kwargs)

with patch.object(dailydatawrite, "determine_location", mock_location):
    # Import required modules
    from op_analytics.datasources.defillama.lendborrowpools import execute as lb_execute
    from op_analytics.coreutils.partitioned import dataaccess
    
    # Store original function
    original_marker_exists = dataaccess.PartitionedDataAccess.marker_exists
    
    # Patch the constant and optionally bypass markers
    with patch.object(lb_execute, 'LEND_BORROW_TABLE_LAST_N_DAYS', BACKFILL_DAYS):
        if CLEAR_EXISTING_MARKERS:
            with patch.object(dataaccess.PartitionedDataAccess, 'marker_exists', mock_marker_exists):
                result = lb_execute.execute_pull()
        else:
            result = lb_execute.execute_pull()
        
print("Lend/Borrow Pools backfill completed!")
print(f"Result summary: {result}")
print()


## Advanced Usage Examples

### Running Specific Datasources
To run only specific datasources, adjust the configuration and then run just those cells (or comment out the others):

```python
# Example: Only backfill VFR data for 90 days
BACKFILL_DAYS = 90
SPECIFIC_CHAIN = None  
SPECIFIC_PROTOCOL = None
```
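The six backfill cells differ mainly in the execute module and the name of the lookback constant they patch. A table-driven runner built on `contextlib.ExitStack` could replace the copy-pasted cells and make it easy to run only the datasources you need. This is a sketch, not an op_analytics API: `BACKFILL_CONSTANTS`, `run_backfill`, and `demo_execute` are hypothetical names, while the constant names are the ones the cells above patch.

```python
import types
from contextlib import ExitStack
from unittest.mock import patch

# Lookback constants patched by each cell above. The mapping itself is a
# hypothetical convenience for this sketch.
BACKFILL_CONSTANTS = {
    "vfr": "TABLE_LAST_N_DAYS",
    "protocols_tvl": "TVL_TABLE_LAST_N_DAYS",
    "chain_tvl": "TVL_TABLE_LAST_N_DAYS",
    "stablecoins": "BALANCES_TABLE_LAST_N_DAYS",
    "yield_pools": "YIELD_TABLE_LAST_N_DAYS",
    "lend_borrow": "LEND_BORROW_TABLE_LAST_N_DAYS",
}


def run_backfill(name, execute_module, days, extra_patches=()):
    """Call execute_module.execute_pull() with its lookback constant set to `days`.

    extra_patches is an iterable of (target, attribute, replacement) tuples.
    Every patch is undone when the ExitStack closes, even if execute_pull() raises.
    """
    constant = BACKFILL_CONSTANTS[name]
    with ExitStack() as stack:
        # patch.object raises AttributeError if the constant does not exist,
        # surfacing a wrong name instead of silently using the default window.
        stack.enter_context(patch.object(execute_module, constant, days))
        for target, attribute, replacement in extra_patches:
            stack.enter_context(patch.object(target, attribute, replacement))
        return execute_module.execute_pull()


# Demo against a hypothetical stand-in module, so the sketch runs without op_analytics.
demo_execute = types.ModuleType("demo_execute")
demo_execute.TABLE_LAST_N_DAYS = 7
demo_execute.execute_pull = lambda: {"days": demo_execute.TABLE_LAST_N_DAYS}

print(run_backfill("vfr", demo_execute, 90))  # {'days': 90}
print(demo_execute.TABLE_LAST_N_DAYS)         # 7
```

With the real modules, the VFR cell might reduce to something like `run_backfill("vfr", vfr_execute, BACKFILL_DAYS, [(dailydatawrite, "determine_location", mock_location)])`, appending a `marker_exists` override to `extra_patches` when `CLEAR_EXISTING_MARKERS` is set.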

### Chain-Specific Backfills
To backfill data for a specific chain:

```python
BACKFILL_DAYS = 365
SPECIFIC_CHAIN = "optimism"  # or "base", "arbitrum", etc.
SPECIFIC_PROTOCOL = None
```

### Protocol-Specific Backfills
To backfill data for a specific protocol:

```python
BACKFILL_DAYS = 365
SPECIFIC_CHAIN = None
SPECIFIC_PROTOCOL = "uniswap"  # Use the protocol slug from DeFiLlama
```
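Since the cells above don't pass `SPECIFIC_CHAIN` or `SPECIFIC_PROTOCOL` into the pull, one option is to apply them as a post-filter on row-level output. A minimal sketch, assuming a pandas DataFrame with `chain` and `protocol` columns (hypothetical column names) and `None` meaning "no filter" as in the configuration cell. Chain matching is case-insensitive because the API URLs in the VFR log use capitalized names (`Ethereum`, `Base`) while the configuration examples use lowercase (`"optimism"`); `filter_rows` and the toy data are illustrative only.

```python
import pandas as pd


def filter_rows(df, chain=None, protocol=None):
    """Hypothetical post-filter mirroring SPECIFIC_CHAIN / SPECIFIC_PROTOCOL.

    None means "no filter". Chain names are compared case-insensitively;
    protocol slugs are compared exactly.
    """
    mask = pd.Series(True, index=df.index)
    if chain is not None:
        mask &= df["chain"].str.lower() == chain.lower()
    if protocol is not None:
        mask &= df["protocol"] == protocol
    return df[mask]


# Toy data for illustration only.
toy = pd.DataFrame(
    {
        "chain": ["Optimism", "Base", "optimism"],
        "protocol": ["uniswap", "uniswap", "velodrome"],
    }
)

print(filter_rows(toy, chain="optimism").index.tolist())  # [0, 2]
print(filter_rows(toy, protocol="uniswap").index.tolist())  # [0, 1]
```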

### Tips for Large Backfills
- For backfills > 180 days, consider running datasources individually
- Monitor memory usage for very large backfills
- The protocols TVL datasource may take the longest due to API rate limits
- Some endpoints may be rate-limited; check the DeFiLlama API documentation for current limits

### Validation
After running a backfill, inspect the summary returned by each `execute_pull()` call, which every cell prints as "Result summary" when it finishes.
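If a datasource's output can be loaded back as a row-level DataFrame with a `dt` column (an assumption; the summaries returned by `execute_pull()` may not have that shape), a quick completeness check is to list the calendar days in the backfill window that have no rows. `missing_days` is a hypothetical helper, not part of op_analytics, and the toy data is illustrative.

```python
import pandas as pd


def missing_days(df, start, end, date_col="dt"):
    """List calendar days in [start, end] that have no rows in df (hypothetical check)."""
    # Normalize to midnight so timestamps within a day collapse to that day.
    present = set(pd.to_datetime(df[date_col]).dt.normalize())
    expected = pd.date_range(start=start, end=end, freq="D")
    return [day for day in expected if day not in present]


# Toy data for illustration only: 2025-01-03 is missing.
toy = pd.DataFrame({"dt": ["2025-01-01", "2025-01-02", "2025-01-04"]})
gaps = missing_days(toy, "2025-01-01", "2025-01-04")
print([day.strftime("%Y-%m-%d") for day in gaps])  # ['2025-01-03']
```

To match the configured window, `end = pd.Timestamp.today().normalize()` and `start = end - pd.Timedelta(days=BACKFILL_DAYS - 1)` cover `BACKFILL_DAYS` calendar days inclusive.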
