# Ethereum Request for Comments (ERCs) AI Agent

## 1. Ingest and Index Documents

In [29]:
import io
import logging
import zipfile

import frontmatter
import requests

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

In [31]:
def read_repo_data(repo_owner, repo_name):
    """
    Download and parse all markdown files from a GitHub repository.

    Args:
        repo_owner: GitHub username or organization
        repo_name: Repository name

    Returns:
        List of dictionaries containing file content and metadata
    """
    prefix = "https://github.com"
    url = f"{prefix}/{repo_owner}/{repo_name}/archive/refs/heads/master.zip"
    resp = requests.get(url, timeout=60)  # fail instead of hanging on a stalled connection

    if resp.status_code != 200:
        raise RuntimeError(f"Failed to download repository: {resp.status_code}")

    repository_data = []
    zf = zipfile.ZipFile(io.BytesIO(resp.content))

    for file_info in zf.infolist():
        filename = file_info.filename
        filename_lower = filename.lower()

        if not ((filename_lower.endswith(".md")) and (filename_lower.startswith("ercs-master/ercs"))):
            continue

        try:
            with zf.open(file_info) as f_in:
                content = f_in.read().decode("utf-8", errors="ignore")
                post = frontmatter.loads(content)
                data = post.to_dict()
                data["filename"] = filename
                repository_data.append(data)
        except Exception as e:
            logger.error(f"Error processing {filename}: {e}")
            continue

    zf.close()

    return repository_data
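
The file filter used in `read_repo_data` can be checked without network access. The sketch below builds a small in-memory archive and applies the same case-insensitive prefix and suffix test. The helper name `select_markdown` and the file names are made up for illustration; they are not taken from the repository.

```python
import io
import zipfile


def select_markdown(zf, prefix="ercs-master/ercs"):
    """Return member names that pass the same filter as read_repo_data."""
    selected = []
    for info in zf.infolist():
        name_lower = info.filename.lower()
        if name_lower.endswith(".md") and name_lower.startswith(prefix):
            selected.append(info.filename)
    return selected


# Build a tiny archive in memory; these file names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf_out:
    zf_out.writestr("ERCs-master/ERCS/erc-20.md", "---\ntitle: Token\n---\nBody")
    zf_out.writestr("ERCs-master/README.md", "Not an ERC")
    zf_out.writestr("ERCs-master/assets/erc-20/diagram.png", "binary")

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as zf_in:
    print(select_markdown(zf_in))
# ['ERCs-master/ERCS/erc-20.md']
```

Opening the archive in a `with` block also closes it if parsing raises, which the explicit `zf.close()` call does not guarantee.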

In [32]:
erc_data = read_repo_data("ethereum", "ERCs")

In [33]:
len(erc_data)

540

In [36]:
erc_data[0]

{'eip': 1,
 'title': 'EIP Purpose and Guidelines',
 'status': 'Living',
 'type': 'Meta',
 'author': 'Martin Becze <mb@ethereum.org>, Hudson Jameson <hudson@ethereum.org>, et al.',
 'created': datetime.date(2015, 10, 27),
 'filename': 'ERCs-master/ERCS/eip-1.md'}

In [69]:
erc_data[30]

{'eip': 1387,
 'title': 'Merkle Tree Attestations with Privacy enabled',
 'author': 'Weiwu Zhang <a@colourful.land>, James Sangalli <j.l.sangalli@gmail.com>',
 'discussions-to': 'https://github.com/ethereum/EIPs/issues/1387',
 'status': 'Stagnant',
 'type': 'Standards Track',
 'category': 'ERC',
 'created': datetime.date(2018, 9, 8),
 'content': '### Introduction\n\nIt\'s often needed that an Ethereum smart contract must verify a claim (I live in Australia) attested by a valid attester.\n\nFor example, an ICO contract might require that the participant, Alice, lives in Australia before she participates. Alice\'s claim of residency could come from a local Justice of the Peace who could attest that "Alice is a resident of Australia in NSW".\n\nUnlike previous attempts, we assume that the attestation is signed and issued off the blockchain in a Merkle Tree format. Only a part of the Merkle tree is revealed by Alice at each use. Therefore we avoid the privacy problem often associated with 

In [70]:
for record in erc_data:
    print(record['filename'])

ERCs-master/ERCS/eip-1.md
ERCs-master/ERCS/erc-1046.md
ERCs-master/ERCS/erc-1056.md
ERCs-master/ERCS/erc-1062.md
ERCs-master/ERCS/erc-1066.md
ERCs-master/ERCS/erc-1077.md
ERCs-master/ERCS/erc-1078.md
ERCs-master/ERCS/erc-1080.md
ERCs-master/ERCS/erc-1081.md
ERCs-master/ERCS/erc-1123.md
ERCs-master/ERCS/erc-1129.md
ERCs-master/ERCS/erc-1132.md
ERCs-master/ERCS/erc-1154.md
ERCs-master/ERCS/erc-1155.md
ERCs-master/ERCS/erc-1167.md
ERCs-master/ERCS/erc-1175.md
ERCs-master/ERCS/erc-1178.md
ERCs-master/ERCS/erc-1185.md
ERCs-master/ERCS/erc-1191.md
ERCs-master/ERCS/erc-1202.md
ERCs-master/ERCS/erc-1203.md
ERCs-master/ERCS/erc-1207.md
ERCs-master/ERCS/erc-1261.md
ERCs-master/ERCS/erc-1271.md
ERCs-master/ERCS/erc-1319.md
ERCs-master/ERCS/erc-1328.md
ERCs-master/ERCS/erc-1337.md
ERCs-master/ERCS/erc-1363.md
ERCs-master/ERCS/erc-137.md
ERCs-master/ERCS/erc-1386.md
ERCs-master/ERCS/erc-1387.md
ERCs-master/ERCS/erc-1388.md
ERCs-master/ERCS/erc-1417.md
ERCs-master/ERCS/erc-1438.md
ERCs-master/ERCS/e

## 2. Chunking and Intelligent Processing for Data

### 2.1 Splitting by Paragraphs

In [113]:
import re
import textwrap

text = erc_data[45]['content']
paragraphs = re.split(r"\n\s*\n", text.strip())

In [114]:
len(paragraphs)

60

In [115]:
paragraphs[0]

'## Simple Summary\nMake smart contracts (e.g. dapps) accessible to non-ether users by allowing contracts to accept "[collect-calls](https://en.wikipedia.org/wiki/Collect_call)", paying for incoming calls. \nLet contracts "listen" on publicly accessible channels (e.g. web URL or a whisper address). \nIncentivize nodes to run "gas stations" to facilitate this. \nRequire no network changes, and minimal contract changes.'

In [116]:
print(textwrap.fill(paragraphs[0], width=100))

## Simple Summary Make smart contracts (e.g. dapps) accessible to non-ether users by allowing
contracts to accept "[collect-calls](https://en.wikipedia.org/wiki/Collect_call)", paying for
incoming calls.  Let contracts "listen" on publicly accessible channels (e.g. web URL or a whisper
address).  Incentivize nodes to run "gas stations" to facilitate this.  Require no network changes,
and minimal contract changes.


In [117]:
print(textwrap.fill(paragraphs[1], width=100))

## Abstract Communicating with dapps currently requires paying ETH for gas, which limits dapp
adoption to ether users.  Therefore, contract owners may wish to pay for the gas to increase user
acquisition, or let their users pay for gas with fiat money.  Alternatively, a 3rd party may wish to
subsidize the gas costs of certain contracts.  Solutions such as described in
[EIP-1077](./eip-1077.md) could allow transactions from addresses that hold no ETH.


In [120]:
print(textwrap.fill(paragraphs[58], width=100))

A working implementation of the [**gas stations network**](https://github.com/tabookey-dev/tabookey-
gasless) is being developed by **TabooKey**. It consists of `RelayHub`, `RelayRecipient`, `web3
hooks`, an implementation of a gas station inside `geth`, and sample dapps using the gas stations
network.


In [121]:
print(textwrap.fill(paragraphs[59], width=100))

## Copyright Copyright and related rights waived via [CC0](../LICENSE.md).


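Splitting by paragraphs doesn't make sense here because the subject of the text is lost: an isolated paragraph such as `paragraphs[58]` carries no indication of the section it came from.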

### 2.2 Sliding Window Chunking

In [37]:
def sliding_window(seq, size, step):
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")

    n = len(seq)
    result = []
    for i in range(0, n, step):
        chunk = seq[i:i+size]
        result.append({'start': i, 'chunk': chunk})
        if i + size >= n:
            break

    return result
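
A small worked example makes the overlap easier to see. The function is repeated in condensed form so the block runs on its own; the ten-letter input is only an illustration.

```python
def sliding_window(seq, size, step):
    # Same logic as the function above, repeated so this block is self-contained.
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    result = []
    for i in range(0, len(seq), step):
        result.append({"start": i, "chunk": seq[i:i + size]})
        if i + size >= len(seq):
            break  # the window already reaches the end of the sequence
    return result


for item in sliding_window("abcdefghij", size=4, step=2):
    print(item["start"], item["chunk"])
# 0 abcd
# 2 cdef
# 4 efgh
# 6 ghij
```

With `step` equal to half of `size`, each chunk repeats the second half of the previous one. The same happens with the values 2000 and 1000 used below.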

In [38]:
erc_data_chunks = []

for doc in erc_data:
    doc_copy = doc.copy()
    doc_content = doc_copy.pop('content')
    chunks = sliding_window(doc_content, 2000, 1000)
    for chunk in chunks:
        chunk.update(doc_copy)
    erc_data_chunks.extend(chunks)

In [39]:
len(erc_data_chunks)

7022

In [92]:
erc_data_chunks[0]

{'start': 0,
 'chunk': '## What is an EIP?\n\nEIP stands for Ethereum Improvement Proposal. An EIP is a design document providing information to the Ethereum community, or describing a new feature for Ethereum or its processes or environment. The EIP should provide a concise technical specification of the feature and a rationale for the feature. The EIP author is responsible for building consensus within the community and documenting dissenting opinions.\n\n## EIP Rationale\n\nWe intend EIPs to be the primary mechanisms for proposing new features, for collecting community technical input on an issue, and for documenting the design decisions that have gone into Ethereum. Because the EIPs are maintained as text files in a versioned repository, their revision history is the historical record of the feature proposal.\n\nFor Ethereum implementers, EIPs are a convenient way to track the progress of their implementation. Ideally each implementation maintainer would list the EIPs that they hav

In [93]:
len(erc_data_chunks[0]["chunk"])

2000

In [96]:
print(erc_data_chunks[0]["chunk"], sep="\n")

## What is an EIP?

EIP stands for Ethereum Improvement Proposal. An EIP is a design document providing information to the Ethereum community, or describing a new feature for Ethereum or its processes or environment. The EIP should provide a concise technical specification of the feature and a rationale for the feature. The EIP author is responsible for building consensus within the community and documenting dissenting opinions.

## EIP Rationale

We intend EIPs to be the primary mechanisms for proposing new features, for collecting community technical input on an issue, and for documenting the design decisions that have gone into Ethereum. Because the EIPs are maintained as text files in a versioned repository, their revision history is the historical record of the feature proposal.

For Ethereum implementers, EIPs are a convenient way to track the progress of their implementation. Ideally each implementation maintainer would list the EIPs that they have implemented. This will give en

In [81]:
print(erc_data_chunks[1]["chunk"], sep="\n")

d users a convenient way to know the current status of a given implementation or library.

## EIP Types

There are three types of EIP:

- A **Standards Track EIP** describes any change that affects most or all Ethereum implementations, such as—a change to the network protocol, a change in block or transaction validity rules, proposed application standards/conventions, or any change or addition that affects the interoperability of applications using Ethereum. Standards Track EIPs consist of three parts—a design document, an implementation, and (if warranted) an update to the [formal specification](https://github.com/ethereum/yellowpaper). Furthermore, Standards Track EIPs can be broken down into the following categories:
  - **Core**: improvements requiring a consensus fork (e.g. [EIP-5](./eip-5.md), [EIP-101](./eip-101.md)), as well as changes that are not necessarily consensus critical but may be relevant to [“core dev” discussions](https://github.com/ethereum/pm) (for example, [EIP-9

In [82]:
print(erc_data_chunks[2]["chunk"], sep="\n")

0], and the miner/node strategy changes 2, 3, and 4 of [EIP-86](./eip-86.md)).
  - **Networking**: includes improvements around [devp2p](https://github.com/ethereum/devp2p/blob/readme-spec-links/rlpx.md) ([EIP-8](./eip-8.md)) and [Light Ethereum Subprotocol](https://ethereum.org/en/developers/docs/nodes-and-clients/#light-node), as well as proposed improvements to network protocol specifications of [whisper](https://github.com/ethereum/go-ethereum/issues/16013#issuecomment-364639309) and [swarm](https://github.com/ethereum/go-ethereum/pull/2959).
  - **Interface**: includes improvements around language-level standards like method names ([EIP-6](./eip-6.md)) and [contract ABIs](https://docs.soliditylang.org/en/develop/abi-spec.html).
  - **ERC**: application-level standards and conventions, including contract standards such as token standards ([ERC-20](./eip-20.md)), name registries ([ERC-137](./eip-137.md)), URI schemes, library/package formats, and wallet formats.

- A **Meta EIP** de

The sliding-window strategy did not work well for this dataset. Chunks often start or end in the middle of a sentence, or even a word (chunk 1 starts with "d users", cut off from "end users"), which makes them less coherent. A better approach would be to split the text at natural boundaries, such as sections, paragraphs, or sentences, rather than at fixed character counts, so that each chunk keeps its context and meaning.
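
One way to respect natural boundaries while still bounding chunk size is to pack whole paragraphs greedily. This is a minimal sketch, not the approach used in the rest of the notebook: the helper `pack_paragraphs` and its `max_chars` parameter are assumptions, and it produces no overlap between chunks.

```python
import re


def pack_paragraphs(text, max_chars=2000):
    """Greedily pack whole paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p]
    chunks = []
    current = []
    current_len = 0
    for paragraph in paragraphs:
        # +2 accounts for the blank line used when paragraphs are rejoined.
        extra = len(paragraph) + (2 if current else 0)
        if current and current_len + extra > max_chars:
            chunks.append("\n\n".join(current))
            current = []
            current_len = 0
            extra = len(paragraph)
        current.append(paragraph)
        current_len += extra
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "## A\n\nFirst paragraph.\n\nSecond paragraph.\n\n## B\n\nThird paragraph."
for chunk in pack_paragraphs(sample, max_chars=30):
    print(repr(chunk))
```

No chunk ends mid-sentence, but a heading can still be separated from the paragraph that follows it, which is one reason to try splitting by sections next.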

### 2.3 Splitting by Sections

In [97]:
import re

def split_markdown_by_level(text, level=2):
    """
    Split markdown text by a specific header level.
    
    :param text: Markdown text as a string
    :param level: Header level to split on
    :return: List of sections as strings
    """
    # This regex matches markdown headers
    # For level 2, it matches lines starting with "## "
    header_pattern = r'^(#{' + str(level) + r'} )(.+)$'
    pattern = re.compile(header_pattern, re.MULTILINE)

    # Split and keep the headers
    parts = pattern.split(text)
    
    sections = []
    for i in range(1, len(parts), 3):
        # We step by 3 because regex.split() with
        # capturing groups returns:
        # [before_match, group1, group2, after_match, ...]
        # here group1 is "## ", group2 is the header text
        header = parts[i] + parts[i+1]  # "## " + "Title"
        header = header.strip()

        # Get the content after this header
        content = ""
        if i+2 < len(parts):
            content = parts[i+2].strip()

        if content:
            section = f'{header}\n\n{content}'
        else:
            section = header
        sections.append(section)
    
    return sections
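
The step of 3 in the loop follows from how `re.split` behaves when the pattern has capturing groups. The short example below shows the raw list for a made-up document.

```python
import re

# Same pattern that split_markdown_by_level builds for level=2.
pattern = re.compile(r"^(#{2} )(.+)$", re.MULTILINE)

text = "intro\n## A\nbody a\n### A.1\nnested\n## B\nbody b"
parts = pattern.split(text)
print(parts)
# ['intro\n', '## ', 'A', '\nbody a\n### A.1\nnested\n', '## ', 'B', '\nbody b']
```

Two details are visible here. The level-3 header `### A.1` is not matched, so it stays inside the body of section A. The text before the first level-2 header (`parts[0]`) is never visited by a loop that starts at index 1, so it is dropped, and a document with no level-2 header yields no sections at all.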

In [99]:
erc_data_chunks_sections = []

for doc in erc_data:
    doc_copy = doc.copy()
    doc_content = doc_copy.pop('content')
    sections = split_markdown_by_level(doc_content, level=2)
    for section in sections:
        section_doc = doc_copy.copy()
        section_doc['section'] = section
        erc_data_chunks_sections.append(section_doc)

In [100]:
len(erc_data_chunks_sections)

4141

In [101]:
erc_data_chunks_sections[0]

{'eip': 1,
 'title': 'EIP Purpose and Guidelines',
 'status': 'Living',
 'type': 'Meta',
 'author': 'Martin Becze <mb@ethereum.org>, Hudson Jameson <hudson@ethereum.org>, et al.',
 'created': datetime.date(2015, 10, 27),
 'filename': 'ERCs-master/ERCS/eip-1.md',
 'section': '## What is an EIP?\n\nEIP stands for Ethereum Improvement Proposal. An EIP is a design document providing information to the Ethereum community, or describing a new feature for Ethereum or its processes or environment. The EIP should provide a concise technical specification of the feature and a rationale for the feature. The EIP author is responsible for building consensus within the community and documenting dissenting opinions.'}

In [103]:
import textwrap

In [104]:
print(textwrap.fill(erc_data_chunks_sections[0]["section"], width=100))

## What is an EIP?  EIP stands for Ethereum Improvement Proposal. An EIP is a design document
providing information to the Ethereum community, or describing a new feature for Ethereum or its
processes or environment. The EIP should provide a concise technical specification of the feature
and a rationale for the feature. The EIP author is responsible for building consensus within the
community and documenting dissenting opinions.


In [106]:
print(textwrap.fill(erc_data_chunks_sections[1]["section"], width=100))

## EIP Rationale  We intend EIPs to be the primary mechanisms for proposing new features, for
collecting community technical input on an issue, and for documenting the design decisions that have
gone into Ethereum. Because the EIPs are maintained as text files in a versioned repository, their
revision history is the historical record of the feature proposal.  For Ethereum implementers, EIPs
are a convenient way to track the progress of their implementation. Ideally each implementation
maintainer would list the EIPs that they have implemented. This will give end users a convenient way
to know the current status of a given implementation or library.


In [107]:
print(textwrap.fill(erc_data_chunks_sections[3]["section"], width=100))

## EIP Work Flow  ### Shepherding an EIP  Parties involved in the process are you, the champion or
*EIP author*, the [*EIP editors*](#eip-editors), and the [*Ethereum Core
Developers*](https://github.com/ethereum/pm).  Before you begin writing a formal EIP, you should vet
your idea. Ask the Ethereum community first if an idea is original to avoid wasting time on
something that will be rejected based on prior research. It is thus recommended to open a discussion
thread on [the Ethereum Magicians forum](https://ethereum-magicians.org/) to do this.  Once the idea
has been vetted, your next responsibility will be to present (by means of an EIP) the idea to the
reviewers and all interested parties, invite editors, developers, and the community to give feedback
on the aforementioned channels. You should try and gauge whether the interest in your EIP is
commensurate with both the work involved in implementing it and how many parties will have to
conform to it. For example, the work required f

In [108]:
print(erc_data_chunks_sections[3]["section"], sep="\n")

## EIP Work Flow

### Shepherding an EIP

Parties involved in the process are you, the champion or *EIP author*, the [*EIP editors*](#eip-editors), and the [*Ethereum Core Developers*](https://github.com/ethereum/pm).

Before you begin writing a formal EIP, you should vet your idea. Ask the Ethereum community first if an idea is original to avoid wasting time on something that will be rejected based on prior research. It is thus recommended to open a discussion thread on [the Ethereum Magicians forum](https://ethereum-magicians.org/) to do this.

Once the idea has been vetted, your next responsibility will be to present (by means of an EIP) the idea to the reviewers and all interested parties, invite editors, developers, and the community to give feedback on the aforementioned channels. You should try and gauge whether the interest in your EIP is commensurate with both the work involved in implementing it and how many parties will have to conform to it. For example, the work required f

In [122]:
len(erc_data_chunks_sections[3]["section"])

5057

Splitting by sections seems to work well, but more cases need to be verified. Some sections are large: the one inspected last has 5,057 characters.
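
A quick way to verify more cases is to summarise section lengths instead of inspecting sections one at a time. The helper `length_report` and the 2000-character limit are assumptions for illustration (the limit is borrowed from the sliding-window size), and the demo strings stand in for real sections.

```python
import statistics


def length_report(sections, max_chars=2000):
    """Summarise section lengths; expects a non-empty list of strings."""
    lengths = [len(s) for s in sections]
    return {
        "count": len(lengths),
        "min": min(lengths),
        "median": statistics.median(lengths),
        "max": max(lengths),
        "over_limit": sum(1 for n in lengths if n > max_chars),
    }


# Stand-in data; in the notebook this could be
# [d["section"] for d in erc_data_chunks_sections].
demo_sections = ["a" * 300, "b" * 1200, "c" * 5057]
print(length_report(demo_sections))
# {'count': 3, 'min': 300, 'median': 1200, 'max': 5057, 'over_limit': 1}
```

Sections counted under `over_limit` are candidates for a second pass, for example splitting them again with `split_markdown_by_level(..., level=3)`.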

### 2.4 Intelligent Chunking with LLM

In [7]:
import google.generativeai as genai
import os

from dotenv import load_dotenv

load_dotenv()

try:
    api_key = os.getenv("GOOGLE_API_KEY")
    if not api_key:
        raise ValueError("The environment variable GOOGLE_API_KEY was not found.")
    
    genai.configure(api_key=api_key)
    print("✅ Google AI configured successfully!")

except ValueError as e:
    logger.error(e)

✅ Google AI configured successfully!


In [8]:
def llm(prompt, model='gemini-2.5-flash'):
    """
    Sends a prompt to the Gemini model and returns the response as text.

    Args:
        prompt (str): The text to send to the model.
        model (str): The Gemini model name to use.
                     Defaults to 'gemini-2.5-flash'.

    Returns:
        str: The generated text response from the model.
    """
    model_instance = genai.GenerativeModel(model)

    response = model_instance.generate_content(prompt)
    
    return response.text

In [9]:
prompt_template = """
    Split the provided document into logical sections that make sense for a Q&A system.
    
    Each section should be self-contained and cover a specific topic or concept.
    
    <DOCUMENT>
    {document}
    </DOCUMENT>
    
    Use this format:
    
    ## Section Name
    
    Section content with all relevant details
    
    ---
    
    ## Another Section Name
    
    Another section content
    
    ---
""".strip()


In [None]:
def intelligent_chunking(text):
    prompt = prompt_template.format(document=text)
    response = llm(prompt)
    sections = response.split('---')
    sections = [s.strip() for s in sections if s.strip()]

    return sections
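
Splitting on every occurrence of `---` can break sections that contain markdown tables, because table delimiter rows such as `|---|---|` also contain that sequence. The sketch below splits only on lines that consist of `---` alone. The names `SEPARATOR`, `naive_split`, and `split_sections` and the sample response are made up for illustration.

```python
import re

# A separator is a line containing only '---', optionally indented.
SEPARATOR = re.compile(r"^[ \t]*---[ \t]*$", re.MULTILINE)


def naive_split(response_text):
    # Mirrors the splitting used in intelligent_chunking.
    return [s.strip() for s in response_text.split("---") if s.strip()]


def split_sections(response_text):
    return [s.strip() for s in SEPARATOR.split(response_text) if s.strip()]


sample = (
    "## Fields\n\n| name | type |\n|---|---|\n| id | uint256 |\n\n---\n\n"
    "## Notes\n\nSome text\n\n---"
)
print(len(naive_split(sample)))     # 4
print(len(split_sections(sample)))  # 2
```

This still treats a markdown thematic break inside a section as a separator, so it is a partial safeguard rather than a complete parser.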

In [21]:
from tqdm.auto import tqdm

erc_data_ai_chunks = []

for doc in tqdm(erc_data[:10]):  # Limiting to first 10 documents for cost and time control
    doc_copy = doc.copy()
    doc_content = doc_copy.pop('content')

    sections = intelligent_chunking(doc_content)
    for section in sections:
        section_doc = doc_copy.copy()
        section_doc['section'] = section
        erc_data_ai_chunks.append(section_doc)

In [22]:
len(erc_data_ai_chunks)

256

In [23]:
erc_data_ai_chunks[0]

{'eip': 1,
 'title': 'EIP Purpose and Guidelines',
 'status': 'Living',
 'type': 'Meta',
 'author': 'Martin Becze <mb@ethereum.org>, Hudson Jameson <hudson@ethereum.org>, et al.',
 'created': datetime.date(2015, 10, 27),
 'filename': 'ERCs-master/ERCS/eip-1.md',
 'section': '## What is an EIP?\n\nAn EIP, or Ethereum Improvement Proposal, is a design document that provides information to the Ethereum community or describes a new feature for Ethereum, its processes, or environment. It should contain a concise technical specification of the feature and its rationale. The EIP author is responsible for building community consensus and documenting dissenting opinions.'}

In [24]:
erc_data_ai_chunks[1]

{'eip': 1,
 'title': 'EIP Purpose and Guidelines',
 'status': 'Living',
 'type': 'Meta',
 'author': 'Martin Becze <mb@ethereum.org>, Hudson Jameson <hudson@ethereum.org>, et al.',
 'created': datetime.date(2015, 10, 27),
 'filename': 'ERCs-master/ERCS/eip-1.md',
 'section': "## EIP Rationale and Purpose\n\nEIPs are intended to be the primary mechanism for proposing new features, gathering technical input from the community, and documenting Ethereum's design decisions. Since EIPs are maintained as text files in a versioned repository, their revision history serves as a historical record of feature proposals. For Ethereum implementers, EIPs offer a convenient way to track implementation progress, allowing them to list implemented EIPs for end-users to check the status of a given implementation or library."}

In [25]:
erc_data_ai_chunks[2]

{'eip': 1,
 'title': 'EIP Purpose and Guidelines',
 'status': 'Living',
 'type': 'Meta',
 'author': 'Martin Becze <mb@ethereum.org>, Hudson Jameson <hudson@ethereum.org>, et al.',
 'created': datetime.date(2015, 10, 27),
 'filename': 'ERCs-master/ERCS/eip-1.md',
 'section': '## EIP Types\n\nThere are three main types of EIPs, each serving a distinct purpose:\n\n*   **Standards Track EIP**: Describes changes affecting most or all Ethereum implementations. This includes network protocol changes, block/transaction validity rules, application standards, or any change impacting interoperability. Standards Track EIPs comprise a design document, an implementation, and sometimes an update to the formal specification. They are further categorized into:\n    *   **Core**: Improvements requiring a consensus fork or relevant to "core dev" discussions (e.g., EIP-5, EIP-101). If a Core EIP proposes changes to the EVM, it must refer to instructions by their mnemonics and define their opcodes (e.g., `

In [26]:
erc_data_ai_chunks[3]

{'eip': 1,
 'title': 'EIP Purpose and Guidelines',
 'status': 'Living',
 'type': 'Meta',
 'author': 'Martin Becze <mb@ethereum.org>, Hudson Jameson <hudson@ethereum.org>, et al.',
 'created': datetime.date(2015, 10, 27),
 'filename': 'ERCs-master/ERCS/eip-1.md',
 'section': "## EIP Work Flow: Shepherding an EIP\n\nThe EIP work flow involves the EIP author (champion), EIP editors, and Ethereum Core Developers.\n\n1.  **Vetting the Idea**: Before writing a formal EIP, authors should vet their idea by opening a discussion thread on the Ethereum Magicians forum to check for originality and avoid redundant work.\n2.  **Presenting the Idea and Gathering Feedback**: Once vetted, the author presents the idea via an EIP to reviewers and interested parties, inviting feedback from editors, developers, and the community. Authors should gauge if community interest justifies the implementation work and the number of parties required to conform. Negative community feedback can prevent an EIP from adv

In [27]:
print(erc_data_ai_chunks[3]["section"], sep="\n")

## EIP Work Flow: Shepherding an EIP

The EIP work flow involves the EIP author (champion), EIP editors, and Ethereum Core Developers.

1.  **Vetting the Idea**: Before writing a formal EIP, authors should vet their idea by opening a discussion thread on the Ethereum Magicians forum to check for originality and avoid redundant work.
2.  **Presenting the Idea and Gathering Feedback**: Once vetted, the author presents the idea via an EIP to reviewers and interested parties, inviting feedback from editors, developers, and the community. Authors should gauge if community interest justifies the implementation work and the number of parties required to conform. Negative community feedback can prevent an EIP from advancing past the Draft stage.
3.  **Community Consensus**: The champion's role is to write the EIP in the prescribed format, lead discussions in appropriate forums, and build community consensus around the idea.


In [28]:
len(erc_data_ai_chunks[3]["section"])

929

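The AI-generated section is much shorter than the one produced by header splitting (929 vs. 5,057 characters). The two are not directly comparable, since the LLM appears to have split the workflow into smaller sections, but the wording has clearly been paraphrased rather than copied verbatim. Did we lose a lot of information?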
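
A rough way to start answering that question is to compare each original document with the sections generated from it. The two measures below are simple heuristics, not a definitive evaluation, and the function names and sample strings are assumptions for illustration.

```python
import re


def _words(text):
    # Lowercased alphanumeric tokens, deduplicated.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_coverage(original, sections):
    """Fraction of distinct words in `original` that also appear in `sections`."""
    original_words = _words(original)
    if not original_words:
        return 1.0
    section_words = _words(" ".join(sections))
    return len(original_words & section_words) / len(original_words)


def length_ratio(original, sections):
    """Total characters in `sections` divided by characters in `original`."""
    total = sum(len(s) for s in sections)
    return total / len(original) if original else 1.0


original = "EIP stands for Ethereum Improvement Proposal. It is a design document."
sections = ["An EIP, or Ethereum Improvement Proposal, is a design document."]
print(round(word_coverage(original, sections), 2))  # 0.73
print(round(length_ratio(original, sections), 2))
```

In the notebook, `original` would be a document's `content` and `sections` the entries of `erc_data_ai_chunks` that share its `filename`. Low values suggest the LLM summarised the text instead of splitting it.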