radmanesh/design2code-agent


Design2Code Agent

A purple agent for the Design2Code benchmark that generates HTML code from screenshot images using GPT-4o Vision. This agent implements the A2A (Agent-to-Agent) protocol and can be submitted to the Design2Code leaderboard.

Project Structure

src/
├─ server.py      # Server setup and agent card configuration
├─ executor.py    # A2A request handling
├─ agent.py       # Your agent implementation goes here
└─ messenger.py   # A2A messaging utilities
tests/
└─ test_agent.py  # Agent tests
Dockerfile        # Docker configuration
pyproject.toml    # Python dependencies
.github/
└─ workflows/
   └─ test-and-publish.yml # CI workflow

Overview

This Design2Code agent:

  • Receives screenshot images embedded in messages via <screenshot_base64>...</screenshot_base64> tags
  • Uses GPT-4o Vision (via LiteLLM) to analyze screenshots and generate HTML code
  • Returns self-contained HTML that recreates the visual appearance of the screenshot
  • Wraps HTML output in <html_code>...</html_code> tags as required by the evaluator
  • Maintains conversation history for multi-turn interactions

Getting Started

Fork and Setup

  1. Fork this repository to your GitHub account

  2. Set up environment variables:

    # Create a .env file (or export in your shell)
    echo "OPENAI_API_KEY=your-openai-api-key-here" > .env
  3. Install dependencies:

    uv sync
  4. Run the agent locally (see Running Locally below)

  5. Test your agent (see Testing below)

Submitting to the Leaderboard

To submit this agent to the Design2Code leaderboard:

  1. Register your agent on AgentBeats:

    • Deploy your agent (see Publishing below for Docker deployment)
    • Register it on the AgentBeats platform to obtain your agentbeats_id
  2. Fork the leaderboard repository:

  3. Configure your submission:

    • Edit scenario.toml in the leaderboard repository
    • Add your agentbeats_id under [[participants]]
    • Set name = "agent" (required)
    • Add your OPENAI_API_KEY as a GitHub secret
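
A minimal `[[participants]]` entry might look like the following. Only the fields mentioned above are shown; the leaderboard repository's scenario.toml defines the full schema:

```toml
# Illustrative participant entry for scenario.toml
[[participants]]
name = "agent"                        # required value per the leaderboard
agentbeats_id = "your-agentbeats-id"  # obtained when registering on AgentBeats
```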
  4. Push to trigger evaluation:

    • Push your changes to the leaderboard repository
    • GitHub Actions will automatically run the evaluation

For detailed submission instructions, see the leaderboard repository README.

Running Locally

# Install dependencies
uv sync

# Run the server (default: http://127.0.0.1:9009)
uv run src/server.py

# Or with custom options
uv run src/server.py --host 0.0.0.0 --port 9009 --agent-llm openai/gpt-4o

Configuration Options

  • --host: Host to bind the server (default: 127.0.0.1)
  • --port: Port to bind the server (default: 9009)
  • --card-url: External URL for the agent card (for deployment)
  • --agent-llm: LLM model to use (default: openai/gpt-4o)
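
These flags could be wired up with a standard argparse parser. This is a sketch matching the documented defaults; the actual parser in src/server.py may be organized differently:

```python
import argparse

# Sketch of an argument parser matching the documented flags.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Design2Code agent server")
    parser.add_argument("--host", default="127.0.0.1",
                        help="Host to bind the server")
    parser.add_argument("--port", type=int, default=9009,
                        help="Port to bind the server")
    parser.add_argument("--card-url", default=None,
                        help="External URL for the agent card (for deployment)")
    parser.add_argument("--agent-llm", default="openai/gpt-4o",
                        help="LLM model to use")
    return parser
```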

Environment Variables

The agent requires the following environment variable:

  • OPENAI_API_KEY: Your OpenAI API key for GPT-4o Vision access

You can set this in a .env file (loaded automatically) or export it:

export OPENAI_API_KEY=your-api-key-here

Running with Docker

# Build the image
docker build -t design2code-agent .

# Run the container (with API key from environment)
docker run -p 9009:9009 -e OPENAI_API_KEY=your-api-key-here design2code-agent

# Or with custom port
docker run -p 9019:9019 -e OPENAI_API_KEY=your-api-key-here design2code-agent --port 9019

Testing

The repository includes A2A conformance tests and a screenshot generation test.

# Install test dependencies
uv sync --extra test

# Start your agent in one terminal (see Running Locally above)

# Run all tests in another terminal
uv run pytest --agent-url http://localhost:9009

# Run specific test
uv run pytest tests/test_agent.py::test_screenshot_generation --agent-url http://localhost:9009

# Run with verbose output
uv run pytest -v --agent-url http://localhost:9009

Note: The screenshot test requires OPENAI_API_KEY to be set, as it makes real API calls to test the full generation pipeline.

Publishing

The repository includes a GitHub Actions workflow that automatically builds, tests, and publishes a Docker image to GitHub Container Registry.

GitHub Secrets

Add your API key as a repository secret:

  1. Go to Settings → Secrets and variables → Actions
  2. Click "New repository secret"
  3. Name: OPENAI_API_KEY
  4. Value: Your OpenAI API key
  5. Click "Add secret"

This secret will be available to CI/CD workflows and can be used when deploying.

Docker Image Tags

  • Push to main → publishes latest tag:

    ghcr.io/<your-username>/design2code-agent:latest
    
  • Create a git tag (e.g. git tag v1.0.0 && git push origin v1.0.0) → publishes version tags:

    ghcr.io/<your-username>/design2code-agent:1.0.0
    ghcr.io/<your-username>/design2code-agent:1
    

Once the workflow completes, find your Docker image in the Packages section (right sidebar). Configure package visibility in package settings if needed.

Deployment

After publishing, you can deploy the containerized agent:

# Pull and run from GitHub Container Registry
docker run -p 9009:9009 \
  -e OPENAI_API_KEY=your-api-key-here \
  ghcr.io/<your-username>/design2code-agent:latest

Note: Organization repositories may need package write permissions enabled manually (Settings → Actions → General). Version tags must follow semantic versioning (e.g., v1.0.0).

Agent Requirements for Leaderboard

To participate in the Design2Code leaderboard, your agent must:

  • Accept screenshots: Receive images via <screenshot_base64>...</screenshot_base64> tags ✅
  • Generate HTML: Produce HTML that recreates the visual appearance ✅
  • Format output: Wrap HTML in <html_code>...</html_code> tags ✅
  • Self-contained: Include all CSS within the HTML file (no external dependencies) ✅
  • Image placeholders: Use "rick.jpg" as placeholder for images ✅
  • Vision model: Use a vision-capable LLM (e.g., GPT-4o Vision) ✅
  • A2A compliance: Follow the A2A protocol format ✅

This agent meets all these requirements and is ready for submission.
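
As an illustration, a few of these output requirements can be checked mechanically. These checks are a sketch only; the evaluator's own validation is authoritative:

```python
import re

# Illustrative checks mirroring the listed output requirements.
def check_output(reply: str) -> list[str]:
    problems = []
    m = re.search(r"<html_code>(.*?)</html_code>", reply, re.DOTALL)
    if not m:
        problems.append("missing <html_code>...</html_code> tags")
        return problems
    html = m.group(1)
    if re.search(r'<link[^>]+rel=["\']stylesheet', html, re.IGNORECASE):
        problems.append("external stylesheet found; CSS must be inline")
    if re.search(r'<img[^>]+src=["\'](?!rick\.jpg)', html, re.IGNORECASE):
        problems.append('images must use "rick.jpg" as the placeholder source')
    return problems
```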

Evaluation Metrics

The Design2Code benchmark evaluates agents on five dimensions (each weighted 20%):

  • Layout Coverage: Element size and area coverage matching
  • Text Accuracy: Text content similarity using sequence matching
  • Position Accuracy: Element positioning accuracy
  • Color Accuracy: Color matching using CIEDE2000 color difference
  • Visual Similarity: Overall visual similarity using CLIP model

Final score: 0.2 × (layout + text + position + color + visual) (range: 0.0 to 1.0)
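
The scoring formula is a plain equal-weight average; for example:

```python
# Compute the final Design2Code score as an equal-weight average of the
# five dimension scores, each in [0.0, 1.0].
def final_score(layout, text, position, color, visual):
    return 0.2 * (layout + text + position + color + visual)
```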

See the leaderboard repository for more details on evaluation methodology.
