radmanesh/design2code-agent


Design2Code Agent

A purple agent for the Design2Code benchmark that generates HTML code from screenshot images using GPT-4o Vision. This agent implements the A2A (Agent-to-Agent) protocol and can be submitted to the Design2Code leaderboard.

Project Structure

src/
├─ server.py      # Server setup and agent card configuration
├─ executor.py    # A2A request handling
├─ agent.py       # Your agent implementation goes here
└─ messenger.py   # A2A messaging utilities
tests/
└─ test_agent.py  # Agent tests
Dockerfile        # Docker configuration
pyproject.toml    # Python dependencies
.github/
└─ workflows/
   └─ test-and-publish.yml # CI workflow

Overview

This Design2Code agent:

  • Receives screenshot images embedded in messages via <screenshot_base64>...</screenshot_base64> tags
  • Uses GPT-4o Vision (via LiteLLM) to analyze screenshots and generate HTML code
  • Returns self-contained HTML that recreates the visual appearance of the screenshot
  • Wraps HTML output in <html_code>...</html_code> tags as required by the evaluator
  • Maintains conversation history for multi-turn interactions

Getting Started

Fork and Setup

  1. Fork this repository to your GitHub account

  2. Set up environment variables:

    # Create a .env file (or export in your shell)
    echo "OPENAI_API_KEY=your-openai-api-key-here" > .env
  3. Install dependencies:

    uv sync
  4. Run the agent locally (see Running Locally below)

  5. Test your agent (see Testing below)

Submitting to the Leaderboard

To submit this agent to the Design2Code leaderboard:

  1. Register your agent on AgentBeats:

    • Deploy your agent (see Publishing below for Docker deployment)
    • Register it on the AgentBeats platform to obtain your agentbeats_id
  2. Fork the leaderboard repository:

  3. Configure your submission:

    • Edit scenario.toml in the leaderboard repository
    • Add your agentbeats_id under [[participants]]
    • Set name = "agent" (required)
    • Add your OPENAI_API_KEY as a GitHub secret
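
A minimal `[[participants]]` entry might look like the following. Only the fields mentioned above are shown; the leaderboard repository's scenario.toml defines the full schema:

```toml
# Illustrative participant entry for scenario.toml
[[participants]]
name = "agent"                        # required value per the leaderboard
agentbeats_id = "your-agentbeats-id"  # obtained when registering on AgentBeats
```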
  4. Push to trigger evaluation:

    • Push your changes to the leaderboard repository
    • GitHub Actions will automatically run the evaluation

For detailed submission instructions, see the leaderboard repository README.

Running Locally

# Install dependencies
uv sync

# Run the server (default: http://127.0.0.1:9009)
uv run src/server.py

# Or with custom options
uv run src/server.py --host 0.0.0.0 --port 9009 --agent-llm openai/gpt-4o

Configuration Options

  • --host: Host to bind the server (default: 127.0.0.1)
  • --port: Port to bind the server (default: 9009)
  • --card-url: External URL for the agent card (for deployment)
  • --agent-llm: LLM model to use (default: openai/gpt-4o)
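
These flags could be wired up with a standard argparse parser. This is a sketch matching the documented defaults; the actual parser in src/server.py may be organized differently:

```python
import argparse

# Sketch of an argument parser matching the documented flags.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Design2Code agent server")
    parser.add_argument("--host", default="127.0.0.1",
                        help="Host to bind the server")
    parser.add_argument("--port", type=int, default=9009,
                        help="Port to bind the server")
    parser.add_argument("--card-url", default=None,
                        help="External URL for the agent card (for deployment)")
    parser.add_argument("--agent-llm", default="openai/gpt-4o",
                        help="LLM model to use")
    return parser
```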

Environment Variables

The agent requires the following environment variable:

  • OPENAI_API_KEY: Your OpenAI API key for GPT-4o Vision access

You can set this in a .env file (loaded automatically) or export it:

export OPENAI_API_KEY=your-api-key-here

Running with Docker

# Build the image
docker build -t design2code-agent .

# Run the container (with API key from environment)
docker run -p 9009:9009 -e OPENAI_API_KEY=your-api-key-here design2code-agent

# Or with custom port
docker run -p 9019:9019 -e OPENAI_API_KEY=your-api-key-here design2code-agent --port 9019

Testing

The repository includes A2A conformance tests and a screenshot generation test.

# Install test dependencies
uv sync --extra test

# Start your agent in one terminal (see Running Locally above)

# Run all tests in another terminal
uv run pytest --agent-url http://localhost:9009

# Run specific test
uv run pytest tests/test_agent.py::test_screenshot_generation --agent-url http://localhost:9009

# Run with verbose output
uv run pytest -v --agent-url http://localhost:9009

Note: The screenshot test requires OPENAI_API_KEY to be set, as it makes real API calls to test the full generation pipeline.

Publishing

The repository includes a GitHub Actions workflow that automatically builds, tests, and publishes a Docker image to GitHub Container Registry.

GitHub Secrets

Add your API key as a repository secret:

  1. Go to Settings → Secrets and variables → Actions
  2. Click "New repository secret"
  3. Name: OPENAI_API_KEY
  4. Value: Your OpenAI API key
  5. Click "Add secret"

This secret will be available to CI/CD workflows and can be used when deploying.

Docker Image Tags

  • Push to main → publishes latest tag:

    ghcr.io/<your-username>/design2code-agent:latest
    
  • Create a git tag (e.g. git tag v1.0.0 && git push origin v1.0.0) → publishes version tags:

    ghcr.io/<your-username>/design2code-agent:1.0.0
    ghcr.io/<your-username>/design2code-agent:1
    

Once the workflow completes, find your Docker image in the Packages section (right sidebar). Configure package visibility in package settings if needed.

Deployment

After publishing, you can deploy the containerized agent:

# Pull and run from GitHub Container Registry
docker run -p 9009:9009 \
  -e OPENAI_API_KEY=your-api-key-here \
  ghcr.io/<your-username>/design2code-agent:latest

Note: Organization repositories may need package write permissions enabled manually (Settings → Actions → General). Version tags must follow semantic versioning (e.g., v1.0.0).

Agent Requirements for Leaderboard

To participate in the Design2Code leaderboard, your agent must:

  • Accept screenshots: Receive images via <screenshot_base64>...</screenshot_base64> tags ✅
  • Generate HTML: Produce HTML that recreates the visual appearance ✅
  • Format output: Wrap HTML in <html_code>...</html_code> tags ✅
  • Self-contained: Include all CSS within the HTML file (no external dependencies) ✅
  • Image placeholders: Use "rick.jpg" as placeholder for images ✅
  • Vision model: Use a vision-capable LLM (e.g., GPT-4o Vision) ✅
  • A2A compliance: Follow the A2A protocol format ✅

This agent meets all these requirements and is ready for submission.
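
As an illustration, a few of these output requirements can be checked mechanically. These checks are a sketch only; the evaluator's own validation is authoritative:

```python
import re

# Illustrative checks mirroring the listed output requirements.
def check_output(reply: str) -> list[str]:
    problems = []
    m = re.search(r"<html_code>(.*?)</html_code>", reply, re.DOTALL)
    if not m:
        problems.append("missing <html_code>...</html_code> tags")
        return problems
    html = m.group(1)
    if re.search(r'<link[^>]+rel=["\']stylesheet', html, re.IGNORECASE):
        problems.append("external stylesheet found; CSS must be inline")
    if re.search(r'<img[^>]+src=["\'](?!rick\.jpg)', html, re.IGNORECASE):
        problems.append('images must use "rick.jpg" as the placeholder source')
    return problems
```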

Evaluation Metrics

The Design2Code benchmark evaluates agents on five dimensions (each weighted 20%):

  • Layout Coverage: Element size and area coverage matching
  • Text Accuracy: Text content similarity using sequence matching
  • Position Accuracy: Element positioning accuracy
  • Color Accuracy: Color matching using CIEDE2000 color difference
  • Visual Similarity: Overall visual similarity using CLIP model

Final score: 0.2 × (layout + text + position + color + visual) (range: 0.0 to 1.0)
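
The scoring formula is a plain equal-weight average; for example:

```python
# Compute the final Design2Code score as an equal-weight average of the
# five dimension scores, each in [0.0, 1.0].
def final_score(layout, text, position, color, visual):
    return 0.2 * (layout + text + position + color + visual)
```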

See the leaderboard repository for more details on evaluation methodology.
