Multi-Agent Document Translation App with Google ADK and A2A Protocol #4

codegen-sh · 2025-06-23T08:52:49Z

🌐 Multi-Agent Document Translation App

This PR introduces a sophisticated document translation system that preserves layout integrity using Google's Agent Development Kit (ADK) and A2A protocol.

🎯 Problem Solved

Standard document translation tools often fail with visually complex documents (technical manuals, marketing brochures, academic papers with diagrams). They extract text, translate it, and try to reflow it into the document, which:

- Breaks the original layout
- Misplaces captions
- Destroys visual integrity

🏗️ Solution Architecture

3-Agent System:

📄 Document-to-Image Converter Agent
- Converts PDF pages to high-quality images
- Maintains original resolution and formatting
- Supports PDF, DOCX, DOC, TXT formats
🌐 Multimodal Translation Agent
- Uses Google Gemini Vision API for image-based translation
- Preserves layout, fonts, and visual elements
- Translates text while maintaining spatial relationships
✅ Quality Validation Agent
- Validates translation accuracy and layout preservation
- Uses structural similarity metrics and visual comparison
- Provides quality grades and improvement recommendations
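The grading step of the validation agent could be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the letter thresholds, the `summarize` helper, and the recommendation wording are all assumptions. Only the `grade` and `overall_score` keys appear elsewhere in this PR.

```python
from statistics import mean
from typing import Dict, List, Union

# Hypothetical grade thresholds; the PR does not state the real cut-offs.
GRADE_THRESHOLDS = [(0.9, "A"), (0.8, "B"), (0.7, "C"), (0.6, "D")]


def grade_for(score: float) -> str:
    """Map a score in [0, 1] to a letter grade using the assumed thresholds."""
    for minimum, grade in GRADE_THRESHOLDS:
        if score >= minimum:
            return grade
    return "F"


def summarize(page_scores: List[float]) -> Dict[str, Union[float, str]]:
    """Average per-page scores, then attach a grade and a simple recommendation."""
    if not page_scores:
        raise ValueError("at least one page score is required")
    overall = mean(page_scores)
    # Index of the lowest-scoring page, reported 1-based in the recommendation.
    weakest = min(range(len(page_scores)), key=page_scores.__getitem__)
    return {
        "overall_score": round(overall, 2),
        "grade": grade_for(overall),
        "recommendation": (
            f"Review page {weakest + 1} first "
            f"(score {page_scores[weakest]:.2f})"
        ),
    }
```

Averaging is only one possible aggregation; a stricter validator might grade on the minimum page score so that a single badly translated page cannot hide behind good ones.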

🚀 Features

Layout Preservation: Maintains original document formatting with high fidelity
Multi-format Support: PDF, DOCX, and other document formats
Quality Assurance: Built-in validation with confidence scoring
Scalable Architecture: Agent-based system using A2A protocol
Google AI Integration: Leverages Gemini's multimodal capabilities
Web Interfaces: Both FastAPI REST API and Streamlit UI
Batch Processing: Handle multiple documents efficiently
12-Language Support: Includes auto-detection
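One way batch processing like this might be structured is with bounded concurrency, so that many documents do not hit the translation backend at once. The sketch below is an assumption about the approach, not code from this PR: `translate_batch`, `fake_translate`, and the `max_concurrent` parameter are hypothetical names, and `fake_translate` stands in for the real orchestrator call.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def translate_batch(
    paths: List[str],
    translate_one: Callable[[str], Awaitable[dict]],
    max_concurrent: int = 3,
) -> Dict[str, dict]:
    """Run translate_one over paths with bounded concurrency.

    Errors are captured per file so one bad document does not fail the batch.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path: str) -> dict:
        async with semaphore:
            try:
                return await translate_one(path)
            except Exception as exc:
                return {"success": False, "error": str(exc)}

    results = await asyncio.gather(*(guarded(p) for p in paths))
    return dict(zip(paths, results))


async def fake_translate(path: str) -> dict:
    """Stand-in for the real translation call, used only for this demo."""
    await asyncio.sleep(0)
    if not path.endswith(".pdf"):
        raise ValueError(f"unsupported format: {path}")
    return {"success": True, "output": path.replace(".pdf", "_es.pdf")}


if __name__ == "__main__":
    print(asyncio.run(translate_batch(["a.pdf", "b.xyz"], fake_translate)))
```

Capturing exceptions per file keeps the result dictionary complete, which makes it easier for a UI to show which documents succeeded and which need attention.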

📁 Key Files

multi_agent_document_translator/orchestrator.py - Main orchestration logic
multi_agent_document_translator/agents/ - Individual agent implementations
multi_agent_document_translator/api.py - FastAPI web service
multi_agent_document_translator/streamlit_app.py - Web UI
multi_agent_document_translator/config.py - Configuration management

🛠️ Usage

Simple Usage:

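import asyncio

from multi_agent_document_translator import orchestrator

async def main():
    # translate_document is awaited, so it needs a running event loop.
    return await orchestrator.translate_document(
        document_path="input.pdf",
        target_language="es",  # Spanish
        output_path="translated_output.pdf",
    )

result = asyncio.run(main())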

Web API:

python multi_agent_document_translator/run_api.py
# Visit http://localhost:8000/docs for API documentation

Web UI:

python multi_agent_document_translator/run_streamlit.py
# Visit http://localhost:8501 for web interface

🧪 Testing

Comprehensive test suite included:

Agent unit tests
Orchestrator integration tests
Error handling validation
Mock API testing
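To illustrate what mock API testing can look like for an agent that calls a remote model, here is a small sketch using `unittest.mock.AsyncMock`. The `TranslationStep` class and its `client.generate` method are hypothetical stand-ins invented for this example; they are not the agent or the Gemini client API used in this PR.

```python
import asyncio
from unittest.mock import AsyncMock


class TranslationStep:
    """Hypothetical minimal stand-in for an agent that calls a remote model."""

    def __init__(self, client):
        self.client = client

    async def process(self, image_bytes: bytes, target_language: str) -> dict:
        # Delegate to the injected client so tests can swap in a mock.
        response = await self.client.generate(image_bytes, target_language)
        return {"translated_text": response["text"], "language": target_language}


def test_translation_step_uses_mocked_client():
    # AsyncMock makes client.generate awaitable without any network access.
    client = AsyncMock()
    client.generate.return_value = {"text": "Hola"}

    result = asyncio.run(TranslationStep(client).process(b"png-bytes", "es"))

    assert result == {"translated_text": "Hola", "language": "es"}
    client.generate.assert_awaited_once_with(b"png-bytes", "es")
```

Injecting the client through the constructor is the design choice that makes this kind of test possible without patching module globals.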

📋 Requirements

Python 3.8+
Google Cloud credentials
Gemini API access
See requirements.txt for full dependencies

🔧 Configuration

Copy .env.example to .env and configure:

GEMINI_API_KEY: Your Gemini API key
GOOGLE_CLOUD_PROJECT: Your GCP project ID
Other optional settings for customization
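As a rough picture of how these variables might be read at startup, the sketch below loads them into a frozen dataclass using only the standard library. The PR itself is described as using Pydantic settings, so this is a simplified substitute. `Settings`, `load_settings`, and the optional `MAX_FILE_SIZE_MB` variable with its default of 50 are assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("GEMINI_API_KEY", "GOOGLE_CLOUD_PROJECT")


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    google_cloud_project: str
    max_file_size_mb: int = 50  # hypothetical optional setting


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ), failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        google_cloud_project=env["GOOGLE_CLOUD_PROJECT"],
        # MAX_FILE_SIZE_MB is an assumed variable name, not one listed in this PR.
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "50")),
    )
```

Failing at load time with the names of the missing variables tends to be friendlier than an authentication error surfacing later, in the middle of a translation.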

This implementation provides a production-ready solution for layout-preserving document translation using cutting-edge AI and multi-agent architecture.


Summary by Sourcery

Introduce a production-ready multi-agent document translation application built on Google ADK and the A2A protocol. It provides image-based translation, layout preservation, and quality validation, and is exposed through FastAPI and Streamlit interfaces with documentation, examples, and tests.

New Features:

Implement a multi-agent orchestration system with converter, translator, and validator agents for layout-preserving document translation
Integrate Google Gemini Vision API for image-based translation while maintaining original formatting and spatial layout
Provide FastAPI REST endpoints and a Streamlit web UI to upload documents and download translated results
Support PDF, DOCX, and TXT formats with batch and simple usage examples
Include built-in quality validation with structural similarity, visual similarity, text completeness, and translation quality metrics

Documentation:

Add comprehensive README and inline documentation detailing architecture, usage, and configuration

Tests:

Add a suite of unit and integration tests for document converter, translation, and validation agents, as well as orchestrator workflows

Chores:

Add example scripts for simple and batch translation use cases
Include requirements.txt for dependencies and .env.example for environment configuration

…ocol
- Implemented 3-agent architecture for layout-preserving document translation
- Agent 1: Document-to-Image Converter (PDF, DOCX, TXT support)
- Agent 2: Multimodal Translation Agent using Google Gemini Vision
- Agent 3: Quality Validation Agent with layout preservation checks
- Added FastAPI web service and Streamlit UI
- Comprehensive configuration system with environment variables
- Batch processing capabilities and usage examples
- Full test suite for agents and orchestrator
- Support for 12 languages with auto-detection
- Quality assessment with layout similarity metrics

sourcery-ai · 2025-06-23T08:52:53Z

Reviewer's Guide

This PR implements a full multi-agent document translation pipeline using Google’s ADK and A2A protocol: it converts input documents to images, translates them via Google Gemini Vision while preserving layout, validates translation quality and layout fidelity, and exposes the workflow via FastAPI and Streamlit interfaces.

Sequence diagram for document translation workflow

sequenceDiagram
    actor User
    participant UI as Web UI/API
    participant Orchestrator
    participant Converter as DocumentConverterAgent
    participant Translator as TranslationAgent
    participant Validator as ValidationAgent
    participant Gemini as Google Gemini Vision API

    User->>UI: Upload document & request translation
    UI->>Orchestrator: translate_document(document, target_lang)
    Orchestrator->>Converter: process(document)
    Converter-->>Orchestrator: images
    Orchestrator->>Translator: process(images, target_lang)
    Translator->>Gemini: generate_content(prompt, image)
    Gemini-->>Translator: translation response
    Translator-->>Orchestrator: translated images, metadata
    Orchestrator->>Validator: process(original images, translated images, metadata)
    Validator->>Gemini: generate_content(validation prompt, images)
    Gemini-->>Validator: validation response
    Validator-->>Orchestrator: validation results
    Orchestrator-->>UI: results (output files, quality, etc.)
    UI-->>User: Download/display translated document
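The convert, translate, validate sequence in the diagram above can be sketched as a simple chain of awaited steps that pass accumulated state forward. Everything in this block is a stand-in: the three step functions fake their work rather than calling real converters or the Gemini API, and `run_pipeline` is a hypothetical name, not the orchestrator's actual method.

```python
import asyncio


async def convert(data: dict) -> dict:
    # Stand-in: pretend each document yields two page images.
    return {"images": [f"{data['document']}#page{i}" for i in (1, 2)]}


async def translate(data: dict) -> dict:
    # Stand-in: tag each page with the target language instead of calling a model.
    return {
        "translated_images": [f"{img}:{data['target_lang']}" for img in data["images"]]
    }


async def validate(data: dict) -> dict:
    # Stand-in: pass when every original page has a translated counterpart.
    complete = len(data["images"]) == len(data["translated_images"])
    return {"validation": {"complete": complete}}


async def run_pipeline(document: str, target_lang: str) -> dict:
    """Run convert -> translate -> validate, merging each step's output into state."""
    state: dict = {"document": document, "target_lang": target_lang}
    for step in (convert, translate, validate):
        state.update(await step(state))
    return state


if __name__ == "__main__":
    print(asyncio.run(run_pipeline("input.pdf", "es")))
```

Keeping the original images in the shared state is what lets the validation step compare them against the translated pages, as the diagram indicates.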

Class diagram for agent classes and orchestrator

classDiagram
    class BaseAgent {
        +agent_id: str
        +config: dict
        +is_running: bool
        +start()
        +stop()
        +process(input_data)
    }
    class DocumentConverterAgent {
        +process(input_data)
    }
    class TranslationAgent {
        +process(input_data)
    }
    class ValidationAgent {
        +process(input_data)
    }
    class TranslationOrchestrator {
        +agents: dict
        +initialize()
        +shutdown()
        +translate_document(...)
        +get_supported_languages()
        +get_system_status()
    }
    BaseAgent <|-- DocumentConverterAgent
    BaseAgent <|-- TranslationAgent
    BaseAgent <|-- ValidationAgent
    TranslationOrchestrator o-- DocumentConverterAgent
    TranslationOrchestrator o-- TranslationAgent
    TranslationOrchestrator o-- ValidationAgent
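A base class with the attributes and methods shown in the class diagram might look roughly like the following. The diagram does not say whether `start`, `stop`, and `process` are coroutines, so making them `async` here is an assumption, and `EchoAgent` is a hypothetical subclass that exists only to exercise the base class.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class BaseAgent(ABC):
    """Shared lifecycle and identity for all agents (sketch of the diagram)."""

    def __init__(self, agent_id: str, config: Optional[Dict[str, Any]] = None):
        self.agent_id = agent_id
        self.config = config or {}
        self.is_running = False

    async def start(self) -> None:
        self.is_running = True

    async def stop(self) -> None:
        self.is_running = False

    @abstractmethod
    async def process(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        """Each concrete agent implements its own processing step."""


class EchoAgent(BaseAgent):
    """Hypothetical subclass used only to exercise the base class."""

    async def process(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        if not self.is_running:
            raise RuntimeError(f"{self.agent_id} is not running")
        return {"agent_id": self.agent_id, **input_data}


async def demo() -> Dict[str, Any]:
    agent = EchoAgent("echo")
    await agent.start()
    try:
        return await agent.process({"page": 1})
    finally:
        await agent.stop()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Declaring `process` abstract means a subclass that forgets to implement it fails at instantiation rather than midway through a translation job.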

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Orchestrator and multi-agent framework | Initialize and manage DocumentConverter, Translation, and Validation agents; coordinate three-step pipeline: convert → translate → validate; generate final outputs and metadata including translation ID and timing; handle agent lifecycle (start, stop, shutdown) | `multi_agent_document_translator/orchestrator.py`, `multi_agent_document_translator/__init__.py` |
| Document-to-image conversion agent | Convert PDF, DOCX, and TXT to high-resolution images; enforce file size limits and supported formats; use pdf2image, python-docx, and Pillow for conversion; manage per-agent temp directories and cleanup | `multi_agent_document_translator/agents/document_converter_agent.py` |
| Multimodal translation agent | Invoke Google Gemini Vision API with detailed prompts; process API responses to overlay translated text on images; maintain spatial positioning, formatting hints, and confidence metadata; clean up and export translated image files | `multi_agent_document_translator/agents/translation_agent.py` |
| Quality validation agent | Assess layout preservation via SSIM and histogram comparisons; check text completeness using translation metadata heuristics; optionally call Gemini for translation quality scoring; aggregate per-page scores, compute overall grade and recommendations | `multi_agent_document_translator/agents/validation_agent.py` |
| Web API and Streamlit UI | Expose translation endpoints, status, and downloads via FastAPI; add startup/shutdown hooks integrating the orchestrator; build Streamlit interface for file upload, progress, and results; provide run scripts for both API (`run_api.py`) and UI (`run_streamlit.py`) | `multi_agent_document_translator/api.py`, `multi_agent_document_translator/run_api.py`, `multi_agent_document_translator/streamlit_app.py`, `multi_agent_document_translator/run_streamlit.py` |
| Centralized configuration | Define Pydantic settings for API keys, file limits, and languages; generate agent-specific configs via helper function; load environment variables from `.env` file; share supported formats and languages across modules | `multi_agent_document_translator/config.py` |
| Tests, examples, and docs | Add pytest suites for agents and orchestrator; include simple and batch usage scripts demonstrating API and orchestrator; provide README overview and requirements.txt for dependencies; organize examples and tests under respective directories | `multi_agent_document_translator/tests/`, `multi_agent_document_translator/examples/`, `multi_agent_document_translator/README.md`, `multi_agent_document_translator/requirements.txt` |
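For reviewers unfamiliar with the layout metrics named in the validation row, the sketch below shows the two ideas in pure Python. It is a simplification and not the PR's code: `global_ssim` computes SSIM over the whole image as a single window, whereas library implementations such as scikit-image's `structural_similarity` average over sliding windows. `histogram_intersection` is one of several possible histogram comparisons. Both function names and the 16-bin default are assumptions.

```python
from typing import List, Sequence


def global_ssim(
    x: Sequence[float], y: Sequence[float], dynamic_range: float = 255.0
) -> float:
    """Single-window SSIM over two equal-length grayscale pixel sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Standard stabilizing constants with K1 = 0.01 and K2 = 0.03.
    c1, c2 = (0.01 * dynamic_range) ** 2, (0.03 * dynamic_range) ** 2
    return ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    )


def _normalized_histogram(
    pixels: Sequence[float], bins: int, dynamic_range: float
) -> List[float]:
    counts = [0] * bins
    for p in pixels:
        index = min(int(p / (dynamic_range + 1) * bins), bins - 1)
        counts[index] += 1
    return [c / len(pixels) for c in counts]


def histogram_intersection(
    x: Sequence[float], y: Sequence[float], bins: int = 16, dynamic_range: float = 255.0
) -> float:
    """Overlap of two normalized intensity histograms: 1.0 means identical, 0.0 disjoint."""
    hx = _normalized_histogram(x, bins, dynamic_range)
    hy = _normalized_histogram(y, bins, dynamic_range)
    return sum(min(a, b) for a, b in zip(hx, hy))
```

The two metrics complement each other: SSIM is sensitive to where structure sits on the page, while a histogram comparison only checks that the overall distribution of tones is similar.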


code-review-doctor

Some things to consider. View full project report here.

code-review-doctor · 2025-06-23T08:53:01Z

multi_agent_document_translator/streamlit_app.py

+                st.markdown(
+                    f'<div class="agent-status {status_class}">'
+                    f'{status_icon} {agent_name.title()}'
+                    f'</div>',


Suggested change

f'</div>',

'</div>',

The f-string prefix is unnecessary here because this literal contains no placeholders. It can be a plain string.
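The suggestion is safe because adjacent string literals are concatenated at compile time, and only the fragments that contain placeholders need the `f` prefix. A short check, using made-up values for `status_class`, `status_icon`, and `agent_name`:

```python
# Hypothetical values standing in for the variables in streamlit_app.py.
status_class, status_icon, agent_name = "running", "✅", "translator"

# Original form: every fragment carries the f prefix, including the last one.
before = (
    f'<div class="agent-status {status_class}">'
    f'{status_icon} {agent_name.title()}'
    f'</div>'
)

# Suggested form: the closing tag has no placeholders, so it is a plain string.
after = (
    f'<div class="agent-status {status_class}">'
    f'{status_icon} {agent_name.title()}'
    '</div>'
)

# Both forms build the same string.
assert before == after
print(after)  # <div class="agent-status running">✅ Translator</div>
```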

code-review-doctor · 2025-06-23T08:53:01Z

multi_agent_document_translator/streamlit_app.py

+                        f'<div class="{quality_class}">'
+                        f'**Quality Assessment:** {quality["grade"]} '
+                        f'({quality.get("overall_score", 0):.2f})'
+                        f'</div>',


Suggested change

f'</div>',

'</div>',

Likewise, f-string is unnecessary here.

code-review-doctor bot suggested changes Jun 23, 2025
