@codegen-sh codegen-sh bot commented Jun 23, 2025

🌐 Multi-Agent Document Translation App

This PR introduces a sophisticated document translation system that preserves layout integrity using Google's Agent Development Kit (ADK) and A2A protocol.

🎯 Problem Solved

Standard document translation tools often fail with visually complex documents (technical manuals, marketing brochures, academic papers with diagrams). They extract text, translate it, and try to reflow it into the document, which:

  • Breaks the original layout
  • Misplaces captions
  • Destroys visual integrity

🏗️ Solution Architecture

3-Agent System:

  1. 📄 Document-to-Image Converter Agent

    • Converts document pages to high-quality images
    • Maintains original resolution and formatting
    • Supports PDF, DOCX, DOC, TXT formats
  2. 🌐 Multimodal Translation Agent

    • Uses Google Gemini Vision API for image-based translation
    • Preserves layout, fonts, and visual elements
    • Translates text while maintaining spatial relationships
  3. ✅ Quality Validation Agent

    • Validates translation accuracy and layout preservation
    • Uses structural similarity metrics and visual comparison
    • Provides quality grades and improvement recommendations
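
The validation agent is described as using "structural similarity metrics" without showing what that computation looks like. Below is a minimal sketch, assuming a single-window (global) SSIM over two equally sized grayscale pages passed as flat sequences of 0-255 values. The name `global_ssim` is hypothetical and not taken from this PR; the actual agent may well use a windowed implementation from an imaging library instead.

```python
from statistics import fmean


def global_ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM of two equal-length grayscale pixel sequences.

    Hypothetical helper for illustration only; not the PR's implementation.
    """
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean([(x - mu_a) ** 2 for x in a])
    var_b = fmean([(y - mu_b) ** 2 for y in b])
    cov = fmean([(x - mu_a) * (y - mu_b) for x, y in zip(a, b)])
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )
```

A score near 1 means the translated page keeps the brightness structure of the original. A global window is the simplest variant and will miss localized layout shifts that a sliding-window SSIM would catch.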

🚀 Features

  • Layout Preservation: Maintains original document formatting with high fidelity
  • Multi-format Support: PDF, DOCX, and other document formats
  • Quality Assurance: Built-in validation with confidence scoring
  • Scalable Architecture: Agent-based system using A2A protocol
  • Google AI Integration: Leverages Gemini's multimodal capabilities
  • Web Interfaces: Both FastAPI REST API and Streamlit UI
  • Batch Processing: Handle multiple documents efficiently
  • 12-Language Support: Including auto-detection

📁 Key Files

🛠️ Usage

Simple Usage:

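import asyncio

from multi_agent_document_translator import orchestrator

async def main():
    # translate_document is a coroutine, so it has to run inside an event loop
    return await orchestrator.translate_document(
        document_path="input.pdf",
        target_language="es",  # Spanish
        output_path="translated_output.pdf",
    )

result = asyncio.run(main())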
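
Batch processing is listed as a feature, and since `translate_document` is awaited, several documents could in principle be translated concurrently. The sketch below assumes that pattern. `fake_translate` is a hypothetical stand-in for `orchestrator.translate_document`, and its return shape is invented for illustration, as are `translate_batch` and the `max_concurrent` parameter.

```python
import asyncio


async def fake_translate(document_path, target_language, output_path):
    """Stand-in for orchestrator.translate_document (hypothetical return shape)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"input": document_path, "language": target_language, "output": output_path}


async def translate_batch(paths, target_language, max_concurrent=3):
    """Translate several documents concurrently, capped by a semaphore."""
    limiter = asyncio.Semaphore(max_concurrent)

    async def one(path):
        async with limiter:
            return await fake_translate(path, target_language, f"translated_{path}")

    # return_exceptions=True keeps one failed document from cancelling the rest.
    return await asyncio.gather(*(one(p) for p in paths), return_exceptions=True)


results = asyncio.run(translate_batch(["a.pdf", "b.pdf"], "es"))
```

The semaphore is there because a hosted model API is likely to be rate limited. Results come back in input order, with any exception object in place of the failed document's result.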

Web API:

python multi_agent_document_translator/run_api.py
# Visit http://localhost:8000/docs for API documentation

Web UI:

python multi_agent_document_translator/run_streamlit.py
# Visit http://localhost:8501 for web interface

🧪 Testing

Comprehensive test suite included:

  • Agent unit tests
  • Orchestrator integration tests
  • Error handling validation
  • Mock API testing

📋 Requirements

  • Python 3.8+
  • Google Cloud credentials
  • Gemini API access
  • See requirements.txt for full dependencies

🔧 Configuration

Copy .env.example to .env and configure:

  • GEMINI_API_KEY: Your Gemini API key
  • GOOGLE_CLOUD_PROJECT: Your GCP project ID
  • Other optional settings for customization
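
To make the two variables above concrete, here is a minimal sketch of reading them at startup using only the standard library. `Settings` and `load_settings` are hypothetical names, and treating `GOOGLE_CLOUD_PROJECT` as optional is an assumption; the PR's own config module may validate these differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    google_cloud_project: Optional[str] = None  # assumed optional for this sketch


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the two variables named above; fail early if the API key is missing."""
    api_key = env.get("GEMINI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GEMINI_API_KEY is not set; copy .env.example to .env first")
    return Settings(api_key, env.get("GOOGLE_CLOUD_PROJECT") or None)
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.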

This implementation provides a production-ready solution for layout-preserving document translation using cutting-edge AI and multi-agent architecture.



Summary by Sourcery

Introduce a production-ready multi-agent document translation application built on Google ADK and the A2A protocol. It provides image-based translation, layout preservation, and quality validation, and is exposed through FastAPI and Streamlit interfaces with documentation, examples, and tests.

New Features:

  • Implement a multi-agent orchestration system with converter, translator, and validator agents for layout-preserving document translation
  • Integrate Google Gemini Vision API for image-based translation while maintaining original formatting and spatial layout
  • Provide FastAPI REST endpoints and a Streamlit web UI to upload documents and download translated results
  • Support PDF, DOCX, and TXT formats with batch and simple usage examples
  • Include built-in quality validation with structural similarity, visual similarity, text completeness, and translation quality metrics

Documentation:

  • Add comprehensive README and inline documentation detailing architecture, usage, and configuration

Tests:

  • Add a suite of unit and integration tests for document converter, translation, and validation agents, as well as orchestrator workflows

Chores:

  • Add example scripts for simple and batch translation use cases
  • Include requirements.txt for dependencies and .env.example for environment configuration

…ocol

- Implemented 3-agent architecture for layout-preserving document translation
- Agent 1: Document-to-Image Converter (PDF, DOCX, TXT support)
- Agent 2: Multimodal Translation Agent using Google Gemini Vision
- Agent 3: Quality Validation Agent with layout preservation checks
- Added FastAPI web service and Streamlit UI
- Comprehensive configuration system with environment variables
- Batch processing capabilities and usage examples
- Full test suite for agents and orchestrator
- Support for 12 languages with auto-detection
- Quality assessment with layout similarity metrics

sourcery-ai bot commented Jun 23, 2025

Reviewer's Guide

This PR implements a full multi-agent document translation pipeline using Google’s ADK and A2A protocol: it converts input documents to images, translates them via Google Gemini Vision while preserving layout, validates translation quality and layout fidelity, and exposes the workflow via FastAPI and Streamlit interfaces.

Sequence diagram for document translation workflow

sequenceDiagram
    actor User
    participant UI as Web UI/API
    participant Orchestrator
    participant Converter as DocumentConverterAgent
    participant Translator as TranslationAgent
    participant Validator as ValidationAgent
    participant Gemini as Google Gemini Vision API

    User->>UI: Upload document & request translation
    UI->>Orchestrator: translate_document(document, target_lang)
    Orchestrator->>Converter: process(document)
    Converter-->>Orchestrator: images
    Orchestrator->>Translator: process(images, target_lang)
    Translator->>Gemini: generate_content(prompt, image)
    Gemini-->>Translator: translation response
    Translator-->>Orchestrator: translated images, metadata
    Orchestrator->>Validator: process(original images, translated images, metadata)
    Validator->>Gemini: generate_content(validation prompt, images)
    Gemini-->>Validator: validation response
    Validator-->>Orchestrator: validation results
    Orchestrator-->>UI: results (output files, quality, etc.)
    UI-->>User: Download/display translated document
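
The call order in the diagram (convert, then translate, then validate) can be sketched with stub coroutines. Everything here is a hypothetical stand-in: the stubs make no Gemini calls, and the data shapes (strings for pages, a dict for the validation result) are assumptions chosen only to make the flow runnable.

```python
import asyncio


async def convert(document):
    """Stub converter: one 'image' per page (hypothetical data shape)."""
    return [f"{document}-page-{n}" for n in (1, 2)]


async def translate(images, target_lang):
    """Stub translator: tags each page with the target language."""
    return [f"{img}-{target_lang}" for img in images]


async def validate(original, translated):
    """Stub validator: checks only that no page was dropped."""
    return {"pages_match": len(original) == len(translated)}


async def translate_document(document, target_lang):
    # Same order as the diagram: convert, then translate, then validate.
    images = await convert(document)
    translated = await translate(images, target_lang)
    validation = await validate(images, translated)
    return {"pages": translated, "validation": validation}


result = asyncio.run(translate_document("input.pdf", "es"))
```

Each step only needs the output of the previous one, which is why the orchestrator can stay a thin sequential coordinator.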

Class diagram for agent classes and orchestrator

classDiagram
    class BaseAgent {
        +agent_id: str
        +config: dict
        +is_running: bool
        +start()
        +stop()
        +process(input_data)
    }
    class DocumentConverterAgent {
        +process(input_data)
    }
    class TranslationAgent {
        +process(input_data)
    }
    class ValidationAgent {
        +process(input_data)
    }
    class TranslationOrchestrator {
        +agents: dict
        +initialize()
        +shutdown()
        +translate_document(...)
        +get_supported_languages()
        +get_system_status()
    }
    BaseAgent <|-- DocumentConverterAgent
    BaseAgent <|-- TranslationAgent
    BaseAgent <|-- ValidationAgent
    TranslationOrchestrator o-- DocumentConverterAgent
    TranslationOrchestrator o-- TranslationAgent
    TranslationOrchestrator o-- ValidationAgent
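
The inheritance shown in the class diagram maps naturally onto an abstract base class. The sketch below uses the attribute and method names from the diagram, but the method bodies are assumptions, and the methods are written synchronously for brevity even though the PR's agents may be async. `EchoAgent` is a hypothetical subclass used only to exercise the base class.

```python
from abc import ABC, abstractmethod


class BaseAgent(ABC):
    """Shared lifecycle for all agents; attribute names follow the diagram."""

    def __init__(self, agent_id, config=None):
        self.agent_id = agent_id
        self.config = config or {}
        self.is_running = False

    def start(self):
        self.is_running = True

    def stop(self):
        self.is_running = False

    @abstractmethod
    def process(self, input_data):
        """Each concrete agent supplies its own processing step."""


class EchoAgent(BaseAgent):
    """Hypothetical subclass used only to exercise the base class."""

    def process(self, input_data):
        if not self.is_running:
            raise RuntimeError(f"{self.agent_id} has not been started")
        return input_data
```

Marking `process` abstract means a subclass that forgets to implement it fails at instantiation rather than partway through a translation.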

File-Level Changes

Change: Orchestrator and multi-agent framework
Details:
  • Initialize and manage DocumentConverter, Translation, and Validation agents
  • Coordinate three-step pipeline: convert → translate → validate
  • Generate final outputs and metadata including translation ID and timing
  • Handle agent lifecycle (start, stop, shutdown)
Files:
  • multi_agent_document_translator/orchestrator.py
  • multi_agent_document_translator/__init__.py

Change: Document-to-image conversion agent
Details:
  • Convert PDF, DOCX, and TXT to high-resolution images
  • Enforce file size limits and supported formats
  • Use pdf2image, python-docx, and Pillow for conversion
  • Manage per-agent temp directories and cleanup
Files:
  • multi_agent_document_translator/agents/document_converter_agent.py

Change: Multimodal translation agent
Details:
  • Invoke Google Gemini Vision API with detailed prompts
  • Process API responses to overlay translated text on images
  • Maintain spatial positioning, formatting hints, and confidence metadata
  • Clean up and export translated image files
Files:
  • multi_agent_document_translator/agents/translation_agent.py

Change: Quality validation agent
Details:
  • Assess layout preservation via SSIM and histogram comparisons
  • Check text completeness using translation metadata heuristics
  • Optionally call Gemini for translation quality scoring
  • Aggregate per-page scores, compute overall grade and recommendations
Files:
  • multi_agent_document_translator/agents/validation_agent.py

Change: Web API and Streamlit UI
Details:
  • Expose translation endpoints, status, and downloads via FastAPI
  • Add startup/shutdown hooks integrating the orchestrator
  • Build Streamlit interface for file upload, progress, and results
  • Provide run scripts for both API (run_api.py) and UI (run_streamlit.py)
Files:
  • multi_agent_document_translator/api.py
  • multi_agent_document_translator/run_api.py
  • multi_agent_document_translator/streamlit_app.py
  • multi_agent_document_translator/run_streamlit.py

Change: Centralized configuration
Details:
  • Define Pydantic settings for API keys, file limits, and languages
  • Generate agent-specific configs via helper function
  • Load environment variables from .env file
  • Share supported formats and languages across modules
Files:
  • multi_agent_document_translator/config.py

Change: Tests, examples, and docs
Details:
  • Add pytest suites for agents and orchestrator
  • Include simple and batch usage scripts demonstrating API and orchestrator
  • Provide README overview and requirements.txt for dependencies
  • Organize examples and tests under respective directories
Files:
  • multi_agent_document_translator/tests/
  • multi_agent_document_translator/examples/
  • multi_agent_document_translator/README.md
  • multi_agent_document_translator/requirements.txt
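
The validation agent is said to aggregate per-page scores into an overall grade. A minimal sketch of one way that could work is shown below. The cut-offs in `GRADE_THRESHOLDS` and the function name `overall_grade` are hypothetical, since the PR text does not state its grade boundaries; only the `grade` and `overall_score` keys echo names that appear in the Streamlit snippet quoted in the review.

```python
from statistics import fmean

# Hypothetical cut-offs; the PR does not state its grade boundaries here.
GRADE_THRESHOLDS = [(0.9, "A"), (0.8, "B"), (0.7, "C")]


def overall_grade(page_scores):
    """Average per-page scores (each in [0, 1]) and map the mean to a grade."""
    if not page_scores:
        raise ValueError("at least one page score is required")
    mean = fmean(page_scores)
    for cutoff, grade in GRADE_THRESHOLDS:
        if mean >= cutoff:
            return {"overall_score": mean, "grade": grade}
    return {"overall_score": mean, "grade": "D"}
```

A plain mean lets one badly translated page hide behind several good ones, so a real implementation might also report the minimum page score.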


@code-review-doctor code-review-doctor bot left a comment


Some things to consider.

st.markdown(
f'<div class="agent-status {status_class}">'
f'{status_icon} {agent_name.title()}'
f'</div>',


Suggested change
- f'</div>',
+ '</div>',

f-string is unnecessary here. This can just be a string.

f'<div class="{quality_class}">'
f'**Quality Assessment:** {quality["grade"]} '
f'({quality.get("overall_score", 0):.2f})'
f'</div>',


Suggested change
- f'</div>',
+ '</div>',

Likewise, f-string is unnecessary here.
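
Both suggestions rely on the fact that adjacent string literals are joined by Python regardless of whether each piece carries the `f` prefix. A small sketch with made-up values for `status_icon` and `agent_name` shows that dropping the prefix from a placeholder-free piece does not change the result:

```python
status_icon, agent_name = "✅", "converter"

# Adjacent string literals are concatenated, so an f-string and a plain
# string can be mixed freely inside one expression.
with_prefix = f'{status_icon} {agent_name.title()}' f'</div>'
without_prefix = f'{status_icon} {agent_name.title()}' '</div>'
```

The change is purely cosmetic; it removes a prefix that implies interpolation where none happens.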
