Multi-Agent Document Translation App with Google ADK and A2A Protocol #4
…ocol
- Implemented 3-agent architecture for layout-preserving document translation
- Agent 1: Document-to-Image Converter (PDF, DOCX, TXT support)
- Agent 2: Multimodal Translation Agent using Google Gemini Vision
- Agent 3: Quality Validation Agent with layout preservation checks
- Added FastAPI web service and Streamlit UI
- Comprehensive configuration system with environment variables
- Batch processing capabilities and usage examples
- Full test suite for agents and orchestrator
- Support for 12 languages with auto-detection
- Quality assessment with layout similarity metrics
Reviewer's Guide
This PR implements a full multi-agent document translation pipeline using Google’s ADK and A2A protocol: it converts input documents to images, translates them via Google Gemini Vision while preserving layout, validates translation quality and layout fidelity, and exposes the workflow via FastAPI and Streamlit interfaces.
Sequence diagram for document translation workflow
sequenceDiagram
actor User
participant UI as Web UI/API
participant Orchestrator
participant Converter as DocumentConverterAgent
participant Translator as TranslationAgent
participant Validator as ValidationAgent
participant Gemini as Google Gemini Vision API
User->>UI: Upload document & request translation
UI->>Orchestrator: translate_document(document, target_lang)
Orchestrator->>Converter: process(document)
Converter-->>Orchestrator: images
Orchestrator->>Translator: process(images, target_lang)
Translator->>Gemini: generate_content(prompt, image)
Gemini-->>Translator: translation response
Translator-->>Orchestrator: translated images, metadata
Orchestrator->>Validator: process(original images, translated images, metadata)
Validator->>Gemini: generate_content(validation prompt, images)
Gemini-->>Validator: validation response
Validator-->>Orchestrator: validation results
Orchestrator-->>UI: results (output files, quality, etc.)
UI-->>User: Download/display translated document
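The call order in the sequence diagram above can be sketched as a plain function that hands each agent's output to the next. This is a minimal illustration, not the PR's implementation: the three stub classes, the dictionary keys, and the string stand-ins for images are all assumptions made for the example, and the Gemini calls are omitted entirely.

```python
from typing import Any, Dict


class StubConverter:
    """Hypothetical stand-in for DocumentConverterAgent: document -> page images."""

    def process(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        doc = input_data["document"]
        # Assumption for the sketch: every document renders to two pages.
        return {"images": [f"{doc}#page{i}" for i in (1, 2)]}


class StubTranslator:
    """Hypothetical stand-in for TranslationAgent (no Gemini call is made)."""

    def process(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        images = input_data["images"]
        lang = input_data["target_lang"]
        return {
            "translated_images": [f"{img}@{lang}" for img in images],
            "metadata": {"target_lang": lang, "pages": len(images)},
        }


class StubValidator:
    """Hypothetical stand-in for ValidationAgent: only compares page counts."""

    def process(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        originals = input_data["original_images"]
        translated = input_data["translated_images"]
        return {"passed": len(originals) == len(translated),
                "pages_checked": len(translated)}


def translate_document(document, target_lang, converter, translator, validator):
    """Run convert -> translate -> validate in the order the diagram shows."""
    converted = converter.process({"document": document})
    translated = translator.process(
        {"images": converted["images"], "target_lang": target_lang}
    )
    validation = validator.process({
        "original_images": converted["images"],
        "translated_images": translated["translated_images"],
        "metadata": translated["metadata"],
    })
    return {
        "output": translated["translated_images"],
        "metadata": translated["metadata"],
        "validation": validation,
    }


result = translate_document(
    "manual.pdf", "es", StubConverter(), StubTranslator(), StubValidator()
)
print(result["validation"])
```

Passing the agents in as arguments keeps the flow testable without network access; the actual orchestrator may wire them differently.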
Class diagram for agent classes and orchestrator
classDiagram
class BaseAgent {
+agent_id: str
+config: dict
+is_running: bool
+start()
+stop()
+process(input_data)
}
class DocumentConverterAgent {
+process(input_data)
}
class TranslationAgent {
+process(input_data)
}
class ValidationAgent {
+process(input_data)
}
class TranslationOrchestrator {
+agents: dict
+initialize()
+shutdown()
+translate_document(...)
+get_supported_languages()
+get_system_status()
}
BaseAgent <|-- DocumentConverterAgent
BaseAgent <|-- TranslationAgent
BaseAgent <|-- ValidationAgent
TranslationOrchestrator o-- DocumentConverterAgent
TranslationOrchestrator o-- TranslationAgent
TranslationOrchestrator o-- ValidationAgent
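The relationships in the class diagram above might translate to Python roughly as follows. Only the class, attribute, and method names come from the diagram; the constructor signature, the synchronous methods, the placeholder return values, and the agent keys ("converter", "translator", "validator") are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class BaseAgent(ABC):
    """Shared lifecycle for all agents; process() is left to subclasses."""

    def __init__(self, agent_id: str, config: Optional[Dict[str, Any]] = None) -> None:
        self.agent_id = agent_id
        self.config = config or {}
        self.is_running = False

    def start(self) -> None:
        self.is_running = True

    def stop(self) -> None:
        self.is_running = False

    @abstractmethod
    def process(self, input_data: Any) -> Any:
        """Transform input_data; each concrete agent defines its own contract."""


class DocumentConverterAgent(BaseAgent):
    def process(self, input_data: Any) -> Any:
        return {"images": []}  # placeholder result, not the real conversion


class TranslationAgent(BaseAgent):
    def process(self, input_data: Any) -> Any:
        return {"translated_images": []}  # placeholder result


class ValidationAgent(BaseAgent):
    def process(self, input_data: Any) -> Any:
        return {"passed": True}  # placeholder result


class TranslationOrchestrator:
    """Aggregates the three agents (the 'o--' edges in the diagram)."""

    def __init__(self) -> None:
        self.agents: Dict[str, BaseAgent] = {}

    def initialize(self) -> None:
        # Hypothetical registry keys chosen for this sketch.
        self.agents = {
            "converter": DocumentConverterAgent("converter"),
            "translator": TranslationAgent("translator"),
            "validator": ValidationAgent("validator"),
        }
        for agent in self.agents.values():
            agent.start()

    def shutdown(self) -> None:
        for agent in self.agents.values():
            agent.stop()

    def get_system_status(self) -> Dict[str, bool]:
        return {name: agent.is_running for name, agent in self.agents.items()}


orchestrator = TranslationOrchestrator()
orchestrator.initialize()
status_after_init = orchestrator.get_system_status()
orchestrator.shutdown()
status_after_shutdown = orchestrator.get_system_status()
```

Making `process` abstract means a subclass that forgets to implement it fails at instantiation rather than at first use.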
Some things to consider. View full project report here.
st.markdown(
    f'<div class="agent-status {status_class}">'
    f'{status_icon} {agent_name.title()}'
    f'</div>',
- f'</div>',
+ '</div>',
The f-string prefix is unnecessary here because the literal contains no placeholders. This can just be a plain string.
f'<div class="{quality_class}">'
f'**Quality Assessment:** {quality["grade"]} '
f'({quality.get("overall_score", 0):.2f})'
f'</div>',
- f'</div>',
+ '</div>',
Likewise, f-string is unnecessary here.
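To show that the suggested change is behavior-preserving, here is a small check using made-up sample values (the three variables below are placeholders, not values from the PR). Adjacent string literals are concatenated at compile time whether or not each carries the `f` prefix, so dropping it from a literal without placeholders yields the same string.

```python
# Hypothetical sample values, chosen only for this demonstration.
status_class = "agent-running"
status_icon = "✅"
agent_name = "translator"

# As written in the PR: every fragment is an f-string.
with_prefix = (
    f'<div class="agent-status {status_class}">'
    f'{status_icon} {agent_name.title()}'
    f'</div>'
)

# As suggested: the closing tag has no placeholders, so it is a plain string.
without_prefix = (
    f'<div class="agent-status {status_class}">'
    f'{status_icon} {agent_name.title()}'
    '</div>'
)

assert with_prefix == without_prefix
print(without_prefix)  # <div class="agent-status agent-running">✅ Translator</div>
```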
🌐 Multi-Agent Document Translation App
This PR introduces a sophisticated document translation system that preserves layout integrity using Google's Agent Development Kit (ADK) and A2A protocol.
🎯 Problem Solved
Standard document translation tools often fail with visually complex documents (technical manuals, marketing brochures, academic papers with diagrams). They extract text, translate it, and try to reflow it into the document, which:
🏗️ Solution Architecture
3-Agent System:
📄 Document-to-Image Converter Agent
🌐 Multimodal Translation Agent
✅ Quality Validation Agent
🚀 Features
📁 Key Files
- multi_agent_document_translator/orchestrator.py - Main orchestration logic
- multi_agent_document_translator/agents/ - Individual agent implementations
- multi_agent_document_translator/api.py - FastAPI web service
- multi_agent_document_translator/streamlit_app.py - Web UI
- multi_agent_document_translator/config.py - Configuration management
🛠️ Usage
Simple Usage:
Web API:
python multi_agent_document_translator/run_api.py
# Visit http://localhost:8000/docs for API documentation
Web UI:
python multi_agent_document_translator/run_streamlit.py
# Visit http://localhost:8501 for web interface
🧪 Testing
Comprehensive test suite included:
📋 Requirements
See requirements.txt for full dependencies
🔧 Configuration
Copy .env.example to .env and configure:
- GEMINI_API_KEY: Your Gemini API key
- GOOGLE_CLOUD_PROJECT: Your GCP project ID
This implementation provides a production-ready solution for layout-preserving document translation using cutting-edge AI and multi-agent architecture.
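For reference, the two configuration variables named above (GEMINI_API_KEY and GOOGLE_CLOUD_PROJECT) could be read and validated at startup along these lines. This is a sketch under assumptions: the `Settings` class and `load_settings` function are hypothetical names, not the contents of the PR's config.py, and the code reads only the process environment. A .env file is not picked up by `os.environ` on its own; whether the project uses a loader for that is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the PR description.
REQUIRED_VARS = ("GEMINI_API_KEY", "GOOGLE_CLOUD_PROJECT")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two documented settings."""
    gemini_api_key: str
    google_cloud_project: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from env, failing fast if a variable is unset or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        google_cloud_project=env["GOOGLE_CLOUD_PROJECT"],
    )


# Usage with an explicit mapping, so the example runs without real credentials.
settings = load_settings(
    {"GEMINI_API_KEY": "example-key", "GOOGLE_CLOUD_PROJECT": "example-project"}
)
print(settings.google_cloud_project)
```

Accepting the environment as a parameter keeps the check easy to test and avoids touching real credentials in unit tests.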
Summary by Sourcery
Introduce a production-ready multi-agent document translation application using Google ADK and A2A protocol with image-based translation, layout preservation, and quality validation, exposed via FastAPI and Streamlit interfaces with full documentation, examples, and testing.
New Features:
Documentation:
Tests:
Chores:
.env.example for environment configuration