Multi-Agent Document Translation App with Google ADK and A2A Protocol #3
base: master
…ocol

- Implemented 4-agent architecture: Orchestrator, Conversion, Translation, Validation
- Orchestrator Agent: Manages workflow, coordinates agents, handles job lifecycle
- Conversion Agent: Converts PDF/PPTX/DOCX to high-resolution page images
- Image Translation Agent: OCR, translation, and layout-preserving text re-rendering
- Validation Agent: Quality assessment via back-translation and layout comparison
- FastAPI web interface with RESTful endpoints
- Google Cloud integration (Vision, Translation, Storage APIs)
- Comprehensive quality metrics and validation
- Async processing with retry logic and error handling
- Complete test suite and documentation
- Example usage scripts and configuration templates

Features:
- Layout preservation with high fidelity
- Multi-format document support
- Semantic validation through back-translation
- Visual consistency checks using SSIM
- Scalable multi-agent architecture
- Quality scoring and issue reporting
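The commit message describes OCR followed by layout-preserving re-rendering, and the class diagram gives `ImageTranslationAgent` a `group_text_blocks()` step over `TextBlock`s. The PR text does not say how words are grouped, so the sketch below assumes one plausible heuristic: cluster OCR word boxes into lines by vertical centre, then split each line wherever the horizontal gap is large relative to the text height. The `gap_ratio` threshold, the helper functions, and the trimmed-down `TextBlock` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vertex = Tuple[int, int]


@dataclass
class TextBlock:
    # Trimmed-down version of the TextBlock in the class diagram.
    text: str
    bounding_box: List[Vertex]  # polygon vertices around the text
    confidence: float = 1.0


def _extent(block: TextBlock) -> Tuple[int, int, int, int]:
    """Axis-aligned (left, top, right, bottom) extent of a block's polygon."""
    xs = [x for x, _ in block.bounding_box]
    ys = [y for _, y in block.bounding_box]
    return min(xs), min(ys), max(xs), max(ys)


def _center_y(block: TextBlock) -> float:
    _, top, _, bottom = _extent(block)
    return (top + bottom) / 2


def _merge(words: List[TextBlock]) -> TextBlock:
    """Join words into one block whose box encloses all of them."""
    lefts, tops, rights, bottoms = zip(*(_extent(w) for w in words))
    left, top, right, bottom = min(lefts), min(tops), max(rights), max(bottoms)
    return TextBlock(
        text=" ".join(w.text for w in words),
        bounding_box=[(left, top), (right, top), (right, bottom), (left, bottom)],
        confidence=min(w.confidence for w in words),
    )


def group_text_blocks(words: List[TextBlock], gap_ratio: float = 0.6) -> List[TextBlock]:
    """Group OCR word boxes into line-level blocks (hypothetical heuristic)."""
    # 1) Cluster into lines: a word joins a line if its vertical centre lies
    #    within half its own height of the line's first word.
    lines: List[List[TextBlock]] = []
    for word in sorted(words, key=_center_y):
        _, top, _, bottom = _extent(word)
        for line in lines:
            if abs(_center_y(line[0]) - _center_y(word)) < (bottom - top) / 2:
                line.append(word)
                break
        else:
            lines.append([word])

    # 2) Within a line, read left to right and split wherever the horizontal
    #    gap exceeds gap_ratio times the word height.
    blocks: List[TextBlock] = []
    for line in lines:
        line.sort(key=lambda w: _extent(w)[0])
        current = [line[0]]
        for word in line[1:]:
            _, top, _, bottom = _extent(word)
            gap = _extent(word)[0] - _extent(current[-1])[2]
            if gap <= gap_ratio * (bottom - top):
                current.append(word)
            else:
                blocks.append(_merge(current))
                current = [word]
        blocks.append(_merge(current))
    return blocks


def rect(left: int, top: int, right: int, bottom: int) -> List[Vertex]:
    return [(left, top), (right, top), (right, bottom), (left, bottom)]


words = [
    TextBlock("Hello", rect(10, 10, 60, 30)),
    TextBlock("world", rect(68, 11, 120, 31)),
    TextBlock("Next", rect(10, 50, 50, 70)),
    TextBlock("far", rect(300, 10, 340, 30)),
]
blocks = group_text_blocks(words)  # -> "Hello world", "far", "Next"
```

Keeping each merged block's enclosing box is what would let a later re-rendering step place the translated string where the source line was.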
Reviewer's Guide
This PR introduces a full multi-agent document translation system built on Google’s ADK and A2A protocol. It defines a BaseAgent abstraction, then implements Conversion, ImageTranslation, and Validation agents with end-to-end async workflows orchestrated by an OrchestratorAgent. A FastAPI front-end wires everything together, backed by a GCSHelper utility, comprehensive configuration, documentation, usage examples, and targeted tests.

Sequence diagram for document translation workflow
sequenceDiagram
actor User
participant API as FastAPI
participant Orchestrator as OrchestratorAgent
participant Conversion as ConversionAgent
participant ImageTranslation as ImageTranslationAgent
participant Validation as ValidationAgent
participant GCS as Google Cloud Storage
participant Vision as Cloud Vision API
participant Translate as Cloud Translation API
User->>API: POST /translate (document_path, target_language)
API->>Orchestrator: process(translation request)
Orchestrator->>Conversion: convert_document(document_path)
Conversion->>GCS: Upload page images
Conversion-->>Orchestrator: page_images URIs
loop For each page
Orchestrator->>ImageTranslation: translate_page(image_uri, target_language)
ImageTranslation->>GCS: Download image
ImageTranslation->>Vision: OCR (extract text)
ImageTranslation->>Translate: Translate text
ImageTranslation->>GCS: Upload translated image
ImageTranslation-->>Orchestrator: translated_image_uri
Orchestrator->>Validation: validate(original_image_uri, translated_image_uri)
Validation->>GCS: Download images
Validation->>Translate: Back-translate for semantic check
Validation-->>Orchestrator: quality score
end
Orchestrator->>GCS: Assemble final document
Orchestrator-->>API: Translation result (output_document, quality scores)
API-->>User: Response
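The workflow in the sequence diagram reads as a pipeline: convert the document, translate and validate each page in turn, then assemble the output. The sketch below mirrors that control flow with in-memory stub coroutines standing in for the agents. The stub functions, the derived `gs://` URIs, the fixed 0.9 score, and the status strings are assumptions; the real agents call Cloud Storage, Vision, and Translation.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranslationJob:
    # Subset of the TranslationJob fields from the class diagram.
    job_id: str
    document_path: str
    target_language: str
    status: str = "pending"
    page_images: List[str] = field(default_factory=list)
    translated_images: List[str] = field(default_factory=list)
    quality_scores: List[float] = field(default_factory=list)
    output_document: Optional[str] = None
    error_message: Optional[str] = None


# Hypothetical stubs for the Conversion, ImageTranslation, and Validation agents.
async def convert_document(document_path: str) -> List[str]:
    return [f"{document_path}.page{n}.png" for n in (1, 2)]


async def translate_page(image_uri: str, target_language: str) -> str:
    return image_uri.replace(".png", f".{target_language}.png")


async def validate(original_uri: str, translated_uri: str) -> float:
    return 0.9  # fixed placeholder score


async def run_job(job: TranslationJob) -> TranslationJob:
    try:
        job.status = "converting"
        job.page_images = await convert_document(job.document_path)
        job.status = "translating"
        for page_uri in job.page_images:  # the diagram's "For each page" loop
            translated_uri = await translate_page(page_uri, job.target_language)
            job.translated_images.append(translated_uri)
            job.quality_scores.append(await validate(page_uri, translated_uri))
        # Assembly of the final document is stubbed as a derived URI.
        job.output_document = f"{job.document_path}.{job.target_language}.pdf"
        job.status = "completed"
    except Exception as exc:  # record the failure on the job instead of raising
        job.status = "failed"
        job.error_message = str(exc)
    return job


job = asyncio.run(run_job(TranslationJob("job-1", "gs://bucket/manual.pdf", "fr")))
```

Because pages are independent, the loop could also fan out with `asyncio.gather`; the sketch stays sequential because that is what the diagram shows.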

Class diagram for agent abstractions and main agents
classDiagram
class BaseAgent {
+agent_id: str
+name: str
+status: AgentStatus
+message_queue: asyncio.Queue
+logger
+start()
+stop()
+send_message()
+receive_message()
+execute_with_retry()
+process(input_data)
+validate_input(input_data)
+get_status()
}
class OrchestratorAgent {
+active_jobs: Dict[str, TranslationJob]
+process(input_data)
+get_job_status(job_id)
+list_active_jobs()
}
class ConversionAgent {
+process(input_data)
+download_document()
+convert_pdf_to_images()
+convert_pptx_to_images()
+convert_docx_to_images()
+upload_page_images()
}
class ImageTranslationAgent {
+process(input_data)
+download_image()
+extract_text_with_ocr()
+group_text_blocks()
+create_clean_image()
+translate_text_blocks()
+render_translated_text()
+upload_translated_image()
}
class ValidationAgent {
+process(input_data)
+download_image()
+check_layout_consistency()
+calculate_structural_similarity()
+check_semantic_consistency()
+calculate_text_similarity()
+check_text_completeness()
+calculate_overall_quality()
+generate_issues_report()
}
class TranslationJob {
+job_id: str
+document_path: str
+target_language: str
+source_language: str
+status: str
+page_images: List[str]
+translated_images: List[str]
+output_document: Optional[str]
+quality_scores: List[float]
+error_message: Optional[str]
}
class AgentResult {
+success: bool
+data: Optional[Dict[str, Any]]
+error: Optional[str]
+metadata: Optional[Dict[str, Any]]
}
class TextBlock {
+text: str
+bounding_box: List[Tuple[int, int]]
+confidence: float
+translated_text: str
+font_size: int
+font_color: Tuple[int, int, int]
}
BaseAgent <|-- OrchestratorAgent
BaseAgent <|-- ConversionAgent
BaseAgent <|-- ImageTranslationAgent
BaseAgent <|-- ValidationAgent
OrchestratorAgent o-- TranslationJob
ImageTranslationAgent o-- TextBlock
OrchestratorAgent o-- AgentResult
ConversionAgent o-- AgentResult
ImageTranslationAgent o-- AgentResult
ValidationAgent o-- AgentResult
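The class diagram gives every agent a shared `BaseAgent` with a message queue, `send_message()`/`receive_message()`, `validate_input()`, and an abstract `process()` returning an `AgentResult`. A minimal sketch of that shape, assuming messages are plain dicts placed on the recipient's `asyncio.Queue`; it does not model the A2A protocol's actual wire format. The `handle()` wrapper, the `EchoAgent` subclass, and all `AgentStatus` members except `RETRYING` are hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional


class AgentStatus(Enum):
    # Only RETRYING appears in the PR excerpt; the other members are assumptions.
    IDLE = "idle"
    PROCESSING = "processing"
    RETRYING = "retrying"
    ERROR = "error"


@dataclass
class AgentResult:
    success: bool
    data: Optional[Dict[str, Any]] = None
    error: Optional[str] = None
    metadata: Optional[Dict[str, Any]] = None


class BaseAgent(ABC):
    def __init__(self, agent_id: str, name: str) -> None:
        self.agent_id = agent_id
        self.name = name
        self.status = AgentStatus.IDLE
        self.message_queue: "asyncio.Queue[Dict[str, Any]]" = asyncio.Queue()

    async def send_message(self, recipient: "BaseAgent", payload: Dict[str, Any]) -> None:
        # Deliver by enqueueing on the recipient's inbox, tagged with the sender.
        await recipient.message_queue.put({"sender": self.agent_id, "payload": payload})

    async def receive_message(self) -> Dict[str, Any]:
        return await self.message_queue.get()

    def validate_input(self, input_data: Dict[str, Any]) -> bool:
        return isinstance(input_data, dict)

    async def handle(self, input_data: Dict[str, Any]) -> AgentResult:
        """Hypothetical wrapper: validate, run process(), track status, trap errors."""
        if not self.validate_input(input_data):
            return AgentResult(success=False, error="invalid input")
        self.status = AgentStatus.PROCESSING
        try:
            result = await self.process(input_data)
        except Exception as exc:
            self.status = AgentStatus.ERROR
            return AgentResult(success=False, error=str(exc))
        self.status = AgentStatus.IDLE
        return result

    @abstractmethod
    async def process(self, input_data: Dict[str, Any]) -> AgentResult:
        """Agent-specific work, implemented by each subclass."""


class EchoAgent(BaseAgent):
    """Hypothetical agent used only to exercise the base class."""

    def validate_input(self, input_data: Dict[str, Any]) -> bool:
        return super().validate_input(input_data) and "text" in input_data

    async def process(self, input_data: Dict[str, Any]) -> AgentResult:
        return AgentResult(success=True, data={"echo": input_data["text"]})


async def demo() -> "tuple[AgentResult, AgentResult]":
    # Create agents inside the running loop so their queues use that loop.
    sender, receiver = EchoAgent("a1", "sender"), EchoAgent("b1", "receiver")
    await sender.send_message(receiver, {"text": "hello"})
    message = await receiver.receive_message()
    return await receiver.handle(message["payload"]), await receiver.handle({})


ok_result, rejected = asyncio.run(demo())
```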
Worth considering.
RETRYING = "retrying"

@dataclass
Suggested change:
-@dataclass
+@dataclass(frozen=True)
Use frozen=True to make the dataclasses immutable and hashable.
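Two properties of frozen dataclasses are worth checking before applying this suggestion. Assigning to a field raises `dataclasses.FrozenInstanceError`, so code that updates an instance in place (for example, setting a status or timestamp later) would need `dataclasses.replace()` instead. And the generated `__hash__` hashes every field, so an instance holding a `dict` or `list` (like `AgentResult.data` or `TextBlock.bounding_box` in the class diagram) still raises `TypeError` when hashed. The `Message` and `Result` classes below are hypothetical stand-ins.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Message:  # hypothetical; has a timestamp field like the one under review
    sender: str
    content: str
    timestamp: Optional[float] = None


@dataclass(frozen=True)
class Result:  # hypothetical; holds a dict like AgentResult.data
    success: bool
    data: Optional[Dict[str, Any]] = None


msg = Message("orchestrator", "convert")

mutation_blocked = False
try:
    msg.timestamp = 1.0  # frozen: attribute assignment is rejected
except FrozenInstanceError:
    mutation_blocked = True

stamped = replace(msg, timestamp=1.0)  # "update" by building a modified copy

message_hashable = isinstance(hash(msg), int)

dict_result_hashable = True
try:
    hash(Result(True, {"pages": 3}))  # hashing the tuple of fields reaches the dict
except TypeError:
    dict_result_hashable = False
```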
timestamp: Optional[float] = None

@dataclass
Suggested change:
-@dataclass
+@dataclass(frozen=True)
Again, use frozen=True.
from config.settings import settings, get_gcs_config

@dataclass
Suggested change:
-@dataclass
+@dataclass(frozen=True)
Use frozen=True to make the dataclasses immutable and hashable.

"job_id": f"local_file_{int(asyncio.get_event_loop().time())}"
}

print(f"🔄 Starting translation to French...")
Suggested change:
-print(f"🔄 Starting translation to French...")
+print("🔄 Starting translation to French...")
f-string is unnecessary here. This can just be a string.
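Separately from the f-string nit, the `job_id` expression in the context line may deserve a look: `loop.time()` reads the event loop's monotonic clock, whose starting point is arbitrary, and truncating it to whole seconds gives two jobs started within the same second the same ID. If uniqueness matters, a random UUID avoids both issues. A minimal sketch; the helper name `make_job_id` is hypothetical.

```python
import uuid


def make_job_id(prefix: str = "local_file") -> str:
    """Return a collision-resistant job ID such as 'local_file_<32 hex chars>'.

    uuid4() carries 122 random bits, so IDs do not repeat in practice even
    when many jobs start within the same second.
    """
    return f"{prefix}_{uuid.uuid4().hex}"


ids = {make_job_id() for _ in range(1000)}
```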
🚀 Multi-Agent Document Translation App
This PR implements a sophisticated document translation system that preserves layout and visual integrity using Google's Agent Development Kit (ADK) and A2A protocol.
🎯 Problem Solved
Standard document translation tools often fail with visually complex documents (technical manuals, marketing brochures, academic papers with diagrams). They extract text, translate it, and try to reflow it into the document, which:
🏗️ Architecture
4-Agent System Design:
1. Orchestrator Agent 🎭
2. Conversion Agent 📄➡️🖼️
3. Image Translation Agent 🔤🌍
4. Validation Agent ✅📊
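The Conversion Agent accepts PDF, PPTX, and DOCX and, per the class diagram, has one converter per format (`convert_pdf_to_images`, `convert_pptx_to_images`, `convert_docx_to_images`). One way to route an input to the right converter is a suffix lookup table, sketched below. The converters are placeholders that only return page-image names, and the `<stem>/page_0001.png` naming scheme and page counts are assumptions, not the PR's.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, List


def _fake_pages(path: str, count: int) -> List[str]:
    # Placeholder output: one image name per page, zero-padded for sorting.
    stem = PurePosixPath(path).stem
    return [f"{stem}/page_{n:04d}.png" for n in range(1, count + 1)]


def convert_pdf_to_images(path: str) -> List[str]:
    return _fake_pages(path, 2)  # stand-in for a real PDF rasteriser


def convert_pptx_to_images(path: str) -> List[str]:
    return _fake_pages(path, 3)  # stand-in for slide rendering


def convert_docx_to_images(path: str) -> List[str]:
    return _fake_pages(path, 1)  # stand-in for DOCX rendering


CONVERTERS: Dict[str, Callable[[str], List[str]]] = {
    ".pdf": convert_pdf_to_images,
    ".pptx": convert_pptx_to_images,
    ".docx": convert_docx_to_images,
}


def convert_document(path: str) -> List[str]:
    """Dispatch on the (case-insensitive) file suffix."""
    suffix = PurePosixPath(path).suffix.lower()
    try:
        converter = CONVERTERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported document type: {suffix or '(none)'}") from None
    return converter(path)


pages = convert_document("gs://bucket/docs/Deck.PPTX")
```

A table keeps adding a format to a one-line change and gives unsupported inputs a single, explicit error path.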
🌟 Key Features
📁 Project Structure
🔧 Technical Implementation
Google Cloud Integration:
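Pages move between agents as Cloud Storage URIs (the sequence diagram shows page image URIs flowing back to the orchestrator), with a `GCSHelper` utility underneath. A helper like that typically needs to split a `gs://bucket/object` URI into the bucket and object names that the `google-cloud-storage` client takes (`client.bucket(name).blob(object_name)`). The stdlib-only parser below is a sketch; `parse_gcs_uri` and `to_gcs_uri` are hypothetical names, not functions from the PR.

```python
from typing import Tuple

GCS_PREFIX = "gs://"


def parse_gcs_uri(uri: str) -> Tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    if not uri.startswith(GCS_PREFIX):
        raise ValueError(f"Not a gs:// URI: {uri!r}")
    bucket, sep, blob_name = uri[len(GCS_PREFIX):].partition("/")
    if not bucket or not sep or not blob_name:
        raise ValueError(f"URI must name both a bucket and an object: {uri!r}")
    return bucket, blob_name


def to_gcs_uri(bucket: str, blob_name: str) -> str:
    """Inverse of parse_gcs_uri; tolerates a leading slash on the object name."""
    return f"{GCS_PREFIX}{bucket}/{blob_name.lstrip('/')}"


bucket, blob_name = parse_gcs_uri("gs://my-bucket/jobs/42/page_0001.png")
```

With the real client, the pair would then feed `storage.Client().bucket(bucket).blob(blob_name)`.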
Quality Assurance:
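The PR lists visual consistency checks using SSIM. For two equally sized grayscale images x and y, SSIM(x, y) = ((2·μx·μy + C1)(2·σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)), where μ are means, σ² variances, σxy the covariance, C1 = (0.01·L)² and C2 = (0.03·L)² for dynamic range L (255 for 8-bit pixels). The sketch computes this over the whole image as a single window. That is a simplification: the standard formulation, and library versions such as `skimage.metrics.structural_similarity`, average it over local windows, so scores differ. The PR's own SSIM code is not shown here.

```python
from statistics import fmean
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 255.0) -> float:
    """SSIM over the whole image as one window (a simplification).

    x and y are flattened grayscale pixel values of two equally sized images.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    c1 = (0.01 * data_range) ** 2  # stabilises the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilises the contrast/structure term
    mu_x, mu_y = fmean(x), fmean(y)
    # Population (1/N) variances and covariance.
    var_x = fmean((a - mu_x) * (a - mu_x) for a in x)
    var_y = fmean((b - mu_y) * (b - mu_y) for b in y)
    cov_xy = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


page = [10.0, 50.0, 200.0, 120.0]
brighter = [v + 5 for v in page]
score = global_ssim(page, brighter)  # slightly below 1.0
```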
Error Handling:
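The commit lists async processing with retry logic, and `BaseAgent` exposes `execute_with_retry()`. Its signature and policy are not shown, so the sketch assumes exponential backoff (the delay doubles after each failed attempt) and re-raises the last error once attempts run out. The `max_attempts`, `base_delay`, and `retry_on` parameters are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Type, TypeVar

T = TypeVar("T")


async def execute_with_retry(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Await operation(), retrying failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return await operation()  # a fresh awaitable on every attempt
        except retry_on:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error unchanged
            # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


attempts = {"count": 0}


async def flaky_call() -> str:
    # Fails twice with a transient error, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = asyncio.run(execute_with_retry(flaky_call, base_delay=0.01))
```

Taking a zero-argument callable rather than a coroutine object matters: a coroutine can be awaited only once, so each retry must create a new one. Narrowing `retry_on` to transient errors (timeouts, connection resets) avoids retrying failures that cannot succeed.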
🚀 Usage
Start the application:
API Endpoints:
POST /upload - Upload documents
POST /translate - Start translation
GET /job/{job_id} - Check job status
GET /jobs - List all jobs
GET /languages - Supported languages

Example:
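A hedged sketch of calling two of these endpoints from Python with only the standard library; it is not the PR's own example. It assumes the server listens on `http://localhost:8000` and that `POST /translate` accepts a JSON body with the `document_path` and `target_language` fields named in the sequence diagram. The helper names are hypothetical, and the code only builds the requests; sending them goes through `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: a locally running server


def build_translate_request(document_path: str, target_language: str) -> urllib.request.Request:
    """Build (without sending) a POST /translate request with a JSON body.

    The field names come from the sequence diagram; the JSON body format is assumed.
    """
    body = json.dumps(
        {"document_path": document_path, "target_language": target_language}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/translate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_job_status_request(job_id: str) -> urllib.request.Request:
    """Build a GET /job/{job_id} request for polling a job."""
    return urllib.request.Request(f"{BASE_URL}/job/{job_id}", method="GET")


translate_req = build_translate_request("gs://my-bucket/manual.pdf", "fr")
status_req = build_job_status_request("job-123")
# Sending would look like:
#   with urllib.request.urlopen(translate_req) as response:
#       job = json.load(response)
```

If the server expects form fields or query parameters instead of JSON, only the body construction changes.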
🧪 Testing
📊 Quality Metrics
The system provides detailed quality metrics:
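The Validation Agent combines several signals into one score (`check_semantic_consistency`, `calculate_text_similarity`, `check_text_completeness`, `calculate_overall_quality`, and `generate_issues_report` in the class diagram). How the PR measures and weights them is not stated. The sketch assumes back-translation similarity is measured with `difflib.SequenceMatcher.ratio()` (a character-level stand-in for whatever semantic measure the PR uses), that completeness is the fraction of source blocks with a non-empty translation, and that the overall score is a weighted mean. The weights and the issue threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def calculate_text_similarity(original: str, back_translated: str) -> float:
    """Character-level similarity in [0, 1] between source text and its back-translation."""
    return SequenceMatcher(None, original.lower(), back_translated.lower()).ratio()


def check_text_completeness(source_blocks: List[str], translated_blocks: List[str]) -> float:
    """Fraction of source blocks whose translation is non-empty."""
    if not source_blocks:
        return 1.0
    filled = sum(1 for t in translated_blocks[: len(source_blocks)] if t.strip())
    return filled / len(source_blocks)


WEIGHTS = {"layout": 0.4, "semantic": 0.4, "completeness": 0.2}  # hypothetical, sum to 1


def calculate_overall_quality(scores: Dict[str, float]) -> float:
    """Weighted mean of the per-check scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def generate_issues_report(scores: Dict[str, float], threshold: float = 0.7) -> List[str]:
    """One message per check that scored below the threshold."""
    return [
        f"{name} score {value:.2f} below {threshold}"
        for name, value in sorted(scores.items())
        if value < threshold
    ]


scores = {
    "layout": 0.95,  # e.g. an SSIM-style layout score
    "semantic": calculate_text_similarity("Press the red button", "press the red button"),
    "completeness": check_text_completeness(["Title", "Body", "Footer"], ["Titre", "Corps", ""]),
}
overall = calculate_overall_quality(scores)
issues = generate_issues_report(scores)
```

Here the empty "Footer" translation lowers completeness to 2/3, which is the only check flagged.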
🔮 Future Enhancements
📚 Documentation
This implementation follows the exact design specifications provided and creates a production-ready multi-agent document translation system that preserves layout integrity while providing high-quality translations.
Summary by Sourcery
Introduce a production-ready multi-agent document translation application that orchestrates document conversion, image-based OCR translation, and quality validation while preserving original layout and styling using Google Cloud services.
New Features:
Build:
Documentation:
Tests: