A robust Python service that extracts data from Excel files and merges it into PowerPoint templates using Jinja-style merge fields. Designed for local development, testing, and Google Cloud Function deployment.
- Dynamic Excel Data Extraction: Flexible table detection with configurable search criteria
- PowerPoint Template Processing: Jinja-style merge field replacement ({{field_name}})
- Dynamic Slide Duplication: Create multiple slides from templates based on list data
- Slide Filtering: Include or exclude specific slides in final output
- Multiple Data Orientations: Support for horizontal (key-value) and vertical (table) data layouts
- Image Handling: Extract images from Excel and replace image placeholders in PowerPoint with aspect ratio preservation
- Explicit Field Type Support: Configure field types (text, image, number, date, boolean) for precise data handling
- Environment-Based Configuration: Development, testing, and production configurations
- Temporary File Management: Intelligent cleanup with configurable retention policies
- API Endpoints: RESTful API with health checks, preview, and processing endpoints
- Google Cloud Ready: Deploy as Cloud Function with one command
- Comprehensive Testing: Unit tests, integration tests, and fixtures
- Docker Support: Containerized development and deployment
- Python 3.9+
- uv package manager (recommended) or pip
- Google Cloud SDK (for deployment)
- Docker (optional, for containerized development)
git clone <repository-url>
cd excel_pptx_merger
# Run the automated setup script
python scripts/setup_dev.py
# Copy environment template
cp .env.example .env
# Edit with your configuration
nano .env
The application uses a standardized folder structure for temporary files:
.temp/                       # Local development temporary directory
└── excel_pptx_merger_*      # Unique session directory
    ├── input/               # Uploaded Excel and PowerPoint files
    ├── output/              # Generated merged PowerPoint files
    ├── images/              # Extracted images from Excel files
    └── debug/               # Debug information (in development mode)
This structure is created automatically when you run the application. The .temp directory is listed in .gitignore so that temporary files are not committed.
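As an illustration of the layout above, here is a minimal sketch of how a session directory could be created. The function name `create_session_dir` and the `SUBFOLDERS` constant are hypothetical and not taken from the project's source; only the folder names and the `excel_pptx_merger_` prefix come from the structure shown.

```python
import tempfile
from pathlib import Path

# Subfolder names taken from the documented layout.
SUBFOLDERS = ("input", "output", "images", "debug")


def create_session_dir(base: Path) -> Path:
    """Create a unique session directory containing the standard subfolders."""
    base.mkdir(parents=True, exist_ok=True)
    # mkdtemp picks a name that does not exist yet, so sessions never collide.
    session = Path(tempfile.mkdtemp(prefix="excel_pptx_merger_", dir=base))
    for name in SUBFOLDERS:
        (session / name).mkdir()
    return session
```

Calling `create_session_dir(Path(".temp"))` would produce a directory such as `.temp/excel_pptx_merger_ab12cd/` with the four subfolders inside it.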
# Using the convenience script
python scripts/run_local_server.py --debug
# Or using uv directly
uv run python -m src.main serve --debug
# Or using Docker
docker-compose -f docker/docker-compose.yml up
# Health check
curl http://localhost:5000/api/v1/health
# Get default configuration
curl http://localhost:5000/api/v1/config
The service uses JSON configuration to define how data should be extracted from Excel files:
{
"version": "1.0",
"sheet_configs": {
"Order Form": {
"subtables": [
{
"name": "client_info",
"type": "key_value_pairs",
"header_search": {
"method": "contains_text",
"text": "Client",
"column": "A",
"search_range": "A1:A10"
},
"data_extraction": {
"orientation": "horizontal",
"headers_row_offset": 0,
"data_row_offset": 1,
"max_columns": 6,
"column_mappings": {
"Client": {
"name": "client_name",
"type": "text"
},
"Word Or Image": {
"name": "search_type",
"type": "text"
}
}
}
},
{
"name": "word_search",
"type": "table",
"header_search": {
"method": "contains_text",
"text": "Word",
"column": "A",
"search_range": "A1:A50"
},
"data_extraction": {
"headers_row_offset": 0,
"data_row_offset": 1,
"max_rows": 10,
"column_mappings": {
"Word": {
"name": "word",
"type": "text"
},
"Search Criteria": {
"name": "search_criteria",
"type": "text"
}
}
}
}
]
}
}
}
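To make the `header_search` and `data_extraction` settings more concrete, the sketch below applies them to a worksheet represented as a plain list of rows. It is not the service's actual implementation: the helper names (`col_index`, `find_header_row`, `extract_table`) are hypothetical, both offsets are assumed to be relative to the row where the header text was found, and extraction is assumed to stop at the first blank row. Only the `table` case is covered.

```python
import re
from typing import Any, Dict, List, Optional

Grid = List[List[Any]]  # rows of cell values; grid[0][0] stands for cell A1


def col_index(letter: str) -> int:
    """Convert a column letter such as 'A' or 'AB' to a 0-based index."""
    index = 0
    for ch in letter.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def find_header_row(grid: Grid, header_search: Dict[str, str]) -> Optional[int]:
    """Return the 0-based row whose cell contains the search text, or None."""
    match = re.fullmatch(r"[A-Z]+(\d+):[A-Z]+(\d+)", header_search["search_range"])
    if match is None:
        raise ValueError("unsupported search_range")
    first, last = int(match.group(1)), int(match.group(2))
    col = col_index(header_search["column"])
    for row in range(first - 1, min(last, len(grid))):
        cell = grid[row][col] if col < len(grid[row]) else None
        if cell is not None and header_search["text"] in str(cell):
            return row
    return None


def extract_table(grid: Grid, header_row: int,
                  data_extraction: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Read rows below the header and rename columns via column_mappings."""
    # Assumption: both offsets count from the row found by header_search.
    headers_at = header_row + data_extraction.get("headers_row_offset", 0)
    first_data = header_row + data_extraction.get("data_row_offset", 1)
    max_rows = data_extraction.get("max_rows", len(grid))
    mappings = data_extraction["column_mappings"]
    headers = grid[headers_at]
    records = []
    for row in grid[first_data:first_data + max_rows]:
        if all(cell in (None, "") for cell in row):
            break  # assumption: a blank row ends the table
        record = {}
        for col, header in enumerate(headers):
            if header in mappings and col < len(row):
                record[mappings[header]["name"]] = row[col]
        records.append(record)
    return records


sheet = [
    ["Client", "Word Or Image"],
    ["Acme", "Word"],
    [None, None],
    ["Word", "Search Criteria"],
    ["apple", "exact"],
    ["pear", "fuzzy"],
]
search = {"method": "contains_text", "text": "Word",
          "column": "A", "search_range": "A1:A50"}
extraction = {
    "headers_row_offset": 0, "data_row_offset": 1, "max_rows": 10,
    "column_mappings": {
        "Word": {"name": "word", "type": "text"},
        "Search Criteria": {"name": "search_criteria", "type": "text"},
    },
}
row = find_header_row(sheet, search)  # 3, i.e. cell A4
print(extract_table(sheet, row, extraction))
```

Note that the search only looks at column A, so the "Word Or Image" header in column B does not match the `word_search` subtable.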
Merge fields in PowerPoint templates use the Jinja-style double curly braces syntax:
{{field_name}}
For nested data or arrays, use dot notation:
{{table_name.0.field_name}}
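The dot notation can be read as a path through nested dictionaries and lists. The following sketch shows one way such a field could be resolved and substituted; `resolve`, `render`, and `FIELD_PATTERN` are illustrative names, and leaving unknown fields untouched is an assumption rather than documented behavior.

```python
import re
from typing import Any

# Matches {{field}} and {{table.0.field}}; list markers such as {{#list:x}} do not match.
FIELD_PATTERN = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def resolve(path: str, data: Any) -> Any:
    """Walk a dotted path through nested dicts and lists."""
    current = data
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric segment selects a list item
        else:
            current = current[part]
    return current


def render(text: str, data: dict) -> str:
    """Replace every merge field in text; unresolved fields are left as-is."""
    def substitute(match: "re.Match[str]") -> str:
        try:
            return str(resolve(match.group(1), data))
        except (KeyError, IndexError, ValueError, TypeError):
            return match.group(0)
    return FIELD_PATTERN.sub(substitute, text)


data = {"client_name": "Acme", "word_search": [{"word": "apple"}]}
print(render("{{client_name}}: {{word_search.0.word}}", data))  # Acme: apple
```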
To create an image placeholder:
- Create a text box in PowerPoint
- Add a merge field with the name of an image field:
{{image_field_name}}
- Format the text box to the desired size and position
- The image will be inserted maintaining its original aspect ratio and centered within the placeholder
- Sizing: Make image placeholders slightly larger than needed to accommodate various image sizes
- Text Alternatives: Consider adding conditional text for cases where images might be missing
- Field Naming: Use descriptive field names that match your Excel data structure
- Testing: Test templates with sample data to ensure proper field replacement
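The aspect-ratio-preserving, centered placement described in the placeholder steps above comes down to a small piece of arithmetic. The sketch below is independent of any PowerPoint library; `Box` and `fit_image` are hypothetical names, and the units are arbitrary as long as they are consistent.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    width: float
    height: float


def fit_image(placeholder: Box, image_width: float, image_height: float) -> Box:
    """Scale an image to fit inside the placeholder and center it there."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    # The smaller ratio is the largest scale that fits in both directions.
    scale = min(placeholder.width / image_width,
                placeholder.height / image_height)
    width, height = image_width * scale, image_height * scale
    left = placeholder.left + (placeholder.width - width) / 2
    top = placeholder.top + (placeholder.height - height) / 2
    return Box(left, top, width, height)


# A 200x100 image in a 400x300 placeholder fills the width and is centered vertically.
print(fit_image(Box(0, 0, 400, 300), 200, 100))
```

Here the image is scaled by 2 to 400x200, leaving 50 units of empty space above and below it.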
The Excel processor supports two main data structures:
- Key-Value Pairs: For client info and other metadata (horizontal or vertical orientation)
- Tables: For lists of items with multiple columns
To include images in your Excel data:
- Insert images into Excel cells
- Configure the corresponding fields as "type": "image" in the configuration
- The processor will extract these images and make them available for PowerPoint insertion
- Sheet Names: Use consistent sheet names that match your configuration
- Headers: Include clear headers that match your configuration's search criteria
- Data Formatting: Keep data consistent with expected types (text, numbers, dates)
- Images: Insert images properly into cells rather than floating them
The configuration now supports explicit field type information:
"column_mappings": {
"Header Name": {
"name": "field_name",
"type": "text|image|number|date|boolean"
}
}
Supported field types:
- text: Text content (default)
- image: Image content (path, URL, or binary data)
- number: Numeric values
- date: Date values
- boolean: Boolean values
- sheet_configs: Defines which Excel sheets to process
- subtables: Defines tables or key-value sections within each sheet
- header_search: Defines how to locate the start of each data section
- data_extraction: Defines how to extract and map data from the section
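To show what explicit field types make possible, here is a hedged sketch of converting a raw cell value according to the `type` in its column mapping. The `coerce` function and its conversion rules (ISO-format dates, a fixed set of accepted boolean spellings, image values passed through unchanged) are assumptions for illustration and may differ from what the service does.

```python
from datetime import date, datetime
from typing import Any

TRUE_VALUES = {"true", "yes", "y", "1"}
FALSE_VALUES = {"false", "no", "n", "0", ""}


def coerce(value: Any, field_type: str = "text") -> Any:
    """Convert a raw cell value to the configured field type."""
    if value is None:
        return None
    if field_type == "text":
        return str(value)
    if field_type == "number":
        number = float(value)
        return int(number) if number.is_integer() else number
    if field_type == "date":
        if isinstance(value, datetime):  # check datetime first: it subclasses date
            return value.date()
        if isinstance(value, date):
            return value
        return date.fromisoformat(str(value))  # assumption: YYYY-MM-DD strings
    if field_type == "boolean":
        if isinstance(value, bool):
            return value
        text = str(value).strip().lower()
        if text in TRUE_VALUES:
            return True
        if text in FALSE_VALUES:
            return False
        raise ValueError(f"cannot interpret {value!r} as boolean")
    if field_type == "image":
        return value  # path, URL, or bytes handed on unchanged
    raise ValueError(f"unknown field type: {field_type}")
```

With these rules, `coerce("42", "number")` yields the integer `42`, and `coerce("Yes", "boolean")` yields `True`.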
See the config/default_config.json file for comprehensive examples of:
- Key-value pair extraction
- Table data extraction
- Image field configuration
- Search criteria options
Create multiple slides from a template slide based on list data:
{
"global_settings": {
"powerpoint": {
"dynamic_slides": {
"enabled": true,
"template_marker": "{{#list:",
"remove_template_slides": true
}
}
}
}
Template slide example:
{{#list:products}}
Product: {{name}}
Price: {{price}}
Stock: {{quantity}}
This will create one slide for each item in the "products" list, replacing merge fields with data from each item.
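The duplication step can be sketched without a PowerPoint library by modelling each slide as a list of text strings. In this illustration, `expand_slides` and `find_list_name` are hypothetical names, the marker text itself is stripped from the output, and the template slide is always dropped, which corresponds to `remove_template_slides` being true.

```python
import re
from typing import Dict, List, Optional

LIST_MARKER = re.compile(r"\{\{#list:(\w+)\}\}")
FIELD = re.compile(r"\{\{(\w+)\}\}")


def find_list_name(texts: List[str]) -> Optional[str]:
    """Return the list name from a {{#list:name}} marker, if the slide has one."""
    for text in texts:
        match = LIST_MARKER.search(text)
        if match:
            return match.group(1)
    return None


def expand_slides(slides: List[List[str]],
                  data: Dict[str, List[dict]]) -> List[List[str]]:
    """Replace each template slide with one rendered copy per list item."""
    result = []
    for texts in slides:
        list_name = find_list_name(texts)
        if list_name is None:
            result.append(texts)  # ordinary slide, kept unchanged
            continue
        for item in data.get(list_name, []):
            rendered = []
            for text in texts:
                text = LIST_MARKER.sub("", text)  # drop the marker itself
                # Fields missing from the item are left in place.
                text = FIELD.sub(
                    lambda m: str(item.get(m.group(1), m.group(0))), text)
                rendered.append(text)
            result.append(rendered)
    return result


slides = [
    ["Catalog"],
    ["{{#list:products}}", "Product: {{name}}", "Price: {{price}}"],
]
data = {"products": [{"name": "Pen", "price": 2}, {"name": "Ink", "price": 5}]}
for slide in expand_slides(slides, data):
    print(slide)
```

With two products, the two-slide template becomes three slides: the untouched "Catalog" slide followed by one rendered slide per product.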
Control which slides appear in the final output:
{
"global_settings": {
"powerpoint": {
"slide_filter": {
"include_slides": [1, 3, 5],
"exclude_slides": [2, 4]
}
}
}
}
Notes:
- Slide numbers are 1-based (matching PowerPoint UI)
- If include_slides is specified, only those slides are included
- If only exclude_slides is specified, all slides except those are included
- Empty configuration includes all slides
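The rules in these notes can be expressed as a small selection function. This is a sketch only; `select_slides` is a hypothetical name, and letting `include_slides` take precedence when both lists are given is an interpretation of the notes above.

```python
from typing import Iterable, List, Optional


def select_slides(total: int,
                  include_slides: Optional[Iterable[int]] = None,
                  exclude_slides: Optional[Iterable[int]] = None) -> List[int]:
    """Return the 1-based slide numbers that remain in the output."""
    all_slides = range(1, total + 1)
    if include_slides:
        wanted = set(include_slides)
        return [n for n in all_slides if n in wanted]
    if exclude_slides:
        unwanted = set(exclude_slides)
        return [n for n in all_slides if n not in unwanted]
    return list(all_slides)  # empty configuration keeps everything


print(select_slides(5, include_slides=[1, 3, 5]))  # [1, 3, 5]
print(select_slides(5, exclude_slides=[2, 4]))     # [1, 3, 5]
print(select_slides(3))                            # [1, 2, 3]
```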
- Excel Processing:
  - Locate sheets based on configuration
  - Find headers using search criteria
  - Extract data according to extraction rules
  - Apply column mappings and field type information
  - Extract embedded images
- PowerPoint Processing:
  - Load template presentation
  - Find merge fields in text shapes
  - Replace text fields with corresponding data
  - Replace image placeholders with actual images (maintaining aspect ratio)
  - Save the resulting presentation
# Run all tests
python -m pytest
# Run specific test file
python -m pytest tests/test_excel_processor.py
# Run with coverage
python -m pytest --cov=src
# Deploy to Google Cloud
python scripts/deploy_gcp.py --project your-project-id
# Build Docker image
docker build -f docker/Dockerfile -t excel-pptx-merger:latest .
# Run Docker container
docker run -p 5000:5000 excel-pptx-merger:latest
- API Documentation: Available at /api/v1/ endpoints when the server is running
- Configuration Guide: See config/default_config.json for examples
- Deployment Guide: See scripts/deploy_gcp.py for Google Cloud deployment
- Validate all input files before processing
- Use proper authentication for API endpoints
- Sanitize file paths to prevent directory traversal
- Consider encryption for sensitive data
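For the path sanitization point, one common approach is to resolve the requested path and verify that it still lies inside an allowed base directory. The sketch below uses only the standard library (`Path.is_relative_to` requires Python 3.9+, matching the stated prerequisite); `safe_join` is a hypothetical helper, not part of this project's API.

```python
from pathlib import Path


def safe_join(base_dir: Path, user_path: str) -> Path:
    """Join user_path onto base_dir, rejecting anything that escapes base_dir."""
    base = base_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

For example, `safe_join(Path(".temp"), "input/data.xlsx")` returns a path inside `.temp`, while `safe_join(Path(".temp"), "../secrets.txt")` raises `ValueError`.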
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.