# S3 Trigger Setup - Automatic Document Processing with ADE

In [1]:
# ---
# LandingAI Applied AI Content Notebook Template
# ---
# Title: S3 Trigger Setup - Automatic Document Processing with ADE
# Author: Ava Xia
# Description: Streamlined notebook for testing and using the deployed Lambda function
# Target Audience: [Developers, Partners, Customers]
# Content Type: [Tutorial, How-To]
# Publish Date: 2025-09-23
# ADE Version: v0.1.5
# Change Log:
#    - v1.0: Initial draft
#    - v1.1: Modularized with utility functions
# ---

This notebook configures automatic processing when documents are uploaded to S3.

## How S3 Triggers Work

```
┌──────────┐      ┌──────────┐      ┌──────────┐      ┌──────────┐
│  Upload  │ ---> │    S3    │ ---> │  Trigger │ ---> │  Lambda  │
│   PDF    │      │  Bucket  │      │  Event   │      │ Process  │
└──────────┘      └──────────┘      └──────────┘      └──────────┘
                                           │                 │
                                           ↓                 ↓
                                    Automatic         Save Results
```

When you upload a PDF to the configured S3 folder:
1. S3 generates an event notification.
2. The event invokes the Lambda function.
3. Lambda processes the document.
4. The results are saved to the `ade-results/` folder.
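
To make step 2 concrete, the sketch below shows the general shape of the notification that S3 delivers to Lambda and how a handler can read the bucket and key from it. The names `extract_s3_objects` and `lambda_handler` are illustrative, and the sample event is trimmed; the deployed function's actual code is not shown in this notebook. Object keys arrive URL-encoded, so they are decoded before use.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 notification event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys are URL-encoded in the event (a space arrives as '+').
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    # Hypothetical handler: a real one would run ADE on each object.
    return {"received": extract_s3_objects(event)}


# Trimmed example event; real events carry more fields.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "invoices/my+invoice.pdf"},
            },
        }
    ]
}

print(lambda_handler(sample_event, None))
```

One event can contain several records, which is why the helper returns a list rather than a single pair.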

## 1. Initialize Environment

In [2]:
import time
from datetime import datetime
from utils import (
    setup_aws_environment,
    setup_s3_trigger,
    list_s3_files,
    get_lambda_invocation_stats,
    get_error_logs
)

# Initialize AWS environment
config, clients, account_id, aws_session = setup_aws_environment()

BUCKET_NAME = config['bucket_name']
FUNCTION_NAME = config['function_name']

print("\n✅ Ready to configure S3 triggers!")
print(f"   Bucket: {BUCKET_NAME}")
print(f"   Lambda: {FUNCTION_NAME}")

✅ AWS Environment configured
   Profile: workload-dev-2
   Region: us-east-2
   Account: 970073041993

✅ Ready to configure S3 triggers!
   Bucket: cf-mle-testing
   Lambda: ade-lambda-s3


## 2. Configure S3 Trigger

Set up automatic processing for a specific folder.

In [3]:
# Configure S3 trigger for invoices folder
trigger_folder = "invoices/"

print(f"🎯 Setting up S3 trigger for: {trigger_folder}")
print()

success = setup_s3_trigger(
    clients['s3'],
    clients['lambda'],
    BUCKET_NAME,
    FUNCTION_NAME,
    folder=trigger_folder
)

if success:
    print("\n✅ S3 trigger is active!")
    print(f"   Any PDF uploaded to s3://{BUCKET_NAME}/{trigger_folder}")
    print("   will be automatically processed")

🎯 Setting up S3 trigger for: invoices/

✅ Added S3 permission to Lambda
✅ S3 trigger configured!
   📤 Upload PDFs to: s3://cf-mle-testing/invoices/
   ⚡ They will auto-process with Lambda

✅ S3 trigger is active!
   Any PDF uploaded to s3://cf-mle-testing/invoices/
   will be automatically processed
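
The body of `setup_s3_trigger` lives in `utils` and is not shown here. As a rough sketch of what such a helper typically does with boto3, the code below grants S3 permission to invoke the function and then registers a prefix-filtered notification. The function names, the `Id` and `StatementId` schemes, and the `.pdf` suffix filter are assumptions for illustration, not the notebook's actual implementation.

```python
def build_lambda_notification(function_arn, folder, suffix=".pdf"):
    """Build a NotificationConfiguration for new objects under `folder`."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": f"ade-trigger-{folder.strip('/')}",  # hypothetical ID scheme
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": folder},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


def attach_trigger(s3_client, lambda_client, bucket, function_name,
                   function_arn, folder):
    # 1. Allow S3 to invoke the function. This raises
    #    ResourceConflictException if the statement ID already exists.
    lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=f"s3-invoke-{bucket}",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn=f"arn:aws:s3:::{bucket}",
    )
    # 2. Register the notification. This call replaces the bucket's
    #    entire notification configuration, so merge first if the
    #    bucket already has other notifications.
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_lambda_notification(function_arn, folder),
    )
```

Keeping the configuration builder separate from the API calls makes it easy to inspect or unit-test the filter rules without touching AWS.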


## 3. Test the Trigger

Upload a test file to verify the trigger works.

In [4]:
# Test file upload
test_file_path = "input_folder/invoice_1.pdf"  # Local file
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
s3_key = f"{trigger_folder}test_trigger_{timestamp}.pdf"

print("📤 Uploading test file...")
print(f"   From: {test_file_path}")
print(f"   To: s3://{BUCKET_NAME}/{s3_key}")
print()

try:
    # Upload the file
    clients['s3'].upload_file(test_file_path, BUCKET_NAME, s3_key)
    print("✅ File uploaded successfully!")
    print("⏳ Waiting for Lambda to process...")
    
    # Wait for processing
    time.sleep(10)
    
    # Check for results. This lists whatever is already in ade-results/,
    # so files from earlier runs also count as a match.
    results = list_s3_files(
        clients['s3'],
        BUCKET_NAME,
        "ade-results/",
        max_files=5
    )
    
    if results:
        print("\n✅ Processing complete!")
        print("   Recent results:")
        for r in results[:3]:
            print(f"   • {r['File']}")
    else:
        print("⚠️ Still processing or no results yet")
        print("   Check CloudWatch logs for details")
        
except FileNotFoundError:
    print(f"❌ Test file not found: {test_file_path}")
    print("   Please update the path to a valid PDF file")
except Exception as e:
    print(f"❌ Upload failed: {e}")

📤 Uploading test file...
   From: input_folder/invoice_1.pdf
   To: s3://cf-mle-testing/invoices/test_trigger_20250925_134718.pdf

✅ File uploaded successfully!
⏳ Waiting for Lambda to process...
📂 Files in s3://cf-mle-testing/ade-results/
Found 5 files

✅ Processing complete!
   Recent results:
   • ade-results/batch_extracted_20250924_195832.json
   • ade-results/batch_extracted_20250925_013951.json
   • ade-results/batch_extracted_20250925_014051.json
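
The listed results above all predate the upload, so a fixed 10-second sleep followed by a plain listing cannot confirm that this particular file was processed. One possible refinement, sketched below, is to poll until an object newer than the upload time appears. `wait_for_new_results` is a hypothetical helper, not part of `utils`; it calls boto3's `list_objects_v2` directly and, for brevity, only inspects the first page of up to 1,000 keys.

```python
import time
from datetime import datetime, timezone


def wait_for_new_results(s3_client, bucket, prefix, since,
                         timeout=120, interval=5, sleep=time.sleep):
    """Poll `prefix` until an object modified at or after `since` appears.

    Returns the sorted list of new keys, or [] if `timeout` seconds pass.
    `since` must be timezone-aware, because S3 returns aware datetimes.
    """
    deadline = time.monotonic() + timeout
    while True:
        response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
        new_keys = [
            obj["Key"]
            for obj in response.get("Contents", [])
            if obj["LastModified"] >= since
        ]
        if new_keys:
            return sorted(new_keys)
        if time.monotonic() >= deadline:
            return []
        sleep(interval)


# Usage sketch: record the time just before uploading, then poll.
# upload_started = datetime.now(timezone.utc).replace(microsecond=0)
# clients['s3'].upload_file(test_file_path, BUCKET_NAME, s3_key)
# new_files = wait_for_new_results(
#     clients['s3'], BUCKET_NAME, "ade-results/", upload_started)
```

Truncating the start time to whole seconds avoids missing a result whose `LastModified` is reported at coarser precision than the local clock.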


## 4. Monitor Lambda Invocations

Check Lambda statistics to see trigger activity.

In [5]:
# Get Lambda invocation stats
stats = get_lambda_invocation_stats(
    clients['logs'],
    FUNCTION_NAME,
    hours_back=1  # Last hour
)

if stats.get('total_invocations', 0) > 0:
    print("\n📊 Recent Activity (last hour):")
    print(f"   Invocations: {stats['total_invocations']}")
    print(f"   Success rate: {stats.get('success_rate', 0):.0f}%")
else:
    print("\nℹ️ No recent invocations")
    print("   Upload a file to trigger processing")

📊 Lambda Invocation Statistics (last 1 hours)
   Total Invocations: 4

📊 Recent Activity (last hour):
   Invocations: 4
   Success rate: 100%
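
`get_lambda_invocation_stats` derives its numbers from the CloudWatch Logs client; its internals are not shown here. As an independent cross-check, the same figures can be read from the `AWS/Lambda` CloudWatch metrics. The sketch below is an alternative technique, not the helper's implementation, and it assumes a separate boto3 `cloudwatch` client (for example `aws_session.client('cloudwatch')`), which this notebook does not create.

```python
from datetime import datetime, timedelta, timezone


def lambda_metric_sum(cloudwatch_client, function_name, metric_name, hours_back=1):
    """Sum an AWS/Lambda metric (e.g. 'Invocations', 'Errors') over a window."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours_back)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/Lambda",
        MetricName=metric_name,
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        StartTime=start,
        EndTime=end,
        Period=300,  # 5-minute buckets; must be a multiple of 60
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in response.get("Datapoints", []))


def success_rate(invocations, errors):
    """Return the success percentage, or None when nothing was invoked."""
    if invocations == 0:
        return None
    return 100.0 * (invocations - errors) / invocations


# Usage sketch (assumes `cloudwatch` is a boto3 CloudWatch client):
# invocations = lambda_metric_sum(cloudwatch, FUNCTION_NAME, "Invocations")
# errors = lambda_metric_sum(cloudwatch, FUNCTION_NAME, "Errors")
# print(success_rate(invocations, errors))
```

Metrics can lag behind the logs by a few minutes, so small differences right after an upload are expected.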


## 5. Check for Errors

In [6]:
# Check for recent errors
errors = get_error_logs(
    clients['logs'],
    FUNCTION_NAME,
    hours_back=1
)

if not errors:
    print("✅ No errors in the last hour")
else:
    print(f"⚠️ Found {len(errors)} errors")
    print("   Check CloudWatch logs for details")

❌ Error Logs (last 1 hours)
✅ No errors found
✅ No errors in the last hour
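
When errors are reported, it helps to see the messages themselves instead of only a count. The sketch below shows one way to pull matching events with the CloudWatch Logs `filter_log_events` API. `fetch_error_events` is a hypothetical helper, and it assumes the function writes to the default log group `/aws/lambda/<function name>`; the actual `get_error_logs` in `utils` may work differently.

```python
import time


def fetch_error_events(logs_client, function_name, hours_back=1, pattern="ERROR"):
    """Return log events matching `pattern` from the last `hours_back` hours."""
    start_ms = int((time.time() - hours_back * 3600) * 1000)
    kwargs = {
        "logGroupName": f"/aws/lambda/{function_name}",  # default Lambda log group
        "startTime": start_ms,
        "filterPattern": pattern,
    }
    events = []
    while True:
        page = logs_client.filter_log_events(**kwargs)
        events.extend(page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return events
        # Follow pagination until CloudWatch stops returning a token.
        kwargs["nextToken"] = token


# Usage sketch:
# for event in fetch_error_events(clients['logs'], FUNCTION_NAME):
#     print(event["message"].strip())
```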


## 6. Multiple Trigger Folders

Set up different folders for different processing modes.

In [7]:
# Configure multiple triggers for different document types
trigger_configs = [
    {"folder": "invoices/", "description": "Invoice extraction"},
    {"folder": "receipts/", "description": "Receipt extraction"},
    {"folder": "documents/", "description": "General parsing"}
]

print("🎯 Setting up multiple triggers:")
print()

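# Use a distinct loop variable so the `config` dict returned by
# setup_aws_environment() is not overwritten.
for trigger_config in trigger_configs:
    folder = trigger_config['folder']
    desc = trigger_config['description']
    
    print(f"📁 {folder} - {desc}")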
    
    # Note: In production, you'd modify the Lambda to detect
    # document type based on folder
    success = setup_s3_trigger(
        clients['s3'],
        clients['lambda'],
        BUCKET_NAME,
        FUNCTION_NAME,
        folder=folder
    )
    
    if success:
        print("   ✅ Configured")
    else:
        print("   ❌ Failed")
    print()

print("💡 Note: Lambda will process based on folder:")
print("   • invoices/ → Invoice extraction")
print("   • receipts/ → Receipt extraction")
print("   • documents/ → General parsing")

🎯 Setting up multiple triggers:

📁 invoices/ - Invoice extraction
ℹ️  S3 permission already exists
✅ S3 trigger configured!
   📤 Upload PDFs to: s3://cf-mle-testing/invoices/
   ⚡ They will auto-process with Lambda
   ✅ Configured

📁 receipts/ - Receipt extraction
✅ Added S3 permission to Lambda
✅ S3 trigger configured!
   📤 Upload PDFs to: s3://cf-mle-testing/receipts/
   ⚡ They will auto-process with Lambda
   ✅ Configured

📁 documents/ - General parsing
✅ Added S3 permission to Lambda
✅ S3 trigger configured!
   📤 Upload PDFs to: s3://cf-mle-testing/documents/
   ⚡ They will auto-process with Lambda
   ✅ Configured

💡 Note: Lambda will process based on folder:
   • invoices/ → Invoice extraction
   • receipts/ → Receipt extraction
   • documents/ → General parsing
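
As the code comment in this cell says, folder-based processing only happens if the Lambda itself inspects the object key. A minimal sketch of that routing step is shown below. The mode names and the `select_mode` helper are hypothetical, chosen to mirror the three folder descriptions; they are not taken from the deployed function.

```python
# Hypothetical mapping from trigger folder to processing mode.
FOLDER_MODES = {
    "invoices/": "invoice_extraction",
    "receipts/": "receipt_extraction",
    "documents/": "general_parsing",
}


def select_mode(key, default=None):
    """Pick a processing mode from the leading folder of an S3 key."""
    for prefix, mode in FOLDER_MODES.items():
        if key.startswith(prefix):
            return mode
    return default


print(select_mode("invoices/test_trigger.pdf"))
print(select_mode("unknown/file.pdf", default="general_parsing"))
```

Inside a handler, the key would come from the S3 event record and the returned mode would decide which schema or parsing path to use.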


## 7. Remove S3 Trigger

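If needed, remove the S3 trigger configuration. Passing an empty `NotificationConfiguration` clears every notification on the bucket, not only the triggers created in this notebook.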

In [8]:
# Remove S3 trigger (uncomment to run)
"""
print("🗑️ Removing S3 trigger...")

try:
    # Clear bucket notifications
    clients['s3'].put_bucket_notification_configuration(
        Bucket=BUCKET_NAME,
        NotificationConfiguration={}
    )
    print("✅ S3 trigger removed")
except Exception as e:
    print(f"❌ Error removing trigger: {e}")
"""

print("ℹ️ To remove triggers, uncomment the code above")

ℹ️ To remove triggers, uncomment the code above
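
If the bucket carries other notifications that should stay in place, a more selective approach is to read the current configuration, drop only the entries that point at this Lambda, and write the rest back. The sketch below illustrates that idea; `without_lambda_trigger` and `remove_lambda_trigger` are hypothetical helpers, and the function ARN must be supplied by the caller.

```python
def without_lambda_trigger(current_config, function_arn):
    """Return a copy of the config without entries for `function_arn`."""
    # get_bucket_notification_configuration includes ResponseMetadata,
    # which is not a valid key when the configuration is written back.
    new_config = {k: v for k, v in current_config.items() if k != "ResponseMetadata"}
    remaining = [
        entry
        for entry in new_config.get("LambdaFunctionConfigurations", [])
        if entry["LambdaFunctionArn"] != function_arn
    ]
    if remaining:
        new_config["LambdaFunctionConfigurations"] = remaining
    else:
        new_config.pop("LambdaFunctionConfigurations", None)
    return new_config


def remove_lambda_trigger(s3_client, bucket, function_arn):
    current = s3_client.get_bucket_notification_configuration(Bucket=bucket)
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=without_lambda_trigger(current, function_arn),
    )
```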


## 8. Best Practices

### Folder Organization
```
s3://bucket/
├── invoices/        # Auto-process with invoice schema
├── receipts/        # Auto-process with receipt schema
├── documents/       # Auto-process with parsing mode
├── ade-results/     # Processing results
└── archive/         # Processed files (optional)
```

### Tips
1. **Use specific folders** for different document types
2. **Monitor CloudWatch** for processing logs
3. **Set up SNS** for processing notifications
4. **Archive processed files** to avoid reprocessing
5. **Use versioning** on S3 bucket for safety
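
Tip 4 can be sketched as a copy followed by a delete, since S3 has no native move operation. `archive_object` below is a hypothetical helper using the standard boto3 `copy_object` and `delete_object` calls. It assumes the `archive/` prefix is not covered by any trigger, because a copy also raises an object-created event.

```python
def archive_key(key, archive_prefix="archive/"):
    """Map 'invoices/a.pdf' to 'archive/invoices/a.pdf'."""
    return f"{archive_prefix}{key}"


def archive_object(s3_client, bucket, key, archive_prefix="archive/"):
    """Copy an object under the archive prefix, then delete the original."""
    destination = archive_key(key, archive_prefix)
    s3_client.copy_object(
        Bucket=bucket,
        Key=destination,
        CopySource={"Bucket": bucket, "Key": key},
    )
    # Delete only after the copy call has returned without raising.
    s3_client.delete_object(Bucket=bucket, Key=key)
    return destination


# Usage sketch:
# archive_object(clients['s3'], BUCKET_NAME, s3_key)
```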