# Question 7 - OQ – How does the code handle corrupted image files?

To ensure robustness, the analysis pipeline should handle corrupted or unreadable image files.

For this, we can wrap the image loading logic in a `try/except` block inside the `analyze_batch()` function:

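```python
# Inside the per-file loop of analyze_batch(); `filename` and `filepath` come from that loop.
try:
    img = np.load(filepath)
except Exception as e:
    # Log the failure and move on to the next file instead of aborting the whole batch.
    print(f"⚠️ Skipping corrupted file {filename}: {e}")
    continue
```
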
# Question 10 - OQ – AWS Design of Deployment

To deploy this solution in a scalable and serverless way, I propose the following architecture on AWS:

## AWS Components

- **S3** to store `.npy` images (batches of 20)
- **S3 Event Notifications + SQS** to track when new images are uploaded
- **Lambda or Step Functions** to check if 100 images are present
- **AWS Batch** to run the ETL logic in a container
- **RDS (PostgreSQL)** to store summary statistics
- **SNS** to notify users when processing is complete

## Event Flow

1. Images are uploaded to S3 (`raw/batch_x/img_y.npy`)
2. S3 sends an event notification to SQS for each upload
3. Lambda consumes the SQS messages or counts the objects in S3
4. When 100 images are present:
   - Lambda triggers an AWS Batch job (or starts a Step Functions execution)
5. Job runs ETL logic (similar to current Docker setup)
6. Results are stored in PostgreSQL
7. Notification sent to user via SNS
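
Steps 3 and 4 could be sketched roughly as follows. This is a minimal illustration, not a finished Lambda: the function names, the `INPUT_BUCKET` / `INPUT_PREFIX` environment variables, and the job naming scheme are assumptions made for the example. The clients are passed in as parameters, so in a real handler they would be `boto3.client("s3")` and `boto3.client("batch")`.

```python
EXPECTED_IMAGES = 100  # threshold from step 4 of the event flow


def count_images(s3_client, bucket: str, prefix: str) -> int:
    """Count the .npy objects under a prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        total += sum(
            1 for obj in page.get("Contents", []) if obj["Key"].endswith(".npy")
        )
    return total


def maybe_submit_job(s3_client, batch_client, bucket, prefix, job_queue, job_definition):
    """Submit one AWS Batch job if enough images are present.

    Returns the job id, or None when the threshold is not reached yet.
    """
    if count_images(s3_client, bucket, prefix) < EXPECTED_IMAGES:
        return None
    response = batch_client.submit_job(
        # Hypothetical naming scheme: "raw/" becomes "etl-raw".
        jobName="etl-" + prefix.strip("/").replace("/", "-"),
        jobQueue=job_queue,
        jobDefinition=job_definition,
        containerOverrides={
            "environment": [
                # Assumed variable names that the ETL container would read.
                {"name": "INPUT_BUCKET", "value": bucket},
                {"name": "INPUT_PREFIX", "value": prefix},
            ]
        },
    )
    return response["jobId"]
```

Because several SQS messages can arrive close together, more than one invocation may see 100 images at once. A production version would likely need a guard against submitting the same job twice, for example a marker object in S3 or a conditional write to a small state table.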

## Database Schema

Table: `batch_results`

| Field        | Type      |
|--------------|-----------|
| id           | UUID (PK) |
| batch_id     | TEXT      |
| white_avg    | FLOAT     |
| white_std    | FLOAT     |
| white_min    | INT       |
| white_max    | INT       |
| black_avg    | FLOAT     |
| black_std    | FLOAT     |
| black_min    | INT       |
| black_max    | INT       |
| processed_at | TIMESTAMP |
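
To make the mapping from ETL output to a `batch_results` row concrete, here is a small sketch using only the standard library. It assumes the ETL step produces one white-pixel count and one black-pixel count per image, and that the standard deviation is the population form (as `np.std` computes by default). The `BatchResult` class and `summarize()` function are hypothetical names used for illustration only.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from statistics import fmean, pstdev


@dataclass
class BatchResult:
    """One row of the batch_results table."""
    batch_id: str
    white_avg: float
    white_std: float
    white_min: int
    white_max: int
    black_avg: float
    black_std: float
    black_min: int
    black_max: int
    # Generated client-side here; the database could generate these instead.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    processed_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def summarize(batch_id, white_counts, black_counts):
    """Build a row from per-image pixel counts (assumed inputs)."""
    return BatchResult(
        batch_id=batch_id,
        white_avg=fmean(white_counts),
        white_std=pstdev(white_counts),  # population standard deviation
        white_min=min(white_counts),
        white_max=max(white_counts),
        black_avg=fmean(black_counts),
        black_std=pstdev(black_counts),
        black_min=min(black_counts),
        black_max=max(black_counts),
    )
```

A call such as `summarize("batch_1", white_counts, black_counts)` returns an object whose fields line up one-to-one with the table columns, so it can be passed to whichever PostgreSQL client the job uses.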