Status: Open
Labels: aws, documentation, phase-6
Description
Summary
Final production readiness: load testing, operational runbooks, security review, and go-live checklist.
Epic: #174
Architecture: docs/architecture/planned/aws-ecs-cdk.md
Tasks
Load Testing
Use the existing llm-proxy benchmark tool:

    # Basic load test
    llm-proxy benchmark \
      --base-url "https://llm-proxy.example.com" \
      --endpoint "/v1/chat/completions" \
      --token "$PROXY_TOKEN" \
      --requests 100 --concurrency 10 \
      --json '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"test"}]}'

    # Cache performance test
    llm-proxy benchmark \
      --base-url "https://llm-proxy.example.com" \
      --endpoint "/v1/chat/completions" \
      --token "$PROXY_TOKEN" \
      --requests 50 --concurrency 10 \
      --cache --cache-ttl 300 \
      --debug \
      --json '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"cached"}]}'

- Test with expected production load (requests, concurrency)
- Test auto-scaling behavior (ramp up load, observe task count)
- Test cache hit performance (Redis)
- Identify and address bottlenecks
- Document baseline performance metrics (latency p50/p95/p99)
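One way to produce the baseline figures listed above is to reduce raw per-request latencies to p50/p95/p99. The sketch below is illustrative only: it assumes the latencies have already been collected in milliseconds (how the benchmark tool exports them is not specified in this issue), and `percentile` and `latency_baseline` are hypothetical helper names. It uses the nearest-rank definition of a percentile.

```python
def percentile(samples, pct):
    """Return the nearest-rank percentile (pct is an integer from 0 to 100)."""
    if not samples:
        raise ValueError("at least one latency sample is required")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def latency_baseline(samples_ms):
    """Summarise latency samples (milliseconds) as p50/p95/p99."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    samples = [120, 80, 100, 95, 300, 110, 90, 105, 85, 100]
    print(latency_baseline(samples))  # {'p50': 100, 'p95': 300, 'p99': 300}
```

With only a handful of samples, p95 and p99 collapse onto the slowest request, so a baseline is more meaningful when taken from a run with many requests.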
Security Review
- Review IAM policies (least privilege)
- Verify all secrets are stored in Secrets Manager
- Verify encryption at rest (Aurora, Redis, logs)
- Verify encryption in transit (TLS everywhere)
- Review security group rules
- Check for exposed resources
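Part of the security group review above can be automated. The following is a minimal sketch, not a complete audit: it assumes the input is a list of dictionaries shaped like the `SecurityGroups` entries returned by the EC2 `DescribeSecurityGroups` API, and it assumes that only ports 80 and 443 (the public ALB listeners) may be open to the internet. The function name `find_open_ingress` and the allowed-port default are hypothetical.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_ingress(security_groups, allowed_ports=frozenset({80, 443})):
    """Return (group_id, from_port, to_port) for world-open ingress rules.

    Rules open to the internet are reported unless they cover exactly one
    port that is in allowed_ports (an assumption, adjust to your setup).
    """
    findings = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r.get("CidrIp") for r in perm.get("IpRanges", [])]
            cidrs += [r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])]
            if not OPEN_CIDRS.intersection(cidrs):
                continue  # rule is not open to the whole internet
            from_port = perm.get("FromPort")
            to_port = perm.get("ToPort")
            # "All traffic" rules carry no port fields, so they are always reported.
            single_allowed_port = (
                from_port is not None
                and from_port == to_port
                and from_port in allowed_ports
            )
            if not single_allowed_port:
                findings.append((group.get("GroupId"), from_port, to_port))
    return findings
```

In practice the input could come from `boto3.client("ec2").describe_security_groups()["SecurityGroups"]`. An empty result does not replace a manual review of the rules.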
Operational Runbooks
- Create runbook: Deployment process
- Create runbook: Scaling (manual and auto)
- Create runbook: Database operations (backup, restore)
- Create runbook: Incident response
- Create runbook: Log analysis
- Create runbook: Secret rotation
Documentation
- Update main README with AWS deployment info
- Document environment variables mapping
- Document monitoring and alerting
- Document cost breakdown and optimization tips
Go-Live Checklist
- All previous stories completed
- Load testing passed
- Security review passed
- Runbooks reviewed by team
- Alerting tested (trigger test alarm)
- Rollback procedure tested
- DNS cutover plan (if applicable)
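For the "Alerting tested (trigger test alarm)" item above, one option is to force an existing CloudWatch alarm into the `ALARM` state and confirm that the notification arrives. The sketch below assumes a boto3 CloudWatch client is passed in; `trigger_test_alarm` and the alarm name in the usage comment are hypothetical. CloudWatch returns the alarm to its real state at the next evaluation, so the forced state is temporary.

```python
def trigger_test_alarm(cloudwatch, alarm_name, reason="Go-live alerting test"):
    """Temporarily force an alarm into ALARM so its actions fire.

    `cloudwatch` is expected to be a boto3 CloudWatch client (or any object
    exposing a compatible set_alarm_state method).
    """
    cloudwatch.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="ALARM",
        StateReason=reason,
    )


# Usage (requires boto3 and AWS credentials; the alarm name is a placeholder):
#   import boto3
#   trigger_test_alarm(boto3.client("cloudwatch"), "<your-alarm-name>")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real account.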
Production Checklist
Infrastructure
- VPC and networking configured
- Aurora PostgreSQL running and accessible
- ElastiCache Redis running with TLS
- ECS services healthy
- ALB routing correctly
- CloudWatch dashboards populated
Security
- HTTPS only (HTTP redirects to HTTPS)
- Secrets in Secrets Manager
- No hardcoded credentials
- Security groups restrictive
- IAM roles scoped appropriately
Operations
- Alarms configured and tested
- Log retention set
- Backup policy verified
- Auto-scaling tested
- CI/CD pipeline tested
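The "Auto-scaling tested" item above could be exercised by stepping up load with the existing benchmark tool while watching the ECS task count. The sketch below only builds the command lines; it uses the `llm-proxy benchmark` flags already shown in this issue, while the concurrency steps, the requests-per-worker factor, and the function name `ramp_commands` are assumptions for illustration.

```python
def ramp_commands(base_url, endpoint, token, payload_json,
                  steps=(5, 10, 20, 40), requests_per_worker=10):
    """Build one `llm-proxy benchmark` invocation per concurrency step."""
    commands = []
    for concurrency in steps:
        commands.append([
            "llm-proxy", "benchmark",
            "--base-url", base_url,
            "--endpoint", endpoint,
            "--token", token,
            # Scale total requests with concurrency so each step runs a similar time.
            "--requests", str(concurrency * requests_per_worker),
            "--concurrency", str(concurrency),
            "--json", payload_json,
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order, pausing between steps long enough for scaling policies to react.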
Acceptance Criteria
- Load test shows acceptable performance
- Security review has no critical findings
- All runbooks created and reviewed
- Go-live checklist fully complete
- Production deployment successful
Dependencies
- All previous stories (1-6)
Estimated Effort
Medium-Large - 3-4 days
Notes
- Use existing llm-proxy benchmark tool (no k6 needed)
- Consider phased rollout (internal users first)
- Have rollback plan ready
- Monitor closely for first 24-48 hours