
Conversation

@alexluong
Collaborator

Summary

Adds deployment_id configuration to enable multiple Outpost deployments to share Redis infrastructure. When DEPLOYMENT_ID is set, all Redis keys are prefixed with deployment:{id}:,
ensuring tenants across different deployments never conflict even with identical IDs.

Key Design

Deployment-scoped keys (prefixed when DEPLOYMENT_ID set):

  • Entity store: tenant:{TENANT_ID}:*, destination:{DESTINATION_ID}:*
  • Alert store: alert:{DESTINATION_ID}:consecutive_failures
  • RSMQ queues: rsmq:QUEUES, rsmq:deliverymq-retry:*

Global keys (never prefixed):

  • Installation metadata: outpostrc
  • Infrastructure locks: outpost:lock
  • Migration state: outpost:migration:*, .outpost:migration:lock

Pattern: deployment:dp_001:tenant:{ID}:* when set, tenant:{ID}:* when empty
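
As a minimal sketch of what this prefix helper could look like (illustrative names, not the exact code in the PR):

package main

import "fmt"

// entityStore stands in for the real store type.
type entityStore struct {
    deploymentID string
}

// deploymentPrefix returns the key prefix for this deployment, or the
// empty string when no deployment ID is configured (backward compatible).
func (s *entityStore) deploymentPrefix() string {
    if s.deploymentID == "" {
        return ""
    }
    return fmt.Sprintf("deployment:%s:", s.deploymentID)
}

func main() {
    scoped := &entityStore{deploymentID: "dp_001"}
    global := &entityStore{}
    fmt.Println(scoped.deploymentPrefix() + "tenant:{123}:destinations") // deployment:dp_001:tenant:{123}:destinations
    fmt.Println(global.deploymentPrefix() + "tenant:{123}:destinations") // tenant:{123}:destinations
}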

Implementation

  1. Added config field - DeploymentID
  2. Scoped entity store - Tenant and destination keys prefixed via deploymentPrefix() method
  3. Scoped alert store - Failure counter keys prefixed via deploymentPrefix() method
  4. Scoped RSMQ via namespace - Retry scheduler creates RSMQ client with deployment-prefixed namespace instead of prefixing individual queue names
  5. Wired through services - Passed cfg.DeploymentID to entity store, alert monitor, and retry scheduler
  6. Updated migration docs - Documented that deployments with DEPLOYMENT_ID can skip hash tag migration
  7. Added E2E test - Full test suite runs with deployment_id: "dp_e2e_test" to verify no regression

@vercel

vercel bot commented Oct 7, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project           Deployment   Preview   Comments   Updated (UTC)
outpost-docs      Ready        Preview   Comment    Oct 7, 2025 1:37pm
outpost-website   Ready        Preview   Comment    Oct 7, 2025 1:37pm

@claude

claude bot commented Oct 7, 2025

PR Review - Deployment ID Support for Multi-Deployment

This PR implements a clean solution for enabling multiple Outpost deployments to share Redis infrastructure. Overall, it is well designed and thoroughly tested.

Strengths

  • Excellent design consistency with deploymentPrefix() helper pattern
  • Strong test coverage including E2E and isolation tests
  • Full backward compatibility when deploymentID is empty
  • Clear documentation of key patterns and migration impact

Critical Issues

1. Panic on Redis Connection Failure (HIGH PRIORITY)
Location: internal/deliverymq/retry.go:20

The panic() call will crash the application if Redis is temporarily unavailable during startup. It should return an error instead so the failure can be handled gracefully.

2. Missing Input Validation (MEDIUM PRIORITY)
Location: internal/config/config.go:55

No validation exists for the DeploymentID format. Special characters such as colons, braces, or asterisks could interfere with Redis key patterns or hash tags. Recommend restricting the value to alphanumerics, hyphens, and underscores, with a reasonable length limit.
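
A minimal sketch of such a check, assuming a hypothetical validateDeploymentID helper (the regex and limit are suggestions, not the PR's code):

package config

import (
    "fmt"
    "regexp"
)

// deploymentIDPattern allows alphanumerics, hyphens, and underscores only,
// keeping Redis-significant characters (:, *, {, }) out of key prefixes.
var deploymentIDPattern = regexp.MustCompile(`^[a-zA-Z0-9_-]{1,64}$`)

// validateDeploymentID rejects unsafe IDs; an empty ID is allowed and
// simply disables prefixing.
func validateDeploymentID(id string) error {
    if id == "" {
        return nil
    }
    if !deploymentIDPattern.MatchString(id) {
        return fmt.Errorf("deployment_id %q must match [a-zA-Z0-9_-] and be at most 64 characters", id)
    }
    return nil
}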

Minor Suggestions

  • deploymentPrefix() method is duplicated across stores - consider extracting to shared utility
  • Document TESTINFRA variable usage in Makefile
  • Add logging of deployment ID on startup for debugging

Performance & Security

Performance: Efficient implementation with no extra Redis round trips. Key prefix concatenation is minimal overhead.

Security: Good isolation between deployments. Main concern is lack of input validation allowing potential special character injection.

Test Coverage

Excellent coverage with E2E tests, isolation tests, and backward compatibility tests. Consider adding tests for invalid deployment ID formats once validation is added.

Recommendation

Approve after fixing the panic issue and adding deployment ID validation. The remaining suggestions can be addressed in follow-up PRs.

Great work on this feature!

@claude

claude bot commented Oct 7, 2025

PR Review: Add deployment_id Support for Multi-Deployment

Summary

This PR successfully implements deployment-scoped Redis keys to enable multiple Outpost deployments to share the same Redis infrastructure while maintaining complete data isolation. The implementation is well-designed, thoroughly tested, and follows good practices.

✅ Strengths

1. Excellent Architecture

  • Clean separation between deployment-scoped and global keys (installation metadata, locks, migrations remain global)
  • Consistent key prefixing pattern: deployment:{id}:tenant:{tenant_id}:*
  • Hash tags properly preserved for Redis Cluster compatibility: {tenant_id}
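
To make the hash-tag point concrete, a small illustration (key shape only, assuming the pattern above):

package main

import "fmt"

func main() {
    deploymentID, tenantID := "dp_001", "ten_123"
    // Redis Cluster hashes only the substring inside the first {...},
    // so every key for one tenant maps to the same slot regardless of
    // which deployment prefix precedes it.
    key := fmt.Sprintf("deployment:%s:tenant:{%s}:destinations", deploymentID, tenantID)
    fmt.Println(key) // deployment:dp_001:tenant:{ten_123}:destinations
}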

2. Comprehensive Test Coverage

  • E2E test with deployment_id: TestBasicSuiteWithDeploymentID (cmd/e2e/suites_test.go:210)
  • Entity store isolation tests at both unit and integration levels (internal/models/entity_test.go:23-26)
  • Alert store isolation tests verify complete separation (internal/alert/store_test.go:81-124)
  • Test infrastructure properly parameterized with deployment_id option

3. Proper Error Handling

  • NewRetryScheduler now returns error for Redis client initialization failures (internal/deliverymq/retry.go:15)
  • Config validation with clear error messages (internal/config/validation.go:149-169)
  • All call sites properly handle the new error return

4. Good Documentation

  • Clear migration guide noting deployment_id users can skip hash tag migration
  • Inline comments explaining key formats and design decisions
  • Configuration documentation updated

🔍 Observations & Suggestions

1. Minor: Alert Store Parameter Inconsistency
The alert.Monitor uses WithDeploymentID option pattern, but NewRedisAlertStore takes deploymentID as a direct parameter.

// alert/monitor.go:130
alertMonitor.store = NewRedisAlertStore(redisClient, alertMonitor.deploymentID)

// vs. entity store pattern:
entityStore := models.NewEntityStore(redisClient,
    models.WithDeploymentID(cfg.DeploymentID),
)

Suggestion: Consider making NewRedisAlertStore use options pattern for consistency, though this is a minor style issue and not blocking.
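
A sketch of what that parity could look like (hypothetical option names; the real project may use its own redis wrapper rather than go-redis directly):

package alert

import "github.com/redis/go-redis/v9"

// RedisAlertStore shows only the fields this sketch needs.
type RedisAlertStore struct {
    client       redis.Cmdable
    deploymentID string
}

// AlertStoreOption mirrors the entity store's functional option pattern.
type AlertStoreOption func(*RedisAlertStore)

// WithAlertDeploymentID sets the deployment ID used to prefix alert keys.
func WithAlertDeploymentID(id string) AlertStoreOption {
    return func(s *RedisAlertStore) { s.deploymentID = id }
}

func NewRedisAlertStore(client redis.Cmdable, opts ...AlertStoreOption) *RedisAlertStore {
    s := &RedisAlertStore{client: client}
    for _, opt := range opts {
        opt(s)
    }
    return s
}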

2. Config Validation: Character Restrictions
The deployment_id validation restricts to [a-zA-Z0-9_-] with max 64 chars (internal/config/validation.go:160).

Question: Have you considered whether this needs to allow additional characters (dots, colons) for compatibility with existing deployment naming schemes? The current restrictions seem reasonable, but worth confirming with your deployment workflows.

3. RSMQ Namespace Approach
The PR uses RSMQ namespace prefixing for the retry queue rather than individual key prefixing:

// internal/deliverymq/retry.go:29-32
namespace := "rsmq"
if deploymentID != "" {
    namespace = fmt.Sprintf("deployment:%s:rsmq", deploymentID)
}

This is clean and avoids touching queue name logic. ✅

4. Test Infrastructure Enhancement
Nice addition of RUN makefile variable for selective test execution (Makefile:5):

RUN?=

This will improve developer workflow.

🛡️ Security Considerations

  • ✅ Deployment isolation properly enforced at the storage layer
  • ✅ No credential handling changes
  • ✅ No new attack surface introduced
  • ✅ Validation prevents injection via deployment_id (alphanumeric + hyphens/underscores only)

⚡ Performance Considerations

  • ✅ Key prefix addition is minimal overhead (string concatenation)
  • ✅ Redis Cluster hash tags preserved for proper slot routing
  • ✅ No additional Redis round trips introduced
  • ✅ Transaction boundaries remain the same (same-tenant keys still in same slots)

🐛 Potential Issues

None identified. The implementation is solid.

📝 Minor Nits

  1. Formatting inconsistency (internal/app/app.go:202-207):
// MIGRATION LOCK BEHAVIOR:
//   1. One node acquires the lock...
//   2. Other nodes fail with lock errors...

The indentation changed from spaces to numbered list indentation. While this improves readability, it appears to be unrelated formatting cleanup in the diff.

  2. Test utility naming: redis.NewClient → redis.New (cmd/e2e/configs/basic.go:137)
    This is a good simplification, but the rename appears throughout the codebase; ensure all references are updated.

✅ Test Coverage Assessment

Excellent coverage:

  • Unit tests for entity store with/without deployment_id
  • Unit tests for alert store isolation between deployments
  • Integration tests for RSMQ retry scheduler
  • E2E test running full suite with deployment_id
  • Config validation tests covering edge cases

🎯 Recommendations

Before Merge:

  1. ✅ All tests passing (verified in PR)
  2. ✅ Documentation complete
  3. ✅ No breaking changes for existing deployments (empty string = no prefix)

Post-Merge:

  1. Consider documenting deployment_id naming conventions in production deployment guides
  2. Monitor Redis key space growth in multi-tenant deployments
  3. Consider adding observability/metrics tagged with deployment_id

Final Verdict: ✅ APPROVE

This is a well-crafted PR that solves a real multi-tenancy problem with a clean, testable implementation. The code quality is high, test coverage is comprehensive, and the approach is sound for both standalone and Redis Cluster deployments.

Great work! 🎉


Review generated with Claude Code

@alexbouchardd
Contributor

To save precious bytes, I'd tend to leave out the deployment: prefix and simply prefix with the deployment ID itself

@alexluong
Collaborator Author

@alexbouchardd updated

@claude

claude bot commented Oct 7, 2025

Pull Request Review: Multi-Deployment Support via deployment_id

This PR adds deployment ID support to enable multiple Outpost deployments to share Redis infrastructure while maintaining data isolation. Overall, this is a well-designed and thoroughly implemented feature with excellent test coverage. Here's my detailed review:


Strengths

1. Excellent Architecture & Design

  • Clean abstraction: The deploymentPrefix() helper method provides a single source of truth for prefixing logic across all stores
  • Backward compatible: When DEPLOYMENT_ID is empty, behavior is unchanged (no prefix)
  • Redis Cluster compatible: Keys maintain hash tags {tenant_id} ensuring proper slot allocation
  • Correct scoping: Global keys (migrations, locks, installation metadata) properly excluded from deployment prefixing

2. Comprehensive Test Coverage

  • Parallel test suites: Both with and without deployment IDs (TestEntityStore_WithoutDeploymentID, TestEntityStore_WithDeploymentID)
  • Isolation tests: TestAlertStoreIsolation verifies complete isolation between deployments
  • E2E validation: Full test suite runs with deployment_id: "dp_e2e_test"
  • Edge cases covered: Encryption, max destinations, deployment ID format validation

3. Strong Security Practices

  • Input validation: Deployment ID restricted to alphanumeric, hyphens, underscores (regex: ^[a-zA-Z0-9_-]+$)
  • Length limits: 64 character maximum prevents excessive key sizes
  • No injection risks: Validation prevents Redis key injection attacks via special characters (:, *, {, })
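
A table-driven test along these lines would cover those cases (hypothetical test; the validator is re-sketched here to keep the example self-contained):

package config

import (
    "regexp"
    "testing"
)

var deploymentIDRe = regexp.MustCompile(`^[a-zA-Z0-9_-]+$`)

func validDeploymentID(id string) bool {
    return id == "" || (len(id) <= 64 && deploymentIDRe.MatchString(id))
}

func TestDeploymentIDFormat(t *testing.T) {
    cases := map[string]bool{
        "":            true, // empty disables prefixing
        "dp_e2e_test": true,
        "dp-001":      true,
        "dp:001":      false, // colon would break key parsing
        "dp{001}":     false, // braces would alter hash-tag routing
        "dp*001":      false, // asterisk collides with key patterns
    }
    for id, want := range cases {
        if got := validDeploymentID(id); got != want {
            t.Errorf("validDeploymentID(%q) = %v, want %v", id, got, want)
        }
    }
}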

4. Excellent Documentation

  • Migration guide updated with clear guidance on when migration is/isn't needed
  • Configuration docs include the new parameter
  • Code comments explain key format patterns

🔍 Issues & Recommendations

Critical: Error Handling in Service Initialization (internal/services/api/api.go:133, internal/services/delivery/delivery.go:117)

The retry scheduler creation now returns an error but it's not being handled:

// Current (problematic):
deliverymqRetryScheduler := deliverymq.NewRetryScheduler(deliveryMQ, cfg.Redis.ToConfig(), logger)

// Should be:
deliverymqRetryScheduler, err := deliverymq.NewRetryScheduler(deliveryMQ, cfg.Redis.ToConfig(), cfg.DeploymentID, logger)
if err != nil {
    logger.Error("retry scheduler initialization failed", zap.Error(err))
    return nil, err
}

Impact: If Redis connection fails during retry scheduler setup, the application will panic instead of gracefully handling the error.

Location:

  • internal/services/api/api.go:133
  • internal/services/delivery/delivery.go:117

Minor: Potential Redis Client Duplication (internal/deliverymq/retry.go:15-21)

The NewRetryScheduler function creates a new Redis client, but the caller (services) already has a Redis client. This creates duplicate connections.

Recommendation: Consider accepting a redis.Cmdable interface instead of *redis.RedisConfig to reuse existing connections:

// Suggested improvement:
func NewRetryScheduler(deliverymq *DeliveryMQ, redisClient redis.Cmdable, deploymentID string, logger *logging.Logger) (scheduler.Scheduler, error) {
    adapter := rsmq.NewRedisAdapter(redisClient)
    // ... rest of implementation
}

Impact: Minor performance overhead (extra connection pool), but not a blocker.


Minor: Inconsistent Parameter Order (internal/deliverymq/retry.go:15)

Function signature has deploymentID between redisConfig and logger:

func NewRetryScheduler(deliverymq *DeliveryMQ, redisConfig *redis.RedisConfig, deploymentID string, logger *logging.Logger)

Recommendation: Move deploymentID to the end or use functional options pattern for consistency with other constructors:

func NewRetryScheduler(deliverymq *DeliveryMQ, redisConfig *redis.RedisConfig, logger *logging.Logger, opts ...SchedulerOption)
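
Here SchedulerOption would be a small functional option along these lines (hypothetical sketch):

package deliverymq

// schedulerOptions collects optional scheduler settings.
type schedulerOptions struct {
    deploymentID string
}

// SchedulerOption is a hypothetical functional option for NewRetryScheduler.
type SchedulerOption func(*schedulerOptions)

// WithDeploymentID namespaces the scheduler's RSMQ keys by deployment.
func WithDeploymentID(id string) SchedulerOption {
    return func(o *schedulerOptions) { o.deploymentID = id }
}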

Code Quality: Test Organization (internal/models/entity_test.go, internal/models/entitysuite_test.go)

The test suite refactor (moving from entity_test.go to entitysuite_test.go) is excellent, but the diff shows -697 lines in entity_test.go and +637 lines in entitysuite_test.go.

Question: Were any tests removed or is this purely a refactor? Consider adding a comment in the PR description confirming no test coverage was lost.


🚀 Performance Considerations

  1. Key Size Impact: Deployment IDs add prefix overhead (e.g., dp_production_001: = 18 bytes). With 64-char max, this is reasonable.
  2. Redis Cluster: Hash tags properly maintained, ensuring optimal slot distribution.
  3. No N+1 queries: Pipelining preserved in entity store operations.
  4. RSMQ Namespace: Deployment prefix correctly applied to RSMQ namespace, preventing queue name collisions.

Verdict: ✅ No performance concerns


🔒 Security Assessment

  1. Input validation: ✅ Properly restricted character set and length
  2. Injection protection: ✅ Special Redis characters (*, {, }, :) properly blocked
  3. Credential handling: ✅ Encryption unchanged, still properly handled
  4. Access control: ✅ Deployment isolation properly enforced at data layer

Verdict: ✅ No security concerns


📋 Summary & Recommendation

Overall Assessment: APPROVE with minor fixes required

This is a well-executed feature with:

  • ✅ Excellent test coverage (parallel suites, isolation tests, E2E)
  • ✅ Clean architecture (deployment prefix abstraction)
  • ✅ Strong security (input validation, injection protection)
  • ✅ Good documentation (migration guide, config docs)

Required before merge:

  1. ❗ Fix error handling in api.go and delivery.go for retry scheduler initialization

Nice to have:

  1. Consider reusing the Redis client in NewRetryScheduler instead of creating a new one
  2. Consider consistent parameter ordering or a functional options pattern

Testing Recommendation:
Run the E2E suite with Redis Cluster configuration to validate deployment ID behavior in clustered environment:

make test/e2e/rediscluster

Great work on this feature! The multi-deployment support is implemented thoughtfully with proper isolation guarantees.


Review Details:

  • Code quality: ⭐⭐⭐⭐⭐ (5/5)
  • Test coverage: ⭐⭐⭐⭐⭐ (5/5)
  • Documentation: ⭐⭐⭐⭐⭐ (5/5)
  • Security: ⭐⭐⭐⭐⭐ (5/5)
  • Performance: ⭐⭐⭐⭐⭐ (5/5)

@alexluong merged commit 2b89ab3 into main on Oct 7, 2025
5 checks passed
@alexluong deleted the deployment-redis branch on Oct 7, 2025 at 16:39