Add proposal for operator-managed Valkey integration #1770
This proposal introduces a design for integrating Valkey (Redis-compatible) distributed session storage into the ToolHive operator. The design focuses on simplicity and security by providing automatic, secure-by-default configuration.

Key features:
- Separate SessionStorage CRD to keep MCPServer CRD focused
- Zero-configuration deployment with simple size presets (small/medium/large)
- Automatic security configuration (auth, network policies, TLS)
- Seamless integration with existing MCPServer resources
- Support for both operator-managed and external Valkey instances

The design enables horizontal scaling and resilience by externalizing session state from proxy pods, allowing them to scale elastically and restart without losing user sessions. This transforms the ToolHive proxy layer into a truly stateless, cloud-native system.

Implementation follows a phased approach starting with core CRD and controller, then adding MCPServer integration, and finally production features like monitoring and backups.
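To make the size presets concrete, here is a minimal Go sketch of how an operator might turn a preset into settings. It is not taken from the proposal's code: the `Size`, `Preset`, and `ResolvePreset` names are hypothetical. The only facts it relies on come from the proposal's FAQ quoted in the review thread: credentials live in a secret named `{storage-name}-auth`, and small (dev) storage is ephemeral while medium/large is persisted.

```go
package main

import "fmt"

// Size is a hypothetical type for the small/medium/large presets.
type Size string

const (
	SizeSmall  Size = "small"
	SizeMedium Size = "medium"
	SizeLarge  Size = "large"
)

// Preset holds the settings derived from a size (illustrative fields only).
type Preset struct {
	Persistent bool   // whether data is restored after a Valkey pod crash
	AuthSecret string // name of the Secret holding the Valkey credentials
}

// ResolvePreset maps a storage name and size to derived settings.
// The mapping follows the FAQ: small is ephemeral, medium/large persist.
func ResolvePreset(storageName string, size Size) (Preset, error) {
	p := Preset{AuthSecret: storageName + "-auth"}
	switch size {
	case SizeSmall:
		p.Persistent = false
	case SizeMedium, SizeLarge:
		p.Persistent = true
	default:
		return Preset{}, fmt.Errorf("unknown size preset %q", size)
	}
	return p, nil
}

func main() {
	p, err := ResolvePreset("my-sessions", SizeMedium)
	if err != nil {
		panic(err)
	}
	fmt.Printf("persistent=%v secret=%s\n", p.Persistent, p.AuthSecret)
}
```

Rejecting unknown sizes up front keeps the "zero-configuration" promise honest: a typo in the preset fails validation instead of silently falling back to a default.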
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #1770 +/- ##
==========================================
+ Coverage 40.56% 40.60% +0.04%
==========================================
Files 184 184
Lines 21380 21380
==========================================
+ Hits 8672 8682 +10
+ Misses 12056 12040 -16
- Partials 652 658 +6
A: Yes, it's in the secret `{storage-name}-auth` but you don't need it - MCPServers use it automatically.

**Q: What happens if a Valkey pod crashes?**
A: For medium/large sizes, data is persisted and will be restored. For small (dev), data is ephemeral.
What changes are needed in the application to handle the possibility that state can potentially be destroyed?
As things stand today, this possibility always exists: if something happens to the proxy runner pod, the session is lost and has to be recreated. This proposal gives us a way to survive such scenarios.
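The handling this reply implies can be sketched in Go: look the session up in the external store and, if it is missing (for example because ephemeral storage was wiped), recreate it just as happens today when a proxy pod is lost. This is a sketch under assumptions, not ToolHive's actual code: `Store`, `memStore`, `ErrNotFound`, and `GetOrCreate` are hypothetical names, and the in-memory map only stands in for Valkey.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel for a missing session.
var ErrNotFound = errors.New("session not found")

// Store is a hypothetical abstraction over the session backend.
type Store interface {
	Get(id string) (string, error)
	Put(id, state string) error
}

// memStore is an in-memory stand-in for Valkey, used for illustration.
type memStore struct{ data map[string]string }

func newMemStore() *memStore { return &memStore{data: map[string]string{}} }

func (m *memStore) Get(id string) (string, error) {
	state, ok := m.data[id]
	if !ok {
		return "", ErrNotFound
	}
	return state, nil
}

func (m *memStore) Put(id, state string) error {
	m.data[id] = state
	return nil
}

// GetOrCreate returns the stored session state, or recreates the session
// when the store has no record of it. The bool reports a recreation.
func GetOrCreate(s Store, id string) (string, bool, error) {
	state, err := s.Get(id)
	if err == nil {
		return state, false, nil
	}
	if !errors.Is(err, ErrNotFound) {
		return "", false, err // backend failure, not a lost session
	}
	state = "initialized" // placeholder for real session setup
	if putErr := s.Put(id, state); putErr != nil {
		return "", false, putErr
	}
	return state, true, nil
}

func main() {
	store := newMemStore()
	_, recreated, _ := GetOrCreate(store, "abc")
	fmt.Println("first lookup recreated:", recreated)
	_, recreated, _ = GetOrCreate(store, "abc")
	fmt.Println("second lookup recreated:", recreated)
}
```

Separating "not found" from other errors matters here: a lost session can be rebuilt, whereas a connection problem to the store should surface as an error rather than silently creating a new session.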
By the way, the changes that build toward that are in #1771. They're in several commits, so we don't need to merge that big chunk at once; it can be split.
yrobla left a comment
I have concerns about adding an external dependency there, mostly related to maintenance, size of deployment, what happens with data if there is some connection problem, etc. Is it really necessary to add Valkey there, or can we use a simpler approach?
@yrobla this is the lightest I could think of to have session persistence across scale-ups and restarts. The alternative is to build it in-process but that would be more complicated than this tbh.
This PR adds a design proposal for integrating Valkey (Redis-compatible) distributed session storage into the ToolHive operator.
The proposal introduces a new SessionStorage CRD that enables automatic deployment and management of Valkey instances for distributed session storage. This allows ToolHive proxy pods to scale horizontally while maintaining session state.
Key features:
The design enables proxy pods to become truly stateless, supporting elastic scaling, rolling updates, and improved resilience.