Cache CR Reconciliation #747

Closed
ryanemerson opened this issue Jan 18, 2021 · 12 comments
Labels
discussion (Product architecture and enhancements discussion), enhancement (New feature or request)

Comments

@ryanemerson
Contributor

The Cache CR allows Infinispan caches to be created by other k8s resources, e.g. other operators. However, in its current form a Cache CR does not reflect the current status of a cache on the server:

  • A cache can be removed via an Infinispan client (CLI, Console, HotRod, REST) and the CR will remain
  • Two-way mapping of caches and CRs requires polling of Infinispan state by the operator which does not scale as the number of caches increases

Proposal

Reconcile Cache CR state by consuming REST events (ISPN-12606) and updating/creating Cache CRs. This allows CRs to be updated when CRUD operations are performed on a cache's state without polling.

Creation of new caches via any of the Infinispan clients will result in a corresponding Cache CR being created. Similarly, the removal of any caches will result in their CR being removed (or their status updated?).
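To make the proposed mapping concrete, here is a minimal Go sketch of how a server event could be turned into an action on the Cache CR. The event names (`create-cache`, `update-cache`, `remove-cache`), the `Action` type and the `Decide` helper are assumptions made for illustration, not names taken from ISPN-12606. The sketch deletes the CR on cache removal, which is only one of the two options raised above.

```go
package main

import "fmt"

// Action is the (hypothetical) operation to perform on a Cache CR.
type Action string

const (
	CreateCR Action = "create"
	UpdateCR Action = "update"
	DeleteCR Action = "delete"
	Noop     Action = "noop"
)

// Decide maps a server-side cache event to a Cache CR action.
// existing maps cache name -> configuration held by the current Cache CRs.
// Event type names are assumptions, not the server's confirmed vocabulary.
func Decide(eventType, cache, config string, existing map[string]string) Action {
	current, found := existing[cache]
	switch eventType {
	case "create-cache", "update-cache":
		if !found {
			return CreateCR
		}
		if current != config {
			return UpdateCR
		}
		// Replayed state that already matches the CR needs no write.
		return Noop
	case "remove-cache":
		if found {
			return DeleteCR
		}
		return Noop
	default:
		// Unknown event types are ignored rather than treated as errors.
		return Noop
	}
}

// Describe renders a decision for logging.
func Describe(a Action, cache string) string {
	return fmt.Sprintf("%s Cache CR for cache %q", a, cache)
}

func main() {
	existing := map[string]string{"users": "cfg-v1"}
	fmt.Println(Describe(Decide("create-cache", "orders", "cfg", existing), "orders"))
	fmt.Println(Describe(Decide("remove-cache", "users", "", existing), "users"))
}
```

Comparing against the existing CRs keeps the decision idempotent, so receiving the same state twice does not cause redundant writes.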

Implementation

Deploy a stateless "config-reconciler" pod per Infinispan CR. This pod is responsible for consuming events from the Infinispan service and updating/creating a corresponding Cache CR. On startup/restart/failover, this pod must set includeCurrentState=true when connecting to the events endpoint to ensure that the latest state is reconciled.

Utilising an independent pod has the following advantages:

  • Consuming events directly in the operator does not scale in multi-namespace environments where multiple Infinispan clusters are deployed simultaneously (potentially with many caches per cluster).
  • It has no impact on horizontal scaling of clusters. Utilising a sidecar in the existing Infinispan pods would require consensus amongst the pods to determine who was responsible for reconciling configuration state.

Config Reconciler Pod

The config-reconciler should have as small a footprint as possible, so we should deploy this as a natively compiled executable based on a scratch image.

Reconciler per namespace

In order to reduce the total number of pods required by Infinispan, it could be possible to allow a "config-reconciler" pod to consume events from multiple Infinispan CRs in the same namespace. This could be configurable in the Infinispan CR. If enabled, it would be necessary for the config-reconciler pod spec to be updated and the pod restarted in order for the new cluster to be watched.

Future Work

We should extend the Operator and config-reconciler to support the following types:

  • templates
  • counters
  • schemas
  • tasks
ryanemerson added the discussion and enhancement labels Jan 18, 2021
@dmvolod
Member

dmvolod commented Jan 18, 2021

The config-reconciler should have as small a footprint as possible, so we should deploy this as a natively compiled executable based on a scratch image.

@ryanemerson, can we utilize golang for this implementation?

@ryanemerson
Contributor Author

Definitely. Imo it's probably the best way to go due to the following:

  • We can re-use the Cache CR structs used in the operator
  • The team's familiarity with the language
  • golang k8s apis

@dmvolod
Member

dmvolod commented Jan 21, 2021

Also, we should define the container build and run strategy: either place the code for the config-reconciler pod in the same image as the operator and choose which one runs based on an env var, or create a separate image.
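For the single-image option, a minimal Go sketch of the entrypoint could look like the following. The env var name `OPERATOR_MODE` and the mode values are made up for illustration; nothing in this thread fixes them.

```go
package main

import (
	"fmt"
	"os"
)

// modeEnv is a hypothetical env var name used only for this sketch.
const modeEnv = "OPERATOR_MODE"

const (
	ModeOperator   = "operator"
	ModeReconciler = "config-reconciler"
)

// SelectMode validates the requested mode. An empty value falls back to
// the operator so that existing deployments keep their behaviour.
func SelectMode(value string) (string, error) {
	switch value {
	case "", ModeOperator:
		return ModeOperator, nil
	case ModeReconciler:
		return ModeReconciler, nil
	default:
		return "", fmt.Errorf("unknown %s value %q", modeEnv, value)
	}
}

// ModeFromEnv reads the mode from the process environment.
func ModeFromEnv() (string, error) {
	return SelectMode(os.Getenv(modeEnv))
}

func main() {
	mode, err := ModeFromEnv()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// A real binary would start the operator manager or the event
	// consumer here; the sketch only reports the choice.
	fmt.Println("starting in mode:", mode)
}
```

A shared image keeps the build and release process simple, while a separate image would let the reconciler ship without the operator's dependencies.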

@rgordill

How about the Infinispan operator moves to a meta-operator (one that deploys other operators), and in case a new cluster is deployed, the "config-reconciler" is implemented as a cache-operator?

Strimzi has a similar approach, and it has 3 operators: cluster (meta), topic and user.

@tristantarrant
Member

tristantarrant commented Jan 21, 2021 via email

@dmvolod
Member

dmvolod commented Jan 21, 2021

I think the reason it was implemented this way is that the Strimzi operator was written in Java before Quarkus was released, and it consumes much more memory and resources than a golang implementation.

@rgordill

With this approach, you have:

  1. A specialised operator for an object (in our case, a cache) that can listen both to CRs of that kind and to updates on the backend, so it can maintain consistency for those objects in both directions.

  2. Filtering of cluster-related CRs, so the Infinispan cluster operator only needs to handle the cluster lifecycle, while per-cluster operators handle the objects on their particular cluster.

AFAIK, it was not related to memory and resources, but maybe @scholzj or @ppatierno can help with the reasons for Strimzi.

@scholzj

scholzj commented Jan 21, 2021

I think the reason it was implemented this way is that the Strimzi operator was written in Java before Quarkus was released, and it consumes much more memory and resources than a golang implementation.

That was absolutely not the case and it had nothing to do with this decision.

The Topic and User operators ultimately have a different scope than the cluster operator. They are bound to the Kafka cluster where they operate and to its lifecycle. Without the Kafka cluster, they do not have much to do. Having them as separate operators and deploying them as part of the Kafka cluster itself simplifies the code a bit. It also makes them much easier to use with Kafka clusters run by something other than Strimzi - this was not planned, but turned out to be fairly popular as well.

That said, it also has some disadvantages - it is probably a bit harder for users to understand. You need to explain to them that they have to configure the additional operators etc. Also, in our case, we use a label to tell in which cluster the user or topic should be created. And if the CR is created with a wrong label, nobody sees it and nothing happens ... which again - is a bit confusing to the user.
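The label pitfall described above can be sketched in Go as follows. The label key `example.org/cluster` and the `Resource` type are placeholders for illustration, not the names either project uses. The sketch returns the non-matching resources instead of dropping them, so that a caller could at least log them.

```go
package main

import "fmt"

// clusterLabel is a placeholder label key used only in this sketch.
const clusterLabel = "example.org/cluster"

// Resource is a simplified stand-in for a custom resource's metadata.
type Resource struct {
	Name   string
	Labels map[string]string
}

// Partition splits resources into those labelled for the given cluster
// and those a per-cluster operator would otherwise silently skip
// (missing label, typo in the cluster name, or another cluster).
func Partition(resources []Resource, cluster string) (matched, ignored []Resource) {
	for _, r := range resources {
		// Indexing a nil map is safe in Go and yields the empty string.
		if r.Labels[clusterLabel] == cluster {
			matched = append(matched, r)
		} else {
			ignored = append(ignored, r)
		}
	}
	return matched, ignored
}

// Warning renders a message that makes a skipped resource visible.
func Warning(r Resource, cluster string) string {
	return fmt.Sprintf("resource %q is not labelled %s=%s and will not be reconciled",
		r.Name, clusterLabel, cluster)
}

func main() {
	resources := []Resource{
		{Name: "orders", Labels: map[string]string{clusterLabel: "prod"}},
		{Name: "users", Labels: map[string]string{clusterLabel: "prd"}}, // typo
		{Name: "audit"}, // label missing entirely
	}
	matched, ignored := Partition(resources, "prod")
	fmt.Println("reconciling", len(matched), "resource(s)")
	for _, r := range ignored {
		fmt.Println(Warning(r, "prod"))
	}
}
```

With several clusters in one namespace, a resource ignored by one operator may legitimately belong to another, so such a warning is a hint rather than proof of a mistake.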

Overall ... I don't think this was a bad decision which we regret or which is causing problems. But at the same time, I'm not sure it gives us any huge advantages. When we later implemented the Connector operator for Kafka Connect, we did not follow the same approach and integrated it into the main operator.

The bidirectional feature mentioned by Ramon is used only in our Topic Operator. We did this because there are too many applications creating topics directly in Kafka, and if it worked only as a traditional operator I do not think it would have had much value or use. So we implemented the bidirectional sync. But it is quite complicated and brings a lot of issues. So I would recommend avoiding it unless you are in the same situation and really need it.

@ryanemerson
Contributor Author

Thanks for the insights @rgordill and @scholzj.

It also makes them much easier to use with Kafka clusters run by something other than Strimzi - this was not planned, but turned out to be fairly popular as well.

Very interesting! A big win for us from the support perspective is how much the operator unifies cluster deployment and removes many of the variables. So while this flexibility sounds good for some users, in our case I think it could potentially cause more problems.

But it is quite complicated and brings a lot of issues. So I would recommend avoiding it unless you are in the same situation and really need it.

This was our original intention 🙂, however there is a lot of demand for this feature.

Having an operator per resource is an interesting idea, however I have reservations about the following:

  • The number of operators required as the number of resources increases. The Cache CR is the first of several resources we plan to add in the future (Additional Container CRs #748).

  • Scalability implications with global operators. OLM is moving away from per-namespace installs, so this would mean that a Cache operator has to be deployed globally and manage the bidirectional sync of all caches in the cluster. As the number of clusters increased we could scale the number of operator pods, however we would then also have to do this for other resources that have their own operator. The "config-reconciler" pod approach does not have this limitation, as it's a single pod per cluster for all resources.

  • Added complexity for the user as they have to install multiple operators.

  • It's expected that some resources, e.g. tasks, will be updated infrequently. So having dedicated operator pod(s) purely for this task is overkill.

@scholzj

scholzj commented Jan 26, 2021

Just some clarifications ... not necessarily trying to convince you to do it one way or another.

This was our original intention 🙂, however there is a lot of demand for this feature.

Tough decision. I'm not sure how exactly Infinispan is being used and how it works. Just count on it being harder than it might seem at first look :-)

The number of operators required as the number of resources increases. The Cache CR is the first of several resources we plan to add in the future #748.

This is not how it works with Strimzi either. We have about 9 custom resource types. Only two of them are handled by separate operators. The rest are all handled by the main operator.

Scalability implications with global operators. OLM is moving away from per-namespace installs, so this would mean that a Cache operator has to be deployed globally and manage the bidirectional sync of all caches in the cluster. As the number of clusters increased we could scale the number of operator pods, however we would then also have to do this for other resources that have their own operator. The "config-reconciler" pod approach does not have this limitation, as it's a single pod per cluster for all resources.

Added complexity for the user as they have to install multiple operators.

Again, it works a bit differently in Strimzi. There is only one OLM deployment - for the main Strimzi operator which manages the clusters (Kafka cluster, Kafka Connect cluster, Kafka Mirror Maker cluster etc.). It installs all the CRDs including the user and topic CRDs. The topic and user operators are then (optionally) deployed as part of the Kafka cluster. So it just creates another deployment. It does not deploy another OLM operator or anything.

@ryanemerson
Contributor Author

ryanemerson commented Jan 26, 2021

The number of operators required as the number of resources increases. The Cache CR is the first of several resources we plan to add in the future #748.

This is not how it works with Strimzi either. We have about 9 custom resource types. Only two of them are handled by separate operators. The rest are all handled by the main operator.

It wasn't clear from my original message, but all of the above resources that I referenced would require bidirectional sync. I didn't think that it would need to be one operator for every CR, e.g. the Backup/Restore CRs we already have, just for those that require bidirectional sync.

My original reservation about one operator per bidirectional resource was misplaced anyway, as we could have the cluster operator and then a bidirectional operator (naming things is hard) that manages all of the bidirectional resources.

Again, it works a bit differently in Strimzi. There is only one OLM deployment - for the main Strimzi operator which manages the clusters (Kafka cluster, Kafka Connect cluster, Kafka Mirror Maker cluster etc.). It installs all the CRDs including the user and topic CRDs. The topic and user operators are then (optionally) deployed as part of the Kafka cluster. So it just creates another deployment. It does not deploy another OLM operator or anything.

Thanks for clarifying, this makes a lot more sense now.

So assuming we took a similar approach we could have a Cluster Operator (Infinispan, Backup, Restore CRs) and Resources Operator (Caches, Templates, Counters, Scripts). OLM installation would be the same as now.

The Resources operator is then deployed per Infinispan CR [1]. The purpose of this operator would be to bidirectionally reconcile the various CRs by consuming events from the cluster it's associated with and updating/creating the respective CRs. This is actually similar to the "config-reconciler" pod idea I initially described as it addresses scalability concerns by being per cluster, except it also provides the benefits of providing additional separation of concerns between the Cluster and resource objects.

[1] Should be configurable in the Infinispan CR if no resource CRs are required to avoid the overhead of an additional pod.
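The opt-in described in [1] could be modelled roughly as below. This is a Go sketch under assumptions: the `ConfigListener` section, its `Enabled` field and the helper names are hypothetical and not an agreed CR schema. The sketch defaults to disabled so that clusters without resource CRs do not pay for an extra pod.

```go
package main

import "fmt"

// ConfigListenerSpec is a hypothetical section of the Infinispan CR spec.
type ConfigListenerSpec struct {
	// Enabled is a pointer so that "unset" can be told apart from false.
	Enabled *bool `json:"enabled,omitempty"`
}

// InfinispanSpec shows only the part of the spec relevant to this sketch.
type InfinispanSpec struct {
	ConfigListener *ConfigListenerSpec `json:"configListener,omitempty"`
}

// ListenerEnabled reports whether the per-cluster Resources operator
// should be deployed. Anything unset is treated as disabled.
func ListenerEnabled(spec InfinispanSpec) bool {
	return spec.ConfigListener != nil &&
		spec.ConfigListener.Enabled != nil &&
		*spec.ConfigListener.Enabled
}

// DesiredReplicas is the replica count the Cluster operator would set on
// the Resources operator Deployment: one pod per cluster, or none.
func DesiredReplicas(spec InfinispanSpec) int32 {
	if ListenerEnabled(spec) {
		return 1
	}
	return 0
}

// Summary renders the decision for a named cluster.
func Summary(cluster string, spec InfinispanSpec) string {
	return fmt.Sprintf("cluster %q: resources operator replicas=%d",
		cluster, DesiredReplicas(spec))
}

func main() {
	enabled := true
	fmt.Println(Summary("plain", InfinispanSpec{}))
	fmt.Println(Summary("synced", InfinispanSpec{
		ConfigListener: &ConfigListenerSpec{Enabled: &enabled},
	}))
}
```

Whether the default should be enabled or disabled is a product decision; the sketch only shows that the Cluster operator can derive the extra Deployment from a single field.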

@scholzj

scholzj commented Jan 26, 2021

The Resources operator is then deployed per Infinispan CR [1]. The purpose of this operator would be to bidirectionally reconcile the various CRs by consuming events from the cluster it's associated with and updating/creating the respective CRs. This is actually similar to the "config-reconciler" pod idea I initially described as it addresses scalability concerns by being per cluster, except it also provides the benefits of providing additional separation of concerns between the Cluster and resource objects.

In our case, the fact that one of the separate operators is bidirectional is just a coincidence. That has nothing to do with it being separate. The bidirectional nature will be a PITA regardless :-/.

Should be configurable in the Infinispan CR if no resource CRs are required to avoid the overhead of an additional pod.

Yeah, that is how we have it as well. Some users just do not want to use it and prefer other tooling. So it can be enabled / disabled easily.

ryanemerson added a commit to ryanemerson/infinispan-operator that referenced this issue Nov 5, 2021
- ConfigListener Deployment added to consume server-side cache lifecycle
  events and create/update/delete corresponding k8s Cache CR
- Server caches now removed on Cache CR deletion
- Cache CR spec.Template can be updated and changes are reflected on the
  server if runtime update of the configuration is possible.
rigazilla pushed a commit that referenced this issue Dec 23, 2021
- ConfigListener Deployment added to consume server-side cache lifecycle
  events and create/update/delete corresponding k8s Cache CR
- Server caches now removed on Cache CR deletion
- Cache CR spec.Template can be updated and changes are reflected on the
  server if runtime update of the configuration is possible.
oraNod added a commit to oraNod/infinispan-operator that referenced this issue Mar 24, 2022
ryanemerson pushed a commit that referenced this issue Mar 25, 2022
oraNod added a commit to oraNod/infinispan-operator that referenced this issue Mar 25, 2022