Do not remove PVC when removing Elasticsearch resource with Operator #2328

Closed
andrii-litvinov opened this issue Dec 25, 2019 · 20 comments · Fixed by #4050
Labels
>enhancement Enhancement of existing functionality v1.5.0


@andrii-litvinov

Relates to #1288. When removing an Elasticsearch resource I don't necessarily want to remove the PVCs, because that leads to data loss.

Expected behavior from k8s is to keep PVCs:

Deleting the Pods in a StatefulSet will not delete the associated volumes. This is to ensure that you have the chance to copy data off the volume before deleting it. Deleting the PVC after the pods have left the terminating state might trigger deletion of the backing Persistent Volumes depending on the storage class and reclaim policy. You should never assume ability to access a volume after claim deletion.

It may be convenient for development, but in production there will be scenarios where users want to remove Elasticsearch and reinstall it again. Maybe there will be issues with upgrading the Operator or other Elasticsearch resources to a new version. I am working with the MongoDB Operator and there are plenty of scenarios where I would want to keep PVCs, for example when I convert a sharded cluster to a replica set. And it is also the expected behavior for other k8s users who have worked with StatefulSets before.

Here is an open issue with the discussion in the Kubernetes repository: #55045. I would suggest leaving it up to the StatefulSet, or at least introducing a flag for whether to remove the PVCs or not. Or maybe make the decision about removing PVCs depend on whether it is a scale-down or a removal of the whole resource.

@anyasabo
Contributor

Since the reclaim policy is decided by the storage class, and you can specify the storage class for the PVCs, it is possible to retain volumes on scale-down or deletion. Can you go into more detail on use cases where this is not sufficient?

There's an upstream issue here for allowing ssets to delete PVCs as well (also linked in the original issue you mentioned): kubernetes/kubernetes#55045 (comment)
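
For context, a rough sketch of what that can look like. The StorageClass name and provisioner below are hypothetical; the relevant parts are the Retain reclaim policy and referencing that class from the Elasticsearch volumeClaimTemplates:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: es-retain
provisioner: kubernetes.io/gce-pd   # any provisioner, this one is just an example
reclaimPolicy: Retain               # keep the PersistentVolume after its claim is deleted

With storageClassName: es-retain in the volumeClaimTemplates, the backing volumes survive PVC deletion: they end up in the Released state instead of being deleted.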

@andrii-litvinov
Author

The StatefulSet documentation explicitly mentions that one shouldn't rely on volumes after claim deletion:

You should never assume ability to access a volume after claim deletion.

When a PVC gets removed and a new PVC is created with the same name, it gets bound to a new volume, not the existing one. So in this case I cannot have Elasticsearch attach to the existing data automatically. I use https://github.com/rancher/local-path-provisioner.
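
For what it's worth, a Released volume can be reused, but only manually. A rough sketch of the kind of steps involved (the PV name is hypothetical):

# after the claim is deleted, the PV stays around in the Released state
kubectl get pv

# clearing the old claimRef makes the PV Available again, so a new,
# matching claim (capacity, access modes, storage class) can bind to it
kubectl patch pv pvc-abc123 --type json \
  -p '[{"op": "remove", "path": "/spec/claimRef"}]'

With local-path volumes the data also lives on one specific node, so the volume's node affinity will pin the new pod back to that node.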

@barkbay
Contributor

barkbay commented Dec 26, 2019

Using a StatefulSet is an implementation detail; it has been different in the past and could be different in the future. Deleting the volume is done on purpose, because it is not good practice to reuse an old volume when scaling an Elasticsearch cluster down and up.

I agree that it is hard to reuse a PV, even if the reclaim policy is set to Retain, but I'm not sure I understand your use cases. Could you give more details about what you have in mind? If any of them make sense, they should probably be solved by the operator itself, not as a side effect of an implementation choice.

@andrii-litvinov
Author

I don't have any real examples with the Elastic operator yet, because I have only been using it for a couple of days so far. I shared an example with the MongoDB operator, where I wanted to convert a sharded cluster to a replica set (or vice versa) and preserve the data. Knowing that there is a StatefulSet under the hood, I expected the same standard behavior from the Elastic operator as well when I deleted the resource in my dev environment.

Even though the StatefulSet is an implementation detail, I would expect a service deployed as a StatefulSet to deal with PVCs in the standard way.

I do agree that it is up to the operator to decide, but if such a decision deviates from the expected default behavior of the underlying building blocks and leads to data loss (in the case of resource removal), I would expect there to be a configuration option to choose the behavior.

@andrii-litvinov
Author

andrii-litvinov commented Dec 26, 2019

Sorry, I accidentally closed the issue by pressing the wrong button. Another angle is to look at it from the perspective of running Elasticsearch without any containers or orchestrators. As a cluster admin I don't expect my data to be deleted if I remove the service from the host operating system. It is up to me to decide when to delete the data or what to do with it afterwards.

@anyasabo
Contributor

@andrii-litvinov thanks for bringing that up. I can see how it would be useful for scaling down to 0 replicas. At this time that is not a use case we support though (and in fact explicitly prevent). If we do implement that then we will also have to revisit this discussion for PVCs, as well as the overall downscale logic.

@andrii-litvinov
Author

Well, it means that in case of some issue, bug, or unexpected behavior I won't have the option to delete the Elasticsearch resource and recreate it without data loss or special handling, which is unfortunate. I know it's an extreme scenario, but how will the operator handle the case where someone accidentally removed the PVC of the first replica in a StatefulSet? The StatefulSet itself is not capable of recreating the PVC. Will the operator be able to handle such scenarios?

@anyasabo
Contributor

The recommended path for deleting and recreating Elasticsearch resources is using snapshots. The StatefulSet controller recreates PVCs, though not perfectly.

@andrii-litvinov
Author

Using snapshots means copying terabytes of data, which is not instant. It means that if upgrading the operator from 1.0-beta to a newer version requires deletion and recreation of resources, significant downtime will be needed to restore the snapshot, which is not desirable in a production environment. I guess that's also not a good enough argument. Well, let's see if others come up with reasons not to delete PVCs. Maybe it's not a concern at all.

@andrii-litvinov
Author

This issue, #2308 (comment), gives one more case where removing and recreating the Elasticsearch resource can be useful:

We could document that if a user has deployed previous Elasticsearch resources then these resources must be deleted before upgrading if they are running on an impacted K8S version.

So keeping PVCs when removing StatefulSets would make resolving such issues much easier than dealing with snapshots, trying to keep the volumes, and manually re-creating PVCs before the StatefulSet is created.

@jgoeres

jgoeres commented Feb 7, 2020

I just ran into the very same issue described here.
In our stack, we have multiple stateful services that we handle with statefulsets and volume claim templates, one of these services is Elasticsearch. So far, we had deployed ES with home-built statefulsets (as we did and still do for all other stateful services).
The feature of statefulsets not deleting the PVCs allocated via volume claim templates has proven to be very useful, as it separates the lifetime of the data from the lifetime of the pods. We often used this to completely remove an installation from the cluster but keep the PVCs around, to later reinstall the same installation and retain all its data.
Now I am investigating replacing our ES statefulsets with the ES operator. So far, the operator looks very promising (e.g., scale up and in particular scale down works without the cluster even turning yellow, which is a pain to do manually). Alas, the PVC deletion "feature" might spoil using the operator for us.
So I would like to second the request of @andrii-litvinov and ask @anyasabo to reopen this issue.
Could you not create an option to allow preserving and re-using PVCs? I guess that deleting the Elasticsearch cluster (where I would want the PVCs to be preserved) and scaling it down (where after successful shard reallocation I do not need the PVCs of the deleted nodes anymore and probably would want them to be deleted) are currently very similar operations, so it might not be as straightforward as it seems.

@sebgl
Contributor

sebgl commented Feb 7, 2020

Thanks folks for raising your concerns here. I think it's worth reopening and discussing options.

As you point out @jgoeres, there are 2 different mechanisms in place to remove volumes:

  1. We set an ownerReference on the StatefulSet's volumeClaimTemplates to match the corresponding Elasticsearch resource. This means that when that Elasticsearch resource is deleted, all PVCs are deleted automatically by Kubernetes controllers (not by ECK); see the PVC sketch right after this list.
  2. When the cluster is downscaled (e.g. from 5 to 3 nodes), we garbage collect PVCs that are not used by any Pods. This is directly controlled by ECK.
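
To make mechanism 1) concrete, a PVC created through the volumeClaimTemplate ends up carrying an ownerReference along these lines (the resource names below are hypothetical, only the shape matters):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-data-quickstart-es-default-0
  labels:
    common.k8s.elastic.co/type: elasticsearch
  ownerReferences:
    - apiVersion: elasticsearch.k8s.elastic.co/v1
      kind: Elasticsearch
      name: quickstart
      uid: <uid of the Elasticsearch resource>

When the Elasticsearch resource with that uid is deleted, the Kubernetes garbage collector deletes the PVC; removing the ownerReference breaks that link.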

We can technically add new optional fields to the Elasticsearch CRD, such as:

preservePvcOnDeletion: true # defaults to false, covers 1)
preservePvcOnDownscale: true # defaults to false, covers 2)
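
To illustrate where such fields would sit (this is only a sketch of the proposal above, not an existing API), they would be top-level fields of the Elasticsearch spec:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.6.0
  # hypothetical fields from this proposal, not part of the current CRD
  preservePvcOnDeletion: true
  preservePvcOnDownscale: false
  nodeSets:
    - name: default
      count: 3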

If I understand your use case correctly @jgoeres, you would like to use something similar to the first option. Is that correct?

The volumeClaimTemplates section of StatefulSets is immutable, which means once we set an ownerReference there, we cannot remove it, and vice versa. This means we would need to make those settings immutable in the Elasticsearch spec, which is hard to achieve in Kubernetes. We can check this with the validation webhook, but that webhook may be disabled.

There are corner cases to consider when PVCs are kept around:

  • deleting then creating a cluster with the same name will lead to reusing the volumes, hence reusing the cluster metadata. If the old cluster spec does not match the new cluster spec (version, number of master nodes, etc.), the new cluster will not start correctly and may corrupt the existing data.
  • a node may start with some existing (old) data that is not supposed to exist anymore in the cluster. I think this is undefined behaviour in Elasticsearch; the existing data will probably not be cleaned up.

@sebgl sebgl reopened this Feb 7, 2020
@jgoeres

jgoeres commented Feb 7, 2020

@sebgl
Thanks for reopening this. Yes, the "preservePvcOnDeletion: true" option would be what I need.
Since I already make use of the PVC preservation behavior of statefulsets as described, I am aware of the first corner case you describe - we have a section in the documentation of our helm chart that explicitly states that when one wishes to reuse existing data via the leftover PVCs, configuration changes to the stateful services are not allowed for this to work. So I personally am perfectly fine with this.
Basically I would have an installation with, say, a 5-node cluster. I delete it, the PVCs are preserved. Once I recreate it, I would keep all relevant settings (number of nodes, node types, etc.), so the new pods would simply find the data from before (which should be consistent) and continue where they left off.
I personally do not see a use case for the second setting, preservePvcOnDownscale, as I would expect the same problems you describe in your second corner case. Only when going from x nodes to 0 and back to x again, and NOT making any changes in between, should the problem not occur; otherwise it almost certainly will. Will ES notice if the copies of the shards it finds on the "old" PVC are out of date?

@sebgl
Contributor

sebgl commented Feb 13, 2020

@andrii-litvinov @jgoeres we created a script that allows recreating an Elasticsearch cluster from its old Released PersistentVolumes: https://github.com/elastic/cloud-on-k8s/tree/master/hack/reattach-pv

Could you take a look at it (the README especially)? I'm wondering if that would be good enough to cover your use case.

@jgoeres

jgoeres commented Apr 3, 2020

Sorry for letting this topic sit idle for so long. I will give this script a try ASAP.

@vassilvk

vassilvk commented Jun 26, 2020

I'd like to second this request.

In our use case, we deploy Elasticsearch as part of a bigger system through a single Helm chart.
There are a number of stateful components in that system and Elasticsearch is one of them (it serves as the storage engine behind a Jaeger deployment).

All of the stateful components we use (RabbitMQ, Consul, Cassandra) follow the same practice of leaving PVCs behind on uninstall, which makes a lot of sense as we have a number of scenarios where we install and uninstall the system as part of our maintenance routines. All stateful components except Elasticsearch persist their data and that data is available the next time we deploy our system.

The CRD addition of preservePvcOnDeletion: true|false suggested by @sebgl would allow us to perform maintenance without losing all our Elasticsearch indexes, thus all our Jaeger traces.

@jmorcar

jmorcar commented Aug 14, 2020

preservePvcOnDeletion

@sebgl
When you say "add new fields", do you mean adding this to the YAML shown by "kubectl get crd elasticsearch"? Could you indicate at which level of the YAML this should go?

I can't find any documentation about this.

Thanks

@sebgl sebgl added the >enhancement Enhancement of existing functionality label Aug 24, 2020
@sebgl
Contributor

sebgl commented Aug 24, 2020

I can't find any documentation about this.

It does not exist yet.

@vassilvk

vassilvk commented Oct 15, 2020

I was able to make the PVCs remain after Elasticsearch is removed with the help of a KubeMod ModRule, which intercepts the PVCs created by the Elasticsearch operator and removes their ownerReference before they are deployed to the cluster.

This makes the PVCs persist and their data is picked up and available in the next deployment of Elasticsearch as long as the Elasticsearch cluster's scale is not changed between deployments.

apiVersion: api.kubemod.io/v1beta1
kind: ModRule
metadata:
  name: elasticsearch-pvc-modrule
spec:
  type: Patch

  match:
    - select: '$.kind'
      matchValue: PersistentVolumeClaim
    - select: '$.metadata.labels["common.k8s.elastic.co/type"]'
      matchValue: elasticsearch

  patch:
    - op: remove
      path: /metadata/ownerReferences/0

@sebgl
Contributor

sebgl commented Feb 25, 2021

kubernetes/enhancements#2440 got merged to make it optional to automatically remove PVCs along with StatefulSets, or when StatefulSets are downscaled.

It does not include a combination that I think would be valuable to us: remove PVCs on downscale (data has been migrated away from those volumes anyway), but preserve PVCs on deletion of the entire cluster. This would allow restoring a cluster with its existing data later on.
I added a comment to the (already merged) KEP: kubernetes/enhancements#2440 (comment).
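
For reference, the shape of the StatefulSet-level policy coming out of that KEP is roughly the following (field names and availability depend on the Kubernetes version and the associated feature gate; this is a sketch, not something the operator exposes today):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example
spec:
  # each field accepts Retain or Delete
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # what happens to PVCs when the StatefulSet is deleted
    whenScaled: Retain    # what happens to PVCs when the StatefulSet is scaled down
  # ... rest of the StatefulSet spec (selector, serviceName, template, volumeClaimTemplates)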

@thbkrkr thbkrkr added the v1.5.0 label Mar 2, 2021