Improve Snapshot/Backup capabilities for Air Gapped deployments via direct downloads or allowing individuals to push snapshots to repositories such as GitLab, Nexus or Artifactory #98728

Closed
sheldonmcclung opened this issue Aug 22, 2023 · 4 comments
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement feedback_needed Team:Distributed Meta label for distributed team

Comments

@sheldonmcclung

Description

The below image shows the docs with the current snapshot repository options:

[Image: screenshot of the documentation listing the current snapshot repository options]

For an air-gapped environment, there are few repository options available for backing up the Elastic data.

I propose either allowing direct download of the data onto a system that can handle it, or allowing admins to push the snapshot to a URL that accepts the input. Either would be a great addition.

The shared file system repository does not work well for me because this is an ECK stack, so there is no guarantee that the data will be on a node that has the backup folder mounted.

I will grant that deploying MinIO to emulate an S3 bucket is an option, but I would rather not have to support another tool just to do Elastic backups.

I believe that additional options for snapshot repositories would be well appreciated!

@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor

I think there might be a misconception here. Deduplication is a key feature of snapshots: each snapshot will re-use existing data in the repository rather than copying it all over the wire each time. That means it doesn't really make sense to directly download the data somewhere external: Elasticsearch needs to interact with the existing data in order to avoid unnecessary snapshotting. Moreover I don't think it would be easy to restore from that kind of snapshot: it certainly wouldn't be reasonable to upload the (possibly PBs of) data back into Elasticsearch just to retrieve a single index.

Pushing the data out will work better, but again it's more than just sending one enormous POST request to an HTTP endpoint: there needs to be some protocol for determining what data does and doesn't need copying each time, and the data nodes all take part in the pushing process concurrently. We'd also expect to be able to access the data at fairly fine granularity in order to do efficient restores. There are lots of different protocols you could choose for this; Elasticsearch already supports several common options, but it's relatively easy to write a plugin to support a new one if you need something bespoke.
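
To make the deduplication and fine-grained access point concrete, here is a toy sketch in Python. It is not how Elasticsearch actually lays out a repository: `ToyRepository`, `snapshot`, and `restore_index` are hypothetical names, and keying blobs by a SHA-256 of their content is an assumption made purely for illustration.

```python
import hashlib


class ToyRepository:
    """Hypothetical content-addressed store standing in for a snapshot repository."""

    def __init__(self):
        self.blobs = {}      # blob id -> bytes
        self.snapshots = {}  # snapshot name -> {index name -> [blob ids]}

    def snapshot(self, name, indices):
        """Record a snapshot and return how many blobs had to be uploaded."""
        uploaded = 0
        manifest = {}
        for index_name, segments in indices.items():
            ids = []
            for data in segments:
                blob_id = hashlib.sha256(data).hexdigest()
                # Only copy data the repository does not already hold.
                if blob_id not in self.blobs:
                    self.blobs[blob_id] = data
                    uploaded += 1
                ids.append(blob_id)
            manifest[index_name] = ids
        self.snapshots[name] = manifest
        return uploaded

    def restore_index(self, name, index_name):
        """Fetch only the blobs that belong to one index of one snapshot."""
        return [self.blobs[b] for b in self.snapshots[name][index_name]]


repo = ToyRepository()
first = repo.snapshot("snap-1", {"logs": [b"seg-a", b"seg-b"], "metrics": [b"seg-c"]})
second = repo.snapshot(
    "snap-2", {"logs": [b"seg-a", b"seg-b", b"seg-d"], "metrics": [b"seg-c"]}
)
print(first, second)  # 3 1
print(repo.restore_index("snap-2", "logs"))
```

The second snapshot uploads a single new blob because the repository can be asked what it already holds, and restoring one index touches only that index's blobs. A one-shot download or a single large POST gives the cluster neither ability.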

But however you do it you're going to need some nontrivial server to implement the other end of the protocol. In other words, if you're not using MinIO you're going to have to use something equivalent.
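
For anyone who does end up running MinIO or another S3-compatible server, the following is a minimal sketch of the repository registration request, assuming the `s3` repository type is available in the cluster. The repository name `minio_backups` and the bucket `es-snapshots` are hypothetical, and the snippet only builds the request rather than sending it. The server's address is not part of this request: it is configured on the nodes through S3 client settings such as `s3.client.default.endpoint`, so check the documentation for your version.

```python
import json


def s3_repository_request(repo_name, bucket, client="default"):
    """Build the method, path, and JSON body to register an S3-type repository."""
    path = f"/_snapshot/{repo_name}"
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,   # hypothetical bucket on the S3-compatible server
            "client": client,   # name of the S3 client configured on the nodes
        },
    }
    return "PUT", path, json.dumps(body, indent=2)


method, path, body = s3_repository_request("minio_backups", "es-snapshots")
print(method, path)
print(body)
```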

@DaveCTurner
Contributor

No response, so I'm closing this.
