Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Repository API should validate file system repository paths for collisions #78276

Open
cjcenizal opened this issue Sep 23, 2021 · 8 comments
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed Meta label for distributed team

Comments

@cjcenizal
Copy link
Contributor

Per convo with @original-brownbear, we advise users against mounting the same path in multiple repositories, or else this will result in weird errors or corruption.

We can improve the UX and help users avoid this situation by updating the Create Repository API to comparing the path defined in the request against those defined in existing repositories, and rejecting the request if there's a collision.

@cjcenizal cjcenizal added :Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed Meta label for distributed team needs:triage Requires assignment of a team area label labels Sep 23, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner DaveCTurner removed the needs:triage Requires assignment of a team area label label Sep 27, 2021
@DaveCTurner
Copy link
Contributor

Can you give us a sense of how often users encounter problems that this proposal would fix @cjcenizal? Also a sense of where the protections that exist today break down? I think the errors you get in recent versions are fairly clear, they'll indicate that the repository was concurrently modified.

Avoiding collisions at registration time is a hard problem to solve: we can check for literal path equality against other repositories in the cluster but should we also try and resolve symlinks? What about duplicate NFS mounts? With other repository types there are other ways to alias the same storage to different locations, do we need to detect them too? And we can't easily prevent a different cluster from mounting the same repository either. I see that we can make it a little stricter at registration time but it won't be much stricter, certainly not watertight, so the question is how valuable this is to add.

@cjcenizal
Copy link
Contributor Author

Can you give us a sense of how often users encounter problems that this proposal would fix

No, sorry, I don't have any data on how frequently users hit this error or create file system repositories with the UI. Does the ES team have any telemetry on the percentage of repos that use the file system? That would give a rough idea of the potential for hitting this problem among our users.

we can check for literal path equality against other repositories in the cluster but should we also try and resolve symlinks? What about duplicate NFS mounts? With other repository types there are other ways to alias the same storage to different locations, do we need to detect them too?

Just a check for literal path equality would have helped me in this case. I'd settle for solving just the concrete problem I hit --"progress over perfection." :)

@Leaf-Lin
Copy link
Contributor

I'm seeing quite a few of this happening lately, some of these are with frozen nodes. I'm guessing with the searchable snapshot setup, it may not be entirely clear to users that they can re-use the same repo?

It would be great if Elasticsearch is able to return errors at register time if the repo has been written to and is not registered with a readonly flag. Is it possible to create a marker in the path so that when another snapshot getting registered on the same root path, the marker should refuse to be overwritten and result in an error to be returned to user?

@kunisen
Copy link
Contributor

kunisen commented Jan 31, 2023

Previously it seems the issue happened more when duplicate snapshot repositories are registered. However, @renshuki and I also observed it may happen even when there's a single snapshot repository.
Error message search took me here.
Thanks!

@DaveCTurner
Copy link
Contributor

What error message do you mean @kunisen?

@kunisen
Copy link
Contributor

kunisen commented Jan 31, 2023

Thanks and sorry I forgot to mention that @DaveCTurner
It's this error message:

[2023-01-31T04:00:00,001][ERROR][org.elasticsearch.xpack.slm.SnapshotLifecycleTask] 
[instance-00000000XX] failed to create snapshot for snapshot lifecycle policy [cloud-snapshot-policy]: 
RepositoryException[[found-snapshots] Could not read repository data because the contents of the repository do not match its expected state. 
This is likely the result of either concurrently modifying the contents of the repository by a process other than this cluster or an issue with the repository's underlying storage. 
The repository has been disabled to prevent corrupting its contents. 
To re-enable it and continue using it please remove the repository from the cluster and add it again to make the cluster recover the known state of the repository from its physical contents.]

Basically, we've seen this several times due to registering dup repositories.
But this time we found it's a single repository.

We didn't actually know the reason but unmounting and remounting the repository did the trick.
I will tag you in the internal ticket in case you have interest to look at it.

@DaveCTurner
Copy link
Contributor

DaveCTurner commented Jan 31, 2023

The internal ticket was a case where multiple clusters ended up writing to the same repository.

arteam pushed a commit that referenced this issue Feb 22, 2023
👋🏼 Regardless of if we decide to validation enforce #78276, may we please drop a doc note that users should avoid duplicating repositories (particularly bucket / base paths).
arteam pushed a commit to arteam/elasticsearch that referenced this issue Feb 22, 2023
👋🏼 Regardless of if we decide to validation enforce elastic#78276, may we please drop a doc note that users should avoid duplicating repositories (particularly bucket / base paths).
arteam pushed a commit to arteam/elasticsearch that referenced this issue Feb 22, 2023
👋🏼 Regardless of if we decide to validation enforce elastic#78276, may we please drop a doc note that users should avoid duplicating repositories (particularly bucket / base paths).
arteam added a commit that referenced this issue Feb 22, 2023
👋🏼 Regardless of if we decide to validation enforce #78276, may we please drop a doc note that users should avoid duplicating repositories (particularly bucket / base paths).

Co-authored-by: Stef Nestor <steffanie.nestor@gmail.com>
arteam added a commit that referenced this issue Feb 22, 2023
👋🏼 Regardless of if we decide to validation enforce #78276, may we please drop a doc note that users should avoid duplicating repositories (particularly bucket / base paths).

Co-authored-by: Stef Nestor <steffanie.nestor@gmail.com>
saarikabhasi pushed a commit to saarikabhasi/elasticsearch that referenced this issue Apr 10, 2023
👋🏼 Regardless of if we decide to validation enforce elastic#78276, may we please drop a doc note that users should avoid duplicating repositories (particularly bucket / base paths).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement Team:Distributed Meta label for distributed team
Projects
None yet
Development

No branches or pull requests

6 participants