Local backend is disqualified because one namespace has broken fragments #550

Closed
kinvaris opened this issue Jan 5, 2017 · 8 comments

kinvaris commented Jan 5, 2017

We saw the following error in a global proxy on the OVH setup:

Jan 05 14:30:28 perf-roub-01 alba[61295]: 2017-01-05 14:30:28 923020 +0100 - perf-roub-01 - 61295/0 - alba/maintenance - 20161060 - info - Disqualifying osd 1: Alba_client_errors.Error.Exn(8); backtrace:; Raised at file "map.ml", line 122, characters 16-25; Called from file "src/conv.ml", line 190, characters 10-37

In this case OSD 1 is a local backend, and it is disqualified because a single namespace could not find enough fragments. This may be too harsh on all the other namespaces, as the following example shows:

In this example we have 3 local backends with a policy of 2,1,2,1.
Namespace A is stored on all three local backends.
Namespace B is stored only on the first two backends, because backend 3 timed out and the policy requires only 2 backends to complete.
After some time, a disk breaks on a local backend and namespace A gets Not Enough Fragments on backend 2, so that backend is disqualified.

--> At this point namespace B will also come to a halt, because it needs at least 2 backends to read.
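
The scenario above can be sketched as a toy model. This is only an illustration, not Alba code: the function and variable names are hypothetical, the policy 2,1,2,1 is read here as (k, m, minimum fragment count, maximum fragments per backend), which is an assumption, and a disqualified backend is treated as unusable for every namespace, as the report describes it.

```python
# Toy model of the reported scenario; all names are hypothetical.
# Assumed reading of the policy: (k, m, min_fragments, max_per_backend).
POLICY = (2, 1, 2, 1)


def still_available(placement, disqualified, policy=POLICY):
    """Return True if enough usable backends remain to gather k fragments."""
    k, _m, _min_fragments, max_per_backend = policy
    usable = placement - disqualified
    return len(usable) * max_per_backend >= k


# Namespace A landed on all three backends; namespace B only on the
# first two, because backend 3 timed out during the upload.
namespaces = {"A": {1, 2, 3}, "B": {1, 2}}

# Backend 2 is disqualified because of a failure seen for namespace A.
disqualified = {2}

status = {
    name: still_available(placement, disqualified)
    for name, placement in namespaces.items()
}
print(status)
```

Under these assumptions namespace A keeps two usable backends, while namespace B is left with one and can no longer be served, even though the failure was observed for namespace A.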

wimpers added this to the Gilbert milestone Jan 10, 2017
@kinvaris
Author

@wimpers this is not an enhancement; it is a bug, because it can cripple your setup in an instant.

Contributor

domsj commented Jan 10, 2017

Some remarks:

  • a disk breaking on a local backend shouldn't immediately result in NotEnoughFragments ... usually there's some redundancy on the local backends too.
  • disqualifying an osd in the proxy only results in us not using it for new uploads; downloads will still happily try to use it
  • you added the type_enhancement label yourself ;-)

So it's not that bad, but nonetheless I'll have a look at it in the near future
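
A minimal sketch of the behaviour described in the second remark, with hypothetical names and no claim to mirror the actual proxy code: the disqualified set is consulted when picking upload targets, and ignored when picking download candidates.

```python
# Hypothetical illustration of "disqualified for uploads only".
def upload_targets(osds, disqualified):
    """OSDs eligible for new uploads: disqualified ones are skipped."""
    return [osd for osd in osds if osd not in disqualified]


def download_candidates(osds, disqualified):
    """OSDs tried for downloads: the disqualified set is ignored."""
    return list(osds)


osds = [1, 2, 3]
disqualified = {1}

print(upload_targets(osds, disqualified))       # OSD 1 is left out
print(download_candidates(osds, disqualified))  # OSD 1 is still tried
```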


wimpers commented May 30, 2017

@domsj what do you have in mind to fix/improve this?

Based upon a read failure, you disqualify for writes but still happily try reads. Isn't this a bit strange?

wimpers removed this from the G milestone May 30, 2017
Contributor

domsj commented Jun 1, 2017

@wimpers that is indeed a bit strange. We could change the behaviour so that only errors on write result in disqualifying the osd for new writes
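
The proposed change could look roughly like this. It is a sketch under assumptions: `Op`, `Proxy` and `record_error` are hypothetical names invented for illustration, and the real proxy may track errors quite differently.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"


class Proxy:
    """Hypothetical proxy state: tracks OSDs excluded from new writes."""

    def __init__(self):
        self.disqualified = set()

    def record_error(self, osd_id, op):
        # Proposed rule: only a failed write disqualifies the OSD for
        # new writes; a failed read leaves the set untouched.
        if op is Op.WRITE:
            self.disqualified.add(osd_id)


proxy = Proxy()
proxy.record_error(1, Op.READ)   # read failure: OSD 1 stays qualified
proxy.record_error(2, Op.WRITE)  # write failure: OSD 2 is disqualified
print(proxy.disqualified)
```

With this rule, a read failure such as the one in the original log line would no longer take the OSD out of rotation for writes.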


wimpers commented Jun 15, 2017

Did the fix for #737 also fix this one? Probably not?

wimpers added this to the H milestone Jun 15, 2017
Contributor

domsj commented Jun 15, 2017

No, it did not fix this one.


wimpers commented Oct 19, 2017

@domsj @toolslive is this still an issue?

wimpers modified the milestones: I, J Nov 28, 2017
wimpers removed this from the J milestone Mar 6, 2018

wimpers commented Jun 6, 2018

Fixed in the EE version but not in the OSE version.
