Multibucket objectstore #1084

Closed
icewind1991 opened this issue Aug 26, 2016 · 11 comments

@icewind1991
Member

The current primary objectstore runs into scalability issues on very large instances, since it stores all objects in a single bucket and not all S3/Swift implementations seem to cope well with many millions of objects in one bucket.

To work around this, the plan is to add the ability to balance objects between multiple buckets.

Balance methods

There are two main options for handling the balancing:

  1. Per user balancing

    Either each user gets their own bucket (we need to ensure that the objectstore in use can handle a very large number of buckets), or users are spread evenly over N buckets (larger but fewer buckets).
    Either way, all of a user's files stay in a single bucket. The disadvantage is that, since users probably don't have the same usage patterns, files end up distributed unevenly over the buckets.

  2. Per file balancing

    Unlike 1., this doesn't keep a user's files together. Instead it spreads all files evenly over N buckets (something like $bucketId = $fileId % $numberOfBuckets). This makes sure all files are spread evenly no matter what the usage pattern is; even if a single user has tens of millions of files, it will still balance.
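
The difference between the two options can be sketched as follows. This is a minimal illustration in Python rather than Nextcloud's PHP; the function names, the use of MD5 to spread users over buckets, and the sample data are all assumptions made for the example, not the actual implementation.

```python
import hashlib
from collections import Counter


def bucket_for_user(user_id: str, num_buckets: int) -> int:
    """Per-user balancing: every file of a user lands in the same bucket."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def bucket_for_file(file_id: int, num_buckets: int) -> int:
    """Per-file balancing: bucketId = fileId % numberOfBuckets."""
    return file_id % num_buckets


# Hypothetical instance: one heavy user owns 9,000 of 10,000 files.
owners = ["heavy"] * 9000 + [f"user{i}" for i in range(1000)]
files = list(enumerate(owners, start=1))  # (file_id, owner) pairs

per_user = Counter(bucket_for_user(owner, 4) for _, owner in files)
per_file = Counter(bucket_for_file(file_id, 4) for file_id, _ in files)

print("per-user:", sorted(per_user.values()))
print("per-file:", sorted(per_file.values()))
```

With per-user balancing the heavy user's bucket holds at least 9,000 of the objects, while the modulo scheme puts exactly 2,500 in each of the four buckets.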

Personally I favor 2., since it's a simpler solution and I feel that it solves the problem (the storage not handling very large buckets well) better, although it's not without its downsides.

Changing balance methods

A thing that should be taken into account is whether we want to support changing balance methods (like increasing the number of buckets used) on an existing system.

Since moving all objects around according to the updated balancing scheme is not practical, we would need some way for existing files/users to keep using the old scheme while new ones use the updated scheme.

For per-user balancing this can simply be done by storing the bucket id per user. For per-file balancing, storing the bucket for each file is probably not practical.
One way to handle per-file balancing is to instead store the balancing scheme used per range of file ids. Since ids are incremental, we only need to store that files 1 to 100000 use 10 buckets and all newer files use 20.
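
A minimal sketch of such a range lookup, in Python for illustration. The `SCHEMES` table and the function name are hypothetical; the values simply mirror the example above (ids 1 to 100000 use 10 buckets, newer ids use 20).

```python
from bisect import bisect_right

# Each entry is (first_file_id, number_of_buckets). Hypothetical values
# matching the example in the text.
SCHEMES = [(1, 10), (100001, 20)]
_STARTS = [start for start, _ in SCHEMES]


def bucket_for_file(file_id: int) -> int:
    """Pick the scheme whose id range contains file_id, then apply the modulo."""
    if file_id < _STARTS[0]:
        raise ValueError(f"no scheme covers file id {file_id}")
    index = bisect_right(_STARTS, file_id) - 1
    _, num_buckets = SCHEMES[index]
    return file_id % num_buckets
```

Increasing the bucket count then only means appending one more `(first_file_id, number_of_buckets)` entry; existing objects keep resolving to the bucket they were written to.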

@icewind1991 icewind1991 added this to the Nextcloud 11.0 milestone Aug 26, 2016
@icewind1991
Member Author

One of the main reasons why I feel per-file balancing is "easier" is that it can be done at the objectstore implementation level without any additional information (such as the user id of the owner of a file).

@MorrisJobke MorrisJobke mentioned this issue Aug 26, 2016
@rullzer
Member

rullzer commented Aug 26, 2016

As discussed, the big problem with 2 is that you have to be very, very careful whenever changing the number of buckets.
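
To illustrate the risk: with a plain modulo scheme, changing the bucket count changes where many existing objects are expected to be. The sketch below (Python, with a hypothetical helper name, assuming incremental file ids starting at 1) counts how many ids would resolve to a different bucket.

```python
def moved_fraction(num_files: int, old_buckets: int, new_buckets: int) -> float:
    """Fraction of file ids 1..num_files whose bucket changes under plain modulo."""
    moved = sum(
        1
        for file_id in range(1, num_files + 1)
        if file_id % old_buckets != file_id % new_buckets
    )
    return moved / num_files


print(moved_fraction(100000, 10, 20))  # doubling the bucket count
print(moved_fraction(110000, 10, 11))  # adding a single bucket
```

Doubling from 10 to 20 buckets remaps half of the ids, and going from 10 to 11 remaps roughly ten out of every eleven, so without extra bookkeeping most lookups would go to the wrong bucket.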

@despens

despens commented Sep 4, 2016

I believe solution 2 will make it more difficult to migrate to another object store at a later point, or to migrate away from nextcloud.

@icewind1991
Member Author

> migrate away from nextcloud

Migration is not a supported use case when using a primary object store in the first place, since we use file ids as object names.

@despens

despens commented Sep 6, 2016

If that means I can store .htaccess files on object store, count me in :)

@benmichael

benmichael commented Oct 25, 2016

Either way, we need to keep in mind that AWS S3 (which will be used by the majority of the users of this) has a default limit of 100 buckets per account. This can be lifted by contacting customer support, but that's a poor experience for the users of Nextcloud.

Also, obviously, we'll need to have some bucket name resolution for when we try to create a bucket and find that the name is taken.

@rullzer
Member

rullzer commented Oct 31, 2016

@benmichael ah, that is a good remark. That would probably mean we need to make it configurable. But that should be doable.

@icewind1991
Member Author

per-user multibucket is currently merged for Nextcloud 11

@jospoortvliet
Member

@icewind1991 what about the issue @benmichael mentioned, of a potential limit on the number of separate buckets? Does this mean Nextcloud 11 can't handle more than 100 users per account on S3 right now?

@icewind1991
Member Author

you can limit the number of buckets used
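
As a rough illustration of how a cap on the bucket count can coexist with per-user balancing, here is a sketch in Python. It is not the merged implementation: the class name, the `nextcloud_` prefix, the MD5-based spread, and the in-memory dict (standing in for a stored per-user bucket id) are all assumptions made for the example.

```python
import hashlib


class UserBucketMapper:
    """Maps users onto at most num_buckets buckets and remembers the choice."""

    def __init__(self, num_buckets: int, prefix: str = "nextcloud_"):
        self.num_buckets = num_buckets
        self.prefix = prefix
        self._assigned = {}  # stands in for a persisted per-user bucket id

    def bucket_for(self, user_id: str) -> str:
        # Only hash on first sight; afterwards the stored assignment wins,
        # so changing num_buckets later does not move existing users.
        if user_id not in self._assigned:
            digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
            index = int(digest[:4], 16) % self.num_buckets
            self._assigned[user_id] = f"{self.prefix}{index}"
        return self._assigned[user_id]


mapper = UserBucketMapper(num_buckets=64)
buckets_in_use = {mapper.bucket_for(f"user{i}") for i in range(1000)}
print(len(buckets_in_use))  # never more than 64, however many users exist
```

Because the assignment is stored per user, the cap can be raised later for new users without relocating anyone who already has a bucket.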

@petejodo

This is an old issue, but I'm curious what happens when I have a single objectstore as primary storage (currently configured using a DigitalOcean Space), want to change it to multi-bucket by adding an additional DigitalOcean Space, and then down the line want to add yet another one? Specifically in the case of group folders.
