Improve distribution of previews and avatars to multiple object store buckets #22033

Closed
MorrisJobke opened this issue Jul 29, 2020 · 4 comments
MorrisJobke commented Jul 29, 2020

Problem

  • right now all previews (and avatars) are stored in the app data folder
  • this folder is in oc_storages inside the root storage
  • in a multibucket setup each storage is put into one bucket
  • the root storage is always put into the first bucket, i.e. the one with the suffix 0:
    $config['arguments']['bucket'] .= '0';
  • this results in a quite uneven distribution of objects across the buckets: all previews of all files end up in one bucket. This limits the usability of the object store, because there are recommended upper limits on the number of objects in a single bucket
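
To make the imbalance concrete, here is a small simulation sketch. It is not Nextcloud code: the bucket count, the `user_bucket` helper, and the rule that a user's home storage is mapped to a bucket by hashing the user id are all assumptions made for illustration. The only thing taken from the description above is that everything in the root storage, and therefore every preview, lands in bucket 0.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical bucket count


def user_bucket(user_id: str) -> int:
    """Assumed mapping of a user's home storage to a bucket (hash of the user id)."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def simulate(num_users: int, files_per_user: int, previews_per_file: int) -> Counter:
    """Count objects per bucket when all previews go to the root storage (bucket 0)."""
    objects = Counter({bucket: 0 for bucket in range(NUM_BUCKETS)})
    for i in range(num_users):
        # The user's own files go to the bucket of their home storage.
        objects[user_bucket(f"user{i}")] += files_per_user
        # Previews live in app data inside the root storage, i.e. always bucket 0.
        objects[0] += files_per_user * previews_per_file
    return objects


if __name__ == "__main__":
    for bucket, count in sorted(simulate(100, 10, 3).items()):
        print(f"bucket {bucket}: {count} objects")
```

With three previews per file, bucket 0 holds at least three quarters of all objects in this toy setup, no matter how evenly the user storages themselves are spread.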

Ideas

  • first there was the idea to move the previews back to the user storage, since the fileID tells us the user's home storage and from that we can look up the correct storage and bucket. @icewind1991 pointed out that previews used to work this way and that we moved away from it, because it caused a lot of complexity when sharing files that are on an external storage. See Improve previews #1741
  • @icewind1991 brought up another idea that is now possible thanks to Move to subfolders for preview files #19214: we could add a config check to the app data wrapper for the preview folder that changes the bucket depending on the preview subfolder. That way the previews would not need to move out of the app data (which avoids the issues with sharing and external storages) and would still be distributed across buckets. This would most likely happen in
    return implode('/', str_split(substr(md5($name), 0, 7))) . '/' . $name;
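
For readers who do not have PHP at hand, the folder computation in the line above can be mirrored in Python as follows. This is only a sketch of that one expression; the function name `preview_folder` is made up here, and the input is assumed to be the preview folder name (the file id as a string).

```python
import hashlib


def preview_folder(name: str) -> str:
    """Mirror of: implode('/', str_split(substr(md5($name), 0, 7))) . '/' . $name"""
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Take the first 7 hex characters and use each one as a folder level.
    levels = "/".join(digest[:7])
    return levels + "/" + name


if __name__ == "__main__":
    print(preview_folder("123"))
```

Each of the seven levels is a single hex character, so every level has at most 16 subfolders, and the leading characters are what a bucket decision could be derived from.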

Things to keep in mind

  • have a migration path - we could just drop all previews and regenerate them, but on bigger instances this could cause problems right after the migration because of the massive number of previews that need to be generated.
  • the migration path should be transparent and step by step (i.e. per user or even per preview/previewed file):
    • have a flag in the user config/filecache that indicates whether the new or the old approach is used
    • after the upgrade everything still uses the old approach
    • have a background job (that only runs on CLI or in very small batches) that takes unprocessed previews and locks them in a "migration happening" state
    • the job then copies the previews, changes the flag and then deletes the old previews
    • have a way to see the status of this migration somewhere in the admin panel or on the CLI
  • maybe also move directly to an explicit way of recording which preview is in which bucket. This would make it easier to extend the bucket count later and to move new files into new buckets (see Mark object store bucket as full and move new files into new bucket #22039)
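
The step-by-step migration described above could look roughly like the following sketch. Everything in it is hypothetical: the `PreviewStore` class, the in-memory dictionaries standing in for the old and new preview locations, and the three state names. It only illustrates the order of operations (lock, copy, flip the flag, delete) and the status report, not how Nextcloud background jobs are written.

```python
from dataclasses import dataclass, field


@dataclass
class PreviewStore:
    """In-memory stand-in for the old location, the new location and the per-file flag."""
    old: dict = field(default_factory=dict)    # file id -> preview data (old location)
    new: dict = field(default_factory=dict)    # file id -> preview data (new location)
    state: dict = field(default_factory=dict)  # file id -> "old" | "migrating" | "new"


def migrate_batch(store: PreviewStore, batch_size: int) -> int:
    """Migrate up to batch_size unprocessed previews and return how many were moved."""
    pending = [fid for fid, s in store.state.items() if s == "old"][:batch_size]
    for fid in pending:
        store.state[fid] = "migrating"   # lock: mark the migration as in progress
        store.new[fid] = store.old[fid]  # copy the preview to the new location
        store.state[fid] = "new"         # flip the flag to the new approach
        del store.old[fid]               # only now delete the old preview
    return len(pending)


def status(store: PreviewStore) -> dict:
    """Summary that an admin panel or CLI command could display."""
    counts = {"old": 0, "migrating": 0, "new": 0}
    for s in store.state.values():
        counts[s] += 1
    return counts


if __name__ == "__main__":
    store = PreviewStore(
        old={i: b"preview" for i in range(5)},
        state={i: "old" for i in range(5)},
    )
    while migrate_batch(store, batch_size=2):
        print(status(store))
```

Deleting the old preview last means an interrupted run leaves a copy in at least one location; a real job would also need to recover entries that are stuck in the "migrating" state.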

@kesselb @rullzer @icewind1991 Feedback welcome

MorrisJobke added the labels "enhancement", "1. to develop" (Accepted and waiting to be taken care of) and "feature: object storage" on Jul 29, 2020
MorrisJobke added this to the Nextcloud 20 milestone on Jul 29, 2020

averdecia commented Jul 29, 2020

Hi @MorrisJobke, as far as I can see, the idea is to split the preview folder into a smaller number of folders than the default (fileId) using the md5-first-7-letters approach.
So far, no big problem; the "error" this solves is the load of directory listings (mostly in local filesystem environments).
My concern with this ticket is related to the bucket configuration and the bucket limits.
The idea is to have a configuration (like the current user bucket preference) that spreads the previews pseudo-randomly across the buckets, using the first seven letters of the md5. Maybe an "on the fly" approach could even be used, which would avoid some database inner joins.

A math approach can be:

  1. First, check the multi bucket config and get the number of buckets
  2. Take the first 7 letters of the md5 and convert them to a number (from hexadecimal to decimal)
  3. Divide the decimal result by the number of buckets and take the remainder of the division
  4. Dynamically set the bucket number for uploading and downloading the preview, based on that calculation

This will indeed spread the preview storage across the buckets.
One thing I would mention is that this only works with a static number of buckets. If the number of buckets is increased, it becomes mandatory to save the calculated bucket per id in an additional table.
Regards
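
The four steps of the math approach can be sketched as follows. The function names are made up for illustration, and the number of buckets is passed in directly instead of being read from the multi bucket config. The second helper shows the caveat about a static number of buckets: it measures how many names map to a different bucket once the count changes.

```python
import hashlib


def bucket_for_preview(name: str, num_buckets: int) -> int:
    """Steps 2-4: first 7 md5 hex characters -> integer -> remainder by bucket count."""
    prefix = hashlib.md5(name.encode("utf-8")).hexdigest()[:7]
    return int(prefix, 16) % num_buckets


def moved_fraction(names: list, old_count: int, new_count: int) -> float:
    """Fraction of names whose calculated bucket changes when the bucket count changes."""
    moved = sum(
        1
        for name in names
        if bucket_for_preview(name, old_count) != bucket_for_preview(name, new_count)
    )
    return moved / len(names)


if __name__ == "__main__":
    names = [str(file_id) for file_id in range(1000)]
    print("example bucket:", bucket_for_preview("123", 4))
    print("fraction moved when going from 4 to 5 buckets:", moved_fraction(names, 4, 5))
```

With a plain remainder, most previews end up with a different calculated bucket after the count changes, which is why the calculated bucket would have to be stored once the number of buckets is allowed to grow.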

MorrisJobke commented

> Hi @MorrisJobke, as far as I can see, the idea is to split the preview folder into a smaller number of folders than the default (fileId) using the md5-first-7-letters approach.

This is already implemented. And as this is done as a layer we can reuse this layer to also do the distribution across multiple buckets.

> A math approach can be:
>
>   1. First, check the multi bucket config and get the number of buckets
>   2. Take the first 7 letters of the md5 and convert them to a number (from hexadecimal to decimal)
>   3. Divide the decimal result by the number of buckets and take the remainder of the division
>   4. Dynamically set the bucket number for uploading and downloading the preview, based on that calculation

Yep - that would also be our naive approach. We could maybe join the efforts here with the ideas of #22039 and make this a bit more permanent. Otherwise changing the logic or the number of buckets will lead to wrongly calculated bucket numbers. Storing the result of the formula together with the preview makes it possible to change the number of buckets, or the formula itself, later on. But maybe this is also something we could do afterwards, so that we first have a solution for the initial problem.

> One thing I would mention is that this only works with a static number of buckets. If the number of buckets is increased, it becomes mandatory to save the calculated bucket per id in an additional table.

For that we will most likely use the filecache_extended table.
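
A minimal sketch of "storing the result of the formula together with the preview" might look like this. The class `StoredBucketMapper` and its dictionary are hypothetical stand-ins for a persistent table; nothing here reflects the actual schema of filecache_extended. The point is only that a stored value wins over a recalculation.

```python
import hashlib


class StoredBucketMapper:
    """Calculates a bucket once per preview and remembers it afterwards."""

    def __init__(self, num_buckets: int) -> None:
        self.num_buckets = num_buckets
        self._stored = {}  # stand-in for a persistent table: name -> bucket

    def bucket_for(self, name: str) -> int:
        if name not in self._stored:
            # Only new previews are calculated with the current bucket count.
            prefix = hashlib.md5(name.encode("utf-8")).hexdigest()[:7]
            self._stored[name] = int(prefix, 16) % self.num_buckets
        return self._stored[name]


if __name__ == "__main__":
    mapper = StoredBucketMapper(num_buckets=4)
    before = {name: mapper.bucket_for(name) for name in ("1", "2", "3")}
    mapper.num_buckets = 8  # the bucket count is extended later
    after = {name: mapper.bucket_for(name) for name in ("1", "2", "3")}
    print(before == after)
```

Because existing previews keep their stored bucket, both the bucket count and the formula can change later and only affect previews created after the change.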

Thanks for the feedback.

MorrisJobke added a commit that referenced this issue Jul 30, 2020
…e buckets

* introduces a new IRootMountProvider to register mount points inside the root storage
* adds a AppdataPreviewObjectStoreStorage to handle the split between preview folders and bucket number

Ref #22033

Signed-off-by: Morris Jobke <hey@morrisjobke.de>
MorrisJobke commented

First proof of concept is in #22063 - this introduces the storages and already stores new previews in there.

MorrisJobke added four more commits with the same message that referenced this issue on Aug 3, Aug 6 and Aug 7 (twice), 2020
MorrisJobke commented

Implemented in #22063. There is also a migration tool, #22135, that migrates pre-Nextcloud 19 previews to the new preview folders introduced in #19214 (this also works for non-multibucket setups). For previews that are already in the folder structure of #19214, there is no migration path yet.
