Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change "bucket" algorithm #3

Merged
merged 3 commits into from
Apr 9, 2015
Merged

Change "bucket" algorithm #3

merged 3 commits into from
Apr 9, 2015

Conversation

jechols
Copy link
Contributor

@jechols jechols commented Apr 9, 2015

Changes the system to use MD5 to produce base-16 values, instead of relying on IDs directly. Using MD5 ensures very random-like buckets for better distribution of files across buckets, and in a perfect world, you won't see very many directories/files in any given "bucket" until you have a lot of files.

The existing system, since it uses just the id as-is, had a few concerns for us:

  • It'll put most assets into the same few directories, especially with alphanumeric noids. Testing 1000 assets, we ended up with all of them showing up in 25/14/nX, where "X" is one of about ten letters. Further testing suggests it would take a long time before a new bucket at levels one or two would be created (the "25" or "14" in /25/14/nX).
  • If the id is 8 digits or fewer, every asset will live in its own directory - probably not a real problem, but it seems like a high directory-to-file ratio
  • If the id is extremely long (as an example, 14 digits), way too many assets will live in the same directory: 12345678901234 turns into /12/34/56/78 - and that directory will hold 12345678000000 through 12345678999999, or nearly a million assets.

I'm sure this approach is not the best way to do it, but at the least, a configurable "treeifier" seems like something worthwhile.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.2%) to 97.3% when pulling 389638a on jechols:feature/fix-buckets into 5eb5617 on projecthydra-labs:master.

@jechols
Copy link
Contributor Author

jechols commented Apr 9, 2015

Oops, my comment above is wrong - I used base-16 not 36.

@mjgiarlo
Copy link
Member

mjgiarlo commented Apr 9, 2015

Edited your comment to reflect that, @jechols

@mjgiarlo
Copy link
Member

mjgiarlo commented Apr 9, 2015

This is excellent. Thanks!

mjgiarlo added a commit that referenced this pull request Apr 9, 2015
@mjgiarlo mjgiarlo merged commit 12ae236 into samvera:master Apr 9, 2015
@jechols jechols deleted the feature/fix-buckets branch April 10, 2015 18:10
mjgiarlo added a commit that referenced this pull request Aug 4, 2015
The treeifier was added in #3 as a workaround to a bug in the underlying 'noid' gem. The workaround spawned a discussion of how best to bucket assets that generated a number of possible solutions, none of which seemed satisfactory. The underlying noid bug is now resolved, and the treeifier functionality was released temporarily in AF::Noid 1.0.0 which was yanked, so no one's using the functionality. Therefore, be it resolved that the treeifier is now put to rest, may it rest in peace. (Thanks, @jechols, @HackMasterA, and others for helping me through this.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants