Make ActiveStorage::Blob keys lowercase #34818

julik · 2018-12-28T15:36:34Z

Summary

When used with case-insensitive filesystems ActiveStorage blobs might cause non-portable Disk storage directories to be created, and the UNIQUE index that gets configured on the active_storage_blobs table is database-dependent in terms of case-sensitivity. If the database index is case-sensitive, but the underlying filesystem is not, blobs might be overwritten and the available key space for uploaded blobs shrinks drastically. This does happen on macOS filesystems already.

This patch changes the ActiveStorage key generation from mixed-case base-58 to lowercase base-36, and increases the key size from 24 bytes to 28. When using base58 there are 58**24 possible key values, and to have the same or more possible keys with base36 we need to have 36**28, increasing the storage requirement for the key by 4 bytes. Since the database migrations for the blobs table are not CHAR()-size-restricted (and the fix worked for me both on SQLite and MySQL) I believe this is a reasonable compromise.

Note that this will not alter the storage keys that have already been generated, their treatment wills tay unchanged. Neither will it correct the storage directory structures that were created using mixed-case keys on macOS since you need to "rekey" the stored objects if you want the storage directory to be portable to a case-sensitive filesystem.

Closes #34804 , addresses the underlying issue in #33864 for freshly-generated uploaded blobs. Unfortunately it cannot be excluded that for some people some blobs already did get overwritten if SQLite was used in combination with a case-insensitive filesystem and the Disk storage service. Neither does this change existing blob keys.

This change does have an implication that for existing S3 buckets and for other services that potentially use "naive partitioning" (first N bytes of the given object store key) partition usage is going to become less "fair". However, AWS recently dropped the recommendation on manual key distribution for the key prefix (https://www.infoq.com/news/2018/10/amazon-s3-performance-increase) as it seems they have adopted hashing for the keys the user provides, so this would not be a problem for buckets that get created.

The advantages of using a lowercase-only key:

Less potential issues with collation-sensitive database indices
Removes issues with potentially case-insensitive blob stores
Makes it easier to download blobs from a remote store to a case-insensitive filesystem for debugging

activestorage/test/models/blob_test.rb

georgeclaghorn · 2018-12-30T04:24:02Z

This also merits a changelog entry. Can you please add one?

sikachu

Sorry, but I think base58.downcase is not really the right way to fix this.

The reason is that by generating base58 string, we could ended up with Aaaaa... and aaaaa... which when downcase will cause a key collision.

If we want to make it case insensitive, that means we have to throw away all lowercase or all uppercase characters altogether, which means we need to switch to something else. I was looking around and seems like maybe we can implement Base32 then downcase, or z-base-32 instead.

I'll let @georgeclaghorn decide what we should do next, but base58.downcase is definitely not what we're looking for for sure.

julik · 2018-12-30T11:37:39Z

@sikachu @georgeclaghorn thank you for your comments. base58(k).downcase is a cheap solution indeed, because otherwise I would have to add a base36 implementation to SecureRandom which would increase the API surface there (while not reused anywhere else). I can copy-paste the implementation for base58 with base36 into the Blob class proper if that will be the preference.

@sikachu as it stands now ActiveStorage::Blob keys are not immune to collisions and this is actually a severe issue, see #34805 - but I do get your point that we should probably use the entropy source better.

sikachu · 2018-12-30T12:01:55Z

I think it's fine to add base32 just for Active Storage. base58 was added only for Active Record's secure token as well. I think making SecureRandom generate base32 will still be safer than base58 and downcase.

However, I'm not really sure that base32 or base36 should be used here. Base58 actually exclude some characters that are ambiguous, same goes with base32. We may be able to use base32 and conform to RFC 4648, then just make sure that the length we choose is multiplication of 8 so we don't have to add padding.

I actually found this implementation of base32 here which is under MIT license, so I think we may be able to use as a reference: https://github.com/stesla/base32/blob/a65a6cded0b37a61d10c5a1bef400464eb353cb1/lib/base32.rb#L56-L62

instead of downcasing the Base58 variant

julik · 2018-12-30T12:12:52Z

Why would you want the key to be base64 and/or base32 conformant? It does not give you extra guarantees of uniqueness or the like, but it does force you to do bookkeeping to have N-complement key lengths (28 is 4-complement) and creates those =s. Base58 does omit chars that look similar ("lowercase-L versus 1" etc) but I don't know if this has value when you are dealing with a "bag of keys" situation at all.

sikachu · 2018-12-30T12:38:09Z

Ah, I agree with you that this is a bag of key situation and hence it does not really matter. I just thought that base32 is closest to what we want to archive (single-case alphabets and numeric) and is one of the standard so adding this to the core would have a high probability of it being reused, despite this is its first usage.

I think I'm going to have to ask @georgeclaghorn if "omitting the similar looking characters" is one of the feature we want or is it just that base58 is already available so we just use it. If it doesn't matter, then maybe base36 will be fine, otherwise I think base32 will be a great addition to the core as well.

keys

georgeclaghorn · 2018-12-30T15:04:07Z

Base36 sounds fine to me, but the implementation as it stands now needs to move to the SecureRandom extension, where it can live alongside SecureRandom.base58.

activesupport/lib/active_support/core_ext/securerandom.rb

activesupport/test/core_ext/secure_random_test.rb

activesupport/lib/active_support/core_ext/securerandom.rb

activestorage/app/models/active_storage/blob.rb

Braces are to avoid warnings when running tests as we start the arguments with a /

julik · 2018-12-30T16:09:55Z

🎉 please squash-merge since there are quite a few interim commits

georgeclaghorn · 2018-12-30T16:56:35Z

Thank you! ❤️

matthewd · 2018-12-30T18:29:12Z

activesupport/lib/active_support/core_ext/securerandom.rb

+  #   p SecureRandom.base36(24) # => "77tmhrhjfvfdwodq8w7ev2m7"
+  def self.base36(n = 16)
+    SecureRandom.random_bytes(n).unpack("C*").map do |byte|
+      idx = byte % 64


This has a nearly 50% chance of being unable to use the random byte and having to fall back to random number... it'd presumably be worth benchmarking before spending any real effort on it, but a quick shot at something that should be cheaper: (Someone want to check my maths?)

idx = byte / 7

Might be, I would actually rewrite both base58 and base36 to use a much more pedestrian

n.times.map { alphabet[SecureRandom.random_number(alphabet.length) }.join

More method calls but probably "less clever" is better in this case. I didn't touch the implementation because I was thinking "this byte trickery for obtaining bytes from a potentially-slow SecureRandom is here for a reason"

Just didn't want to touch it since this is not what the PR is about and this would need microbenchmarks etc.

I was curious enough to follow through, so:

Benchmark: https://gist.github.com/matthewd/6827623f4e0308943e8c82db490cc118

We do generally avoid programming to microbenchmarks in high level library code, but I think the other side of that policy is being more accepting of minor ugliness when we're building lower level primitives.

(As a point of comparison, it'd be entirely silly to switch to base32 just because we can generate them twice as fast. This is purely about "there's a measurably-faster way to achieve the same thing for a similar amount of code".)

Noted, I'll open a patch for this once I'm done with #34827

Make ActiveStorage::Blob keys lowercase rails/rails#34818

Make ActiveStorage::Blob keys lowercase

eafd932

rails-bot bot added the activestorage label Dec 28, 2018

julik added 2 commits December 28, 2018 17:33

Alter test to compensate for key growth

fc52c21

...and off-by-1 errors

971cbd6

georgeclaghorn suggested changes Dec 30, 2018

View reviewed changes

activestorage/test/models/blob_test.rb Show resolved Hide resolved

georgeclaghorn self-assigned this Dec 30, 2018

sikachu requested changes Dec 30, 2018

View reviewed changes

julik added 3 commits December 30, 2018 13:04

Add changelog entry

dd3aeaf

Simplify test

5aa22f1

Use SecureRandom better when generating the key

799fc52

instead of downcasing the Base58 variant

julik added 2 commits December 30, 2018 13:13

Double quotes

73c0340

Update the #key method comment

ae68b6b

Keys keys

cd611b9

keys

Move base36 into AS core extension

e1d5b41

rails-bot bot added the activesupport label Dec 30, 2018

georgeclaghorn suggested changes Dec 30, 2018

View reviewed changes

georgeclaghorn reviewed Dec 30, 2018

View reviewed changes

activestorage/app/models/active_storage/blob.rb Outdated Show resolved Hide resolved

julik added 2 commits December 30, 2018 17:01

Remove unused constant

12e4ef0

All the edits

c8ed90b

Braces are to avoid warnings when running tests as we start the arguments with a /

georgeclaghorn approved these changes Dec 30, 2018

View reviewed changes

georgeclaghorn merged commit e5f4162 into rails:master Dec 30, 2018

georgeclaghorn mentioned this pull request Dec 30, 2018

In the mac system, the case of the file under the storage is inconsistent in the Linux system. #33864

Closed

matthewd reviewed Dec 30, 2018

View reviewed changes

julik deleted the activestorage-lowercase-keys branch December 31, 2018 01:36

suketa added a commit to suketa/rails_sandbox that referenced this pull request Jul 6, 2019

try052

c1270bc

Make ActiveStorage::Blob keys lowercase rails/rails#34818

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make ActiveStorage::Blob keys lowercase #34818

Make ActiveStorage::Blob keys lowercase #34818

julik commented Dec 28, 2018 •

edited

georgeclaghorn commented Dec 30, 2018

sikachu left a comment

julik commented Dec 30, 2018

sikachu commented Dec 30, 2018

julik commented Dec 30, 2018

sikachu commented Dec 30, 2018

georgeclaghorn commented Dec 30, 2018

julik commented Dec 30, 2018

georgeclaghorn commented Dec 30, 2018

matthewd Dec 30, 2018

julik Dec 30, 2018

julik Dec 30, 2018

matthewd Dec 30, 2018

julik Dec 30, 2018

Make ActiveStorage::Blob keys lowercase #34818

Make ActiveStorage::Blob keys lowercase #34818

Conversation

julik commented Dec 28, 2018 • edited

Summary

georgeclaghorn commented Dec 30, 2018

sikachu left a comment

Choose a reason for hiding this comment

julik commented Dec 30, 2018

sikachu commented Dec 30, 2018

julik commented Dec 30, 2018

sikachu commented Dec 30, 2018

georgeclaghorn commented Dec 30, 2018

julik commented Dec 30, 2018

georgeclaghorn commented Dec 30, 2018

matthewd Dec 30, 2018

Choose a reason for hiding this comment

julik Dec 30, 2018

Choose a reason for hiding this comment

julik Dec 30, 2018

Choose a reason for hiding this comment

matthewd Dec 30, 2018

Choose a reason for hiding this comment

julik Dec 30, 2018

Choose a reason for hiding this comment

julik commented Dec 28, 2018 •

edited