-
Notifications
You must be signed in to change notification settings - Fork 21.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add encryption to Active Record #41659
Conversation
b6c6cf4
to
3b916d7
Compare
activerecord/test/cases/encryption/performance/encryption_performance_test.rb
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this.
The general architecture of the feature looks correct but there are some major points that I think we need to fix before releasing this.
- The first is the coupling to Action Text that is now inside Active Record. I know the dependency is not needed but I don't think Active Record should have any code that know about Action Text inside it. We can inject Action Text specific behavior in the same way we did with
has_rich_text
methods. - The install of query extensions is loading
ActiveRecord::Base
too early in the boot process. That means that configs set insideconfig/initializers
will not be applied anymore. The fix is easy, we just need to move those extensions to insideon_load
blocks.
# @TODO It should not patch anything if not needed (no previous schemes or no support for previous encryption schemes) | ||
module ActiveRecord | ||
module Encryption | ||
module ExtendedDeterministicQueries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would make us comfortable to move the implementation here to the right places and not put it behind a flog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rafaelfranca I'd like to see other apps using the extended queries and see if it works well for them. The system has worked well for us, without issues, but I wouldn't be surprised if there are queries that it's not covering, for example. Said this, we can certainly invert the default: set the flag to true and let the apps disable it if they want. Maybe that's a better way to get reports during the beta phase.
Although not as part of this PR, I'd like to explore a more robust approach. Now that encryption is part of Active Record, there is no need to patch existing methods I think. I suspect it should be possible to come with a better and maybe simpler implementation of the same idea.
I'm excited to see this sort of thing become built into the framework. Having maintained attr_encrypted, and developing a few different implementations of crypto at Coupa, I have learned a few things that make life more difficult or more easy. I would like to offer something that we learned during our journey. I hope that you find it useful. If I've understood this implementation correctly based on https://github.com/basecamp/rails/blob/active-record-encryption/guides/source/active_record_encryption.md#encryption-contexts, the encryption context is not available at the record level, which is to say, it is not stored with the encrypted payload much like the DEK is. There is utility in storing the encryption context with the payload. Namely, assume 10-100s of millions of records are encrypted with a certain encryption context. Then a regulation change or your chosen algorithm is broken and requires some new encryption context. Having the context with the payload means you can simply change to the new default so new records encrypt with the new context and existing records will still be able decrypt with the old context. If a re-encryption operation is necessary, it can be done in the background while the app is live, rather than requiring the app to go offline to re-encrypt with the new context, or having some logic that must make assumptions about the state of the data to facilitate live re-encryption. Having the encryption context live with the encrypted payload makes decryption and re-encryption operations painless. |
We support this scenario. Actually, I have had to do this several times in HEY already. It works like this: Imagine you have a class with an encrypted attribute: class Person
encrypts :email, deterministic: true
end Now, imagine that you need to encrypt it with AES-512 (even if it's a bit futuristic) and you have millions of records encrypted with AES-256. If you made the change like this: class Person
encrypts :email, deterministic: true, cipher: Aes512.new
end You would have a problem as you say. Existing encrypted data would fail to decrypt. But you can do this: class Person
encrypts :email, deterministic: true, cipher: Aes512.new, previous: [ { cipher: Aes256.new } ]
end Now, the system:
So just running this will re-encrypt the record with the new scheme: person.encrypt As a bonus, if you tried to search a Person.find_by_email("jorge@hey.com") It would execute this query under the hood: Person.find_by(email: [ "<email encrypted with AES512>", "<email encrypted with AES256>" ]) This means that queries keep working but data encrypted with both encryption schemes coexist. As mentioned, we've had to exercise this more than once during HEY lifetime. Now, the last bit that is missing is a mechanism to declare these previous encryptions schemes in a global way. So far in HEY I've done some monkey patching like this to add that module PreviousEncryptionScheme
module EncryptableRecord
def encrypts(*names, key_provider: nil, key: nil, deterministic: false, downcase: false, ignore_case: false, previous: [], **context_properties)
previous = Array.wrap(previous)
previous << ({ key_provider: key_provider, key: key, deterministic: deterministic, downcase: downcase, ignore_case: ignore_case,
encryptor: OldEncryptor.new, message_serializer: OldMessageSerializer.new, cipher: OldCipher.new
})
super
end
end
... But the idea I've always had in mind was adding something like this option instead: active_record.encryption.previous = [ { ... } ] # global previous encryption contexts I was actually close to implement this as part of this PR, it's an easy one, but I kept postponing it and decided to leave for a later PR 😅. I appreciate the feedback from someone with experience maintaining a similar library. Please keep it coming if you have more thoughts. |
2ae0cb3
to
351c876
Compare
I suppose that's a good compromise. We operated in a fashion where we wouldn't re-encrypt unless necessary (keys were compromised), so if the encryption context (and in our scenario that included DEK ID) changed, we'd just leave things in place. Consequently, across millions of records you could have several encryption contexts, but because it lives with the encrypted payload, everything just works. Your implementation suggests that you can only ever have two encryption contexts. Of course, your implementation is a little more storage efficient, whereas my suggestion requires more storage space but provides more flexibility in changing the encryption context. |
You can have as many contexts as you want, it's a list of previous contexts. Now, it's true that it's not meant to have a long list of previous contexts, as that might impact decryption performance (it might have to try them all to find the right one). In that case the design you suggests would be much better. As you say, it's all about tradeoffs. Thanks for sharing your experience here. |
6b9e001
to
25468db
Compare
25468db
to
4a9a8d2
Compare
This adds new rake tasks for these tests. This way, it prevents these tests from making the build fail. rails#41659 (comment)
4a9a8d2
to
abfc96b
Compare
Looks great. Just one thought: using deterministic encryption may be unnecessary for the ignore_case mechanism. I've seen implementations achieve similar outcome (particularly for email addresses) by recording the SHA-256 HMAC over a derived subkey & the case-folded attribute value via an ancillary column (e.g. named |
f13812a
to
502ffde
Compare
@inopinatus That could certainly be possible, but it would require to introduce this special case-handling in several spots, including the extended search mechanism we have in place. I think I prefer the current approach where |
62249cb
to
2f6f100
Compare
@jorgemanrubia Great addition ❤️ . I think the changelog entry is missing in the PR 😊 |
Indeed! Thanks @abhaynikam, adding it here #41821 |
Thanks for building this into the Rails stack! |
@jorgemanrubia this looks like an amazing feature! I'm excited to look at using it. I noticed that the guide and test uses the term "master key" in multiple places, most notably the guide and the tests. If I'm understanding the code correctly, I think this is actually referring to a "primary key". I assume this was changed before submitting to Rails from your internal implementation so it's understandable that some things can fall through the crack in the docs. Am I accurate in understanding that "primary key" is actually what is being referred to in places where you use "master key"? |
It is indeed the same as primary key
|
This is amazing, thank you to everyone involved! |
Hi, Just wondering if the encryption relies activerecord callbacks only. What happens if you would upsert (https://edgeapi.rubyonrails.org/classes/ActiveRecord/Persistence/ClassMethods.html#method-i-upsert) data? Since upsert does not instantiates a record i guess encryption will not be possible? Just want a clarification, not asking to actually do this :) |
Ref: rails#41659 Mutating the attributes hash requires to workaround the mutation in `find_or_create_by` etc. One extra Hash dup is not big deal.
This adds encrypted attributes to Active Record models. This is an extraction from HEY.
You can check the guide included with this PR to learn more about functionality and features.
Basic usage
Encrypted attributes are declared at the model level. These are regular active record attributes backed by a column with the same name. The library will transparently encrypt these attributes before saving them into the database and will decrypt them when retrieving their values.
By default, it will encrypt all the data using AES-GCM with a 256-bits key and a non-deterministic approach: encrypting the same text twice will result in different ciphertexts.
You can configure it to use a deterministic encryption approach instead. In this case, the initialization vector will be derived from the text to encrypt instead of being random, which enables queries against encrypted columns:
The library includes many additional features.
Please check the guide in this PR to learn more about setup and usage.
Implementation overview
EncryptableRecord
is the concern that makesActiveRecord::Base
records encryptable. It defines.encrypts
and the record-level API.When declaring an attribute as encrypted, internally, it uses a custom active record attribute type (
EncryptedAttributeType
). This performs the encryption and decryption when serializing and deserializing attribute values. It interacts with the encryption system to do this. The main components of the encryption system are:Encryptor
: the internal API for encrypting and decrypting data. It interacts with aKeyProvider
to build encrypted messages and deal with their serialization. The encryption/decryption itself is done by theCipher
and the serialization byMessageSerializer
.Cipher
the encryption algorithm (Aes 256 GCM)KeyProvider
serves encryption and decryption keys.MessageSerializer
: in charge of serializing and deserializing encrypted payloads (Message
).While most users won't need to do this, these components can be customized via config settings like:
They can also be overridden just for a given block or on a per-attribute basis:
Queries with deterministic attributes
When querying data encrypted deterministically, a tricky problem arises when you combine encrypted and unencrypted data. Or data encrypted with different schemes (e.g: because you decide to change encryption properties for a given attribute).
To solve this, we extend Active Record to modify queries automatically. So, for example, this query:
Becomes this under the hood when you enable support for unencrypted data:
The implementation of this system is in
ActiveRecord::Encryption::ExtendedDeterministicQueries
. This system has worked flawlessly for HEY since we launched, but the implementation feels a bit brittle and experimental. Because of that, I’ve put it behind a config flagencryption.extend_queries
that is false by default.I’d be happy to get advice (and help!) to improve this implementation.
Message structure
Each encrypted value is a
Message
that stores:Properties
)Example of headers:
Serialization
Messages are serialized with a JSON-based serializer (
MessageSerializer
). The serializer is configurable viaactive_record.encryption.message_serializer
.It's important to use safe serializers that can't deserialize arbitrary objects. A common supported scenario is encrypting existing unencrypted data. An attacker can leverage this to enter a tampered payload before encryption takes place and perform RCE attacks. This means custom serializers should avoid
Marshal
,YAML.load
(useYAML.safe_load
instead) orJSON.load
(useJSON.parse
instead).Performance
Time
OpenSSL AES encryption/decryption is fast, with submillisecond times for typical payloads (<700kb or less) or just a few milliseconds for multi-MB payloads.
As a result, and in our experience, the impact of adding this encryption system to an app is negligible in terms of performance. Even fast database queries are slower by one order of magnitude or more, so the overhead of encryption gets diluted when combined (not to mention if you add view-rendering to the mix).
This PR include some automated performance tests where a baseline without encryption gets compared with similar code using encryption for different key providers. You will see how the impact on tests is around 30% when encrypting data, but this is because there is no database latency in tests, so the relative impact is magnified. The same test shows identical performance when running in a more realistic environment. These tests are useful for optimization work and to prevent performance regressions.
Storage
Encryption adds storage overhead because:
The worst-case overload we have measured is around 250 bytes when the built-in envelope encryption key provider is used. For medium and large text columns this overload is negligible, but for
string
columns of 255 bytes, you should increase their limit accordingly (510 is recommended).To help to mitigate the encryption overload, the library uses compression automatically when the payload exceeds a certain threshold (140 bytes). Compression is done before encrypting the content (compressing after wouldn't compress much due to randomness). We have observed an actual reduction in space for larger text columns.
Using compression opens a vulnerability where an attacker can enter highly compressible payloads that can cause trouble when uncompressed. To prevent this, the library adds a length validation to the encrypted column based on the database column limit. This can be disabled with the option
active_record.encryption.config.validate_column_size
.There is a test
StoragePerformanceTest
measuring the storage overload in different scenarios. Notice that compression is not really showing much difference there due to using randomly-generated payloads. The test is meant to catch regressions and to help with storage optimization work.It would be easy to make compression a config option, as a future improvement. Without compression you can expect a consistent 33% overhead due to the Base64 encoding.
Security audit
A security firm reviewed this library before launching HEY. Thanks to it, we could fix an important flaw regarding our deterministic encryption approach in the initial version.
Credits
attr_encrypted
,lockbox
andvault-rails
.Pending