Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add encryption to Active Record #41659

Merged
merged 67 commits into from Apr 1, 2021
Merged

Conversation

@jorgemanrubia
Copy link
Contributor

@jorgemanrubia jorgemanrubia commented Mar 11, 2021

This adds encrypted attributes to Active Record models. This is an extraction from HEY.

You can check the guide included with this PR to learn more about functionality and features.

Basic usage

Encrypted attributes are declared at the model level. These are regular active record attributes backed by a column with the same name. The library will transparently encrypt these attributes before saving them into the database and will decrypt them when retrieving their values.

class Person < ApplicationRecord
  encrypts :name
end

By default, it will encrypt all the data using AES-GCM with a 256-bits key and a non-deterministic approach: encrypting the same text twice will result in different ciphertexts.

You can configure it to use a deterministic encryption approach instead. In this case, the initialization vector will be derived from the text to encrypt instead of being random, which enables queries against encrypted columns:

class Person < ApplicationRecord
  encrypts :name
  encrypts :email_address, deterministic: true
end

# Person.find_by(name: "jorge") # doesn't work
# Person.find_by(email_address: "jorge@basecamp.com") # works

The library includes many additional features.

  • Encrypt action text attributes
  • Support ignoring case
  • Work with encrypted and unencrypted data simultaneously.
  • Configurable encryption schemes
  • Support old encryption schemes while migrating existing encrypted data to a new scheme
  • Compression
  • Fixtures support
  • Filter parameters based on encrypted attributes (to remove sensitive params from logs).

Please check the guide in this PR to learn more about setup and usage.

Implementation overview

EncryptableRecord is the concern that makes ActiveRecord::Base records encryptable. It defines .encrypts and the record-level API.

When declaring an attribute as encrypted, internally, it uses a custom active record attribute type (EncryptedAttributeType). This performs the encryption and decryption when serializing and deserializing attribute values. It interacts with the encryption system to do this. The main components of the encryption system are:

  • Encryptor: the internal API for encrypting and decrypting data. It interacts with a KeyProvider to build encrypted messages and deal with their serialization. The encryption/decryption itself is done by the Cipher and the serialization by MessageSerializer.
  • Cipher the encryption algorithm (Aes 256 GCM)
  • KeyProvider serves encryption and decryption keys.
  • MessageSerializer: in charge of serializing and deserializing encrypted payloads (Message).

While most users won't need to do this, these components can be customized via config settings like:

config.active_record.encryption.encryptor = MyEncryptor.new

They can also be overridden just for a given block or on a per-attribute basis:

ActiveRecord::Encryption.with_encryption_context(encryptor: MyEncryptor.new) do
  ...
end

class Article
  encrypts :title, encryptor: MyEncryptor.new
end

Queries with deterministic attributes

When querying data encrypted deterministically, a tricky problem arises when you combine encrypted and unencrypted data. Or data encrypted with different schemes (e.g: because you decide to change encryption properties for a given attribute).

To solve this, we extend Active Record to modify queries automatically. So, for example, this query:

Contact.find_by(email_address: "jorge@hey.com")

Becomes this under the hood when you enable support for unencrypted data:

Contact.find_by(email_address: [ "jorge@hey.com", "<encrypted jorge@hey.com>" ])

The implementation of this system is in ActiveRecord::Encryption::ExtendedDeterministicQueries. This system has worked flawlessly for HEY since we launched, but the implementation feels a bit brittle and experimental. Because of that, I’ve put it behind a config flag encryption.extend_queries that is false by default.

I’d be happy to get advice (and help!) to improve this implementation.

Message structure

Each encrypted value is a Message that stores:

  • The encrypted payload
  • A public (unencrypted) list of headers (Properties)

Example of headers:

  • Encryption-related information such as initialization vectors.
  • Specific encryption system meta information, such as if the payload is compressed or not.
  • Key-related information such as encrypted keys (for envelope encryption) or key references.

Serialization

Messages are serialized with a JSON-based serializer (MessageSerializer). The serializer is configurable via active_record.encryption.message_serializer.

It's important to use safe serializers that can't deserialize arbitrary objects. A common supported scenario is encrypting existing unencrypted data. An attacker can leverage this to enter a tampered payload before encryption takes place and perform RCE attacks. This means custom serializers should avoid Marshal, YAML.load (use YAML.safe_load instead) or JSON.load (use JSON.parse instead).

Performance

Time

OpenSSL AES encryption/decryption is fast, with submillisecond times for typical payloads (<700kb or less) or just a few milliseconds for multi-MB payloads.

As a result, and in our experience, the impact of adding this encryption system to an app is negligible in terms of performance. Even fast database queries are slower by one order of magnitude or more, so the overhead of encryption gets diluted when combined (not to mention if you add view-rendering to the mix).

This PR include some automated performance tests where a baseline without encryption gets compared with similar code using encryption for different key providers. You will see how the impact on tests is around 30% when encrypting data, but this is because there is no database latency in tests, so the relative impact is magnified. The same test shows identical performance when running in a more realistic environment. These tests are useful for optimization work and to prevent performance regressions.

Storage

Encryption adds storage overhead because:

  • Values are encoded in Base64. This enables storing and transmitting binary data from ciphertexts and encryption metadata without encoding issues.
  • It stores encryption metadata along with the encrypted payload. For example, initialization vectors, auth tags (AES integrity), other keys when using envelope encryption, etc.

The worst-case overload we have measured is around 250 bytes when the built-in envelope encryption key provider is used. For medium and large text columns this overload is negligible, but for string columns of 255 bytes, you should increase their limit accordingly (510 is recommended).

To help to mitigate the encryption overload, the library uses compression automatically when the payload exceeds a certain threshold (140 bytes). Compression is done before encrypting the content (compressing after wouldn't compress much due to randomness). We have observed an actual reduction in space for larger text columns.

Using compression opens a vulnerability where an attacker can enter highly compressible payloads that can cause trouble when uncompressed. To prevent this, the library adds a length validation to the encrypted column based on the database column limit. This can be disabled with the option active_record.encryption.config.validate_column_size.

There is a test StoragePerformanceTest measuring the storage overload in different scenarios. Notice that compression is not really showing much difference there due to using randomly-generated payloads. The test is meant to catch regressions and to help with storage optimization work.

It would be easy to make compression a config option, as a future improvement. Without compression you can expect a consistent 33% overhead due to the Base64 encoding.

Security audit

A security firm reviewed this library before launching HEY. Thanks to it, we could fix an important flaw regarding our deterministic encryption approach in the initial version.

Credits

  • Many folks at Basecamp helped to shape this with their feedback. Special mention to @jeremy who contributed some key design decisions while we were trying to figure this out, and to @rosa and @georgeclaghorn for their help reviewing things and discussing approaches.
  • I checked many similar libraries while working on this one, and I learned good things from all of them. I want to make a special mention to @reidmorrison and symmetric-encryption for showing how encryption was a great scenario for using a custom attribute. Other gems I checked and was inspired by include attr_encrypted, lockbox and vault-rails.

Pending

  • Add configuration options to guide
  • Make the build happy
  • Move performance tests out of the main build (to a rake task)
  • Address feedback by Rafael
  • Test upstreamed version in HEY after all the changes
@jorgemanrubia jorgemanrubia force-pushed the basecamp:active-record-encryption branch from b6c6cf4 to 3b916d7 Mar 11, 2021
Copy link
Member

@rafaelfranca rafaelfranca left a comment

Thank you for working on this.

The general architecture of the feature looks correct but there are some major points that I think we need to fix before releasing this.

  • The first is the coupling to Action Text that is now inside Active Record. I know the dependency is not needed but I don't think Active Record should have any code that know about Action Text inside it. We can inject Action Text specific behavior in the same way we did with has_rich_text methods.
  • The install of query extensions is loading ActiveRecord::Base too early in the boot process. That means that configs set inside config/initializers will not be applied anymore. The fix is easy, we just need to move those extensions to inside on_load blocks.
actiontext/lib/action_text/attribute.rb Show resolved Hide resolved
activerecord/lib/active_record/encryption.rb Outdated Show resolved Hide resolved
activerecord/lib/active_record/encryption.rb Outdated Show resolved Hide resolved
activerecord/lib/active_record/encryption.rb Outdated Show resolved Hide resolved
activerecord/lib/active_record/encryption/cipher.rb Outdated Show resolved Hide resolved
# @TODO It should not patch anything if not needed (no previous schemes or no support for previous encryption schemes)
module ActiveRecord
module Encryption
module ExtendedDeterministicQueries

This comment has been minimized.

@rafaelfranca

rafaelfranca Mar 11, 2021
Member

What would make us comfortable to move the implementation here to the right places and not put it behind a flog?

This comment has been minimized.

@jorgemanrubia

jorgemanrubia Mar 12, 2021
Author Contributor

@rafaelfranca I'd like to see other apps using the extended queries and see if it works well for them. The system has worked well for us, without issues, but I wouldn't be surprised if there are queries that it's not covering, for example. Said this, we can certainly invert the default: set the flag to true and let the apps disable it if they want. Maybe that's a better way to get reports during the beta phase.

Although not as part of this PR, I'd like to explore a more robust approach. Now that encryption is part of Active Record, there is no need to patch existing methods I think. I suspect it should be possible to come with a better and maybe simpler implementation of the same idea.

activerecord/lib/active_record/railtie.rb Outdated Show resolved Hide resolved
activerecord/lib/active_record/railtie.rb Outdated Show resolved Hide resolved
activerecord/lib/active_record/railtie.rb Outdated Show resolved Hide resolved
@saghaulor-coupa
Copy link

@saghaulor-coupa saghaulor-coupa commented Mar 11, 2021

I'm excited to see this sort of thing become built into the framework. Having maintained attr_encrypted, and developing a few different implementations of crypto at Coupa, I have learned a few things that make life more difficult or more easy. I would like to offer something that we learned during our journey. I hope that you find it useful.

If I've understood this implementation correctly based on https://github.com/basecamp/rails/blob/active-record-encryption/guides/source/active_record_encryption.md#encryption-contexts, the encryption context is not available at the record level, which is to say, it is not stored with the encrypted payload much like the DEK is. There is utility in storing the encryption context with the payload. Namely, assume 10-100s of millions of records are encrypted with a certain encryption context. Then a regulation change or your chosen algorithm is broken and requires some new encryption context. Having the context with the payload means you can simply change to the new default so new records encrypt with the new context and existing records will still be able decrypt with the old context. If a re-encryption operation is necessary, it can be done in the background while the app is live, rather than requiring the app to go offline to re-encrypt with the new context, or having some logic that must make assumptions about the state of the data to facilitate live re-encryption. Having the encryption context live with the encrypted payload makes decryption and re-encryption operations painless.

@jorgemanrubia
Copy link
Contributor Author

@jorgemanrubia jorgemanrubia commented Mar 11, 2021

@sagarjunnarkar

Having the context with the payload means you can simply change to the new default so new records encrypt with the new context and existing records will still be able decrypt with the old context

We support this scenario. Actually, I have had to do this several times in HEY already. It works like this:

Imagine you have a class with an encrypted attribute:

class Person
  encrypts :email, deterministic: true
end

Now, imagine that you need to encrypt it with AES-512 (even if it's a bit futuristic) and you have millions of records encrypted with AES-256. If you made the change like this:

class Person
  encrypts :email, deterministic: true, cipher: Aes512.new 
end

You would have a problem as you say. Existing encrypted data would fail to decrypt. But you can do this:

class Person
  encrypts :email, deterministic: true, cipher: Aes512.new, previous: [ { cipher: Aes256.new } ]
end

Now, the system:

  • Will use Aes512 for encrypting
  • Will try to use Aes512 and Aes256 when decrypting, whatever works

So just running this will re-encrypt the record with the new scheme:

person.encrypt 

As a bonus, if you tried to search a Person:

Person.find_by_email("jorge@hey.com")

It would execute this query under the hood:

Person.find_by(email: [ "<email encrypted with AES512>", "<email encrypted with AES256>" ])

This means that queries keep working but data encrypted with both encryption schemes coexist.

As mentioned, we've had to exercise this more than once during HEY lifetime. Now, the last bit that is missing is a mechanism to declare these previous encryptions schemes in a global way. So far in HEY I've done some monkey patching like this to add that :previous option globally:

module PreviousEncryptionScheme
  module EncryptableRecord
    def encrypts(*names, key_provider: nil, key: nil, deterministic: false, downcase: false, ignore_case: false, previous: [], **context_properties)
      previous = Array.wrap(previous)
      previous << ({ key_provider: key_provider, key: key, deterministic: deterministic, downcase: downcase, ignore_case: ignore_case,
        encryptor: OldEncryptor.new, message_serializer: OldMessageSerializer.new, cipher: OldCipher.new
      })
      super
    end
  end
  ...

But the idea I've always had in mind was adding something like this option instead:

active_record.encryption.previous = [ { ... } ] # global previous encryption contexts

I was actually close to implement this as part of this PR, it's an easy one, but I kept postponing it and decided to leave for a later PR 😅.

I appreciate the feedback from someone with experience maintaining a similar library. Please keep it coming if you have more thoughts.

@jorgemanrubia jorgemanrubia force-pushed the basecamp:active-record-encryption branch from 2ae0cb3 to 351c876 Mar 11, 2021
@saghaulor-coupa
Copy link

@saghaulor-coupa saghaulor-coupa commented Mar 11, 2021

I suppose that's a good compromise. We operated in a fashion where we wouldn't re-encrypt unless necessary (keys were compromised), so if the encryption context (and in our scenario that included DEK ID) changed, we'd just leave things in place. Consequently, across millions of records you could have several encryption contexts, but because it lives with the encrypted payload, everything just works. Your implementation suggests that you can only ever have two encryption contexts. Of course, your implementation is a little more storage efficient, whereas my suggestion requires more storage space but provides more flexibility in changing the encryption context.

@jorgemanrubia
Copy link
Contributor Author

@jorgemanrubia jorgemanrubia commented Mar 11, 2021

@saghaulor-coupa

Your implementation suggests that you can only ever have two encryption contexts

You can have as many contexts as you want, it's a list of previous contexts. Now, it's true that it's not meant to have a long list of previous contexts, as that might impact decryption performance (it might have to try them all to find the right one). In that case the design you suggests would be much better.

As you say, it's all about tradeoffs. Thanks for sharing your experience here.

jorgemanrubia added a commit to basecamp/rails that referenced this pull request Mar 11, 2021
rails#41659 (comment)

This adds new rake tasks for these tests. This way, it prevents these
tests from making the build fail.
@jorgemanrubia jorgemanrubia force-pushed the basecamp:active-record-encryption branch from 6b9e001 to 25468db Mar 11, 2021
jorgemanrubia added a commit to basecamp/rails that referenced this pull request Mar 11, 2021
This adds new rake tasks for these tests. This way, it prevents these
tests from making the build fail.

rails#41659 (comment)
@jorgemanrubia jorgemanrubia force-pushed the basecamp:active-record-encryption branch from 25468db to 4a9a8d2 Mar 11, 2021
jorgemanrubia added a commit to basecamp/rails that referenced this pull request Mar 11, 2021
This adds new rake tasks for these tests. This way, it prevents these
tests from making the build fail.

rails#41659 (comment)
@jorgemanrubia jorgemanrubia force-pushed the basecamp:active-record-encryption branch from 4a9a8d2 to abfc96b Mar 11, 2021
@inopinatus
Copy link
Contributor

@inopinatus inopinatus commented Mar 12, 2021

Looks great. Just one thought: using deterministic encryption may be unnecessary for the ignore_case mechanism. I've seen implementations achieve similar outcome (particularly for email addresses) by recording the SHA-256 HMAC over a derived subkey & the case-folded attribute value via an ancillary column (e.g. named digest_<name>), that's then used only for indexing and lookup. Was this method considered in the cryptographic assessment?

@jorgemanrubia jorgemanrubia force-pushed the basecamp:active-record-encryption branch 6 times, most recently from f13812a to 502ffde Mar 12, 2021
@jorgemanrubia
Copy link
Contributor Author

@jorgemanrubia jorgemanrubia commented Mar 12, 2021

@inopinatus That could certainly be possible, but it would require to introduce this special case-handling in several spots, including the extended search mechanism we have in place. I think I prefer the current approach where ignore_case: leverages on the existing deterministic: implementation.

@jorgemanrubia jorgemanrubia force-pushed the basecamp:active-record-encryption branch from 62249cb to 2f6f100 Mar 12, 2021
jorgemanrubia and others added 17 commits Mar 12, 2021
I don't think this belong to the library yet. It's more like an util class we used
for building some mass-encryption tasks in HEY.
Move the Action Text stuff to its own concern in `actiontext`.
The .encrypts declaration in `ActionText::Encryption` was making this test
fail. It was failing to load a MySQL error that was being raised when trying
to determine the column limit.
Co-authored-by: Connor Shea <connor.james.shea@gmail.com>
This adds a new class `Scheme` that encapsulates the encryption
properties for a given attribute. This adds a proper code representation
for a domain concept we keep referring to: encryption schemes.

It's in charge of processing the user-passed config properties when
declaring the attribute, as well as of validating those.

This removes the concern of processing properties from
`EncryptableRecord` and `EncryptableAttributeType`. It's also nice to
have a place to group attribute encryption options, versus passing
lists of hashes around.

This is in preparation to upcoming changes that will add new config
options and support for previous encryption schemes.
This implements several changes to encourage deterministic encryption to
remain unchanged. The main motivation is letting you define unique
indexes on deterministically-encrypted columns:

- By default, deterministic encryption will always use the oldest
encryption scheme to encrypt new data, when there are many.
- You can skip this default behavior and make it always use the current
encryption scheme with:

```ruby
deterministic: { fixed: false } # using this should be a rare need
```

- Deterministic encryption still supports previous encryption schemes
normally. So they will be used to add additional values to queries, for
example.
- You can't rotate deterministic encryption keys anymore. We can add
support for that in the future.

This makes for reasonable defaults:

- People using "deterministic: true" will get unique indexes working out
of the box.
- The system will encourage keeping deterministic encryption stable:
  - By always using oldest encryption schemes
  - By forbidding configuring multiple keys

But you can still opt-out of the default if you need to.
@jorgemanrubia jorgemanrubia force-pushed the basecamp:active-record-encryption branch from cfe24f6 to b569c86 Apr 1, 2021
This enables to disable deterministic encryption by just not setting
deterministic_key.
@jorgemanrubia jorgemanrubia force-pushed the basecamp:active-record-encryption branch from b569c86 to e24fb55 Apr 1, 2021
This is adding yet another patch to make them work. This system needs to
be reworked as it's currently very brittle.
@jeremy jeremy merged commit 07c3eee into rails:main Apr 1, 2021
2 checks passed
2 checks passed
build
Details
buildkite/rails Build #76195 passed (15 minutes, 51 seconds)
Details
@jorgemanrubia jorgemanrubia deleted the basecamp:active-record-encryption branch Apr 1, 2021
@abhaynikam
Copy link
Contributor

@abhaynikam abhaynikam commented Apr 2, 2021

@jorgemanrubia Great addition ❤️ . I think the changelog entry is missing in the PR 😊

@jorgemanrubia
Copy link
Contributor Author

@jorgemanrubia jorgemanrubia commented Apr 2, 2021

Indeed! Thanks @abhaynikam, adding it here #41821

@geetfun
Copy link

@geetfun geetfun commented Apr 2, 2021

Thanks for building this into the Rails stack!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet