New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encryption: support support_unencrypted_data
at a per-attribute level
#49072
Encryption: support support_unencrypted_data
at a per-attribute level
#49072
Conversation
5665931
to
a7fa679
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mixed feelings about this one @ghiculescu.
The query extension system is meant to make deterministically encrypted attributes work in queries, which is a core behavior you want in place if you use encryption. I think a global configuration flag makes sense, in case you want to completely disable the system. But I don't see the fine-grained configuration option justified. The case of gradually enabling this on a per-column basis doesn't feel enough justification to me. What's the benefit compared to enable the flag globally?
You should extend queries if you know you have a mixture of encrypted and unencrypted data in a column you’re querying by. If you know all data in a column is encrypted then you don’t need extended queries for that column. But if you enable the configuration globally then you still have the extra overhead added to all your queries. This PR lets you enable the config globally and then selectively opt out of it on specific columns that don’t need it. For example if you backfill a column so that it’s fully encrypted then you can stop extended queries for just that column. (Maybe I made it confusing by talking about opting-in per column in the PR body. That doesn’t do anything different to how it works now.) |
Thanks, I understand better the scenario you have in mind with your last comment. Notice that the system is also used when you use different encryption schemes ( My main concern is: the new option adds a little bit of additional complexity, both for people seeing how to use the framework and internally. Does the overhead-saving gain justifies the new config option? I haven't measured, but it would be surprising to me. Also, think that situations where you manage encrypted/unencrypted data should be a transient one. But, even if it was permanent, would the gain be justified? |
I’m not sure if that question is rhetorical, but for the record, yeah I think the gain is worth it. As you say it’s hopefully a transient state, but some backfills can take a very long time or not happen at all for whatever reason. The slightly more complex API makes up for it making for a smoother process of gradually encrypting everything. It makes the transition to fully using encryption less scary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! Just some minor comments and questions.
activerecord/lib/active_record/encryption/encryptable_record.rb
Outdated
Show resolved
Hide resolved
activerecord/lib/active_record/encryption/encryptable_record.rb
Outdated
Show resolved
Hide resolved
activerecord/test/cases/encryption/extended_deterministic_queries_test.rb
Outdated
Show resolved
Hide resolved
activerecord/test/cases/encryption/extended_deterministic_queries_test.rb
Outdated
Show resolved
Hide resolved
activerecord/test/cases/encryption/extended_deterministic_queries_test.rb
Outdated
Show resolved
Hide resolved
@ghiculescu not rhetorical. Could you elaborate why you think it's worth it? By overhead you mean a performance overhead, right? What's the performance gain? |
Also interested in understanding how this option makes it less scary. Is it related to the referred overhead? Or to the psychological element in seeing the queries altered when using deterministic encryption + support for unencrypted data? |
@jorgemanrubia The backstory for this is that we have a big application that's over a decade old. I've been gradually adding encryption to parts of it over the last year. Other teams have also been adding encryption, both to existing features and new ones. The current state is we have a reasonable number of encrypted columns (over 30, and I still have more I want to do). About a quarter of these are deterministic. Some are fully encrypted, because encryption was enabled when the column was added. Some are fully encrypted, because all the data has been backfilled. And some are not fully encrypted. The eventual goal is to have every encrypted column be fully encrypted, and to turn This leaves me with a few issues:
With this PR, I can tick off both of these issues.
All this leads to my comment about scariness. Let's say one day we decide there's no more deterministic columns to encrypt, and we think we've backfilled everything. We deploy a change to production that turns ps. I think I should have led with this, my PR body was pretty basic. Sorry for the lack of context, and thanks for bearing with me. |
Thanks @ghiculescu, I understand context much better now 🙏. I agree that such support would help to gradually reach the situation where you can disable support for unencrypted data. I see the value in that. I'd suggest the following:
|
I’ll make these follow ups tomorrow 👍 |
a7fa679
to
e39e223
Compare
extend_queries
being set at a per-attribute levelsupport_unencrypted_data
at a per-attribute level
909bdeb
to
02554dc
Compare
718a35d
to
6cc63f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @ghiculescu 👏
activerecord/lib/active_record/encryption/encryptable_record.rb
Outdated
Show resolved
Hide resolved
activerecord/test/cases/encryption/extended_deterministic_queries_test.rb
Outdated
Show resolved
Hide resolved
activerecord/lib/active_record/encryption/extended_deterministic_queries.rb
Outdated
Show resolved
Hide resolved
6cc63f0
to
60ed44f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @ghiculescu, this is a nice addition 👏
activerecord/lib/active_record/encryption/encryptable_record.rb
Outdated
Show resolved
Hide resolved
@@ -23,8 +23,9 @@ module Encryption | |||
# some prepared statements caching. That's why we need to intercept +ActiveRecord::Base+ as soon | |||
# as it's invoked (so that the proper prepared statement is cached). | |||
# | |||
# When modifying this file run performance tests in +test/performance/extended_deterministic_queries_performance_test.rb+ to | |||
# make sure performance overhead is acceptable. | |||
# When modifying this file run performance tests in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this in the public documentation of this module. And why we have @todo on it? Let's remove those from the public documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this module should be public API at all. Could we make the whole thing :nodoc:
?
I will move the TODOs and internal notes outside of the public docs for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorgemanrubia do you have any thoughts on the public/private state of the docs here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, those notes are meant to be internal comments, not part of public API. I think your change here was fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But should this module be public at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think it should
ad02964
to
b1acf2a
Compare
c0cf5ad
to
3c1c7c9
Compare
@rafaelfranca Could you please take another look at this one? |
3c1c7c9
to
439c93e
Compare
…ibute level rails#49072 allowed you to turn `support_unencrypted_data` on a global level, then turn it off for specific attributes. But it didn't allow the inverse: you couldn't turn the config off globally, and then turn it on for a specific attribute. This PR adds support for that.
…ibute level rails#49072 allowed you to turn `support_unencrypted_data` on a global level, then turn it off for specific attributes. But it didn't allow the inverse: you couldn't turn the config off globally, and then turn it on for a specific attribute. This PR adds support for that.
If
ActiveRecord::Encryption.config.support_unencrypted_data == true
, this allows you to do:Now only the
email
column will allow unencrypted data (and ifextend_queries
is true, onlyemail
queries will get extended). Here is some back story on why you might want this.