-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add authenticate_by when using has_secure_password #43765
Conversation
e334a52
to
050c871
Compare
if record = find_by(attributes.except(*passwords.keys)) | ||
record if passwords.count { |name, value| record.public_send(:"authenticate_#{name}", value) } == passwords.size | ||
else | ||
self.new(attributes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just calling the hash algorithm again right? I think we should just do new(passwords)
in that case to be explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct! Sounds good to me. I've also added a CHANGELOG entry.
This method is intended to replace code like the following, which returns early when a user with a matching email is not found: ```ruby User.find_by(email: "...")&.authenticate("...") ``` Such code is vulnerable to timing-based enumeration attacks, wherein an attacker can determine if a user account with a given email exists. After confirming that an account exists, the attacker can try passwords associated with that email address from other leaked databases, in case the user re-used a password across multiple sites (a common practice). Additionally, knowing an account email address allows the attacker to attempt a targeted phishing ("spear phishing") attack. `authenticate_by` addresses the vulnerability by taking the same amount of time regardless of whether a user with a matching email is found.
050c871
to
9becc41
Compare
Follow-up to rails#43765. This ensures that `authenticate_by` supports controller params, e.g.: ```ruby User.authenticate_by(params.permit(:email, :password)) ``` Note that `ActionController::Parameters#to_h` will raise an error when there are unpermitted params. This guards against unsafe usage such as: ```ruby User.authenticate_by(params) ```
|
||
if record = find_by(attributes.except(*passwords.keys)) | ||
record if passwords.count { |name, value| record.public_send(:"authenticate_#{name}", value) } == passwords.size | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious how is this time constantant? line 46 will find a record and 48 not. I would guess there is measurable time difference between the two on db site, isn't it? this will encrease when the database is under heavy load and/or the table has a hughe amount of entries.
and with a hughe amount of attributes there will be a difference between instantiating a "real" record and record only with a view attributes.
An adversary will not make a single query. To get rid of the network jitter you have to make a lot of queries.
Or I'm wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for bringing up these concerns!
If identifiers (e.g. email) are backed by a database index (which they should be!), I believe the query will take roughly the same amount of time for a given database load, regardless of whether the record is found.
Also, to my knowledge, Active Record lazily casts attributes. In other words, initializing a record only stores raw string values, and additional processing isn't done until attributes are accessed. I suppose it is possible to have so many attributes that simply storing those strings would increase the timing beyond network latency jitter. In that case, though, you probably want to limit the attributes you fetch anyway, e.g. User.select(:id, :email, :password_digest, ...).authenticate_by(...)
.
However, if I'm underestimating the risk of having many attributes, one solution would be to add something like reselect(primary_key, *passwords.keys.map{ |name| :"#{name}_digest" })
before the find_by
, and then return find(record[primary_key])
instead of record
. Of course, that would incur a 2nd database query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your reply.
regarding the index take a look at CVE-2019-16782 within rack. there the data was stored in a hash, I think the performance of a hash in memory is better then a lookup in a db index.
so I would take an approach like they did, to solve the issue, canonicalize the query attributes create a hash, store these hash within the db, lookup these hash within the db at query time,
select only these hash + password.
one step further, do it like the "have I been pawned" api, query only the first x chars of the
hash so that you always select n records. make a secure compare on the hash and after
this a secure compare on the password. so you never know if you got only a near hit, or a real hit.
but, even if this is really ok (I don't know for sure). then after a record is found in the db, there will be log entries made like failed login attemp ...
and often there will be a failed login attemp + 1 update, that will
cost time and is out of your hand.
it's hard to get this bulletproof I think. every thing arround you is designed to fail fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unfamiliar with Rack internals, so I can't say whether the analogy holds. Perhaps @tenderlove wouldn't mind weighing in, since he authored the commit you linked.
I agree that it's hard to make it bulletproof. One thing to keep in mind is risk vs reward vs effort. Session hijacking is a bigger threat than user enumeration. Conversely, altering the database schema for Rack is more feasible (I assume) than asking users to add a column (and index) to their own tables. Also, some mitigation is better than none. The more we reduce the timing difference, the more requests an attacker has to make to establish statistical significance. Past a certain point, attack detection and rate limiting software can provide additional protection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to keep in mind is risk vs reward vs effort.
true
The more we reduce the timing difference, the more requests an attacker has to make to establish statistical significance
don't you add extra time with your dynamic lookup (passwords.count { |name, value| record.public_send(:"authenticate_#{name}", value) } == passwords.size
)?
for this to achieve, wouldn't timing be more equal with a generated "real" method within InstanceMethodsOnActivation
matching the attribute name ala authenticate_<attr_name>_securely
is in place?
and what's about the else
tree? you don't compare a passsword here, what happens if you set BCrypt::Engine.cost
to a higher value like BCrypt::Engine.cost = 31
? Or when you get password ala User.authenticate_by(email: "wrong@example.com", password: "abc123" * 1000 )# or a large random string
?
is your benchmark still valid with these edge cases?
when a developer uses your method, how can a failed attempts counter be encreased?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for this to achieve, wouldn't timing be more equal with a generated "real" method within
InstanceMethodsOnActivation
matching the attribute name alaauthenticate_<attr_name>_securely
is in place?
I am not sure what you mean. InstanceMethodsOnActivation
does generate real methods matching the attribute name. passwords.count { ... }
is calling those methods.
and what's about the
else
tree? you don't compare a passsword here, what happens if you setBCrypt::Engine.cost
to a higher value likeBCrypt::Engine.cost = 31
? Or when you get password alaUser.authenticate_by(email: "wrong@example.com", password: "abc123" * 1000 )# or a large random string
?
In the else
branch, passwords are hashed with BCrypt via setters (specifically, this line) which are called by the constructor. The cost of hashing the passwords is the same either way. There will be some timing difference due to (not) comparing the hashes, but we don't have a hash to securely compare to. In an earlier revision, I considered comparing passwords with a dummy password constant, but that introduced complications when changing the BCrypt cost (e.g. via ActiveModel::SecurePassword.min_cost
).
(Also, for reference, BCrypt passwords can be a maximum of 72 bytes. Additional characters are ignored.)
when a developer uses your method, how can a failed attempts counter be encreased?
That is outside the scope of authenticate_by
, but you can increment your counter when authenticate_by
returns nil
.
I appreciate you giving thought to these issues. If you think there is a problem with the code, I would encourage you to gather data (e.g. benchmarks) that demonstrates the problem. If you think you have a solution for a problem, I would encourage you to submit a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what you mean. InstanceMethodsOnActivation does generate real methods matching the attribute name. passwords.count { ... } is calling those methods.
yes, you'r right should go into ClassMethods, in order to remove the need for public_send
. this benchmark,
shows what it costs.
(Also, for reference, BCrypt passwords can be a maximum of 72 bytes. Additional characters are ignored.)
I know, but it does not keep others from sending you payloads as they wish, just to keep your system a little bit more busy.
I did create a little benchmark here but I'm not sure, if my benchmarking is valid. maybe you can spot some issues. I hope the README
is enough to get the project up & running.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was having a little trouble following the benchmark you linked, so I created a benchmark based off of a Rails bug report template:
Benchmark script
# frozen_string_literal: true
require "bundler/inline"
gemfile(true) do
source "https://rubygems.org"
git_source(:github) { |repo| "https://github.com/#{repo}.git" }
gem "rails", github: "rails/rails", branch: "main"
gem "sqlite3"
gem "bcrypt"
gem "faker"
end
require "active_record"
# require "minitest/autorun"
require "logger"
# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
# ActiveRecord::Base.logger = Logger.new(STDOUT)
ActiveRecord::Schema.define do
create_table :users, force: true do |t|
t.string :email
t.string :password_digest
t.index ["email"], name: "index_users_on_email", unique: true
end
end
class User < ActiveRecord::Base
has_secure_password
end
def authenticate_all(emails_and_passwords)
emails_and_passwords.each do |email, password|
User.authenticate_by(email: email, password: password)
end
end
USERS = 1000
ITERATIONS = 1000
puts "Generating fake data..."
# Use a limited number of passwords to save time during setup.
password_pool = 10.times.to_h do
password = Faker::Internet.password(min_length: 10, max_length: 20)
digest = User.new(password: password).password_digest
[password, digest]
end
emails_and_passwords = USERS.times.to_h { [Faker::Internet.email, password_pool.keys.sample] }
correct_emails_and_correct_passwords = emails_and_passwords.to_a.sample(ITERATIONS).to_h
correct_emails_and_incorrect_passwords = correct_emails_and_correct_passwords.transform_values { |password| password + "!" }
incorrect_emails_and_correct_passwords = correct_emails_and_correct_passwords.transform_keys { |email| email + "!" }
puts "Creating #{USERS} users..."
emails_and_passwords.to_a.shuffle!.each do |email, password|
User.create!(email: email, password_digest: password_pool[password])
end
puts "Benchmarking #{ITERATIONS} iterations..."
Benchmark.bm(35) do |x|
x.report("correct email, correct password") { authenticate_all(correct_emails_and_correct_passwords) }
x.report("correct email, incorrect password") { authenticate_all(correct_emails_and_incorrect_passwords) }
x.report("incorrect email, correct password") { authenticate_all(incorrect_emails_and_correct_passwords) }
end
Here are the results for USERS = 1000
and ITERATIONS = 1000
:
user system total real
correct email, correct password 390.200265 0.041256 390.241521 (390.319875)
correct email, incorrect password 390.474732 0.008028 390.482760 (390.522619)
incorrect email, correct password 390.994839 0.019997 391.014836 (391.060804)
The reason you are seeing a difference in your benchmark is because you always set the password to ""
. But that is a good catch! I've fixed that behavior in #43958.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my benchmark is a "normal" rails app. it requires postgresql. this is mandatory. you have to create a db, migrate & seed it. seeding will take up a long time. even more when you increase the bcrypt costs.
after this you run a rails runner script via rails r scripts/benchmark.rb
if you wan't to see greater differences,
then you set the db "under load" by running rails db:seed
once again, during the benchmark, so the db is under load during the benchmark, like a normal app is.
now that you changed your code, you have to set the password to " " and you will still see differences, not that much as before, but still messasure able.
I think your missing specific points, you can't compare sqlite with postgresql. my benchmark uses explicitly btree index, because it's known to leak informations.
your benchmark does not set the db under load. your schema is small and your db relativly empty, you can't compare 1000 users with 3 attrs with a filled up db. my benchmark fills the db with 500.000 (still small sized rows) records. if you want a real world example, of a user table, take a look at the gitlab user model. further more,my benachmark does not consider, that often associations to the user model will be eager loaded, like permissions & settings, what inceases the used time.
providing such a detailed benchmark, is a little bit too time consuming.
I think there are valid use cases where password protected accounts and accounts with a zero lenght password
live together. now you can't you use your method anymore for these use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your missing specific points, you can't compare sqlite with postgresql. my benchmark uses explicitly btree index, because it's known to leak informations.
your benchmark does not set the db under load. your schema is small and your db relativly empty, you can't compare 1000 users with 3 attrs with a filled up db. my benchmark fills the db with 500.000 (still small sized rows) records. if you want a real world example, of a user table, take a look at the gitlab user model. further more,my benachmark does not consider, that often associations to the user model will be eager loaded, like permissions & settings, what inceases the used time.
Feel free to modify the script I posted to use Postgres and report the results! Also feel free to increase the number of users! The reason I added separate USERS
and ITERATIONS
settings, as well as the password_pool
, is so that it would be feasible to do so. 😄 (Though a faster method of generating fake emails might also help.)
Unfortunately, we cannot dictate what databases, schemas, index types, or data sets users have. The purpose of my benchmark was to show that authenticate_by
itself exhibits a minimal timing difference, since that is what we can control.
My intuition is that a database (or web server) under load will exhibit more variability in response times than one not under load. If so, this variability would make true timing differences harder to discern rather than easier. But I have not gathered evidence to demonstrate this.
If you would like to add some concurrency to my benchmark, authenticate_all
would be a good spot to spawn multiple threads, and then wait for each of them to complete with Thread#join
.
Incidentally, my understanding of your benchmark is that it executes sequentially, and thus does not test the database under load either. Is that correct? Or was it run by spawning many processes?
I think there are valid use cases where password protected accounts and accounts with a zero lenght password
live together. now you can't you use your method anymore for these use cases.
Perhaps, but such usage was already unsupported by has_secure_password
:
class User < ApplicationRecord
has_secure_password validations: false
end
user = User.create!(email: "unique@example.com", password: "")
User.find_by(email: user.email).authenticate("") # => false
I did change my benchmark, to show that you still leak informations, when you use eager-load. more I can't to. sorry, I did already spend the amount of time I can spare. to make this kind of tests you need a lot of time. you need already a lot of time to run the test. maybe some one else can pick up here. maybe the rails bugbounty program will inspire others to pickup here.
it depends on the kind/amount of load. lookups get slower if the indece has to be swapped out of the ram, because the load requires the large indece of other tables. at the end indeces get so large that they can't be hold in the RAM anymore. this will be the interesting part to measure. timing may get more obvious then.
yes, the benchmark itself is sequential, but if you run seeding in parallel, it creates insert load on the same table.
for your method the db is an important factor. I think your simply ignore the db.
TIL
here you make a promise and I think there will be more edge cases where you can't keep your promise. for instance you can't use |
I did add some quick thoughts. maybe it is of some use |
This method is intended to replace code like the following, which returns early when a user with a matching email is not found:
Such code is vulnerable to timing-based enumeration attacks, wherein an attacker can determine if a user account with a given email exists. After confirming that an account exists, the attacker can try passwords associated with that email address from other leaked databases, in case the user re-used a password across multiple sites (a common practice). Additionally, knowing an account email address allows the attacker to attempt a targeted phishing ("spear phishing") attack.
authenticate_by
addresses the vulnerability by taking the same amount of time regardless of whether a user with a matching email is found.