Add authenticate_by when using has_secure_password #43765

jonathanhefner · 2021-12-02T18:00:08Z

This method is intended to replace code like the following, which returns early when a user with a matching email is not found:

User.find_by(email: "...")&.authenticate("...")

Such code is vulnerable to timing-based enumeration attacks, wherein an attacker can determine if a user account with a given email exists. After confirming that an account exists, the attacker can try passwords associated with that email address from other leaked databases, in case the user re-used a password across multiple sites (a common practice). Additionally, knowing an account email address allows the attacker to attempt a targeted phishing ("spear phishing") attack.

authenticate_by addresses the vulnerability by taking the same amount of time regardless of whether a user with a matching email is found.

rafaelfranca · 2021-12-02T23:30:52Z

activerecord/lib/active_record/secure_password.rb

+        if record = find_by(attributes.except(*passwords.keys))
+          record if passwords.count { |name, value| record.public_send(:"authenticate_#{name}", value) } == passwords.size
+        else
+          self.new(attributes)


This is just calling the hash algorithm again right? I think we should just do new(passwords) in that case to be explicit.

Correct! Sounds good to me. I've also added a CHANGELOG entry.

This method is intended to replace code like the following, which returns early when a user with a matching email is not found: ```ruby User.find_by(email: "...")&.authenticate("...") ``` Such code is vulnerable to timing-based enumeration attacks, wherein an attacker can determine if a user account with a given email exists. After confirming that an account exists, the attacker can try passwords associated with that email address from other leaked databases, in case the user re-used a password across multiple sites (a common practice). Additionally, knowing an account email address allows the attacker to attempt a targeted phishing ("spear phishing") attack. `authenticate_by` addresses the vulnerability by taking the same amount of time regardless of whether a user with a matching email is found.

Follow-up to rails#43765. This ensures that `authenticate_by` supports controller params, e.g.: ```ruby User.authenticate_by(params.permit(:email, :password)) ``` Note that `ActionController::Parameters#to_h` will raise an error when there are unpermitted params. This guards against unsafe usage such as: ```ruby User.authenticate_by(params) ```

consu · 2021-12-18T20:48:42Z

activerecord/lib/active_record/secure_password.rb

+
+        if record = find_by(attributes.except(*passwords.keys))
+          record if passwords.count { |name, value| record.public_send(:"authenticate_#{name}", value) } == passwords.size
+        else


I'm curious how is this time constantant? line 46 will find a record and 48 not. I would guess there is measurable time difference between the two on db site, isn't it? this will encrease when the database is under heavy load and/or the table has a hughe amount of entries.

and with a hughe amount of attributes there will be a difference between instantiating a "real" record and record only with a view attributes.

An adversary will not make a single query. To get rid of the network jitter you have to make a lot of queries.

Or I'm wrong?

Thank you for bringing up these concerns!

If identifiers (e.g. email) are backed by a database index (which they should be!), I believe the query will take roughly the same amount of time for a given database load, regardless of whether the record is found.

Also, to my knowledge, Active Record lazily casts attributes. In other words, initializing a record only stores raw string values, and additional processing isn't done until attributes are accessed. I suppose it is possible to have so many attributes that simply storing those strings would increase the timing beyond network latency jitter. In that case, though, you probably want to limit the attributes you fetch anyway, e.g. User.select(:id, :email, :password_digest, ...).authenticate_by(...).

However, if I'm underestimating the risk of having many attributes, one solution would be to add something like reselect(primary_key, *passwords.keys.map{ |name| :"#{name}_digest" }) before the find_by, and then return find(record[primary_key]) instead of record. Of course, that would incur a 2nd database query.

thanks for your reply.

regarding the index take a look at CVE-2019-16782 within rack. there the data was stored in a hash, I think the performance of a hash in memory is better then a lookup in a db index.

so I would take an approach like they did, to solve the issue, canonicalize the query attributes create a hash, store these hash within the db, lookup these hash within the db at query time,

select only these hash + password.

one step further, do it like the "have I been pawned" api, query only the first x chars of the
hash so that you always select n records. make a secure compare on the hash and after
this a secure compare on the password. so you never know if you got only a near hit, or a real hit.

but, even if this is really ok (I don't know for sure). then after a record is found in the db, there will be log entries made like failed login attemp ... and often there will be a failed login attemp + 1 update, that will
cost time and is out of your hand.

it's hard to get this bulletproof I think. every thing arround you is designed to fail fast.

I am unfamiliar with Rack internals, so I can't say whether the analogy holds. Perhaps @tenderlove wouldn't mind weighing in, since he authored the commit you linked.

I agree that it's hard to make it bulletproof. One thing to keep in mind is risk vs reward vs effort. Session hijacking is a bigger threat than user enumeration. Conversely, altering the database schema for Rack is more feasible (I assume) than asking users to add a column (and index) to their own tables. Also, some mitigation is better than none. The more we reduce the timing difference, the more requests an attacker has to make to establish statistical significance. Past a certain point, attack detection and rate limiting software can provide additional protection.

One thing to keep in mind is risk vs reward vs effort.

true

The more we reduce the timing difference, the more requests an attacker has to make to establish statistical significance

don't you add extra time with your dynamic lookup (passwords.count { |name, value| record.public_send(:"authenticate_#{name}", value) } == passwords.size)?

for this to achieve, wouldn't timing be more equal with a generated "real" method within InstanceMethodsOnActivation matching the attribute name ala authenticate_<attr_name>_securely is in place?

and what's about the else tree? you don't compare a passsword here, what happens if you set BCrypt::Engine.cost to a higher value like BCrypt::Engine.cost = 31? Or when you get password ala User.authenticate_by(email: "wrong@example.com", password: "abc123" * 1000 )# or a large random string?

is your benchmark still valid with these edge cases?

when a developer uses your method, how can a failed attempts counter be encreased?

for this to achieve, wouldn't timing be more equal with a generated "real" method within InstanceMethodsOnActivation matching the attribute name ala authenticate_<attr_name>_securely is in place?

I am not sure what you mean. InstanceMethodsOnActivation does generate real methods matching the attribute name. passwords.count { ... } is calling those methods.

and what's about the else tree? you don't compare a passsword here, what happens if you set BCrypt::Engine.cost to a higher value like BCrypt::Engine.cost = 31? Or when you get password ala User.authenticate_by(email: "wrong@example.com", password: "abc123" * 1000 )# or a large random string?

In the else branch, passwords are hashed with BCrypt via setters (specifically, this line) which are called by the constructor. The cost of hashing the passwords is the same either way. There will be some timing difference due to (not) comparing the hashes, but we don't have a hash to securely compare to. In an earlier revision, I considered comparing passwords with a dummy password constant, but that introduced complications when changing the BCrypt cost (e.g. via ActiveModel::SecurePassword.min_cost).

(Also, for reference, BCrypt passwords can be a maximum of 72 bytes. Additional characters are ignored.)

when a developer uses your method, how can a failed attempts counter be encreased?

That is outside the scope of authenticate_by, but you can increment your counter when authenticate_by returns nil.

I appreciate you giving thought to these issues. If you think there is a problem with the code, I would encourage you to gather data (e.g. benchmarks) that demonstrates the problem. If you think you have a solution for a problem, I would encourage you to submit a PR.

I am not sure what you mean. InstanceMethodsOnActivation does generate real methods matching the attribute name. passwords.count { ... } is calling those methods.

yes, you'r right should go into ClassMethods, in order to remove the need for public_send. this benchmark,
shows what it costs.

(Also, for reference, BCrypt passwords can be a maximum of 72 bytes. Additional characters are ignored.)

I know, but it does not keep others from sending you payloads as they wish, just to keep your system a little bit more busy.

I did create a little benchmark here but I'm not sure, if my benchmarking is valid. maybe you can spot some issues. I hope the README is enough to get the project up & running.

I was having a little trouble following the benchmark you linked, so I created a benchmark based off of a Rails bug report template:

Benchmark script

# frozen_string_literal: true require "bundler/inline" gemfile(true) do source "https://rubygems.org" git_source(:github) { |repo| "https://github.com/#{repo}.git" } gem "rails", github: "rails/rails", branch: "main" gem "sqlite3" gem "bcrypt" gem "faker" end require "active_record" # require "minitest/autorun" require "logger" # This connection will do for database-independent bug reports. ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:") # ActiveRecord::Base.logger = Logger.new(STDOUT) ActiveRecord::Schema.define do create_table :users, force: true do |t| t.string :email t.string :password_digest t.index ["email"], name: "index_users_on_email", unique: true end end class User < ActiveRecord::Base has_secure_password end def authenticate_all(emails_and_passwords) emails_and_passwords.each do |email, password| User.authenticate_by(email: email, password: password) end end USERS = 1000 ITERATIONS = 1000 puts "Generating fake data..." # Use a limited number of passwords to save time during setup. password_pool = 10.times.to_h do password = Faker::Internet.password(min_length: 10, max_length: 20) digest = User.new(password: password).password_digest [password, digest] end emails_and_passwords = USERS.times.to_h { [Faker::Internet.email, password_pool.keys.sample] } correct_emails_and_correct_passwords = emails_and_passwords.to_a.sample(ITERATIONS).to_h correct_emails_and_incorrect_passwords = correct_emails_and_correct_passwords.transform_values { |password| password + "!" } incorrect_emails_and_correct_passwords = correct_emails_and_correct_passwords.transform_keys { |email| email + "!" } puts "Creating #{USERS} users..." emails_and_passwords.to_a.shuffle!.each do |email, password| User.create!(email: email, password_digest: password_pool[password]) end puts "Benchmarking #{ITERATIONS} iterations..." Benchmark.bm(35) do |x| x.report("correct email, correct password") { authenticate_all(correct_emails_and_correct_passwords) } x.report("correct email, incorrect password") { authenticate_all(correct_emails_and_incorrect_passwords) } x.report("incorrect email, correct password") { authenticate_all(incorrect_emails_and_correct_passwords) } end

Here are the results for USERS = 1000 and ITERATIONS = 1000:

user system total real correct email, correct password 390.200265 0.041256 390.241521 (390.319875) correct email, incorrect password 390.474732 0.008028 390.482760 (390.522619) incorrect email, correct password 390.994839 0.019997 391.014836 (391.060804)

The reason you are seeing a difference in your benchmark is because you always set the password to "". But that is a good catch! I've fixed that behavior in #43958.

my benchmark is a "normal" rails app. it requires postgresql. this is mandatory. you have to create a db, migrate & seed it. seeding will take up a long time. even more when you increase the bcrypt costs.

after this you run a rails runner script via rails r scripts/benchmark.rb if you wan't to see greater differences,
then you set the db "under load" by running rails db:seed once again, during the benchmark, so the db is under load during the benchmark, like a normal app is.

now that you changed your code, you have to set the password to " " and you will still see differences, not that much as before, but still messasure able.

I think your missing specific points, you can't compare sqlite with postgresql. my benchmark uses explicitly btree index, because it's known to leak informations.

your benchmark does not set the db under load. your schema is small and your db relativly empty, you can't compare 1000 users with 3 attrs with a filled up db. my benchmark fills the db with 500.000 (still small sized rows) records. if you want a real world example, of a user table, take a look at the gitlab user model. further more,my benachmark does not consider, that often associations to the user model will be eager loaded, like permissions & settings, what inceases the used time.

providing such a detailed benchmark, is a little bit too time consuming.

I think there are valid use cases where password protected accounts and accounts with a zero lenght password
live together. now you can't you use your method anymore for these use cases.

I think your missing specific points, you can't compare sqlite with postgresql. my benchmark uses explicitly btree index, because it's known to leak informations.

your benchmark does not set the db under load. your schema is small and your db relativly empty, you can't compare 1000 users with 3 attrs with a filled up db. my benchmark fills the db with 500.000 (still small sized rows) records. if you want a real world example, of a user table, take a look at the gitlab user model. further more,my benachmark does not consider, that often associations to the user model will be eager loaded, like permissions & settings, what inceases the used time.

Feel free to modify the script I posted to use Postgres and report the results! Also feel free to increase the number of users! The reason I added separate USERS and ITERATIONS settings, as well as the password_pool, is so that it would be feasible to do so. 😄 (Though a faster method of generating fake emails might also help.)

Unfortunately, we cannot dictate what databases, schemas, index types, or data sets users have. The purpose of my benchmark was to show that authenticate_by itself exhibits a minimal timing difference, since that is what we can control.

My intuition is that a database (or web server) under load will exhibit more variability in response times than one not under load. If so, this variability would make true timing differences harder to discern rather than easier. But I have not gathered evidence to demonstrate this.

If you would like to add some concurrency to my benchmark, authenticate_all would be a good spot to spawn multiple threads, and then wait for each of them to complete with Thread#join.

Incidentally, my understanding of your benchmark is that it executes sequentially, and thus does not test the database under load either. Is that correct? Or was it run by spawning many processes?

I think there are valid use cases where password protected accounts and accounts with a zero lenght password
live together. now you can't you use your method anymore for these use cases.

Perhaps, but such usage was already unsupported by has_secure_password:

class User < ApplicationRecord has_secure_password validations: false end user = User.create!(email: "unique@example.com", password: "") User.find_by(email: user.email).authenticate("") # => false

consu · 2021-12-22T13:26:34Z

Feel free to modify the script I posted to use Postgres and report the results! Also feel free to increase the number of users!

I did change my benchmark, to show that you still leak informations, when you use eager-load.

more I can't to. sorry, I did already spend the amount of time I can spare. to make this kind of tests you need a lot of time. you need already a lot of time to run the test. maybe some one else can pick up here. maybe the rails bugbounty program will inspire others to pickup here.

My intuition is that a database (or web server) under load will exhibit more variability in response times than one not under load.

it depends on the kind/amount of load. lookups get slower if the indece has to be swapped out of the ram, because the load requires the large indece of other tables. at the end indeces get so large that they can't be hold in the RAM anymore. this will be the interesting part to measure. timing may get more obvious then.

Incidentally, my understanding of your benchmark is that it executes sequentially, and thus does not test the database under load either. Is that correct?

yes, the benchmark itself is sequential, but if you run seeding in parallel, it creates insert load on the same table.

The purpose of my benchmark was to show that authenticate_by itself exhibits a minimal timing difference, since that is what we can control.

for your method the db is an important factor. I think your simply ignore the db.

Perhaps, but such usage was already unsupported by has_secure_password:

TIL

authenticate_by addresses the vulnerability by taking the same amount of
time regardless of whether a user with a matching email is found:

here you make a promise and I think there will be more edge cases where you can't keep your promise.
there will be a lot of explaining to do, under which circumstances this promise is kept.

for instance you can't use Devise::Models::Lockable with your method, because you need to load the record once again and issue an update.

jonathanhefner · 2021-12-25T17:43:25Z

@consu

here you make a promise and I think there will be more edge cases where you can't keep your promise.
there will be a lot of explaining to do, under which circumstances this promise is kept.

I do want to manage expectations, so I've opened #43997 to address this.

consu · 2021-12-25T19:26:50Z

I do want to manage expectations, so I've opened #43997 to address this.

I did add some quick thoughts. maybe it is of some use

rails-bot bot added the activerecord label Dec 2, 2021

jonathanhefner force-pushed the authenticate_by branch 3 times, most recently from e334a52 to 050c871 Compare December 2, 2021 19:13

rafaelfranca reviewed Dec 2, 2021

View reviewed changes

jonathanhefner force-pushed the authenticate_by branch from 050c871 to 9becc41 Compare December 3, 2021 16:07

rafaelfranca merged commit 99bd1d2 into rails:main Dec 3, 2021

jonathanhefner mentioned this pull request Dec 5, 2021

Convert attributes to Hash in authenticate_by #43779

Merged

consu reviewed Dec 18, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add authenticate_by when using has_secure_password #43765

Add authenticate_by when using has_secure_password #43765

jonathanhefner commented Dec 2, 2021

rafaelfranca Dec 2, 2021

jonathanhefner Dec 3, 2021

consu Dec 18, 2021

jonathanhefner Dec 18, 2021

consu Dec 18, 2021

jonathanhefner Dec 18, 2021

consu Dec 19, 2021

jonathanhefner Dec 19, 2021

consu Dec 21, 2021

jonathanhefner Dec 21, 2021

consu Dec 21, 2021

jonathanhefner Dec 21, 2021

consu commented Dec 22, 2021

jonathanhefner commented Dec 25, 2021

consu commented Dec 25, 2021

Add authenticate_by when using has_secure_password #43765

Add authenticate_by when using has_secure_password #43765

Conversation

jonathanhefner commented Dec 2, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

consu commented Dec 22, 2021

jonathanhefner commented Dec 25, 2021

consu commented Dec 25, 2021