Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throttling repeated login requests & account lock #3037

Open
5 of 6 tasks
atreya2011 opened this issue Jan 19, 2023 · 49 comments
Open
5 of 6 tasks

Throttling repeated login requests & account lock #3037

atreya2011 opened this issue Jan 19, 2023 · 49 comments
Labels
rfc A request for comments to discuss and share ideas.

Comments

@atreya2011
Copy link
Contributor

atreya2011 commented Jan 19, 2023

Preflight checklist

Context and scope

  • The current system does not have a feature for:
    • Throttling repeated login requests.
    • Locking an account after a certain number of failed login attempts.
  • This feature aims to improve security, by preventing unauthorized access attempts.

Goals and non-goals

  • To limit the number of consecutive login attempts by adding exponential backoff between each login request and lock the user out of the account after a specific number of failed login attempts.

The design

Based on the discussion here: #654

  • The system will track the number of consecutive login attempts made by a user within a certain time period.
  • If the number of login attempts exceeds a specified threshold within the time period, the user will be locked out of the account.
  • After each login attempt, the system will increment the amount of time required before the next login request can be made, using an exponential backoff algorithm by using a back-off factor.
  • The threshold (max attempts), duration, and backoff factor can be configured.
  • The account shall be unlocked using the Admin API.

Example rate-limit config

rate_limit:
  duration: 300 # in seconds
  backoff: 2 # the backoff factor

Example lockout config

lockout:
  max_attempts: 5
  consecutive_login_interval: 5m

APIs

No response

Data storage

Create a new table to store the login history of a specific identity. The schema is as follows:

  • id (uuid, primary key)
  • identity_id (foreign key referencing identities.id)
  • login_time (datetime)
  • login_status (enum: 'success', 'failure')

One identity shall have many login history records
One login history record belongs to one identity

Code and pseudo-code

When an identity attempts to log in:

  • Check if the identity's account is locked by checking the active field in the identities table.
  • If the active field is set to 'false', deny the login attempt.
  • Else, check the identity's credentials.
  • If the credentials are valid, insert a new record into the login_history table with the identity_id, login_time, and login_status set to 'success'.
  • If the credentials are invalid, insert a new record into the login_history table with the identity_id, login_time, and login_status set to failure.
  • Check the number of recent failed login attempts for the user.
  • If the number of failed attempts exceeds a pre-defined threshold in the config, lock the user's account by updating the active field in the identities table to false.

How to track consecutive login attempts:

  • Query the login history table to retrieve all the failed login attempts for a specific identity that occurred within a pre-defined period of time (consecutive_login_interval of e.g. 5 minutes).
  • Compare the count of the returned rows with a pre-defined threshold.
  • If the count exceeds the threshold, lock the account.
  • Flush login history once the account is unlocked.

Degree of constraint

No response

Alternatives considered

No response

@atreya2011 atreya2011 added the rfc A request for comments to discuss and share ideas. label Jan 19, 2023
@aeneasr
Copy link
Member

aeneasr commented Jan 20, 2023

Thank you for this design proposal! It is well written! I do have some points here:

If the number of login attempts exceeds a specified threshold within the time period, the user will be locked out of the account.

The problem with this is that you can effectively DoS accounts and lock them by trying to repeatedly log in. To avoid this, it is recommended to set a maximum wait time (e.g. 10s) which makes brute forcing impractical, but also prevents DoSing.

If the number of failed attempts exceeds a pre-defined threshold in the config, lock the user's account by updating the active field in the identities table to false.

Which process activates the identity and how does it know whether the rate limiting has disabled it, or whether it was disabled by an administrator? I would recommend to not use the active field for this but instead use the additional table.

If the credentials are valid, insert a new record into the login_history table with the identity_id, login_time, and login_status set to 'success'.

My recommendation would be to have a table with the following layout:

CREATE TABLE selfservice_rate_limits (
  id UUID PRIMARY KEY
  nid UUID
  identity_id UUID NOT NULL REFERENCES ...
  attempts int NOT NULL
  last_attempt TIMESTAMP NOT NULL
)

And then you would do UPDATE ... SET attempts=attempts+1, last_attempt=NOW(), ... depending on how long ago last_attempt was. This will make reading the table much less expensive because we do not need to make a range query SELECT ... FROM rate_limits WHERE identity_id = ... AND login_time LIMIT 15 but instead query a single row SELECT ... FROM selfservice_rate_limits WHERE identity_id=... LIMIT 1.

At scale, range queries are very expensive and it's always better to try and do single selects whenever possible!

@atreya2011
Copy link
Contributor Author

atreya2011 commented Jan 21, 2023

@aeneasr

Thank you for your feedback and suggestions!

The problem with this is that you can effectively DoS accounts and lock them by trying to repeatedly log in. To avoid this, it is recommended to set a maximum wait time (e.g. 10s) which makes brute forcing impractical, but also prevents DoSing.

What about a config like this?

rate_limit:
  max_duration: 30s
  backoff_factor: 2
  max_wait_time: 10s
  max_login_attempts: 5
  login_attempt_window: 1m

If we introduce a max_wait_time parameter, the logic for rate limiting would be something like the below, based on the sample configuration above. Let me know if I am going in the right direction 🙏🏼

  • The user will have to wait for 10 seconds before trying to login after each failed login attempt.
  • Assuming the user fails 5 login attempts set by max_login_attempts within a span of 50 seconds, the next failed login attempt within the 1 minute login_attempt_window will result in the user's account being locked out.
  • The user is free to adjust the above parameters depending on the level of security that is desirable for them.

Which process activates the identity and how does it know whether the rate limiting has disabled it, or whether it was disabled by an administrator? I would recommend to not use the active field for this but instead use the additional table.

We could use a table with a design as follows:

Option 1

CREATE TABLE lockouts (
    id UUID PRIMARY KEY
    identity_id UUID NOT NULL REFERENCES ...
    lock_time TIMESTAMP NOT NULL,
    reason ENUM('rate_limiting', 'admin_action') NOT NULL
);

it may be beneficial to consider creating a separate table if the relationship between the identities table and the lockouts table is intended to be a one-to-many relationship. An alternative solution could be to incorporate the following columns into the selfservice_rate_limits table, rather than creating a separate one.

Option 2

locked BOOLEAN
lock_time TIMESTAMP NOT NULL,
reason ENUM('rate_limiting', 'admin_action') NOT NULL,

As for the process for unlocking the identity after the lockout, we could add an admin-only endpoint such as identities/{identity_id}/unlock to unlock the account.

My recommendation would be to have a table with the following layout:

CREATE TABLE selfservice_rate_limits (
  id UUID PRIMARY KEY
  nid UUID
  identity_id UUID NOT NULL REFERENCES ...
  attempts int NOT NULL
  last_attempt TIMESTAMP NOT NULL
)

And then you would do UPDATE ... SET attempts=attempts+1, last_attempt=NOW(), ... depending on how long ago last_attempt was. This will make reading the table much less expensive because we do not need to make a range query SELECT ... FROM rate_limits WHERE identity_id = ... AND login_time LIMIT 15 but instead query a single row SELECT ... FROM selfservice_rate_limits WHERE identity_id=... LIMIT 1.

At scale, range queries are very expensive and it's always better to try and do single selects whenever possible!

Thank you for providing the table design and sample query 🙇🏼‍♂️ I was also considering the same concept.

Could you please provide some clarification regarding the following 🙏🏼

  • The role of nid in the selfservice_rate_limits table?
  • Is it a one-to-one relationship between the identities table and the selfservice_rate_limits table?

depending on how long ago last_attempt was.

  • Can we use the max_wait_time in the config to check how long ago the last_attempt was?

@aeneasr
Copy link
Member

aeneasr commented Jan 26, 2023

@aeneasr

Thank you for your feedback and suggestions!

The problem with this is that you can effectively DoS accounts and lock them by trying to repeatedly log in. To avoid this, it is recommended to set a maximum wait time (e.g. 10s) which makes brute forcing impractical, but also prevents DoSing.

What about a config like this?

rate_limit:
  max_duration: 30s
  backoff_factor: 2
  max_wait_time: 10s
  max_login_attempts: 5
  login_attempt_window: 1m

I think this is great! For the first step, I think it makes sense to just choose sensible defaults. Adding more configuration options is something we'll consider if enough people want to change this (for good reason). Maybe there's a blog article from a security expert that can help use choose correct values?

If we introduce a max_wait_time parameter, the logic for rate limiting would be something like the below, based on the sample configuration above. Let me know if I am going in the right direction 🙏🏼

  • The user will have to wait for 10 seconds before trying to login after each failed login attempt.
  • Assuming the user fails 5 login attempts set by max_login_attempts within a span of 50 seconds, the next failed login attempt within the 1 minute login_attempt_window will result in the user's account being locked out.

I would recommend an linear/exponential backoff with a fixed maxmimum - so something along the lines of:

  1. failed attempt user has to wait 100ms
  2. failed attempt user has to wait 1s
  3. failed attempt user has to wait 2s
  4. failed attempt user has to wait 4s
  5. failed attempt user has to wait 8s
  6. failed attempt user has to wait 16s
  7. failed attempt user has to wait 16s
  8. failed attempt user has to wait 16s
  9. failed attempt user has to wait 16s
  • The user is free to adjust the above parameters depending on the level of security that is desirable for them.

Which process activates the identity and how does it know whether the rate limiting has disabled it, or whether it was disabled by an administrator? I would recommend to not use the active field for this but instead use the additional table.

We could use a table with a design as follows:

Option 1

CREATE TABLE lockouts (
    id UUID PRIMARY KEY
    identity_id UUID NOT NULL REFERENCES ...
    lock_time TIMESTAMP NOT NULL,
    reason ENUM('rate_limiting', 'admin_action') NOT NULL
);

That sounds good! Keep in mind that not all databases support ENUMs.

it may be beneficial to consider creating a separate table if the relationship between the identities table and the lockouts table is intended to be a one-to-many relationship. An alternative solution could be to incorporate the following columns into the selfservice_rate_limits table, rather than creating a separate one.

I think it makes sense to keep the business logic around this in one table. This will help with efficiency!

Option 2

locked BOOLEAN
lock_time TIMESTAMP NOT NULL,
reason ENUM('rate_limiting', 'admin_action') NOT NULL,

Could you elaborate on this a bit more, I'm not sure in what context to put this?

As for the process for unlocking the identity after the lockout, we could add an admin-only endpoint such as identities/{identity_id}/unlock to unlock the account.

That's a great idea! It would be great to keep the naming consistent. So for example use "lock/unlock" everywhere, or "rate limit", or "throttling" or something else. Do you have a preference?

My recommendation would be to have a table with the following layout:

CREATE TABLE selfservice_rate_limits (
  id UUID PRIMARY KEY
  nid UUID
  identity_id UUID NOT NULL REFERENCES ...
  attempts int NOT NULL
  last_attempt TIMESTAMP NOT NULL
)

And then you would do UPDATE ... SET attempts=attempts+1, last_attempt=NOW(), ... depending on how long ago last_attempt was. This will make reading the table much less expensive because we do not need to make a range query SELECT ... FROM rate_limits WHERE identity_id = ... AND login_time LIMIT 15 but instead query a single row SELECT ... FROM selfservice_rate_limits WHERE identity_id=... LIMIT 1.
At scale, range queries are very expensive and it's always better to try and do single selects whenever possible!

Thank you for providing the table design and sample query 🙇🏼‍♂️ I was also considering the same concept.

Could you please provide some clarification regarding the following 🙏🏼

  • The role of nid in the selfservice_rate_limits table?

This is an internal value which will help (in the future) migrate between ory self-hosted and the ory network.

  • Is it a one-to-one relationship between the identities table and the selfservice_rate_limits table?

I think that makes sense, I don't think we need a 1:n relation for it - 1:n relations also are range queries which are significantly slower than unique (secondary) key reads.

depending on how long ago last_attempt was.

  • Can we use the max_wait_time in the config to check how long ago the last_attempt was?

I'd probably use fixed values for now, to not complicate the kratos config further

@tacurran
Copy link
Member

@atreya2011 from Slack 20230128 "What are Ory's ways of challenging suspicious logins for example:
a. logins from new locations
b. too many failed attempts
c. etc. note from tacurran - this could include blacklisted devices, new credentials, weak credentials

These features will require some advanced data analysis, perhaps ML, and should be included in this issue.

@atreya2011
Copy link
Contributor Author

atreya2011 commented Jan 30, 2023

@aeneasr

What about a config like this?

rate_limit:
  max_duration: 30s
  backoff_factor: 2
  max_wait_time: 10s
  max_login_attempts: 5
  login_attempt_window: 1m

I think this is great! For the first step, I think it makes sense to just choose sensible defaults. Adding more configuration options is something we'll consider if enough people want to change this (for good reason). Maybe there's a blog article from a security expert that can help use choose correct values?

https://developers.cloudflare.com/waf/rate-limiting-rules/best-practices/
https://developer.okta.com/docs/reference/rl-best-practices/

I found 2 articles that discuss best practices for rate limiting in Cloudflare WAF and Okta. They emphasize the importance of customizing rate-limiting settings to match the specific needs of a website or API and suggest using a combination of IP and user identity to establish unique rate limits. Both articles advise on how to monitor the effectiveness of rate limiting and make adjustments as needed to prevent DDoS attacks and other malicious traffic. Unfortunately, they don't give specific advice on default values. I will keep searching...

I would recommend an linear/exponential backoff with a fixed maxmimum - so something along the lines of:

  1. failed attempt user has to wait 100ms
  2. failed attempt user has to wait 1s
  3. failed attempt user has to wait 2s
  4. failed attempt user has to wait 4s
  5. failed attempt user has to wait 8s
  6. failed attempt user has to wait 16s
  7. failed attempt user has to wait 16s
  8. failed attempt user has to wait 16s
  9. failed attempt user has to wait 16s

I understand the logic you've presented. Additionally, I have come across two articles that suggest setting a lockout threshold of 10 attempts or more as best practice for account lockout in Active Directory. Based on the logic discussed, it would be advisable to set the threshold at or above 10 to prevent the risk of a malicious user causing a lockout. We could keep the default value of 0 to signify no lockout.

https://www.manageengine.com/products/active-directory-audit/kb/best-practices/account-lockout-best-practices.html#:~:text=The%20recommended%20threshold%20is%2015%20to%2050.

https://www.netwrix.com/account_lockout_best_practices.html

CREATE TABLE lockouts (
    id UUID PRIMARY KEY
    identity_id UUID NOT NULL REFERENCES ...
    lock_time TIMESTAMP NOT NULL,
    reason ENUM('rate_limiting', 'admin_action') NOT NULL
);

That sounds good! Keep in mind that not all databases support ENUMs.

Thank you! If I understand correctly, Kratos utilizes go-buffalo/pop as the underlying package for handling database-related operations. Therefore, it would be appropriate to design the table as a Go struct, correct?

I think it makes sense to keep the business logic around this in one table. This will help with efficiency!

Option 2

locked BOOLEAN
lock_time TIMESTAMP NOT NULL,
reason ENUM('rate_limiting', 'admin_action') NOT NULL,

Could you elaborate on this a bit more, I'm not sure in what context to put this?

I apologize for any confusion. I had suggested adding the lockout-related parameters to the selfservice_rate_limits table, so that all the information would be in one place. I would be grateful to learn your thoughts regarding the following database design.

CREATE TABLE selfservice_rate_limits (
   id UUID PRIMARY KEY
   nid UUID
   identity_id UUID NOT NULL REFERENCES ...
   attempts int NOT NULL
   last_attempt TIMESTAMP NOT NULL
   locked BOOLEAN
   lock_time TIMESTAMP NOT NULL,
   reason varchar NOT NULL,
)

As for the process for unlocking the identity after the lockout, we could add an admin-only endpoint such as identities/{identity_id}/unlock to unlock the account.

That's a great idea! It would be great to keep the naming consistent. So for example use "lock/unlock" everywhere, or "rate limit", or "throttling" or something else. Do you have a preference?

I don't have a particular preference for this. But having an unlock endpoint for an admin to unlock a user account does seem clear and straightforward.
Alternatively, we could introduce an endpoint, such as /identities/{identity_id}/rate_limit, and send a PATCH request with the necessary parameters to unlock a user account.

  • The role of nid in the selfservice_rate_limits table?

This is an internal value which will help (in the future) migrate between ory self-hosted and the ory network.

Thank you for the clarification!

  • Can we use the max_wait_time in the config to check how long ago the last_attempt was?

I'd probably use fixed values for now, to not complicate the kratos config further

I understand, that seems perfectly reasonable.

@atreya2011
Copy link
Contributor Author

atreya2011 commented Jan 30, 2023

@atreya2011 from Slack 20230128 "What are Ory's ways of challenging suspicious logins for example:
a. logins from new locations
b. too many failed attempts
c. etc. note from tacurran - this could include blacklisted devices, new credentials, weak credentials

These features will require some advanced data analysis, perhaps ML, and should be included in this issue.

@tacurran

Thank you for bringing these points to my attention 🙏🏼

Regarding point A, we could enhance security by combining Two-factor authentication (2FA), which is already available in Kratos, with monitoring login activity from new locations and setting up alerts / requiring a one-time code input for any unusual behaviour. However, I believe that addressing this point deserves a separate discussion (issue), as it does not directly relate to rate-limiting and account lockout discussed in this design proposal 🙇🏼‍♂️

Regarding point B, I have submitted a design proposal in this issue to implement account lockout after a specified number of failed login attempts, as well as rate-limiting consecutive login attempts.

Would you be able to provide additional details regarding point C? Is it related to creating an endpoint to counteract point A, in order for an admin to blacklist devices/IPs and prevent malicious activities?

@aeneasr
Copy link
Member

aeneasr commented Jan 31, 2023

@aeneasr

What about a config like this?

rate_limit:
  max_duration: 30s
  backoff_factor: 2
  max_wait_time: 10s
  max_login_attempts: 5
  login_attempt_window: 1m

I think this is great! For the first step, I think it makes sense to just choose sensible defaults. Adding more configuration options is something we'll consider if enough people want to change this (for good reason). Maybe there's a blog article from a security expert that can help use choose correct values?

https://developers.cloudflare.com/waf/rate-limiting-rules/best-practices/ https://developer.okta.com/docs/reference/rl-best-practices/

I found 2 articles that discuss best practices for rate limiting in Cloudflare WAF and Okta. They emphasize the importance of customizing rate-limiting settings to match the specific needs of a website or API and suggest using a combination of IP and user identity to establish unique rate limits. Both articles advise on how to monitor the effectiveness of rate limiting and make adjustments as needed to prevent DDoS attacks and other malicious traffic. Unfortunately, they don't give specific advice on default values. I will keep searching...

I would recommend an linear/exponential backoff with a fixed maxmimum - so something along the lines of:

  1. failed attempt user has to wait 100ms
  2. failed attempt user has to wait 1s
  3. failed attempt user has to wait 2s
  4. failed attempt user has to wait 4s
  5. failed attempt user has to wait 8s
  6. failed attempt user has to wait 16s
  7. failed attempt user has to wait 16s
  8. failed attempt user has to wait 16s
  9. failed attempt user has to wait 16s

I understand the logic you've presented. Additionally, I have come across two articles that suggest setting a lockout threshold of 10 attempts or more as best practice for account lockout in Active Directory. Based on the logic discussed, it would be advisable to set the threshold at or above 10 to prevent the risk of a malicious user causing a lockout. We could keep the default value of 0 to signify no lockout.

That sounds good to me!

https://www.manageengine.com/products/active-directory-audit/kb/best-practices/account-lockout-best-practices.html#:~:text=The%20recommended%20threshold%20is%2015%20to%2050.

https://www.netwrix.com/account_lockout_best_practices.html

CREATE TABLE lockouts (
    id UUID PRIMARY KEY
    identity_id UUID NOT NULL REFERENCES ...
    lock_time TIMESTAMP NOT NULL,
    reason ENUM('rate_limiting', 'admin_action') NOT NULL
);

That sounds good! Keep in mind that not all databases support ENUMs.

Thank you! If I understand correctly, Kratos utilizes go-buffalo/pop as the underlying package for handling database-related operations. Therefore, it would be appropriate to design the table as a Go struct, correct?

That's correct! Please keep in mind that enum is not supported in all databases, so you'll probably need to implement the enum in Go code :)

I think it makes sense to keep the business logic around this in one table. This will help with efficiency!

Option 2

locked BOOLEAN
lock_time TIMESTAMP NOT NULL,
reason ENUM('rate_limiting', 'admin_action') NOT NULL,

Could you elaborate on this a bit more, I'm not sure in what context to put this?

I apologize for any confusion. I had suggested adding the lockout-related parameters to the selfservice_rate_limits table, so that all the information would be in one place. I would be grateful to learn your thoughts regarding the following database design.

CREATE TABLE selfservice_rate_limits (
   id UUID PRIMARY KEY
   nid UUID
   identity_id UUID NOT NULL REFERENCES ...
   attempts int NOT NULL
   last_attempt TIMESTAMP NOT NULL
   locked BOOLEAN
   lock_time TIMESTAMP NOT NULL,
   reason varchar NOT NULL,
)

That looks very good to me! Please don't forget to add an index over nid, identity_id to make look up fast :) I'm not sure if we need lock_time (as we have last_attempt which should be equal to lock_time) but generally this looks great!

As for the process for unlocking the identity after the lockout, we could add an admin-only endpoint such as identities/{identity_id}/unlock to unlock the account.

That's a great idea! It would be great to keep the naming consistent. So for example use "lock/unlock" everywhere, or "rate limit", or "throttling" or something else. Do you have a preference?

I don't have a particular preference for this. But having an unlock endpoint for an admin to unlock a user account does seem clear and straightforward. Alternatively, we could introduce an endpoint, such as /identities/{identity_id}/rate_limit, and send a PATCH request with the necessary parameters to unlock a user account.

Good question! I think we have several options. From a pure REST perspective, we try to modify resources in plural:

  • DELETE /identities/<id>/locks
  • PATCH /identities/<id>/rate-limits
  • PATCH /identities/<id>/throttles

Alternatively we could use verbs like but that would not be fully "REST"y:

  • PATCH /identities/<id>/unlock
  • PATCH /identities/<id>/unblock

I think that "block" and "unblock" are misleading, because "blocking" a user typically involves administrative intervention (user is malicious -> block them) whereas here we throttle/limit user authentication. What do you think of the following:

PATCH /identities/<id>/rate-limits?flow=login

and a table of

CREATE TABLE selfservice_login_rate_limits () ...

or alternatively a table with the design above and an additional field to deal with rate limits in the future for recovery or verification flows:

CREATE TABLE selfservice_rate_limits (
   ...
   flow_type ENUM('login', 'recovery', ...)
   ...
)

@supercairos
Copy link
Contributor

Super excited about this being implemented.

Let me know If you need any help implementing it. @jossbnd and I can help if needed ✋

@atreya2011
Copy link
Contributor Author

@aeneasr

Thank you for all the feedback! Let me summarize our discussion and, if everything looks good, I'll proceed with the implementation 🙂

Good question! I think we have several options. From a pure REST perspective, we try to modify resources in plural:

  • DELETE /identities/<id>/locks
  • PATCH /identities/<id>/rate-limits
  • PATCH /identities/<id>/throttles

Alternatively we could use verbs like but that would not be fully "REST"y:

  • PATCH /identities/<id>/unlock
  • PATCH /identities/<id>/unblock

I think that "block" and "unblock" are misleading, because "blocking" a user typically involves administrative intervention (user is malicious -> block them) whereas here we throttle/limit user authentication. What do you think of the following:

PATCH /identities/<id>/rate-limits?flow=login

and a table of

CREATE TABLE selfservice_login_rate_limits () ...

or alternatively a table with the design above and an additional field to deal with rate limits in the future for recovery or verification flows:

CREATE TABLE selfservice_rate_limits (
   ...
   flow_type ENUM('login', 'recovery', ...)
   ...
)

I really like the idea of PATCH /identities/<id>/rate-limits?flow=login since, as you pointed out, we can use it as a base for rate-limiting the other flows as well. I have updated the table design based on your feedback, including the removal of the lock_time field and the addition of the flow_type field. Understood regarding implementation of enums in Go code :)

CREATE TABLE selfservice_rate_limits (
    id UUID PRIMARY KEY,
    nid UUID,
    identity_id UUID NOT NULL REFERENCES identities,
    attempts int NOT NULL,
    last_attempt TIMESTAMP NOT NULL,
    locked BOOLEAN,
    reason ENUM('rate_limiting', 'admin_action') NOT NULL,
    flow_type ENUM('login', 'recovery', ...)
);

CREATE INDEX idx_identity_id ON selfservice_rate_limits (identity_id);
CREATE INDEX idx_nid ON selfservice_rate_limits (nid);

I have also simplified the configuration related to rate limiting, based on your feedback. Here is the updated version:

rate_limit:
    max_duration: 30s (default)
    backoff_factor: 2 (default)
    max_attempts_before_lockout: 0 (default)

@atreya2011
Copy link
Contributor Author

atreya2011 commented Feb 2, 2023

Super excited about this being implemented.

Let me know If you need any help implementing it. @jossbnd and I can help if needed hand

@supercairos
Thank you for reaching out! I am planning to start working on it once I get the final approval from @aeneasr.
I would greatly appreciate it if you could provide feedback on my pull request once I submit it. Thank you 🙏🏼

@aeneasr
Copy link
Member

aeneasr commented Feb 2, 2023

Awesome, that's a great summary! A few more points (but we're close!)

I'm not quite sure if I understand what max_attempts_before_lockout governs, could you briefly explain why we need this configuration setting?

I thought a bit more about the table. What do you think about this:

CREATE TABLE selfservice_rate_limits (
    id UUID PRIMARY KEY,
    nid UUID,
    identity_id UUID NOT NULL REFERENCES identities,
    attempts int NOT NULL,
    next_possible_attempt TIMESTAMP NOT NULL
);

I don't think we need the enums right now, reason currently has only one value (too many login attempts) and flow_type is not yet relevant.

In business logic:

  1. If next_possible_attempt is in the past, then login is permitted
  2. If login fails, attempts is increased by one, and next_possible_attempt is updated to a future timestamp (e.g. in 30 seconds)
  3. If login succeeds, next_possible_attempt is set sufficiently to the past (to prevent clock skew issues) and attempts is set to 0. Alternativley the row could be deleted but that could be problematic with transaction contention and/or multi-region databases.

My recommendation would be the following configuration layout:

selfservice:
  flows:
    login:
      rate_limit:
        max_duration: 30s
        backoff_factor: 2
        # potentially max_attempts_before_lockout

This would make the semantics very clear - this rate limit config is for login :)

@atreya2011
Copy link
Contributor Author

atreya2011 commented Feb 3, 2023

@aeneasr

Thank you for the feedback!

While last_attempt is a clear and straightforward option, using next_possible_attempt to determine if a user is allowed to log in works well too. I am flexible and open to either approach.

The purpose of max_attempts_before_lockout is to determine the maximum number of attempts allowed before an account lockout occurs, where login would be disallowed until manually reset by an admin.

We could prevent a malicious user from making too many failed login attempts this way.

The following articles suggest that a value of 10 or higher would be appropriate for max_attempts_before_lockout. We could set the default value to 0, signifying no lockout, and offer the flexibility for the user to adjust it as they see fit in the config :)

https://www.manageengine.com/products/active-directory-audit/kb/best-practices/account-lockout-best-practices.html#:~:text=The%20recommended%20threshold%20is%2015%20to%2050
https://www.netwrix.com/account_lockout_best_practices.html

Please let me clarify how max_attempts_before_lockout would work, by making a slight modification to your proposed business logic and taking into consideration your proposed method for calculating the backoff duration.

I would recommend an linear/exponential backoff with a fixed maxmimum - so something along the lines of:

failed attempt user has to wait 100ms
failed attempt user has to wait 1s
failed attempt user has to wait 2s
failed attempt user has to wait 4s
failed attempt user has to wait 8s
failed attempt user has to wait 16s
failed attempt user has to wait 30s
failed attempt user has to wait 30s
failed attempt user has to wait 30s

  1. If next_possible_attempt is in the past, then login is permitted.
  2. If login fails, attempts is increased by one, and next_possible_attempt is updated using an exponential backoff algorithm, with a cap at max_duration.
  3. If login succeeds, next_possible_attempt is set to a time sufficiently in the past, and in the case of Go, would be set to time.Time{} (to prevent clock skew issues) and attempts is set to 0.
  4. If attempts exceed max_attempts_before_lockout, the user will be locked out and unable to login again until the admin resets the max_attempts to 0.
  5. The admin could do this, for example, with a PATCH /identities/<id>/rate-limits and a payload of {"attempts": 0} to unlock the user.

@aeneasr
Copy link
Member

aeneasr commented Feb 3, 2023

Thank you very much for the explanation around max_attempts_before_lockout. I think that there are several things missing to make max_attempts_before_lockout work really well:

  1. Automated reporting for the administrator telling them that a user was locked out
  2. A way for the user to contest the lock out (why should I not be able to sign in when someone else nuked my account?)
  3. Convenient way to fetch all locked accounts easily

Since that involves a lot of work, my suggestion is to keep this feature out of scope for now and focus on the pure implementation of this feature. If an administrator wants to lock out a user, they can also set the time quite high (an hour or more) which effectively achieves the same thing. WDYT?

@atreya2011
Copy link
Contributor Author

atreya2011 commented Feb 3, 2023

@aeneasr

I understood points 1 and 3.

Automated reporting for the administrator telling them that a user was locked out

We can use the courier in kratos to automatically send an email to the administrator regarding the lockout.

Convenient way to fetch all locked accounts easily

If we add an additional locked BOOL column in the table, it will become easy to query and fetch all locked accounts.

A way for the user to contest the lock out (why should I not be able to sign in when someone else nuked my account?)

Can you confirm if my understanding regarding point 2 is correct?
The idea behind point 2 is to provide a mechanism for the user to contest the lock-out if they believe it was unjustified. This could involve:

  • a user-initiated process, such as submitting a request through the UI
  • in case the user is locked out kratos sends a JSON response for rendering a contact form so that the user can contact the administrator
  • we should be able to configure an administrator email address in the kratos config

If an administrator wants to lock out a user, they can also set the time quite high (an hour or more) which effectively achieves the same thing. WDYT?

Yea this is also possible. While manual tracking by an administrator is possible, implementing a max_attempts_before_lockout feature would ensure automatic lockout, making it more convenient for the administrator as they wouldn't need to keep track of multiple failed login attempts from a user.

Since that involves a lot of work, my suggestion is to keep this feature out of scope for now and focus on the pure implementation of this feature.

Thank you for your input! Once I have completed the implementation of the rate-limiting feature, I will take your suggestions into consideration and prepare a revised proposal for implementing the account lockout feature, including all three points you mentioned (if that is okay with you 🙂). Shall I go ahead with the implementation of the rate-limiting feature?

@aeneasr
Copy link
Member

aeneasr commented Feb 4, 2023

Epic, I think we're fully aligned. Thank you for your patience in this process - it definitely helps align on the vision for the feature and will make implementation much easier! Let's go! :)

@atreya2011
Copy link
Contributor Author

@aeneasr

Awesome! Likewise, I thoroughly enjoyed the brainstorming process and am grateful for your feedback and approval 🙏🏼
I will proceed with implementing the discussed design 🙇🏼

@DEVXIX
Copy link

DEVXIX commented Feb 11, 2023

@aeneasr

Awesome! Likewise, I thoroughly enjoyed the brainstorming process and am grateful for your feedback and approval 🙏🏼 I will proceed with implementing the discussed design 🙇🏼

Hi, I'm kinda interested in the feature, do you guys have any ETA for shipping it?

@atreya2011
Copy link
Contributor Author

@Dparty
I am in the same boat as well. My apologies for not being able to give you an exact estimate at this time. I understand that this may be disappointing, but please know that I am actively working on it. Thank you for your understanding 🙏🏼

@credcore-dave
Copy link

credcore-dave commented Mar 6, 2023

Awesome you are working on this @atreya2011 - I think it'll be a very valuable feature, and the org I work for also needs it. I'm not that experienced with go but if I can help with implementing or testing anything, let me know.

@atreya2011
Copy link
Contributor Author

@credcore-dave
Thank you! If all goes well I should be able to get a PR up by end of this month :)

@Robert-Bosse
Copy link

Hello all.
I was reading the conversation going on here and I think what you suggested and refined for throttling repeated requests is a very good approach if a malicious actor is trying to get access to a specific account.
Do you also have an idea what to do about credential stuffing where an attacker is using (e.g. pastebin) lists of usernames/emails and passwords to try and check if he can find accounts to take over easily? Since he will only use every username/email once there is another approach needed for this attack vector.

@dbhobbs
Copy link
Contributor

dbhobbs commented Mar 22, 2023

Hello all. I was reading the conversation going on here and I think what you suggested and refined for throttling repeated requests is a very good approach if a malicious actor is trying to get access to a specific account. Do you also have an idea what to do about credential stuffing where an attacker is using (e.g. pastebin) lists of usernames/emails and passwords to try and check if he can find accounts to take over easily? Since he will only use every username/email once there is another approach needed for this attack vector.

@Robert-Bosse I think this issue might be what you're looking for?

@vinckr
Copy link
Member

vinckr commented Mar 22, 2023

@Robert-Bosse
Ory Kratos uses the haveibeenpwned API to check against a db of leaked passwords, you can also host your own version of it. I guess this mostly helps against current/past leaks, not so much against future or secret leaks.
You can also enforce Multifactor AuthN with TOTP or WebAuthn.
CAPTCHA or IP blocking only helps a bit against this attack, by far the most effective method is MFA.
Ory Network also implements capabilities to fingerprint sessions to devices, that is also pretty effective.
/I think the CSRF protection also helps, at least OWASP mentions it.

Happy to continue this in a dedicated discussion, to not derail this issue further

@Oscmage
Copy link

Oscmage commented Jun 5, 2023

What is the status of the implementation for this? I can't find any PR or work in progress. Would love this feature and would happily contribute if assistance is needed.

@atreya2011
Copy link
Contributor Author

@Oscmage Work is progressing on this slowly. My apologies for the wait 🙇‍♂️

@Oscmage
Copy link

Oscmage commented Jun 5, 2023

@atreya2011 No worries at all! Sorry if it came out like that. Happy to wait/assist :) Was just curious since this was one of the features that we were comparing our current Firebase/Google Identity Platform solution with.

@SeanTasker
Copy link

Similar situation here for us. Looking forward to this functionality. We've been trying to solve this problem for a couple of months but there hasn't been an elegant solution externally from Kratos. Is there anything that I could do to provide assistance, we've got a few developers that would be happy to contribute :)

@atreya2011
Copy link
Contributor Author

atreya2011 commented Jul 1, 2023

Similar situation here for us. Looking forward to this functionality. We've been trying to solve this problem for a couple of months but there hasn't been an elegant solution externally from Kratos. Is there anything that I could do to provide assistance, we've got a few developers that would be happy to contribute :)

@SeanTasker
Personal commitments were slowing down my progress and I am hoping to make a PR this month sometime 🙏🏼
Do you have a fork where your team has been trying to solve this?

@SeanTasker
Copy link

No problem at all! And apologies for my slow response. Our internal issue tracker was down and I wanted to check a couple of things as we originally discussed a number of options for our login page. In the end we settled on using this template https://github.com/timalanfarrow/kratos-selfservice-ui-vue3-typescript to build our own login and account creation pages. So originally we tried to solve this with a captcha, however it turned out to be insufficient as we needed to store some state server side.

We haven't forked Kratos (yet) - that's why I am here, to determine if forking will get us closer to a solution sooner. We were thinking about adding some additional hooks that would let us implement additional login restriction checks.

Let me know if we can help. We're not super familiar with the Kratos code base right now so would just need some direction to help us know where to focus our attention.

@bradleyball
Copy link

@atreya2011 any update on this functionality? Thanks!

@frederikhors
Copy link
Contributor

I think should be prioritized. We had problems with this in the past days. This is so necessary. We are thinking to change Kratos for this reason. We are very scared.

@atreya2011
Copy link
Contributor Author

@bradleyball @frederikhors
Progress is plodding on this due to many personal and professional commitments. I am looking at probably the end of this year or hopefully by January 2024 to get a PR up.
However, it seems like a new approach is being discussed here? #3580

@frederikhors
Copy link
Contributor

@atreya2011 I think this should be done from Ory Kratos staff. Is not a nice-to-have feature. This is a security MUST.

@atreya2011
Copy link
Contributor Author

@frederikhors
@aeneasr will carefully review the code before merging it to master so you don’t have to worry about who works on this issue 🙂

@frederikhors
Copy link
Contributor

No, I didn't explain myself correctly. I mean it's urgent to do. Because without it there is a BIG RISK trying a login many times until you find the correct password.

@Robert-Bosse
Copy link

@frederikhors Yes. I second that. It is something that is urgently needed.

@kmherrmann
Copy link
Contributor

This is a top priority for the Ory team and we're prioritizing protections. I'll update this issue with any upcoming changes.

@frederikhors
Copy link
Contributor

@kmherrmann any ETA? Even if not exact?

@kirill-gerasimenko-da
Copy link

@kmherrmann any ETA? Even if not exact?

It would be great to hear back on this.

@Robert-Bosse
Copy link

@kmherrmann any news for our top priority project here?

@aeneasr
Copy link
Member

aeneasr commented Jan 10, 2024

Hello, we have decided not to work on this as part of Ory Kratos. Rate limiting, credentials stuffing, IP rate limiting across multiple nodes, and DoS prevention are very difficult problems to solve (typically cat and mouse type problems) and it makes much more sense to solve them on an operational level with things like Gateway Ratelimiters, JA3 Fingerprinting, Anti Bot detection, API firewalls or services like Cloudflare or Akamai.

We have solved these things in Ory Network using a variety of tools, but that also implies that we are not implementing this in Ory Kratos itself.

Sorry for anyone who was waiting for this here! Thank you for your understanding.

@atreya2011
Copy link
Contributor Author

@aeneasr

Thank you for the clarification! Does this mean any PR related to this the feature won’t be accepted? Or does it mean that ORY won’t officially work on this feature? If ORY won’t officially work on this feature but is willing to accept a PR based on the design proposal above, I can pick up my implementation from where I left it off.

@frederikhors
Copy link
Contributor

Hello, we have decided not to work on this as part of Ory Kratos. Rate limiting, credentials stuffing, IP rate limiting across multiple nodes, and DoS prevention are very difficult problems to solve (typically cat and mouse type problems) and it makes much more sense to solve them on an operational level with things like Gateway Ratelimiters, JA3 Fingerprinting, Anti Bot detection, API firewalls or services like Cloudflare or Akamai.

We have solved these things in Ory Network using a variety of tools, but that also implies that we are not implementing this in Ory Kratos itself.

Sorry for anyone who was waiting for this here! Thank you for your understanding.

Wow. This is huge.

I think we should at least add guides on docs for these serious problems.

If not, using Kratos or others is the same at this point.

@aeneasr
Copy link
Member

aeneasr commented Jan 10, 2024

We would certainly accept PRs if the code changes are not too huge and do not require significant maintenance on our end. They should also be scoped to tackle one problem only (eg account locking on repeatedly failed attempts). Anything that is IP related should be out of scope (such as credentials stuffing) in my view as that should be dealt with on a rate limiter service.

@atreya2011
Copy link
Contributor Author

@aeneasr

Thank you for the swift response.

That’s good to hear. However, the original design proposal was for the rate limiting feature based on identity ID (not IP address) and then implement account lock in another PR.

Is that still okay?

@Robert-Bosse
Copy link

@aeneasr How do we solve credential stuffing by a bot net where no IP is used more than a handful of times?
We already had this kind of attack more than once.

@frederikhors
Copy link
Contributor

@aeneasr How do we solve credential stuffing by a bot net where no IP is used more than a handful of times? We already had this kind of attack more than once.

What are you using right now? i mean, Cloudflare? What else?

@Robert-Bosse
Copy link

no cloudflare. custom login solution. right now we check ratio of good to bad logins for activating a captcha if the ratio gets too low.

@Robert-Bosse
Copy link

Robert-Bosse commented Jan 22, 2024

@aeneasr
https://thehackernews.com/2024/01/microsofts-top-execs-emails-breached-in.html
If you have something to check good to bad login ratio you have a chance to see this attack ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfc A request for comments to discuss and share ideas.
Projects
None yet
Development

No branches or pull requests