Add scaling section hydra #53

Closed · aeneasr opened this issue Sep 26, 2018 · 6 comments
aeneasr (Member) commented Sep 26, 2018

See https://community.ory.sh/t/high-availability-and-scaling/664/11

ghost commented Jun 14, 2019

We're considering rolling out ory/hydra as our OpenID Connect and OAuth2 solution; as such, it would be great to understand the best practice for deploying Hydra in a highly available manner in production.

(As per the comments in issue #772, ory/hydra appears to be running well in the wild; any real-world examples of highly available production Hydra deployments would also be much appreciated. /cc @rjw57 @dtt101 @pnicolcev-tulipretail)

aeneasr (Member, Author) commented Jun 14, 2019

Run it against a strong database with, e.g., one or two read replicas, and scale the Hydra pod/container horizontally to, e.g., three or four nodes. Nothing more is required.
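
For concreteness, a minimal sketch of that setup on Kubernetes, assuming a Deployment that points every replica at the same external managed database via the DSN environment variable (the resource names, image tag, and Secret are illustrative assumptions, not an official manifest), could look like this:

```yaml
# Sketch only: stateless Hydra replicas sharing one external managed database.
# Resource names, image tag, and the hydra-db Secret are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hydra
spec:
  replicas: 3                      # scale the stateless Hydra pods, e.g. 3-4
  selector:
    matchLabels:
      app: hydra
  template:
    metadata:
      labels:
        app: hydra
    spec:
      containers:
        - name: hydra
          image: oryd/hydra:v1.4   # hypothetical tag; pin the version you actually run
          args: ["serve", "all"]
          env:
            - name: DSN            # e.g. postgres://user:pass@managed-db:5432/hydra
              valueFrom:
                secretKeyRef:
                  name: hydra-db   # hypothetical Secret holding the connection string
                  key: dsn
```

Read replicas live on the database side (e.g. a managed Postgres/MySQL with one or two replicas); since Hydra itself keeps no local state, scaling out is just a matter of raising `replicas`, e.g. `kubectl scale deployment/hydra --replicas=4`.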

ghost commented Jun 17, 2019

Re. the above, we’ve experimented with scaling the number of ory/hydra pod replicas on Kubernetes, with a managed RDBMS specified in the DATABASE_URL/dsn, and have observed intermittent 401 responses when attempting to retrieve tokens:

description="Client authentication failed (e.g., unknown client, no client authentication included, or unsupported authentication method)" error=invalid_client

Other than these intermittent errors, token requests are granted.

As issue #1319 highlights, this appears to be because:

Hydra naturally introduces some level of state when being deployed, which is often not suitable for a Kubernetes environment.

Could you confirm whether that is indeed the case, and whether there are any configuration changes we could make in the meantime while

https://github.com/ory/hydra-k8s-controller

is under development? (Happy to move this discussion to issue #1319 for the sake of continuity, if that helps.)

/cc @kminehart

aeneasr (Member, Author) commented Jun 17, 2019

As issue #1319 highlights, this appears to be because:

That comment is taken out of context, and your interpretation of it is not accurate.

Re. the above, we’ve experimented with scaling the number of ory/hydra pod replicas on Kubernetes, with a managed RDBMS specified in the DATABASE_URL/dsn, and have observed intermittent 401 responses when attempting to retrieve tokens:

We have observed this before. It was caused by a lack of resources, specifically pod CPU/memory and/or a very underpowered database. In particular, the token endpoint requires substantial CPU time as request volume grows, because the OAuth 2.0 client secret is bcrypt-hashed and must be verified on every token request.
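
As a rough illustration of the pod-side fix (the numbers below are assumptions for a sketch, not values from this thread), giving the Hydra container explicit CPU/memory requests keeps the token endpoint from being starved while it verifies the bcrypt-hashed client secret:

```yaml
# Illustrative resource settings for the Hydra container; the values are
# assumptions, not tuned recommendations.
resources:
  requests:
    cpu: "500m"        # /oauth2/token spends most of its time on bcrypt verification
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"
```

If CPU remains the bottleneck, the bcrypt work factor used for client secrets is configurable in Hydra (check the documentation for your release), trading some hashing strength for cheaper token requests.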

However, please create an issue on the ORY Hydra GitHub repository. There is a lot of context missing here - logs, k8s config, DB config, Hydra config, DSN config, a reproducible example, ... - and without that it is impossible to help.

ORY Hydra is deployed in environments that handle > 500m API requests per month and works flawlessly when scaled horizontally.

kminehart (Contributor) commented
@category I was merely talking about the database. Databases, especially Postgres and MySQL, are difficult to maintain in Kubernetes; the "state" introduced there would be managed by a Kubernetes StatefulSet. You yourself use a hosted solution for your database. Hydra itself, however, can be managed with stateless horizontal scaling.

The Kubernetes controller won't solve your problem, as Hydra will still need a place to store and manage requests, and Kubernetes resources are definitely not the place to do that.

ghost commented Jun 18, 2019

@aeneasr Thanks very much for the information and guidance here - I'll try adjusting the resources provided, with a view to creating an issue (with the missing context) if the errors persist 😄.

@kminehart Thanks very much for the clarification, I wasn't aware that the issue was specifically describing the state introduced by databases deployed within Kubernetes 😄.

Lastly, thank you both for taking the time to reply here, I really appreciate it.

aeneasr closed this as completed Nov 20, 2020