Random delay in creating or fetching new secrets in a namespace (between 4m-7m) #2837
Comments
Hey 👋, thanks for your report! From what I can see: the ExternalSecret had a [...]
It would be great if you could capture the state of the ExternalSecret/SecretStore in a broken condition and in a working condition. That would help immensely with the debugging. So, what could be the case is that the ExternalSecret may run into an exponential backoff, or ESO's queue was too long and it was not able to process the resources in time (though 6min would be insanely long for that).
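A quick way to capture that state for comparison, assuming standard kubectl access (the resource and namespace names below are placeholders):

```sh
# Dump the full objects, including status conditions and refreshTime
kubectl get externalsecret <name> -n <namespace> -o yaml > externalsecret-broken.yaml
kubectl get clustersecretstore <store-name> -o yaml > clustersecretstore.yaml

# Events and conditions often reveal backoff/requeue behavior
kubectl describe externalsecret <name> -n <namespace>
```

Running the same commands after the secret has synced gives the "working condition" snapshot to diff against.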
Hi @moolen, logs for a new running session with debug log level:
I took a quick look at the provided information. Small remark: you have to look at the refreshTime. It took a little less than 4 minutes for it to be reconciled.
That leaves us with two assumptions:

1. the workqueue was backed up and the resource could not be processed in time, or
2. the reconcile itself took that long.

Note that ESO only logs once it has finished reconciling (in both error and non-error cases). Both assumptions can be proven/disproven by looking at the metrics, see here. The interesting parts are (1) workqueue_depth and (2) reconcile latency.
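A quick way to eyeball both metrics without a full Prometheus setup, assuming the chart's default metrics port 8080 and a deployment and namespace both named external-secrets (adjust to your install):

```sh
# Forward the operator's metrics endpoint locally
kubectl -n external-secrets port-forward deploy/external-secrets 8080:8080 &

# (1) queue depth: how many items are waiting to be processed
curl -s localhost:8080/metrics | grep workqueue_depth

# (2) reconcile latency histogram exposed by controller-runtime
curl -s localhost:8080/metrics | grep controller_runtime_reconcile_time_seconds
```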
Hey @moolen, I'm from @oranerez's team. It looks like we cannot reconcile the secrets in time; I know that we have a LOT of them. Thanks again!
Increasing `--concurrent` should help you fix the symptoms, but the underlying issue seems to be the high reconcile time. It looks like most of the ExternalSecret resources are not reconciled successfully. Increase `--concurrent` slowly, otherwise you may hammer the upstream API and run into rate limits. See: https://external-secrets.io/latest/api/controller-options/
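With the Helm chart this is typically set via the `concurrent` value; a minimal sketch (the value name matches current chart versions, but verify against your chart's values.yaml):

```yaml
# values.yaml for the external-secrets Helm chart.
# Number of concurrent ExternalSecret reconciles (rendered as --concurrent).
# Raise gradually to avoid hammering the provider API.
concurrent: 5
```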
We can't increase the `--concurrent` value. Is there anything else that can be done to resolve the issue? Our cluster contains 250-300 ExternalSecrets and we also experience a delay of up to 4 minutes before the secret is created. However, a different cluster with 80 ExternalSecrets has its secrets created instantly.
Stupid question, but can we increase the ESO replica count? @moolen?
ESO runs with leader election and only a single replica will be reconciling. @Sartigan can you provide more information on how you set up ESO (helm values etc.)? I guess it would be best to create a separate issue (or are you with OP?). Are you sure it is server-side? We also have client-side throttling which can be changed with [...]. What kind of APIs do you use in [...]? You can give [...]
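For reference, a sketch of raising those client-side limits through the chart's `extraArgs` map (the `--client-qps` and `--client-burst` flag names come from ESO's controller-options page; verify them against your version):

```yaml
# values.yaml: each extraArgs entry is passed to the controller as --key=value
extraArgs:
  client-qps: "50"     # client-side QPS limit toward the Kubernetes API
  client-burst: "100"  # burst allowance on top of the QPS limit
```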
Wait, did that resolve all the problems?!
Our problem was the same as OP: it happens when a cluster has a high number of ExternalSecrets. The [...]. Adding [...]
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.
We are experiencing the same slowness. We didn't try with [...]. Related: [...]
In our case, we are synchronizing an image pull secret and a couple of other static secrets to all namespaces. We have ~800 namespaces, each having one ExternalSecret. We also tried having only a single ClusterExternalSecret and annotating the namespaces, but it didn't perform any better.

The Secret synchronization speed and workqueue depth become an issue when all the ExternalSecrets are to be refreshed at the same time. With leader election, only a single replica of external-secrets-operator processes the workqueue. This obviously becomes a bottleneck. In our use case the synced secret is static and will rarely change (rotation period is a matter of months).
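For context, a minimal sketch of the ClusterExternalSecret variant they describe (all names, labels, and the remote key are illustrative):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterExternalSecret
metadata:
  name: image-pull-secret
spec:
  # Target every namespace carrying this (hypothetical) label
  namespaceSelector:
    matchLabels:
      sync-pull-secret: "true"
  externalSecretSpec:
    refreshInterval: 720h  # the secret is static; rotation happens every few months
    secretStoreRef:
      kind: ClusterSecretStore
      name: aws-secrets-manager
    target:
      name: image-pull-secret
      template:
        type: kubernetes.io/dockerconfigjson
    data:
      - secretKey: .dockerconfigjson
        remoteRef:
          key: platform/image-pull-secret
```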
We settled with the following, having RAM usage ~100Mi with 800 namespaces:

```yaml
- args:
    - '--enable-leader-election=true'
    - '--enable-secrets-caching=true'
    - '--concurrent=5'
```

In our case [...]. It seems [...]. Related issues: [...]
Describe the bug
When creating a new namespace, we observed that the ClusterSecretStore is not creating/fetching secrets based on refreshInterval (set to 10s), and there is some delay in the creation of secrets (between 4 and 7 minutes).
The ExternalSecret manifest is created almost right away, but the actual creation of the secret after fetching it from AWS Secrets Manager takes some time (please see screenshot): several minutes pass until refreshTime gets populated, which indicates the secret was fetched and created locally.
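For reference, the relevant shape of such an ExternalSecret; the name and namespace below are taken from the log line further down, while the store name and remote key are illustrative:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: artifactory-docker-registry
  namespace: oran-test9
spec:
  refreshInterval: 10s          # the expected sync cadence
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager   # illustrative store name
  target:
    name: artifactory-docker-registry
  data:
    - secretKey: password
      remoteRef:
        key: dev-us-east-1/secret/secret-name1  # key pattern as in the logs below
```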
In the ESO log, the following entry takes the same amount of time before being reported:
{"level":"info","ts":1698923181.4201677,"logger":"controllers.ExternalSecret","msg":"reconciled secret","ExternalSecret":{"name":"artifactory-docker-registry","namespace":"oran-test9"}}
Until then, the log is quite stable, without any indication of a change being made:
{"level":"info","ts":1698923177.8425815,"logger":"provider.aws","msg":"using aws session","region":"us-east-1","external id":"","credentials":{}} {"level":"info","ts":1698923177.8426237,"logger":"provider.aws.secretsmanager","msg":"fetching secret map","key":"dev-us-east-1/secret/secret-name1"} {"level":"info","ts":1698923177.8426342,"logger":"provider.aws.secretsmanager","msg":"fetching secret value","key":"dev-us-east-1/secret/secret-name2","version":"AWSCURRENT","value":"SECRET"} {"level":"info","ts":1698923179.8481295,"logger":"provider.aws","msg":"using aws session","region":"us-east-1","external id":"","credentials":{}} {"level":"info","ts":1698923179.8481748,"logger":"provider.aws.secretsmanager","msg":"fetching secret map","key":"dev-us-east-1/secret/secret-name3"} {"level":"info","ts":1698923179.8481805,"logger":"provider.aws.secretsmanager","msg":"fetching secret value","key":"dev-us-east-1/global/playwright-credentials","version":"AWSCURRENT","value":"SECRET"} {"level":"info","ts":1698923180.2435653,"logger":"provider.aws","msg":"using aws session","region":"us-east-1","external id":"","credentials":{}} {"level":"info","ts":1698923180.2436075,"logger":"provider.aws.secretsmanager","msg":"fetching secret map","key":"dev-us-east-1/secret/secret-name4"} {"level":"info","ts":1698923180.2436125,"logger":"provider.aws.secretsmanager","msg":"fetching secret value","key":"dev-us-east-1/mongo/mongo4-user-and-password","version":"AWSCURRENT","value":"SECRET"} {"level":"info","ts":1698923180.4427361,"logger":"provider.aws","msg":"using aws session","region":"us-east-1","external id":"","credentials":{}} {"level":"info","ts":1698923180.44278,"logger":"provider.aws.secretsmanager","msg":"fetching secret map","key":"dev-us-east-1/secret/secret-name5"} {"level":"info","ts":1698923180.4427857,"logger":"provider.aws.secretsmanager","msg":"fetching secret value","key":"dev-us-east-1/global/alertmanager-pagerduty","version":"AWSCURRENT","value":"SECRET"} {"level":"info","ts":1698923180.6423898,"logger":"provider.aws","msg":"using aws session","region":"us-east-1","external id":"","credentials":{}} {"level":"info","ts":1698923180.6424341,"logger":"provider.aws.secretsmanager","msg":"fetching secret map","key":"dev-us-east-1/secret/secret-name6"} {"level":"info","ts":1698923180.6424391,"logger":"provider.aws.secretsmanager","msg":"fetching secret value","key":"dev-us-east-1/secret/secret-name7","version":"AWSCURRENT","value":"SECRET"} {"level":"info","ts":1698923180.844502,"logger":"provider.aws","msg":"using aws session","region":"us-east-1","external id":"","credentials":{}} {"level":"info","ts":1698923180.8445442,"logger":"provider.aws.secretsmanager","msg":"fetching secret map","key":"dev-us-east-1/secret/secret-name8"} {"level":"info","ts":1698923180.844549,"logger":"provider.aws.secretsmanager","msg":"fetching secret value","key":"dev-us-east-1/automation-tests-data-platform","version":"AWSCURRENT","value":"SECRET"}
To Reproduce
[screenshot: ExternalSecret list showing refreshTime populated only minutes after creation]
Setup ESO v0.9.7 (using helm-chart) as follows:
Create ClusterSecretStore (we are using Terraform for that one) as follows:
[screenshot: ClusterSecretStore definition]
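Since the screenshot does not render here: a minimal sketch of what such a ClusterSecretStore typically looks like for AWS Secrets Manager on EKS (the IRSA-style jwt auth and all names are assumptions):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1          # region as seen in the logs above
      auth:
        jwt:                     # IRSA / service-account based auth (assumption)
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
```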
We are using ArgoCD and a local helm-chart repository to bring up all applications under a new namespace, which in turn creates all the manifests, and therefore creates the application that instructs creation of local secrets using ESO (the secret provider in our scenario is AWS Secrets Manager).
Kubernetes version: EKS 1.26
ESO version: 0.9.7
Expected behavior
Creation of new secrets locally using ESO (fetched from AWS Secrets Manager) should be driven by refreshInterval. As of now it takes place at a random point between 4 and 7 minutes, which delays the creation of new namespaces and impacts our test-automation mechanism.
Screenshots
Attached inline
Additional context
N/A