
No timeout for calls to rucio auth server #4012

Closed
nikmagini opened this issue Sep 18, 2020 · 4 comments · Fixed by #4729

Comments

@nikmagini
Contributor

Motivation

Requests to the rucio auth server in rucio.client.baseclient have no timeout set, so the rucio client can hang for a long time if the server is unavailable. In the example below, the first two auth attempts are each aborted after 24 h, and authentication succeeds only on the third attempt, two days after the first:

https://aipanda175.cern.ch/condor_logs_1/20-09-14_23/grid.4087595.11.out

2020-09-15 02:24:42,199 | INFO | copytool_in | pilot.util.auxiliary.4841308600 | transfer | trying to use copytool=rucio for activity=['pr', 'default']
2020-09-15 02:24:43,535 | DEBUG | copytool_in | rucio.client.baseclient | init | no auth_type passed. Trying to get it from the environment variable RUCIO_AUTH_TYPE and config file.
2020-09-15 02:24:43,535 | DEBUG | copytool_in | rucio.client.baseclient | init | no creds passed. Trying to get it from the config file.
2020-09-15 02:24:43,535 | DEBUG | copytool_in | rucio.client.baseclient | init | no ca_cert passed. Trying to get it from the config file.
2020-09-15 02:24:43,536 | DEBUG | copytool_in | rucio.client.baseclient | init | no account passed. Trying to get it from the config file.
2020-09-15 02:24:43,536 | DEBUG | copytool_in | rucio.client.baseclient | __get_token | get a new token
2020-09-15 02:24:43,543 | DEBUG | copytool_in | urllib3.connectionpool | _new_conn | Starting new HTTPS connection (1): voatlasrucio-auth-prod.cern.ch:443
[...]
2020-09-16 02:25:04,730 | WARNING | copytool_in | rucio.client.baseclient | __get_token_x509 | ConnectionError: ('Connection aborted.', BadStatusLine("''",))
2020-09-16 02:25:04,739 | DEBUG | copytool_in | urllib3.connectionpool | _new_conn | Starting new HTTPS connection (2): voatlasrucio-auth-prod.cern.ch:443
[...]
2020-09-17 02:25:19,010 | WARNING | copytool_in | rucio.client.baseclient | __get_token_x509 | ConnectionError: ('Connection aborted.', BadStatusLine("''",))
2020-09-17 02:25:19,019 | DEBUG | copytool_in | urllib3.connectionpool | _new_conn | Starting new HTTPS connection (3): voatlasrucio-auth-prod.cern.ch:443
2020-09-17 02:25:19,112 | DEBUG | copytool_in | urllib3.connectionpool | _make_request | https://voatlasrucio-auth-prod.cern.ch:443 "GET /auth/x509_proxy HTTP/1.1" 200 0
2020-09-17 02:25:19,114 | DEBUG | copytool_in | rucio.client.baseclient | __get_token_x509 | got new token

Modification

Pass the rucio client's timeout setting to auth requests as well.

Expected result

Auth requests should time out after the client timeout (10 minutes by default).
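The multi-day hang in the log is what happens when a client socket has no timeout and the server completes the TCP handshake but never answers. The sketch below reproduces that situation locally and shows that a timeout bounds the wait. It uses the standard library's `http.client` as a stand-in for the client's HTTP stack (requests/urllib3, judging by the log), so it runs without extra dependencies; requests' request methods accept a similar `timeout` argument. The local "black-hole" server, the helper name `timed_get`, and the 0.5 s timeout are illustrative assumptions; the issue's proposed default is 10 minutes.

```python
import http.client
import socket
import time


def timed_get(host, port, path, timeout):
    """GET `path`; return (status, seconds_waited), with status None on timeout."""
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()  # blocks until the server answers or the timeout fires
        return resp.status, time.monotonic() - start
    except socket.timeout:  # alias of TimeoutError on Python 3.10+
        return None, time.monotonic() - start
    finally:
        conn.close()


# Hypothetical unresponsive server: the listening socket lets the kernel
# complete the TCP handshake (via the backlog), but nothing ever reads or replies.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]
    status, elapsed = timed_get("127.0.0.1", port, "/auth/x509_proxy", timeout=0.5)
finally:
    server.close()

print(status, round(elapsed, 2))
```

Without the `timeout` argument, the `getresponse()` call above would block until the peer closed the connection, which is the behaviour seen in the pilot log.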

@nikmagini
Contributor Author

nikmagini commented Sep 29, 2020

To estimate the impact: since the beginning of September, on average around 4k cores on P1 have been occupied by pilots waiting for rucio auth until they hit the queue walltime limit.

bari12 assigned davidpob99 and unassigned mlassnig Jul 2, 2021
@bari12
Member

bari12 commented Jul 2, 2021

@davidpob99 I am not sure whether this ticket is still accurate. Please check with @rcarpa.

@rcarpa
Contributor

rcarpa commented Jul 2, 2021

May still be relevant. I'll check.

@rcarpa
Contributor

rcarpa commented Jul 5, 2021

I confirm that it's still relevant. No timeout is set for x509 auth requests, nor in other places where the session is used directly instead of going through _send_request.
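One general way to close the gap described above is to route every direct session call through a single wrapper that injects the client's timeout, so code paths that bypass `_send_request` cannot forget it. The sketch below is a hypothetical illustration, not the fix merged in #4729: `TimeoutEnforcingSession`, `DEFAULT_TIMEOUT`, and `RecordingSession` are invented names, and the wrapper only assumes the wrapped object has a `request(method, url, **kwargs)` method that accepts `timeout`, as `requests.Session.request` does.

```python
from typing import Any, Protocol

DEFAULT_TIMEOUT = 600  # seconds; hypothetical constant matching the 10-minute default in the issue


class SupportsRequest(Protocol):
    def request(self, method: str, url: str, **kwargs: Any) -> Any: ...


class TimeoutEnforcingSession:
    """Hypothetical wrapper: every call carries a timeout unless the caller set one."""

    def __init__(self, session: SupportsRequest, timeout: float = DEFAULT_TIMEOUT):
        self._session = session
        self._timeout = timeout

    def request(self, method: str, url: str, **kwargs: Any) -> Any:
        # An explicit timeout from the caller wins; otherwise apply the default.
        kwargs.setdefault("timeout", self._timeout)
        return self._session.request(method, url, **kwargs)

    def get(self, url: str, **kwargs: Any) -> Any:
        return self.request("GET", url, **kwargs)

    def post(self, url: str, **kwargs: Any) -> Any:
        return self.request("POST", url, **kwargs)


class RecordingSession:
    """Stand-in for a real HTTP session; records the arguments it receives."""

    def __init__(self) -> None:
        self.calls = []

    def request(self, method: str, url: str, **kwargs: Any) -> str:
        self.calls.append((method, url, kwargs))
        return "ok"


fake = RecordingSession()
session = TimeoutEnforcingSession(fake)
# An x509-style call that never mentions a timeout still gets the default one.
session.get("https://auth.example/auth/x509_proxy", cert=("usercert.pem", "userkey.pem"))
# A caller that needs a tighter bound can still override it.
session.post("https://auth.example/auth/validate", timeout=30)
```

Wrapping (rather than patching each call site) keeps the default in one place; the trade-off is that code holding a reference to the raw session can still bypass it.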

davidpob99 added a commit to davidpob99/rucio that referenced this issue Jul 9, 2021
bari12 pushed a commit that referenced this issue Jul 30, 2021
* Clients: timeout for calls to rucio auth server. Closes #4012

* Fix syntax problems
bari12 pushed a commit that referenced this issue Jul 30, 2021
* Clients: timeout for calls to rucio auth server. Closes #4012

* Fix syntax problems
bari12 modified the milestones: 1.23.16-clients, 1.26.1-clients Jul 30, 2021