No timeout for calls to rucio auth server #4012
Comments
For an estimate of the impact: since the beginning of September, on average around 4k cores on P1 have been occupied by pilots waiting for rucio auth until they hit the queue walltime limit.
@davidpob99 I am not sure if this ticket is still accurate. Please check with @rcarpa about it.
May still be relevant. I'll check.
I confirm that it's still relevant. The timeout is not set for x509 auth requests, nor in other places where the session is used directly instead of going through _send_request.
Motivation
The requests to the rucio auth server in rucio.client.baseclient don't have any timeout set, so the rucio client can hang for a long time if the server is unavailable. In the example below, the first two auth attempts are each aborted only after about 24 hours, and authentication succeeds on the third attempt, two days after the first request:
https://aipanda175.cern.ch/condor_logs_1/20-09-14_23/grid.4087595.11.out
2020-09-15 02:24:42,199 | INFO | copytool_in | pilot.util.auxiliary.4841308600 | transfer | trying to use copytool=rucio for activity=['pr', 'default']
2020-09-15 02:24:43,535 | DEBUG | copytool_in | rucio.client.baseclient | init | no auth_type passed. Trying to get it from the environment variable RUCIO_AUTH_TYPE and config file.
2020-09-15 02:24:43,535 | DEBUG | copytool_in | rucio.client.baseclient | init | no creds passed. Trying to get it from the config file.
2020-09-15 02:24:43,535 | DEBUG | copytool_in | rucio.client.baseclient | init | no ca_cert passed. Trying to get it from the config file.
2020-09-15 02:24:43,536 | DEBUG | copytool_in | rucio.client.baseclient | init | no account passed. Trying to get it from the config file.
2020-09-15 02:24:43,536 | DEBUG | copytool_in | rucio.client.baseclient | __get_token | get a new token
2020-09-15 02:24:43,543 | DEBUG | copytool_in | urllib3.connectionpool | _new_conn | Starting new HTTPS connection (1): voatlasrucio-auth-prod.cern.ch:443
[...]
2020-09-16 02:25:04,730 | WARNING | copytool_in | rucio.client.baseclient | __get_token_x509 | ConnectionError: ('Connection aborted.', BadStatusLine("''",))
2020-09-16 02:25:04,739 | DEBUG | copytool_in | urllib3.connectionpool | _new_conn | Starting new HTTPS connection (2): voatlasrucio-auth-prod.cern.ch:443
[...]
2020-09-17 02:25:19,010 | WARNING | copytool_in | rucio.client.baseclient | __get_token_x509 | ConnectionError: ('Connection aborted.', BadStatusLine("''",))
2020-09-17 02:25:19,019 | DEBUG | copytool_in | urllib3.connectionpool | _new_conn | Starting new HTTPS connection (3): voatlasrucio-auth-prod.cern.ch:443
2020-09-17 02:25:19,112 | DEBUG | copytool_in | urllib3.connectionpool | _make_request | https://voatlasrucio-auth-prod.cern.ch:443 "GET /auth/x509_proxy HTTP/1.1" 200 0
2020-09-17 02:25:19,114 | DEBUG | copytool_in | rucio.client.baseclient | __get_token_x509 | got new token
Modification
Pass the rucio client timeout setting to auth requests as well.
Expected result
Auth requests should time out after 10 minutes by default.
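To make the proposal concrete, here is a minimal sketch of what bounding an auth request could look like. It is not rucio's actual code: `fetch_auth_response`, `DEFAULT_TIMEOUT` and the `retries` parameter are hypothetical names, and the 600 s value is only assumed to mirror the 10-minute default mentioned above. The `session` argument is assumed to behave like a `requests.Session`, whose `get()` accepts a `timeout` keyword.

```python
DEFAULT_TIMEOUT = 600  # seconds; assumed to mirror the 10-minute default named in this issue


def fetch_auth_response(session, url, headers=None, timeout=DEFAULT_TIMEOUT, retries=3):
    """Call the auth endpoint, bounding each attempt with `timeout`.

    Hypothetical helper, not rucio's actual code. `session` is expected to
    behave like requests.Session, i.e. expose get(url, headers=..., timeout=...).
    """
    last_error = None
    for _ in range(retries):
        try:
            # Without timeout=..., requests waits indefinitely for the server.
            return session.get(url, headers=headers, timeout=timeout)
        except OSError as error:
            # requests' ConnectionError and Timeout derive from IOError/OSError,
            # so this also catches them when a real requests.Session is used.
            last_error = error
    raise RuntimeError("auth request failed after %d attempts" % retries) from last_error
```

With a real session this would be called as `fetch_auth_response(requests.Session(), auth_url)`. Note that in `requests` the timeout bounds the connection attempt and each wait for data from the server, not the total duration of the call; a `(connect, read)` tuple can be passed instead of a single number if the two should differ.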