finisher doesn't handle correctly destination RSE protocol change #5497

Closed
rcarpa opened this issue Apr 22, 2022 · 0 comments
rcarpa commented Apr 22, 2022

Motivation

2022-04-22 14:06:30,375 root    1       WARNING conveyor-finisher[18/43]: ERROR WHEN HANDLING REQUEST a658a73865fb4d848d388f85f37c954f DID tests:step14.53620.59502.recon.ESD.66267.51307 AT RSE 48c687eab02a4f3692f53419e958a5b3 STATE ReplicaState.AVAILABLE: Replica not found
Details: No row found for scope: tests name: step14.53620.59502.recon.ESD.66267.51307 rse: NDGF-T1_TEST
2022-04-22 14:06:30,375 root    1       INFO    conveyor-finisher[18/43]: Replica cannot be found. Adding a replica tests:step14.53620.59502.recon.ESD.66267.51307 AT RSE 48c687eab02a4f3692f53419e958a5b3 with tombstone=utcnow
2022-04-22 14:06:30,389 root    1       ERROR   conveyor-finisher[18/43]: Cannot register replica for DID tests:step14.53620.59502.recon.ESD.66267.51307 at RSE 48c687eab02a4f3692f53419e958a5b3 - potential dark data - RSE does not support requested protocol.
Details: No protocol for provided settings found : {'availability_delete': True, 'availability_read': True, 'availability_write': True, 'credentials': None, 'deterministic': True, 'domain': ['lan', 'wan'], 'id': '48c687eab02a4f3692f53419e958a5b3', 'lfn2pfn_algorithm': 'hash', 'protocols': [{'hostname': 'preprod-srm.ndgf.org', 'scheme': 'davs', 'port': 443, 'prefix': '/atlas/disk/atlasdatadisk/rucio/', 'impl': 'rucio.rse.protocols.gfal.Default', 'domains': {'lan': {'read': 1, 'write': 1, 'delete': 1}, 'wan': {'read': 1, 'write': 1, 'delete': 1, 'third_party_copy': 1, 'third_party_copy_read': 1, 'third_party_copy_write': 1}}, 'extended_attributes': None}], 'qos_class': None, 'rse': 'NDGF-T1_TEST', 'rse_type': 'DISK', 'sign_url': None, 'staging_area': False, 'verify_checksum': True, 'volatile': False, 'read_protocol': 1, 'write_protocol': 1, 'delete_protocol': 1, 'third_party_copy_protocol': 1, 'third_party_copy_read_protocol': 1, 'third_party_copy_write_protocol': 1}.
2022-04-22 14:06:30,391 root    1       ERROR   conveyor-finisher[18/43]: Something unexpected happened when updating replica state for transfer tests:step14.53620.59502.recon.ESD.66267.51307 at 48c687eab02a4f3692f53419e958a5b3 (RSE does not support requested protocol.

The (unverified) assumption is that these errors come from a protocol configuration change on the destination RSE.
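One way to probe that assumption is to compare the PFN recorded for the transfer with the protocols the RSE currently advertises. The sketch below is not Rucio's actual matching logic: `matching_protocols` is a hypothetical helper, the settings dictionary is a trimmed copy of the one printed in the log above, and both PFNs are made up for illustration (the log does not show the PFN that was requested).

```python
from urllib.parse import urlparse


def matching_protocols(rse_settings, pfn):
    """Return the protocol entries whose scheme, hostname and prefix match the PFN.

    Hypothetical helper: a simplified stand-in for the protocol lookup,
    not the function Rucio itself uses.
    """
    url = urlparse(pfn)
    matches = []
    for protocol in rse_settings.get("protocols", []):
        if protocol["scheme"] != url.scheme:
            continue
        if protocol["hostname"] != url.hostname:
            continue
        if not url.path.startswith(protocol["prefix"]):
            continue
        matches.append(protocol)
    return matches


# Trimmed copy of the RSE settings from the log: only davs is configured.
settings = {
    "rse": "NDGF-T1_TEST",
    "protocols": [
        {
            "hostname": "preprod-srm.ndgf.org",
            "scheme": "davs",
            "port": 443,
            "prefix": "/atlas/disk/atlasdatadisk/rucio/",
        }
    ],
}

# Assumed PFNs, for illustration only.
old_pfn = "srm://preprod-srm.ndgf.org:8443/atlas/disk/atlasdatadisk/rucio/tests/aa/bb/file"
new_pfn = "davs://preprod-srm.ndgf.org:443/atlas/disk/atlasdatadisk/rucio/tests/aa/bb/file"

print(matching_protocols(settings, old_pfn))       # no entry for the srm scheme
print(len(matching_protocols(settings, new_pfn)))  # the davs entry matches
```

If the transfer was submitted with a PFN whose scheme has since been removed from the RSE, a lookup of this kind would come back empty, which would be consistent with the "No protocol for provided settings found" message.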

@rcarpa rcarpa self-assigned this Apr 22, 2022
rcarpa added a commit to rcarpa/rucio that referenced this issue Jul 11, 2022
Reduce the scope of try/except blocks and avoid code duplication.

Rework the recovery from add_replica: if RSEProtocolNotSupported is
raised, it's probably that the RSE configuration changed since the
submission. On deterministic RSEs we can safely recover from the issue
by creating a replica with pfn=None: our goal is to add a temporary
replica to trigger deletion by reaper and reduce probability of dark
data, so using a specific protocol for that is not a priority

Add some comments.
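
The recovery described in this commit message can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from the commit: `add_replica`, `register_temporary_replica`, `SUPPORTED_SCHEMES` and the dictionary used as a replica table are hypothetical stand-ins, and the local `RSEProtocolNotSupported` class only mirrors the exception name mentioned above.

```python
class RSEProtocolNotSupported(Exception):
    """Stand-in for the exception named in the commit message."""


# Assumed current RSE configuration: only davs is still supported.
SUPPORTED_SCHEMES = {"davs"}


def add_replica(replicas, did, pfn=None):
    """Hypothetical stand-in for replica registration.

    Raises RSEProtocolNotSupported when a PFN is given and its scheme
    is not configured on the RSE any more.
    """
    if pfn is not None:
        scheme = pfn.split("://", 1)[0]
        if scheme not in SUPPORTED_SCHEMES:
            raise RSEProtocolNotSupported(f"no protocol for scheme {scheme!r}")
    replicas[did] = pfn


def register_temporary_replica(replicas, did, pfn, deterministic):
    """Try to register the replica; fall back to pfn=None on deterministic RSEs.

    Returns True if a replica was registered, False if the file may
    remain as dark data.
    """
    try:
        add_replica(replicas, did, pfn=pfn)
        return True
    except RSEProtocolNotSupported:
        if not deterministic:
            # Without a deterministic path there is no safe fallback.
            return False
    # Deterministic RSE: the path can be derived from the DID, so the
    # temporary replica does not need a protocol-specific PFN.
    add_replica(replicas, did, pfn=None)
    return True


replicas = {}
ok = register_temporary_replica(
    replicas, "tests:file", "srm://host/path", deterministic=True
)
print(ok, replicas)
```

Keeping the `try` block limited to the single call that can raise follows the "reduce the scope of try/except blocks" point; the fallback call sits outside the handler so that an unexpected error from it is not silently swallowed.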
@bari12 bari12 closed this as completed in fc567a0 Jul 25, 2022
bari12 added a commit that referenced this issue Jul 25, 2022
…rse_protocol_change

Transfers: rework update_replica in finisher. Closes #5497
bari12 pushed a commit that referenced this issue Jul 25, 2022
@bari12 bari12 added this to the 1.29.1 milestone Jul 25, 2022