Refactor _list_replicas and correctly implement xroot prioritization for archives #4473

rcarpa · 2021-03-22T11:02:57Z

Motivation

Xroot prioritization for constituent replicas was implemented in #2313, but the implementation was rushed and led to undesired side effects. We tried to apply a quick fix in #4380, but the fix doesn't seem easy, in particular because we recursively call list_replicas on the archive in order to construct the constituent replicas. Some information needed for correct prioritization is lost when returning from this recursive call.

Modification

_list_replicas has grown to be quite complex. Add more functional tests to ensure lack of regressions; refactor the function; and ensure constituent replicas can be correctly prioritized.

rcarpa · 2021-05-11T07:04:20Z

I'll start working on this one

…grouping Core & Internals: improve list_replica grouping by scope/name. #4473

The for loop assumes that replicas are returned sorted by (scope, name). As long as the scope/name stays the same, each replica is appended to the current file. When the scope/name changes, the file is yielded and construction of a new file begins. Because the same scope/name can be returned by both _list_replicas_for_datasets and _list_replicas_for_files, and the two results are not ordered relative to each other, this logic breaks in some cases. Fix the issue by merging the two data sources into a single, correctly ordered sequence.
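
A minimal sketch of the idea, not the actual Rucio code: two sources that are each already sorted by (scope, name) are merged into one ordered stream, which can then be grouped safely. The row layout `(scope, name, rse)` and the helper names `merge_sorted_sources` and `group_by_file` are assumptions made for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Hypothetical row layout: (scope, name, rse). Real rows carry more columns.
scope_name = itemgetter(0, 1)


def merge_sorted_sources(*sources):
    """Lazily merge iterables that are each already sorted by (scope, name)."""
    return heapq.merge(*sources, key=scope_name)


def group_by_file(rows):
    """Yield one dict per (scope, name); requires rows sorted by (scope, name)."""
    for (scope, name), group in groupby(rows, key=scope_name):
        yield {'scope': scope, 'name': name, 'rses': [row[2] for row in group]}


# Stand-ins for the two data sources; 'mock/file_a' appears in both.
from_datasets = [('mock', 'file_a', 'RSE_1'), ('mock', 'file_c', 'RSE_1')]
from_files = [('mock', 'file_a', 'RSE_2'), ('mock', 'file_b', 'RSE_3')]

files = list(group_by_file(merge_sorted_sources(from_datasets, from_files)))
```

Simply chaining the two sources would yield `file_a` twice, once per source. Merging first keeps all replicas of one scope/name adjacent, which is what the grouping loop relies on.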

Core & Internals: directly fetch constituents in list_replicas. #4473

When listing archive constituent replicas, no longer perform a recursive call to list the replicas of the archive. This makes it possible to iterate over all available schemes of the archive file and to better prioritize the root protocol when downloading constituents.
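
As a rough illustration of what prioritizing the root protocol could look like once all schemes of the archive are visible: the sketch below reorders a list of archive PFNs so that `root://` entries come first and the others keep their relative order. The function name `prioritize_root` and the example URLs are hypothetical; the real implementation works on richer replica records.

```python
from urllib.parse import urlsplit


def prioritize_root(pfns):
    """Return the PFNs with root:// entries first (hypothetical helper).

    sorted() is stable, so PFNs with the same key keep their input order.
    The key is False for root PFNs and True otherwise, and False sorts first.
    """
    return sorted(pfns, key=lambda pfn: urlsplit(pfn).scheme != 'root')


# Made-up PFNs of one archive, available through three schemes.
archive_pfns = [
    'davs://web.example.org:443/data/archive.zip',
    'root://xrd.example.org:1094//data/archive.zip',
    'gsiftp://ftp.example.org:2811/data/archive.zip',
]

ordered = prioritize_root(archive_pfns)
```

With a recursive call that already filtered or ranked the archive replicas, the root entry might never reach this step, which is the information loss described in the issue.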

…_replicas Core & Internals: simplify list_replica response creation. Closes #4473

Extract pfn/rse sorting into a separate function and call it in the two places where it is needed (each time before yielding the file). Only resolve parents once per scope/name. As a side note, parents are not resolved for archives, but this never worked, so I don't see it as a regression in #4473.
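
The structure described above can be sketched as follows, assuming a simplified row layout `(scope, name, url, priority)`. The names `sort_pfns` and `build_files` are hypothetical and do not mirror the real function signatures; the point is only that the sorting lives in one helper with two call sites, one when the scope/name changes and one for the last file.

```python
def sort_pfns(file):
    """Hypothetical helper: order one file's PFNs by priority, in place."""
    file['pfns'].sort(key=lambda pfn: pfn['priority'])
    return file


def build_files(rows):
    """Group rows sorted by (scope, name) into files, sorting before each yield."""
    current = None
    for scope, name, url, priority in rows:
        if current is not None and (current['scope'], current['name']) != (scope, name):
            yield sort_pfns(current)  # call site 1: scope/name changed
            current = None
        if current is None:
            # Per-file work (such as resolving parents) would happen once, here.
            current = {'scope': scope, 'name': name, 'pfns': []}
        current['pfns'].append({'url': url, 'priority': priority})
    if current is not None:
        yield sort_pfns(current)  # call site 2: last file in the stream


rows = [
    ('mock', 'a', 'davs://web.example.org/a', 2),
    ('mock', 'a', 'root://xrd.example.org//a', 1),
    ('mock', 'b', 'davs://web.example.org/b', 1),
]
result = list(build_files(rows))
```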

rcarpa mentioned this issue Mar 22, 2021

Core: don't filter to 'root' for list_replicas on archives. #2961 #4380

Closed

bari12 added feature Replicas labels Mar 22, 2021

bari12 assigned mlassnig and rcarpa Mar 22, 2021

bari12 added a commit that referenced this issue Jun 15, 2021

Merge pull request #4620 from rcarpa/patch-4473-improve_list_replica_…

c0bf250

…grouping Core & Internals: improve list_replica grouping by scope/name. #4473

bari12 added a commit that referenced this issue Jun 17, 2021

Merge pull request #4632 from rcarpa/patch-4473-refactor_list_replicas

5a14a43

Core & Internals: directly fetch constituents in list_replicas. #4473

bari12 modified the milestone: 1.25.7 Jun 17, 2021

rcarpa mentioned this issue Jun 18, 2021

Core & Internals: simplify list_replica response creation. Closes #4473 #4691

Merged

bari12 closed this as completed in 7a00e4f Sep 6, 2021

bari12 added a commit that referenced this issue Sep 6, 2021

Merge pull request #4691 from rcarpa/patch-4473-cosmetic_changes_list…

b6092bc

…_replicas Core & Internals: simplify list_replica response creation. Closes #4473
