Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor _list_replicas and correctly implement xroot prioritization for archives #4473

Closed
rcarpa opened this issue Mar 22, 2021 · 1 comment
Assignees

Comments

@rcarpa
Copy link
Contributor

rcarpa commented Mar 22, 2021

Motivation

Xroot prioritization for constituent replicas was implemented in #2313, but the current implementation was rushed and lead to undesired side-effects. We tried to quickfix the problem in #4380, but the fix doesn't seem easy. In particular because we recursively call list_replicas on the archive in order to construct the constituent replicas. Some information needed for correct prioritization is lost during the return from this recursive call.

Modification

_list_replicas has grown to be quite complex. Add more functional tests to ensure lack of regressions; refactor the function; and ensure constituent replicas can be correctly prioritized.

@rcarpa
Copy link
Contributor Author

rcarpa commented May 11, 2021

I'll start working on this one

rcarpa added a commit to rcarpa/rucio that referenced this issue May 11, 2021
…4473

The for loop assumes that the replicas are returned in a sorted order
by (scope, name). While scope/name doesn't change, the replica is
appended to the file. When scope/name changes, the file is yielded and
a new file starts to be constructed. Because the same scope/name can
be returned by _list_replicas_for_datasets and _list_replicas_for_files,
and they are not correctly ordered, this logic is broken in some cases.

Fix the issue by ensuring that the two data sources are correctly
merged into an ordered sequence.
rcarpa added a commit to rcarpa/rucio that referenced this issue May 11, 2021
…4473

The for loop assumes that the replicas are returned in a sorted order
by (scope, name). While scope/name doesn't change, the replica is
appended to the file. When scope/name changes, the file is yielded and
a new file starts to be constructed. Because the same scope/name can
be returned by _list_replicas_for_datasets and _list_replicas_for_files,
and they are not correctly ordered, this logic is broken in some cases.

Fix the issue by ensuring that the two data sources are correctly
merged into an ordered sequence.
rcarpa added a commit to rcarpa/rucio that referenced this issue May 17, 2021
…4473

The for loop assumes that the replicas are returned in a sorted order
by (scope, name). While scope/name doesn't change, the replica is
appended to the file. When scope/name changes, the file is yielded and
a new file starts to be constructed. Because the same scope/name can
be returned by _list_replicas_for_datasets and _list_replicas_for_files,
and they are not correctly ordered, this logic is broken in some cases.

Fix the issue by ensuring that the two data sources are correctly
merged into an ordered sequence.
rcarpa added a commit to rcarpa/rucio that referenced this issue May 18, 2021
…4473

The for loop assumes that the replicas are returned in a sorted order
by (scope, name). While scope/name doesn't change, the replica is
appended to the file. When scope/name changes, the file is yielded and
a new file starts to be constructed. Because the same scope/name can
be returned by _list_replicas_for_datasets and _list_replicas_for_files,
and they are not correctly ordered, this logic is broken in some cases.

Fix the issue by ensuring that the two data sources are correctly
merged into an ordered sequence.
rcarpa added a commit to rcarpa/rucio that referenced this issue May 18, 2021
…#4473

When listing archive constituent replicas, don't perform a recursive
call anymore to list the replicas of the archive. Thanks to that, we
are able to iterate over all the possible schemes of the archive file
and better prioritize the root protocol for downloading constituents.
rcarpa added a commit to rcarpa/rucio that referenced this issue May 18, 2021
…#4473

When listing archive constituent replicas, don't perform a recursive
call anymore to list the replicas of the archive. Thanks to that, we
are able to iterate over all the possible schemes of the archive file
and better prioritize the root protocol for downloading constituents.
rcarpa added a commit to rcarpa/rucio that referenced this issue May 18, 2021
…#4473

When listing archive constituent replicas, don't perform a recursive
call anymore to list the replicas of the archive. Thanks to that, we
are able to iterate over all the possible schemes of the archive file
and better prioritize the root protocol for downloading constituents.
rcarpa added a commit to rcarpa/rucio that referenced this issue Jun 7, 2021
…4473

The for loop assumes that the replicas are returned in a sorted order
by (scope, name). While scope/name doesn't change, the replica is
appended to the file. When scope/name changes, the file is yielded and
a new file starts to be constructed. Because the same scope/name can
be returned by _list_replicas_for_datasets and _list_replicas_for_files,
and they are not correctly ordered, this logic is broken in some cases.

Fix the issue by ensuring that the two data sources are correctly
merged into an ordered sequence.
rcarpa added a commit to rcarpa/rucio that referenced this issue Jun 7, 2021
…#4473

When listing archive constituent replicas, don't perform a recursive
call anymore to list the replicas of the archive. Thanks to that, we
are able to iterate over all the possible schemes of the archive file
and better prioritize the root protocol for downloading constituents.
bari12 added a commit that referenced this issue Jun 15, 2021
…grouping

Core & Internals: improve list_replica grouping by scope/name. #4473
bari12 pushed a commit that referenced this issue Jun 15, 2021
The for loop assumes that the replicas are returned in a sorted order
by (scope, name). While scope/name doesn't change, the replica is
appended to the file. When scope/name changes, the file is yielded and
a new file starts to be constructed. Because the same scope/name can
be returned by _list_replicas_for_datasets and _list_replicas_for_files,
and they are not correctly ordered, this logic is broken in some cases.

Fix the issue by ensuring that the two data sources are correctly
merged into an ordered sequence.
rcarpa added a commit to rcarpa/rucio that referenced this issue Jun 15, 2021
…#4473

When listing archive constituent replicas, don't perform a recursive
call anymore to list the replicas of the archive. Thanks to that, we
are able to iterate over all the possible schemes of the archive file
and better prioritize the root protocol for downloading constituents.
rcarpa added a commit to rcarpa/rucio that referenced this issue Jun 16, 2021
…#4473

When listing archive constituent replicas, don't perform a recursive
call anymore to list the replicas of the archive. Thanks to that, we
are able to iterate over all the possible schemes of the archive file
and better prioritize the root protocol for downloading constituents.
bari12 added a commit that referenced this issue Jun 17, 2021
Core & Internals: directly fetch constituents in list_replicas. #4473
bari12 pushed a commit that referenced this issue Jun 17, 2021
When listing archive constituent replicas, don't perform a recursive
call anymore to list the replicas of the archive. Thanks to that, we
are able to iterate over all the possible schemes of the archive file
and better prioritize the root protocol for downloading constituents.
@bari12 bari12 modified the milestone: 1.25.7 Jun 17, 2021
rcarpa added a commit to rcarpa/rucio that referenced this issue Jun 18, 2021
Extract pfn/rse sorting to a separate function and call it in the 2
needed places (each time before yielding the file).

Only resolve parents once per scope/name. As a side note, parents
are not resolved for archives, but this never worked I don't see
any regression about it in rucio#4473.
rcarpa added a commit to rcarpa/rucio that referenced this issue Jun 18, 2021
…io#4473

Extract pfn/rse sorting to a separate function and call it in the 2
needed places (each time before yielding the file).

Only resolve parents once per scope/name. As a side note, parents
are not resolved for archives, but this never worked I don't see
any regression about it in rucio#4473.
rcarpa added a commit to rcarpa/rucio that referenced this issue Jun 18, 2021
…io#4473

Extract pfn/rse sorting to a separate function and call it in the 2
needed places (each time before yielding the file).

Only resolve parents once per scope/name. As a side note, parents
are not resolved for archives, but this never worked I don't see
any regression about it in rucio#4473.
rcarpa added a commit to rcarpa/rucio that referenced this issue Jun 18, 2021
…io#4473

Extract pfn/rse sorting to a separate function and call it in the 2
needed places (each time before yielding the file).

Only resolve parents once per scope/name. As a side note, parents
are not resolved for archives, but this never worked I don't see
any regression about it in rucio#4473.
@bari12 bari12 closed this as completed in 7a00e4f Sep 6, 2021
bari12 added a commit that referenced this issue Sep 6, 2021
…_replicas

Core & Internals: simplify list_replica response creation. Closes #4473
bari12 pushed a commit that referenced this issue Sep 6, 2021
Extract pfn/rse sorting to a separate function and call it in the 2
needed places (each time before yielding the file).

Only resolve parents once per scope/name. As a side note, parents
are not resolved for archives, but this never worked I don't see
any regression about it in #4473.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants