-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor _list_replicas and correctly implement xroot prioritization for archives #4473
Comments
I'll start working on this one |
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
May 11, 2021
…4473 The for loop assumes that the replicas are returned in a sorted order by (scope, name). While scope/name doesn't change, the replica is appended to the file. When scope/name changes, the file is yielded and a new file starts to be constructed. Because the same scope/name can be returned by _list_replicas_for_datasets and _list_replicas_for_files, and they are not correctly ordered, this logic is broken in some cases. Fix the issue by ensuring that the two data sources are correctly merged into an ordered sequence.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
May 11, 2021
…4473 The for loop assumes that the replicas are returned in a sorted order by (scope, name). While scope/name doesn't change, the replica is appended to the file. When scope/name changes, the file is yielded and a new file starts to be constructed. Because the same scope/name can be returned by _list_replicas_for_datasets and _list_replicas_for_files, and they are not correctly ordered, this logic is broken in some cases. Fix the issue by ensuring that the two data sources are correctly merged into an ordered sequence.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
May 17, 2021
…4473 The for loop assumes that the replicas are returned in a sorted order by (scope, name). While scope/name doesn't change, the replica is appended to the file. When scope/name changes, the file is yielded and a new file starts to be constructed. Because the same scope/name can be returned by _list_replicas_for_datasets and _list_replicas_for_files, and they are not correctly ordered, this logic is broken in some cases. Fix the issue by ensuring that the two data sources are correctly merged into an ordered sequence.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
May 18, 2021
…4473 The for loop assumes that the replicas are returned in a sorted order by (scope, name). While scope/name doesn't change, the replica is appended to the file. When scope/name changes, the file is yielded and a new file starts to be constructed. Because the same scope/name can be returned by _list_replicas_for_datasets and _list_replicas_for_files, and they are not correctly ordered, this logic is broken in some cases. Fix the issue by ensuring that the two data sources are correctly merged into an ordered sequence.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
May 18, 2021
…#4473 When listing archive constituent replicas, don't perform a recursive call anymore to list the replicas of the archive. Thanks to that, we are able to iterate over all the possible schemes of the archive file and better prioritize the root protocol for downloading constituents.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
May 18, 2021
…#4473 When listing archive constituent replicas, don't perform a recursive call anymore to list the replicas of the archive. Thanks to that, we are able to iterate over all the possible schemes of the archive file and better prioritize the root protocol for downloading constituents.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
May 18, 2021
…#4473 When listing archive constituent replicas, don't perform a recursive call anymore to list the replicas of the archive. Thanks to that, we are able to iterate over all the possible schemes of the archive file and better prioritize the root protocol for downloading constituents.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
Jun 7, 2021
…4473 The for loop assumes that the replicas are returned in a sorted order by (scope, name). While scope/name doesn't change, the replica is appended to the file. When scope/name changes, the file is yielded and a new file starts to be constructed. Because the same scope/name can be returned by _list_replicas_for_datasets and _list_replicas_for_files, and they are not correctly ordered, this logic is broken in some cases. Fix the issue by ensuring that the two data sources are correctly merged into an ordered sequence.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
Jun 7, 2021
…#4473 When listing archive constituent replicas, don't perform a recursive call anymore to list the replicas of the archive. Thanks to that, we are able to iterate over all the possible schemes of the archive file and better prioritize the root protocol for downloading constituents.
bari12
added a commit
that referenced
this issue
Jun 15, 2021
…grouping Core & Internals: improve list_replica grouping by scope/name. #4473
bari12
pushed a commit
that referenced
this issue
Jun 15, 2021
The for loop assumes that the replicas are returned in a sorted order by (scope, name). While scope/name doesn't change, the replica is appended to the file. When scope/name changes, the file is yielded and a new file starts to be constructed. Because the same scope/name can be returned by _list_replicas_for_datasets and _list_replicas_for_files, and they are not correctly ordered, this logic is broken in some cases. Fix the issue by ensuring that the two data sources are correctly merged into an ordered sequence.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
Jun 15, 2021
…#4473 When listing archive constituent replicas, don't perform a recursive call anymore to list the replicas of the archive. Thanks to that, we are able to iterate over all the possible schemes of the archive file and better prioritize the root protocol for downloading constituents.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
Jun 16, 2021
…#4473 When listing archive constituent replicas, don't perform a recursive call anymore to list the replicas of the archive. Thanks to that, we are able to iterate over all the possible schemes of the archive file and better prioritize the root protocol for downloading constituents.
bari12
added a commit
that referenced
this issue
Jun 17, 2021
Core & Internals: directly fetch constituents in list_replicas. #4473
bari12
pushed a commit
that referenced
this issue
Jun 17, 2021
When listing archive constituent replicas, don't perform a recursive call anymore to list the replicas of the archive. Thanks to that, we are able to iterate over all the possible schemes of the archive file and better prioritize the root protocol for downloading constituents.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
Jun 18, 2021
Extract pfn/rse sorting to a separate function and call it in the 2 needed places (each time before yielding the file). Only resolve parents once per scope/name. As a side note, parents are not resolved for archives, but this never worked I don't see any regression about it in rucio#4473.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
Jun 18, 2021
…io#4473 Extract pfn/rse sorting to a separate function and call it in the 2 needed places (each time before yielding the file). Only resolve parents once per scope/name. As a side note, parents are not resolved for archives, but this never worked I don't see any regression about it in rucio#4473.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
Jun 18, 2021
…io#4473 Extract pfn/rse sorting to a separate function and call it in the 2 needed places (each time before yielding the file). Only resolve parents once per scope/name. As a side note, parents are not resolved for archives, but this never worked I don't see any regression about it in rucio#4473.
rcarpa
added a commit
to rcarpa/rucio
that referenced
this issue
Jun 18, 2021
…io#4473 Extract pfn/rse sorting to a separate function and call it in the 2 needed places (each time before yielding the file). Only resolve parents once per scope/name. As a side note, parents are not resolved for archives, but this never worked I don't see any regression about it in rucio#4473.
bari12
added a commit
that referenced
this issue
Sep 6, 2021
…_replicas Core & Internals: simplify list_replica response creation. Closes #4473
bari12
pushed a commit
that referenced
this issue
Sep 6, 2021
Extract pfn/rse sorting to a separate function and call it in the 2 needed places (each time before yielding the file). Only resolve parents once per scope/name. As a side note, parents are not resolved for archives, but this never worked I don't see any regression about it in #4473.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Motivation
Xroot prioritization for constituent replicas was implemented in #2313, but the current implementation was rushed and lead to undesired side-effects. We tried to quickfix the problem in #4380, but the fix doesn't seem easy. In particular because we recursively call list_replicas on the archive in order to construct the constituent replicas. Some information needed for correct prioritization is lost during the return from this recursive call.
Modification
_list_replicas has grown to be quite complex. Add more functional tests to ensure lack of regressions; refactor the function; and ensure constituent replicas can be correctly prioritized.
The text was updated successfully, but these errors were encountered: