source replica expression not always correctly respected on multihops #5170

Closed
rcarpa opened this issue Jan 21, 2022 · 0 comments

rcarpa commented Jan 21, 2022

Motivation

When a multihop transfer is generated, the intermediate transfers are scheduled without enforcing any source replica expression. If the transfer fails during submission and the intermediate transfers remain in the "queued" state in the database, they will later be submitted on their own. Because they have no source replica expression set, they will end up using whatever source is judged best.

An example of a multihop that resulted in this case:

conveyor-submitter[67/181]: e6c76217fb284b5baf5c9795cc58654e: Cannot pick transfertool, or create intermediate requests
conveyor-submitter[67/181]: e6c76217fb284b5baf5c9795cc58654e: Multihop : A request already exists for the transfer between src_rse=CERN-PROD_RAW and dst_rse=BNL-OSG2_DATADISK. Will cancel all the parent requests
conveyor-submitter[67/181]: e6c76217fb284b5baf5c9795cc58654e: Best path is multihop: CERN-PROD_RAW--8f1682d7464e4404b3074384426461da->CERN-PROD_DATADISK--e6c76217fb284b5baf5c9795cc58654e->BNL-OSG2_DATADISK
conveyor-submitter[67/181]: e6c76217fb284b5baf5c9795cc58654e(data17_13TeV:data17_13TeV.00324781.physics_Main.daq.RAW._lb0229._SFO-8._0003.data): Ordered sources: multihop: CERN-PROD_RAW:0:52
conveyor-submitter[67/181]: e6c76217fb284b5baf5c9795cc58654e: 1/2 sources left after filtering. Dropped: IN2P3-CC_DATATAPE
conveyor-submitter[67/181]: e6c76217fb284b5baf5c9795cc58654e(data17_13TeV:data17_13TeV.00324781.physics_Main.daq.RAW._lb0229._SFO-8._0003.data): Found 2 sources

and the associated intermediate transfer living its own life (which shouldn't normally happen):

..........................................
conveyor-submitter[95/181]: 8f1682d7464e4404b3074384426461da(data17_13TeV:data17_13TeV.00324781.physics_Main.daq.RAW._lb0229._SFO-8._0003.data): Found 2 sources
conveyor-submitter[95/181]: 8f1682d7464e4404b3074384426461da(data17_13TeV:data17_13TeV.00324781.physics_Main.daq.RAW._lb0229._SFO-8._0003.data): Ordered sources: CERN-PROD_RAW:0:20,IN2P3-CC_DATATAPE:0:21

Modification

It is not normal for transfers to remain in the queued state in the database: they should be cleaned up correctly. However, this is not the first time we have had issues with this cleanup, because too many things can go wrong. To reduce the impact of such unexpected corner cases, it may be worth enforcing source_replica_expression=rse_name for all intermediate hops.
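
The effect of the proposed protection can be sketched with a small, self-contained model. This is only an illustration under stated assumptions, not Rucio code: `HopRequest`, `build_multihop_requests` and `candidate_sources` are hypothetical names, and the expression is treated as a bare RSE name rather than a full RSE expression. The RSE names are taken from the logs above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HopRequest:
    """Hypothetical stand-in for a queued transfer request (not a Rucio class)."""
    request_id: str
    src_rse: str
    dst_rse: str
    source_replica_expression: Optional[str] = None


def build_multihop_requests(path: List[str], pin_sources: bool) -> List[HopRequest]:
    """Create one request per hop of `path`.

    When `pin_sources` is True, every intermediate hop (all hops except the
    last one) gets source_replica_expression set to its own source RSE name.
    """
    requests = []
    last_hop_index = len(path) - 2
    for index, (src, dst) in enumerate(zip(path, path[1:])):
        is_intermediate = index < last_hop_index
        expression = src if (pin_sources and is_intermediate) else None
        requests.append(HopRequest(f"hop-{index}", src, dst, expression))
    return requests


def candidate_sources(request: HopRequest, available: List[str]) -> List[str]:
    """Sources a hop may use if it ends up being submitted on its own.

    Assumption: only a bare RSE name is understood as an expression here.
    """
    if request.source_replica_expression is None:
        return list(available)
    return [rse for rse in available if rse == request.source_replica_expression]


path = ["CERN-PROD_RAW", "CERN-PROD_DATADISK", "BNL-OSG2_DATADISK"]
available = ["CERN-PROD_RAW", "IN2P3-CC_DATATAPE"]

unpinned_first_hop = build_multihop_requests(path, pin_sources=False)[0]
pinned_first_hop = build_multihop_requests(path, pin_sources=True)[0]

# Without the expression, an orphaned hop may consider every available source.
print(candidate_sources(unpinned_first_hop, available))
# With the expression pinned, only the intended source remains.
print(candidate_sources(pinned_first_hop, available))
```

In this model, an intermediate hop that is submitted alone by error can still only use the source chosen when the multihop path was computed, which is the behaviour the proposed modification aims for.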

@rcarpa rcarpa self-assigned this Jan 21, 2022
@bari12 bari12 added the bug label Jan 21, 2022
rcarpa added a commit to rcarpa/rucio that referenced this issue Jan 21, 2022
…ucio#5170

This shouldn't be necessary, but it's worth adding an additional
protection in case the intermediate hop is submitted alone by error.
@bari12 bari12 closed this as completed in fb10334 Feb 1, 2022
bari12 added a commit that referenced this issue Feb 1, 2022
…sion_multihops

Transfers: rework concurrent multihop handling. Closes #5170. Closes #5028
bari12 pushed a commit that referenced this issue Feb 1, 2022
This shouldn't be necessary, but it's worth adding an additional
protection in case the intermediate hop is submitted alone by error.
@bari12 bari12 added this to the 1.27.4 milestone Feb 1, 2022
@bari12 bari12 changed the title from "Transfers: source replica expression not always correctly respected on multihops" to "source replica expression not always correctly respected on multihops" Feb 1, 2022
piperov pushed a commit to piperov/rucio that referenced this issue Feb 25, 2022
…io#5170

This shouldn't be necessary, but it's worth adding an additional
protection in case the intermediate hop is submitted alone by error.