write_lan activity is not used on upload #4639

Closed
bari12 opened this issue May 25, 2021 · 0 comments

bari12 commented May 25, 2021

Motivation

Recently it was identified that the write_lan activity is not used when uploading. This was supposedly fixed with #3626. My suspicion is that the problem still occurs in certain configurations, though (e.g. the one at TRIUMF).

https://aipanda403.cern.ch/data/jobs/2021-05-19/TRIUMF/5060545120.out

This shows the upload output of the pilot. The critical part is:

2021-05-19 07:57:42,369 | DEBUG    | Thread-5            | pilot.copytool.rucio             | upload                    | [{u'extended_attributes': None, u'hostname': u'xrootd.lcg.triumf.ca', u'prefix': u'//atlas/atlasdatadisk/rucio/', u'domains': {u'wan': {u'read': 2, u'write': 3, u'third_party_copy': 0, u'delete': 3}, u'lan': {u'read': 2, u'write': 3, u'delete': 3}}, u'scheme': u'root', u'port': 1094, u'impl': u'rucio.rse.protocols.gfal.Default'}, {u'extended_attributes': {u'space_token': u'ATLASDATADISK', u'web_service_path': u'/srm/managerv2?SFN='}, u'hostname': u'srm.triumf.ca', u'prefix': u'/atlas/atlasdatadisk/rucio/', u'domains': {u'wan': {u'read': 3, u'write': 2, u'third_party_copy': 2, u'delete': 2}, u'lan': {u'read': 3, u'write': 2, u'delete': 2}}, u'scheme': u'srm', u'port': 8443, u'impl': u'rucio.rse.protocols.gfal.Default'}, {u'extended_attributes': None, u'hostname': u'webdav-lan.lcg.triumf.ca', u'prefix': u'/atlas/atlasdatadisk/rucio/', u'domains': {u'wan': {u'read': 0, u'write': 0, u'third_party_copy': 0, u'delete': 0}, u'lan': {u'read': 1, u'write': 1, u'delete': 1}}, u'scheme': u'davs', u'port': 2880, u'impl': u'rucio.rse.protocols.gfal.Default'}]
2021-05-19 07:57:42,369 | INFO     | Thread-5            | pilot.copytool.rucio             | upload                    | Trying upload with davs to TRIUMF-LCG2_DATADISK
2021-05-19 07:57:42,369 | DEBUG    | Thread-5            | pilot.copytool.rucio             | upload                    | Processing upload with the domain: lan
2021-05-19 07:57:42,369 | DEBUG    | Thread-5            | pilot.copytool.rucio             | log_format                | gfal.Default: connecting to storage
2021-05-19 07:57:42,380 | INFO     | Thread-5            | gfal2                            | connect                   | [gfal_module_load] plugin /cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase/x86_64/emi/4.0.2-1_200423.fix3/usr/lib64/gfal2-plugins//libgfal_plugin_gridftp.so loaded with success 
...
2021-05-19 07:57:42,391 | DEBUG    | Thread-5            | pilot.copytool.rucio             | _upload_item              | The PFN created from the LFN: davs://webdav.lcg.triumf.ca:2880/atlas/atlasdatadisk/rucio/hc_test/a0/e1/22d809ae-9851-47e1-8fdb-174f45d51776_30166.1.job.log.tgz

Thus the client decides to do a lan upload. However, the protocols fetched by the client are:

[{u'extended_attributes': None,
  u'hostname': u'xrootd.lcg.triumf.ca',
  u'prefix': u'//atlas/atlasdatadisk/rucio/',
  u'domains': {u'wan': {u'read': 2, u'write': 3, u'third_party_copy': 0, u'delete': 3}, u'lan': {u'read': 2, u'write': 3, u'delete': 3}},
  u'scheme': u'root',
  u'port': 1094,
  u'impl': u'rucio.rse.protocols.gfal.Default'},
 {u'extended_attributes': {u'space_token': u'ATLASDATADISK', u'web_service_path': u'/srm/managerv2?SFN='},
  u'hostname': u'srm.triumf.ca',
  u'prefix': u'/atlas/atlasdatadisk/rucio/',
  u'domains': {u'wan': {u'read': 3, u'write': 2, u'third_party_copy': 2, u'delete': 2}, u'lan': {u'read': 3, u'write': 2, u'delete': 2}},
  u'scheme': u'srm',
  u'port': 8443,
  u'impl': u'rucio.rse.protocols.gfal.Default'},
 {u'extended_attributes': None,
  u'hostname': u'webdav-lan.lcg.triumf.ca',
  u'prefix': u'/atlas/atlasdatadisk/rucio/',
  u'domains': {u'wan': {u'read': 0, u'write': 0, u'third_party_copy': 0, u'delete': 0}, u'lan': {u'read': 1, u'write': 1, u'delete': 1}},
  u'scheme': u'davs',
  u'port': 2880,
  u'impl': u'rucio.rse.protocols.gfal.Default'}]

Thus it should upload to the webdav-lan.lcg.triumf.ca door, but it chooses the webdav.lcg.triumf.ca one, which is not even part of the list (although it is one of the protocols defined for that RSE). My assumption is that Rucio does not handle multiple davs protocols with mixed configurations well. It correctly detects that davs should be used for the upload, but since there is a second davs protocol, the lfn2pfn generation picks the wrong one.
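The mismatch can be checked directly from the log data. The following is a minimal sketch, not Rucio code: it copies the hostname, scheme and lan write priority of the three fetched protocols, and compares the expected door with the host in the PFN from the log. It assumes that a priority of 0 disables a protocol for that operation and that the lowest non-zero value is preferred; the flattened `lan_write` key is a simplification introduced for this example.

```python
from urllib.parse import urlparse

# Hostname, scheme and lan write priority copied from the fetched protocol list.
# "lan_write" is a flattened stand-in for domains['lan']['write'].
fetched = [
    {"hostname": "xrootd.lcg.triumf.ca", "scheme": "root", "lan_write": 3},
    {"hostname": "srm.triumf.ca", "scheme": "srm", "lan_write": 2},
    {"hostname": "webdav-lan.lcg.triumf.ca", "scheme": "davs", "lan_write": 1},
]

# PFN reported by _upload_item in the pilot log.
pfn = (
    "davs://webdav.lcg.triumf.ca:2880/atlas/atlasdatadisk/rucio/"
    "hc_test/a0/e1/22d809ae-9851-47e1-8fdb-174f45d51776_30166.1.job.log.tgz"
)

# Assumption: 0 means "disabled", and the lowest non-zero priority wins.
enabled = [p for p in fetched if p["lan_write"] > 0]
expected = min(enabled, key=lambda p: p["lan_write"])
actual_host = urlparse(pfn).hostname
fetched_hosts = {p["hostname"] for p in fetched}

print("expected door:", expected["hostname"])
print("actual door:", actual_host)
print("actual door is in fetched list:", actual_host in fetched_hosts)
```

Under these assumptions the expected door is the davs lan door, while the host in the PFN does not appear in the fetched list at all, which matches the observation above.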

Modification

Needs a fix :-)

rcarpa added a commit to rcarpa/rucio that referenced this issue May 26, 2021


The upload client has some logic to select the 'lan' domain if the
client and rse 'site' are identical. This results in correctly
selecting the `scheme` which can be used for a lan transfer.
However, the domain is not enforced when the protocol object is
later created. This will result in using the default "wan" domain
for the upload if a protocol with the same scheme is available on wan.
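
The behaviour described in this commit message can be illustrated with a small self-contained sketch. It is not the Rucio implementation: `select_domain` and `hostname_for` are hypothetical helpers, the site name is assumed, and the priorities given to the webdav.lcg.triumf.ca door are assumed for illustration (only the webdav-lan priorities come from the log). The point is the default `domain="wan"` argument, which silently takes over when the selected domain is not passed along.

```python
def select_domain(client_site, rse_site):
    """Use 'lan' only when client and RSE report the same non-empty site."""
    return "lan" if client_site and client_site == rse_site else "wan"


def hostname_for(protocols, scheme, operation, domain="wan"):
    """Hypothetical stand-in for protocol creation; note the 'wan' default."""
    usable = [
        p for p in protocols
        if p["scheme"] == scheme and p["domains"][domain].get(operation, 0) > 0
    ]
    if not usable:
        raise LookupError(f"no {scheme} protocol for {operation} on {domain}")
    # Assumption: the lowest non-zero priority is preferred.
    best = min(usable, key=lambda p: p["domains"][domain][operation])
    return best["hostname"]


doors = [
    # Priorities of this door are assumed; it is not in the fetched list.
    {"hostname": "webdav.lcg.triumf.ca", "scheme": "davs",
     "domains": {"wan": {"write": 1}, "lan": {"write": 0}}},
    # Write priorities of this door are taken from the log.
    {"hostname": "webdav-lan.lcg.triumf.ca", "scheme": "davs",
     "domains": {"wan": {"write": 0}, "lan": {"write": 1}}},
]

domain = select_domain("TRIUMF", "TRIUMF")  # site names are assumed
buggy = hostname_for(doors, "davs", "write")                 # domain dropped
fixed = hostname_for(doors, "davs", "write", domain=domain)  # domain propagated

print("selected domain:", domain)
print("without propagation:", buggy)
print("with propagation:", fixed)
```

With the domain dropped, the lookup falls back to the wan door even though 'lan' was selected; passing the domain through yields the lan door.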
bari12 added a commit that referenced this issue Jun 8, 2021
Clients: propagate selected domain in uploadclient. #4639
bari12 pushed a commit that referenced this issue Jun 8, 2021
bari12 added this to the 1.25.6-clients milestone Jun 8, 2021
bari12 removed the Clients label Jun 8, 2021
bari12 closed this as completed Jun 8, 2021
jamesp-epcc pushed a commit to jamesp-epcc/rucio that referenced this issue Jun 10, 2021

