TestBinRucio.test_download_pfn fails to download in certain attempts #6506

Closed
rdimaio opened this issue Feb 21, 2024 · 1 comment · Fixed by #6543

rdimaio commented Feb 21, 2024

Description

The download attempt in TestBinRucio.test_download_pfn fails intermittently, e.g.: https://github.com/rucio/rucio/actions/runs/7974144072/job/21769416278?pr=6497

When the job is re-run manually, it passes: https://github.com/rucio/rucio/actions/runs/7974144072?pr=6497 (but I assume there's a chance it might fail again on subsequent attempts).

The full error log is:

=================================== FAILURES ===================================
________________________ TestBinRucio.test_download_pfn ________________________
[gw3] linux -- Python 3.10.4 /opt/venv/bin/python
tests/test_bin_rucio.py:692: in test_download_pfn
    assert re.search('Total files.*1', out) is not None
E   AssertionError: assert None is not None
E    +  where None = <function search at 0x7fe2ae41e8c0>('Total files.*1', 'Completed in 2.2481 sec.\n')
E    +    where <function search at 0x7fe2ae41e8c0> = re.search
------------------------------ Captured log setup ------------------------------
DEBUG    urllib3.connectionpool:connectionpool.py:1019 Starting new HTTPS connection (1): localhost:443
DEBUG    urllib3.connectionpool:connectionpool.py:474 https://localhost:443 "POST /accountlimits/local/root/QWKDGM-BIN-RUCIO-BINRUCIO-DOWNLOAD-PFN-IORIWJ HTTP/1.1" 201 7
DEBUG    urllib3.connectionpool:connectionpool.py:1019 Starting new HTTPS connection (1): localhost:443
DEBUG    urllib3.connectionpool:connectionpool.py:474 https://localhost:443 "POST /rses/QWKDGM-BIN-RUCIO-BINRUCIO-DOWNLOAD-PFN-IORIWJ/attr/istape HTTP/1.1" 201 7
----------------------------- Captured stdout call -----------------------------
$> rucio upload --rse QWKDGM-BIN-RUCIO-BINRUCIO-DOWNLOAD-PFN-IORIWJ --scope data13_hip /tmp/file_FYSYFVXORC
 2024-02-20 13:49:56,728	INFO	Preparing upload for file file_FYSYFVXORC
2024-02-20 13:49:56,794	INFO	Successfully added replica in Rucio catalogue at QWKDGM-BIN-RUCIO-BINRUCIO-DOWNLOAD-PFN-IORIWJ
2024-02-20 13:49:56,833	INFO	Successfully added replication rule at QWKDGM-BIN-RUCIO-BINRUCIO-DOWNLOAD-PFN-IORIWJ
2024-02-20 13:49:57,469	INFO	Trying upload with file to QWKDGM-BIN-RUCIO-BINRUCIO-DOWNLOAD-PFN-IORIWJ
2024-02-20 13:49:57,476	INFO	Successful upload of temporary file. file://d1cba3f040f34f26ac82d4a780cfff02.cern.ch/tmp/rucio_rse/test_d1cba3f040f34f26ac82d4a780cfff02/data13_hip/cd/cb/file_FYSYFVXORC.rucio.upload
2024-02-20 13:49:57,476	INFO	Successfully uploaded file file_FYSYFVXORC

Completed in 2.2481 sec.
 2024-02-20 13:49:59,767	DEBUG	baseclient.py	No trace_host passed. Using rucio_host instead
2024-02-20 13:49:59,769	DEBUG	baseclient.py	No creds passed. Trying to get it from the config file.
2024-02-20 13:49:59,769	DEBUG	baseclient.py	HTTPS is required, but no ca_cert was passed. Trying to get it from X509_CERT_DIR.
2024-02-20 13:49:59,769	DEBUG	baseclient.py	HTTPS is required, but no ca_cert was passed and X509_CERT_DIR is not defined. Trying to get it from the config file.
2024-02-20 13:49:59,769	DEBUG	baseclient.py	No account passed. Trying to get it from the RUCIO_ACCOUNT environment variable or the config file.
2024-02-20 13:49:59,769	DEBUG	baseclient.py	No VO passed. Trying to get it from environment variable RUCIO_VO.
2024-02-20 13:49:59,769	DEBUG	baseclient.py	No VO found. Trying to get it from the config file.
2024-02-20 13:49:59,770	DEBUG	baseclient.py	got token from file
2024-02-20 13:49:59,820	INFO	Processing 1 item(s) for input
2024-02-20 13:49:59,820	DEBUG	downloadclient.py	Preparing PFN download of data13_hip:file_FYSYFVXORC (file://d1cba3f040f34f26ac82d4a780cfff02.cern.ch/tmp/rucio_rse/test_d1cba3f040f34f26ac82d4a780cfff02/data13_hip/cd/cb/file_FYSYFVXORC) from QWKDGM-BIN-RUCIO-BINRUCIO-DOWNLOAD-PFN-IORIWJ
2024-02-20 13:49:59,820	INFO	Using main thread to download 1 file(s)
2024-02-20 13:49:59,820	DEBUG	downloadclient.py	Start processing queued downloads
2024-02-20 13:49:59,820	INFO	Preparing download of data13_hip:file_FYSYFVXORC
2024-02-20 13:49:59,821	INFO	Trying to download with file and timeout of 360s from QWKDGM-BIN-RUCIO-BINRUCIO-DOWNLOAD-PFN-IORIWJ: data13_hip:file_FYSYFVXORC 
2024-02-20 13:50:00,290	INFO	Using PFN: file://d1cba3f040f34f26ac82d4a780cfff02.cern.ch/tmp/rucio_rse/test_d1cba3f040f34f26ac82d4a780cfff02/data13_hip/cd/cb/file_FYSYFVXORC
2024-02-20 13:50:00,290	DEBUG	downloadclient.py	Access to local destination denied.
Details: [Errno 2] No such file or directory: '/opt/rucio/data13_hip/file_FYSYFVXORC.part'
2024-02-20 13:50:00,290	WARNING	Download attempt failed. Try 1/2
2024-02-20 13:50:01,148	DEBUG	downloadclient.py	Access to local destination denied.
Details: [Errno 2] No such file or directory: '/opt/rucio/data13_hip/file_FYSYFVXORC.part'
2024-02-20 13:50:01,148	WARNING	Download attempt failed. Try 2/2
2024-02-20 13:50:02,012	ERROR	Failed to download file data13_hip:file_FYSYFVXORC
2024-02-20 13:50:02,012	ERROR	None of the requested files have been downloaded.

When this happens, we can either:

  1. ignore the test failure when reviewing the PR (but it might hide failures in the test)
  2. re-run the job, hoping it doesn't fail again

I think we should investigate this and figure out a possible fix.

This is the code for the test; it uploads a temporary file and then tries to download it:

def test_download_pfn(self):
    """CLIENT(USER): Rucio download files"""
    tmp_file1 = file_generator()
    name = os.path.basename(tmp_file1)
    # add files
    cmd = 'rucio upload --rse {0} --scope {1} {2}'.format(self.def_rse, self.user, tmp_file1)
    print(self.marker + cmd)
    exitcode, out, err = execute(cmd)
    print(out, err)
    # download files
    replica_pfn = list(self.replica_client.list_replicas([{'scope': self.user, 'name': name}]))[0]['rses'][self.def_rse][0]
    cmd = 'rucio -v download --rse {0} --pfn {1} {2}:{3}'.format(self.def_rse, replica_pfn, self.user, name)
    exitcode, out, err = execute(cmd)
    print(out, err)
    assert re.search('Total files.*1', out) is not None
    try:
        for i in listdir('data13_hip'):
            unlink('data13_hip/%s' % i)
        rmdir('data13_hip')
    except Exception:
        pass
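
The final assertion in the test only inspects stdout, which is why the failure above reads "assert None is not None" while the actual cause sits in stderr. A minimal sketch of a check that surfaces stderr in the assertion message follows. The helper name assert_download_summary is hypothetical and not part of the Rucio test suite; the sample strings are copied from the log above.

```python
import re


def assert_download_summary(out, err, expected_files=1):
    """Hypothetical helper: look for the 'Total files' summary in stdout and
    include the tail of stderr in the assertion message when it is missing."""
    pattern = r"Total files.*%d" % expected_files  # same pattern as the test
    match = re.search(pattern, out)
    assert match is not None, (
        "download summary not found in stdout\n"
        "stdout: %r\nstderr tail: %r" % (out, err[-300:])
    )
    return match


# Values taken from the failed run quoted in this issue.
failed_out = "Completed in 2.2481 sec.\n"
failed_err = "ERROR\tNone of the requested files have been downloaded.\n"

try:
    assert_download_summary(failed_out, failed_err)
    message = ""
except AssertionError as exc:
    message = str(exc)

print(message)
```

With a check of this kind, the pytest failure line itself would carry the download error rather than only the regex mismatch.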

Based on the log line "downloadclient.py Access to local destination denied.", the failure seems to be caused by an authorization issue.
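
One detail in the log is worth checking against this hypothesis: the details line reports [Errno 2], which is ENOENT (a missing path), whereas a permission problem would normally be EACCES (errno 13). The sketch below is independent of Rucio and only shows that creating a ".part" file inside a directory that has just been removed produces that same errno. The helper open_part_file and the file names are hypothetical, and whether something removes the data13_hip directory while the download is running is an assumption, not something the log confirms.

```python
import errno
import os
import shutil
import tempfile


def open_part_file(dest_dir, name):
    """Hypothetical helper: try to create '<name>.part' inside dest_dir.
    Returns None on success, or the errno of the OSError on failure."""
    part_path = os.path.join(dest_dir, name + ".part")
    try:
        with open(part_path, "wb"):
            pass
    except OSError as exc:
        return exc.errno
    return None


base = tempfile.mkdtemp()
scope_dir = os.path.join(base, "data13_hip")
os.mkdir(scope_dir)

# The directory exists, so the .part file can be created.
first = open_part_file(scope_dir, "file_A")

# Simulate a cleanup step removing the directory before the next download.
shutil.rmtree(scope_dir)
second = open_part_file(scope_dir, "file_B")

shutil.rmtree(base)
print(first, second == errno.ENOENT)
```

If a shared relative directory does turn out to be involved, giving each test its own destination directory would be one way to rule it out.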



rdimaio commented Feb 22, 2024

voetberg added a commit to voetberg/rucio that referenced this issue Mar 8, 2024
voetberg added a commit to voetberg/rucio that referenced this issue Mar 8, 2024
@rdimaio rdimaio linked a pull request Mar 13, 2024 that will close this issue
@bari12 bari12 added this to the 34.1.0 / 34.0.1 milestone Mar 19, 2024
voetberg added a commit to voetberg/rucio that referenced this issue Apr 15, 2024