Download client does not filter out tape sources #5122

Closed
dchristidis opened this issue Dec 21, 2021 · 3 comments · Fixed by #5124

@dchristidis
Contributor

Motivation

This happens also with an old version of the Rucio client. Either the issue has gone unnoticed for a very long time or something changed server-side.

$ rucio --version
rucio 1.25.5
$ rucio list-file-replicas data15_cos:data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1
+------------+---------------------------------------------------------------------------+------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SCOPE      | NAME                                                                      | FILESIZE   | ADLER32   | RSE: REPLICA                                                                                                                                                                                                                                                            |
|------------+---------------------------------------------------------------------------+------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| data15_cos | data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1 | 444.412 kB | 062e8eda  | CERN-PROD_RAW: root://eosctaatlas.cern.ch:1094//eos/ctaatlas/archive/grid/atlas/rucio/raw//data15_cos/physics_CosmicCalo/00266048/data15_cos.00266048.physics_CosmicCalo.merge.RAW/data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1_1432720937 |
| data15_cos | data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1 | 444.412 kB | 062e8eda  | SARA-MATRIX_DATATAPE: srm://srm.grid.sara.nl:8443/srm/managerv2?SFN=/pnfs/grid.sara.nl/data/atlas/atlasdatatape/data15_cos/RAW/other/data15_cos.00266048.physics_CosmicCalo.merge.RAW/data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1         |
+------------+---------------------------------------------------------------------------+------------+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
$ rucio --verbose download data15_cos:data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1
2021-12-21 09:40:14,095	INFO	Processing 1 item(s) for input
2021-12-21 09:40:14,095	DEBUG	downloadclient.py	num_unmerged_items=1; num_dids=1; num_merged_items=1
2021-12-21 09:40:14,095	INFO	Getting sources of DIDs
2021-12-21 09:40:14,129	DEBUG	downloadclient.py	schemes: None
2021-12-21 09:40:14,130	DEBUG	downloadclient.py	rse_expression: None
2021-12-21 09:40:14,130	DEBUG	downloadclient.py	num DIDs for list_replicas call: 1
2021-12-21 09:40:14,213	DEBUG	downloadclient.py	num resolved files: 1
2021-12-21 09:40:14,233	DEBUG	downloadclient.py	"unzip -v" returned with exitcode 0
2021-12-21 09:40:14,250	DEBUG	downloadclient.py	"tar --version" returned with exitcode 0
2021-12-21 09:40:14,250	DEBUG	downloadclient.py	Queueing file: data15_cos:data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1
2021-12-21 09:40:14,250	DEBUG	downloadclient.py	real parents: set([])
2021-12-21 09:40:14,251	DEBUG	downloadclient.py	options: {'data15_cos:data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1': {'ignore_checksum': False, 'transfer_speed_timeout': 500.0, 'transfer_timeout': None, 'destinations': set([('.', False)])}}
2021-12-21 09:40:14,253	DEBUG	downloadclient.py	Prepared sources: num_sources=2/2; num_non_cea_sources=2; num_cea_ids=0
2021-12-21 09:40:14,254	INFO	Using main thread to download 1 file(s)
2021-12-21 09:40:14,254	DEBUG	downloadclient.py	Start processing queued downloads
2021-12-21 09:40:14,254	INFO	Preparing download of data15_cos:data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1
2021-12-21 09:40:14,305	INFO	Trying to download with root and timeout of 60s from CERN-PROD_RAW: data15_cos:data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1
…

Here’s the bit that I don’t understand:

# filtering out tape sources
if self.is_tape_excluded:
    for file_item in file_items:
        unfiltered_sources = copy.copy(file_item['sources'])
        for src in unfiltered_sources:
            if src in tape_rses:
                file_item['sources'].remove(src)
        if unfiltered_sources and not file_item['sources']:
            logger(logging.WARNING, 'The requested DID {} only has replicas on tape. Direct download from tape is prohibited. '
                                    'Please request a transfer to a non-tape endpoint.'.format(file_item['did']))

src is a dictionary and tape_rses is a list of strings. How would a comparison work?

>>> metalink = client.list_replicas([{'scope': 'data15_cos', 'name': 'data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1'}], metalink=True)
>>> file_items = parse_replicas_from_string(metalink)
>>> pprint.pprint(file_items)
[{'adler32': '062e8eda',
  'bytes': 444412,
  'did': 'data15_cos:data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1',
  'md5': None,
  'parent_dids': set(),
  'sources': [{'client_extract': False,
               'domain': 'wan',
               'pfn': 'root://eosctaatlas.cern.ch:1094//eos/ctaatlas/archive/grid/atlas/rucio/raw//data15_cos/physics_CosmicCalo/00266048/data15_cos.00266048.physics_CosmicCalo.merge.RAW/data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1_1432720937',
               'priority': '1',
               'rse': 'CERN-PROD_RAW'},
              {'client_extract': False,
               'domain': 'wan',
               'pfn': 'srm://srm.grid.sara.nl:8443/srm/managerv2?SFN=/pnfs/grid.sara.nl/data/atlas/atlasdatatape/data15_cos/RAW/other/data15_cos.00266048.physics_CosmicCalo.merge.RAW/data15_cos.00266048.physics_CosmicCalo.merge.RAW._lb0002._SFO-ALL._0001.1',
               'priority': '2',
               'rse': 'SARA-MATRIX_DATATAPE'}]}]
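
The mismatch can be reproduced in isolation. This is a minimal sketch using trimmed copies of the two sources above; the content of tape_rses is an assumption for illustration, since the thread does not show what the istape=true lookup actually returns.

```python
# Trimmed copies of the two sources shown above (only the keys needed here).
sources = [
    {'rse': 'CERN-PROD_RAW', 'priority': '1'},
    {'rse': 'SARA-MATRIX_DATATAPE', 'priority': '2'},
]

# Assumed for illustration: RSE names as returned by the tape lookup.
tape_rses = ['SARA-MATRIX_DATATAPE']

# A dict never compares equal to a str, so this membership test is
# False for every source and nothing is ever filtered out.
matches_by_dict = [src in tape_rses for src in sources]

# Comparing the RSE name (a str) against the list of names works as intended.
matches_by_name = [src['rse'] in tape_rses for src in sources]

print(matches_by_dict)
print(matches_by_name)
```

The first test raises no error, which may be why the problem went unnoticed: the filter silently becomes a no-op.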

Modification

@bari12
Member

bari12 commented Dec 21, 2021

Thanks for the investigation, Dimitrios. I guess it should be sufficient to change

if src in tape_rses:

to

if src['rse'] in tape_rses:

I don't think there was a recent change here; my guess is that this just went unnoticed for quite some time.
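
A sketch of the filter with the RSE-name comparison applied. The standalone function filter_tape_sources is hypothetical (in the client the logic sits inside a method guarded by self.is_tape_excluded), and it rebuilds the source list instead of calling remove(), which is a stylistic choice and not necessarily what the actual fix does.

```python
import copy
import logging


def filter_tape_sources(file_items, tape_rses, logger=logging.log):
    """Drop every source whose RSE name appears in tape_rses.

    Hypothetical helper for illustration; file_items has the shape
    returned by parse_replicas_from_string in the issue description.
    """
    tape_rses = set(tape_rses)
    for file_item in file_items:
        unfiltered_sources = copy.copy(file_item['sources'])
        # Compare the RSE name, not the whole source dict.
        file_item['sources'] = [src for src in unfiltered_sources
                                if src['rse'] not in tape_rses]
        if unfiltered_sources and not file_item['sources']:
            logger(logging.WARNING,
                   'The requested DID {} only has replicas on tape. '
                   'Direct download from tape is prohibited. '
                   'Please request a transfer to a non-tape endpoint.'.format(file_item['did']))
    return file_items


# Usage with made-up RSE names: one tape source and one disk source.
example_items = [{'did': 'scope:name',
                  'sources': [{'rse': 'EXAMPLE_TAPE'}, {'rse': 'EXAMPLE_DISK'}]}]
filter_tape_sources(example_items, ['EXAMPLE_TAPE'])
print([src['rse'] for src in example_items[0]['sources']])
```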

@joeldierkes
Contributor

I had a look into the issue and noticed the following code:

if self.is_tape_excluded:
    try:
        tape_rses = [endp['rse'] for endp in self.client.list_rses(rse_expression='istape=true')]
    except:
        logger(logging.DEBUG, 'No tapes found.')

From my POV all tape RSEs would be listed, is this necessary?

Also, is the istape attribute always set for tape RSEs? I found the rse_type in the RSE model, but not the point where the istape attribute is set. Is this really the preferred way to list all tape RSEs from the client?

rse_type = Column(Enum(RSEType, name='RSES_TYPE_CHK',
                       create_constraint=True,
                       values_callable=lambda obj: [e.value for e in obj]),
                  default=RSEType.DISK)

@bari12
Member

bari12 commented Dec 21, 2021

Yes, this is correct, you basically never want to download from a tape directly.

The istape attribute exists mostly for historic reasons. I am not entirely sure (I would have to try) whether you can even filter for rse_type in an RSE expression right now. In the long run it would probably be correct to switch to that instead of istape, but for now we should leave it as it is.
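
For reference, a sketch of the istape-based lookup as a standalone helper. StubClient and get_tape_rse_names are hypothetical names; the stub only mimics the list_rses(rse_expression=...) call quoted above and ignores the expression. The sketch swaps the bare except for except Exception and returns an empty list on failure, on the assumption (not confirmed in this thread) that tape_rses is not initialised elsewhere.

```python
import logging


class StubClient:
    """Hypothetical stand-in for the Rucio client, returning canned records."""

    def __init__(self, rses, fail=False):
        self._rses = rses
        self._fail = fail

    def list_rses(self, rse_expression=None):
        # The stub does not evaluate rse_expression; it returns what it was given.
        if self._fail:
            raise RuntimeError('lookup failed')
        return iter(self._rses)


def get_tape_rse_names(client, logger=logging.log):
    """Return the names of all RSEs matching istape=true, or [] on failure."""
    try:
        return [endp['rse'] for endp in client.list_rses(rse_expression='istape=true')]
    except Exception:
        logger(logging.DEBUG, 'No tapes found.')
        return []


# Usage with a made-up RSE name.
names = get_tape_rse_names(StubClient([{'rse': 'EXAMPLE_TAPE'}]))
print(names)
```

The result is a plain list of strings, which is why the later membership test has to be made against src['rse'].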

joeldierkes pushed a commit to joeldierkes/rucio that referenced this issue Dec 21, 2021


The download client should not download files from a tape RSE if the user is not
privileged.

The code to check that had a bug introduced in rucio#4196, so the user could read
from tape RSEs without restriction. This commit fixes the issue.
joeldierkes pushed a commit to joeldierkes/rucio that referenced this issue Jan 7, 2022


The download client should not download files from a tape RSE if the user is not
privileged.

The code to check that had a bug introduced in rucio#4196, so the user could read
from tape RSEs without restriction. This commit fixes the issue.
bari12 added a commit that referenced this issue Jan 10, 2022
…does_not_filter_out_tape_sources

Clients: Download client does not filter out tape sources Fix #5122
bari12 pushed a commit that referenced this issue Jan 10, 2022
The download client should not download files from a tape RSE if the user is not
privileged.

The code to check that had a bug introduced in #4196, so the user could read
from tape RSEs without restriction. This commit fixes the issue.
@bari12 bari12 added this to the 1.27.3 milestone Jan 10, 2022
piperov pushed a commit to piperov/rucio that referenced this issue Feb 25, 2022


The download client should not download files from a tape RSE if the user is not
privileged.

The code to check that had a bug introduced in rucio#4196, so the user could read
from tape RSEs without restriction. This commit fixes the issue.