Rucio download is not able to download files if the scope or name contains "/" #3031

Closed
cserf opened this issue Oct 21, 2019 · 13 comments

@cserf
Contributor

cserf commented Oct 21, 2019

Motivation

rucio download is not able to download files if the scope or name contains "/"

@TWAtGH changed the title from 'Client : rucio download is not able to download files if the scope or name contains "/"' to 'Client: rucio download is not able to download files if the scope or name contains "/"' Oct 21, 2019
@TWAtGH changed the title from 'Client: rucio download is not able to download files if the scope or name contains "/"' to 'Rucio download is not able to download files if the scope or name contains "/"' Oct 21, 2019
@TWAtGH

TWAtGH commented Nov 14, 2019

You skipped the 'modifications' section in the description :P
So how should this be resolved?
Is it related to #1393 ?

@TomasJavurek
Contributor

Using urllib.quote_plus() for all but containers?
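
For reference, a small Python 3 sketch of what percent-encoding a slash-containing name looks like. In Python 3 the function lives in `urllib.parse` (the `urllib.quote_plus` spelling is Python 2). The DID name below is made up for illustration, and this says nothing about how the Rucio client actually builds its request paths.

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical DID name containing slashes (illustrative only).
name = "/grid/belle/file 01.root"

# quote_plus escapes "/" as %2F and encodes spaces as "+".
encoded = quote_plus(name)
print(encoded)  # %2Fgrid%2Fbelle%2Ffile+01.root

# The encoding round-trips back to the original name.
assert unquote_plus(encoded) == name
```

Note that an encoded slash in a URL path is only useful if the web server is configured to accept it.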

@bari12
Member

bari12 commented Nov 15, 2019

I don't think this is related to #1393. That issue is specifically about a trailing / in the DID name, as ATLAS prodsys used to do in the past. Here the error must be something else.

@TWAtGH

TWAtGH commented Nov 18, 2019

I think this is rather a list_replicas issue then. But if the physical filename or the dataset name contains a "/", it will be problematic because it's the directory separator. We could change the download client to replace it with another character or create a subdirectory for each "/", but I think the best thing would be not to have "/" in the LFNs 😅

@bari12
Member

bari12 commented Nov 19, 2019

Creating subdirectories is fine. This is in fact how these files are stored on storage. The LFN just includes these structural directories. There is nothing wrong with having / in the LFN.
I would suggest we just create subdirectories. @TWAtGH can you please try and confirm the exact issue? If it is in list_replicas, then this goes over to @mlassnig, but we should narrow down what the actual issue is.
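
A minimal sketch of the "create subdirectories" idea, assuming each "/"-separated component of the LFN becomes one local directory level. `prepare_local_path` is a hypothetical helper for illustration, not a function of the Rucio download client, and the LFN is made up.

```python
import os
import tempfile


def prepare_local_path(dest_dir, lfn):
    """Hypothetical helper: map an LFN containing "/" to a file path below dest_dir."""
    # Drop empty components so leading, trailing, and doubled slashes
    # neither escape dest_dir nor create empty directory names.
    parts = [p for p in lfn.split("/") if p]
    if not parts:
        raise ValueError("LFN has no usable path component")
    # All components except the last become nested subdirectories.
    target_dir = os.path.join(dest_dir, *parts[:-1])
    os.makedirs(target_dir, exist_ok=True)
    return os.path.join(target_dir, parts[-1])


with tempfile.TemporaryDirectory() as tmp:
    path = prepare_local_path(tmp, "/grid/belle//mc/file01.root")
    # Show the path relative to the temporary download directory.
    print(os.path.relpath(path, tmp))
```

With this approach a doubled slash collapses into a single directory level. The sketch does not guard against ".." components, which a real implementation would probably want to reject.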

@TWAtGH

TWAtGH commented Nov 19, 2019

Looks like the download is already implemented like this. But it's difficult to test because the upload doesn't support this. I assume that most client functions that make a REST API call won't work with a DID containing a slash.
@cserf, a bit more info about the exact issue would be nice. Also, how can I create a DID containing a slash?
And how would double slashes be resolved? Just a single directory, I guess.

@TWAtGH

TWAtGH commented Nov 19, 2019

Ok, in fact it was the server which rejected the upload due to a missing AllowEncodedSlashes On.
But the server still doesn't allow inserting DIDs that contain slashes. I think it happens in the add_replicas call:

2019-11-19 17:25:17,488	DEBUG	File DID does not exist
2019-11-19 17:25:19,032	ERROR	Provided object does not match schema.
Details: Problem validating dids : u'slash/lfn01' does not match '^[A-Za-z0-9][A-Za-z0-9\\.\\-\\_]{1,250}$'

Failed validating 'pattern' in schema['items']['properties']['name']:
    {'description': 'Data Identifier name',
     'pattern': '^[A-Za-z0-9][A-Za-z0-9\\.\\-\\_]{1,250}$',
     'type': 'string'}

On instance[0]['name']:
    u'slash/lfn01'
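
The rejection can be reproduced outside the server with the pattern from the error message. In the sketch below, `DEFAULT_NAME_PATTERN` is that pattern with the doubled backslashes from the log's repr undone, while `SLASH_NAME_PATTERN` is a hypothetical variant that also accepts "/"; it is only an illustration and is not taken from any actual Rucio schema.

```python
import re

# Pattern copied from the validation error (repr escaping removed).
DEFAULT_NAME_PATTERN = r'^[A-Za-z0-9][A-Za-z0-9\.\-\_]{1,250}$'

# Hypothetical permissive pattern that also accepts "/" (assumption, not a real schema).
SLASH_NAME_PATTERN = r'^[A-Za-z0-9/][A-Za-z0-9\.\-\_/]{1,250}$'

for name in ("lfn01", "slash/lfn01"):
    default_ok = re.match(DEFAULT_NAME_PATTERN, name) is not None
    slash_ok = re.match(SLASH_NAME_PATTERN, name) is not None
    print(name, default_ok, slash_ok)
# lfn01 True True
# slash/lfn01 False True
```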

@bari12
Member

bari12 commented Nov 19, 2019

Yes, you have to try this with a schema which allows slashes, e.g. the belleii schema.

@bari12
Member

bari12 commented Jan 29, 2020

@TWAtGH is there any news on this?

@cserf
Contributor Author

cserf commented Feb 11, 2020

Traceback:

rucio -v download test:/grid/belle/ddm/functional_tests/release-0x-0y-0z/DB00000xxx/2020-02-11/MCyy/26903/dst
2020-02-11 05:54:32,806 INFO    Processing 1 item(s) for input
2020-02-11 05:54:32,806 DEBUG   num_unmerged_items=1; num_dids=1; num_merged_items=1
2020-02-11 05:54:32,806 INFO    Getting sources of DIDs
2020-02-11 05:54:32,806 DEBUG   schemes: None
2020-02-11 05:54:32,806 DEBUG   rse_expression: *\istape=true
2020-02-11 05:54:32,806 DEBUG   num DIDs for list_replicas call: 1
2020-02-11 05:54:34,649 DEBUG   num resolved files: 2
2020-02-11 05:54:34,655 DEBUG   "unzip -v" returned with exitcode 0
2020-02-11 05:54:34,661 DEBUG   "tar --version" returned with exitcode 0
2020-02-11 05:54:34,661 DEBUG   num list_replicas calls: 1
2020-02-11 05:54:34,661 DEBUG   Queueing file: test:/grid/belle/ddm/functional_tests/release-0x-0y-0z/DB00000xxx/2020-02-11/MCyy/26903/dst/bf55cdc6f73e473396c3476ca7047384
2020-02-11 05:54:34,661 DEBUG   real parents: set(['test:/grid/belle/ddm/functional_tests/release-0x-0y-0z/DB00000xxx/2020-02-11/MCyy/26903/dst'])
2020-02-11 05:54:34,661 DEBUG   options: {'test:/grid/belle/ddm/functional_tests/release-0x-0y-0z/DB00000xxx/2020-02-11/MCyy/26903/dst': {'ignore_checksum': False, 'transfer_timeout': 3600, 'destinations': set([('.', False)])}}
2020-02-11 05:54:34,663 DEBUG   Traceback (most recent call last):
  File "/usr/bin/rucio", line 159, in new_funct
    return function(*args, **kwargs)
  File "/usr/bin/rucio", line 975, in download
    result = download_client.download_dids(items, args.ndownloader, trace_pattern)
  File "/usr/lib/python2.7/site-packages/rucio/client/downloadclient.py", line 280, in download_dids
    input_items = self._prepare_items_for_download(did_to_options, merged_items_with_sources, resolve_archives=resolve_archives)
  File "/usr/lib/python2.7/site-packages/rucio/client/downloadclient.py", line 1158, in _prepare_items_for_download
    paths = [os.path.join(self._prepare_dest_dir(dest[0], dataset_name, file_did_name, dest[1]), file_did_name) for dest in destinations]
  File "/usr/lib/python2.7/site-packages/rucio/client/downloadclient.py", line 1395, in _prepare_dest_dir
    os.makedirs(dest_dir_path)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 150, in makedirs
    makedirs(head, mode)
  File "/usr/lib64/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 13] Permission denied: '/grid'

2020-02-11 05:54:34,663 ERROR   [Errno 13] Permission denied: '/grid'
2020-02-11 05:54:34,663 ERROR
Rucio exited with an unexpected/unknown error.

@bari12
Member

bari12 commented Jun 3, 2021

@cserf is this still an issue? How did you fix this with belle2?

@cserf
Contributor Author

cserf commented Jun 9, 2021

Still an issue. Not fixed yet in Belle II. We have been using a separate tool for download until now, but at some point we plan to move to the rucio download API.

@cserf cserf self-assigned this Sep 1, 2021
@cserf
Contributor Author

cserf commented Sep 1, 2021

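Found the issue. It is due to the behaviour of os.path.join: a DID name that starts with "/" is treated as an absolute path, so the destination directory is discarded and the client tries to create '/grid' at the filesystem root.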

Help on function join in module posixpath:

join(a, *p)
    Join two or more pathname components, inserting '/' as needed.
    If any component is an absolute path, all previous path components
    will be discarded.  An empty last part will result in a path that
    ends with a separator.
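
The effect is easy to reproduce. The sketch below uses `posixpath` directly so the result is the same on any platform; the DID name is a shortened, illustrative version of the one in the traceback. Stripping the leading slash is one possible workaround and is shown only as an assumption, not necessarily what the referenced commit does.

```python
import posixpath

dest_dir = "."
did_name = "/grid/belle/ddm/dst"  # shortened, illustrative name with a leading slash

# The absolute second component discards dest_dir entirely.
print(posixpath.join(dest_dir, did_name))              # /grid/belle/ddm/dst

# One possible workaround: strip leading slashes before joining.
print(posixpath.join(dest_dir, did_name.lstrip("/")))  # ./grid/belle/ddm/dst
```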

cserf added a commit to cserf/rucio that referenced this issue Sep 1, 2021
@bari12 bari12 closed this as completed in f429a08 Sep 6, 2021
bari12 pushed a commit that referenced this issue Sep 6, 2021
…tains "/" (#4816)

* Rucio download is not able to download files if the scope or name contains "/" : Closes #3031

* Fix variable name

* Protection against empty extract_scope
@bari12 bari12 added this to the 1.26.4-clients milestone Sep 6, 2021