[2018.3] archive module, filenames with Unicode characters #47572

Merged
merged 4 commits into saltstack:2018.3 from garethgreenaway:47546_archive_non_ascii_filenames on May 17, 2018

Conversation

@garethgreenaway
Member

garethgreenaway commented May 9, 2018

What does this PR do?

Accounts for filenames in an archive that contain non-ASCII characters.
Updates integration/modules/test_archive to include filenames with Unicode characters.

What issues does this PR fix or reference?

#47546

Tests written?

Yes

Commits signed with GPG?

Yes

Please review Salt's Contributing Guide for best practices.

See GitHub's page on GPG signing for more information about signing commits with GPG.

@rallytime

Contributor

rallytime commented May 10, 2018

@garethgreenaway The archive state integration tests didn't like this change very much: https://jenkins.saltstack.com/job/PR/job/salt-pr-linode-cent7-py3/4803/

Can you take a look?

@garethgreenaway garethgreenaway force-pushed the garethgreenaway:47546_archive_non_ascii_filenames branch from 0514408 to cca191e May 11, 2018

@rallytime

Contributor

rallytime commented May 14, 2018

re-run lint

@terminalmage

There are a few places where this fix can be simplified.

👍 for the additional test coverage.

if six.PY2:
    full_path = os.path.join(name, path.decode('utf-8'))
else:
    full_path = os.path.join(name, path)

@terminalmage

terminalmage May 16, 2018

Member

salt.utils.path.join() can be used here instead of an if/else. That helper will sanely handle mixed unicode and non-unicode path components and return the joined result as a unicode string:

>>> import os
>>> import salt.utils.path
>>> comps = [u'/foo', u'bar', 'спам']
>>> comps
[u'/foo', u'bar', '\xd1\x81\xd0\xbf\xd0\xb0\xd0\xbc']
>>> os.path.join(*comps)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 1: ordinal not in range(128)
>>> salt.utils.path.join(*comps)
u'/foo/bar/\u0441\u043f\u0430\u043c'
>>>
if six.PY2:
    tar.extractall(salt.utils.to_bytes(name))
else:
    tar.extractall(name)

@terminalmage

terminalmage May 16, 2018

Member

Is the problem here that tar.extractall() just doesn't like a unicode destination path passed to it? If so, then to_bytes is not the best solution. Instead, you don't even need an if/else and you can just use:

tar.extractall(salt.utils.stringutils.to_str(name))

This will ensure that the string is a str type on both Python 2 and Python 3.
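
For context, a rough sketch of what the native-str conversion buys (illustrative only, not Salt's actual salt.utils.stringutils.to_str implementation): on Python 2 the destination becomes a byte string, so extractall() presumably never hits the implicit ASCII decode shown in the posixpath.join traceback above when joining it with byte-string member names; on Python 3 it stays a text str.

# -*- coding: utf-8 -*-
# Illustrative stand-in for a to_str()-style helper; the real conversion
# lives in salt.utils.stringutils.to_str. The point is that tarfile gets
# the interpreter's native str type on both Python 2 and Python 3.
import sys

def to_native_str(value, encoding='utf-8'):
    '''Return ``value`` as the native ``str`` type of this interpreter.'''
    if isinstance(value, str):
        return value
    if sys.version_info[0] == 2:
        return value.encode(encoding)  # Py2: unicode -> bytes (str)
    return value.decode(encoding)      # Py3: bytes -> str

# Usage sketch:
# tar.extractall(to_native_str(name))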

@garethgreenaway

garethgreenaway May 17, 2018

Member

It's only an issue under Py2 and I went with to_bytes since we ripped out all the uses of to_str.

ret['top_level_files'] = [x for x in ret['files']
                          if x.count('/') == 0]
ret['top_level_links'] = [x for x in ret['links']
                          if x.count('/') == 0]

@terminalmage

terminalmage May 16, 2018

Member

It looks like the getmembers method within tarfile is returning paths as str types on Python 2. If this is the case, then the specific handling for Python 2 and 3 won't be necessary if you use salt.utils.data.decode_list() on each list when initializing ret:

ret = {'dirs': sorted(salt.utils.data.decode_list(dirs)),
       'files': sorted(salt.utils.data.decode_list(files)),
       'links': sorted(salt.utils.data.decode_list(links))}

This will ensure that all paths are unicode types, and the list comprehensions which generate the top_level_* keys will therefore not cause a UnicodeDecodeError.

salt.utils.data.decode() would also work here, but it is a general-purpose function that does type checking and would end up simply calling salt.utils.data.decode_list() under the hood. Since we know that dirs, files, and links are all lists, using salt.utils.data.decode_list() directly would be marginally faster here.
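
To make the failure mode concrete, here is a stand-alone sketch of why decoding up front keeps the top_level_* comprehensions safe on Python 2. The helper is a stand-in for salt.utils.data.decode_list, not the real thing, and it assumes a unicode '/' literal is in play (e.g. via unicode_literals) while tarfile hands back byte-string paths.

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Stand-in for salt.utils.data.decode_list, for illustration only.
def _decode_list(items, encoding='utf-8'):
    # Return every element as a text (unicode) type.
    return [x.decode(encoding) if isinstance(x, bytes) else x for x in items]

# On Python 2, counting a unicode '/' against a byte-string path that
# contains non-ASCII bytes forces an implicit ASCII decode and raises
# UnicodeDecodeError; decoding the list first avoids that.
files = [b'docs/', b'docs/\xd1\x81\xd0\xbf\xd0\xb0\xd0\xbc.txt', b'README']
ret = {'files': sorted(_decode_list(files))}
ret['top_level_files'] = [x for x in ret['files'] if x.count('/') == 0]
print(ret['top_level_files'])  # ['README'] (u'README' on Python 2)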

garethgreenaway added some commits May 9, 2018

@garethgreenaway garethgreenaway force-pushed the garethgreenaway:47546_archive_non_ascii_filenames branch from 7d456b8 to 5e97b8b May 17, 2018

@rallytime rallytime requested a review from terminalmage May 17, 2018

@rallytime rallytime merged commit 69056e5 into saltstack:2018.3 May 17, 2018

4 of 9 checks passed

jenkins/PR/salt-pr-rs-cent7-n Pull Requests » Salt PR - RS CentOS 7 #19047 — ABORTED
default Build finished.
jenkins/PR/salt-pr-linode-cent7-py3 Pull Requests » Salt PR - Linode CentOS 7 - PY3 #4990 — FAILURE
jenkins/PR/salt-pr-linode-ubuntu14-n Pull Requests » Salt PR - Linode Ubuntu14.04 #22924 — FAILURE
jenkins/PR/salt-pr-linode-ubuntu16-py3 Pull Requests » Salt PR - Linode Ubuntu16.04 - PY3 #9960 — FAILURE
WIP ready for review
jenkins/PR/salt-pr-clone Pull Requests » Salt PR - Clone #25177 — SUCCESS
jenkins/PR/salt-pr-docs-n Pull Requests » Salt PR - Docs #17277 — SUCCESS
jenkins/PR/salt-pr-lint-n Pull Requests » Salt PR - Code Lint #21907 — SUCCESS
@junovitch

Contributor

junovitch commented Jun 6, 2018

@garethgreenaway I'm seeing a backtrace two lines down from this change on a specific file with Unicode characters in its name. I've applied this changeset on top of 2018.3.0, and you can reproduce the failure with this state:

# patches applied to 2018.3.0
# https://github.com/saltstack/salt/pull/47572

extract ycmd zip:
  archive.extracted:
    - name: /tmp/ycmd-8082b6f241bd911452cfd7f21f8a70fce00abf63                                                     
    - source: https://github.com/Valloric/ycmd/archive/8082b6f241bd911452cfd7f21f8a70fce00abf63.zip                
    - source_hash: sha1=89761c942e8b6501fd0d07513e2c49f86d57edda                                                   

extract ycmd tar:
  archive.extracted:
    - name: /tmp/ycmd-8082b6f241bd911452cfd7f21f8a70fce00abf63                                                     
    - archive_format: tar
    - source: https://codeload.github.com/Valloric/ycmd/tar.gz/8082b6f241bd911452cfd7f21f8a70fce00abf63?dummy=/8082b6f241bd911452cfd7f21f8a70fce00abf63.tar.gz
    - source_hash: sha1=a23ca2c5b7e12557ef3cd0e0fcb3f3102c54a0c1                                                   
    - if_missing: /tmp/ycmd-8082b6f241bd911452cfd7f21f8a70fce00abf63  

You have much more experience with this than I do, and I'd be glad to help you reproduce it and validate a fix. Shall we move this to a new issue and continue troubleshooting there?

Edit: This is the backtrace:

[ERROR   ] An exception occurred in this state: Traceback (most recent call last):                           
  File "/usr/local/lib/python2.7/site-packages/salt/state.py", line 1878, in call
    **cdata['kwargs'])                                                                                              
  File "/usr/local/lib/python2.7/site-packages/salt/loader.py", line 1823, in wrapper                              
    return f(*args, **kwargs)                          
  File "/usr/local/lib/python2.7/site-packages/salt/states/archive.py", line 1105, in extracted                    
    path_mode = os.lstat(full_path.rstrip(os.sep)).st_mode                                                         
UnicodeEncodeError: 'ascii' codec can't encode characters in position 120-130: ordinal not in range(128)           
                                                                                                                   
[ERROR   ] Uncaught exception ''ascii' codec can't decode byte 0xd0 in position 69: ordinal not in range(128)' when listing contents of /var/cache/salt/minion/extrn_files/base/codeload.github.com/Valloric/ycmd/tar.gz/8082b6f241bd911452cfd7f21f8a70fce00abf63-dummy=/8082b6f241bd911452cfd7f21f8a70fce00abf63.tar.gz. Additional info follows:

archive location:
    /var/cache/salt/minion/extrn_files/base/codeload.github.com/Valloric/ycmd/tar.gz/8082b6f241bd911452cfd7f21f8a70fce00abf63-dummy=/8082b6f241bd911452cfd7f21f8a70fce00abf63.tar.gz

@garethgreenaway

Member

garethgreenaway commented Jun 6, 2018

@junovitch No need for a new issue, we can just re-open the existing issue and reference the new PR in there. I believe I've got this one figured out. The issue is over in the archive module, here is a diff if you wouldn't mind testing it for me:

diff --git a/salt/modules/archive.py b/salt/modules/archive.py
index dc49fd252f..76cd3eeb97 100644
--- a/salt/modules/archive.py
+++ b/salt/modules/archive.py
@@ -186,12 +186,13 @@ def list_(name,
                 else {'fileobj': cached.stdout, 'mode': 'r|'}
             with contextlib.closing(tarfile.open(**open_kwargs)) as tar_archive:
                 for member in tar_archive.getmembers():
+                    _member = salt.utils.data.decode(member.name)
                     if member.issym():
-                        links.append(member.name)
+                        links.append(_member)
                     elif member.isdir():
-                        dirs.append(member.name + '/')
+                        dirs.append(_member + '/')
                     else:
-                        files.append(member.name)
+                        files.append(_member)
             return dirs, files, links
 
         except tarfile.ReadError:

If everything looks good then I'll prep a new PR. Thanks!

@junovitch

Contributor

junovitch commented Jun 6, 2018

@garethgreenaway thanks for the quick work. This is an improvement; the tar file now doesn't hit that uncaught exception. Both the tar and zip in the referenced sls snippet above fail at the same point:

  File "/usr/local/lib/python2.7/site-packages/salt/states/archive.py", line 1105, in extracted                   
    path_mode = os.lstat(full_path.rstrip(os.sep)).st_mode                                                        
UnicodeEncodeError: 'ascii' codec can't encode characters in position 120-130: ordinal not in range(128)  

What do you think? Let me know and I will sanity check things. Certainly tag me in the new PR when we get there as well. Thanks!

Edit: I will note that I have only applied this one pull request on top of 2018.3.0, and I did see in the work-in-progress changelog that there are several more Unicode-related items, so this could be resolved by work elsewhere.

@junovitch

Contributor

junovitch commented Jun 6, 2018

On my dev network at work, the above patch fixed my issue, but I have also pulled a few more Unicode-related patches back from the 2018.3 branch onto my 2018.3.0 deployment.

@garethgreenaway

Member

garethgreenaway commented Jun 6, 2018

@junovitch I ran your test state with the tar.gz file in question and it worked as expected. This was using the HEAD of 2018.3, so all of the other Unicode-related fixes were installed as well. I'm going to add some additional tests and submit a PR with the above fix.

@junovitch

Contributor

junovitch commented Jun 7, 2018

Great! I just didn't have $LANG set on my test jail at home and it was set at work. Looks good now! Thanks for the quick response and getting these fixed in the next release.
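
For anyone who hits the same thing, a rough sketch of why the missing $LANG mattered, assuming Python 2 on a POSIX minion and a hypothetical non-ASCII path: filesystem calls such as os.lstat() encode unicode paths with sys.getfilesystemencoding(), which falls back to ASCII when no locale is configured, giving exactly the UnicodeEncodeError in the traceback above.

# -*- coding: utf-8 -*-
# Rough illustration of the $LANG dependence (Python 2, POSIX); the path
# below is hypothetical. os.lstat() encodes unicode paths with the
# locale-derived filesystem encoding before hitting the OS.
import sys

path = u'/tmp/ycmd-extract/спам.txt'  # hypothetical non-ASCII path

fs_enc = sys.getfilesystemencoding()
print(fs_enc)  # 'ANSI_X3.4-1968' (ASCII) without $LANG, 'UTF-8' with it set

try:
    path.encode(fs_enc)  # the same conversion os.lstat() needs to perform
except UnicodeEncodeError as exc:
    print('would fail like the state did: {0}'.format(exc))
else:
    print('encodes cleanly; os.lstat() would be happy')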
