Does not handle badly encoded filenames transparently #2

Open
matthijskooijman opened this issue Jun 24, 2014 · 4 comments

@matthijskooijman

When a filename on disk is invalid, in the sense that it cannot be decoded with the character encoding Python uses for filenames, pylibacl breaks.

Since Python 3.1, there is a mechanism for supporting filenames like these, using surrogate code points: http://legacy.python.org/dev/peps/pep-0383/

This causes these invalid filenames to be converted to normal unicode strings, using the "surrogate code points" in place of the invalid characters. However, when such a filename is passed to pylibacl (through the ACL constructor file argument), an exception occurs:

    actual = posix1e.ACL(file=path)
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udce9' in position 99: surrogates not allowed
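
The same error can be reproduced without pylibacl. The following is a minimal sketch, assuming a filename that was written in Latin-1 and is decoded as UTF-8 with the `surrogateescape` handler from PEP 383; the example name `caf\xe9.txt` is made up for illustration.

```python
# A Latin-1 encoded name; the byte 0xE9 is not valid UTF-8 on its own.
raw = b"caf\xe9.txt"

# PEP 383: undecodable bytes become lone surrogates (0xE9 -> U+DCE9).
name = raw.decode("utf-8", errors="surrogateescape")

# A strict UTF-8 encode rejects the surrogate, as in the traceback above.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    strict_error = exc.reason
else:
    strict_error = None

# Encoding with the same error handler restores the original bytes.
roundtrip = name.encode("utf-8", errors="surrogateescape")
```

The strict encode is what a plain UTF-8 conversion of the argument would do; the `surrogateescape` encode is what is needed to get the on-disk name back.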

Looking at the code, I suspect this error comes from the call to PyArg_ParseTupleAndKeywords, which is told to convert into a string using utf-8, which then fails.

Looking at the docs for PyArg_ParseTupleAndKeywords, it suggests that this problem can be solved by using the O& format:

Note: This format does not accept bytes-like objects. If you want to accept filesystem paths and convert them to C character strings, it is preferable to use the O& format with PyUnicode_FSConverter() as converter.
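
At the Python level, `os.fsencode()` performs a comparable conversion: it encodes `str` and path-like objects with the filesystem encoding and error handler, and returns `bytes` unchanged. The sketch below only illustrates that behaviour on a POSIX system; `to_fs_bytes` is a hypothetical helper name, not part of pylibacl.

```python
import os
import pathlib


def to_fs_bytes(path):
    """Hypothetical helper: return the bytes form of a filesystem path.

    Accepts str, bytes, or os.PathLike objects, like os.fsencode() itself.
    """
    return os.fsencode(path)


# A name that is not valid UTF-8 (example value, chosen for illustration).
raw = b"caf\xe9.txt"

# os.fsdecode() is what Python applies to names it reads from the OS.
decoded = os.fsdecode(raw)

# Converting back with the filesystem error handler restores the bytes.
restored = to_fs_bytes(decoded)
```

On POSIX the decode and encode steps use the same encoding and error handler, so the round trip is expected to preserve the original bytes whatever the locale is.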

@iustin
Owner

iustin commented Jun 24, 2014

Thanks for the bug report. This is similar (in a sense) to iustin/pyxattr#3 (but that one is in the Python code). I guess I'll have to make a pass through the code and write some tests with such invalid filenames, before fixing individual functions.
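
A test fixture for such filenames could look roughly like the sketch below. It assumes a Linux filesystem that accepts arbitrary bytes in names (some platforms reject them), and `make_badly_encoded_file` is a hypothetical helper, not something from the pylibacl test suite.

```python
import os
import tempfile


def make_badly_encoded_file(directory):
    """Create an empty file whose name is not valid UTF-8.

    Returns the full path as bytes. The name is an arbitrary example.
    """
    raw_path = os.path.join(os.fsencode(directory), b"bad-\xe9.txt")
    with open(raw_path, "wb"):
        pass
    return raw_path


with tempfile.TemporaryDirectory() as tmp:
    raw_path = make_badly_encoded_file(tmp)
    # With a str directory, os.listdir() returns str names; undecodable
    # bytes show up as surrogate code points under a UTF-8 locale.
    listed = os.listdir(tmp)
    # This str path is the kind of value a caller would pass to
    # posix1e.ACL(file=...).
    str_path = os.path.join(tmp, listed[0])
    exists = os.path.exists(str_path)
```

A real test would then pass `str_path` to the function under test and check that no `UnicodeEncodeError` is raised.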

@iustin iustin added the bug label Jun 24, 2014
@iustin
Owner

iustin commented Jun 28, 2014

OK, investigated a bit and there are two options to make this work:

  • have separate parsing paths for < 3.1 and >=3.1, which is something that I really don't want to do (the code is already ugly in that area)
  • re-implement a poor-man's PyUnicode_FSConverter for < 3.1

The other case where we deal with filenames is ACL_applyto, but there it will be somewhat simpler to handle.

@iustin iustin self-assigned this Apr 30, 2015
@iustin
Owner

iustin commented Nov 14, 2019

I will resolve this in the first release next year, when I'll drop Python 2 support.

@iustin
Owner

iustin commented Dec 3, 2019

Git head now switched to PyUnicode_FSConverter, fix will be released in next version.
