Non-ASCII filenames. #1

Closed
mete0r opened this issue Jul 27, 2015 · 2 comments

Comments

@mete0r
Contributor

mete0r commented Jul 27, 2015

Hello,

Thank you for this handy package. I think it's a real hidden gem.

If I try to open() or listdir() filenames with non-ASCII characters, i.e. UTF-8 encoded filenames, the transaction fails:

======================================================================
ERROR: test_read_write_nonascii_name (acidfs.tests.FunctionalTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/.../acidfs/tests.py", line 132, in test_read_write_nonascii_name
    transaction.commit()
  File "/.../.eggs/transaction-1.4.4-py2.7.egg/transaction/_manager.py", line 111, in commit
    return self.get().commit()
  File "/.../.eggs/transaction-1.4.4-py2.7.egg/transaction/_transaction.py", line 280, in commit
    reraise(t, v, tb)
  File "/.../.eggs/transaction-1.4.4-py2.7.egg/transaction/_transaction.py", line 271, in commit
    self._commitResources()
  File "/.../.eggs/transaction-1.4.4-py2.7.egg/transaction/_transaction.py", line 417, in _commitResources
    reraise(t, v, tb)
  File "/.../.eggs/transaction-1.4.4-py2.7.egg/transaction/_transaction.py", line 394, in _commitResources
    rm.tpc_vote(self)
  File "/.../acidfs/__init__.py", line 521, in tpc_vote
    tree_oid = self.tree.save()
  File "/.../acidfs/__init__.py", line 969, in save
    proc.stdin.write(_b(name))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf2' in position 4: ordinal not in range(128)

Further investigation reveals the following:

  • Names in _TreeNode.contents are forced through ASCII encoding and decoding.
  • By default, git quotes filenames containing bytes above 0x80 when it prints them. This can be turned off by setting core.quotepath = false in the config.
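
A minimal sketch of both observations, assuming a hypothetical filename `hellò` (the real name from the failing test is not shown in the traceback). The `try_encode` helper and the quoting expression are illustrative only: the quoting handles just the high bytes, whereas git also escapes control characters, quotes, and backslashes.

```python
# Hypothetical filename; any name with a character outside ASCII behaves the same.
name = u"hell\xf2"  # "hellò", with the non-ASCII character at position 4


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# The ASCII codec rejects \xf2 because it is outside range(128).
ascii_result = try_encode(name, "ascii")

# UTF-8 represents the same character as two bytes, both above 0x80.
utf8_result = try_encode(name, "utf-8")

# Simplified imitation of git's default path quoting: high bytes become
# backslash-octal escapes and the whole name is wrapped in double quotes.
quoted = '"' + "".join(
    chr(b) if b < 0x80 else "\\%03o" % b for b in bytearray(utf8_result)
) + '"'
```

Here `ascii_result` is `None`, while `quoted` holds the escaped form that a parser of git's output would have to undo unless quoting is switched off.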

So I think the following is a possible solution:

  • AcidFS.__init__() accepts an optional filename encoding, defaulting to 'ascii' or sys.getfilesystemencoding().
  • _Session and _TreeNode also inherit that encoding.
  • In _TreeNode.read() / save(), names are encoded/decoded with the given encoding.
  • In AcidFS.__init__(), the repository's core.quotepath config is turned off.
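
The proposal above could be sketched roughly as follows. This is not the code from the PR: `NameCodec`, `QUOTEPATH_OFF`, and `disable_quotepath` are hypothetical names used only for illustration, and the sketch picks sys.getfilesystemencoding() as the default, which is one of the two options mentioned.

```python
import subprocess
import sys


class NameCodec(object):
    """Hypothetical helper: converts tree entry names with one configured encoding."""

    def __init__(self, encoding=None):
        # Fall back to the filesystem encoding when none is given.
        self.encoding = encoding or sys.getfilesystemencoding()

    def encode(self, name):
        # Text name -> bytes, as needed when writing entries to git.
        return name.encode(self.encoding)

    def decode(self, raw):
        # Bytes from git -> text name, as needed when reading a tree.
        return raw.decode(self.encoding)


# The git invocation that turns off path quoting for a repository.
QUOTEPATH_OFF = ["git", "config", "core.quotepath", "false"]


def disable_quotepath(repo_path):
    """Run the config command inside repo_path (needs git and an existing repo)."""
    subprocess.check_call(QUOTEPATH_OFF, cwd=repo_path)


codec = NameCodec("utf-8")
raw = codec.encode(u"hell\xf2")   # bytes handed to git
roundtrip = codec.decode(raw)     # text name recovered from git output
```

Passing a single codec object down to the session and tree nodes would keep the encoding choice in one place, so read() and save() cannot disagree about it.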

If this is viable, I can make a PR (https://github.com/mete0r/acidfs/tree/feature-support-nonascii-filenames)

Any comments?

@chrisrossi
Member

Your fix looks pretty good. Thank you! I would be happy to accept a PR.

@chrisrossi
Member

Fixed by accepted PR.
