Support Unicode characters in file paths #1426
Thanks for the patch. However, we should include a test with this. Also, I wonder if we should actually decode
815a7eb to 16dd8c7
mkdocs/tests/utils/utils_tests.py
Outdated
utils.clean_directory(utf8_directory)
utils.clean_directory(ascii_directory)
self.assertTrue(os.path.exists(utf_8_path), False)
self.assertTrue(os.path.exists(ascii_path), False)
Shouldn't this be:
self.assertFalse(os.path.exists(utf_8_path))
self.assertFalse(os.path.exists(ascii_path))
Might also want to add:
self.assertFalse(os.path.exists(utf8_utf8_directory))
self.assertFalse(os.path.exists(ascii_ascii_directory))
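To make the difference concrete, here is a minimal sketch of why the original assertions do not check what they appear to. The second positional argument of `assertTrue` is the failure message, not an expected value. The test class and its method names are hypothetical and are not part of the MkDocs test suite.

```python
import os
import tempfile
import unittest


class AssertionDemo(unittest.TestCase):
    """Hypothetical test case used only to illustrate the review comment."""

    def test_assert_true_ignores_second_argument(self):
        existing = tempfile.mkdtemp()
        try:
            # Passes because the path exists. `False` is only the failure
            # message here, so nothing is compared against it.
            self.assertTrue(os.path.exists(existing), False)
        finally:
            os.rmdir(existing)

    def test_assert_false_checks_removal(self):
        removed = tempfile.mkdtemp()
        os.rmdir(removed)
        # The check the test actually wants: the path must be gone.
        self.assertFalse(os.path.exists(removed))
        # The original form fails once the path is gone, because the
        # first argument is falsy.
        with self.assertRaises(AssertionError):
            self.assertTrue(os.path.exists(removed), False)


def run_demo():
    """Run the demo tests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertionDemo)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_demo()` runs both cases; `assertFalse(os.path.exists(path))` states the intent directly and needs no second argument.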
Yeah, you're right. I used assertTrue incorrectly, as if it were assertEqual.
I wonder why this failed on AppVeyor; on Python 2.7 it passed both locally and on Travis.
I ran your test on a Windows system (AppVeyor runs the tests on Windows) and I am getting the same error. I believe the issue is related to how Windows encodes filenames and to the fact that you are concatenating paths with some parts as Unicode strings and some parts as byte strings. As I understand it, the path-related functions in Python will do the right thing if you pass them all Unicode strings. However, in the
entry = dec(entry)
if entry.startswith('.'):
    continue
path = os.path.join(directory, entry)
That ensures that the conditional works (as in your fix) and that a Unicode string is passed to
Finally, I'm not sure if it's related, but apparently Windows does not encode its file and directory names with
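For context, the snippet above could sit inside a directory-cleaning loop along the following lines. This is only a sketch under stated assumptions: the body of `dec` is a hypothetical stand-in (the thread does not show its implementation), UTF-8 is assumed as the byte encoding, and `clean_directory` here is a simplified illustration rather than the project's actual function.

```python
import os
import shutil
import tempfile


def dec(value, encoding='utf-8'):
    """Return `value` as text, decoding it only if it is a byte string.

    Hypothetical stand-in for the `dec` helper mentioned in the thread;
    the UTF-8 default is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def clean_directory(directory):
    """Remove everything inside `directory` except hidden entries."""
    directory = dec(directory)
    if not os.path.exists(directory):
        return
    for entry in os.listdir(directory):
        # Decode first so that both the startswith() check and the join
        # below operate on text only.
        entry = dec(entry)
        if entry.startswith('.'):
            continue
        path = os.path.join(directory, entry)
        if os.path.isdir(path):
            shutil.rmtree(path, True)
        else:
            os.unlink(path)


def demo():
    """Build a small tree with a Unicode directory, clean it, list the rest."""
    root = tempfile.mkdtemp()
    try:
        os.makedirs(os.path.join(root, '导航', 'sub'))
        open(os.path.join(root, 'page.md'), 'w').close()
        open(os.path.join(root, '.keep'), 'w').close()
        clean_directory(root)
        return sorted(os.listdir(root))
    finally:
        shutil.rmtree(root, True)
```

Decoding at the top of the loop keeps every later path operation on a single string type, which is the property the comment above is asking for.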
mkdocs/tests/utils/utils_tests.py
Outdated
def test_unicode_clean_directory(self):
    temp_dir = tempfile.mkdtemp()
    utf8_directory = os.path.join(temp_dir, '导航'.decode('utf-8'))
This isn't necessary. At the top of the file we have from __future__ import unicode_literals, which ensures that all string literals defined in the module are Unicode strings. It's basically the same as the u prefix: u'unicode string'.
What we need to check is that temp_dir is Unicode, or any other path which is not created by us.
Thanks. Now I understand: with os.path.join(arg1, arg2), if arg1 and arg2 are both str or both unicode, it works. But if they are a mix of str and unicode, Python decodes the str argument with the system default encoding, which on Windows is not UTF-8, and it throws an exception.
Since tempfile didn't work well with UTF-8, I create the UTF-8 directory myself:
temp_dir = tempfile.mkdtemp()
utf8_temp_dir = os.path.join(dec('tmp'), dec('tmp_utf8'))
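The mixing problem described above can be reproduced in isolation. The sketch below is written for Python 3, where mixing text and bytes in `os.path.join` raises a `TypeError` outright. The Python 2 behaviour (an implicit decode with `sys.getdefaultencoding()`, normally ASCII) is emulated with an explicit `.decode('ascii')`. All function and variable names are illustrative.

```python
import os

text_part = 'docs'
byte_part = '导航'.encode('utf-8')  # a byte-string path component


def join_mixed():
    """Try to join a text path with a byte path and report what happens."""
    try:
        return os.path.join(text_part, byte_part)
    except TypeError as exc:
        # Python 3 refuses to mix the two types.
        return exc


def implicit_ascii_decode():
    """Emulate the ASCII decode Python 2 applied when str met unicode."""
    try:
        return byte_part.decode('ascii')
    except UnicodeDecodeError as exc:
        # Non-ASCII bytes cannot be decoded as ASCII.
        return exc


def join_text():
    """Decode first, then join: both parts are text, so the join is safe."""
    return os.path.join(text_part, byte_part.decode('utf-8'))
```

Only `join_text()` returns a path; the other two return the exception they caught, which makes the failure mode easy to inspect.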
db7e8a7 to b9bc336
I wonder if it would help to use backports.os which provides 3.2+ implementations of os.fsdecode and os.fsencode. That way we would get consistent behavior across Python 2 and 3.
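As a rough illustration of what those two functions do, here is a sketch using the standard-library versions available in Python 3.2+. It does not use `backports.os` itself, whose import path is not shown in this thread; the variable names are illustrative.

```python
import os

# A filename as raw bytes, e.g. as returned by a bytes-based os.listdir().
raw_name = '导航'.encode('utf-8')

# fsdecode turns a byte filename into text using the filesystem encoding
# and its error handler.
text_name = os.fsdecode(raw_name)

# fsencode is the inverse operation, so the original bytes come back.
round_tripped = os.fsencode(text_name)

# Text input passes through fsdecode unchanged, so it is safe to call
# on values whose type is not known in advance.
already_text = os.fsdecode('docs')
```

Because `os.fsdecode` accepts both types, it can be applied at every boundary where a path enters the code, after which everything downstream handles text only.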
Another solution is to use a Path object library, such as path.py or pathlib. By wrapping all paths in the Path object, we get support for all path-related operations out of the box. It would allow us to remove many of our current utils methods. Of course, it would require a larger refactor. Perhaps this would make sense as part of the Pages Refactor.
Here are some differences between the two libs, in no particular order:
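As a rough illustration of the Path-object approach, the sketch below uses the standard-library `pathlib` (Python 3.4+; `write_text` needs 3.5+) to build and then empty a directory tree with a Unicode name. The function name and the file layout are hypothetical, and this is not a proposal for how the MkDocs utilities should be rewritten.

```python
import shutil
import tempfile
from pathlib import Path


def build_and_clean():
    """Create a Unicode-named docs tree, then empty it, using Path objects."""
    root = Path(tempfile.mkdtemp())
    try:
        # The `/` operator joins components; every part stays text.
        page = root / '导航' / 'index.md'
        page.parent.mkdir(parents=True)
        page.write_text('# 导航\n', encoding='utf-8')
        created = sorted(p.name for p in root.iterdir())

        # Snapshot the listing before deleting so iteration stays stable.
        for child in list(root.iterdir()):
            if child.name.startswith('.'):
                continue
            if child.is_dir():
                shutil.rmtree(str(child))
            else:
                child.unlink()
        remaining = sorted(p.name for p in root.iterdir())
        return created, remaining
    finally:
        shutil.rmtree(str(root), ignore_errors=True)
```

`build_and_clean()` returns the directory listing before and after cleaning, so the effect of the loop can be checked without inspecting the filesystem by hand.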
While researching another issue I came upon this:
So, it sounds like we don't need to bother with determining the system's encoding. Just be sure to always pass Unicode strings to the various path related functions and we should be okay.
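One way to see this rule in action is `os.listdir`, which returns entries of the same type as the path it is given. The sketch below is written for Python 3; the helper name is hypothetical.

```python
import os
import shutil
import tempfile


def listing_types():
    """Return the entries os.listdir yields for a text and a byte path."""
    root = tempfile.mkdtemp()
    try:
        os.mkdir(os.path.join(root, '导航'))
        # Text path in, text names out.
        from_text = os.listdir(root)
        # Byte path in, byte names out.
        from_bytes = os.listdir(os.fsencode(root))
        return from_text, from_bytes
    finally:
        shutil.rmtree(root)
```

Passing text everywhere means the names coming back are already text, so no explicit decoding step (and no guess about the system encoding) is needed.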
When I have a Unicode directory name under docs, such as '导航', and I change the content under that directory, the mkdocs server detects the change and triggers a build, which throws a UnicodeDecodeError: