Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recent change to support bytes and python3 is breaking Windows build #106

Closed
ikus060 opened this issue Aug 25, 2019 · 13 comments
Closed

Recent change to support bytes and python3 is breaking Windows build #106

ikus060 opened this issue Aug 25, 2019 · 13 comments
Assignees
Labels
Milestone

Comments

@ikus060
Copy link
Contributor

ikus060 commented Aug 25, 2019

Related to #105, the recent pull request to support python3 and make uses of bytes completely broke the Windows build.
All the function os.* on windows platform are only supporting str, not bytes.
All the function from pywin32 are only supporting str, not bytes.

@ericzolf As this point, I'm not sure what to do with the Path. Bytes or Str make it a problem. On Linux Path are clearly manage as bytes, on Windows.

If I was managing this project, I would revert back the py2to3 pull request and work on it again and make sure to make the unittest working on all the platform: Linux, Windows And OSx. Basically, the recent changes broke the build.

@ottok
Copy link
Contributor

ottok commented Aug 25, 2019

Can you please test on Windows if the Python 3 built-in Path object works as expected? See #53 (comment)

I didn't understand the problem fully and I have no test case to verify anything, so this using Pathlib was just a general suggestion.

I do understand your viewpoint on reverting the merge, but as stated in the last paragraph of #53 (comment) Eric finds it easier to have this on master and then clean up and fix from there, so we will go with that as Eric wrote the Python 3 migration. We could revert though if Eric finds that fixing does not work out.

In the mean time, getting a CI in place is a big priority and we are currently working on both Travis-CI (me) and Appveyor (you). Related issues and PRs (just for reminder): #48, #105, #94, #96.

@ikus060
Copy link
Contributor Author

ikus060 commented Aug 25, 2019

Can you please test on Windows if the Python 3 built-in Path object works as expected? See #53 (comment)
I didn't understand the problem fully and I have no test case to verify anything, so this using Pathlib was just a general suggestion.

I still trying to figure out the extend of this problem my self. Long story short, is bytes are not working well on Windows.

On the same system, different version of python will return different bytes representation of the same files ! The point of using bytes, at least on Linux, was to take the unaltered version of the paths regardless of the encoding defined by the user (LANG environment variable). But on Windows it seams more difficult. NTFS is UTF-16, but python is reporting something different depending of the version used.

Python 3.7

Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 19:29:22) [MSC v.1916 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> sys.getfilesystemencoding()
'utf-8'
>>> os.lstat('évelyne.txt')
os.stat_result(st_mode=33206, st_ino=10414574138403302, st_dev=1286172926, st_nl
ink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1566728707, st_mtime=1566728707,
st_ctime=1566728707)
>>> os.fsencode('évelyne.txt')
b'\xc3\xa9velyne.txt'
>>> os.lstat(b'\xc3\xa9velyne.txt')
os.stat_result(st_mode=33206, st_ino=10414574138403302, st_dev=1286172926, st_nl
ink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1566728707, st_mtime=1566728707,
st_ctime=1566728707)
>>>

Python 3.5

Python 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AM
D64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> sys.getfilesystemencoding()
'mbcs'
>>> os.lstat('évelyne.txt')
os.stat_result(st_mode=33206, st_ino=10414574138403302, st_dev=1286172926, st_nl
ink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1566728707, st_mtime=1566728707,
st_ctime=1566728707)
>>> os.fsencode('évelyne.txt')
b'\xe9velyne.txt'
>>> os.lstat(b'\xe9velyne.txt')
os.stat_result(st_mode=33206, st_ino=10414574138403302, st_dev=1286172926, st_nl
ink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1566728707, st_mtime=1566728707,
st_ctime=1566728707)

Compared to Python 2.7

Python 2.7.16 (v2.7.16:413a49145e, Mar  4 2019, 01:30:55) [MSC v.1500 32 bit (In
tel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, sys
>>> sys.getfilesystemencoding()
'mbcs'
>>> os.lstat(u'évelyne.txt')
nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0L, st_nlink=0, st_uid=0, st_gid
=0, st_size=0L, st_atime=1566728707L, st_mtime=1566728707L, st_ctime=1566728707L

I found the relevant PEP (https://www.python.org/dev/peps/pep-0529/) that explain the switch to UTF-8 for Python 3.6. It put rdiff-backup in a bad position in regards to how path are managed on Linux vs Windows.

My thinking is as follow:

  1. Keep path as bytes in rdiff-backup. This is the canonical representation of a path on Linux as I've already stated in a previous mail chain
  2. On Windows, will need to replace all the os.* call by our own implementation that will take care of the encoding and decoding according to the python version. Question here is which encoding ?
    • If we want to be backward compatible, we should stick with mbcs.
    • If we change the encoding for utf-16-le, which is the real canonical representation of the path on Windows, all the path manipulation become a mess, because those function assume '/' separator is a single byte, which is not the case for utf-16.
    • If we keep UTF-8 like PEP is proposing, it solves the '/' seperator issue, but we can't represent all the invalid characters. It also introduce a backward incompatibility with any repositories generated by rdiff-backup < 1.3.3.

My recommendation is to use mbcs

@ottok @ericzolf

In the mean time, getting a CI in place is a big priority and we are currently working on both Travis-CI (me) and Appveyor (you). Related issues and PRs (just for reminder): #48, #105, #94, #96.

Should we get the CI merged even if the build is broken ? At this point I would say yes. And our goal should be to make is pass ASAP.

@ericzolf
Copy link
Member

Adding to the complexity of the discussion, I'm not sure what happens if we consider cross-OS communication, e.g. back-up from Windows to Linux Server and vice-versa.

@ikus060
Copy link
Contributor Author

ikus060 commented Aug 25, 2019 via email

@ericzolf
Copy link
Member

I'm not sure this is that easy, the code was a pretty mess in terms of unicode on and off, but the fact that everything was a string was hiding a lot of the complexity, but we can try. I would like to push this topic after having cleaned the code in regard to PEP8 before though (editing files currently is a mess in Vim, and the tox tests are failing on flake8).

@ikus060
Copy link
Contributor Author

ikus060 commented Aug 26, 2019

I would like to push this topic after having cleaned the code in regard to PEP8 before though (editing files currently is a mess in Vim, and the tox tests are failing on flake8).

We clearly have different priorities in that regards. Having a working version in master branch has a higher priority on my list then making the code PEP8 compliant.

@ikus060 ikus060 added the bug label Aug 26, 2019
@ericzolf
Copy link
Member

I would create myself a clean windows VM to understand what's going on, but I need help as I haven't touched windows for years and never for Python development.

Could you describe how to create a Dev and test environment on Windows? Adding a chapter or two to DEVELOP.adoc would make it useful for others than me.

@ikus060
Copy link
Contributor Author

ikus060 commented Aug 28, 2019 via email

@ericzolf
Copy link
Member

ericzolf commented Aug 28, 2019

Not sure it helps when you want to try things interactively, but I'll look into it in due time.

@ericzolf
Copy link
Member

Looking into this bug, few questions:

  1. I don't understand the statement that os.* does NOT work with bytes under Windows. Can you be more precise about what doesn't work? Even in your example, the result between bytes and strings in Python 3 look the same to me.
  2. Can you provide some error examples of what doesn't work when you run the built rdiff-backup? Can you provide the binary?
  3. If the behaviour of Python is erratic on Windows, we should focus on one version of Python, and it should be the latest one, i.e. 3.7. Once we've fixed it, we can still try to extend the support to older versions (not sure it is a priority as we would offer a packaged/compiled rdiff-backup with its own Python embedded).

Anyway, I've got currently python3.7.4 running, all modules installed. Need now to get librsync compiled.
Working branch is https://github.com/rdiff-backup/rdiff-backup/tree/ericzolf-windows-dev-env/tools/windows for reference.

@ericzolf
Copy link
Member

So that we're aligned, some more information about versions:

  • Windows 10
  • Python 3.7.4
  • py2exe 0.9.2.2
  • pywin32 224
  • VisualStudio 2017 - MSBuild 15.9.21.664
  • as written in another thread, I'll most probably use librsync in a more recent version than 0.9.7 - I'm hesitating between the version on Fedora (2.0.2), which I know should work, and the latest available at https://github.com/librsync/librsync/releases i.e. 2.1.0

@ikus060
Copy link
Contributor Author

ikus060 commented Sep 21, 2019 via email

@ericzolf ericzolf added this to the 2.0.0 milestone Oct 2, 2019
ericzolf added a commit that referenced this issue Oct 19, 2019
- use fsdecode on bytes paths when using win32 functions
- ignore rdiff-backup.spec created by Windows build
- use bytes instead of strings for win_acls
- replace os.path.join and os.getcwdb with own RORPath methods to
  make sure slashes are used under Windows.
- try to make life of a Windows developer on Linux easier:
    - add 7zip to Windows build to be able to compress/uncompress
    - try to add broken cygwin chocolatey package and comment it
    - create helper script for debugging rdiff-backup from build
    - fetch automatically the rdiff-backup.exe binary

rdiff-backup can backup, increment and restore backups with this commit.
More tests should follow but for now it closes issue #106
ericzolf added a commit that referenced this issue Oct 25, 2019
- use fsdecode on bytes paths when using win32 functions
- ignore rdiff-backup.spec created by Windows build
- use bytes instead of strings for win_acls
- replace os.path.join and os.getcwdb with own RORPath methods to
  make sure slashes are used under Windows.
- try to make life of a Windows developer on Linux easier:
    - add 7zip to Windows build to be able to compress/uncompress
    - try to add broken cygwin chocolatey package and comment it
    - create helper script for debugging rdiff-backup from build
    - fetch automatically the rdiff-backup.exe binary

rdiff-backup can backup, increment and restore backups with this commit.
More tests should follow but for now it closes issue #106
@ericzolf
Copy link
Member

Closing this bug based on PR #141 - just open a new one with specific error messages when discovering new issues with the resulting build.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants