pip install --editable git+... on Windows OS converts line returns from unix to windows style even in binary files in the downloaded data #11952

Closed
lrq3000 opened this issue Apr 12, 2023 · 4 comments
Labels
type: support User Support

Comments

@lrq3000

lrq3000 commented Apr 12, 2023

Description

This is a hairy bug that I had a very hard time tracking down, but I think I managed to produce a minimal example that should make it much easier to investigate.

When using pip install --editable git+... on some Windows systems (not all, but both windows-2019 and windows-2022, the Windows Server images available on GitHub Actions), pip downloads from the remote git repository but changes all "\n" characters into "\r\n", even in binary files, where "\n" is not necessarily a line return! This has real potential to break things, and silently.

Here is the minimal example repository:

https://github.com/lrq3000/dummypiplinereturnbug

Look especially at the tests subfolder and its only test script:

https://github.com/lrq3000/dummypiplinereturnbug/blob/main/src/dummypiplinereturnbug/tests/test_gen_files_with_line_returns.py

There are 2 tests:

  • the first generates a text file with unix line returns and compares it with the same file pregenerated on my machine and saved in the python package. In principle, both files should be exactly the same. In practice, on some Windows OSes, the pregenerated file ends up with Windows-style line return characters, even though none of my code modifies this file!
  • the second follows the same process but with a binary file. The result is the same, which is worse, because it means that binary data files are tampered with as well.
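The two tests can be pictured with a minimal sketch like the one below. It is not the repository's actual test code: the payload, the file names, and the helper functions are hypothetical, and the "converted" file only simulates the tampering described above.

```python
import tempfile
from pathlib import Path

# Hypothetical payload; the real test data in the repository may differ.
PAYLOAD = b"line one\nline two\n\x01\x02 binary tail\n"


def write_payload(path: Path) -> None:
    """Write the payload in binary mode so Python does no newline translation."""
    path.write_bytes(PAYLOAD)


def files_identical(generated: Path, pregenerated: Path) -> bool:
    """Compare two files byte for byte."""
    return generated.read_bytes() == pregenerated.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    generated = Path(tmp) / "generated.bin"
    intact = Path(tmp) / "pregenerated_intact.bin"
    converted = Path(tmp) / "pregenerated_converted.bin"

    write_payload(generated)
    write_payload(intact)
    # Simulate the tampering: every "\n" in the shipped file became "\r\n".
    converted.write_bytes(PAYLOAD.replace(b"\n", b"\r\n"))

    same_when_intact = files_identical(generated, intact)
    same_when_converted = files_identical(generated, converted)
```

Reading and writing in binary mode matters here: it rules out Python's own newline translation as the source of any difference.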

Now the second part of this minimal example is the GitHub workflow:

https://github.com/lrq3000/dummypiplinereturnbug/blob/main/.github/workflows/ci-build-downstream.yml

In it, I tried to determine the minimal conditions that make the bug happen, which so far seem to be:

  • pip install --editable git+...
  • on windows-2019 or windows-2022 on GitHub Actions

You can have a look at the results of the GitHub Actions runs, such as this one:

https://github.com/lrq3000/dummypiplinereturnbug/actions/runs/4673779912/jobs/8277352051

The workflow runs on ubuntu-latest and macos-latest (both succeed, line returns are not changed), then on windows-2019 and windows-2022 (both fail). I could not reproduce the issue on my Windows 10 machine, but I first noticed it in another project, pyFileFixity, with different code and files. So it seems very reproducible given the right set of conditions.

Note that in the workflow, I added a step that displays the file contents in a hexadecimal view, so you can check for yourself the changes that happened in each of the 4 data files (2 generated, 2 pregenerated; only the 2 pregenerated ones get tampered with).
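For readers who want a similar check locally, here is a small sketch of a hexadecimal view plus a line-ending count. It is an illustration only: `hex_view` and `count_line_endings` are hypothetical helpers, not the commands used in the workflow.

```python
def hex_view(data: bytes, width: int = 16) -> str:
    """Render bytes as offset, hex columns, and a printable-ASCII column."""
    pad = width * 3 - 1  # width of the hex column when a row is full
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(rows)


def count_line_endings(data: bytes) -> dict:
    """Count bare LF and CRLF sequences in a byte string."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # every CRLF also contains one LF
    return {"lf": lf, "crlf": crlf}


if __name__ == "__main__":
    sample = b"a\nb\r\nc\n"
    print(hex_view(sample))
    print(count_line_endings(sample))
```

A pregenerated file that reports a non-zero `crlf` count when it was committed with only `0a` bytes has been converted somewhere between the commit and the checkout.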

Also the workflow updates pip before doing anything.

Expected behavior

No response

pip version

23.0.1

Python version

3.11

OS

Windows

How to Reproduce

See above.

Output

No response


@lrq3000 lrq3000 added the "S: needs triage" and "type: bug" labels Apr 12, 2023
@uranusjr
Member

This is Git's doing, not pip's. By default, Git detects text files and converts their line endings; if the detection algorithm fails for you, there are various ways to disable line ending conversion altogether, or to manually mark files as text or non-text. Some discussion is available here: https://www.reddit.com/r/git/comments/s8kr76/comment/hth54to/
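To make the mechanism concrete, here is a rough simulation of a checkout-time conversion. Everything in it is an assumption made for illustration: `looks_like_text` is a stand-in heuristic (no NUL bytes means text) and is not Git's real detection algorithm, and `simulate_checkout` only mimics an LF to CRLF rewrite in Python.

```python
def looks_like_text(data: bytes) -> bool:
    """Stand-in heuristic (an assumption, NOT Git's real algorithm):
    treat content without NUL bytes as text."""
    return b"\x00" not in data


def simulate_checkout(data: bytes, convert_eol: bool = True) -> bytes:
    """Mimic an LF -> CRLF conversion applied only to content classified as text."""
    if not convert_eol or not looks_like_text(data):
        return data
    # Normalize first so existing CRLF pairs are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


if __name__ == "__main__":
    binary_without_nul = b"\x01\x02\n\x03"
    binary_with_nul = b"\x00\x02\n\x03"
    print(simulate_checkout(binary_without_nul))  # rewritten: misclassified as text
    print(simulate_checkout(binary_with_nul))     # untouched: classified as binary
```

The point of the sketch is that any content-based guess can misclassify a binary file, which is why marking files explicitly is the more reliable option.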

Closing since this is out of pip's scope.

@uranusjr uranusjr closed this as not planned Apr 12, 2023
@uranusjr uranusjr added the "type: support" label and removed the "type: bug" and "S: needs triage" labels Apr 12, 2023
@lrq3000
Author

lrq3000 commented Apr 14, 2023

Thank you for pointing this out, this was indeed the culprit! I did expect the issue to stem from a dependency of pip rather than from pip itself, but I am still a bit surprised, as I did not expect pip to use git under the hood to check out a git repo, much as git does not use Chromium to fetch HTTPS addresses. It also means that pip can likely do a lot more with git repos than I expected.

Anyway, in practice, if others have the same issue, here is how I fixed it: simply add a .gitattributes file at the root of your git repository with the following content:

* -text

This will disable automatic line-ending conversion for all files. If you want per-filetype conversion, templates are available here.
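If you would rather script this step, for instance across several repositories, a minimal sketch could look like the following. The helper name `ensure_no_eol_conversion` is hypothetical, and the temporary directory merely stands in for a real repository root.

```python
import tempfile
from pathlib import Path

RULE = "* -text"


def ensure_no_eol_conversion(repo_root: Path) -> bool:
    """Append the rule to .gitattributes if absent; return True if the file changed."""
    attrs = repo_root / ".gitattributes"
    existing = attrs.read_text(encoding="utf-8") if attrs.exists() else ""
    if RULE in (line.strip() for line in existing.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with attrs.open("a", encoding="utf-8", newline="\n") as fh:
        fh.write(f"{prefix}{RULE}\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stand-in for the root of a git repository
    first = ensure_no_eol_conversion(root)   # creates the file
    second = ensure_no_eol_conversion(root)  # no change the second time
    content = (root / ".gitattributes").read_bytes()
```

The file still has to be committed to the repository for the rule to apply to fresh checkouts. Note also that when several patterns in a .gitattributes file match the same path, later lines override earlier ones, so appending this rule at the end may override more specific entries above it.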

@uranusjr
Member

When you use HTTP(S) in Git, it actually uses cURL under the hood, so the situation is not that dissimilar 🙂 And just as pip respects Git configuration, Git also respects HTTP(S) configuration in e.g. .netrc.

@lrq3000
Author

lrq3000 commented Apr 15, 2023 via email

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 18, 2023