Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default to utf-8 for csv opening. #5712

Closed
wants to merge 3 commits into from
Closed

Default to utf-8 for csv opening. #5712

wants to merge 3 commits into from

Conversation

julienmalard
Copy link

@julienmalard julienmalard commented Aug 17, 2018

This solves problems when installing packages with unicode module or package names on Windows and resolves this issue originally reported to setuptools:
pypa/setuptools#1399

@@ -64,7 +64,7 @@ def rehash(path, blocksize=1 << 20):
return (digest, length)


def open_for_csv(name, mode):
def open_for_csv(name, mode, encoding='utf8'):
Copy link
Member

@benoit-pierre benoit-pierre Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encoding is not used... You really need to add a test for this.

Copy link
Author

@julienmalard julienmalard Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I had copied over the changes from my local copy of pip.
Would you have any guidance as to where/how to add a test for this case? The error occurs when installing wheels which have unicode module names.

Copy link
Member

@benoit-pierre benoit-pierre Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create a minimal project that triggers the bug, add its sources to tests/data/src/ and a resulting wheel to tests/data/packages/, then add a test similar to test_basic_install_from_wheel in tests/functional/test_install_wheel.py. Do that without your changes, run your new test with something like tox -- -k test_install_wheel_with_unicode_filenames, and make sure you can reproduce the failure, then fix the test by adding your changes. Check Python 2 still work, and push.

You also need to add a news entry: https://pip.pypa.io/en/stable/development/#adding-a-news-entry

Copy link
Member

@benoit-pierre benoit-pierre Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And encoding is not a valid open keyword argument in Python 2.

Copy link
Author

@julienmalard julienmalard Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the guidance! I will work on this and get back to you.

Copy link
Author

@julienmalard julienmalard Jan 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,
I just made some changes for open and added a test. Will see how the pull request works now (I am not working from a Windows machine at the moment).
Could you help with the wheel test please? I could not figure out how to point a new wheel test to the new unicode_file example in the data folder.
Also, mypy seems to be failing but I am not sure why.

Thanks!

Copy link
Author

@julienmalard julienmalard Jul 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,
Any help please?
Many thanks for your support!

@@ -71,7 +71,7 @@ def open_for_csv(name, mode, encoding='utf8'):
else:
nl = {'newline': ''}
bin = ''
return open(name, mode + bin, **nl)
return open(name, mode + bin, encoding=encoding, **nl)
Copy link
Member

@benoit-pierre benoit-pierre Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, you really need to add a test: what about Python 2 support (the file is opened in binary mode)?

Copy link
Member

@pradyunsg pradyunsg left a comment

As noted by @benoit-pierre

@BrownTruck
Copy link
Contributor

BrownTruck commented Dec 16, 2018

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

@BrownTruck BrownTruck added the needs rebase or merge PR has conflicts with current master label Dec 16, 2018
@pypa-bot pypa-bot removed the needs rebase or merge PR has conflicts with current master label Jan 1, 2019
@pradyunsg
Copy link
Member

pradyunsg commented Jan 2, 2019

I think that merge went wrong @julienmalard. :/

@julienmalard
Copy link
Author

julienmalard commented Jan 2, 2019

Oops...
I guess could simply clone a new branch from the main repo but then would have to reopen a new pull request and close this one.
Would you happen to know how I could fix it instead? (I suppose from the terminal?)

Thanks and sorry,
Julien

@pradyunsg
Copy link
Member

pradyunsg commented Jan 3, 2019

This should be a simple rebase. You'll need to fix a merge conflict, for updating the type signature for open_for_csv to include the encoding parameter type.

There's instructions for this at https://pip.pypa.io/en/latest/development/contributing/#updating-your-branch, but if they're not super helpful, please feel free to ping me and I'll be happy to provide step-by-step guidance for the rebase to you. :)

@julienmalard
Copy link
Author

julienmalard commented Jan 8, 2019

@pradyunsg
Copy link
Member

pradyunsg commented Jan 9, 2019

Sure!

This is a simple rebase, you have to fix one merge conflict and ignore one patch manually.

  1. To start the rebase, run:

    $ git rebase master
  2. The rebase will stop due to a merge conflict, around line 83 in pip._internal.wheel, change that section of code to be:

    def open_for_csv(name, mode, encoding='utf8'):
        # type: (str, Text, str) -> IO

    Then, run:

    $ git add src/pip/_internal/wheel.py
    

    This will add this resolved merge conflict to git's "staging area", so we can proceed now.

  3. To continue the rebase, you will run:

    $ git rebase --continue
    
  4. After that, it will complain about another commit. This time, you want to ignore it since it's applied. You should just run:

    $ git rebase --skip
    

    That will proceed and complete the rebase.

  5. Now you can do a git push --force (or however you usually push, with --force flag added) to push the change to your repository's branch, which will update this PR. :)

Feel free to ping me in case something is not clear.

This solves problems when installing packages with unicode module or package names.
@julienmalard
Copy link
Author

julienmalard commented Jul 14, 2019

Hello,
Any news? (See my comment above asking for clarification on how to implement some of the changes you had requested.)
Many thanks for your help!

@xavfernandez
Copy link
Member

xavfernandez commented Jul 18, 2019

Many thanks for your help!

From this PR, it is unclear what you are trying to fix.

As asked in pypa/setuptools#1399 it would be helpful to produce a minimal example failing in your case.

Your tests/data/src/unicode_files/ doesn't seem to be it since it fails with:

$ pip install  tests/data/src/unicode_files/
Looking in indexes: https://pypi.org/simple
Processing ./tests/data/src/unicode_files
    Complete output from command python setup.py egg_info:
    running egg_info
    error: Invalid distribution name or version syntax: --0.0.1
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-req-build-a7fdv7vd/

which does not seem to fail on a csv.

@julienmalard
Copy link
Author

julienmalard commented Jul 18, 2019

Thanks for the quick response! Will investigate and get back to you soon.

@@ -80,15 +81,15 @@ def rehash(path, blocksize=1 << 20):
return (digest, str(length)) # type: ignore


def open_for_csv(name, mode):
def open_for_csv(name, mode, encoding='utf8'):
# type: (str, Text) -> IO
Copy link
Member

@pradyunsg pradyunsg Jul 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# type: (str, Text, str)

@chrahunt chrahunt added the S: awaiting response Waiting for a response/more information label Aug 16, 2019
@chrahunt
Copy link
Member

chrahunt commented Sep 17, 2019

Since it has been some time without progress here I will close this, but anyone should feel free to re-raise this as a PR or issue if the information above can be provided.

Thanks!

@chrahunt chrahunt closed this Sep 17, 2019
@julienmalard
Copy link
Author

julienmalard commented Oct 2, 2019

Hello,

Any chance this could be reopened? I am trying to fix this PR and it would be helpful to see if the tests now pass.

Thanks,
Julien

@chrahunt
Copy link
Member

chrahunt commented Oct 2, 2019

GitHub won't allow reopening at this time. The hover message on the disabled "Reopen and comment" button states: "The master branch was force-pushed or recreated."

I would try creating a new branch off of your current commit, fetch and overwrite the master from the main pypa/pip repository, and then rebase your new branch on your local master. It would look something like (assuming you have the main pypa/pip repo set as a remote called upstream):

git checkout -b my-fixes
git fetch-f upstream master:master
git rebase master

Then you can push my-fixes back to your GitHub repository and submit a new PR, or git branch --move master my-fixes then git push -f origin master to update this PR.

@julienmalard
Copy link
Author

julienmalard commented Oct 4, 2019

Done, thank you! I continued it in a new pull request (#7138).

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Nov 13, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Nov 13, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation S: awaiting response Waiting for a response/more information
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants