Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that authors and maintainers are normalized. #276

Merged
merged 1 commit into from Feb 28, 2022
Merged

Ensure that authors and maintainers are normalized. #276

merged 1 commit into from Feb 28, 2022

Conversation

canburak
Copy link
Contributor

@canburak canburak commented Jan 30, 2022

This is a bugfix from my point of view, but feel free to reject if you feel this is expected.

Steps to reproduce

  1. Insert a decomposed unicode string as an author in pyproject.toml.
  2. poetry install.

Expected outcome is that the installation takes place but instead you get a mismatch on the AUTHOR_REGEX.

An alternative way to reproduce is to update user.name in ~/.gitconfig and create a new project with poetry new. Since poetry seems to read the author name from the git config, you will have the same issue with the generated pyproject.toml.

While this sounds evil or an attempt to break things on purpose, I actually had such a string in my .gitconfig but unfortunately I don't remember how it got there, I blame one of the git UIs.

Explanation

Unicode can represent characters in one of two ways, combined or decomposed. My surname contains the letter Ç:

  • Combined: LATIN CAPITAL LETTER C WITH CEDILLA
  • Decomposed: LATIN CAPITAL LETTER C + COMBINING CEDILLA.

When I have the decomposed version, looks like it is a mismatch with the AUTHOR_REGEX.

Fix

I normalized the authors and maintainers which seemed to solve the issue. Note that you will see I've decomposed the é in the sample_project but I have not changed the assert in the related test case, thus, the same test case serves my purpose.

Please note that the output is rendered same for the both versions of the strings, thus the diff seems funny.

@sonarcloud
Copy link

sonarcloud bot commented Jan 30, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@dimbleby dimbleby mentioned this pull request Feb 9, 2022
@dimbleby
Copy link
Contributor

dimbleby commented Feb 9, 2022

I'm not entirely sure, but it looks as though #136 was probably a previous try at this. That one seems to have fixed by using regex instead of re, I don't have strong feelings about whether that's a better or worse approach.

Either way it now looks to be languishing in a sea of merge conflicts.

cf python-poetry/poetry#221, I guess.

@neersighted neersighted merged commit ae64adf into python-poetry:master Feb 28, 2022
@Secrus Secrus mentioned this pull request Aug 22, 2022
2 tasks
@julienmalard
Copy link

@dimbleby Not entirely sure myself, but I think this might not fix #136 as of yet. The problem with that one was a long-standing bug in the re module (see here and here) mostly regarding South Asian scripts. I'm not sure if normalising unicode will fix these.
Anyways I'd be happy to recreate a pull request for that one if the community is interested, though I'd need a bit of help figuring out how to add regex as a dependency to poetry.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants