Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop repeatedly normalising project names #7974

Open
pfmoore opened this issue Apr 3, 2020 · 3 comments
Open

Stop repeatedly normalising project names #7974

pfmoore opened this issue Apr 3, 2020 · 3 comments
Labels
type: maintenance Related to Development and Maintenance Processes type: refactor Refactoring code

Comments

@pfmoore
Copy link
Member

pfmoore commented Apr 3, 2020

What's the problem this feature will solve?
There's no clear picture in pip's internal code of when a project name is correctly normalised. As a result, code tends to call canonicalize_name() "just in case".

While the cost of the extra calls is small, it also makes it difficult to reason about the logic, and the more difficult it is, the more likely that people will "just call canonicalize_name() to be sure", compounding the issue.

Describe the solution you'd like
A well-documented and clear indication of which parts of the code are responsible for ensuring that project names are in canonical form, so that the rest of pip's code can confidently just use names as provided to it.

Alternative Solutions
It would be nice if we could use type checks to enforce normalisation hygene, but apparently MyPy treats type aliases as identical, so doesn't support this. It's not (IMO) obvious that it's worth the cost of having an actual NormalizedName class. I could be convinced otherwise, but I don't think it's a productive place to start.

Additional context
The issue here is similar in principle to Unicode-safety, and can be viewed in the same way - normalise at the boundaries of the code, and use normalised values exclusively throughout the internal code.

@triage-new-issues triage-new-issues bot added the S: needs triage Issues/PRs that need to be triaged label Apr 3, 2020
@pfmoore pfmoore added type: maintenance Related to Development and Maintenance Processes type: refactor Refactoring code labels Apr 3, 2020
@triage-new-issues triage-new-issues bot removed the S: needs triage Issues/PRs that need to be triaged label Apr 3, 2020
@pfmoore
Copy link
Member Author

pfmoore commented Apr 3, 2020

Note: I do not think this should be classed as a "good first issue", as it will involve working on some of the more complex parts of pip's internals, and it's not clear that there's good test coverage of the types of issue that could arise from mistakes in the refactoring.

@uranusjr
Copy link
Member

uranusjr commented Apr 5, 2020

I found python/mypy#1284 (typing.NewType) which seems to be the type-dinstinct. From PEP 484:

UserId = NewType('UserId', int)

def name_by_id(user_id: UserId) -> str:
    ...

UserId('user')          # Fails type check

name_by_id(42)          # Fails type check
name_by_id(UserId(42))  # OK

num = UserId(5) + 1     # type: int

We can either introduce this upstream to packaging, or implement our own wrapper canonicalize_name (that casts to CanonicalName) to ensure type safty.

@uranusjr
Copy link
Member

uranusjr commented Apr 5, 2020

I submitted pypa/packaging#290 for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: maintenance Related to Development and Maintenance Processes type: refactor Refactoring code
Projects
None yet
Development

No branches or pull requests

2 participants