Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed the parser regression for the package names starting with the the same substrings as keywords. #619

Closed
wants to merge 7 commits into from

Conversation

KOLANICH
Copy link

introduced in 2e5593c

Fixes: #618

Note: please never ever hand-rewrite parsers, it is very error-prone. My experience with parsers says to me that it is better to deal with them as grammars being fed to parser generators. That will notify you about conflicts and do a lot of work for you.

Shameless plug: UniGrammar is an abstraction layer for parser generators and their runtimes.

@brettcannon brettcannon self-requested a review November 25, 2022 18:36
@brettcannon
Copy link
Member

please never ever hand-rewrite parsers, it is very error-prone.

Unfortunately we needed to drop our dependency on pyparsing to be dependency-free. That led to someone contributing the hand-written parser, so we accepted the contribution. Had someone contributed a parser-generator solution that generating a static parser we could check into the repository, we would have accepted that, but alas that was not what was given to us.

Copy link
Member

@brettcannon brettcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the code is failing type checks.

packaging/_parser.py Outdated Show resolved Hide resolved
@KOLANICH
Copy link
Author

Thanks for the fix, added it.

Unfortunately we needed to drop our dependency on pyparsing to be dependency-free.

TBH: I know this issuel but it is not so big issue. There are pretty a lot of cyclic dependencies (I have to work around them with custom build scripts and PYTHONPATH modification), untying packaging from pyparsing is only one of them. Besides pip, wheel and setuptools there are also flit_core and hatchling, all having a dependency that needs them to build itself. See https://github.com/KOLANICH-tools/ipi.py/blob/master/ipi/bootstrap/setuptools.py for more infol

Had someone contributed a parser-generator solution that generating a static parser we could check into the repository, we would have accepted that, but alas that was not what was given to us.

  1. What's the problem to choose the parser generator with the smallest runtime and vendor the runtime?
  2. Why not to use regexps instead of an own tokenizer? The structure of the requirement feels like it can be tokenized with a regexp (disclaimer: I have not tried to implement it this way, so I can be wrong about it), and I guess a regexp would be more readable than the custom tokenizer.

@brettcannon
Copy link
Member

  1. What's the problem to choose the parser generator with the smallest runtime and vendor the runtime?

Could have been another option. Once again, the solution we were presented with was this hand-written parser.

2. Why not to use regexps instead of an own tokenizer?

See above. 😁

@KOLANICH
Copy link
Author

I guess you may want to try waxeye parser gen, it has a rather compact runtime. But again, TBH I currently don't consider packaging having a dependency like pyparsing (or waxeye) as a big issue.

@brettcannon brettcannon self-requested a review November 25, 2022 19:18
@pradyunsg
Copy link
Member

Hmm... this is a bit more broken. Names like v0, a0 are also being picked up incorrectly (anything that matches VERSION).

I'll dig into the hand-written parser and spend some cycles on it to see what we might want to change about it. :)

@KOLANICH KOLANICH force-pushed the fix_parser_regression branch 4 times, most recently from 1c2f26a to 3399330 Compare December 2, 2022 12:40
KOLANICH and others added 3 commits December 2, 2022 15:46
Co-Authored-By: Pradyun Gedam <pradyunsg@gmail.com>
…ding multiple ones used as a dirty hack in order not to write `None`.
…or convenience just a string is considered a set containing one element.
Implement skipping of tokens.
Whitespaces are skipped by default.
…ents:

1. with the package names starting with the the same substrings as keywords.
2. preceeded or followed by spaces.

Co-Authored-By: Brett Cannon <brett@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Requirement('installer') errors after 2bd5da391c302f2f5a18fd3e2bd9fb3b75c02e34
3 participants