Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid nested quantifiers with overlapping character space on git url parsing (#1902 #1913

Merged
merged 5 commits into from Jan 22, 2020

Conversation

@finswimmer
Copy link
Member

finswimmer commented Jan 17, 2020

This PR started as a fix for #1902 to avoid nested quantifiers with overlapping character space during git url parsing.

It' result is a rework of the pattern matching and adding a test for parse_url to make this part more reliable.

Fixes: #1902

@finswimmer finswimmer requested a review from python-poetry/core Jan 17, 2020
@k4nar

This comment has been minimized.

Copy link
Contributor

k4nar commented Jan 20, 2020

Is there a coverage in unit test for those regexp? It looks like there should be 😄 .

@sdispater

This comment has been minimized.

Copy link
Member

sdispater commented Jan 20, 2020

@k4nar Do you mean adding the cases mentioned in the original issue #1902? If so, I concur that we should add tests that cover these.

Copy link
Member

sdispater left a comment

We should add tests that cover the problematic strings given as examples in #1902

@@ -22,7 +22,7 @@
re.compile(
r"^(git\+)?"
r"(?P<protocol>https?|git|ssh|rsync|file)://"
r"(?:(?P<user>.+)@)*"
r"(?:(?P<user>\w+)@)*"

This comment has been minimized.

Copy link
@sdispater

sdispater Jan 20, 2020

Member
Suggested change
r"(?:(?P<user>\w+)@)*"
r"(?:(?P<user>[a-zA-Z0-9_.-])@)*"

I think we only want standard characters here with the addition of _, . and -. If you think \w is still needed, however I don't think so, adding those three characters is still needed.

This comment has been minimized.

Copy link
@finswimmer

finswimmer Jan 20, 2020

Author Member

What ever works is fine for me :)

I thinks we should use the same pattern for the same groups. For example resource is one time [a-z0-9_.-]* and one time [\w.\-]+. But I'm not totally sure, what is allowed in each group. Any suggestions?

This comment has been minimized.

Copy link
@sdispater

sdispater Jan 20, 2020

Member

I agree we should have consistency between the different regexps. I think for resource [a-z0-9_.-]+ should be enough.

This comment has been minimized.

Copy link
@finswimmer

finswimmer Jan 21, 2020

Author Member

I did some rework to be consistent within the pattern. I also added tests for parse_url which uncovered some problems with regexes. Now only the first of the two pattern for matching "git+protocol://" and for matching "user@" are used for our test data. Do you think we can remove the two other? The less regex we need, the better 😃

@finswimmer finswimmer requested a review from sdispater Jan 21, 2020
@sdispater

This comment has been minimized.

Copy link
Member

sdispater commented Jan 21, 2020

@finswimmer Unless I'm mistaken, the affected strings mentioned (like "https://" + "@" * 64 + "!") in #1902 are not present in the tests. I think we should add them to avoid regressions.

@finswimmer

This comment has been minimized.

Copy link
Member Author

finswimmer commented Jan 21, 2020

@sdispater I've added a test for this. But something went wrong with the checks on MacOS?

@sdispater

This comment has been minimized.

Copy link
Member

sdispater commented Jan 21, 2020

@finswimmer Yes. I don't know what happened since nothing has changed, but it seems that the cached venv for MacOS is missing pip somehow. And since there is not easy way to clear the Github Actions cache at the moment I will need to make a PR to invalidate it.

@sdispater

This comment has been minimized.

Copy link
Member

sdispater commented Jan 21, 2020

@finswimmer Cache issues should be fixed on master. You will have to rebase your branch to pick up the fixes

@finswimmer finswimmer force-pushed the finswimmer:issue-01902-git-parsing branch from 9cd7961 to 35291b9 Jan 22, 2020
Copy link
Member

sdispater left a comment

👍

@sdispater sdispater merged commit 2df0d2c into python-poetry:master Jan 22, 2020
16 checks passed
16 checks passed
Linting
Details
Linux (2.7)
Details
Linux (3.5)
Details
Linux (3.6)
Details
Linux (3.7)
Details
Linux (3.8)
Details
MacOS (2.7)
Details
MacOS (3.5)
Details
MacOS (3.6)
Details
MacOS (3.7)
Details
MacOS (3.8)
Details
Windows (2.7)
Details
Windows (3.5)
Details
Windows (3.6)
Details
Windows (3.7)
Details
Windows (3.8)
Details
MrGreenTea added a commit to MrGreenTea/poetry that referenced this pull request Feb 14, 2020
…parsing (python-poetry#1902 (python-poetry#1913)

* fix (git): match for `\w` instead of `.` for getting user

* change (vcs.git): hold pattern of the regex parts in a dictionary to be consistent over all regexs

* new (vcs.git): test for `parse_url` and some fixes for the regex pattern

* new (vcs.git): test for `parse_url` with string that should fail

* fix (test.vcs.git): make flake8 happy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants
You can’t perform that action at this time.