Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid nested quantifiers with overlapping character space on git url parsing (#1902 #1913

Merged
merged 5 commits into from
Jan 22, 2020

Conversation

finswimmer
Copy link
Member

@finswimmer finswimmer commented Jan 17, 2020

This PR started as a fix for #1902 to avoid nested quantifiers with overlapping character space during git url parsing.

It' result is a rework of the pattern matching and adding a test for parse_url to make this part more reliable.

Fixes: #1902

@finswimmer finswimmer requested a review from a team January 17, 2020 18:31
@k4nar
Copy link
Contributor

k4nar commented Jan 20, 2020

Is there a coverage in unit test for those regexp? It looks like there should be 😄 .

@sdispater
Copy link
Member

@k4nar Do you mean adding the cases mentioned in the original issue #1902? If so, I concur that we should add tests that cover these.

Copy link
Member

@sdispater sdispater left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add tests that cover the problematic strings given as examples in #1902

@@ -22,7 +22,7 @@
re.compile(
r"^(git\+)?"
r"(?P<protocol>https?|git|ssh|rsync|file)://"
r"(?:(?P<user>.+)@)*"
r"(?:(?P<user>\w+)@)*"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r"(?:(?P<user>\w+)@)*"
r"(?:(?P<user>[a-zA-Z0-9_.-])@)*"

I think we only want standard characters here with the addition of _, . and -. If you think \w is still needed, however I don't think so, adding those three characters is still needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What ever works is fine for me :)

I thinks we should use the same pattern for the same groups. For example resource is one time [a-z0-9_.-]* and one time [\w.\-]+. But I'm not totally sure, what is allowed in each group. Any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should have consistency between the different regexps. I think for resource [a-z0-9_.-]+ should be enough.

Copy link
Member Author

@finswimmer finswimmer Jan 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some rework to be consistent within the pattern. I also added tests for parse_url which uncovered some problems with regexes. Now only the first of the two pattern for matching "git+protocol://" and for matching "user@" are used for our test data. Do you think we can remove the two other? The less regex we need, the better 😃

@sdispater
Copy link
Member

@finswimmer Unless I'm mistaken, the affected strings mentioned (like "https://" + "@" * 64 + "!") in #1902 are not present in the tests. I think we should add them to avoid regressions.

@finswimmer
Copy link
Member Author

@sdispater I've added a test for this. But something went wrong with the checks on MacOS?

@sdispater
Copy link
Member

@finswimmer Yes. I don't know what happened since nothing has changed, but it seems that the cached venv for MacOS is missing pip somehow. And since there is not easy way to clear the Github Actions cache at the moment I will need to make a PR to invalidate it.

@sdispater
Copy link
Member

@finswimmer Cache issues should be fixed on master. You will have to rebase your branch to pick up the fixes

Copy link
Member

@sdispater sdispater left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@sdispater sdispater merged commit 2df0d2c into python-poetry:master Jan 22, 2020
sdispater added a commit that referenced this pull request Feb 21, 2020
* Fix Github actions cache issues (#1908)

* Fix case of `-f` flag

* Make it clearer what options to pass to `--format`

* fix (masonry.api): `get_requires_for_build_wheel` must return additional list of requirements for building a package, not listed in `pyproject.toml` and not dependencies for the package itself (#1875)

fix (tests): adopted tests

* Lazy Keyring intialization for PasswordManager (#1892)

* Fix Github Actions cache issues (#1928)

* Avoid nested quantifiers with overlapping character space on git url parsing (#1902 (#1913)

* fix (git): match for `\w` instead of `.` for getting user

* change (vcs.git): hold pattern of the regex parts in a dictionary to be consistent over all regexs

* new (vcs.git): test for `parse_url` and some fixes for the regex pattern

* new (vcs.git): test for `parse_url` with string that should fail

* fix (test.vcs.git): make flake8 happy

* fix: correct parsing of wheel version with regex. (#1932)

The previous regexp was only taking the first integer of the version number,
this presented problems when the major version number reached double digits.

Poetry would determine that the version of the dependency is '1', rather than,
ie: '14'. This caused failures to solve versions.

* Fix errors when using the --help option (#1910)

* Fix how repository credentials are retrieved from env vars (#1909)

# Conflicts:
#	poetry/utils/password_manager.py

* Fix downloading packages from Simplepypi (#1851)

* fix downloading packages from simplepypi

* unused code removed

* remove unused imports

* Upgrade dependencies for the 1.0.3 release (#1965)

* Bump version to 1.0.3 (#1966)

* Fix non-compliant Git URL matching

RFC 3986 § 2.3 permits more characters in a URL than were matched. This
corrects that, though there may be other deficiencies. This was a
regression from v1.0.2, where at least “.” was matched without error.

* Update README.md "Updating Poetry"

Currently the note in "Updating Poetry" is different from the one below in "Enable tab completion for Bash, Fish, or Zsh". This MR is to make them more consistent.

* init: change dev dependency prompt

* Fix CI issues (#2069)

Co-authored-by: brandonaut <brandon@hubermx.com>
Co-authored-by: finswimmer <finswimmer77@gmail.com>
Co-authored-by: Yannick PÉROUX <yannick.peroux@gmail.com>
Co-authored-by: Edward George <edwardgeorge@gmail.com>
Co-authored-by: Jan Škoda <skoda@jskoda.cz>
Co-authored-by: Andrew Marshall <andrew@johnandrewmarshall.com>
Co-authored-by: Andrew Selzer <andrewfselzer@gmail.com>
Co-authored-by: Andrii Maletskyi <andrii.maletskyi@gmail.com>
sdispater added a commit that referenced this pull request Mar 20, 2020
* export: fix exporting extras sub-dependencies (#1294)

* Support POETRY_HOME for install (#794)

Allow the `POETRY_HOME` environment variable to be passed during installation to change the default installation directory of `~/.poetry`:

```
POETRY_HOME=/etc/poetry python get-poetry.py
```

* * check if relative filename is in excluded file list (#1459)

* * check if relative filename is in excluded file list
* removed find_excluded_files() method from wheel.py

* added test for excluding files in wheels

* creating an own test data folder, for testing excluding files by pyproject.toml

* use as_posix() to respect windows file path delimiters

* Exclude nested items (#784) (#1464)

* This PR impliments the feature request #784.

When a folder is explicit defined in `pyproject.toml` as excluded, all nested data, including subfolder, are excluded. It is no longer neccessary to use the glob `folder/**/*`

* use `Path` instead of `os.path.join` to create string for globbing

* try to fix linting error

* create glob pattern string by concatenating and not using Path

* using `os.path.isdir()`` for checking of explicit excluded name is a folder, because pathlib's `is_dir()` raises in exception under windows of name contains globing characters

* Remove nested data when wildcards where used.

Steps to do this are:
1. expand any wildcard used
2. if expanded path is a folder append  **/* and expand again

* fix linting

* only glob a second time if path is dir

* implement @sdispater 's suggestion for better readability

* fix glob for windows?

* On Windows, testing if a path with a glob is a directory will raise an OSError

* pathlibs  glob function doesn't return the correct case (https://bugs.python.org/issue26655). So switching back to  glob.glob()

* removing obsolete imports

* Update dependencies

* Deprecate allows-prereleases in favor of allow-prereleases for consistency

* Fix tests for Python 2.7

* Fix linting

* Fix linting

* Fix linting

* Fix typing import

* Correct a couple typos in get-poetry.py (#573)

* Docs: `self:update` changed to `self update` (#1588)

* Fix GitHub actions cache issues on develop (#1918)

* Fix Github actions cache issues

* Fix Github Actions cache issues (#1928)

* Add --source option to "poetry add" (#1912)

* Add --source option to 'poetry add'

* Add tests for 'poetry add --source'

* Merge master into develop (#2070)

* Fix Github actions cache issues (#1908)

* Fix case of `-f` flag

* Make it clearer what options to pass to `--format`

* fix (masonry.api): `get_requires_for_build_wheel` must return additional list of requirements for building a package, not listed in `pyproject.toml` and not dependencies for the package itself (#1875)

fix (tests): adopted tests

* Lazy Keyring intialization for PasswordManager (#1892)

* Fix Github Actions cache issues (#1928)

* Avoid nested quantifiers with overlapping character space on git url parsing (#1902 (#1913)

* fix (git): match for `\w` instead of `.` for getting user

* change (vcs.git): hold pattern of the regex parts in a dictionary to be consistent over all regexs

* new (vcs.git): test for `parse_url` and some fixes for the regex pattern

* new (vcs.git): test for `parse_url` with string that should fail

* fix (test.vcs.git): make flake8 happy

* fix: correct parsing of wheel version with regex. (#1932)

The previous regexp was only taking the first integer of the version number,
this presented problems when the major version number reached double digits.

Poetry would determine that the version of the dependency is '1', rather than,
ie: '14'. This caused failures to solve versions.

* Fix errors when using the --help option (#1910)

* Fix how repository credentials are retrieved from env vars (#1909)

# Conflicts:
#	poetry/utils/password_manager.py

* Fix downloading packages from Simplepypi (#1851)

* fix downloading packages from simplepypi

* unused code removed

* remove unused imports

* Upgrade dependencies for the 1.0.3 release (#1965)

* Bump version to 1.0.3 (#1966)

* Fix non-compliant Git URL matching

RFC 3986 § 2.3 permits more characters in a URL than were matched. This
corrects that, though there may be other deficiencies. This was a
regression from v1.0.2, where at least “.” was matched without error.

* Update README.md "Updating Poetry"

Currently the note in "Updating Poetry" is different from the one below in "Enable tab completion for Bash, Fish, or Zsh". This MR is to make them more consistent.

* init: change dev dependency prompt

* Fix CI issues (#2069)

Co-authored-by: brandonaut <brandon@hubermx.com>
Co-authored-by: finswimmer <finswimmer77@gmail.com>
Co-authored-by: Yannick PÉROUX <yannick.peroux@gmail.com>
Co-authored-by: Edward George <edwardgeorge@gmail.com>
Co-authored-by: Jan Škoda <skoda@jskoda.cz>
Co-authored-by: Andrew Marshall <andrew@johnandrewmarshall.com>
Co-authored-by: Andrew Selzer <andrewfselzer@gmail.com>
Co-authored-by: Andrii Maletskyi <andrii.maletskyi@gmail.com>

* pre-commit: replace isort mirror with isort upstream (#2118)

The isort pre-commit mirror has been deprecated. This change updates
configuration to use the upstream package repository instead of the
mirror.

* Add cache list command (#1187)

* Add poetry.locations.REPOSITORY_CACHE_DIR

The repository cache directory is used in multiple places in the
codebase. This change ensures that the value is reused.

* Add cache list command

This introduces a new cache sub-command that lists all available
caches.

Relates-to: #1162

Co-authored-by: Tom Milligan <tommilligan@users.noreply.github.com>
Co-authored-by: David Cramer <dcramer@users.noreply.github.com>
Co-authored-by: finswimmer <finswimmer77@gmail.com>
Co-authored-by: Kyle Altendorf <sda@fstab.net>
Co-authored-by: Justin Mayer <entroP@gmail.com>
Co-authored-by: Yannick PÉROUX <yannick.peroux@gmail.com>
Co-authored-by: brandonaut <brandon@hubermx.com>
Co-authored-by: Edward George <edwardgeorge@gmail.com>
Co-authored-by: Jan Škoda <skoda@jskoda.cz>
Co-authored-by: Andrew Marshall <andrew@johnandrewmarshall.com>
Co-authored-by: Andrew Selzer <andrewfselzer@gmail.com>
Co-authored-by: Andrii Maletskyi <andrii.maletskyi@gmail.com>
Co-authored-by: Arun Babu Neelicattu <arun.neelicattu@gmail.com>
sdispater added a commit that referenced this pull request Mar 20, 2020
* Fix Github actions cache issues (#1908)

* Fix case of `-f` flag

* Make it clearer what options to pass to `--format`

* fix (masonry.api): `get_requires_for_build_wheel` must return additional list of requirements for building a package, not listed in `pyproject.toml` and not dependencies for the package itself (#1875)

fix (tests): adopted tests

* Lazy Keyring intialization for PasswordManager (#1892)

* Fix Github Actions cache issues (#1928)

* Avoid nested quantifiers with overlapping character space on git url parsing (#1902 (#1913)

* fix (git): match for `\w` instead of `.` for getting user

* change (vcs.git): hold pattern of the regex parts in a dictionary to be consistent over all regexs

* new (vcs.git): test for `parse_url` and some fixes for the regex pattern

* new (vcs.git): test for `parse_url` with string that should fail

* fix (test.vcs.git): make flake8 happy

* fix: correct parsing of wheel version with regex. (#1932)

The previous regexp was only taking the first integer of the version number,
this presented problems when the major version number reached double digits.

Poetry would determine that the version of the dependency is '1', rather than,
ie: '14'. This caused failures to solve versions.

* Fix errors when using the --help option (#1910)

* Fix how repository credentials are retrieved from env vars (#1909)

# Conflicts:
#	poetry/utils/password_manager.py

* Fix downloading packages from Simplepypi (#1851)

* fix downloading packages from simplepypi

* unused code removed

* remove unused imports

* Upgrade dependencies for the 1.0.3 release (#1965)

* Bump version to 1.0.3 (#1966)

* Fix non-compliant Git URL matching

RFC 3986 § 2.3 permits more characters in a URL than were matched. This
corrects that, though there may be other deficiencies. This was a
regression from v1.0.2, where at least “.” was matched without error.

* Update README.md "Updating Poetry"

Currently the note in "Updating Poetry" is different from the one below in "Enable tab completion for Bash, Fish, or Zsh". This MR is to make them more consistent.

* init: change dev dependency prompt

* Fix CI issues (#2069)

* fix (setup_reader): check if `func.value` has attr `id` (#2041)

* fix(git): get commit sha of git commit from annotated tags (#1948)

* fix(git): have annotated tags resolve to the commit sha

* fix(git): fix quote

* fix(git): change to rev-parse

* fix: use correct badge on README (#2065)

* Fix #1791: Load repository URL from config (#2061)

* Fix #1791: Load repository URL from config

* Ran black to fix linting errors

* Add test for repo URL env variable

* Changed schema to support url in multi dependencies (#2035)

* Fix handling of forward slashes and url encoding in credentials (#1911)

* Add support for forward slashes and url encoding in credentials

* Remove extra newline

* Remove unquote

* Bump actions/checkout from v1 to v2 (#2075)

* Update release.yml

* Update main.yml

* Fix vendor package as installed package (#1883) (#1981)

* Fix vendor package as installed package (#1883)

* import from

Co-Authored-By: Sébastien Eustace <sebastien.eustace@gmail.com>

* test vendor package as installed

* refactor

* remove blank line

Co-authored-by: Sébastien Eustace <sebastien.eustace@gmail.com>

* fix(utils.env): import cli_run from virtualenv (#2096)

* fix(utils.env): import cli_run from virtualenv if create_environment import failes

* fix (utils.env): added accidentally removed code

* list .venv when it exists (#1762)

* list .venv when it exists

* only list when in-project is true

* missing config

* move logic to manager.list

* Add .venv when it exists

* fix: exclude subpackage from `setup.py` if `__init__.py` is excluded (#1009) (#1626)

* fix: exclude subpackage from `setup.py` if `__init__.py` is excluded

Fixes: #1009

* fix: added missing test data

* fix: lint test data

* change (sdist.git): exclude folders with no python file

* fix (sdist.git): make black happy

* get_vcs starts searching git folder from tmp dir instead of project (#1946) (#1947)

* fix (builder): take `self._original_path` if available to find `.git` folder

* change (vcs): use `git rev-parse --show-toplevel` to find git root folder

* fix (vcs): change back to original working dir after finding vcs

* change (builder): introduce self._original_path to keep original path
if(vcs): resolve directory for `get_vcs`

* Normalize author name unicode before matching (#2006)

* Fix accented characters not being matched in author name

Fixes #2004

* Normalized the strings instead of modifying the pattern

* Applied isort & black

* Fix the url used for installation when fallbacking on PyPI (#2099)

* Upgrade dependencies before the 1.0.4 release (#2100)

* Upgrade dependencies before the 1.0.4 release (#2103)

* Release 1.0.4 (#2101)

* Update release script

* Bump version to 1.0.4

* Fix release script (#2104)

* Fix VCS when git is not in PATH

* Upgrade dependencies before the 1.0.5 release (#2111)

* Bump version to 1.0.5 (#2112)

* Fix GitHub URL for black

Black is now officially supported by the Python Software Foundation

* Update Contributing.md* Fix markdown formatting* Update link to official website FAQ

* Update managing-environments.md

Co-authored-by: brandonaut <brandon@hubermx.com>
Co-authored-by: finswimmer <finswimmer77@gmail.com>
Co-authored-by: Yannick PÉROUX <yannick.peroux@gmail.com>
Co-authored-by: Edward George <edwardgeorge@gmail.com>
Co-authored-by: Jan Škoda <skoda@jskoda.cz>
Co-authored-by: Andrew Marshall <andrew@johnandrewmarshall.com>
Co-authored-by: Andrew Selzer <andrewfselzer@gmail.com>
Co-authored-by: Andriy Maletsky <andriy.maletsky@gmail.com>
Co-authored-by: Julien Lhermitte <705366+jrmlhermitte@users.noreply.github.com>
Co-authored-by: Michael Aquilina <michaelaquilina@gmail.com>
Co-authored-by: Joshua Cannon <joshdcannon@gmail.com>
Co-authored-by: László Velinszky <laszlo.velinszky@meltwater.com>
Co-authored-by: Lu Zhu <misterzhu@gmail.com>
Co-authored-by: BSKY <git@bsky.moe>
Co-authored-by: Trim21 <github@trim21.me>
Co-authored-by: Frost Ming <frostming@tencent.com>
Co-authored-by: Raphael Yancey <raphael@badfile.net>
Co-authored-by: adisbladis <adisbladis@gmail.com>
Co-authored-by: Dimitri Merejkowsky <dimitri.merejkowsky@tanker.io>
Co-authored-by: Jules Chéron <jules.cheron@gmail.com>
Co-authored-by: Alex Povel <48824213+alexpovel@users.noreply.github.com>
@finswimmer finswimmer deleted the issue-01902-git-parsing branch April 4, 2020 19:07
Copy link

github-actions bot commented Mar 1, 2024

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regular expression catastrophic backtracking in Git URL parsing
3 participants