Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix other source of slowness in git URL parser + limit URL length to 1024 #7955

Merged
merged 4 commits into from
Jun 6, 2023

Conversation

mjambon
Copy link
Member

@mjambon mjambon commented Jun 6, 2023

This fixes more of the same problems as #7943

test plan:

  • CI for correctness
  • For performance, follow these steps:
~/semgrep/cli/src $ python3
Python 3.8.10 (default, Mar 13 2023, 10:26:41) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from semgrep.meta import get_repo_name_from_repo_url
>>> get_repo_name_from_repo_url('0' * 1024)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/martin/semgrep/cli/src/semgrep/meta.py", line 60, in get_repo_name_from_repo_url
    result = p.parse()
  File "/home/martin/semgrep/cli/src/semgrep/external/git_url_parser.py", line 123, in parse
    raise ParserError(msg)
semgrep.external.git_url_parser.ParserError: Invalid URL '0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'

should fail instantly by Python standards.

The following should fail even faster with a different error message:

>>> get_repo_name_from_repo_url('0' * 1025)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/martin/semgrep/cli/src/semgrep/meta.py", line 60, in get_repo_name_from_repo_url
    result = p.parse()
  File "/home/martin/semgrep/cli/src/semgrep/external/git_url_parser.py", line 115, in parse
    raise ParserError(msg)
semgrep.external.git_url_parser.ParserError: URL exceeds maximum supported length of 1024: 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

PR checklist:

  • Purpose of the code is evident to future readers
  • Tests included or PR comment includes a reproducible test plan
  • Documentation is up-to-date
  • A changelog entry was added to changelog.d for any user-facing change
  • Change has no security implications (otherwise, ping security team)

If you're unsure about any of this, please see:

@mjambon
Copy link
Member Author

mjambon commented Jun 6, 2023

@aryx @sabrinabrogren GitHub says you recently edited these files 🤷 so you're the reviewers.

@mjambon mjambon merged commit 7a232ef into develop Jun 6, 2023
41 checks passed
@mjambon mjambon deleted the mj-fix-other-redos branch June 6, 2023 01:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant