-
Notifications
You must be signed in to change notification settings - Fork 1.3k
perf: refactor URLInfo to not use ParseResult inside #2707
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Here is a bench script I used: from funcy import chain
from dvc.path_info import URLInfo, CloudURLInfo
urls = [URLInfo('ssh://host.com/a/b/%s' % x) for x in range(2000)]
curls = [CloudURLInfo('ssh://host.com/a/b/%s' % x) for x in range(2000)]
urls_set = set(urls) | set(curls)
for url in chain(urls, curls):
urls_set.add(url)
x = 0
for u in urls:
for u2 in urls[:50]:
x += u == u2
add = [u / "x" for u in urls]
cadd = [u / "x" for u in curls] |
This permites URLInfo construction from parts without building/reparsing string, which makes it around 20% faster. Code is also simpler and a bit shorter now. Other improvements: - added URLInfo.replace(path=...) - URLInfo._path now stringifies properly - protected against passing path not starting with / - do not allow query, params, fragments and passwords in urls (they were silently ignored previously)
ghost
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat 👌 !
| self.parsed = urlparse(url) | ||
| assert self.parsed.scheme != "remote" | ||
| p = urlparse(url) | ||
| assert not p.query and not p.params and not p.fragment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need these assertions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't store that, so in the absence of asserts:
URLInfo("http://host.com/a/b?q=1&p=2#ref") == URLInfo("http://host.com/a/b")Which might surprise people at some point, i.e. lead to an "entertaining" debugging session.
shcheklein
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too much meta-programming to actually digest it in way where good feedback can be given :) so have to trust our tests and your ability to put even more tests in place if it's necessary - this is the only way to prevent us breaking something here in the future - this code is low level and has a lot of nuances you don't hit in regular programming
|
I wonder what you guys would say when actual metaprogramming would be involved :) |
|
@efiop may you take a look? |
This permites URLInfo construction from parts without building/reparsing
string, which makes it around 20% faster. Code is also simpler and a bit
shorter now.
Other improvements:
silently ignored previously)
Have you followed the guidelines in our
Contributing document?
Does your PR affect documented changes or does it add new functionality
that should be documented? If yes, have you created a PR for
dvc.org documenting it or at
least opened an issue for it? If so, please add a link to it.