Skip to content

Conversation

@Suor
Copy link
Contributor

@Suor Suor commented Nov 1, 2019

This permites URLInfo construction from parts without building/reparsing
string, which makes it around 20% faster. Code is also simpler and a bit
shorter now.

Other improvements:

  • added URLInfo.replace(path=...)
  • URLInfo._path now stringifies properly
  • protected against passing path not starting with /
  • do not allow query, params, fragments and passwords in urls (they were
    silently ignored previously)
  • Have you followed the guidelines in our
    Contributing document?

  • Does your PR affect documented changes or does it add new functionality
    that should be documented? If yes, have you created a PR for
    dvc.org documenting it or at
    least opened an issue for it? If so, please add a link to it.


@Suor
Copy link
Contributor Author

Suor commented Nov 1, 2019

Here is a bench script I used:

from funcy import chain
from dvc.path_info import URLInfo, CloudURLInfo


urls = [URLInfo('ssh://host.com/a/b/%s' % x) for x in range(2000)]
curls = [CloudURLInfo('ssh://host.com/a/b/%s' % x) for x in range(2000)]

urls_set = set(urls) | set(curls)
for url in chain(urls, curls):
    urls_set.add(url)

x = 0
for u in urls:
    for u2 in urls[:50]:
        x += u == u2

add = [u / "x" for u in urls]
cadd = [u / "x" for u in curls]

This permites URLInfo construction from parts without building/reparsing
string, which makes it around 20% faster. Code is also simpler and a bit
shorter now.

Other improvements:
- added URLInfo.replace(path=...)
- URLInfo._path now stringifies properly
- protected against passing path not starting with /
- do not allow query, params, fragments and passwords in urls (they were
silently ignored previously)
@Suor Suor requested review from a user and shcheklein November 1, 2019 17:07
Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neat 👌 !

self.parsed = urlparse(url)
assert self.parsed.scheme != "remote"
p = urlparse(url)
assert not p.query and not p.params and not p.fragment
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need these assertions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't store that, so in the absence of asserts:

URLInfo("http://host.com/a/b?q=1&p=2#ref") == URLInfo("http://host.com/a/b")

Which might surprise people at some point, i.e. lead to an "entertaining" debugging session.

Copy link
Contributor

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too much meta-programming to actually digest it in way where good feedback can be given :) so have to trust our tests and your ability to put even more tests in place if it's necessary - this is the only way to prevent us breaking something here in the future - this code is low level and has a lot of nuances you don't hit in regular programming

@Suor
Copy link
Contributor Author

Suor commented Nov 2, 2019

I wonder what you guys would say when actual metaprogramming would be involved :)

@Suor
Copy link
Contributor Author

Suor commented Nov 2, 2019

@efiop may you take a look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants