-
-
Notifications
You must be signed in to change notification settings - Fork 250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove quadratic search when tagging a file in Chroot #2193
remove quadratic search when tagging a file in Chroot #2193
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Would you be open to separating out the types changes from the quadratic search change? It would be helpful to separate refactor vs. performance improvement.
yes! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I'm very unconvinced about the tagging optimization worth and its leading to one gedankenbug in the current incarnation - consider dropping it and just keeping the type additions - which are welcome.
pex/common.py
Outdated
@@ -598,7 +633,7 @@ def zip( | |||
if labels: | |||
selected_files = set( | |||
itertools.chain.from_iterable(self.filesets.get(label, ()) for label in labels) | |||
) | |||
) # type: Iterable[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code actually cares about the underlying setness in reality of the types in both branches since zips will happily store a given filename more than once and we don't want that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it more deterministic if we allow that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said above - I think the whole dup business is fake and there are no dups in practice. But if there were I'd want PEX to error unless the file contents were identical (which PEX checks and does elsewhere - for example venv creation). My main point is, in the small, the current Chroot code permits dups and has certain behavior wrt to them. That behavior should not change unless your point is very deliberately to do that. I think that's not the point of your change though if I read correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, great! I have vastly restricted this change to only cover the quadratic behavior when searching for whether a file already exists.
b8deba2
to
15bd0ba
Compare
9a036c1
to
db06790
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine, but commenting on the larger context as I understand it: you were working on trying to get a zipapp creation perf win, started poking about, saw this theoretical perf issue and fixed it, moved on to the real perf fixes and then circled back here to get some of the work submitted. I'm guessing you never benchmarked this because I'm guessing its in the noise / doesn't show up when actually zipping up a zipapp. I think it would be a good idea in general to include a significant win benchmark - say >= 2% - as a PR comment at least when complicating code to show it's worth it.
pex/common.py
Outdated
if fn in fs and fs_label != label: | ||
raise self.ChrootTaggingException(fn, fs_label, label) | ||
"""Raises ChrootTaggingException if a file was added under more than one label.""" | ||
fs_label = self._file_index.get(fn, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this works, but it's clunkier than need be. Maybe:
$ python2.7
Python 2.7.18 (default, Jul 1 2022, 10:30:50)
[GCC 11.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = {}
>>> def check(key, label):
... if label != x.setdefault(key, label):
... raise Exception("{} already labeled as {}".format(key, label))
...
>>> check("foo", "bar")
>>> check("foo", "baz")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in check
Exception: foo already labeled as baz
>>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Um, my quick repl session obviously has a bug in the exception message, but you get the point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
All correct!
I'm totally fine agreeing to that in general!
I saw this change in specific as making the code slightly clearer, which is why I thought it was reasonable to break off despite being of unproven perf improvement. You'll note in the PR description I explicitly made it clear that this was more of a vibes-based proposed change. |
db06790
to
b14436c
Compare
Ok. Our eyes behold differently although I might agree with the setdefault change suggestion - that code is maybe clearer than the original in my eyes at the expense of new state scattered about in the dict that needed to be added. As another general observation, I personally stay away from vibes when submitting changes to repos I don't work in until the point I've worked in them a bunch. In other words, submit fixes or implement features early, build up, and then when there is established skin in the game start expressing more opiniony things. The general idea being the existing maintainers have demonstrated skin in the game and will have to deal with all fallout. |
Well, my informal PR description was explicitly intended to make it easy for reviewers to decide not to accept. But I will keep that in mind for the future. |
Yeah - thanks for that. There is still a cost for every review too though - its not free to consider a piece of work in good faith. Thanks for taking a crack at some Pex work. |
Thanks for this discussion! |
The CI failure looks unrelated and transient - I've restarted the failed shard. |
b14436c
to
26bc34b
Compare
It made me feel bad to see us redoing work, so I tried to fix it. I doubt this is a consequential performance improvement.