Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove quadratic search when tagging a file in Chroot #2193

Merged

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Jul 24, 2023

It made me feel bad to see us redoing work, so I tried to fix it. I doubt this is a consequential performance improvement.

@cosmicexplorer cosmicexplorer marked this pull request as ready for review July 24, 2023 19:04
Copy link
Contributor

@Eric-Arellano Eric-Arellano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Would you be open to separating out the types changes from the quadratic search change? It would be helpful to separate refactor vs. performance improvement.

@cosmicexplorer
Copy link
Contributor Author

yes!

Copy link
Member

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'm very unconvinced about the tagging optimization worth and its leading to one gedankenbug in the current incarnation - consider dropping it and just keeping the type additions - which are welcome.

pex/common.py Outdated Show resolved Hide resolved
pex/common.py Outdated Show resolved Hide resolved
pex/common.py Outdated
@@ -598,7 +633,7 @@ def zip(
if labels:
selected_files = set(
itertools.chain.from_iterable(self.filesets.get(label, ()) for label in labels)
)
) # type: Iterable[str]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code actually cares about the underlying setness in reality of the types in both branches since zips will happily store a given filename more than once and we don't want that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it more deterministic if we allow that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said above - I think the whole dup business is fake and there are no dups in practice. But if there were I'd want PEX to error unless the file contents were identical (which PEX checks and does elsewhere - for example venv creation). My main point is, in the small, the current Chroot code permits dups and has certain behavior wrt to them. That behavior should not change unless your point is very deliberately to do that. I think that's not the point of your change though if I read correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, great! I have vastly restricted this change to only cover the quadratic behavior when searching for whether a file already exists.

pex/common.py Outdated Show resolved Hide resolved
@cosmicexplorer cosmicexplorer force-pushed the very-small-optimization branch 2 times, most recently from 9a036c1 to db06790 Compare July 25, 2023 14:14
@cosmicexplorer cosmicexplorer changed the title add more types to Chroot + remove quadratic search remove quadratic search when tagging a file in Chroot Jul 25, 2023
Copy link
Member

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine, but commenting on the larger context as I understand it: you were working on trying to get a zipapp creation perf win, started poking about, saw this theoretical perf issue and fixed it, moved on to the real perf fixes and then circled back here to get some of the work submitted. I'm guessing you never benchmarked this because I'm guessing its in the noise / doesn't show up when actually zipping up a zipapp. I think it would be a good idea in general to include a significant win benchmark - say >= 2% - as a PR comment at least when complicating code to show it's worth it.

pex/common.py Outdated
if fn in fs and fs_label != label:
raise self.ChrootTaggingException(fn, fs_label, label)
"""Raises ChrootTaggingException if a file was added under more than one label."""
fs_label = self._file_index.get(fn, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this works, but it's clunkier than need be. Maybe:

$ python2.7
Python 2.7.18 (default, Jul  1 2022, 10:30:50)
[GCC 11.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = {}
>>> def check(key, label):
...   if label != x.setdefault(key, label):
...     raise Exception("{} already labeled as {}".format(key, label))
...
>>> check("foo", "bar")
>>> check("foo", "baz")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in check
Exception: foo already labeled as baz
>>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, my quick repl session obviously has a bug in the exception message, but you get the point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@cosmicexplorer
Copy link
Contributor Author

commenting on the larger context as I understand it:

All correct!

I think it would be a good idea in general to include a significant win benchmark - say >= 2% - as a PR comment at least when complicating code to show it's worth it.

I'm totally fine agreeing to that in general!

when complicating code

I saw this change in specific as making the code slightly clearer, which is why I thought it was reasonable to break off despite being of unproven perf improvement. You'll note in the PR description I explicitly made it clear that this was more of a vibes-based proposed change.

@jsirois
Copy link
Member

jsirois commented Jul 25, 2023

I saw this change in specific as making the code slightly clearer, which is why I thought it was reasonable to break off despite being of unproven perf improvement.

Ok. Our eyes behold differently although I might agree with the setdefault change suggestion - that code is maybe clearer than the original in my eyes at the expense of new state scattered about in the dict that needed to be added.

As another general observation, I personally stay away from vibes when submitting changes to repos I don't work in until the point I've worked in them a bunch. In other words, submit fixes or implement features early, build up, and then when there is established skin in the game start expressing more opiniony things. The general idea being the existing maintainers have demonstrated skin in the game and will have to deal with all fallout.

@cosmicexplorer
Copy link
Contributor Author

Well, my informal PR description was explicitly intended to make it easy for reviewers to decide not to accept. But I will keep that in mind for the future.

@jsirois
Copy link
Member

jsirois commented Jul 25, 2023

Well, my informal PR description was explicitly intended to make it easy for reviewers to decide not to accept. But I will keep that in mind for the future.

Yeah - thanks for that. There is still a cost for every review too though - its not free to consider a piece of work in good faith.

Thanks for taking a crack at some Pex work.

@cosmicexplorer
Copy link
Contributor Author

Thanks for this discussion!

@jsirois
Copy link
Member

jsirois commented Jul 25, 2023

The CI failure looks unrelated and transient - I've restarted the failed shard.

@cosmicexplorer cosmicexplorer merged commit a821b28 into pex-tool:main Aug 2, 2023
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants