Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster diffs: Don't always run pygit2.Tree.diff_tree #962

Merged
merged 2 commits into from
Jan 31, 2024
Merged

Conversation

olsen232
Copy link
Collaborator

@olsen232 olsen232 commented Jan 30, 2024

Don't run it when the user requests the diff of a particular feature

  • we can look up the feature for both revs quicker than pygit2 can find all diffs in the entire revision.

There will be an extremely great speedup in the extreme case where the user wants to see a single feature that has changed, across a revision-range where a million features have changed.

If the user requests enough features, eventually there will be a crossover where this code path is slower. Eg if the user requests to see more features than have actually changed. However, mostly the size of the list of features requested will be much smaller than the size of the layer / size of the changes.

Checklist:

  • Have you reviewed your own change?
  • Have you included test(s)?
  • Have you updated the changelog?

Don't run it when the user requests the diff of a particular feature
- we can look up the feature for both revs quicker than pygit2
can find all diffs in the entire revision.
@@ -160,6 +166,66 @@ def get_decoder(dataset, method_name):
_INSERT_UPDATE = _INSERT_TYPES + _UPDATE_TYPES
_UPDATE_DELETE = _UPDATE_TYPES + _DELETE_TYPES

def get_raw_deltas_for_keys(
self, other, key_encoder_method, key_filter, reverse=False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type hints on args wouldn't go amiss, the names don't make them very obvious

(we should probably be moving towards type hints everywhere)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - added type hints to this file

):
def _get_raw_diff_for_subtree(self, *args, **kwargs):
# When only nz_pa_points_topo_150k:1182 is requested, a different more efficient code-path should be used.
raise AssertionError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise AssertionError(
pytest.fail(

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@olsen232 olsen232 merged commit fe9958e into master Jan 31, 2024
12 checks passed
@olsen232 olsen232 deleted the quicker-diffs branch January 31, 2024 02:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants