Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization to gufe_to_digraph to avoid repeated traversals, calls to GufeTokenizable.to_shallow_dict #219

Merged
merged 3 commits into from
Jan 9, 2024

Conversation

dotsdl
Copy link
Member

@dotsdl dotsdl commented Jan 6, 2024

Builds on the solution to #216.

Added use of a shallow_dicts index to avoid potentially expensive repeated calls of GufeTokenizable.to_shallow_dict, such as for ProteinComponents. Also used as our check if we've already processed the object being visited to avoid going down its dependency tree again, avoiding repeated traversals.

For an AlchemicalNetwork with ~1000 ChemicalSystems and ~2000 Transformations, this optimization reduces gufe_to_digraph execution time from 6min 31s to 811ms.

… to `GufeTokenizable.to_shallow_dict`

Added use of a `shallow_dicts` index to avoid potentially expensive
repeated calls of `GufeTokenizable.to_shallow_dict`, such as for
`ProteinComponent`s. Also used as our check if we've already processed
the object being visited to avoid going down its dependency tree again,
avoiding repeated traversals.

For an `AlchemicalNetwork` with ~1000 `ChemicalSystem`s and ~2000
`Transformation`s, this optimization reduces `gufe_to_digraph` execution
time from 6min 31s to 811ms.
@dotsdl dotsdl requested a review from ianmkenney January 6, 2024 01:29
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (16b2776) 82.10% compared to head (1d9c3d1) 81.77%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #219      +/-   ##
==========================================
- Coverage   82.10%   81.77%   -0.33%     
==========================================
  Files          23       23              
  Lines        2939     2941       +2     
==========================================
- Hits         2413     2405       -8     
- Misses        526      536      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@ianmkenney ianmkenney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could move a lot of the logic added to modifier to add_edges instead, allowing add_edges to check if the shallow dict has already been processed. On the other hand, keeping add_edges smaller makes sense (at the expense of code duplication).

By combining you could remove one of the

sd = gufe_obj.to_shallow_dict()
shallow_dicts[gufe_obj.key] = sd

blocks.

Either way works for me, though.

@dotsdl
Copy link
Member Author

dotsdl commented Jan 9, 2024

Simplified based on your suggestion @ianmkenney! Looks much better now!

@dotsdl dotsdl merged commit 518ea88 into main Jan 9, 2024
3 of 4 checks passed
@dotsdl dotsdl deleted the optimize-gufe_to_digraph branch January 9, 2024 21:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants