Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-31839: Reduce memory usage in MatchPessemisticB #148

Merged
merged 2 commits into from Oct 6, 2021

Conversation

morriscb
Copy link
Contributor

No description provided.

Copy link
Contributor

@laurenam laurenam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me (some minor comments). Could I ask you to be a bit less opaque in the commit message? I.e.,

Change temp id array to uint16 --> WHY (AND FROM WHAT)?

Change to unit16 --> HOW IS THIS DISTINCT FROM ABOVE?

Square root dists. --> IS THIS TO IMPLY YOU WEREN'T DOING THIS BEFORE?

Fix linting. --> HIGHLIGHTS THAT THIS SHOULD'VE BEEN DONE ON A SEPARATE COMMIT (but I'm not too fussed about this one 😉)

Fix math bug. --> PLEASE ELABORATE (IF THERE WAS A BUG THAT GOT FIXED, HOW ARE THE PRE AND POST RESULTS IDENTICAL AS YOU STATE ON THE TICKET?)

tmp_ref_dist_array = np.sqrt(
((self._reference_array
- self._reference_array[ref_id])
** 2).sum(axis=1)).astype("float32")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You seem to mix between using
dtype=np.uint16 above vs. dtype="uint16" here and
astype("float32") here vs. astype(np.float32) above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change them to be consistent.

@@ -95,15 +95,7 @@ def _build_distances_and_angles(self):

# Initialize the arrays we will need for quick look up of pairs once
# have a candidate spoke center.
self._pair_id_array = np.empty(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the above comment now be removed?

dtype=np.float32)

# Create empty lists to temporarily store our pair information per
# reference object. These will be concatenated into our final arrays.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment actually still relevant?

tmp_ref_dist_array = np.sqrt(
((self._reference_array
- self._reference_array[ref_id])
** 2).sum(axis=1)).astype("float32")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand what the sum(axis=1) is doing here (not saying it's wrong, I just don't know what's going on...oh, is it a sum over the squares of the deltas in all three axes which was done more "spelled out" for sub_dist_array above?)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sum without an axis argument sums the full array. sum(axis=1) sums only along the second axis. These are 3 vector distances so this is basically a dot product.

@morriscb
Copy link
Contributor Author

morriscb commented Oct 5, 2021

I'll just remove those extra comments from the commit. They are left overs from developing, debugging, and committing at the same time.

Simplify precomputation of distances.
Remove/add comments.
Consistently invoke types.
@morriscb morriscb merged commit bf906a7 into master Oct 6, 2021
@morriscb morriscb deleted the tickets/DM-31839 branch October 6, 2021 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants