Fix bug in linspace optimization of `bin` and `hist` #2923
Merged
6 commits (the diff view shows changes from 4 commits):
- 7d9fb48 Fix bug in linspace optimization of `bin` (SimonHeybrock)
- c3313f2 Fix same bug for `hist` (SimonHeybrock)
- 1883f4f Support lower-precision groups and edges in bin and hist (SimonHeybrock)
- c8a75a0 Release notes (SimonHeybrock)
- 0072f41 Extract common code (SimonHeybrock)
- 5c7c6b4 Improve new unit tests (SimonHeybrock)
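The commit "Support lower-precision groups and edges in bin and hist" concerns coordinates stored with a wider dtype than the groups or edges, e.g. `int64` labels grouped by `int32` groups. The following is a minimal NumPy sketch of one safe way to do this, under our own assumptions and not scipp's C++ implementation: the hypothetical helper `group_indices` promotes both sides to their common dtype instead of narrowing the labels, so a label such as `np.iinfo(np.int32).max + 100` can never wrap around into a valid group and is reported as -1 (dropped).

```python
import numpy as np


def group_indices(labels: np.ndarray, groups: np.ndarray) -> np.ndarray:
    """Hypothetical helper: position of each label in `groups`, or -1 if absent.

    `groups` may have a lower precision than `labels` (e.g. int32 vs. int64).
    Both are promoted to their common dtype before comparing; narrowing the
    labels to the group dtype instead could wrap an out-of-range label around
    into a valid group.
    """
    common = np.result_type(labels.dtype, groups.dtype)
    lookup = {g: i for i, g in enumerate(groups.astype(common).tolist())}
    return np.array(
        [lookup.get(v, -1) for v in labels.astype(common).tolist()], dtype=np.int64
    )


labels = np.array(
    [0, 1, 4, 7, np.iinfo(np.int32).max + 100, 2**32 + 3], dtype=np.int64
)
groups = np.arange(5, dtype=np.int32)
print(group_indices(labels, groups).tolist())  # [0, 1, 4, -1, -1, -1]
```

With a plain `labels.astype(np.int32)` instead, `2**32 + 3` would typically wrap around to 3 on two's-complement platforms and be counted in a valid group; promoting to the wider type avoids that class of silent mis-assignment.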
```diff
@@ -760,3 +760,56 @@ def test_make_binned_via_bin_optimized_path_yields_equivalent_results(params):
     expected = expected.bin(sizes)
     expected = expected.bin(binning).hist()
     assert sc.identical(result.hist(), expected)
+
+
+def test_bin_linspace_handles_large_positive_values_correctly():
+    table = sc.data.table_xyz(10)
+    table.coords['x'].values[0] = 1e16
+    da = table.bin(x=sc.linspace('x', 0.0, 1.0, 3, unit='m', dtype='float64'))
+    assert da.bins.size().sum().value == 9
+
+
+def test_bin_linspace_handles_large_negative_values_correctly():
+    table = sc.data.table_xyz(10)
+    table.coords['x'].values[0] = -1e16
+    da = table.bin(x=sc.linspace('x', 0.0, 1.0, 3, unit='m', dtype='float64'))
+    assert da.bins.size().sum().value == 9
+
+
+def test_hist_linspace_handles_large_positive_values_correctly():
+    table = sc.data.table_xyz(10)
+    table.values[...] = 1.0
+    table.coords['x'].values[0] = 1e20
+    da = table.hist(x=sc.linspace('x', 0.0, 1.0, 3, unit='m', dtype='float64'))
+    assert da.sum().value == 9
+
+
+def test_hist_linspace_handles_large_negative_values_correctly():
+    table = sc.data.table_xyz(10)
+    table.values[...] = 1.0
+    table.coords['x'].values[0] = -1e20
+    da = table.hist(x=sc.linspace('x', 0.0, 1.0, 3, unit='m', dtype='float64'))
+    assert da.sum().value == 9
+
+
+def test_group_with_explicit_lower_precision_drops_rows_outside_domain():
+    table = sc.data.table_xyz(100)
+    table.coords['label'] = (table.coords['x'] * 10).to(dtype='int64')
+    table.coords['label'].values[0] = 0
+    da = table.group(sc.arange('label', 5, unit='m', dtype='int32'))
+    size0 = da.bins.size()['label', 0].value
+    size = da.bins.size().sum().value
+    table.coords['label'].values[0] = np.iinfo(np.int32).max + 100
+    da = table.group(sc.arange('label', 5, unit='m', dtype='int32'))
+    assert da.bins.size()['label', 0].value == size0 - 1
+    assert da.bins.size().sum().value == size - 1
+
+
+def test_bin_with_explicit_lower_precision_drops_rows_outside_domain():
+    table = sc.data.table_xyz(100)
+    x = sc.linspace('x', 0.0, 1.0, 3, unit='m', dtype='float32')
+    da = table.bin(x=x)
+    size = da.bins.size().sum().value
+    table.coords['x'].values[0] = 2.0 * np.finfo(np.float32).max
+    da = table.bin(x=x)
+    assert da.bins.size().sum().value == size - 1
```
None of these tests check that the correct element was dropped, only that the size was reduced by 1. Instead, you could slice out the bad element and group/bin/hist the slice and use that as the expected result.
Given that this fix is nearly identical in `bin` and `histogram`, is it possible to extract the common code into a function that computes the target bin?
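One way such a shared function could look, sketched in NumPy under our own assumptions (equal-width bins from a linspace, half-open bins `[lo, hi)`, `nbin` = number of edges minus one); `linspace_target_bins`, `bin_sizes` and `hist` are hypothetical names, not scipp's C++ code. The range check is done in floating point before the float-to-integer conversion, so coordinates such as ±1e16 or ±1e20 are rejected up front instead of being fed to a cast that cannot represent them. Both consumers then reuse the same target-bin computation.

```python
import numpy as np


def linspace_target_bins(x, lo: float, hi: float, nbin: int) -> np.ndarray:
    """Hypothetical shared helper: bin index per value, or -1 if outside [lo, hi)."""
    x = np.asarray(x, dtype=np.float64)
    inside = (x >= lo) & (x < hi)  # NaN compares False, so it is rejected too
    scale = nbin / (hi - lo)
    # Replace out-of-range values by `lo` *before* scaling, so the float->int64
    # cast below never sees a value like 1e20 that does not fit.
    scaled = (np.where(inside, x, lo) - lo) * scale
    # Rounding may yield exactly `nbin` just below `hi`; clamp to the last bin.
    idx = np.minimum(scaled.astype(np.int64), nbin - 1)
    return np.where(inside, idx, -1)


def bin_sizes(x, lo: float, hi: float, nbin: int) -> np.ndarray:
    """`bin`-style consumer: number of values per bin."""
    idx = linspace_target_bins(x, lo, hi, nbin)
    return np.bincount(idx[idx >= 0], minlength=nbin)


def hist(x, weights, lo: float, hi: float, nbin: int) -> np.ndarray:
    """`hist`-style consumer: sum of weights per bin."""
    idx = linspace_target_bins(x, lo, hi, nbin)
    keep = idx >= 0
    return np.bincount(idx[keep], weights=np.asarray(weights)[keep], minlength=nbin)


x = np.array([1e20, 0.1, 0.6, -1e16, 0.9, np.nan])
print(linspace_target_bins(x, 0.0, 1.0, 2).tolist())  # [-1, 0, 1, -1, 1, -1]
print(bin_sizes(x, 0.0, 1.0, 2).tolist())  # [1, 2]
print(hist(x, np.ones_like(x), 0.0, 1.0, 2).tolist())  # [1.0, 2.0]
```

Keeping the guard inside the single helper means a fix to the range check applies to both code paths at once; whether the extra function call costs anything measurable in compiled code is exactly what the performance discussion in this thread is about.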
I tried it now; this causes a single-threaded performance regression in `hist` and `bin`.
Scratch that: it is the implementation in this PR that has the 5% regression (no visible difference when multi-threading is on), while extracting the function seems to perform like the original. I need to investigate more.
I rewrote it slightly and see no significant performance impact. There may be a very small slowdown of the order of 5-10%, in particular for `hist`, but I doubt it would be relevant in practice (i.e., with multi-threading).
Looks good