Conversation

@quentinblampey
Contributor

Closes #134

I updated the dtype behavior for dask to fix #134.
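
For context, here is a minimal sketch of the kind of dtype handling this touches - illustrative only, not the PR's actual code (`da.reduction` is real Dask API, but the example array and reduction are made up):

```python
# Illustrative sketch, not the PR's actual code.
import dask.array as da
import numpy as np

x = da.random.randint(0, 10, size=(100, 100), chunks=(10, 100)).astype(np.int32)

# A custom reduction can state the output dtype explicitly,
# so min/max keeps the input dtype instead of Dask's default guess:
result = da.reduction(
    x,
    chunk=lambda block, axis, keepdims: block.min(axis=axis, keepdims=keepdims),
    aggregate=lambda block, axis, keepdims: block.min(axis=axis, keepdims=keepdims),
    axis=0,
    dtype=x.dtype,  # keep int32
)
assert result.compute().dtype == np.int32
```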

I also added support for DiskArray in mean_var - I think we just needed to always use np.power instead of the ** notation. Unless you had a specific reason to use **, @flying-sheep?
I think it's very inefficient, though, since it loads the result of the power operation directly into memory (at least, that's my understanding, but I may be wrong). Ideally it would only be in memory after the mean reduction, but maybe there is no other way to do that - I'm not familiar enough with h5py Datasets.
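
To spell out the concern, a rough sketch under my assumptions (the file name "data.h5" and dataset name "X" are made up, and this is not the actual mean_var code):

```python
# Rough sketch of the memory concern; "data.h5" and "X" are made-up names.
import h5py
import numpy as np

with h5py.File("data.h5", "r") as f:
    dset = f["X"]  # h5py.Dataset - the data stays on disk

    # np.power(dset, 2) coerces the Dataset into a full in-memory ndarray,
    # so the squared copy exists in RAM before any reduction runs:
    #   sq_mean = np.power(dset, 2).mean(axis=0)

    # Reducing block by block would keep only one slab in memory at a time:
    n = dset.shape[0]
    acc = np.zeros(dset.shape[1], dtype=np.float64)
    for start in range(0, n, 1024):
        block = dset[start : start + 1024]  # reads one slab as an ndarray
        acc += np.power(block, 2).sum(axis=0)
    sq_mean = acc / n
```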

I wanted to add some tests, but I don't understand all the details of the test setup. Are there any instructions or a CONTRIBUTING.md file I could use to run and update the tests?

@codecov

codecov bot commented Oct 29, 2025

Codecov Report

❌ Patch coverage is 61.53846% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.13%. Comparing base (6af5d9a) to head (a963725).

Files with missing lines                      Patch %   Lines
src/fast_array_utils/stats/_generic_ops.py    25.00%    3 Missing ⚠️
src/fast_array_utils/stats/_utils.py          60.00%    2 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (6af5d9a) and HEAD (a963725).

HEAD has 5 uploads less than BASE
Flag   BASE (6af5d9a)   HEAD (a963725)
       7                2
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #135       +/-   ##
===========================================
- Coverage   99.13%   47.13%   -52.00%     
===========================================
  Files          19       15        -4     
  Lines         464      367       -97     
===========================================
- Hits          460      173      -287     
- Misses          4      194      +190     


@codspeed-hq

codspeed-hq bot commented Oct 29, 2025

CodSpeed Performance Report

Merging #135 will degrade performance by 22.41%

Comparing quentinblampey:mean_var_h5 (a963725) with main (6af5d9a)

Summary

❌ 8 regressions
✅ 224 untouched

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark                                                        BASE     HEAD   Change
test_stats_benchmark[numpy.ndarray-1d-all-int32-max]             1.6 ms   2 ms   -22.31%
test_stats_benchmark[numpy.ndarray-1d-all-int32-min]             1.6 ms   2 ms   -22.31%
test_stats_benchmark[scipy.sparse.csc_array-1d-all-int32-max]    1.6 ms   2 ms   -22.39%
test_stats_benchmark[scipy.sparse.csc_array-1d-all-int32-min]    1.6 ms   2 ms   -22.39%
test_stats_benchmark[scipy.sparse.csc_array-2d-all-int32-max]    1.6 ms   2 ms   -22.41%
test_stats_benchmark[scipy.sparse.csc_array-2d-all-int32-min]    1.6 ms   2 ms   -22.41%
test_stats_benchmark[scipy.sparse.csr_array-2d-all-int32-max]    1.6 ms   2 ms   -22.39%
test_stats_benchmark[scipy.sparse.csr_array-2d-all-int32-min]    1.6 ms   2 ms   -22.39%

@flying-sheep changed the title from "Support DiskArray on mean_var and support Dask in min/max" to "Support Dask in min/max" on Oct 30, 2025
@flying-sheep
Member

flying-sheep commented Oct 30, 2025

OK, as the failing benchmarks show, the ** had a reason.
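
My understanding of that reason (which may not be the whole story): NumPy special-cases small scalar exponents in ndarray.__pow__, so x ** 2 dispatches to the fast np.square loop, while np.power(x, 2) goes through the generic power ufunc. A rough way to see the difference (timings are machine-dependent):

```python
# Rough micro-benchmark of the two spellings; numbers vary by machine.
import timeit

import numpy as np

x = np.random.randint(0, 100, size=1_000_000, dtype=np.int32)

t_pow = timeit.timeit(lambda: x ** 2, number=200)           # __pow__ fast path
t_func = timeit.timeit(lambda: np.power(x, 2), number=200)  # generic power ufunc
print(f"x ** 2:         {t_pow:.3f}s")
print(f"np.power(x, 2): {t_func:.3f}s")
```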

Also you’re right: just creating a copy of the full-size Dataset isn’t a good idea; we’d have to do something smarter.

Therefore let’s just keep this PR single-purpose for min/max Dask support and do the disk stuff in a separate PR. I reverted the disk stuff here and fixed mypy.


@flying-sheep added the run-gpu-ci label ("Apply this label to run GPU CI once") on Oct 30, 2025
@flying-sheep
Member

flying-sheep commented Oct 30, 2025

OK, it looks like there are more issues with min/max - I’m sorry I didn’t catch that before making the release.

@quentinblampey
Contributor Author

quentinblampey commented Oct 30, 2025

> OK, as the failing benchmarks show, the ** had a reason.
> Also you’re right: just creating a copy of the full-size Dataset isn’t a good idea, we’d have to do something smarter.

Yes, it seems so...
No problem, we can handle that in a future PR! I'll think more about it later.
