New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support ds.where
and ds.summary
and add selector
#5805
Conversation
Codecov Report
@@ Coverage Diff @@
## main #5805 +/- ##
==========================================
+ Coverage 88.19% 88.20% +0.01%
==========================================
Files 307 307
Lines 63305 63399 +94
==========================================
+ Hits 55829 55920 +91
- Misses 7476 7479 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
With selector: screenrecord-2023-07-17_15.47.31.mp4 |
Looking good! |
ds.where
and ds.summary
and add selector
15faf72
to
16d72a1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good! It will need a whole new documentation section though (which we'll have @jbednar write) and really the datashader user guide will need to be split up. I did ask one clarification question about the ways in which aggregator=where(...)
and selector=...
interact because I can't quite wrap my head around it.
Very happy with this PR already, just used it successfully in a talk at EuroPython! :-) I'll be taking a closer look soon but my initial question is whether you have any idea how Alternatively, |
params["vdims"] = [params["vdims"]] | ||
sum_agg = ds.summary(**{str(params["vdims"][0]): agg_fn, "index": ds.where(sel_fn)}) | ||
agg = self._apply_datashader(dfdata, cvs_fn, sum_agg, agg_kwargs, x, y) | ||
_ignore = [*params["vdims"], "index"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it is a good default, the name "index" is a magic value. I could be wrong but couldn't this clash with a column name?
With the index array approach, first_3 should always only be 3x larger than the aggregate array, while with the approach of returning all columns the size and time taken would scale with the number and size of columns. Seems unsafe as a default! An intermediate approach could be to return a fixed number of scalar columns only (up to e.g. 3) by default, but that seems quite arbitrary. |
I'd say we handle the |
I'm going to merge. I'll look into the first_n and last_n thing separately. |
Example code:
With
agg = ds.where(ds.min("val"))
:val2.mp4
With
agg = ds.where(ds.min("s"))
:s.mp4