Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🔍 add searchsorted to get equidistant x-bins #11

Merged
merged 30 commits into from Dec 29, 2022
Merged

🔍 add searchsorted to get equidistant x-bins #11

merged 30 commits into from Dec 29, 2022

Conversation

jvdd
Copy link
Member

@jvdd jvdd commented Dec 22, 2022

This PR alleviates the assumption that when x has fixed sampling rate (& no gaps) for certain downsample algorithms.
Should fix #10.

This PR has the following contributions:

  • add sequential & parallel iterator that generates equidistant x-bin idxs (aka searchsorted) in utils.rs
    • use f64 for val_step as this avoids rounding errors with integer / usize x-data types
      !! => seems to be significantly slower for minmaxlttb (10 - 20 %)
  • minmax algo with x
  • m4 algo with x
  • minmax lttb with x?
  • 🐍 update Python bindings (check n_out requirements etc.)
  • ⚠️ update Python warnings + check new assertions
  • 🐎 update benchmarks
  • 🕵️ search & fix out of bounds index issues in generic code of the downsample algorithms

Algorithm changes:

  • M4:
    • n_out should be multiple of 4
    • removed the code which added the last datapoint to the output
      => when arr.len % n_out = 0 & equally sampled (+ no gaps) => difference betweeen with x and without x implementation
  • MinMax:
    • n_out should be multiple of 2
    • removed the code which added the first and last datapoint to the output
      => when arr.len % n_out = 0 & equally sampled (+ no gaps) => difference betweeen with x and without x implementation

Python binding changes:

  • Removed f16 as supported x datatype (IMO, no 16-bit dtypes make sense as x datatype... as these at max (u16) can only represent up to 65_535 values)
  • M4 requires n_out to be multiple of 4
  • MinMax requires n_out to be multiple of 2

Comment on lines -4 to -6
// TODO
// - m4 & minmax should determine bin size on x-range!
// code now assumes equal bin size
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extremely happy to tick this one off 🔥

@@ -157,13 +153,13 @@ macro_rules! create_pyfuncs_without_x_with_ratio {

macro_rules! create_pyfuncs_with_x {
($resample_mod:ident, $resample_fn:ident, $mod:ident) => {
_create_pyfuncs_with_x_generic!(_create_pyfunc_with_x, $resample_mod, $resample_fn, $mod, f16 f32 f64 i16 i32 i64 u16 u32 u64, f16 f32 f64 i8 i16 i32 i64 u8 u16 u32 u64);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed f16 support as this (i) does not make sense IMO as x-data type (for this library which focuses on resampling very large time series) and (ii) is probably (haven't tested this) a lot slower than the other datatypes bc equidistant binning not (yet) adapted to f16 (which is not supported by most CPUs)

};
}

macro_rules! create_pyfuncs_with_x_with_ratio {
($resample_mod:ident, $resample_fn:ident, $mod:ident) => {
_create_pyfuncs_with_x_generic!(_create_pyfunc_with_x_with_ratio, $resample_mod, $resample_fn, $mod, f16 f32 f64 i16 i32 i64 u16 u32 u64, f16 f32 f64 i8 i16 i32 i64 u8 u16 u32 u64);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See ⬆️

downsample_rs/src/m4/generic.rs Outdated Show resolved Hide resolved
Copy link
Member Author

@jvdd jvdd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skimmed my own code 🤡, LGTM

@jvdd jvdd merged commit b4cfff0 into main Dec 29, 2022
@jvdd jvdd deleted the searchsorted branch January 29, 2023 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not assume equally sampled x-data
1 participant