Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ new interface #6

Merged
merged 25 commits into from Dec 19, 2022
Merged

✨ new interface #6

merged 25 commits into from Dec 19, 2022

Conversation

jvdd
Copy link
Member

@jvdd jvdd commented Nov 30, 2022

In this PR we transfer from pandas.Series as representation for time series to a purely np.ndarray based representation
(pandas.Series its index can at this time not represent most numeric dtypes, e.g. int16)

The signature for the downsample methdod is changed to:

downsample([x], y, n_out, **kwargs)

Note the following:

  • x is optional, allowing the same (memory) efficient runtime as with pd.RangeIndex
  • x and y are both positional arguments (this is in line with how the matplotlib,pyplot.plot signature is)
  • n_out is a mandatory keyword argument

Example:

from tsdownsample import MinMaxLTTBDownsampler
import numpy as np

# Create a time series
y = np.random.randn(10_000_000)

# Downsample to 1000 points
s_ds = MinMaxLTTBDownsampler().downsample(y, n_out=1000)

Other aspects of this PR

  • support x and y datetime64: view the array as np.int64
  • support x and y timedelta64: view the array as np.int64
  • support y bool: view the array as np.int8 (under the hood numpy represents a bool as 8 bits)
  • support y int8 & uint8
  • support categorical data => out of scope for this PR, would mostly be some Python code converting it to an encoding (which is perhaps not the responsibility of this library...) - also not really sure how numpy would represent categorical data

Major TODOs:

  • documentation 📒 => will leave this for another PR
  • do not assume fixed sampling rate => will leave this for another PR

@jvdd jvdd requested a review from jonasvdd November 30, 2022 17:35
@jonasvdd
Copy link
Member

I find it a little cleaner to import the concrete methods/variables in tsdownsample/__init__.py than applying from .downsamplers import *

@jonasvdd
Copy link
Member

LGTM, see those comments!

@jvdd jvdd requested a review from jonasvdd December 13, 2022 20:51
@jvdd
Copy link
Member Author

jvdd commented Dec 13, 2022

Will merge this if you approve @jonasvdd & then I can make first real release! 🚀

Besides documentation, the (afaik) only major TODO is getting rid of the assumption of fixed sampling rate 🤔

README.md Show resolved Hide resolved
let mut avg_y: Ty = Ty::default();
// let mut avg_x: Tx = Tx::default();
// let mut avg_y: Ty = Ty::default();
// TODO: check the impact of using f64 (is necessary to avoid overflow)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this been checked? @jvdd

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet, but will leave this to another PR (that thoroughly profiles the code)

@@ -93,17 +96,19 @@ pub fn lttb_without_x<Ty: Num>(y: ArrayView1<Ty>, n_out: usize) -> Array1<usize>

for i in 0..n_out - 2 {
// Calculate point average for next bucket (containing c).
let mut avg_y: Ty = Ty::default();
// let mut avg_y: Ty = Ty::default();
// TODO: check impact of using f64 (is necessary to avoid overflow)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔝

tests/test_config.py Outdated Show resolved Hide resolved
Copy link
Member

@jonasvdd jonasvdd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you just provide me with an answer to the questions? If these are resolved. I think this PR is ready to merge!

Impressive work 💪🏼

@jvdd
Copy link
Member Author

jvdd commented Dec 19, 2022

Will merge this if you approve @jonasvdd & then I can make first real release! rocket

Besides documentation, the (afaik) only major TODO is getting rid of the assumption of fixed sampling rate thinking

Guess we won't make a first real release yet as the distribution is not 100% watertight (see #9)

@jonasvdd
Copy link
Member

@jvdd, I think we can tackle the M2 building in a new PR and merge this one? 🙏🏼

@jvdd
Copy link
Member Author

jvdd commented Dec 19, 2022

Ready to squash and merge 🚀

@jvdd jvdd merged commit ec2f80a into main Dec 19, 2022
@jvdd jvdd deleted the interface branch January 29, 2023 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants