Series with +inf or -inf values completely fail to display when zoomed out #166
Hey! Thank you for creating this issue! I'll gladly look into this and try to help you :) Do I understand correctly that you want to have gaps (i.e., disconnected lines) in your plot? Cheers, Jeroen
Thanks - I'll get that sorted out ASAP for you.
Reproduction code
import time
from random import random
import pandas as pd
import plotly.graph_objects as go
import plotly.subplots
from pandas import DataFrame
from plotly_resampler import FigureResampler
################
# Setup the data
################
seriesLength = 100_000
values = [0]
# create a copy with flat values
flat_values = [0]
times = [time.time()]
for i in range(1, seriesLength):
    prev = values[i-1]
    prevTime = times[i-1]
    if random() > 0.999:
        delta = random() * 2 - 1
        # mark the change point with +inf so no connecting line is drawn
        flat_values.append(float('+inf'))
    else:
        delta = 0
        flat_values.append(1+prev)
    values.append(prev + delta)
    times.append(prevTime + 60)
all_data = {'times': times, 'values': values, 'flat_values': flat_values }
################
# SETUP THE PLOT
################
dataframe = DataFrame(all_data)
dataframe['times'] = pd.to_datetime(dataframe['times'], unit = 's')
dataframe = dataframe.set_index('times')
fig = FigureResampler(plotly.subplots.make_subplots())
trace = go.Scattergl(name = 'values', showlegend = True)
fig.add_trace(trace, hf_x = dataframe.index, hf_y = dataframe['values'])
fig.add_trace(trace, hf_x = dataframe.index, hf_y = dataframe['flat_values'])
fig.show_dash(mode = "inline")
This glitches horribly when the second trace is enabled, making series randomly invisible and breaking the plot dimensions/fitting.
The goal of the second series is to produce discrete, horizontal-only lines corresponding to the points in the first, without any interpolation lines going between changes in the Y values. The use case for this is highlighting key thresholds in line charts during financial time-series analysis.
This code should reproduce the issue. It was run inside PyCharm Professional 2022.3.1.
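The horizontal-only goal described above can be sketched in plain Python. This is a minimal illustration, not the reporter's actual preprocessing: `with_gap_markers` is a hypothetical helper, and it assumes a gap marker is inserted at every change in the Y value (the reproduction code above instead replaces random points with `+inf`).

```python
def with_gap_markers(times, values, marker=float("inf")):
    """Insert `marker` before every change in value so that only
    horizontal segments remain connected (hypothetical helper)."""
    out_times, out_values = [], []
    for i, (t, v) in enumerate(zip(times, values)):
        if i > 0 and v != values[i - 1]:
            # break the line here: no sloped segment between the two levels
            out_times.append(t)
            out_values.append(marker)
        out_times.append(t)
        out_values.append(v)
    return out_times, out_values


gap_times, gap_values = with_gap_markers([0, 1, 2, 3], [5.0, 5.0, 7.0, 7.0])
print(gap_times)   # [0, 1, 2, 2, 3]
print(gap_values)  # [5.0, 5.0, inf, 7.0, 7.0]
```

Any downsampler that later picks the marker as a representative sample will reproduce the problem reported in this issue, which is why the marker handling matters.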
@jvdd - Thanks again for looking at this. Just a reminder on this.
Looking into it @varon!
Hi @varon, First of all, thanks for the reproducible code 👍 After looking into this issue, I arrived at the following observations:
My takeaways from this issue: correctly handling gaps is an issue for this project (& time series downsampling in general). The underlying issue is that most downsampling algorithms (written in lower-level programming languages, e.g. C / Rust) cannot deal with
Possible solutions:
@jvdd - Thank you very much for such a super-detailed explanation and reporting experience - I'm glad the code was useful to reproduce on your side. I'm very familiar with low-level programming, and I'm looking to brush up on my Rust. I'd be happy to give that a go. If possible, I'd love if you could provide as much detail on how to go about the task as you can. I'm an experienced developer, but I have no idea how to test or verify correctness here as I'm not familiar with the Rust ecosystem. It's unlikely, but if I do get horrendously stuck I'd love if you could lend a hand. Maybe a good opportunity to connect and collaborate - I certainly appreciate the rigor in your approach.
@varon - I am very happy to hear that you are interested in contributing!! 🚀
Some background / additional info:
How can we handle NaNs:
How can you contribute?
Other meaningful contributions:
I'll add a CONTRIBUTING.md file to
P.S.: I learned Rust a couple of months ago. This is how I did it:
As long as you have a goal & it remains fun, motivation will be a direct side-effect :)
Regarding the unstable behavior of the
from tsdownsample import EveryNthDownsampler, LTTBDownsampler, MinMaxLTTBDownsampler, MinMaxDownsampler
import numpy as np
import pandas as pd
# construct data
n = 1_000_000
x = pd.Series(np.random.randn(n))
x.index -= x.index[0]
x[::150_000] = np.nan # ~ 7 nans
np.where(np.isnan(x))[0]
# downsample using various dtypes, downsamplers, and n_outs
for dtype in [
    # np.float16,
    np.float32,
    np.float64]:
    x_ = x.values.astype(dtype)
    print(dtype, np.isnan(x_).sum())
    print(LTTBDownsampler().downsample(x_, n_out=8))
    # print(MinMaxLTTBDownsampler().downsample(x_, n_out=10))
    print(MinMaxDownsampler().downsample(x_, n_out=8))
    # print(MinMaxDownsampler().downsample(x_, n_out=10))
    # print(MinMaxDownsampler().downsample(x_, n_out=20))
    print(MinMaxDownsampler().downsample(x_, n_out=26))
    print('-'*88)
Which gave me the following output and error:
<class 'numpy.float32'> 7
[ 0 1 166667 333333 500000 666666 900000 999999]
[ 0 0 312263 444720 526056 724563 750000 750000]
thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/argminmax-0.3.1/src/task.rs:68:75
---------------------------------------------------------------------------
PanicException Traceback (most recent call last)
/tmp/ipykernel_11988/1501389405.py in <cell line: 15>()
22 # print(MinMaxDownsampler().downsample(x_, n_out=10))
23 # print(MinMaxDownsampler().downsample(x_, n_out=20))
---> 24 print(MinMaxDownsampler().downsample(x_, n_out=26))
25 print('-'*88)
~/.cache/pypoetry/virtualenvs/plotly-resampler-X8YSXkmq-py3.10/lib/python3.10/site-packages/tsdownsample/downsampling_interface.py in downsample(self, n_out, parallel, *args, **kwargs)
320 ):
321 """Downsample the data in x and y."""
--> 322 return super().downsample(*args, n_out=n_out, parallel=parallel, **kwargs)
~/.cache/pypoetry/virtualenvs/plotly-resampler-X8YSXkmq-py3.10/lib/python3.10/site-packages/tsdownsample/downsampling_interface.py in downsample(self, n_out, *args, **kwargs)
109 if x is not None:
110 self._supports_dtype(x, y=False)
--> 111 return self._downsample(x, y, n_out, **kwargs)
112
113
~/.cache/pypoetry/virtualenvs/plotly-resampler-X8YSXkmq-py3.10/lib/python3.10/site-packages/tsdownsample/downsampling_interface.py in _downsample(self, x, y, n_out, parallel, **kwargs)
301 if x is None:
302 downsample_f = self._switch_mod_with_y(y.dtype, mod)
--> 303 return downsample_f(y, n_out, **kwargs)
304 elif np.issubdtype(x.dtype, np.datetime64):
305 # datetime64 is viewed as int64
PanicException: called `Option::unwrap()` on a `None` value
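To illustrate why NaN values make min/max selection unstable, here is a small pure-Python sketch. It is not the argminmax or tsdownsample implementation; `naive_argmin` and `nan_ignoring_argmin` are hypothetical helpers. It relies only on the IEEE 754 rule that every ordering comparison involving NaN is false. The `None` result for an all-NaN bin is shown as one plausible way a missing result could arise; that link to the panic above is an assumption.

```python
import math


def naive_argmin(values):
    """Index of the minimum using plain `<` comparisons (NaN-unaware)."""
    best = 0
    for i in range(1, len(values)):
        # any comparison with NaN is False, so a NaN at `best` is never replaced
        if values[i] < values[best]:
            best = i
    return best


def nan_ignoring_argmin(values):
    """Index of the minimum among non-NaN values, or None if there are none."""
    best = None
    for i, v in enumerate(values):
        if math.isnan(v):
            continue
        if best is None or v < values[best]:
            best = i
    return best


nan = float("nan")
print(naive_argmin([nan, 3.0, 1.0]))         # 0: stuck on the leading NaN
print(naive_argmin([3.0, nan, 1.0]))         # 2: same data, different order
print(nan_ignoring_argmin([nan, 3.0, 1.0]))  # 2
print(nan_ignoring_argmin([nan, nan]))       # None: the caller must handle this
```

The naive result depends on where the NaN happens to sit, which matches the order-dependent behaviour seen with different `n_out` values.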
Hey @varon, Thanks again for aiding with this codebase, we greatly appreciate you helping us out! 🤗 At the moment @jvdd and I have limited bandwidth as we are busy with writing two papers. Afterwards, this is at the top of our to-do list! As usual, there is still enough work to be done. Regarding other
Thank you for the update - I'll definitely throw in a review here. As this is a pretty important task for me, is there any other work that I can try to tackle? I'm obviously not as familiar with the projects, but while you guys are stuck on bandwidth I'm happy to help out where I can!
Hey @varon, In #154 we also decoupled the gap handling code - users can now pass a gap handler per trace! You can try this out using our latest pre-release
Hope this helps & thx again for your help with integrating tsdownsample 🤝
Thank you for creating this great library.
We are using this to plot time-series data of discrete values.
In order to avoid having interpolation lines between these discrete values, we insert +inf values into the series prior to display so that it only displays the horizontal line segments of the series.
This works as expected for plotly, but fails with the Resampler, as it can frequently stumble upon a +inf value when sampling from a series. This can cause the series to display unreliably: depending on the exact sample chosen, the entire line often disappears, failing to display anything at all for that section.
The suggested fix is that, when sampling, if a +inf value is found, the sampler tries to nudge left or right by one position to find a non-inf neighbouring sample to use.
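The suggested nudge could look roughly like the sketch below. This is only an illustration of the idea as a post-processing step on already-selected indices; `nudge_to_finite` and `nudge_selection` are hypothetical names, not part of plotly-resampler or tsdownsample, and the left-before-right preference is an arbitrary assumption.

```python
import math


def nudge_to_finite(values, index):
    """Return `index` if values[index] is finite, otherwise a finite
    neighbour one position to the left or right (left is tried first).
    Falls back to the original index when neither neighbour is finite."""
    if math.isfinite(values[index]):
        return index
    for candidate in (index - 1, index + 1):
        if 0 <= candidate < len(values) and math.isfinite(values[candidate]):
            return candidate
    return index


def nudge_selection(values, selected):
    """Nudge every sampled index and drop consecutive duplicates."""
    result = []
    for index in selected:
        nudged = nudge_to_finite(values, index)
        if not result or result[-1] != nudged:
            result.append(nudged)
    return result


series = [1.0, 1.0, float("inf"), 2.0, 2.0, float("inf"), float("inf"), 3.0]
sampled = [0, 2, 5, 7]  # indices a downsampler might have picked
print(nudge_selection(series, sampled))  # [0, 1, 4, 7]
```

Note that nudging keeps the line visible but also discards the gap at that sample, so the horizontal-only appearance is not guaranteed once the series is downsampled.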