After upgrading from 0.3 to 0.8.1, one of my notebook cells with resampler runs indefinitely #115

Alexander-Serov · 2022-08-25T12:15:47Z

I have several figures in a notebook. All other figures plot correctly and I can wrap PlotlyResampler around and show them. However, one particular figure plots just fine, but when I wrap it in PlotlyResampler my cell keeps running indefinitely. This unfortunately blocks my update to 0.8.1. Do you have any idea @jonasvdd ?

#32

FigureResampler(fig, default_n_shown_samples=MAX_POINTS).show_dash(mode="inline")

Other observations:

if I downgrade to 0.3.0 it still does not work,
if I downgrade to 0.3.0 and remove the dash show, it works fine

The text was updated successfully, but these errors were encountered:

jonasvdd · 2022-08-25T12:27:45Z

Hi @Alexander-Serov,

I'm a little confused; does it keep running because of the show_dash or the wrapping with the FigureResampler?

Could you also provide a minimalistic code example of the plotly- figure (data) that gets stuck?

Cheers,
Jonas

Alexander-Serov · 2022-08-25T12:31:59Z

Hey @jonasvdd,

No, I think it is the FigureResampler itself. Here are my latest observations for the same code, same data:

v0.7.2, no dash: plots fine,
v0.8.1, no dash: gets stuck.

I will see if I can provide a minimal example, the problem is I am working with copyright code and data, so I cannot just copy-paste... I will see what I can do. But the problem clearly appears in the 0.8.0 release.

jonasvdd · 2022-08-25T12:34:12Z

Hi, is it possible to create a summary of your data properties (e.g. which trace sizes are we speaking of, what dtype does your data have) and share your plotting code! Than I can try to look whether I can replicate it!

Alexander-Serov · 2022-08-25T12:39:44Z

The plotting code for this function is very simple, the problem likely comes from the data 🤔. Here is a simplified version of the code:

fig = go.Figure()
for param in sorted(data["param"].unique()):
    param_data = data.loc[data["param"] == param,
                ["param1", "param2"]].set_index(
        "param1", drop=True
    )
    param_data = param_data.sort_index()
    fig.add_trace(
        go.Scattergl(
            x=param_data.index,
            y=param_data["param2"],
            name=f"{param}",
            mode="markers",
            marker=dict(color='b', size=5),
        )
    )
fig.update_xaxes(title_text="D")
fig.update_yaxes(title_text='L')
fig.update_layout(title="Av")

Basically grouping a dataset with time index by a parameter and plotting several curves.

Alexander-Serov · 2022-08-25T12:51:12Z

And here is a sample of data. I hope you are able to reproduce. Remember to convert the index to a time index (perhaps it has an effect?).
sample_data.csv

jonasvdd · 2022-08-25T13:06:44Z

For me, this still works, when I run this in an isolated notebook. What is your behavior with this code snippet (note that it uses the sample_data you shared with me)?

from plotly_resampler import FigureResampler
import plotly.graph_objects as go
import pandas as pd

data = pd.read_csv('sample_data.csv', index_col='time')
fig = go.Figure()
for param in sorted(data["param"].unique()):
    param_data = data.loc[data["param"] == param,
                ["param1", "param2"]].set_index(
        "param1", drop=True
    )
    param_data = param_data.sort_index()
    fig.add_trace(
        go.Scattergl(
            x=param_data.index,
            y=param_data["param2"],
            name=f"{param}",
            mode="markers",
            marker=dict(color='lightblue', size=5),
        )
    )
fig.update_xaxes(title_text="D")
fig.update_yaxes(title_text='L')
fig.update_layout(title="Av")
fig.show()
print("-"*80)

FigureResampler(fig, default_n_shown_samples=10_000).show_dash(mode='inline')

📷 ⬇️

Alexander-Serov · 2022-08-25T13:25:01Z

Hey!

It does work indeed, but the index in your example is of object type. Once I convert it to the DateTimeIndex, it gets stuck. Here is what's missing in your snippet.

data = pd.read_csv('sample_data.csv', index_col='time')
data.index = pd.to_datetime(data.index)

Let me know.

Alexander-Serov · 2022-08-30T11:23:29Z

Hey!

It does work indeed, but the index in your example is of object type. Once I convert it to the DateTimeIndex, it gets stuck. Here is what's missing in your snippet.
data = pd.read_csv('sample_data.csv', index_col='time')
data.index = pd.to_datetime(data.index)
Let me know.

From your silence, I understand you were able to reproduce the issue @jonasvdd? Let me know if you need anything else.

jvdd · 2022-09-07T13:38:08Z

Hi @Alexander-Serov,

I was able to find 2 errors based on minor adjustments of your snippet and implemented a solution in PR #116

Error 1: we should parse hf_x arrays to numeric or datetime when they are of dtype "object"
Error 2: to work with multiple time zones in the same hf_x array (we should ignore the time zone information away in the visualization)

We can create a pre-release so that you can test on your end whether this resolves the issue or not :)

PS: we (@jonasvdd & me) were both afk because we were backpacking in the Alps! ⛺ ⛰️

Alexander-Serov · 2022-09-09T11:51:32Z

PS: we (@jonasvdd & me) were both afk because we were backpacking in the Alps! ⛺ ⛰️

Hey!

No worries, it seems you guys were closer to my place than I expected. 🏔️

Glad to hear you found the problems and no problem testing the pre-release, let me know!

Just a short question for error 2 though. You are saying you are considering dropping the TZ information and using UTC. This might render the FigureResampler unusable for our applications, for example, because UTC time could be hard to interpret for specific projects. E.g. for this particular data set, with a -7h UTC offset, the daytime data would be shown at night. Perhaps, there is a way to keep the TZ? It seems plotly is able to plot the original figure just fine. Or could we use UTC internally, but show the original labels on the x axis? It is plotting non-UTC data in the UTC timezone that would create the most confusion for us.

Also, a remark just to be sure we are talking about the same things. In the data set I've sent you, you will see the -07:00 and -08:00 UTC offsets. However, it is still the same timezone and it is just the result of the daylight saving time (DST) change. To properly process this kind of DST data, one normally casts them to a named TZ, for instance, for the data I've sent you, you could use

import pandas as pd
data = pd.read_csv("sample_data.csv", index_col='time')
data.index = pd.to_datetime(data.index, utc=True)
data.index = data.index.tz_convert('America/Los_Angeles')
data

Would this adjustment solve the problem you were having? (Surely, not all input data can be converted to such format if you don't know the TZ name, but this is a very common case. And perhaps you could process differently [try-catch] those cases, when this is not a problem?).

jvdd · 2022-09-09T12:55:04Z

Hey! It was a long (night)train-ride away to get there, but what a coincidence indeed 😃

Internally, when there are multiple timezones in the x-data, we now omit the timezone information (but do not convert the time) In the other cases nothing special happens - see additions to our parsing code;

plotly-resampler/plotly_resampler/figure_resampler/figure_resampler_interface.py

Lines 690 to 718 in c0cd89e

    
           # Try to parse the hf_x data if it contains string values 
        
           if hf_x.dtype.type is np.str_ or hf_x.dtype == "object": 
        
               if len(hf_x) and isinstance(hf_x[0], str): 
        
                   try: 
        
                       # Try to parse to numeric 
        
                       hf_x = pd.to_numeric(hf_x, errors="raise") 
        
                   except: 
        
                       try: 
        
                           # Try to parse to datetime 
        
                           hf_x = np.asarray([pd.Timestamp(x) for x in hf_x]) 
        
                       except: 
        
                           raise ValueError( 
        
                               "plotly-resampler requires the x-data to be numeric or datetime-like" 
        
                           ) 
        
           # Check and update timezones of the hf_x data when there are multiple 
        
           # timezones in the data 
        
           if hf_x.dtype == "object": 
        
               if len(hf_x) and isinstance(hf_x[0], pd.Timestamp): 
        
                   # Assumes that all values in hf_x are pd.Timestamps 
        
                   if len(set(x.tz for x in hf_x)) > 1: 
        
                       # Remove the timezone data for plotting when multiple timezones 
        
                       warnings.warn( 
        
                           "x-data of multiple timezones / fixedoffsets is passed, " 
        
                           + "omitting the timezone data for plotting", 
        
                           UserWarning, 
        
                       ) 
        
                       hf_x = np.asarray( 
        
                           list(map(lambda x: x.replace(tzinfo=None), hf_x)) 
        
                       )

To answer your two questions;

Perhaps, there is a way to keep the TZ? It seems plotly is able to plot the original figure just fine.

Indeed, plotly can cope well with this! I've checked in tests if omitting timezone information (when there are multiple timezones in the x-data) results in the same x-values as vanilla plotly would show. You are more than welcome to validate this yourself :)

Would this adjustment solve the problem you were having? (Surely, not all input data can be converted to such format if you don't know the TZ name, but this is a very common case. And perhaps you could process differently [try-catch] those cases, when this is not a problem?)

Yes, because in that case the data has only 1 timezone (and not two fixedoffsets). Perhaps we should also include this in the warning message 🤔

I've created pre-release 0.8.2rc1, let us know if this fixes the issue correctly & results in the same behavior as plain plotly :)

Also, feel free to check out the pull request for this issue #116!
I believe you are more experienced with timezones than @jonasvdd & me 🙃

jvdd · 2022-09-24T12:42:06Z

Hey @Alexander-Serov, does the pre-release fixes the issue?

Alexander-Serov · 2022-09-26T07:38:36Z

Hey @Alexander-Serov, does the pre-release fixes the issue?

Hey @jvdd,
Sorry for the delay, I will check this today!

Alexander-Serov · 2022-09-26T08:33:20Z

Hey @jvdd,

Sorry again for the delay. I had wanted to test but had forgotten about it! But I've found time to test it now. Here is what I see:

the snippet I've originally posted works great when upgraded to 0.8.2rc1, good job!
however, when I try to test a different one, plotting the same data set, but using its time index, my kernel dies 💀 . Strange...

Here is the new (even shorter snippet).

from plotly_resampler import FigureResampler
import plotly.graph_objects as go
import pandas as pd

data = pd.read_csv("sample_data.csv", index_col='time')
data.index = pd.to_datetime(data.index)
fig = go.Figure()
fig.add_trace(
        go.Scattergl(
            x=data.index,
            y=data["param2"],
            mode="markers",
            marker=dict(color='lightblue', size=5),
        )
    )
fig.show()
print("-"*80)

FigureResampler(fig).show_dash(mode='inline')

As before, the original plotly plot works fine. Do you have any idea of what could go wrong?

Also, I see that you were testing with default_n_shown_samples=10_000, but I've just checked, it works even with v0.8.1. However, when I remove it, it doesn't. So I performed all my tests without this option.

jvdd · 2022-09-27T09:11:59Z

Hi @Alexander-Serov,

First of all thanks for the thorough review on PR #116, @jonasvdd and me will take your remarks into account and update the PR accordingly in the following days. It is really nice to dive into & discuss code with other (more) experienced people - this is the power of open source 😄

Regarding your comment above: I was also able to reproduce the kernel crash (actually I encountered this issue as well when creating the first implementation in PR #116, but I thought the issue was fixed in the pre-release - which is apparently not the case 🙃 ). We'll try to properly find the root cause for this issue and hopefully fix it as well in PR #116

jvdd · 2022-09-27T14:16:19Z

@Alexander-Serov, taking your comments into account I updated the code and created a new pre-release 0.8.2rc2. Now arrays of datetime.datetime will be parsed as well (which should fix the kernel crash issue).

P.S., I still have to look into finding the most optimal way to drop (or converting?) the tzinfo when there are multiple timezones before we can create a new major release

Alexander-Serov · 2022-09-28T15:53:33Z

Hey!
You are welcome, it has been a great pleasure to work with you guys so far. 😃 I wouldn't say I am very knowledgeable in plotly in general (you guys should be much better), but I do indeed manipulate datetimes every day.

Considering the question of dropping/conversion of datetimes with different TZs, I think it's important to ask oneself why different TZs got there in the first place:

As I mentioned, the most common case I see is that it's the same named TZ, which - if read as unnamed - will appear as different TZs before and after DST (daylight saving time change) date. For example, for Los Angeles (the dataset I sent you) you will see UTC-07:00 and UTC-08:00. In this case, the only way to show it correctly would be to ask the user to convert to a named TZ. A priori, you cannot really infer you. For me, this is 99% of cases you will see as resampler input with different TZs.
if the data really come from different named TZs, there is not much we can do either. We could convert them to UTC or drop the TZ. Conversion to UTC has the disadvantage that the apparent datetime may be wrong. What would have happened at 6 pm in LA would be at ~midninght in UTC. The figures may be confusing. However, if you drop the TZ altogether, you may have an incorrect time shift around the DST dates.

This said, I know plotly itself manages some of these cases without converting all to UTC. I wonder if we can look up what they do in their code.

I will now try the rc2 to see if it fixes the kernel crash and have a look at the PR replies. :) If all goes well, we should have a new version soon. :)

Alexander-Serov · 2022-09-28T16:08:17Z

After first tests, it seems like I am having this AssertionError although all my index datetimes are in the same TZ:

I am trying to plot

fig = go.Figure()
fig.add_trace(
        go.Scattergl(
            x=data.index,
            y=data["param2"],
            mode="markers",
            marker=dict(color='lightblue', size=5),
        )
    )
fig.show()
print("-"*80)

FigureResampler(fig).show_dash(mode='inline')

on the above data set converted to a single named TZ by applying

data = pd.read_csv("sample_data.csv", index_col='time')
data.index = pd.to_datetime(data.index,utc=True).tz_convert('America/Los_Angeles')

Do you confirm?

Alexander-Serov · 2022-09-28T16:09:35Z

At the same time, if I plot this plot (the one discussed above):

fig = go.Figure()
for param in sorted(data["param"].unique()):
    param_data = data.loc[data["param"] == param,
                ["param1", "param2"]].set_index(
        "param1", drop=True
    )
    param_data = param_data.sort_index()
    fig.add_trace(
        go.Scattergl(
            x=param_data.index,
            y=param_data["param2"],
            name=f"{param}",
            mode="markers",
            marker=dict(color='lightblue', size=5),
        )
    )
fig.update_xaxes(title_text="D")
fig.update_yaxes(title_text='L')
fig.update_layout(title="Av")
fig.show()
print("-"*80)

FigureResampler(fig).show_dash(mode='inline')

on the dataset with a single named TZ, but DST, this cell continues to run indefinitely.

Let me know if you can confirm. I will continue looking at it tomorrow. 😃 Thanks for your efforts!

jvdd · 2022-09-28T16:10:31Z

After first tests, it seems like I am having this AssertionError although all my index datetimes are in the same TZ.
...
Do you confirm?

I do not get an AssertionError when all the there is only a single named TZ.

Code from screenshot above:

data = pd.read_csv("sample_data.csv", index_col='time')
data.index = pd.to_datetime(data.index,utc=True).tz_convert('America/Los_Angeles')

fig = go.Figure()
fig.add_trace(
        go.Scattergl(
            x=data.index,
            y=data["param2"],
            mode="markers",
            marker=dict(color='lightblue', size=5),
        )
    )
fig.update_layout(height=300)
fig.show()
print("-"*80)

FigureResampler(fig).show_dash(mode='inline')

However, I did get an AssertionError when the data was not converted to a single named TZ.

...
on the dataset with a single named TZ, but DST, this cell continues to run indefinitely.

This plot works perfectly on my computer as well

It is rather bizarre that I cannot reproduce your errors.. To be sure that we are comparing apples to apples I created a new pre-release rc3 (which contains the fixes from earlier today)

jvdd · 2022-09-28T21:11:57Z

This PR still needs some more thought - I tend to agree with your comment @Alexander-Serov #115 (comment)

At the moment, I'm unsure if dropping the timezone information is the responsibility of this library - as users can convert it to UTC or drop them themselves..
Perhaps we should (just as you explained in your comment)

support the 99% of the cases by supporting single named TZs (which is currently supported - if you are able to confirm with the new rc)
raise an error with a clear message that we do accept this single named TZ (with DST) when there are multiple TZs detected (but leave them untouched)

Alexander-Serov · 2022-09-30T08:20:14Z

Hey guys!

I have tested the latest 0.8.2rc3 version, thanks for releasing it! I confirm that the plots behavior has changed, but I am still having a problem. When I launch this plot (with datetime index with a single named TZ), the resampled plot never plots and just runs indefinitely. I think the problem may be due to the long cycle I have commented on in the PR, I will try to investigate (if I have time). Here is the code that never terminates on my system:

from plotly_resampler import FigureResampler
import plotly.graph_objects as go
import pandas as pd

data = pd.read_csv("sample_data.csv", index_col='time')
data.index = pd.to_datetime(data.index,utc=True).tz_convert('America/Los_Angeles')

fig = go.Figure()
fig.add_trace(
        go.Scattergl(
            x=data.index,
            y=data["param2"],
            mode="markers",
            marker=dict(color='lightblue', size=5),
        )
    )
fig.show()
print("-"*80)

FigureResampler(fig).show_dash(mode='inline')

I think it pretty much fits into our 99% discussed above.

I also see that this conde is exactly the same as your code above. Are we running different plotly/numpy versions? 🤔

Here is my simplified pip freeze.

black==22.6.0
bleach==5.0.1
cachetools==5.2.0
certifi==2022.6.15
jupyter-client==7.3.4
jupyter-core==4.11.1
jupyter-dash==0.4.2
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.2
lttbc==0.2.0
matplotlib==3.5.3
matplotlib-inline==0.1.6
mypy-extensions==0.4.3
nbclient==0.6.7
nbconvert==7.0.0
nbformat==5.4.0
nbmake==1.3.4
notebook==6.4.12
numpy==1.21.6
pandas==1.3.5
plotly==5.9.0
plotly-resampler==0.8.2rc3
pydivsufsort==0.0.6
pytz==2022.2.1
scikit-learn==1.0.2
scipy==1.7.3
seaborn==0.10.1
widgetsnbextension==4.0.2
xarray==0.20.2
xarray-einstats==0.2.2

I am running py3.7 in a Windows environment.

Alexander-Serov · 2022-09-30T09:20:35Z

I am running py3.7 in a Windows environment.

Update.
I have installed the plotly resampler in a separate environment and the cell works fine. I must have a package incompatibility somewhere. I am checking.

Alexander-Serov · 2022-09-30T12:16:21Z

Update.
I have investigated further and it is the .show_dash(mode='inline') command that hangs for some reason when x is provided as index. If I use a simple show(), everything is fine. It only occurs in my other environment. When I've created one specifically for the resampler, I do not reproduce with the same code.

Do you have an idea why it could hang on show_dash if it's an index and not a Series?
Specifically, the command app.run_server(mode=mode, **kwargs) is launched by show_dash() and hangs instead of plotting a figure.

jvdd · 2022-10-07T06:54:01Z

Hi @Alexander-Serov,

Thx for investigating this - we both appreciate your efforts :)

I could not reproduce your issue in a Python 3.7.14 environment (with the same packages as from the pip freeze output).. :/

To further answer your questions / comments:

Update.
I have investigated further and it is the .show_dash(mode='inline') command that hangs for some reason when x is provided as index.

So the issue only occurs when you pass data.index as x?
E.g. passing data.index.values as x works fine? 🤯

Do you have an idea why it could hang on show_dash if it's an index and not a Series?

Nope... really boggles my mind

jvdd added a commit that referenced this issue Sep 7, 2022

🐛 parse object arrays for datetime or numeric dtypes, fixes #115

ca96471

jvdd mentioned this issue Sep 27, 2022

🐛 parse object arrays for hf_x #116

Merged

jvdd mentioned this issue Oct 10, 2022

🐛 fix bug with update axes range #122

Closed

5 tasks

jvdd mentioned this issue Oct 20, 2022

🐛 update layout axes range bug #126

Merged

4 tasks

jvdd closed this as completed in 1b8ee36 Dec 2, 2022

jonasvdd mentioned this issue Dec 30, 2022

"Segmentation fault (core dumped)" when using string timestamp for x-axis #153

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

After upgrading from 0.3 to 0.8.1, one of my notebook cells with resampler runs indefinitely #115

After upgrading from 0.3 to 0.8.1, one of my notebook cells with resampler runs indefinitely #115

Alexander-Serov commented Aug 25, 2022 •

edited

jonasvdd commented Aug 25, 2022 •

edited

Alexander-Serov commented Aug 25, 2022 •

edited

jonasvdd commented Aug 25, 2022

Alexander-Serov commented Aug 25, 2022 •

edited

Alexander-Serov commented Aug 25, 2022

jonasvdd commented Aug 25, 2022 •

edited

Alexander-Serov commented Aug 25, 2022

Alexander-Serov commented Aug 30, 2022 •

edited

jvdd commented Sep 7, 2022 •

edited

Alexander-Serov commented Sep 9, 2022

jvdd commented Sep 9, 2022 •

edited

jvdd commented Sep 24, 2022

Alexander-Serov commented Sep 26, 2022

Alexander-Serov commented Sep 26, 2022

jvdd commented Sep 27, 2022

jvdd commented Sep 27, 2022

Alexander-Serov commented Sep 28, 2022

Alexander-Serov commented Sep 28, 2022

Alexander-Serov commented Sep 28, 2022 •

edited

jvdd commented Sep 28, 2022 •

edited

jvdd commented Sep 28, 2022

Alexander-Serov commented Sep 30, 2022 •

edited

Alexander-Serov commented Sep 30, 2022

Alexander-Serov commented Sep 30, 2022 •

edited

jvdd commented Oct 7, 2022

After upgrading from 0.3 to 0.8.1, one of my notebook cells with resampler runs indefinitely #115

After upgrading from 0.3 to 0.8.1, one of my notebook cells with resampler runs indefinitely #115

Comments

Alexander-Serov commented Aug 25, 2022 • edited

jonasvdd commented Aug 25, 2022 • edited

Alexander-Serov commented Aug 25, 2022 • edited

jonasvdd commented Aug 25, 2022

Alexander-Serov commented Aug 25, 2022 • edited

Alexander-Serov commented Aug 25, 2022

jonasvdd commented Aug 25, 2022 • edited

Alexander-Serov commented Aug 25, 2022

Alexander-Serov commented Aug 30, 2022 • edited

jvdd commented Sep 7, 2022 • edited

Alexander-Serov commented Sep 9, 2022

jvdd commented Sep 9, 2022 • edited

jvdd commented Sep 24, 2022

Alexander-Serov commented Sep 26, 2022

Alexander-Serov commented Sep 26, 2022

jvdd commented Sep 27, 2022

jvdd commented Sep 27, 2022

Alexander-Serov commented Sep 28, 2022

Alexander-Serov commented Sep 28, 2022

Alexander-Serov commented Sep 28, 2022 • edited

jvdd commented Sep 28, 2022 • edited

jvdd commented Sep 28, 2022

Alexander-Serov commented Sep 30, 2022 • edited

Alexander-Serov commented Sep 30, 2022

Alexander-Serov commented Sep 30, 2022 • edited

jvdd commented Oct 7, 2022

Alexander-Serov commented Aug 25, 2022 •

edited

jonasvdd commented Aug 25, 2022 •

edited

Alexander-Serov commented Aug 25, 2022 •

edited

Alexander-Serov commented Aug 25, 2022 •

edited

jonasvdd commented Aug 25, 2022 •

edited

Alexander-Serov commented Aug 30, 2022 •

edited

jvdd commented Sep 7, 2022 •

edited

jvdd commented Sep 9, 2022 •

edited

Alexander-Serov commented Sep 28, 2022 •

edited

jvdd commented Sep 28, 2022 •

edited

Alexander-Serov commented Sep 30, 2022 •

edited

Alexander-Serov commented Sep 30, 2022 •

edited