Time series histogram plot example #18733
This seems to jump through painful hoops to get the data into the right form. Suggest not to use |
I'm not sure I understand; |
is the interpolation step? I've not checked this, but I'd do:
My dislike of |
Oh, you don't have to pack x and y together, I just did that because it was more convenient. The call signature of |
Also, do you know why the build is failing? I have found the CI process emits a warning like
but I'm not sure how to fix it. Are there supposed to be documentation files accompanying the python scripts? |
in rst will try and resolve some module called that. But you mean it as a literal:
|
@jklymak I replaced the interpolation code with your suggestion, then also kept |
OK, so one more suggestion - rather than use just random, maybe try |
I know the speed is one of your points, but we can't easily have something this slow in the docs. Consider pointing out the speed difference in the text, but don't make the actual example be so slow. We run our CI very often....
_, axes = plt.subplots(nrows=3, figsize=(10, 6 * 3))

# Make some data; a 1D random walk + small fraction of sine waves
num_series = 10000
This example takes over 40 seconds to run, out of a 15 minute process. It needs to be much faster to be worth it and I'm sure the effect can be achieved with fewer than 10,000 time series.
Does the CI pipeline run over the same code multiple times or something? On my machine it only takes 5 seconds to run! I'll take a look and see if I can reduce the number of series while still illustrating the other advantages.
Just look at the output on CI - it says the time...
What do you think of the above example(s)? The fact that you can see the sinusoid now when using `plot` kind of detracts from the proposed advantages of increased visibility, but the histograms definitely look "sexier" with the `plasma` color coding IMO.
I think that's fine. I think it's also reasonable to increase the alpha. That alpha effect is pretty fragile at such a low value.
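For reference, here is a minimal sketch of the kind of line-plot panel being discussed: many series drawn in a single `plot` call with one shared color and a tunable `alpha`. The series count, number of points, seed, and alpha value are illustrative assumptions, not the values used in the PR.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
num_series, num_points = 200, 100  # illustrative sizes (assumption)
x = np.linspace(0, 4 * np.pi, num_points)
# Gaussian random walks: cumulative sum of unit-variance steps along each row
Y = np.cumsum(rng.standard_normal((num_series, num_points)), axis=-1)

fig, ax = plt.subplots(figsize=(6, 3))
# A 2D y-array is plotted column by column, so transpose Y to get one line
# per series without a Python loop over the individual series.
lines = ax.plot(x, Y.T, color="C0", alpha=0.1)
ax.set_title("Line plot with alpha")
```

Raising `alpha` makes individual lines easier to see but saturates the dense regions sooner, which is the trade-off raised in the comment above.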
Ok, so CI now says it only takes 2.4 seconds after reducing the number of series by a factor of 10 (not quite linear scaling unfortunately). Does that seem low enough, or should we try and reduce the build time a bit more?
…etter illustration, eliminated loop over each time series line plot, changed color map to plasma
Hmm from looking at the CI logs, it seems that the failure is due to some entirely unrelated test called |
This looks fine, but a few suggestions....
# sinusoidal signal
num_signal = int(round(SNR * num_series))
phi = (np.pi / 8) * np.random.randn(num_signal, 1)  # small random offset
Y[-num_signal:] = (
I'll admit I'm not super clear what this is doing. It would be nice if you could explain the underlying signal in the intro or here so it is clear what the reader should be looking for in the data. Are you saying some signals are random walk and others are sines? Why does the amplitude of the sine change?
Sure, so what I'm trying to do is show that this method of plotting can be helpful to find patterns buried under some noisy background. The random walk is the "noise/background", and the sinusoid is the "signal/pattern".
I add a little bit of random noise to the sine by 1) adding a small random offset `phi` to each series to shift it a bit left/right, and 2) adding a little bit of additive noise with `np.random.randn` to shift each point up/down. I would never expect a perfect sine signal in real data, so this is just to make it a bit more "realistic".
The amplitude of the sine changes simply because it would not be very visible on the plot otherwise. For a Gaussian random walk with stddev of σ (in this case, σ=1), the RMS displacement from the origin after n steps is σ*sqrt(n). So I scale the amplitude of the sine by this value so that it grows along with the random walk. Otherwise, the range of the sine would be restricted to +1/-1, while the random walk grows to have an RMS amplitude of +10/-10 (and non-negligible probability to have amplitudes of even higher magnitude as well; most of the time this plot will produce random walk series that go as high as +30/-30).
Does that make sense? I can add these details to the intro for clarity.
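The construction described above can be sketched roughly as follows. This is an illustration under stated assumptions, not necessarily the PR's exact code: the seed, array sizes, signal fraction `snr`, and noise scale `0.05` are all placeholder values. The last lines check the σ*sqrt(n) claim empirically on the pure random walks.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
num_series, num_points = 1000, 100  # illustrative sizes (assumption)
snr = 0.10  # assumed fraction of series that carry the sinusoid
x = np.linspace(0, 4 * np.pi, num_points)

# Background: Gaussian random walks with unit step size (sigma = 1)
Y = np.cumsum(rng.standard_normal((num_series, num_points)), axis=-1)

# Signal: sinusoids with a small random phase offset per series, amplitude
# scaled by sqrt(n) to track the RMS growth of the walk, plus a little
# additive noise at every point (noise scale is a placeholder).
num_signal = int(round(snr * num_series))
phi = (np.pi / 8) * rng.standard_normal((num_signal, 1))
n = np.arange(num_points)
Y[-num_signal:] = (
    np.sqrt(n)[None, :] * np.sin(x[None, :] - phi)
    + 0.05 * rng.standard_normal((num_signal, num_points))
)

# Empirical RMS displacement of the pure walks at the final step;
# for sigma = 1 it should be close to sqrt(num_points).
walks = Y[:-num_signal]
rms_final = np.sqrt(np.mean(walks[:, -1] ** 2))
```

With 100 steps, `rms_final` should land near 10, which is why an unscaled sine confined to +1/-1 would be hard to see against the walks.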
Sorry, final nitpick. This figure is really large and barely fits vertically onto my 27" screen when rendered in the docs. Nobody is going to publish a 10" wide figure, so I suggest you shrink it to 60% of its current size at most.
cmap = copy(plt.cm.plasma)
cmap.set_bad(cmap(0))
h, xedges, yedges = np.histogram2d(x_fine, y_fine, bins=[400, 100])
axes[1].pcolormesh(xedges, yedges, h.T, cmap=cmap, norm=LogNorm(vmax=1.5e2))
Also, this will render faster if you pass `rasterized=True`.
Hmm, doesn't really seem to make a difference on my machine, but maybe it will on others? I added another digit to the time elapsed output since it's getting pretty short already.
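As a rough, self-contained sketch of the histogram panel with the `rasterized=True` suggestion applied: the series are linearly interpolated onto a finer x grid with `np.interp`, binned with `np.histogram2d`, and drawn with `pcolormesh`. The data, sizes, and seed are assumptions made for illustration, and the interpolation shown is one possible approach rather than the PR's confirmed code.

```python
from copy import copy

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
num_series, num_points, num_fine = 200, 100, 400  # illustrative sizes
x = np.linspace(0, 4 * np.pi, num_points)
Y = np.cumsum(rng.standard_normal((num_series, num_points)), axis=-1)

# Interpolate each series onto a finer x grid so the histogram sees
# connected curves rather than isolated sample points.
x_fine = np.linspace(x.min(), x.max(), num_fine)
y_fine = np.stack([np.interp(x_fine, x, row) for row in Y])

# Flatten to (x, y) point pairs and bin them in 2D
h, xedges, yedges = np.histogram2d(
    np.tile(x_fine, num_series), y_fine.ravel(), bins=[400, 100]
)

fig, ax = plt.subplots(figsize=(6, 3))
cmap = copy(plt.cm.plasma)
cmap.set_bad(cmap(0))  # empty bins are masked by LogNorm; use the lowest color
mesh = ax.pcolormesh(
    xedges, yedges, h.T, cmap=cmap, norm=LogNorm(), rasterized=True
)
```

Rasterizing mainly matters for vector output such as PDF or SVG, where each mesh cell would otherwise be a separate path; for raster backends like PNG the difference may be small, consistent with the observation above.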
Ahh yeah that's a holdover from the default size I use when I'm plotting in jupyter lab; my monitor is fairly large so I try and make the plots take up a bit more real estate. How does 6x8 sound? Keeps roughly the same aspect ratio (maybe a little shorter).
fig.tight_layout()
Constrained layout? I guess the other nit is that the pcolormesh plots should have colorbars...
I figured I'd use `tight_layout()` to reduce the whitespace in the margins. Is this a bad practice?
Well if you add the colorbars, you will find that constrained_layout works better...
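A small sketch of what that combination might look like: two `pcolormesh` panels, each with its own colorbar, in a figure created with `constrained_layout=True`. The placeholder data, colormap names, and figure size are assumptions for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
data = rng.random((20, 30))  # placeholder data

# constrained_layout=True asks Matplotlib to reserve room for colorbars,
# tick labels, and titles when positioning the axes.
fig, axes = plt.subplots(nrows=2, figsize=(6, 4), constrained_layout=True)
for ax, cmap_name in zip(axes, ["plasma", "viridis"]):
    mesh = ax.pcolormesh(data, cmap=cmap_name)
    fig.colorbar(mesh, ax=ax, label="value")

fig.canvas.draw()  # the layout is solved when the figure is drawn
```

Unlike `fig.tight_layout()`, which is a one-off call, the constrained layout is applied each time the figure is drawn, so no explicit layout call is needed.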
Ahh, I didn't even realize `constrained_layout` was even a thing. Only ever used `tight_layout` before.
Well I wrote it, so I try to twist people's arms to use it ;-)
Very cool! Does it actually use CSP or LP to optimize the layout, or is it like a heuristic solver?
Not sure what CSP or LP are 😉. It uses a linear constraint solver (kiwi solver) and relatively straightforward constraints....
Ahh; CSP = Constraint Satisfaction Problem/Programming, and LP = Linear Programming. They're very broad classes of numerical techniques to optimize some objective under constraints. Looks like kiwi solver is based off an LP algorithm, so I guess you know more about it than you think 😄
Thanks for all your work on this - it is a nice example, and I think others will find this a useful technique....
No problem. And thanks for all the feedback, I think it really made the end result way better!
PR Summary
Follow up to issue #18643. This PR seeks to add an example to the gallery which shows an alternative way to visualize time series by reinterpreting them as 2d histograms, allowing for
The example script generates a plot like the following, where a sinusoidal signal is visible in the 2nd/3rd plots despite large amounts of noise:
PR Checklist
… `pytest` passes).
… `flake8` on changed files to check).
… `flake8-docstrings` and `pydocstyle<4` and run `flake8 --docstring-convention=all`).
… `doc/users/next_whats_new/` (follow instructions in README.rst there).
… `doc/api/next_api_changes/` (follow instructions in README.rst there).