Time series histogram plot example #18733
Merged
jklymak merged 8 commits into matplotlib:master from ecotner:topic/time-series-histogram-example on Oct 27, 2020
401e4fe added script to create time series histogram plot (ecotner)
bf47ef2 replace plt.hist2d with np.histogram2d and plt.pcolormesh (ecotner)
7eb563c fix docs and split xy into separate vectors (ecotner)
22a1392 change random noise to random walk, use jet cmap on log color scale plot (ecotner)
4d2d6f7 reduced number of series for faster CI, modified alpha and vmax for b… (ecotner)
818636e add more detail to intro/comments, remove superfluous comments/code, … (ecotner)
920588d resize plot, tight layout, rasterization, time sig figs (ecotner)
ea26862 add colorbars, constrained layout (ecotner)
@@ -0,0 +1,96 @@
"""
=====================
Time Series Histogram
=====================

This example demonstrates how to efficiently visualize large numbers of time
series in a way that could potentially reveal hidden substructure and patterns
that are not immediately obvious, and display them in a visually appealing way.

In this example, we generate multiple sinusoidal "signal" series that are
buried under a larger number of random walk "noise/background" series. For an
unbiased Gaussian random walk with standard deviation of σ, the RMS deviation
from the origin after n steps is σ*sqrt(n). So in order to keep the sinusoids
visible on the same scale as the random walks, we scale the amplitude by the
random walk RMS. In addition, we also introduce a small random offset ``phi``
to shift the sines left/right, and some additive random noise to shift
individual data points up/down to make the signal a bit more "realistic" (you
wouldn't expect a perfect sine wave to appear in your data).

The first plot shows the typical way of visualizing multiple time series by
overlaying them on top of each other with ``plt.plot`` and a small value of
``alpha``. The second and third plots show how to reinterpret the data as a 2d
histogram, with optional interpolation between data points, by using
``np.histogram2d`` and ``plt.pcolormesh``.
"""
from copy import copy
import time

import numpy as np
import numpy.matlib
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

fig, axes = plt.subplots(nrows=3, figsize=(6, 8), constrained_layout=True)

# Make some data; a 1D random walk + small fraction of sine waves
num_series = 1000
num_points = 100
SNR = 0.10  # Signal to Noise Ratio
x = np.linspace(0, 4 * np.pi, num_points)
# Generate unbiased Gaussian random walks
Y = np.cumsum(np.random.randn(num_series, num_points), axis=-1)
# Generate sinusoidal signals
num_signal = int(round(SNR * num_series))
phi = (np.pi / 8) * np.random.randn(num_signal, 1)  # small random offset
Y[-num_signal:] = (
    np.sqrt(np.arange(num_points))[None, :]  # random walk RMS scaling factor
    * (np.sin(x[None, :] - phi)
       + 0.05 * np.random.randn(num_signal, num_points))  # small random noise
)


# Plot series using `plot` and a small value of `alpha`. With this view it is
# very difficult to observe the sinusoidal behavior because of how many
# overlapping series there are. It also takes a bit of time to run because so
# many individual artists need to be generated.
tic = time.time()
axes[0].plot(x, Y.T, color="C0", alpha=0.1)
toc = time.time()
axes[0].set_title("Line plot with alpha")
print(f"{toc-tic:.3f} sec. elapsed")


# Now we will convert the multiple time series into a histogram. Not only will
# the hidden signal be more visible, but it is also a much quicker procedure.
tic = time.time()
# Linearly interpolate between the points in each time series
num_fine = 800
x_fine = np.linspace(x.min(), x.max(), num_fine)
y_fine = np.empty((num_series, num_fine), dtype=float)
for i in range(num_series):
    y_fine[i, :] = np.interp(x_fine, x, Y[i, :])
y_fine = y_fine.flatten()
x_fine = np.matlib.repmat(x_fine, num_series, 1).flatten()


# Plot (x, y) points in 2d histogram with log colorscale
# It is pretty evident that there is some kind of structure under the noise
# You can tune vmax to make signal more visible
cmap = copy(plt.cm.plasma)
cmap.set_bad(cmap(0))
h, xedges, yedges = np.histogram2d(x_fine, y_fine, bins=[400, 100])
pcm = axes[1].pcolormesh(xedges, yedges, h.T, cmap=cmap,
                         norm=LogNorm(vmax=1.5e2), rasterized=True)
fig.colorbar(pcm, ax=axes[1], label="# points", pad=0)
axes[1].set_title("2d histogram and log color scale")

# Same data but on linear color scale
pcm = axes[2].pcolormesh(xedges, yedges, h.T, cmap=cmap,
                         vmax=1.5e2, rasterized=True)
fig.colorbar(pcm, ax=axes[2], label="# points", pad=0)
axes[2].set_title("2d histogram and linear color scale")

toc = time.time()
print(f"{toc-tic:.3f} sec. elapsed")
plt.show()
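The interpolate-then-bin step in the script can also be tried on its own, without any plotting. The sketch below is not part of the PR: the helper name `interpolate_series` and the smaller array sizes are assumptions chosen for illustration, and it uses `np.tile` in place of `np.matlib.repmat(...).flatten()` so that only the core `numpy` import is needed.

```python
import numpy as np


def interpolate_series(x, Y, num_fine):
    """Linearly interpolate each row of Y onto a finer x grid.

    Returns flat (x, y) point arrays suitable for np.histogram2d.
    (Hypothetical helper, not taken from the PR.)
    """
    x_fine = np.linspace(x.min(), x.max(), num_fine)
    # One interpolated row per series, concatenated in series order
    y_flat = np.concatenate([np.interp(x_fine, x, row) for row in Y])
    # Repeat the fine grid once per series so it lines up with y_flat
    x_flat = np.tile(x_fine, Y.shape[0])
    return x_flat, y_flat


rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 100)
Y = np.cumsum(rng.standard_normal((50, 100)), axis=-1)  # 50 random walks

x_flat, y_flat = interpolate_series(x, Y, num_fine=800)
h, xedges, yedges = np.histogram2d(x_flat, y_flat, bins=[400, 100])
print(h.shape, int(h.sum()))
```

The resulting `h`, `xedges`, and `yedges` can be passed to `pcolormesh` in the same way as in the script above (with `h.T`, since `histogram2d` puts x along the first axis).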
I'll admit I'm not super clear what this is doing. It would be nice if you could explain the underlying signal in the intro or here so it is clear what the reader should be looking for in the data. Are you saying some signals are random walk and others are sines? Why does the amplitude of the sine change?
Sure, so what I'm trying to do is show that this method of plotting can be helpful for finding patterns buried under a noisy background. The random walk is the "noise/background", and the sinusoid is the "signal/pattern".

I add a little bit of random noise to the sine by 1) adding a small random offset `phi` to each series to shift it a bit left/right, and 2) adding a little additive noise with `np.random.randn` to shift each point up/down. I would never expect a perfect sine signal in real data, so this is just to make it a bit more "realistic".

The amplitude of the sine changes simply because it would not be very visible on the plot otherwise. For a Gaussian random walk with stddev of `σ` (in this case, `σ=1`), the RMS displacement from the origin after `n` steps is `σ*sqrt(n)`. So I scale the amplitude of the sine by this value so that it grows along with the random walk. Otherwise, the range of the sine would be restricted to `+1/-1`, while the random walk grows to have an RMS amplitude of `+10/-10` (with a non-negligible probability of amplitudes of even higher magnitude; most of the time this plot will produce random walk series that go as high as `+30/-30`).

Does that make sense? I can add these details to the intro for clarity.
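The `σ*sqrt(n)` scaling mentioned above can be checked numerically. This is only a rough sketch: the number of walks, the seed, and the variable names are assumptions picked for illustration, not values from the PR.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
num_walks, num_steps = 20000, 100

# Each row is one unbiased Gaussian random walk with step stddev sigma
steps = sigma * rng.standard_normal((num_walks, num_steps))
walks = np.cumsum(steps, axis=-1)

# Column k holds the position after n = k + 1 steps
n = np.arange(1, num_steps + 1)
rms_empirical = np.sqrt(np.mean(walks**2, axis=0))
rms_theory = sigma * np.sqrt(n)

max_rel_err = np.max(np.abs(rms_empirical / rms_theory - 1))
print(f"RMS after {num_steps} steps: {rms_empirical[-1]:.2f} "
      f"(theory {rms_theory[-1]:.2f})")
print(f"max relative error over all n: {max_rel_err:.3f}")
```

Note that the example script scales by `np.sqrt(np.arange(num_points))`, which starts at 0, so its step count is offset by one relative to the `n` used here.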