Convenience utils for plotting, styling, and manipulating high-dimensional vectors.
- Analyses and plotting methods are one line to call, and produce consistently-formatted publication-ready plots.
- Enables rapid exploratory data analysis (EDA) and prototyping, perfect for taking a quick peek at data or making a quick figure to stash in the lab book (with labels and titles automatically included). See the demos below.
- Designed for easy drop-in use in other projects, whether used internally within the code or for clean notebooks. Import isthmuslib to avoid writing many lines of plotting code when it would distract or detract from the main focus of your project.
- The visual and text configuration objects (Style and Rosetta, respectively) can be attached directly to a given data set, so you can "set it and forget it" at instantiation. All subsequent outputs will automatically have matching colors, sizes, labels, etc.
- The VectorSequence object is designed for handling, plotting, and manipulating timeseries-like high-dimensional vectors. Its functionality includes dimensionality reduction via singular value decomposition, seasonal (e.g. weekly, monthly, ...) timeseries decomposition, infosurface generation, and more.
- Uses industry-standard libraries (pyplot, numpy, seaborn, pandas, etc.) under the hood, and exposes their underlying functionality through the wrappers.
Free software under the MIT license.
Contact: isthmuslib@mitchellpkt.com
pip install isthmuslib
To use the project:
import isthmuslib
Demo one-liners
===============
A complete tutorial notebook is available here: https://github.com/Mitchellpkt/python-isthmuslib/blob/main/isthmuslib_tutorial.ipynb
Below are a few demos of one-line helper functions for plot generation and statistical analyses.
We can plot multiple distributions with the hist() wrapper around matplotlib.pyplot:
isthmuslib.hist([data_1, data_2], bins=50, xlabel='Wavelength',
title='H2 observations', legend_strings=["Q3", "Q4"])
Additional keyword arguments are passed through to the pyplot histogram function, for example density and cumulative:
isthmuslib.hist([data_1, data_2], bins=200, density=True, cumulative=True,
xlabel='Wavelength', title='H2 observations', legend_strings=["Q3", "Q4"])
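The pass-through described above is ordinary Python **kwargs forwarding. The sketch below is not isthmuslib's code: `hist_wrapper` and `fake_hist` are hypothetical names, and `fake_hist` stands in for `matplotlib.pyplot.hist` so the example runs without a display.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def hist_wrapper(
    datasets: Sequence[Sequence[float]],
    plot_fn: Callable[..., Any],
    legend_strings: Optional[List[str]] = None,
    **kwargs: Any,
) -> List[Any]:
    """Call plot_fn once per dataset, forwarding extra keyword arguments unchanged."""
    results = []
    for i, data in enumerate(datasets):
        label = legend_strings[i] if legend_strings else None
        # Extra kwargs (e.g. bins=200, density=True, cumulative=True) pass straight through
        results.append(plot_fn(data, label=label, **kwargs))
    return results


# Hypothetical stand-in for pyplot.hist that just records what it was called with
calls: List[Dict[str, Any]] = []


def fake_hist(data: Sequence[float], **kw: Any) -> Dict[str, Any]:
    calls.append({"n": len(data), **kw})
    return kw


hist_wrapper([[1, 2, 3], [4, 5]], fake_hist, legend_strings=["Q3", "Q4"],
             bins=200, density=True, cumulative=True)
```

With matplotlib installed, passing `plt.hist` as `plot_fn` would draw both histograms on the current axes with the same keyword arguments.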
Likewise, we have a wrapper for matplotlib's scatter:
isthmuslib.scatter([data_1, data_2], [data_3, data_4], xlabel=r'angle $\phi$', ylabel='voltage',
title='Tuning results', legend_strings=['Control case', 'Calibrated'])
We can also cast a single x & y vector pair into a 2D histogram (essentially a surface with height [color] showing bin counts).
isthmuslib.hist2d(data_1, data_3, bins=(20, 20), xlabel=r'angle $\phi$', ylabel='voltage',
title='Control case', colorbar_label='sample counts')
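For intuition about what the color encodes, here is a stdlib-only sketch of equal-width 2D binning. It assumes the same edge convention as `numpy.histogram2d` (the last bin is closed on the right); `hist2d_counts` is a hypothetical helper, not part of isthmuslib.

```python
from typing import List, Sequence, Tuple


def hist2d_counts(
    x: Sequence[float], y: Sequence[float], bins: Tuple[int, int] = (20, 20)
) -> List[List[int]]:
    """Count (x, y) pairs per cell of an equal-width grid; counts[i][j] is x-bin i, y-bin j."""
    nx, ny = bins
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    counts = [[0] * ny for _ in range(nx)]

    def bin_index(v: float, lo: float, hi: float, n: int) -> int:
        if hi == lo:
            return 0  # degenerate range: everything lands in the first bin
        i = int((v - lo) / (hi - lo) * n)
        return min(i, n - 1)  # the maximum value goes into the final bin

    for xv, yv in zip(x, y):
        counts[bin_index(xv, x_min, x_max, nx)][bin_index(yv, y_min, y_max, ny)] += 1
    return counts
```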
We can also load a dataframe or CSV file into the VectorSequence class for working with multivariate timeseries and similarly shaped data that have a physically interpretable, strictly ordered axis, for example:
- Multiple physical features (temperature, pressure, and irradiation) measured simultaneously at 3 different heights
- Multiple stock values observed over time
- Fluorescence intensity measured simultaneously at different wavelengths
(If the data do not have an inherent ordering, use the isthmuslib VectorMultiSet instead of the VectorSequence.)
import pathlib
timeseries: isthmuslib.VectorSequence = isthmuslib.VectorSequence().read_csv(pathlib.Path.cwd() / 'data' / 'example_vector_sequence_data.csv',
inplace=False, basis_col_name='timestamp', name_root='Experiment gamma')
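As a rough illustration of what a basis column implies, this hypothetical `read_columns` helper (not isthmuslib's reader) loads a numeric CSV, sorts rows by the basis column, and rejects duplicate basis values. Whether VectorSequence itself sorts or rejects unordered input is an assumption here, not something stated above.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_columns(stream: TextIO, basis_col_name: str) -> Dict[str, List[float]]:
    """Read a numeric CSV into {column: values}, ordered by the basis column."""
    reader = csv.DictReader(stream)
    rows = [{k: float(v) for k, v in row.items()} for row in reader]
    rows.sort(key=lambda r: r[basis_col_name])
    basis = [r[basis_col_name] for r in rows]
    # After sorting, only repeated values can break strict ordering
    if any(b2 <= b1 for b1, b2 in zip(basis, basis[1:])):
        raise ValueError(f"'{basis_col_name}' has duplicate values; the basis must be strictly ordered")
    return {name: [r[name] for r in rows] for name in reader.fieldnames}


cols = read_columns(io.StringIO("timestamp,foo\n2,20\n1,10\n3,30\n"), "timestamp")
```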
The isthmuslib plotting features demoed above are directly attached to the vector multiset & sequence objects.
timeseries.plot('baz')
timeseries.hist('bar', bins=50)
We can take a peek at the correlations between columns (this wraps corr from pandas):
timeseries.correlation_matrix()
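pandas' `DataFrame.corr` defaults to the Pearson coefficient. A stdlib-only sketch of the same computation over a dict of columns follows; `pearson` and `correlation_matrix` here are hypothetical helpers, and zero-variance columns return NaN, as pandas does.

```python
import math
from typing import Dict, List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined for a constant column
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations, indexed as matrix[row_name][col_name]."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}
```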
We can visualize seasonal decomposition analyses with a single line, wrapping statsmodels.tsa logic with styled plots.
timeseries.plot_decomposition('foo', 30, figsize=(10, 6), title='Foo trace: ', ylabel='Voltage')
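The source does not say which statsmodels.tsa routine is wrapped. Assuming a classical additive decomposition, this stdlib sketch follows the moving-average approach of statsmodels' `seasonal_decompose` (additive model), as I understand it: a centered moving-average trend, per-phase means of the detrended series centered on zero, and a residual. `classical_additive_decomposition` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def classical_additive_decomposition(
    x: List[float], period: int
) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Split x into (trend, seasonal, residual) with x = trend + seasonal + residual.

    The trend is a centered moving average (a 2 x period average when period is even),
    so it is None for the first and last half-window of samples.
    """
    if len(x) < 2 * period:
        raise ValueError("need at least two full periods of data")

    # Centered moving-average weights; even periods get half weight at both ends
    if period % 2:
        weights = [1.0 / period] * period
    else:
        weights = [0.5 / period] + [1.0 / period] * (period - 1) + [0.5 / period]
    half = len(weights) // 2

    trend: List[Optional[float]] = [None] * len(x)
    for t in range(half, len(x) - half):
        trend[t] = sum(w * x[t - half + k] for k, w in enumerate(weights))

    # Average the detrended values at each phase of the cycle, then center them on zero
    detrended = [xi - ti if ti is not None else None for xi, ti in zip(x, trend)]
    phase_means = []
    for phase in range(period):
        vals = [d for d in detrended[phase::period] if d is not None]
        phase_means.append(sum(vals) / len(vals))
    offset = sum(phase_means) / period
    phase_means = [m - offset for m in phase_means]

    seasonal = [phase_means[t % period] for t in range(len(x))]
    resid = [xi - ti - si if ti is not None else None
             for xi, ti, si in zip(x, trend, seasonal)]
    return trend, seasonal, resid
```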
The VectorSequence timeseries class contains logic for sliding-window analyses with arbitrary functions. Here we'll use a throwaway lambda, appreciation, to demonstrate, and apply that function over sliding windows with 2-, 4-, and 8-week durations.
from typing import List

appreciation = lambda o: {'Change in value (%)': 100 * (o.values('foo')[-1] / o.values('foo')[0] - 1)}
window_widths_weeks: List[float] = [2, 4, 8]
# Window widths converted from weeks to seconds (60 s * 60 min * 24 h * 7 d)
result: isthmuslib.SlidingWindowResults = timeseries.sliding_window(appreciation,
[x * 60 * 60 * 24 * 7 for x in window_widths_weeks],
overlapping=True)
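To make the sliding-window idea concrete, here is a minimal stdlib sketch. It is not isthmuslib's implementation: `sliding_window` below is a hypothetical function, the window function receives a plain list of values (rather than a sequence object with `.values(...)`), windows must fit entirely inside the basis range, and overlap comes from choosing `step` smaller than `width`.

```python
from typing import Any, Callable, Dict, List, Sequence


def sliding_window(
    basis: Sequence[float],
    values: Sequence[float],
    func: Callable[[List[float]], Dict[str, Any]],
    width: float,
    step: float,
) -> List[Dict[str, Any]]:
    """Apply func to the values whose basis falls in each window [start, start + width)."""
    if step <= 0:
        raise ValueError("step must be positive")
    results = []
    start = basis[0]
    while start + width <= basis[-1]:
        window = [v for b, v in zip(basis, values) if start <= b < start + width]
        if window:
            results.append({"window_start": start, "window_width": width, **func(window)})
        start += step  # step < width gives overlapping windows
    return results


basis = [float(t) for t in range(10)]
values = [float(t + 1) for t in range(10)]
pct = lambda w: {"Change in value (%)": 100 * (w[-1] / w[0] - 1)}
out = sliding_window(basis, values, pct, width=4, step=2)
```

Running several widths (as with the 2-, 4-, and 8-week list above) is then just a loop over `width`, tagging each result with its window width so results can be grouped later.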
The SlidingWindowResults.plot_results() method automatically plots results separated by window width.
result.plot_results('Change in value (%)', legend_override=[f"{x} weeks " for x in window_widths_weeks])
Likewise, the SlidingWindowResults.plot_pdfs() method plots distributions separated by window width.
result.plot_pdfs('Change in value (%)', density=True, bins=50,
legend_override=[f"{x} weeks " for x in window_widths_weeks])
Dimensionality reduction (SVD) logic over sliding windows is built into the VectorSequence class, allowing easy calculation and visualization of information surfaces (the resulting plot shows the first 3 singular value surfaces). The timeseries basis (specified in basis_col_name) is automatically excluded from the SVD analysis. The cols keyword argument can be specified when only certain data features should be taken into account.
timeseries.plot_info_surface()
This library includes log extraction tooling for mostly unstructured strings or files. For example, take the string:
input_string = "It was the best of times, [@@@] it was the worst [<<x=5>>]of times, it was the age of wisdom, [<<y='foo'>>] it was the age of foolishness, [@@@] it was the epoch of belief, it was the epoch of incredulity, [<<y='bar'>>] it was the season of Light, it was the season of Darkness"
The one-liner:
isthmuslib.auto_extract_from_text(input_string)
returns the extracted data as a dataframe.
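The exact parsing rules are not spelled out here, so the following is only one plausible reading of the markup, not isthmuslib's documented behavior. It assumes `[<<key=value>>]` tags a value and `[@@@]` separates records, and returns a list of dicts (one per record) that could be handed to `pandas.DataFrame`. `extract_records` and `TAG` are hypothetical names.

```python
import ast
import re
from typing import Any, Dict, List

# Assumed markup: [<<key=value>>] tags a value; [@@@] separates records
TAG = re.compile(r"\[<<(\w+)=(.*?)>>\]")


def extract_records(text: str, separator: str = "[@@@]") -> List[Dict[str, Any]]:
    """Return one dict of tagged key/value pairs per separator-delimited chunk."""
    records = []
    for chunk in text.split(separator):
        record: Dict[str, Any] = {}
        for key, raw in TAG.findall(chunk):
            try:
                record[key] = ast.literal_eval(raw)  # "5" -> 5, "'foo'" -> "foo"
            except (ValueError, SyntaxError):
                record[key] = raw  # keep unparseable values as plain strings
        if record:
            records.append(record)
    return records
```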
We have some tools for quickly checking the quality of a data feature intended for use as a basis. Whether missing or unevenly spaced data is acceptable or problematic is entirely context-dependent.
First, let's look at some clean data with evenly spaced values and no missing data:
isthmuslib.basis_quality_plots(uniform_complete_data)
On the other hand, here's what we see for uneven or missing data:
isthmuslib.basis_quality_plots(uneven_data)
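The plots are the main output, but the underlying checks are easy to reason about numerically. This stdlib sketch is hypothetical (not what `basis_quality_plots` computes internally). It summarizes the spacing between consecutive basis values and flags steps larger than `gap_factor` times the median spacing as suspected gaps; the 1.5 default is an arbitrary illustrative threshold.

```python
from statistics import median
from typing import Dict, Sequence, Union


def basis_quality_summary(
    basis: Sequence[float], gap_factor: float = 1.5
) -> Dict[str, Union[bool, float, int]]:
    """Summarize spacing between consecutive basis values."""
    if len(basis) < 2:
        raise ValueError("need at least two basis values")
    diffs = [b2 - b1 for b1, b2 in zip(basis, basis[1:])]
    typical = median(diffs)
    return {
        "strictly_increasing": all(d > 0 for d in diffs),
        "median_spacing": typical,
        "min_spacing": min(diffs),
        "max_spacing": max(diffs),
        # Steps much larger than the typical spacing suggest missing samples
        "suspected_gaps": sum(1 for d in diffs if d > gap_factor * typical),
    }
```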