
Mitchellpkt/python-isthmuslib


Overview


Convenience utils for plotting, styling, and manipulating high-dimensional vectors.

  • Analyses and plotting methods take one line to call and produce consistently formatted, publication-ready plots.
  • Enables rapid exploratory data analysis (EDA) and prototyping, ideal for a quick look at the data or a quick figure to stash in the lab book (with labels and titles included automatically). See the demos below.
  • Designed as a drop-in for other projects, whether used internally within code or to keep notebooks clean. Import isthmuslib to avoid writing many lines of plotting code when they would distract or detract from the main focus of your project.
  • The visual and text configuration objects (Style and Rosetta, respectively) can be attached directly to a given data set, so you can "set it and forget it" at instantiation. All subsequent outputs automatically have matching colors, sizes, labels, etc.
  • The VectorSequence object is designed for handling, plotting, and manipulating timeseries-like high-dimensional vectors. Its functionality includes dimensionality reduction via singular value decomposition, seasonal (e.g. weekly, monthly) timeseries decomposition, infosurface generation, and more.
  • Uses industry-standard libraries (pyplot, numpy, seaborn, pandas, etc.) under the hood and exposes their underlying functionality through the wrappers.

Free software under the MIT license.

Contact: isthmuslib@mitchellpkt.com

Installation

pip install isthmuslib

Documentation

To use the project:

import isthmuslib

Demo one-liners

A complete tutorial notebook is available here: https://github.com/Mitchellpkt/python-isthmuslib/blob/main/isthmuslib_tutorial.ipynb

Below are a few demos of one-line helper functions for plot generation and statistical analyses.

We can plot multiple distributions with the hist() wrapper around matplotlib.pyplot:

isthmuslib.hist([data_1, data_2],  bins=50, xlabel='Wavelength',
                title='H2 observations', legend_strings=["Q3", "Q4"])


Additional keyword arguments, such as density and cumulative, are passed through to the pyplot histogram function:

isthmuslib.hist([data_1, data_2], bins=200, density=True, cumulative=True,
                xlabel='Wavelength', title='H2 observations', legend_strings=["Q3", "Q4"])
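To illustrate the pass-through pattern, here is a minimal sketch; it is not isthmuslib's actual implementation, and `hist_wrapper` and `backend_hist` are hypothetical names. The wrapper consumes its own styling arguments and forwards every other keyword untouched. With matplotlib installed, `matplotlib.pyplot.hist` could serve as the backend.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


def hist_wrapper(datasets: Sequence[Sequence[float]],
                 backend_hist: Callable[..., Any],
                 xlabel: str = '',
                 title: str = '',
                 legend_strings: Optional[List[str]] = None,
                 **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical wrapper: keeps its own styling arguments, forwards the rest."""
    for data in datasets:
        # Keywords such as bins, density, and cumulative go straight to the backend
        backend_hist(data, **kwargs)
    # Styling arguments stay with the wrapper, e.g. to label the axes afterwards
    return {'xlabel': xlabel, 'title': title, 'legend': legend_strings or []}
```

Because unknown keywords are never inspected, any option the backend supports works without the wrapper having to list it.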


Likewise, we have a wrapper for matplotlib's scatter:

isthmuslib.scatter([data_1, data_2], [data_3, data_4], xlabel=r'angle $\phi$', ylabel='voltage',
                   title='Tuning results', legend_strings=['Control case', 'Calibrated'])


We can also turn a single pair of x and y vectors into a 2D histogram, essentially a surface whose height (shown as color) is the count of samples in each bin:

isthmuslib.hist2d(data_1, data_3, bins=(20, 20), xlabel=r'angle $\phi$', ylabel='voltage',
                  title='Control case', colorbar_label='sample counts')
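The bin counts behind a 2D histogram can be sketched in plain Python. This is only an illustration of the idea, not the code isthmuslib runs; it assumes equal-width bins spanning the data range, with values equal to the maximum placed in the last bin (the same convention numpy uses).

```python
from typing import List, Sequence, Tuple


def hist2d_counts(x: Sequence[float], y: Sequence[float],
                  bins: Tuple[int, int] = (20, 20)) -> List[List[int]]:
    """Count (x, y) pairs per grid cell; counts[i][j] is x-bin i, y-bin j."""
    nx, ny = bins
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    # Fall back to a width of 1.0 when all values are identical (zero range)
    x_width = (x_max - x_min) / nx or 1.0
    y_width = (y_max - y_min) / ny or 1.0
    counts = [[0] * ny for _ in range(nx)]
    for xv, yv in zip(x, y):
        # Clamp so values equal to the maximum land in the last bin
        i = min(int((xv - x_min) / x_width), nx - 1)
        j = min(int((yv - y_min) / y_width), ny - 1)
        counts[i][j] += 1
    return counts
```

For example, `hist2d_counts([0, 0, 0, 1], [0, 0, 1, 1], bins=(2, 2))` returns `[[2, 1], [0, 1]]`; plotted as a surface, each cell's color encodes one of these counts.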


We can also load a dataframe or CSV file into the VectorSequence class, which handles multivariate timeseries and similarly shaped data with a physically interpretable, strictly ordered axis. For example:

  • Multiple physical features (temperature, pressure, and irradiation) measured simultaneously at 3 different heights
  • Multiple stock values observed over time
  • Fluorescence intensity measured simultaneously at different wavelengths

(If the data does not have an inherent ordering, use the isthmuslib VectorMultiSet instead of the VectorSequence.)

import pathlib

timeseries = isthmuslib.VectorSequence().read_csv(pathlib.Path.cwd() / 'data' / 'example_vector_sequence_data.csv',
                                                  inplace=False, basis_col_name='timestamp', name_root='Experiment gamma')

The isthmuslib plotting features demoed above are also available as methods on the VectorMultiSet and VectorSequence objects:

timeseries.plot('baz')
timeseries.hist('bar', bins=50)


We can take a quick look at the correlations between columns (this wraps corr from pandas):

timeseries.correlation_matrix()
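pandas `DataFrame.corr` defaults to Pearson's correlation coefficient. As a dependency-free illustration of what such a matrix contains, the sketch below computes the same pairwise coefficients for named columns; which method and columns isthmuslib actually passes to pandas is not stated here, and the column names in the usage are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return math.nan  # undefined for a constant column
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Nested dict of pairwise Pearson correlations between named columns."""
    return {r: {c: pearson(columns[r], columns[c]) for c in columns} for r in columns}
```

With `{'foo': [1, 2, 3, 4], 'bar': [2, 4, 6, 8], 'baz': [4, 3, 2, 1]}`, the foo/bar entry is 1.0 and the foo/baz entry is -1.0.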


We can visualize a seasonal decomposition with a single line, which wraps statsmodels.tsa logic with styled plots:

timeseries.plot_decomposition('foo', 30, figsize=(10, 6), title='Foo trace: ', ylabel='Voltage')
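The README does not name the exact statsmodels routine; `statsmodels.tsa.seasonal.seasonal_decompose` is a likely candidate (an assumption), and by default it performs a classical moving-average decomposition. A stdlib sketch of the additive version follows: the trend is a centered moving average, the seasonal part is the mean detrended value at each phase of the cycle (centered to sum to zero), and the residual is what remains. It assumes the series spans at least two full periods.

```python
from typing import List, Optional, Sequence, Tuple


def classical_additive_decomposition(
        values: Sequence[float], period: int
) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Split a series into trend, seasonal, and residual parts (additive model)."""
    n = len(values)
    # Centered moving average; even periods use a 2 x period filter with half-weight ends
    if period % 2 == 0:
        weights = [0.5] + [1.0] * (period - 1) + [0.5]
    else:
        weights = [1.0] * period
    weights = [w / period for w in weights]
    half = len(weights) // 2
    trend: List[Optional[float]] = [None] * n  # undefined near both ends
    for t in range(half, n - half):
        trend[t] = sum(w * values[t - half + k] for k, w in enumerate(weights))
    # Average the detrended values at each phase of the cycle
    sums, counts = [0.0] * period, [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += values[t] - trend[t]
            counts[t % period] += 1
    phase_means = [s / c for s, c in zip(sums, counts)]
    # Center the seasonal pattern so it sums to zero over one cycle
    offset = sum(phase_means) / period
    seasonal = [phase_means[t % period] - offset for t in range(n)]
    residual: List[Optional[float]] = [
        None if trend[t] is None else values[t] - trend[t] - seasonal[t] for t in range(n)
    ]
    return trend, seasonal, residual
```

For a linear trend plus a zero-sum repeating pattern, this recovers the pattern exactly and leaves a residual of (numerically) zero.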


The VectorSequence timeseries class contains logic for sliding window analyses with arbitrary functions. Here we define a throwaway lambda, appreciation, which returns the percentage change in 'foo' between the first and last samples of a window, and apply it over sliding windows with 2, 4, and 8 week durations (converted to seconds to match the timestamp basis).

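from typing import List

# Percent change in 'foo' from the first to the last sample in each window
appreciation = lambda o: {'Change in value (%)': 100 * (o.values('foo')[-1] / o.values('foo')[0] - 1)}
window_widths_weeks: List[float] = [2, 4, 8]
seconds_per_week: int = 60 * 60 * 24 * 7  # convert weeks to seconds, matching the timestamp basis
result: isthmuslib.SlidingWindowResults = timeseries.sliding_window(appreciation,
                                                                    [x * seconds_per_week for x in window_widths_weeks],
                                                                    overlapping=True)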
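Conceptually, a sliding window analysis evaluates a function on every span [start, start + width) of the basis, once per width. The stand-in below is a sketch under assumptions, not isthmuslib's implementation: it works on plain lists (so `func` receives a list of values rather than a VectorSequence), and the `step` parameter is a hypothetical way of spacing windows, with overlap whenever `step < width`.

```python
from typing import Callable, Dict, List, Sequence

SECONDS_PER_WEEK = 60 * 60 * 24 * 7  # 604,800 seconds


def sliding_window(timestamps: Sequence[float], values: Sequence[float],
                   func: Callable[[List[float]], float],
                   widths: Sequence[float], step: float) -> Dict[float, List[float]]:
    """Apply func to the values in every window [start, start + width), for each width."""
    results: Dict[float, List[float]] = {}
    for width in widths:
        outputs: List[float] = []
        start = timestamps[0]
        # Windows overlap whenever step < width; step == width gives back-to-back windows
        while start + width <= timestamps[-1]:
            window = [v for t, v in zip(timestamps, values) if start <= t < start + width]
            if window:
                outputs.append(func(window))
            start += step
        results[width] = outputs
    return results
```

Keying the results by window width mirrors how the outputs are later plotted separately for each width; with a seconds-based basis, a 2 week width would be `2 * SECONDS_PER_WEEK`, i.e. 1,209,600.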

The SlidingWindowResults.plot_results() method automatically plots the results separated by window width:

result.plot_results('Change in value (%)', legend_override=[f"{x} weeks " for x in window_widths_weeks])


Likewise, the SlidingWindowResults.plot_pdfs() method plots the distributions separated by window width:

result.plot_pdfs('Change in value (%)', density=True, bins=50,
                 legend_override=[f"{x} weeks " for x in window_widths_weeks])


Dimensionality reduction (SVD) logic over sliding windows is built into the VectorSequence class, allowing easy calculation and visualization of information surfaces, such as the surfaces of the first 3 singular values. The timeseries basis (specified in basis_col_name) is automatically excluded from the SVD analysis. Pass the cols keyword argument to restrict the analysis to specific data features.

timeseries.plot_info_surface()
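To show what an information surface contains, here is a dependency-free sketch that computes only the largest singular value for every (window width, start index) pair, using power iteration on AᵀA as a stand-in for a full SVD routine such as `numpy.linalg.svd`. The row-based window indexing and the function names are assumptions, not isthmuslib's API; the rows are assumed to hold only feature columns, since the basis column is excluded from the analysis.

```python
import math
import random
from typing import Dict, Sequence, Tuple


def top_singular_value(rows: Sequence[Sequence[float]], iterations: int = 200, seed: int = 0) -> float:
    """Largest singular value of a matrix, via power iteration on A^T A."""
    n_rows, n_cols = len(rows), len(rows[0])
    rng = random.Random(seed)
    # A random start vector is almost surely not orthogonal to the top singular vector
    v = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(iterations):
        av = [sum(a * b for a, b in zip(row, v)) for row in rows]  # A v
        w = [sum(rows[i][j] * av[i] for i in range(n_rows)) for j in range(n_cols)]  # A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # e.g. an all-zero window
        v = [x / norm for x in w]
    av = [sum(a * b for a, b in zip(row, v)) for row in rows]
    return math.sqrt(sum(x * x for x in av))  # ||A v|| for unit v approximates sigma_1


def info_surface(rows: Sequence[Sequence[float]], widths: Sequence[int]) -> Dict[Tuple[int, int], float]:
    """Map (window width, start index) to the top singular value of that window."""
    surface: Dict[Tuple[int, int], float] = {}
    for width in widths:
        for start in range(len(rows) - width + 1):
            surface[(width, start)] = top_singular_value(rows[start:start + width])
    return surface
```

Plotting the dictionary values over the (width, start) grid gives the first singular value surface; higher surfaces would need the remaining singular values, which a full SVD provides directly.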


This library includes tooling for extracting logged data from mostly unstructured strings or files. For example, take the following string, stored in input_string:

"It was the best of times, [@@@] it was the worst [<<x=5>>]of times, it was the age of wisdom, [<<y='foo'>>] it was the age of foolishness, [@@@] it was the epoch of belief, it was the epoch of incredulity, [<<y='bar'>>] it was the season of Light, it was the season of Darkness"

The one-liner:

isthmuslib.auto_extract_from_text(input_string)

extracts the tagged values into a dataframe.
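The tag grammar is not spelled out in this README. One plausible reading is that [@@@] separates records and each [<<key=value>>] tag sets a field of the current record. Under that assumption, a regex-based sketch looks like the following; `extract_tags` and `TAG_PATTERN` are hypothetical names, not isthmuslib's API, and the result is a list of dicts rather than a dataframe to stay dependency-free.

```python
import ast
import re
from typing import Any, Dict, List

# Hypothetical tag format: [<<key=value>>], where value is a Python literal
TAG_PATTERN = re.compile(r"\[<<(\w+)=(.*?)>>\]")


def extract_tags(text: str, record_delimiter: str = "[@@@]") -> List[Dict[str, Any]]:
    """Split text into records on the delimiter and parse each record's key=value tags."""
    records: List[Dict[str, Any]] = []
    for segment in text.split(record_delimiter):
        # literal_eval turns "5" into 5 and "'foo'" into 'foo'
        records.append({key: ast.literal_eval(raw) for key, raw in TAG_PATTERN.findall(segment)})
    return records
```

Applied to the example string, this yields one record per segment: an empty one, one with x=5 and y='foo', and one with y='bar'. A list of such dicts can be handed to `pandas.DataFrame` to obtain a table.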

We have some tools for quickly checking the quality of a data feature intended for use as a basis. Whether missing or unevenly spaced data is acceptable or problematic depends entirely on context.
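One simple way to quantify these concerns numerically is to look at the steps between consecutive basis values. The sketch below is not necessarily what basis_quality_plots computes; `basis_spacing_report` is a hypothetical helper, and the `gap_factor` threshold for flagging a suspiciously large step is an arbitrary assumption.

```python
import statistics
from typing import Any, Dict, Sequence


def basis_spacing_report(basis: Sequence[float], gap_factor: float = 1.5) -> Dict[str, Any]:
    """Numeric summary of how evenly spaced and complete a basis column is."""
    steps = [b - a for a, b in zip(basis, basis[1:])]
    median_step = statistics.median(steps)
    return {
        'strictly_increasing': all(s > 0 for s in steps),
        'min_step': min(steps),
        'median_step': median_step,
        'max_step': max(steps),
        # Steps much larger than the typical one suggest missing samples
        'suspected_gaps': sum(1 for s in steps if s > gap_factor * median_step),
    }
```

For `[0, 1, 2, 5, 6, 7]` the median step is 1 and the jump from 2 to 5 is flagged as one suspected gap, while `[0, 1, 2, 3, 4]` reports none.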

First, let's look at some clean data with evenly spaced values and no missing data:

isthmuslib.basis_quality_plots(uniform_complete_data)


On the other hand, here's what we see for uneven or missing data:

isthmuslib.basis_quality_plots(uneven_data)

