Overview

Convenience utils for plotting, styling, and manipulating high-dimensional vectors.

Analyses and plotting methods are one line to call, and produce consistently-formatted publication-ready plots.
Enables rapid exploratory data analysis (EDA) and prototyping, perfect for taking a quick peek at data or making a quick figure to stash in the lab book (with labels and titles automatically included). See examples here.
Designed for easy drop-in use in other projects, whether called internally from your code or used to keep notebooks clean. Import isthmuslib to avoid writing many lines of plotting code that would distract or detract from the main focus of your project.
The visual and text configuration objects (Style and Rosetta, respectively) can be directly attached to a given data set, so you can "set it and forget it" at instantiation. All subsequent outputs will automatically have matching colors, sizes, labels, etc.
The VectorSequence object is designed for handling, plotting, and manipulating timeseries-like high-dimensional vectors. Its functionality includes: dimensionality reduction via singular value decomposition, seasonal (e.g. weekly, monthly, ...) timeseries decomposition, infosurface generation, and more.
Uses industry standard libraries (pyplot, numpy, seaborn, pandas, etc.) under the hood, and exposes their underlying functionality through the wrappers.

Free software under the MIT license.

Contact: isthmuslib@mitchellpkt.com

Installation

pip install isthmuslib

Documentation

To use the project:

import isthmuslib

Demo one-liners

A complete tutorial notebook is available here: https://github.com/Mitchellpkt/python-isthmuslib/blob/main/isthmuslib_tutorial.ipynb

Below are a few demos of one-line helper functions for plot generation and statistical analyses.

We can plot multiple distributions with the hist() wrapper around matplotlib.pyplot:

isthmuslib.hist([data_1, data_2],  bins=50, xlabel='Wavelength',
                title='H2 observations', legend_strings=["Q3", "Q4"])

Additional keyword arguments are passed through to the pyplot histogram function, for example density and cumulative.

isthmuslib.hist([data_1, data_2], bins=200, density=True, cumulative=True,
                xlabel='Wavelength', title='H2 observations', legend_strings=["Q3", "Q4"])
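
To make the effect of density and cumulative concrete, here is a minimal pure-Python sketch of the arithmetic behind those two options. It is not isthmuslib or matplotlib code: the histogram function below is a hypothetical stand-in that assumes equal-width bins spanning the data range and data that is not constant.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def histogram(data: Sequence[float], bins: int, density: bool = False,
              cumulative: bool = False) -> Tuple[List[float], List[float]]:
    """Return (heights, bin_edges) for equal-width bins spanning the data range.

    Hypothetical helper for illustration; assumes max(data) > min(data).
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for value in data:
        # The last bin is closed on the right, so the maximum lands in it
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    heights = [float(c) for c in counts]
    if density:
        # Normalize so that the bar areas (height * width) sum to 1
        heights = [c / (len(data) * width) for c in counts]
    if cumulative:
        # Running total; with density=True each step adds that bin's area,
        # so the final value is 1
        steps = [h * width for h in heights] if density else heights
        heights = list(accumulate(steps))
    return heights, edges
```

With both options enabled, the last bar reaches 1, which is what makes the plot read as an empirical cumulative distribution.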

Likewise, we have a wrapper for matplotlib's scatter:

isthmuslib.scatter([data_1, data_2], [data_3, data_4], xlabel=r'angle $\phi$', ylabel='voltage',
                   title='Tuning results', legend_strings=['Control case', 'Calibrated'])

We can also cast a single x & y vector pair into a 2D histogram (essentially a surface with height [color] showing bin counts).

isthmuslib.hist2d(data_1, data_3, bins=(20, 20), xlabel=r'angle $\phi$', ylabel='voltage',
                  title='Control case', colorbar_label='sample counts')
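
For intuition about what the colors encode, the bin counts behind a 2D histogram can be sketched in a few lines of plain Python. This is an illustration only, not the library's code: bin_index and hist2d_counts are hypothetical helpers that assume equal-width bins spanning each axis and non-constant data on both axes.

```python
from typing import List, Sequence, Tuple


def bin_index(value: float, lo: float, hi: float, bins: int) -> int:
    """Map a value in [lo, hi] to an equal-width bin index (last bin closed)."""
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def hist2d_counts(x: Sequence[float], y: Sequence[float],
                  bins: Tuple[int, int]) -> List[List[int]]:
    """Count (x, y) pairs per cell; counts[i][j] is x-bin i, y-bin j."""
    x_lo, x_hi, y_lo, y_hi = min(x), max(x), min(y), max(y)
    counts = [[0] * bins[1] for _ in range(bins[0])]
    for x_value, y_value in zip(x, y):
        i = bin_index(x_value, x_lo, x_hi, bins[0])
        j = bin_index(y_value, y_lo, y_hi, bins[1])
        counts[i][j] += 1
    return counts
```

Each cell of the returned grid corresponds to one colored tile in the plot, and the cell values sum to the number of samples.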

We can also load a dataframe or CSV file into the VectorSequence class for working with multivariate timeseries and similarly shaped data with some physically-interpretable strictly ordered axis, for example:

Multiple physical features (temperature, pressure, and irradiation) measured simultaneously at 3 different heights
Multiple stock values observed over time
Fluorescence intensity measured simultaneously at different wavelengths

(If the data does not have an inherent ordering, use the isthmuslib VectorMultiSet instead of the VectorSequence).

import pathlib
timeseries = isthmuslib.VectorSequence().read_csv(
    pathlib.Path.cwd() / 'data' / 'example_vector_sequence_data.csv',
    inplace=False, basis_col_name='timestamp', name_root='Experiment gamma')

The isthmuslib plotting features demoed above are directly attached to the vector multiset & sequence objects.

timeseries.plot('baz')
timeseries.hist('bar', bins=50)

We can take a peek at correlation between the columns (wraps corr from pandas).

timeseries.correlation_matrix()
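
For reference, the default correlation in pandas is the Pearson coefficient. The sketch below computes the same quantity in plain Python for a small set of named columns. The pearson and correlation_matrix helpers are hypothetical names used only for this illustration, and they assume equal-length columns with non-zero variance.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    covariance = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    variance_a = sum((p - mean_a) ** 2 for p in a)
    variance_b = sum((q - mean_b) ** 2 for q in b)
    return covariance / sqrt(variance_a * variance_b)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlation between every pair of named columns."""
    return {row: {col: pearson(columns[row], columns[col]) for col in columns}
            for row in columns}
```

Values near +1 or -1 indicate columns that move together or in opposition, while values near 0 indicate little linear relationship.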

We can visualize seasonal decomposition analyses with a single line, wrapping statsmodels.tsa logic with styled plots.

timeseries.plot_decomposition('foo', 30, figsize=(10, 6), title='Foo trace: ', ylabel='Voltage')
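
As background, a seasonal decomposition splits a series into trend, seasonal, and residual components. The sketch below is a naive additive version in the spirit of classical moving-average decomposition. It is neither the statsmodels algorithm nor isthmuslib's wrapper: decompose_additive is a hypothetical helper that only handles odd periods and needs at least 2 * period - 1 samples.

```python
from typing import List, Optional, Tuple


def decompose_additive(values: List[float], period: int
                       ) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Split values into (trend, seasonal, residual); odd periods only."""
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    half, n = period // 2, len(values)
    # Trend: centered moving average, undefined (None) near the edges
    trend: List[Optional[float]] = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(values[i - half:i + half + 1]) / period
    # Seasonal: mean detrended value at each phase, shifted to average zero
    phase_means = []
    for phase in range(period):
        detrended = [values[i] - trend[i] for i in range(phase, n, period)
                     if trend[i] is not None]
        phase_means.append(sum(detrended) / len(detrended))
    offset = sum(phase_means) / period
    seasonal = [phase_means[i % period] - offset for i in range(n)]
    # Residual: whatever is left after removing trend and seasonal parts
    residual = [None if trend[i] is None else values[i] - trend[i] - seasonal[i]
                for i in range(n)]
    return trend, seasonal, residual
```

For a series built as a straight line plus a repeating pattern, the trend recovers the line, the seasonal component recovers the pattern, and the residual is zero wherever the trend is defined.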

The VectorSequence timeseries class contains logic for sliding window analyses with arbitrary functions. Here we'll use a throwaway lambda appreciation to demonstrate, and apply that function over sliding windows with 2, 4, and 8 week durations.

from typing import List

appreciation = lambda o: {'Change in value (%)': 100 * (o.values('foo')[-1] / o.values('foo')[0] - 1)}
window_widths_weeks: List[float] = [2, 4, 8]
result: isthmuslib.SlidingWindowResults = timeseries.sliding_window(
    appreciation, [x * 60 * 60 * 24 * 7 for x in window_widths_weeks],  # weeks converted to seconds
    overlapping=True)
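
To make the mechanics concrete, here is a minimal pure-Python sketch of a sliding-window evaluation. It is not isthmuslib's implementation: the sliding_window function below, its signature, and its reading of overlapping (advance one sample at a time versus jump by a full window) are assumptions made for illustration.

```python
from typing import Any, Callable, List, Sequence, Tuple

SECONDS_PER_WEEK = 60 * 60 * 24 * 7


def sliding_window(timestamps: Sequence[float], values: Sequence[float],
                   func: Callable[[Sequence[float]], Any], width: float,
                   overlapping: bool = True) -> List[Tuple[float, Any]]:
    """Apply func to the values inside each window [start, start + width).

    Assumes timestamps are sorted in increasing order.
    """
    results = []
    end_of_data = timestamps[-1]
    index = 0
    # Only evaluate windows that fit entirely inside the data
    while index < len(timestamps) and timestamps[index] + width <= end_of_data:
        start = timestamps[index]
        window = [v for t, v in zip(timestamps, values) if start <= t < start + width]
        results.append((start, func(window)))
        if overlapping:
            index += 1  # slide forward by one sample
        else:
            # Jump to the first sample at or beyond the end of this window
            while index < len(timestamps) and timestamps[index] < start + width:
                index += 1
    return results
```

Calling this once per window width, for example with width=2 * SECONDS_PER_WEEK, yields one result series per width, which is the shape of output that the plotting methods below separate by window width.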

The SlidingWindowResults.plot_results() method automatically plots results separated by window width.

result.plot_results('Change in value (%)', legend_override=[f"{x} weeks " for x in window_widths_weeks])

Likewise, the SlidingWindowResults.plot_pdfs() method plots distributions separated by window width.

result.plot_pdfs('Change in value (%)', density=True, bins=50,
                 legend_override=[f"{x} weeks " for x in window_widths_weeks])

Dimensionality reduction (SVD) logic over sliding windows is built into the VectorSequence class, allowing easy calculation and visualization of information surfaces (first 3 singular value surfaces shown below). The timeseries basis (specified in basis_col_name) is automatically excluded from the SVD analysis. The cols keyword argument can be specified when only certain data features should be taken into account.

timeseries.plot_info_surface()
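
As a rough illustration of the underlying idea, the sketch below estimates the leading singular value of each window of consecutive rows using power iteration in plain Python. How isthmuslib actually assembles its information surfaces is not described here, so leading_singular_value and singular_value_trace are hypothetical helpers, and tracking only the first singular value is a simplification.

```python
from math import sqrt
from typing import List, Sequence


def leading_singular_value(matrix: Sequence[Sequence[float]], iterations: int = 200) -> float:
    """Estimate the largest singular value by power iteration on A^T A."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / sqrt(n_cols)] * n_cols  # unit-length starting vector
    for _ in range(iterations):
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # u = A v
        w = [sum(matrix[r][c] * u[r] for r in range(n_rows))
             for c in range(n_cols)]  # w = A^T u
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            # A v vanished: a zero matrix, or an unlucky starting vector
            return 0.0
        v = [x / norm for x in w]
    # For a unit vector v aligned with the top right-singular vector, |A v| = sigma_1
    return sqrt(sum(sum(a * b for a, b in zip(row, v)) ** 2 for row in matrix))


def singular_value_trace(rows: Sequence[Sequence[float]], window_length: int) -> List[float]:
    """Leading singular value of each run of window_length consecutive rows."""
    return [leading_singular_value(rows[i:i + window_length])
            for i in range(len(rows) - window_length + 1)]
```

Repeating the trace for several window lengths gives a grid of values over window start and window width, which is one way to picture a singular value surface. In practice a full SVD routine such as numpy.linalg.svd would replace the power iteration.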

This library includes tooling for extracting logs from mostly unstructured strings or files. For example, take the string:

"It was the best of times, [@@@] it was the worst [<<x=5>>]of times, it was the age of wisdom, [<<y='foo'>>] it was the age of foolishness, [@@@] it was the epoch of belief, it was the epoch of incredulity, [<<y='bar'>>] it was the season of Light, it was the season of Darkness"

The one-liner:

isthmuslib.auto_extract_from_text(input_string)

extracts the dataframe:
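
As a rough illustration of the idea (not the library's implementation), the tokens in the example string can be pulled out with a regular expression. The sketch assumes that [@@@] separates records and that [<<key=value>>] marks a field. Both the extract_records helper and that interpretation of the tokens are assumptions made for this example.

```python
import ast
import re
from typing import Any, Dict, List

RECORD_DELIMITER = "[@@@]"  # assumed to separate one record from the next
FIELD_PATTERN = re.compile(r"\[<<(\w+)=(.*?)>>\]")  # assumed [<<key=value>>] form


def extract_records(text: str) -> List[Dict[str, Any]]:
    """Return one dict of parsed fields per record that contains any fields."""
    records = []
    for chunk in text.split(RECORD_DELIMITER):
        fields: Dict[str, Any] = {}
        for key, raw_value in FIELD_PATTERN.findall(chunk):
            try:
                # Interpret Python literals such as 5 or 'foo'
                fields[key] = ast.literal_eval(raw_value)
            except (ValueError, SyntaxError):
                fields[key] = raw_value  # fall back to the raw text
        if fields:
            records.append(fields)
    return records
```

Under these assumptions each record becomes one row and each key one column, so the resulting list of dicts could be handed to pandas.DataFrame to get a table.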

We have some tools for quickly checking the quality of a data feature intended for use as a basis. Whether missing or unevenly spaced data is OK or problematic depends entirely on context.

First, let's look at some clean data with evenly spaced values and no missing data:

isthmuslib.basis_quality_plots(uniform_complete_data)

On the other hand, here's what we see for uneven or missing data:

isthmuslib.basis_quality_plots(uneven_data)
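
The kind of checks involved can be sketched without any plotting. The basis_quality helper below is a hypothetical illustration, not the library's code. It assumes a basis with at least two values and works best with integer-like values such as Unix timestamps, since it compares step sizes for exact equality.

```python
from collections import Counter
from typing import Dict, Sequence


def basis_quality(basis: Sequence[float]) -> Dict[str, object]:
    """Summarize the ordering and spacing of a candidate basis."""
    steps = [b - a for a, b in zip(basis, basis[1:])]
    step_counts = Counter(steps)
    typical_step, _ = step_counts.most_common(1)[0]
    return {
        "strictly_increasing": all(step > 0 for step in steps),
        "evenly_spaced": len(step_counts) == 1,
        "typical_step": typical_step,
        # Steps larger than the typical one hint at missing samples
        "gaps": [(a, b) for a, b in zip(basis, basis[1:]) if b - a > typical_step],
    }
```

A clean basis reports evenly_spaced as True with no gaps, while an uneven one lists the intervals where samples appear to be missing.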
