Releases: pasteur-dev/pasteur

Improved Template Project

12 Sep 11:40

This release updates the template project to use the requirements format preferred by pip-compile and replaces setup.py with pyproject.toml. The new requirements.in file no longer suggests installing Pasteur's docs dependencies, due to compatibility issues with Sphinx 7.

The transformers in extras are also updated to remove the categorical dtype check deprecation warning.

Pipeline Tweaks, New Commands, and Packaging Fixes

12 Sep 10:37

This Pasteur release tweaks pipeline generation to better segment ingestion and synthesis.

It introduces the new commands ingest_dataset (or id) and ingest_view (or iv), which perform only the dataset and view ingestion steps. This makes it easier to iterate on new datasets and views, since only their ingest code is re-run.

By default, pipe now skips the view ingestion steps, which can be cumbersome for out-of-core datasets, and starts from filtering onward (pipe --all still runs the whole pipeline).

A new view option, fit_global, is introduced. It fits the transformers and encoders on the whole view (at the cost of increased overhead), which fixes issues with rare categorical values not being recognized because they are missing from the work set.
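
To illustrate the problem that fit_global addresses, here is a minimal standalone sketch. It does not use Pasteur's API: the helpers fit_vocabulary and encode, the column contents, and the size of the work set are all hypothetical, chosen only to show how a vocabulary fitted on a subset can miss a rare value.

```python
def fit_vocabulary(values):
    """Assign an integer code to every distinct value seen during fitting."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def encode(values, vocabulary, unknown=-1):
    """Encode values with a fitted vocabulary; unseen values map to `unknown`."""
    return [vocabulary.get(value, unknown) for value in values]


# Hypothetical view column: two common values and one rare value at the end.
view_column = ["common_a", "common_b"] * 500 + ["rare"]

# Hypothetical work set: a small slice of the view that happens to miss "rare".
work_set = view_column[:100]

subset_vocab = fit_vocabulary(work_set)      # fitted on the work set only
global_vocab = fit_vocabulary(view_column)   # fitted on the whole view

sample = ["common_a", "rare"]
subset_codes = encode(sample, subset_vocab)  # "rare" falls back to -1
global_codes = encode(sample, global_vocab)  # "rare" gets its own code

print(subset_codes, global_codes)
```

Fitting on the whole view requires a pass over all of its data, which is the increased overhead mentioned above.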

Two bugs were also fixed: TabularDataset required pandas without importing it, and the mlflow default style was not included in the PyPI package.

Out-of-core overhaul and new event data support

21 Aug 13:06

This release overhauls and standardizes Pasteur's API to prepare it for multi-modal data synthesis. It also smooths out some rough edges by making the fitting of Encodings, Transformations, and Metrics out-of-core through a map-reduce architecture.
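
As a rough illustration of the map-reduce idea (not Pasteur's actual interfaces), the sketch below fits simple column statistics one partition at a time. The Stats class and the functions map_partition, reduce_stats, and fit_out_of_core are hypothetical names, and each partition is assumed to be non-empty.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Partial fit state that is small enough to keep in memory."""
    count: int
    total: float
    minimum: float
    maximum: float


def map_partition(values):
    """Map step: summarize a single (non-empty) partition."""
    return Stats(len(values), sum(values), min(values), max(values))


def reduce_stats(a, b):
    """Reduce step: merge two partial summaries into one."""
    return Stats(
        a.count + b.count,
        a.total + b.total,
        min(a.minimum, b.minimum),
        max(a.maximum, b.maximum),
    )


def fit_out_of_core(partitions):
    """Consume partitions lazily so only one is held in memory at a time."""
    return reduce(reduce_stats, map(map_partition, partitions))


# Hypothetical partitions; in practice each would be loaded from disk on demand.
partitions = iter([[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]])
fitted = fit_out_of_core(partitions)
mean = fitted.total / fitted.count

print(fitted, mean)
```

Because the reduce step is associative, the partial summaries could also be computed in parallel and merged in any grouping.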

For transforming event data, a new type of Transformer, the Seq(uence) Transformer, is added. This transformer is multi-table aware and can, for example, encode inter-row references (such as the date in row #3 for patient X depending on the date in row #2). A built-in implementation, named SeqTransformerWrapper (accessed through the name seq), contains the joining logic needed to wrap existing reference transformers so that they support this format.
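
The inter-row dependency described above can be pictured with a small standalone sketch. It is not the SeqTransformerWrapper implementation: encode_relative and decode_relative are hypothetical helpers, and the events are assumed to be sorted chronologically within each patient. Each date is stored as the number of days since the same patient's previous row.

```python
from datetime import date, timedelta


def encode_relative(events):
    """Replace each date with the day gap to the same patient's previous row.

    The first row of each patient has no predecessor and is encoded as None.
    """
    last_seen = {}
    encoded = []
    for patient, day in events:
        previous = last_seen.get(patient)
        delta = None if previous is None else (day - previous).days
        encoded.append((patient, delta))
        last_seen[patient] = day
    return encoded


def decode_relative(encoded, anchors):
    """Rebuild absolute dates from day gaps and each patient's first date."""
    current = {}
    decoded = []
    for patient, delta in encoded:
        if delta is None:
            day = anchors[patient]
        else:
            day = current[patient] + timedelta(days=delta)
        current[patient] = day
        decoded.append((patient, day))
    return decoded


# Hypothetical event rows: (patient id, event date).
events = [
    ("X", date(2023, 1, 1)),
    ("X", date(2023, 1, 5)),
    ("X", date(2023, 1, 12)),
    ("Y", date(2023, 2, 1)),
    ("Y", date(2023, 2, 3)),
]
anchors = {"X": date(2023, 1, 1), "Y": date(2023, 2, 1)}

encoded = encode_relative(events)
restored = decode_relative(encoded, anchors)

print(encoded)
```

In this sketch the anchor dates are passed in separately; in a multi-table setting they would typically come from a parent table, which is where joining logic becomes necessary.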

The new mimic_core view in extras, which contains the three core tables of MIMIC (patients, admissions, and transfers), is provided as a proof of concept for this new transformation format.

0.1.1

17 May 13:00

This version correctly packages .yml and template files in the PyPI package.

Initial Release

03 Mar 15:02

Initial release of Pasteur. Pasteur can now be installed with pip and offers a working template for data synthesis.