Releases: epigen/unsupervised_analysis

v2.0.0 - Performance improvements

30 Jun 13:54

Enhancements and new features

  • PCA: n_components and svd_solver are now configurable to improve performance.
  • Heatmap: performance improvements
    • distance matrix calculation now uses pdist from scipy and is parallelized across observations and features
    • hierarchical clustering using fastcluster
    • observations can be downsampled using configuration n_observations
    • top features can be selected by variability using configuration n_features
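As a rough illustration of the two heatmap subsetting options, the sketch below keeps the most variable features and randomly downsamples observations before any distance matrix would be computed. It is not the workflow's implementation: the function name select_heatmap_subset is hypothetical, only n_observations and n_features are taken from the notes above, and using variance as the measure of variability (computed on all observations before downsampling) is an assumption.

```python
import random
from statistics import pvariance


def select_heatmap_subset(data, n_observations, n_features, seed=0):
    """Return (subset, row_idx, col_idx) for a list-of-rows matrix.

    Hypothetical helper: variance as the variability measure and the
    order of the two steps are assumptions, not the workflow's code.
    """
    n_rows, n_cols = len(data), len(data[0])

    # Rank features (columns) by variance across all observations.
    variances = [pvariance([row[j] for row in data]) for j in range(n_cols)]
    top = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    col_idx = sorted(top[:n_features])

    # Downsample observations (rows) without replacement if needed.
    rng = random.Random(seed)
    if n_observations < n_rows:
        row_idx = sorted(rng.sample(range(n_rows), n_observations))
    else:
        row_idx = list(range(n_rows))

    subset = [[data[i][j] for j in col_idx] for i in row_idx]
    return subset, row_idx, col_idx


data = [
    [1.0, 5.0, 0.0],
    [1.1, 9.0, 4.0],
    [0.9, 1.0, 8.0],
    [1.0, 5.0, 4.0],
]
subset, rows, cols = select_heatmap_subset(data, n_observations=2, n_features=2)
print(cols)  # the nearly constant first feature is dropped
```

Shrinking the matrix first matters because pairwise distance computation grows quadratically with the number of rows being clustered.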

The documentation was updated accordingly.

Bug fixes and other performance improvements are not mentioned.

Full Changelog: v1.1.0...v2.0.0

v1.1.0 - small enhancements and bug fixes

25 Jun 09:48

Enhancements and new features

  • Additional PCA diagnostics: Visualization of the top 10 loadings per principal component using lollipop plots.
  • Internal cluster index calculation optional (very compute intensive).
  • Enable plotting of all features using the keyword "ALL".
  • Enhance Snakemake report using labels.
  • Switch from panels to solo plots.
  • Switch to data.table usage for accelerated read/write in R.

The documentation was updated accordingly.

Bug fixes and performance improvements are not mentioned.

Full Changelog: v1.0.1...v1.1.0

v1.0.1 - update author ORCID

08 Oct 12:30

v1.0.0 - unsupervised analysis now includes cluster analysis methods

04 Oct 08:10

enhancements

  • added a config flag for 2D plot coord_fixed() option

new features

  • Clustering
    • Leiden algorithm
    • Clustification: an ML-based clustering approach that iteratively merges clusters based on misclassification
  • Clustree analysis and visualization
  • Cluster Validation
    • External cluster indices are determined by comparing all clustering results with all categorical metadata
    • Internal cluster indices are determined for each clustering and [metadata_of_interest]
    • Multiple-criteria decision-making (MCDM) using TOPSIS for ranking clustering results by internal indices
  • Visualization
    • all clustering results as 2D and interactive 2D & 3D plots for all available embeddings/projections.
    • external cluster indices as hierarchically clustered heatmaps, aggregated in one panel.
    • internal cluster indices as one heatmap, with clusterings and selected metadata sorted by TOPSIS ranking from top to bottom and cluster indices split by type (cost/benefit functions to be minimized/maximized).
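To make the TOPSIS ranking of clustering results more concrete, here is a minimal, dependency-free sketch of the textbook procedure: normalize each criterion, weight it, and score every alternative by its relative closeness to the ideal solution. It is not the workflow's implementation. The function name topsis, the equal weights, the example clusterings, and the two example indices (one benefit-type, one cost-type) are all illustrative assumptions.

```python
import math


def topsis(matrix, weights, is_benefit):
    """Return one closeness score per row (higher is better).

    matrix: rows are alternatives (e.g. clusterings), columns are criteria.
    is_benefit: True if a criterion is maximized, False if minimized.
    """
    n_crit = len(weights)

    # 1. Vector-normalize each criterion column, then apply the weights.
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
        for j in range(n_crit)
    ]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)]
        for row in matrix
    ]

    # 2. Ideal and anti-ideal points depend on the criterion type.
    cols = list(zip(*weighted))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, is_benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(cols, is_benefit)]

    # 3. Relative closeness to the ideal point.
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores


# Hypothetical example: column 0 is maximized, column 1 is minimized.
names = ["A", "B", "C"]
indices = [
    [0.6, 0.5],
    [0.4, 0.9],
    [0.2, 1.5],
]
scores = topsis(indices, weights=[0.5, 0.5], is_benefit=[True, False])
ranking = [n for _, n in sorted(zip(scores, names), reverse=True)]
print(ranking)  # ['A', 'B', 'C']
```

Because clustering A is best on both criteria it coincides with the ideal point and gets a score of 1, while C coincides with the anti-ideal point and gets 0.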

documentation

  • add scRNA-seq analysis section to the documentation
  • update the documentation accordingly (Software, Methods, Features, Examples)
  • update report to include all new feature outputs
  • update rulegraph

Bug fixes and performance improvements are not mentioned.

Full Changelog: v0.2.0...v1.0.0

v0.2.0 - enhancements, new features and a full example added

12 Oct 13:45

enhancements

  • 2D metadata plots: up to 10 columns per row, coordinates are fixed on both axes, numeric color scheme blue to red with midpoint 0 in grey

new features

  • 2D feature plots: specify features of interest whose values in the data will be highlighted in the 2D plots (motivated by the bioinformatics practice of highlighting expression levels of marker genes)
  • densMAP support: local density preserving regularization as an additional dimensionality reduction method
  • additional PCA diagnostics:
    • pairs: sequential pair-wise PCs for up to 10 PCs using scatter- and density-plots colored by metadata_of_interest
    • loadings: showing the magnitude and direction of the 10 most influential features for each PC combination
  • interactive 2D and 3D visualizations (self-contained HTML files) of all projections and embeddings, including widgets to color by categorical and numerical metadata
  • hierarchically clustered heatmaps of scaled data (z-score) with configured distance metrics and clustering methods (all combinations are computed), and annotated with metadata_of_interest
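The z-score scaling mentioned for the heatmaps can be sketched as follows. This is an illustration only, not the workflow's code: the function name zscore_columns is hypothetical, and scaling per feature (column) with the population standard deviation, and mapping constant features to 0, are assumptions.

```python
from statistics import mean, pstdev


def zscore_columns(data):
    """Scale each column of a list-of-rows matrix to mean 0 and SD 1.

    Assumption: population SD is used; constant columns become 0.0.
    """
    cols = list(zip(*data))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sds)]
        for row in data
    ]


data = [
    [1.0, 10.0, 5.0],
    [2.0, 20.0, 5.0],
    [3.0, 30.0, 5.0],
]
scaled = zscore_columns(data)
```

After scaling, the first two features have identical values even though their original ranges differ tenfold, which is why distance-based clustering is usually run on scaled data.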

documentation

  • add a minimal example, using the digits dataset from sklearn, to show configuration, results, and report (.test/ folder)
  • update the documentation accordingly (Software, Methods, Features, Examples)
  • update report to include all new feature outputs (apart from interactive plots)
  • update rulegraph

Bug fixes and performance improvements are not mentioned.

Full Changelog: v0.1.0...v0.2.0

v0.1.0 - first stable version with PCA, UMAP and 2D visualizations

22 Sep 13:25