New Sparse Tensor Decomposition Workflow

A Python package for sparse CP tensor decomposition of multi-species time series molecular data using L1 regularization and Optuna hyperparameter optimization.

Note: Output directories are automatically resolved relative to the project root to prevent duplicate output directories when commands are run from different locations.

Installation

Using uv (recommended)

# Navigate to the package directory
cd M-multi-species/scripts/20-new-sparse-tensor-decompositon-model

# Install package in editable mode
uv pip install -e .

Using pip (alternative)

# Navigate to the package directory
cd M-multi-species/scripts/20-new-sparse-tensor-decompositon-model

# Install dependencies
pip install numpy pandas scipy tensorly optuna typer matplotlib pyyaml

# The package can be run directly without installation using PYTHONPATH

Quick Start

Using uv (if installed)

# Navigate to the package directory
cd src

# 1. Build Tensor
uv run new-tensor build-tensor \
    --input-path ../data/vst_counts_matrix.csv \
    --aggregation-method median \
    --normalization zscore

# 2. Optimize Hyperparameters
uv run new-tensor optimize \
    --tensor-path output/tensor/tensor.npz \
    --n-trials 100 \
    --rank-min 4 \
    --rank-max 30

_see output/optimization/best_params.json for optimal rank._


# 3. Fit Model
uv run new-tensor fit \
    --tensor-path ../output/tensor/tensor.npz \
    --rank 9 \
    --non-negative

# 4. Export Results
uv run new-tensor export \
    --fit-dir output/fit \
    --mappings-dir output/tensor

Using direct Python execution (works without uv)

# Navigate to the package directory
cd M-multi-species/scripts/20-new-sparse-tensor-decompositon-model

# 1. Build Tensor
PYTHONPATH=src python -m new_tensor.cli build-tensor \
    --input-path data/vst_counts_matrix.csv \
    --aggregation-method median \
    --normalization zscore

# 2. Optimize Hyperparameters
PYTHONPATH=src python -m new_tensor.cli optimize \
    --tensor-path output/tensor/tensor.npz \
    --n-trials 50 \
    --rank-min 2 \
    --rank-max 8

# 3. Fit Model
PYTHONPATH=src python -m new_tensor.cli fit \
    --tensor-path output/tensor/tensor.npz \
    --rank 5 \
    --non-negative

# 4. Export Results
PYTHONPATH=src python -m new_tensor.cli export \
    --fit-dir output/fit \
    --mappings-dir output/tensor

Working Directory Requirements

All commands should be run from the src directory to ensure consistent output paths:

cd src
# Then run all commands as shown above

The package automatically resolves relative output paths (like output/tensor, output/optimization, etc.) relative to the project root, preventing duplicate output directories when commands are run from different locations.

Command Reference

`build-tensor`

Build 3D tensor from CSV data with replicate aggregation and normalization.

Options:

--input-path: Path to input CSV file (required)
--output-dir: Output directory (default: output/tensor)
--aggregation-method: Replicate aggregation method (default: median)
--normalization: Normalization method (default: zscore)
--min-expression: Minimum expression threshold (default: 1.0)
--min-variance-percentile: Variance percentile threshold (default: 10.0)
--config: Path to config YAML file

`optimize`

Run hyperparameter optimization using Optuna.

Options:

--tensor-path: Path to input tensor .npz file (required)
--output-dir: Output directory (default: output/optimization)
--n-trials: Number of optimization trials (default: 100)
--rank-min: Minimum rank to try (default: 2)
--rank-max: Maximum rank to try (default: 12)
--lambda-min: Minimum L1 penalty (default: 1e-4)
--lambda-max: Maximum L1 penalty (default: 1.0)
--config: Path to config YAML file

`fit`

Fit CP decomposition with specified parameters.

Options:

--tensor-path: Path to input tensor .npz file (required)
--output-dir: Output directory (default: output/fit)
--rank: Rank for decomposition (required)
--lambda-a: L1 penalty for gene factors (default: 0.1)
--lambda-b: L1 penalty for species factors (default: 0.1)
--lambda-c: L1 penalty for time factors (default: 0.1)
--non-negative: Enforce non-negativity (default: false)
--max-iter: Maximum iterations (default: 100)
--config: Path to config YAML file

`export`

Export results in various formats including plots.

Options:

--fit-dir: Directory containing fit results (required)
--mappings-dir: Directory containing tensor mappings (required)
--output-dir: Output directory (default: output/export)
--plot-heatmaps: Generate factor heatmaps (default: true)

Configuration

Use config.yaml to set default parameters:

input:
  file_path: "M-multi-species/output/14-pca-orthologs/vst_counts_matrix.csv"

preprocessing:
  aggregation_method: "median"
  normalization: "zscore"

optimization:
  n_trials: 100
  rank_min: 2
  rank_max: 12

fitting:
  rank: 6
  non_negative: false

Output Structure

output/
├── tensor/
│   ├── tensor.npz
│   ├── genes.csv
│   ├── species.csv
│   ├── timepoints.csv
│   └── tensor_shapes.json
├── optimization/
│   ├── optuna_study.pkl
│   ├── best_params.json
│   ├── trials.csv
│   └── top_trials.csv
├── fit/
│   ├── gene_factors.csv
│   ├── species_factors.csv
│   ├── time_factors.csv
│   └── fit_metrics.json
└── export/
    ├── gene_factors_with_ids.csv
    ├── species_factors_with_codes.csv
    ├── time_factors_with_labels.csv
    ├── decomposition_results.json
    └── factor_heatmaps.png

Algorithm Details

Sparse CP Decomposition

The package implements sparse CP decomposition using Alternating Least Squares (ALS) with L1 regularization:

min 0.5‖X - [A,B,C]‖² + λ_A‖A‖₁ + λ_B‖B‖₁ + λ_C‖C‖₁

Where:

X ∈ ℝ^{G×S×T}: Gene expression tensor
[A,B,C]: CP decomposition with factors A (genes), B (species), C (time)
λ_A, λ_B, λ_C: L1 penalty parameters for sparsity

Hyperparameter Optimization

Uses Optuna with cross-validation via random masking to optimize:

Rank R (number of components)
L1 penalties λ_A, λ_B, λ_C
Non-negativity constraints

Alternative Execution Methods

If uv is not available

If you don't have uv installed, you can run the workflow using direct Python execution:

# Navigate to the package directory
cd src

# Run commands with PYTHONPATH set
PYTHONPATH=src python -m new_tensor.cli [COMMAND] [OPTIONS]

Troubleshooting

Module not found errors: Make sure you're in the correct directory and using PYTHONPATH=src
Command not found: If uv is not installed, use the direct Python execution method above
Permission errors: Ensure you have write permissions in the output directories

Dependencies

Core: numpy, pandas, scipy
Tensor operations: tensorly
Optimization: optuna
CLI: typer
Plotting: matplotlib, seaborn
Configuration: pyyaml

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
data		data
output		output
src/new_tensor		src/new_tensor
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
config.yaml		config.yaml
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

New Sparse Tensor Decomposition Workflow

Installation

Using uv (recommended)

Using pip (alternative)

Quick Start

Using uv (if installed)

Using direct Python execution (works without uv)

Working Directory Requirements

Command Reference

`build-tensor`

`optimize`

`fit`

`export`

Configuration

Output Structure

Algorithm Details

Sparse CP Decomposition

Hyperparameter Optimization

Alternative Execution Methods

If uv is not available

Troubleshooting

Dependencies

License

About

Uh oh!

Releases

Packages

Languages

Uh oh!

License

Uh oh!

sr320/workflow-stdm-02

Folders and files

Latest commit

History

Repository files navigation

New Sparse Tensor Decomposition Workflow

Installation

Using uv (recommended)

Using pip (alternative)

Quick Start

Using uv (if installed)

Using direct Python execution (works without uv)

Working Directory Requirements

Command Reference

build-tensor

optimize

fit

export

Configuration

Output Structure

Algorithm Details

Sparse CP Decomposition

Hyperparameter Optimization

Alternative Execution Methods

If uv is not available

Troubleshooting

Dependencies

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`build-tensor`

`optimize`

`fit`

`export`

Packages