KernelBiome package

The KernelBiome Python package can be installed via

pip install kernelbiome

or

python -m pip install git+https://github.com/shimenghuang/KernelBiome.git

Small usage example:

import numpy as np
from kernelbiome.kernelbiome import KernelBiome

# Simulate some compositional data (each row of X sums to 1)
n = 100
X1 = np.random.normal(0, 1, n)
X2 = np.random.normal(0, 1, n)
X3 = np.random.normal(0, 1, n)
X4 = np.random.normal(0, 1, n)
X = np.exp(np.c_[X1, X2, X3, X4])
X /= X.sum(axis=1)[:, None]
y = 5*(X[:, 0]+X[:, 1])/(X[:, 0]+X[:, 1]+X[:, 2]) + np.random.normal(0, 1, n)/2

# Fit KernelBiome
models = {
    'linear': None,
    'aitchison': {'c': np.logspace(-7, -3, 5)},
}
KB = KernelBiome(kernel_estimator='KernelRidge',
                 center_kmat=True,
                 models=models, # `models=None` for using all default models
                 verbose=1)
KB.fit(X, y)

# Calculate the root mean squared error (RMSE) on the training data
RMSE = np.sqrt(np.mean((KB.predict(X) - y)**2))

For a complete usage example, see kernelbiome_illustration.py
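To give a feel for what the `'aitchison'` entry in `models` might compute, here is a minimal sketch. It assumes the kernel is the standard Aitchison kernel, i.e. the inner product of centered log-ratio (CLR) transformed compositions, and that `c` acts as a pseudo-count added before taking logs so that zeros are handled. The function names `clr` and `aitchison_kernel` are hypothetical and are not part of the package's API.

```python
import numpy as np


def clr(X, c):
    """Centered log-ratio transform after adding pseudo-count c (assumed role of `c`)."""
    log_x = np.log(X + c)
    return log_x - log_x.mean(axis=1, keepdims=True)


def aitchison_kernel(X, Y, c=1e-5):
    """Kernel matrix of inner products between CLR-transformed rows of X and Y."""
    return clr(X, c) @ clr(Y, c).T


# Simulated compositions, generated the same way as in the usage example above
rng = np.random.default_rng(0)
X = np.exp(rng.normal(size=(5, 4)))
X /= X.sum(axis=1, keepdims=True)

K = aitchison_kernel(X, X)
print(K.shape)
```

Because the kernel is an inner product of transformed features, `K` is symmetric and positive semi-definite, which is what a kernel ridge estimator requires.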

Reproducible Code

This repository contains the Python package KernelBiome and the code to reproduce the results in the paper Supervised Learning and Model Analysis with Compositional Data (Huang et al., 2022).

All scripts that produce results in the paper are in the experiments folder; helper functions used by these scripts are in the helpers folder. Scripts starting with "run_" run the computations and save the results, and scripts starting with "summarize_" load the saved results and summarize them, e.g. in figures. The data_original folder holds the original datasets and the data_processed folder holds the processed datasets. See the README files in those folders for details.

`prediction`

Prediction comparison on the 33 publicly available datasets, covering classification and regression tasks.

`post_analysis`

Post-analysis including CFI and kernel PCA for two of the public datasets, cirrhosis and centralpark.
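As an illustration of the kernel PCA step, the following is a minimal NumPy sketch of textbook kernel PCA: double-center the kernel matrix, take its leading eigenvectors, and scale them by the square roots of the eigenvalues. It is not the repository's implementation; the function name `kernel_pca` and the linear kernel used for the demo are assumptions made for the example.

```python
import numpy as np


def kernel_pca(K, n_components=2):
    """Project training points onto the leading kernel principal components."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    K_centered = H @ K @ H                   # double-centered kernel matrix
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scores of the training points: eigenvectors scaled by sqrt(eigenvalue)
    scores = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return scores, eigvals


# Demo on simulated compositions with a linear kernel (illustrative choice)
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(4), size=50)
K = X @ X.T
scores, eigvals = kernel_pca(K, n_components=2)
print(scores.shape)
```

The centering step here plays the same role as centering the kernel matrix before fitting, so that the components describe variation around the mean in feature space.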

`tree_visualization`

Visualization of CFI based on weighted and unweighted KernelBiome.

`consistency`

Simulations illustrating the consistency results in the paper.

`toy_examples`

log_contrast_example.py: Illustration of CFI and CPD for a log-contrast model using simulated data.
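For readers unfamiliar with log-contrast models, the sketch below shows the standard form, a linear model in the log-components whose coefficients sum to zero, and checks the resulting scale invariance. It is a generic illustration and is not taken from log_contrast_example.py; the function name `log_contrast` and the coefficient values are hypothetical.

```python
import numpy as np


def log_contrast(X, beta):
    """Evaluate sum_j beta_j * log(x_j), requiring that the coefficients sum to zero."""
    beta = np.asarray(beta, dtype=float)
    if not np.isclose(beta.sum(), 0.0):
        raise ValueError("log-contrast coefficients must sum to zero")
    return np.log(X) @ beta


rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(3), size=10)   # strictly positive compositions
beta = np.array([1.0, -0.5, -0.5])       # illustrative coefficients, sum to zero

y = log_contrast(X, beta)
y_scaled = log_contrast(7.0 * X, beta)   # rescale every row by a constant

# The zero-sum constraint makes the model invariant to the total of each row
print(np.allclose(y, y_scaled))
```

The invariance follows because rescaling a row by a constant s adds log(s) times the sum of the coefficients, which is zero.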

rescale_matters_example.py: Comparison of CFI and CPD with relative influence (RI) and the partial dependence plot (PDP) based on simulated data.
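As background for this comparison, here is a minimal sketch of a classical partial dependence computation applied to compositional inputs. It is not the code in rescale_matters_example.py. The function `partial_dependence` is a hypothetical helper, and `f` reuses the noiseless signal from the usage example above as a stand-in for a fitted model.

```python
import numpy as np


def partial_dependence(predict, X, j, grid):
    """Classical PDP: fix feature j at each grid value and average the predictions."""
    values = []
    for z in grid:
        X_fixed = X.copy()
        X_fixed[:, j] = z
        values.append(predict(X_fixed).mean())
    return np.array(values)


def f(X):
    """Noiseless signal from the usage example, used here in place of a fitted model."""
    return 5 * (X[:, 0] + X[:, 1]) / (X[:, 0] + X[:, 1] + X[:, 2])


rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(4), size=200)
grid = np.linspace(0.05, 0.95, 10)
pdp = partial_dependence(f, X, j=0, grid=grid)

# Overwriting one component without rescaling moves the rows off the simplex
X_fixed = X.copy()
X_fixed[:, 0] = 0.95
off_simplex = not np.allclose(X_fixed.sum(axis=1), 1.0)
print(off_simplex)
```

The final check shows that the classical PDP evaluates the model at points whose rows no longer sum to 1, which is one reason to treat its output with care on compositional data.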
