This notebook compares the AIC/BIC values reported by scdesigner with those reported by scDesign3.

In [1]:
import anndata
import os
import requests

save_path = "data/example_sce.h5ad"
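if not os.path.exists(save_path):
    os.makedirs(os.path.dirname(save_path), exist_ok=True)  # create data/ if it is missing
    response = requests.get("https://go.wisc.edu/69435h")
    response.raise_for_status()  # fail loudly rather than saving an error page
    with open(save_path, "wb") as f:
        f.write(response.content)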

example_sce = anndata.read_h5ad(save_path)
example_sce

AnnData object with n_obs × n_vars = 2087 × 100
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'cell_type', 'sizeFactor', 'pseudotime'
    var: 'highly_variable_genes'
    uns: 'X_name', 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
    obsm: 'PCA', 'UMAP', 'X_pca', 'X_umap'
    layers: 'counts', 'cpm', 'logcounts', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances'

# scDesign3 implementation

I will first calculate the AIC/BIC with scDesign3 under a similar model setup.

In [2]:
%load_ext rpy2.ipython

In [3]:
%%R
library(scDesign3)
library(ggplot2)
library(zellkonverter)
theme_set(theme_bw())

example_sce <- readH5AD("data/example_sce.h5ad")


    ℹ Using stored X_name value 'X'


Registered S3 method overwritten by 'scDesign3':
  method         from  
  predict.gamlss gamlss
Registered S3 method overwritten by 'zellkonverter':
  method                                             from      
  py_to_r.pandas.core.arrays.categorical.Categorical reticulate


In [4]:
%%R
example_simu <- scdesign3(
  sce = example_sce,
  assay_use = "counts",
  celltype = "cell_type",
  pseudotime = "pseudotime",
  spatial = NULL,
  other_covariates = NULL,
  mu_formula = "s(pseudotime, k = 6, bs = 'cr')",
  sigma_formula = "1",
  family_use = "nb",
  n_cores = 2,
  usebam = FALSE,
  corr_formula = "1",
  copula = "gaussian",
  DT = TRUE,
  pseudo_obs = FALSE,
  return_model = FALSE,
  nonzerovar = FALSE
) 

Input Data Construction Start
Input Data Construction End
Start Marginal Fitting
Marginal Fitting End
Start Copula Fitting
Convert Residuals to Multivariate Gaussian
Converting End
Copula group 1 starts
Copula Fitting End
Start Parameter Extraction
Parameter Extraction End
Start Generate New Data
Use Copula to sample a multivariate quantile matrix
Sample Copula group 1 starts
New Data Generating End


Below are the marginal and Gaussian copula AIC/BIC values reported directly by scDesign3.

In [5]:
%%R
print(paste0("marginal AIC: ", example_simu$model_aic[1]))
print(paste0("marginal BIC: ", example_simu$model_bic[1]))
print(paste0("copula AIC: ", example_simu$model_aic[2]))
print(paste0("copula BIC: ", example_simu$model_bic[2]))

[1] "marginal AIC: 778898.707033089"
[1] "marginal BIC: 782773.03646551"
[1] "copula AIC: -8846.23760782392"
[1] "copula BIC: 19089.0027822082"


# scdesigner AIC/BIC example

Now we are going to extract AIC/BIC from our scdesigner models.

In [6]:
from scdesigner.simulators import NegBinCopulaSimulator

In [7]:
formula = "~ bs(pseudotime, degree=5)"
nbcopula = NegBinCopulaSimulator()
nbcopula.fit(example_sce, formula)

                                                           

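The marginal and copula AIC and BIC are stored as attributes on each fitted simulator. As in the scDesign3 fit above, only the mean depends on pseudotime here. The resulting values are close to the scDesign3 ones without matching exactly; note that the two fits use different spline specifications (s(pseudotime, k = 6, bs = 'cr') in scDesign3 versus bs(pseudotime, degree=5) in scdesigner).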

In [8]:
print("Marginal AIC:", nbcopula.marginal_aic)
print("Marginal BIC:", nbcopula.marginal_bic)
print("Copula AIC:", nbcopula.copula_aic)
print("Copula BIC:", nbcopula.copula_bic)

Marginal AIC: 793849.9375
Marginal BIC: 797800.375
Copula AIC: -9314.95684798737
Copula BIC: 18620.283542044774
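
To put the two sets of numbers side by side, here is a small sketch that computes the relative difference of each scdesigner criterion with respect to its scDesign3 counterpart. The values are copied from the printed outputs above; the dictionary names are only illustrative and are not part of either package.

```python
# Values copied from the printed outputs of the two fits above.
scdesign3_ic = {
    "marginal AIC": 778898.707033089,
    "marginal BIC": 782773.03646551,
    "copula AIC": -8846.23760782392,
    "copula BIC": 19089.0027822082,
}
scdesigner_ic = {
    "marginal AIC": 793849.9375,
    "marginal BIC": 797800.375,
    "copula AIC": -9314.95684798737,
    "copula BIC": 18620.283542044774,
}

# Relative difference of scdesigner with respect to scDesign3, per criterion.
rel_diff = {
    name: (scdesigner_ic[name] - scdesign3_ic[name]) / abs(scdesign3_ic[name])
    for name in scdesign3_ic
}
for name, value in rel_diff.items():
    print(f"{name}: {value:+.2%}")
```

A relative difference is used because the marginal and copula criteria live on very different scales.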


Next, I compare the marginal AIC/BIC reported by scdesigner for other distributions against values computed by hand on simulated datasets.
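
The hand computations in the following sections all use the standard definitions AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of parameters, n the number of cells, and log L the maximized log-likelihood. As a minimal sketch, they can be wrapped in a helper; aic_bic is a hypothetical name introduced here, not a scdesigner function.

```python
import numpy as np

def aic_bic(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC) from a log-likelihood, parameter count, and sample size."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * np.log(n_obs) - 2 * log_likelihood
    return aic, bic

# Toy numbers, chosen only for illustration.
aic, bic = aic_bic(log_likelihood=-100.0, n_params=4, n_obs=50)
print(aic, bic)
```

Since log(n) exceeds 2 once n is larger than about 7, the BIC penalizes each parameter more heavily than the AIC for datasets of the size used here.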

# Poisson example

In [9]:
from scdesigner.simulators import PoissonCopulaSimulator
from scipy.stats import poisson
import numpy as np
import pandas as pd
import anndata

In [10]:
n_sample, n_gene, n_feature = 2000, 20, 2
X1 = np.random.normal(size=(n_sample, n_feature)) # covariates
ground_truth = np.random.normal(size=(n_feature, n_gene)) # feature x gene
beta = np.exp(X1 @ ground_truth) # cell x gene

# generate samples
Y = poisson(beta).rvs()
obs = pd.DataFrame(X1, columns=[f"dim{j}" for j in range(n_feature)]) # cell x feature
adata = anndata.AnnData(X=Y, obs=obs)
adata

AnnData object with n_obs × n_vars = 2000 × 20
    obs: 'dim0', 'dim1'

In [11]:
formula = "~ dim0 + dim1 - 1"
pcopula = PoissonCopulaSimulator()
pcopula.fit(adata, formula)

print("marginal AIC:", pcopula.marginal_aic)
print("marginal BIC:", pcopula.marginal_bic)

                                                           

marginal AIC: 107687.21875
marginal BIC: 107911.2578125


In [12]:
beta_hat = np.exp(X1 @ pcopula.params['coef_mean'].values)
log_likelihood = np.sum(poisson(beta_hat).logpmf(Y))
k = n_feature * n_gene # nparams

print("marginal AIC:", 2 * k - 2 * log_likelihood)
print("marginal BIC:", k * np.log(n_sample) - 2 * log_likelihood)

marginal AIC: 107687.29176784499
marginal BIC: 107911.32786622667


# ZI-NB example

In [13]:
from scdesigner.simulators import ZeroInflatedNegBinCopulaSimulator
from scipy.stats import nbinom, bernoulli
from scipy.special import expit

In [14]:
n, g, d, p, z = 2000, 20, 2, 2, 2
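X1 = np.random.normal(size=(n, d)) # mean covariates, cell x feature
X2 = np.random.normal(size=(n, p)) # dispersion covariates, cell x feature
X3 = np.random.normal(size=(n, z)) # zero-inflation covariates, cell x feature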
B = np.random.normal(size=(d, g)) # feature x gene
D = np.random.normal(size=(p, g)) # feature x gene
Z = np.random.normal(size=(z, g)) # feature x gene
mu = np.exp(X1 @ B) # cell x gene
r = np.exp(X2 @ D) # cell x gene
pi = expit(X3 @ Z) # cell x gene

# generate samples
Y = nbinom(n=r, p=r / (r + mu)).rvs() * bernoulli(1 - pi).rvs()

X_1 = pd.DataFrame(X1, columns=[f"mean_dim{j}" for j in range(d)]) # cell x feature
X_2 = pd.DataFrame(X2, columns=[f"dispersion_dim{j}" for j in range(p)]) # cell x feature
X_3 = pd.DataFrame(X3, columns=[f"zero_inflation_dim{j}" for j in range(z)]) # cell x feature
obs = pd.concat([X_1, X_2, X_3], axis=1)

adata = anndata.AnnData(X=Y, obs=obs)
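
The sampling cell above passes the negative binomial to scipy as n=r, p=r / (r + mu). The sketch below checks, on a few illustrative values, that this parameterization has mean mu and variance mu + mu^2 / r. The names mu_demo and r_demo are made up for this check so that they do not overwrite the notebook's variables.

```python
import numpy as np
from scipy.stats import nbinom

mu_demo = np.array([0.5, 2.0, 10.0])  # illustrative means
r_demo = np.array([0.5, 1.0, 4.0])    # illustrative dispersion (size) parameters

# Same (n, p) mapping as in the sampling cell.
dist = nbinom(n=r_demo, p=r_demo / (r_demo + mu_demo))

mean_ok = np.allclose(dist.mean(), mu_demo)
var_ok = np.allclose(dist.var(), mu_demo + mu_demo**2 / r_demo)
print(mean_ok, var_ok)
```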

In [15]:
formula = {"mean": "~ mean_dim0 + mean_dim1 - 1", 
           "dispersion": "~ dispersion_dim0 + dispersion_dim1 - 1",
           "zero_inflation": "~ zero_inflation_dim0 + zero_inflation_dim1 - 1"}
zinb = ZeroInflatedNegBinCopulaSimulator()
zinb.fit(adata, formula)

print("marginal AIC:", zinb.marginal_aic)
print("marginal BIC:", zinb.marginal_bic)

                                                          

marginal AIC: 72666.8359375
marginal BIC: 73338.9453125


In [16]:
mu_hat = np.exp(X1 @ zinb.params['coef_mean'].values) # cell x gene
r_hat = np.exp(X2 @ zinb.params['coef_dispersion'].values) # cell x gene
pi_hat = expit(X3 @ zinb.params['coef_zero_inflation'].values) # cell x gene

log_likelihood = np.sum(
    np.log(pi_hat * (Y == 0) + (1 - pi_hat) * nbinom(n=r_hat, p=r_hat / (r_hat + mu_hat)).pmf(Y) + 1e-10))
k = g * (d+p+z) # nparams

print("marginal AIC:", 2 * k - 2 * log_likelihood)
print("marginal BIC:", k * np.log(n) - 2 * log_likelihood)

marginal AIC: 72666.84471199854
marginal BIC: 73338.95300714359
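
The hand computation above adds 1e-10 inside the log to guard against log(0). As an alternative sketch, the same zero-inflated log-likelihood can be evaluated in log space with np.logaddexp, which avoids the additive constant. The helper zi_loglik and the demo arrays are hypothetical; the example uses a Poisson base distribution to keep it short, and any base log-pmf could be passed in its place.

```python
import numpy as np
from scipy.stats import poisson

def zi_loglik(y, pi_zero, base_logpmf):
    """Sum of log[pi * 1(y == 0) + (1 - pi) * pmf(y)], evaluated in log space."""
    log_nonzero = np.log1p(-pi_zero) + base_logpmf           # log((1 - pi) * pmf(y))
    log_zero = np.logaddexp(np.log(pi_zero), log_nonzero)    # adds the extra mass at y == 0
    return np.sum(np.where(y == 0, log_zero, log_nonzero))

# Small simulated dataset, for illustration only.
rng = np.random.default_rng(0)
lam = rng.uniform(0.5, 5.0, size=(50, 3))
pi_demo = rng.uniform(0.05, 0.5, size=(50, 3))
y = rng.poisson(lam) * rng.binomial(1, 1 - pi_demo)

stable = zi_loglik(y, pi_demo, poisson(lam).logpmf(y))
naive = np.sum(np.log(pi_demo * (y == 0) + (1 - pi_demo) * poisson(lam).pmf(y)))
print(stable, naive)
```

On well-behaved data like this the two versions agree; the log-space form mainly matters when the base pmf underflows to zero for large counts.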


# ZI-P example

In [17]:
from scdesigner.simulators import ZeroInflatedPoissonRegressionSimulator
from scipy.stats import poisson, bernoulli
from scipy.special import expit

In [18]:
n_sample, n_gene, n_feature1, n_feature2 = 2000, 20, 2, 3
X1 = np.random.normal(size=(n_sample, n_feature1)) # beta covariates
X2 = np.random.normal(size=(n_sample, n_feature2)) # zero-inflation covariates
gt_beta = np.random.normal(size=(n_feature1, n_gene))
gt_pi = np.random.normal(size=(n_feature2, n_gene))
beta = np.exp(X1 @ gt_beta)
pi = 1 / (1 + np.exp(-(X2 @ gt_pi)))

# generate samples
Y = poisson(beta).rvs() * bernoulli(1 - pi).rvs()
obs1 = pd.DataFrame(X1, columns=[f"beta_dim{j}" for j in range(n_feature1)])
obs2 = pd.DataFrame(X2, columns=[f"pi_dim{j}" for j in range(n_feature2)])
obs = pd.concat([obs1, obs2], axis=1)
adata = anndata.AnnData(X=Y, obs=obs)
adata

AnnData object with n_obs × n_vars = 2000 × 20
    obs: 'beta_dim0', 'beta_dim1', 'pi_dim0', 'pi_dim1', 'pi_dim2'

In [19]:
formula = {"mean": "~ beta_dim0 + beta_dim1 - 1",
           "zero_inflation": "~ pi_dim0 + pi_dim1 + pi_dim2 - 1"}
zip_sim = ZeroInflatedPoissonRegressionSimulator()  # named zip_sim to avoid shadowing the built-in zip()
zip_sim.fit(adata, formula)

print("marginal AIC:", zip_sim.marginal_aic)
print("marginal BIC:", zip_sim.marginal_bic)



marginal AIC: 74802.5078125
marginal BIC: 75362.6015625


In [20]:
beta_hat = np.exp(X1 @ zip_sim.params['coef_mean'].values) # cell x gene
pi_hat = expit(X2 @ zip_sim.params['coef_zero_inflation'].values) # cell x gene

log_likelihood = np.sum(
    np.log(pi_hat * (Y == 0) + (1 - pi_hat) * poisson(beta_hat).pmf(Y) + 1e-10))
k = n_gene * (n_feature1+n_feature2) # nparams

print("marginal AIC:", 2 * k - 2 * log_likelihood)
print("marginal BIC:", k * np.log(n_sample) - 2 * log_likelihood)

marginal AIC: 74802.51191463022
marginal BIC: 75362.60216058443


# Bernoulli example

In [21]:
from scdesigner.simulators import BernoulliCopulaSimulator

In [22]:
n_sample, n_gene, n_feature1 = 2000, 20, 2
X1 = np.random.normal(size=(n_sample, n_feature1)) # covariates
ground_truth = np.random.normal(size=(n_feature1, n_gene)) # feature x gene
beta = expit(X1 @ ground_truth) # cell x gene


# generate samples
Y = bernoulli(beta).rvs()
obs = pd.DataFrame(X1, columns=[f"dim{j}" for j in range(n_feature1)]) # cell x feature
adata = anndata.AnnData(X=Y, obs=obs)

In [23]:
formula = "~ dim0 + dim1 - 1"
bsim = BernoulliCopulaSimulator()
bsim.fit(adata, formula)

print("marginal AIC:", bsim.marginal_aic)
print("marginal BIC:", bsim.marginal_bic)

                                                           

marginal AIC: 43508.49609375
marginal BIC: 43732.53125


In [24]:
beta_hat = expit(X1 @ bsim.params['coef_mean'].values)
log_likelihood = np.sum(bernoulli(beta_hat).logpmf(Y))
k = n_feature1 * n_gene # nparams

print("marginal AIC:", 2 * k - 2 * log_likelihood)
print("marginal BIC:", k * np.log(n_sample) - 2 * log_likelihood)

marginal AIC: 43508.494435482266
marginal BIC: 43732.53053386395


In every example, the AIC/BIC reported by scdesigner agrees with the value computed by hand (using scipy.stats) to within 0.1.
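
The agreement can also be checked numerically. The sketch below collects the reported and hand-computed values from the outputs above and measures the largest gap; the relative tolerance of 1e-5 is an assumption chosen for this illustration, not a threshold used by scdesigner.

```python
import numpy as np

# (scdesigner value, hand-computed value), copied from the outputs above.
checks = {
    "Poisson AIC": (107687.21875, 107687.29176784499),
    "Poisson BIC": (107911.2578125, 107911.32786622667),
    "ZI-NB AIC": (72666.8359375, 72666.84471199854),
    "ZI-NB BIC": (73338.9453125, 73338.95300714359),
    "ZI-P AIC": (74802.5078125, 74802.51191463022),
    "ZI-P BIC": (75362.6015625, 75362.60216058443),
    "Bernoulli AIC": (43508.49609375, 43508.494435482266),
    "Bernoulli BIC": (43732.53125, 43732.53053386395),
}

# Absolute gap per criterion, and a relative-tolerance check across all of them.
abs_diff = {name: abs(a - b) for name, (a, b) in checks.items()}
max_diff = max(abs_diff.values())
all_close = all(np.isclose(a, b, rtol=1e-5) for a, b in checks.values())

print(f"largest absolute gap: {max_diff:.3f}")
print("all within tolerance:", all_close)
```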