### Decoupled Data Parsers

In [None]:
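import os

import anndata
import requests

# Download the example dataset once and cache it locally.
save_path = "data/example_sce.h5ad"
if not os.path.exists(save_path):
    os.makedirs(os.path.dirname(save_path), exist_ok=True)
    response = requests.get("https://go.wisc.edu/69435h", timeout=60)
    response.raise_for_status()  # fail loudly instead of saving an error page
    with open(save_path, "wb") as fh:
        fh.write(response.content)

example_sce = anndata.read_h5ad(save_path)
example_sce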

In [None]:
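import scdesigner.experimental.data as dt

# One formula per distribution parameter: mu depends on pseudotime,
# alpha is intercept-only and therefore constant across cells.
f = {"mu": "~ pseudotime", "alpha": "~ 1"}
dl = dt.FormulaLoader(example_sce, f, batch_size=1000)
y, x = next(iter(dl.loader))  # first mini-batch: counts y and covariates x
print(dl.names)
print(y, x)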
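
To make the formula-to-covariate step concrete, the sketch below builds design matrices by hand for the two formulas used here. It assumes the usual Patsy-style convention that `~ 1` is an intercept-only model and `~ pseudotime` adds an intercept plus one pseudotime column. The helpers `design_matrix` and `batches` and the toy `obs` records are hypothetical and not part of `scdesigner`; the library's actual column naming and batching may differ.

```python
def design_matrix(obs, formula):
    """Hypothetical helper: build design rows for '~ 1' or '~ a + b' style formulas.

    Supports only an intercept plus numeric columns (no interactions,
    transformations, categorical encoding, or intercept removal).
    """
    rhs = formula.split("~", 1)[1]
    terms = [t.strip() for t in rhs.split("+") if t.strip()]
    # Both '~ 1' and '~ pseudotime' include an intercept (Patsy convention).
    columns = ["Intercept"] + [t for t in terms if t != "1"]
    rows = [[1.0] + [float(record[c]) for c in columns[1:]] for record in obs]
    return columns, rows


def batches(rows, batch_size):
    """Yield consecutive mini-batches of rows, as a data loader would."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Made-up covariates standing in for example_sce.obs.
obs = [{"pseudotime": 0.1}, {"pseudotime": 0.5}, {"pseudotime": 0.9}]
f = {"mu": "~ pseudotime", "alpha": "~ 1"}

for param, formula in f.items():
    names, X = design_matrix(obs, formula)
    print(param, names, next(batches(X, batch_size=2)))
```

Keeping one formula per parameter means the mean and dispersion can use different covariates while sharing the same cell metadata.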

### Generic Estimators

In [None]:
import scdesigner.experimental.estimators as est

dl = dt.FormulaLoader(example_sce, f, batch_size=1000)
ml = est.NegativeBinomialML({"lr": 0.01, "max_epochs": 10})
parameters = ml.estimate(dl.loader)
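
The estimator's objective is not spelled out here. A common choice for negative binomial regression is to maximize the log-likelihood with a log link, `mu = exp(X @ beta)`, and a dispersion `alpha` such that the variance is `mu + alpha * mu**2`. The sketch below assumes that parameterization (it may not match `NegativeBinomialML` exactly) and fits an intercept-only mean for a single gene by plain gradient ascent. `nb_logpmf`, `fit_intercept`, and the counts are hypothetical.

```python
import math


def nb_logpmf(y, mu, alpha):
    """NB log-pmf with mean mu and variance mu + alpha * mu**2 (assumed parameterization)."""
    r = 1.0 / alpha  # "size" parameter
    return (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def fit_intercept(y, alpha, lr=0.5, n_iter=500):
    """Fit log(mu) = b0 for one gene by gradient ascent, holding alpha fixed."""
    r = 1.0 / alpha
    b0 = 0.0
    for _ in range(n_iter):
        mu = math.exp(b0)
        # Derivative of the log-likelihood w.r.t. b0 is r * (y - mu) / (r + mu); averaged here.
        grad = sum(r * (yi - mu) / (r + mu) for yi in y) / len(y)
        b0 += lr * grad
    return b0


counts = [0, 2, 3, 5, 10]  # made-up counts for one gene
b0 = fit_intercept(counts, alpha=0.5)
mu_hat = math.exp(b0)
loglik = sum(nb_logpmf(yi, mu_hat, 0.5) for yi in counts)
print(round(mu_hat, 4), round(loglik, 3))
```

With `alpha` fixed, the intercept-only maximum likelihood estimate is the log of the sample mean, which makes a convenient sanity check. The real estimator presumably also learns `alpha` and the covariate coefficients over mini-batches, with the optimizer controlled by options such as `lr` and `max_epochs`.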

In [None]:
import scdesigner.experimental.samplers as sam

sampler = sam.NegativeBinomialSampler(parameters)
sampler.sample(dl.loader)
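
Sampling from a negative binomial regression needs only covariates and fitted coefficients: compute each cell's mean from its design row, then draw a count. The sketch below uses the gamma-Poisson mixture representation under the mean/dispersion parameterization assumed above, which also shows why a covariate-only loader is enough for sampling. `poisson`, `sample_nb`, and the coefficient values are hypothetical, not the library's sampler.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw via Knuth's multiplication method; large lam is split into chunks."""
    count = 0
    while lam > 0:
        step = min(lam, 500.0)  # keeps exp(-step) well above float underflow
        threshold = math.exp(-step)
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= threshold:
                break
            k += 1
        count += k  # a sum of independent Poissons is Poisson
        lam -= step
    return count


def sample_nb(X, beta, alpha, rng):
    """Draw one count per row of X from NB(mu = exp(x . beta), alpha)."""
    r = 1.0 / alpha
    draws = []
    for row in X:
        mu = math.exp(sum(xi * bi for xi, bi in zip(row, beta)))
        # Gamma-Poisson mixture: lambda ~ Gamma(shape=r, scale=mu/r), y ~ Poisson(lambda)
        lam = rng.gammavariate(r, mu / r)
        draws.append(poisson(lam, rng))
    return draws


rng = random.Random(0)
X = [[1.0, t / 10] for t in range(10)]  # intercept + pseudotime; no counts needed
beta = [1.0, 2.0]  # made-up coefficients for one gene
print(sample_nb(X, beta, alpha=0.5, rng=rng))
```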

A more realistic sampling scenario is one where the loader contains only covariate information, without the gene count matrix Y used for training.

In [None]:
# Pass only the cell metadata (obs), so no counts are loaded.
dl_ = dt.FormulaLoader(example_sce.obs, f, batch_size=1000)
sampler.sample(dl_.loader)

In some cases, we may want the sampler to return an AnnData object directly. We can do this by decorating the original sampler with the variable (gene) and predictor names.

In [None]:
sampler_ann = sam.anndata_sample_n(sampler, dl.names[0], dl.names[1])
adata = sampler_ann.sample(dl.loader)
adata
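
The decorator idea behind `anndata_sample_n` can be sketched without AnnData: wrap any object that has a `sample` method and attach names to its raw output. The hypothetical `LabeledSampler` below returns a plain dictionary rather than an AnnData object, and `ConstantSampler` is a made-up stand-in for a fitted sampler; neither is the library's implementation.

```python
class ConstantSampler:
    """Stand-in for a fitted sampler: returns a deterministic count matrix per batch."""

    def sample(self, batch):
        return [[i, 2 * i] for i, _ in enumerate(batch)]


class LabeledSampler:
    """Decorator that attaches gene (var) and predictor names to a sampler's output."""

    def __init__(self, sampler, var_names, predictor_names):
        self.sampler = sampler
        self.var_names = list(var_names)
        self.predictor_names = list(predictor_names)

    def sample(self, batch):
        counts = self.sampler.sample(batch)  # delegate the actual sampling
        return {
            "X": counts,
            "var_names": self.var_names,
            # Keep the covariates behind each sampled cell, keyed by predictor name.
            "obs": [dict(zip(self.predictor_names, row)) for row in batch],
        }


batch = [[1.0, 0.1], [1.0, 0.5]]  # design rows: intercept, pseudotime
labeled = LabeledSampler(ConstantSampler(), ["gene_a", "gene_b"], ["Intercept", "pseudotime"])
print(labeled.sample(batch))
```

Because the wrapper relies only on a `sample` method, it works unchanged for any sampler, which keeps sampling decoupled from the output data format.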

Alternatively, we can pass the `obs` table over which to sample, together with the formula that was used to build the model's covariates from `obs`.

In [None]:
sampler_ann = sam.anndata_sample_l(sampler, f)
adata = sampler_ann.sample(example_sce.obs)
adata

Next, we estimate a negative binomial copula model, which additionally captures dependence between genes through a covariance matrix.

In [None]:
dl = dt.FormulaLoader(example_sce, {"mu": "~ pseudotime", "alpha": "~ 1"}, batch_size=1000)
copula = est.NegativeBinomialCopulaEstimator({"max_epochs": 10})
parameters = copula.estimate(dl.loader)
parameters["covariance"].shape
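
The `covariance` entry is presumably a gene-by-gene matrix describing dependence on a Gaussian scale. One standard way to estimate such a matrix, shown here as an assumption rather than as `NegativeBinomialCopulaEstimator`'s actual procedure, is to map each count to a uniform value with the randomized distributional transform of its fitted marginal, convert that to a normal score, and take the empirical covariance. `nb_pmf`, `nb_cdf`, `normal_scores`, `covariance`, and the counts are hypothetical.

```python
import math
import random
from statistics import NormalDist


def nb_pmf(y, mu, alpha):
    """NB pmf with mean mu and variance mu + alpha * mu**2 (assumed parameterization)."""
    r = 1.0 / alpha
    return math.exp(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def nb_cdf(y, mu, alpha):
    """P(Y <= y); zero for negative y."""
    return sum(nb_pmf(k, mu, alpha) for k in range(y + 1))


def normal_scores(counts, mus, alpha, rng):
    """Map counts to N(0, 1) scores via the randomized distributional transform."""
    std = NormalDist()
    scores = []
    for y, mu in zip(counts, mus):
        # u is uniform on [F(y - 1), F(y)] under the fitted marginal.
        u = nb_cdf(y - 1, mu, alpha) + rng.random() * nb_pmf(y, mu, alpha)
        u = min(max(u, 1e-9), 1 - 1e-9)  # keep inv_cdf finite
        scores.append(std.inv_cdf(u))
    return scores


def covariance(columns):
    """Sample covariance matrix of equally long columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [
        [sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1) for cb, mb in zip(columns, means)]
        for ca, ma in zip(columns, means)
    ]


rng = random.Random(0)
gene1 = [0, 1, 3, 4, 7, 2]  # made-up counts for six cells
gene2 = [1, 1, 4, 5, 9, 2]
# Constant means per gene for simplicity; in the fitted model mu varies with pseudotime.
z1 = normal_scores(gene1, [3.0] * 6, alpha=0.5, rng=rng)
z2 = normal_scores(gene2, [3.5] * 6, alpha=0.5, rng=rng)
print(covariance([z1, z2]))
```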

We can sample from the fitted model.

In [None]:
sampler = sam.NegativeBinomialCopulaSampler(parameters)
y = sampler.sample(dl.loader)
y[:4, :]
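
Sampling from a Gaussian copula typically reverses that transform: draw correlated standard normals with a Cholesky factor of the covariance, map them to uniforms with the normal CDF, and push each uniform through the quantile function of that gene's negative binomial marginal. The sketch below assumes this standard construction and the same mean/dispersion parameterization as above; `cholesky`, `nb_quantile`, `sample_copula`, and the two-gene covariance and means are hypothetical, not the internals of `NegativeBinomialCopulaSampler`.

```python
import math
import random
from statistics import NormalDist


def cholesky(S):
    """Lower-triangular L with L @ L.T == S, for symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def nb_quantile(u, mu, alpha):
    """Smallest y with NB cdf(y) >= u, found by accumulating the pmf."""
    r = 1.0 / alpha
    y, cdf = 0, 0.0
    while True:
        cdf += math.exp(
            math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
        )
        if cdf >= u:
            return y
        y += 1


def sample_copula(cov, mus, alpha, n_cells, rng):
    """Draw n_cells rows of counts whose dependence follows a Gaussian copula."""
    L = cholesky(cov)
    std = NormalDist()
    cells = []
    for _ in range(n_cells):
        e = [rng.gauss(0.0, 1.0) for _ in mus]
        z = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(len(mus))]
        # Clamp u below 1 so the cumulative pmf sum is guaranteed to reach it.
        cells.append([nb_quantile(min(std.cdf(zi), 1 - 1e-12), mu, alpha) for zi, mu in zip(z, mus)])
    return cells


rng = random.Random(0)
cov = [[1.0, 0.8], [0.8, 1.0]]  # made-up copula covariance for two genes
print(sample_copula(cov, mus=[5.0, 10.0], alpha=0.5, n_cells=4, rng=rng))
```

Inverting the marginal CDF keeps each gene's negative binomial distribution intact while the shared normal draws induce the correlation.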

### Negative Controls

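Here is a way of defining loaders with different covariates for different subsets of genes. Below, the mean of the first 20 genes depends on pseudotime, while the remaining genes have an intercept-only mean and serve as negative controls.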

In [None]:
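sc1 = example_sce[:, :20].copy()  # first 20 genes
sc2 = example_sce[:, 20:].copy()  # remaining genes

formulas = [
    {"mu": "~ pseudotime", "alpha": "~ 1"},  # pseudotime effect on the mean
    {"mu": "~ 1", "alpha": "~ 1"},  # constant mean (negative controls)
]
dl = dt.CompositeFormulaLoader([sc1, sc2], formulas, batch_size=1000)

# dl.loader holds one loader per gene subset; inspect the first batch of each.
for loader in dl.loader:
    y, x = next(iter(loader))
    print(y.shape)
    print(x)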

`dl.names` is now a list of tuples, one per gene subset, each pairing the gene names with the corresponding regression parameters.

In [None]:
print(dl.names)

We can now fit one estimator to each subset of genes; `CompositeEstimator` loops over the loaders for us. To use different model families for different subsets, we could provide a list of estimators instead.

In [None]:
ml = est.CompositeEstimator(est.NegativeBinomialML, {"lr": 0.01, "max_epochs": 10})
parameters = ml.estimate(dl.loader)
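
The composite pattern itself is a loop that pairs each gene subset's loader with an estimator, and accepting a list of estimator classes is what would allow different model families per subset. The classes below (`MeanEstimator`, `Composite`) are hypothetical stand-ins that operate on plain lists of `(y, x)` batches; they are not `scdesigner`'s classes.

```python
class MeanEstimator:
    """Toy estimator: per-gene mean count over all batches."""

    def __init__(self, options):
        self.options = options  # unused; mirrors the constructor-with-options style

    def estimate(self, loader):
        totals, n = None, 0
        for y, _x in loader:
            for row in y:
                totals = list(row) if totals is None else [t + v for t, v in zip(totals, row)]
                n += 1
        return [t / n for t in totals]


class Composite:
    """Fit one estimator per loader; `estimators` may be a single class or a list."""

    def __init__(self, estimators, options):
        self.estimators = estimators
        self.options = options

    def estimate(self, loaders):
        if isinstance(self.estimators, list):
            classes = self.estimators
        else:
            classes = [self.estimators] * len(loaders)
        if len(classes) != len(loaders):
            raise ValueError("need one estimator per loader")
        return [cls(self.options).estimate(loader) for cls, loader in zip(classes, loaders)]


# Two gene subsets, each a list of (y_batch, x_batch) pairs with made-up values.
loader_a = [([[1, 2], [3, 4]], [[1.0, 0.1], [1.0, 0.5]])]
loader_b = [([[0], [2]], [[1.0], [1.0]])]
params = Composite(MeanEstimator, {"max_epochs": 10}).estimate([loader_a, loader_b])
print(params)  # [[2.0, 3.0], [1.0]]
```

Sampling can follow the same shape: zip the per-subset parameters with the per-subset loaders and call a sampler on each pair, yielding one sample matrix per gene subset.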

Sampling similarly loops over the elements of the composite loader, producing one sample matrix per gene subset.

In [None]:
sampler = sam.CompositeSampler(parameters, sam.NegativeBinomialSampler)
samples = sampler.sample(dl.loader)
[s.shape for s in samples]

As before, we can drop the observed counts and sample using only the covariates.

In [None]:
dl = dt.CompositeFormulaLoader(
    [sc1.obs, sc2.obs],
    [{"mu": "~ pseudotime", "alpha": "~ 1"}, {"mu": "~ 1", "alpha": "~ 1"}],
    batch_size=1000,
)
sampler = sam.CompositeSampler(parameters, sam.NegativeBinomialSampler)
samples = sampler.sample(dl.loader)
[s.shape for s in samples]

We can also split the genes of a dataset that is backed on disk. Note that each subset must be copied, because the loaders require genuine AnnData objects as input rather than views.