This notebook tests the "unwrapping" function for model parameter dictionaries. The goal is to convert a fitted parameter dictionary back into a flat tensor vector that can be passed directly to our likelihood function.

In [2]:
import anndata
import os
import requests

save_path = "data/example_sce.h5ad"
if not os.path.exists(save_path):
    response = requests.get("https://go.wisc.edu/69435h")
    with open(save_path, "wb") as f:
        f.write(response.content)

example_sce = anndata.read_h5ad(save_path)
example_sce

AnnData object with n_obs × n_vars = 2087 × 100
    obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'cell_type', 'sizeFactor', 'pseudotime'
    var: 'highly_variable_genes'
    uns: 'X_name', 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
    obsm: 'PCA', 'UMAP', 'X_pca', 'X_umap'
    layers: 'counts', 'cpm', 'logcounts', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances'

The `unwrapper` function takes three arguments:

`params` - the input parameter dictionary;

`param_order` - an optional list of keys, for cases where the order of parameters in the dictionary differs from the order the likelihood expects;

`transform` - an optional list of functions, one per parameter, for cases where the values in the dictionary are the result of some transformation (for example, the `pi` output for zero-inflated models has been passed through `torch.sigmoid`).

In [35]:
import torch

def unwrapper(params, param_order=None, transform=None):
    """Flatten a parameter dictionary into a single 1-D tensor."""
    # Default order: dictionary order, skipping the copula covariance.
    # Filtering first keeps transform[idx] aligned with the kept keys.
    if param_order is None:
        param_order = [k for k in params if k != "covariance"]
    pieces = []
    for idx, key in enumerate(param_order):
        # Row-major flatten of the (n_feature, n_outcomes) table.
        v = torch.Tensor(params[key].values).reshape(-1)
        if transform is not None:
            v = transform[idx](v)  # undo the postprocessor's transformation
        pieces.append(v)
    return torch.cat(pieces, dim=0)
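
To make the flattening order concrete, here is a minimal sketch in plain Python. It is not the notebook's function: `flatten_params` and the toy `params` dictionary are hypothetical, and nested lists stand in for the DataFrames (rows are design-matrix features, columns are genes). It only illustrates the row-major flattening, the skipped `covariance` key, and the effect of `param_order`.

```python
# Hypothetical toy dictionary; nested lists stand in for the DataFrames.
params = {
    "beta": [[1.0, 2.0, 3.0],
             [4.0, 5.0, 6.0]],
    "gamma": [[0.5, 0.25, 0.125]],
    "covariance": [[1.0]],  # skipped by default, as in unwrapper
}

def flatten_params(params, param_order=None, transform=None):
    """Row-major flatten of each table, concatenated in param_order."""
    keys = param_order or [k for k in params if k != "covariance"]
    flat = []
    for idx, key in enumerate(keys):
        values = [v for row in params[key] for v in row]
        if transform is not None:
            values = [transform[idx](v) for v in values]
        flat.extend(values)
    return flat

default_order = flatten_params(params)
swapped_order = flatten_params(params, param_order=["gamma", "beta"])
print(default_order)  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 0.5, 0.25, 0.125]
print(swapped_order)  # [0.5, 0.25, 0.125, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Each table is read row by row, so all genes for the first feature come before any gene for the second feature.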

# NegBin param unwrapper

In [15]:
from scdesigner.simulators import NegBinCopulaSimulator

In [16]:
formula = "~ bs(pseudotime, degree=5)"
nbcopula = NegBinCopulaSimulator()
nbcopula.fit(example_sce, formula)


Given the formula above, the first 600 entries of the unwrapped vector are the `beta` parameters and the last 100 are the `gamma` parameters. We compare both against the values in `data/nbreg.csv`, which were saved from the tensor vector during the earlier call to model.fit (specifically, in the postprocessor, before the vector is turned into a dictionary).
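
The slice boundaries can be derived from the parameter shapes rather than hard-coded. The sketch below assumes that `beta` has 6 rows (an intercept plus 5 spline basis columns, which is my reading of the 600 = 6 × 100 count and is not confirmed in the notebook) and that `gamma` has a single row; the 100 genes come from the `AnnData` summary.

```python
n_genes = 100     # n_vars of example_sce
n_beta_rows = 6   # assumed: intercept + 5 spline basis columns
n_gamma_rows = 1  # assumed: one dispersion value per gene

shapes = {"beta": (n_beta_rows, n_genes), "gamma": (n_gamma_rows, n_genes)}

# Walk the parameters in order and record each one's [start, stop) slice.
offsets, start = {}, 0
for name, (n_rows, n_cols) in shapes.items():
    offsets[name] = (start, start + n_rows * n_cols)
    start += n_rows * n_cols

print(offsets)  # {'beta': (0, 600), 'gamma': (600, 700)}
```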

Here we compare `beta`:

In [40]:
import pandas as pd

params = pd.DataFrame(unwrapper(nbcopula.params).numpy())
params[0:600]

Unnamed: 0,0
0,1.928733
1,2.010292
2,1.410292
3,1.626484
4,2.940444
...,...
595,0.404076
596,0.387220
597,-1.464699
598,4.736888


In [39]:
df = pd.read_csv('data/nbreg.csv', header=None)
df[0:600]

Unnamed: 0,0
0,1.928733
1,2.010292
2,1.410292
3,1.626484
4,2.940444
...,...
595,0.404076
596,0.387220
597,-1.464699
598,4.736888


and `gamma`:

In [41]:
params[600:]

Unnamed: 0,0
600,-1.440749
601,-1.537884
602,-1.055986
603,-0.054716
604,-0.870956
...,...
695,0.598124
696,1.490004
697,2.054415
698,-1.823017


In [42]:
df[600:]

Unnamed: 0,0
600,-1.440749
601,-1.537884
602,-1.055986
603,-0.054716
604,-0.870956
...,...
695,0.598124
696,1.490004
697,2.054415
698,-1.823017


# Poisson unwrapper

I also tested `unwrapper` on the Poisson model.

In [4]:
from scdesigner.simulators import PoissonCopulaSimulator

In [5]:
poisson = PoissonCopulaSimulator()
poisson.fit(example_sce)


In [43]:
params = pd.DataFrame(unwrapper(poisson.params).numpy())
params

Unnamed: 0,0
0,-18.075312
1,-18.096006
2,-18.332464
3,-18.243290
4,-18.307772
...,...
95,-19.268106
96,-18.950972
97,-19.028061
98,-19.610620


In [44]:
df = pd.read_csv('data/poi.csv', header=None)
df

Unnamed: 0,0
0,-18.075312
1,-18.096006
2,-18.332464
3,-18.243290
4,-18.307772
...,...
95,-19.268106
96,-18.950972
97,-19.028060
98,-19.610620
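
The two printed tables agree to about six decimals (the row labelled 97 differs in the last digit). Instead of comparing by eye, the check can be made numerically. The sketch below uses only the rows printed above; the helper names and the `1e-5` tolerance are my own choices, not part of the notebook. On the full tensors, `torch.allclose` would serve the same purpose.

```python
import math

# The rows visible in the two Poisson tables above (first five and last four).
unwrapped = [-18.075312, -18.096006, -18.332464, -18.243290, -18.307772,
             -19.268106, -18.950972, -19.028061, -19.610620]
saved = [-18.075312, -18.096006, -18.332464, -18.243290, -18.307772,
         -19.268106, -18.950972, -19.028060, -19.610620]

def max_abs_diff(a, b):
    """Largest elementwise absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def all_close(a, b, abs_tol=1e-5):
    """True if the sequences have equal length and agree within abs_tol."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol) for x, y in zip(a, b)
    )

print(all_close(unwrapped, saved))     # True
print(max_abs_diff(unwrapped, saved))  # about 1e-6
```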


# ZINB unwrapper

In [20]:
from scdesigner.simulators import ZeroInflatedNegBinRegressionSimulator

In [22]:
zinb = ZeroInflatedNegBinRegressionSimulator()
zinb.fit(example_sce, formula)

                                                                                 

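In the zero-inflated case, the dictionary values must be transformed back, because the postprocessor exponentiates the `gamma` values and applies a sigmoid to the `pi` values. Passing `torch.log` and `torch.logit` as the corresponding entries of `transform` undoes both, while `beta` is left unchanged.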
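
As a sanity check on the pairing, each entry of `transform` should be the inverse of what the postprocessor applied to that parameter. The sketch below uses plain Python scalars and hypothetical raw values; the `forward` mapping is my assumption about the postprocessor based on the description above, not code taken from the library.

```python
import math

def identity(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

# Assumed postprocessor transforms, and the inverses passed to unwrapper.
forward = {"beta": identity, "gamma": math.exp, "pi": sigmoid}
inverse = [identity, math.log, logit]  # same order as the dictionary keys

raw = {"beta": -9.4, "gamma": 2.3, "pi": 0.7}  # hypothetical raw values
stored = {k: forward[k](v) for k, v in raw.items()}
recovered = {k: inv(stored[k]) for inv, k in zip(inverse, stored)}

for k in raw:
    print(k, raw[k], recovered[k])
```

The order of `inverse` has to match the order in which the keys are visited, which is why `transform` is indexed by position.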

Here we compare `beta`:

In [48]:
params = pd.DataFrame(unwrapper(zinb.params, transform = [lambda x: x, torch.log, torch.logit]).numpy())
params[0:600]

Unnamed: 0,0
0,-9.406925
1,-9.182685
2,-5.033072
3,-6.829961
4,-8.091279
...,...
595,-8.027226
596,-8.926693
597,-9.738951
598,0.481444


In [49]:
df = pd.read_csv('data/zinb.csv', header=None)
df[0:600]

Unnamed: 0,0
0,-9.406925
1,-9.182685
2,-5.033072
3,-6.829961
4,-8.091279
...,...
595,-8.027226
596,-8.926693
597,-9.738951
598,0.481444


`gamma`:

In [50]:
params[600:700]

Unnamed: 0,0
600,2.314165
601,2.346639
602,1.957201
603,2.072699
604,1.939836
...,...
695,3.118830
696,2.990576
697,3.128190
698,5.410685


In [51]:
df[600:700]

Unnamed: 0,0
600,2.314165
601,2.346639
602,1.957202
603,2.072699
604,1.939835
...,...
695,3.118830
696,2.990576
697,3.128190
698,5.410685


and `pi`:

In [52]:
params[700:]

Unnamed: 0,0
700,11.972087
701,11.881935
702,13.862943
703,13.377435
704,11.044529
...,...
795,11.535656
796,11.442565
797,10.836420
798,-20.379805


In [53]:
df[700:]

Unnamed: 0,0
700,11.977303
701,11.877859
702,13.899811
703,13.413737
704,11.041703
...,...
795,11.538200
796,11.446466
797,10.838930
798,-20.379805
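
Unlike `beta` and `gamma`, the `pi` rows agree only to about two decimals, except for the large negative value in the last row. One plausible cause, which I have not verified against the library, is single-precision saturation: for a logit near 12 the sigmoid is within about 1e-5 of 1, where float32 values are spaced roughly 6e-8 apart, so `1 - p` keeps only about two significant digits. The sketch below emulates float32 storage with `struct` instead of torch, so it shows the size of the effect rather than reproducing the exact numbers above.

```python
import math
import struct

def to_float32(x):
    """Round a float64 to the nearest float32 (emulates single-precision storage)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def round_trip_error(x, single_precision):
    """|logit(sigmoid(x)) - x|, optionally storing the probability as float32."""
    p = sigmoid(x)
    if single_precision:
        p = to_float32(p)
    return abs(logit(p) - x)

# Two of the saved pi logits shown above: one large positive, one large negative.
errors = {
    x: (round_trip_error(x, False), round_trip_error(x, True))
    for x in (11.977303, -20.379805)
}
for x, (err64, err32) in errors.items():
    print(f"x={x}: float64 error={err64:.2e}, float32 error={err32:.2e}")
```

Under this assumption the positive logit loses accuracy in the third decimal, while the negative one survives, because probabilities near 0 keep their relative precision in float32. That matches the pattern in the tables, where only the last row agrees exactly.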
