# Usage

The following shows how to use the package. For now, installation is through GitHub (a PyPI release is planned).

In [None]:
!pip install git+https://github.com/jhaberbe/chinese_restaurant_process

Collecting git+https://github.com/jhaberbe/crp
  Cloning https://github.com/jhaberbe/crp to /private/var/folders/2n/j06nrn2n7r524t776sngh0xr0000gr/T/pip-req-build-u3m05wha
  Running command git clone --filter=blob:none --quiet https://github.com/jhaberbe/crp /private/var/folders/2n/j06nrn2n7r524t776sngh0xr0000gr/T/pip-req-build-u3m05wha
  Resolved https://github.com/jhaberbe/crp to commit b35ae312829e2391ae6ecbdbf1169f2cd8858fc6
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m


# Using the module

The user-facing entry point is the `ChineseRestaurantProcess` object. Import it, pass in the matrix you want to perform inference on, and call `.run`.

You can fit on a subset of the dataset and still get fairly robust results. In my experience 1/50th is enough; the example below uses every 100th row.

In [None]:
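import scanpy as sc
from crp.dirichlet import ChineseRestaurantProcess

# Load the dataset and keep rows (cells) whose values sum to more than 100.
# .copy() turns the filtered view into a standalone AnnData object,
# so columns can be added to adata.obs without triggering a view warning.
adata = sc.read_h5ad("/Users/jameshaberberger/GitHub/chinese-restaurant-process/data/adata.h5ad")
adata = adata[adata.X.sum(axis=1) > 100].copy()

# Fit on every 100th row; the sparse matrix is densified before being passed in.
crp = ChineseRestaurantProcess(adata[::100].X.todense())
crp.run(epochs=10)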

100%|██████████| 1232/1232 [00:01<00:00, 657.70it/s]
100%|██████████| 1232/1232 [00:02<00:00, 446.48it/s]
100%|██████████| 1232/1232 [00:02<00:00, 419.64it/s]
100%|██████████| 1232/1232 [00:02<00:00, 413.71it/s]
100%|██████████| 1232/1232 [00:02<00:00, 416.21it/s]
100%|██████████| 1232/1232 [00:02<00:00, 414.19it/s]
100%|██████████| 1232/1232 [00:02<00:00, 426.59it/s]
100%|██████████| 1232/1232 [00:02<00:00, 423.19it/s]
100%|██████████| 1232/1232 [00:02<00:00, 422.40it/s]
100%|██████████| 1232/1232 [00:02<00:00, 424.26it/s]
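The cell above takes every 100th row, which works when the row order carries no structure. If the rows are sorted (for example by sample or batch), a random subset may be a safer choice. The following is a minimal sketch of that alternative; `subsample_rows` is a hypothetical helper written for illustration, not part of the `crp` package, and the row count is simply the one reported in the prediction output of this notebook.

```python
import numpy as np


def subsample_rows(n_rows: int, fraction: float, seed: int = 0) -> np.ndarray:
    """Return sorted row indices for a random subset of the rows.

    Hypothetical helper for illustration; not part of the crp package.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = np.random.default_rng(seed)
    # Keep at least one row, even for very small fractions.
    n_keep = max(1, int(round(n_rows * fraction)))
    # Sample without replacement, then sort so the original row order is kept.
    return np.sort(rng.choice(n_rows, size=n_keep, replace=False))


# Roughly 1/50th of the rows.
idx = subsample_rows(n_rows=123130, fraction=1 / 50)
```

With an AnnData object, the indices could then be used in place of the stride, for example `adata[idx].X.todense()`.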


# Label Transfer

Once the classes have been inferred, assign labels using the `predict` function. It takes a minimum membership argument, so that classes with low membership can be removed from inference. If you prefer, you can set it to zero and get the full Bayesian nonparametric inference experience.

In [7]:
# Predict labels for the full dataset
# You can adjust the `min_membership` parameter to control the minimum membership threshold for assigning labels.
# 1% works well for me. 0% is true CRP, but expect small clusters, which may or may not be representative of reality.
adata.obs["labels"] = crp.predict(adata.X.todense(), min_membership=0.01)

100%|██████████| 123130/123130 [01:04<00:00, 1896.44it/s]
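To get a feel for what a given threshold would exclude, it can help to look at the share of rows held by each class. The sketch below does that for a toy label vector. It assumes the threshold is read as a fraction of rows per class; how `predict` applies `min_membership` internally is not shown in this notebook, and `small_classes` is a hypothetical helper, not part of the `crp` package.

```python
import numpy as np


def small_classes(labels, min_membership: float = 0.01) -> dict:
    """Map each class below the membership threshold to its share of rows.

    Hypothetical helper for illustration; not part of the crp package.
    """
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    fractions = counts / counts.sum()
    return {
        v.item(): float(f)
        for v, f in zip(values, fractions)
        if f < min_membership
    }


# Toy labels: class 2 holds 5 of 1000 rows (0.5%), below a 1% threshold.
toy_labels = [0] * 600 + [1] * 395 + [2] * 5
below = small_classes(toy_labels, min_membership=0.01)
```

Running the same check on the labels from a `min_membership=0` run is one way to judge whether the small classes look meaningful before choosing a threshold.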
