This is a simple tutorial to get you started using toomanycells (à la Python).

If you don't have toomanycells installed, or you have an old version, run the following commands after removing the leading comment character (#). Note that we run the command twice to make sure that the latest version is installed.

In [2]:
#!pip install toomanycells -U
#!pip install toomanycells -U

In [2]:
import os
import pandas as pd
import toomanycells
from toomanycells import TooManyCells as tmc



We are going to download the peripheral blood mononuclear cell (PBMC) data set used in the scanpy [tutorial](https://scanpy.readthedocs.io/en/stable/tutorials/basics/clustering-2017.html).

In [3]:
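storage = "./demo_data"
#Create the storage directory if it does not exist yet.
os.makedirs(storage, exist_ok=True)
#Download the compressed matrices into the storage directory.
!curl https://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz -o demo_data/pbmc3k_filtered_gene_bc_matrices.tar.gz
#Extract the archive inside the storage directory.
!cd demo_data; tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz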

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7443k  100 7443k    0     0  20.4M      0 --:--:-- --:--:-- --:--:-- 20.8M
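
If you prefer to stay in Python instead of calling curl and tar, the same download-and-extract step can be sketched with the standard library. The helper names `fetch` and `unpack` are illustrative only and are not part of toomanycells; the URL is the one used above.

```python
import os
import tarfile
import urllib.request


def fetch(url, dest_path):
    """Download url to dest_path unless that file already exists."""
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    if not os.path.exists(dest_path):
        urllib.request.urlretrieve(url, dest_path)
    return dest_path


def unpack(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names


# Example usage (commented out because it needs network access):
# storage = "./demo_data"
# url = ("https://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/"
#        "pbmc3k_filtered_gene_bc_matrices.tar.gz")
# archive = fetch(url, os.path.join(storage, os.path.basename(url)))
# unpack(archive, storage)
```

Skipping the download when the file is already present makes the cell cheap to re-run.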


Now we load the data using toomanycells.

In [4]:
source_dir = os.path.join(
    storage,
    "filtered_gene_bc_matrices/hg19/")
output = storage
tmc_obj = tmc(source_dir,
              output,
              input_is_matrix_market=True)

Loading data from .mtx file.
Note that we assume the format:
genes=rows and cells=columns.
Loading barcodes.tsv
Loading genes.tsv
AnnData object with n_obs × n_vars = 2700 × 32738
    obs: 'sp_cluster', 'sp_path'
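
The log above says the .mtx file is assumed to store genes as rows and cells as columns, while the resulting AnnData object is cells × genes (2700 × 32738). To make that convention concrete, here is a minimal sketch of a Matrix Market coordinate reader that swaps the two indices while parsing. It is not how toomanycells loads the data; the function name `read_mtx_as_cells_by_genes` and the tiny example file are made up for illustration. In practice you would rely on an existing reader such as `scipy.io.mmread` or `scanpy.read_10x_mtx`.

```python
import io


def read_mtx_as_cells_by_genes(handle):
    """Parse a Matrix Market coordinate file whose rows are genes and whose
    columns are cells. Return (n_cells, n_genes, counts), where counts maps
    (cell_index, gene_index) to the stored value, using 0-based indices."""
    header = handle.readline()
    if not header.startswith("%%MatrixMarket matrix coordinate"):
        raise ValueError("expected a Matrix Market coordinate file")
    line = handle.readline()
    while line.startswith("%"):  # skip comment lines
        line = handle.readline()
    # The size line is: number of rows, number of columns, number of entries.
    n_genes, n_cells, n_entries = (int(x) for x in line.split())
    counts = {}
    for _ in range(n_entries):
        gene, cell, value = handle.readline().split()
        # The file is gene x cell and 1-based; store as cell x gene, 0-based.
        counts[(int(cell) - 1, int(gene) - 1)] = float(value)
    return n_cells, n_genes, counts


# A made-up file with 3 genes, 2 cells and 3 nonzero entries.
example = io.StringIO(
    "%%MatrixMarket matrix coordinate integer general\n"
    "%\n"
    "3 2 3\n"
    "1 1 4\n"
    "3 1 1\n"
    "2 2 7\n"
)
n_cells, n_genes, counts = read_mtx_as_cells_by_genes(example)
print(n_cells, n_genes)  # 2 3
print(counts[(1, 1)])    # 7.0
```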


To facilitate the visualization, we'll annotate our cells with an arbitrary cell annotation file. Conveniently, toomanycells already includes one.

In [5]:
cell_annotations = toomanycells.load_metadata_for_demo()

In [6]:
cell_annotations

Unnamed: 0,label
AAACATACAACCAC-1,D
AAACATTGAGCTAC-1,E
AAACATTGATCAGC-1,A
AAACCGTGCTTCCG-1,B
AAACCGTGTATGCG-1,G
...,...
TTTCGAACTCTCAT-1,B
TTTCTACTGAGGCA-1,I
TTTCTACTTCCTCG-1,E
TTTGCATGAGAGGC-1,E


In [7]:
#Column containing the cell annotations.
ca_column = "cell_annotations"
tmc_obj.update_cell_annotations(cell_annotations,
                                ca_column)

In [8]:
tmc_obj.run_spectral_clustering()

Normalizing rows.
The first iterations are typically slow.
However, they tend to become faster as 
the size of the partition becomes smaller.
Note that the number of iterations is
only an estimate.


100%|█████████████████████████████| 73/73 [00:21<00:00,  3.40it/s]

Elapsed time for clustering: 21.55 seconds.





In [9]:
tmc_obj.store_outputs()

DiGraph with 73 nodes and 72 edges
Elapsed time for plotting: 0.20 seconds.


Note that the AnnData object now possesses two new columns. The first one is "sp_cluster", which indicates the cluster (leaf node) each cell belongs to. The second is "sp_path", which records the path between the root node 0 and that leaf node. In the output below, each path is written from the leaf back to the root, e.g. 72/60/29/14/6/2/0.

In [10]:
tmc_obj.A.obs

cell,sp_cluster,sp_path,cell_annotations
AAACATACAACCAC-1,72,72/60/29/14/6/2/0,D
AAACATTGAGCTAC-1,56,56/27/13/6/2/0,E
AAACATTGATCAGC-1,65,65/48/23/11/5/2/0,A
AAACCGTGCTTCCG-1,35,35/17/8/3/1/0,B
AAACCGTGTATGCG-1,64,64/32/15/7/3/1/0,G
...,...,...,...
TTTCGAACTCTCAT-1,43,43/21/10/4/1/0,B
TTTCTACTGAGGCA-1,56,56/27/13/6/2/0,I
TTTCTACTTCCTCG-1,57,57/28/13/6/2/0,E
TTTGCATGAGAGGC-1,47,47/23/11/5/2/0,E
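
The "sp_path" values are convenient for simple tree queries. The sketch below assumes each path is a string of node ids separated by "/" with the leaf first, as displayed in the table above. The helper names `root_to_leaf` and `deepest_shared_node` are illustrative and not part of toomanycells. It reverses a path so that it starts at the root, and finds the deepest node that two cells have in common.

```python
def root_to_leaf(sp_path):
    """Turn 'leaf/.../root' into a list of node ids ordered root first."""
    return [int(node) for node in reversed(sp_path.split("/"))]


def deepest_shared_node(path_a, path_b):
    """Return the last node that two root-to-leaf paths have in common."""
    shared = None
    for a, b in zip(root_to_leaf(path_a), root_to_leaf(path_b)):
        if a != b:
            break
        shared = a
    return shared


# Values copied from the table above.
paths = {
    "AAACATACAACCAC-1": "72/60/29/14/6/2/0",
    "AAACATTGAGCTAC-1": "56/27/13/6/2/0",
    "TTTCTACTTCCTCG-1": "57/28/13/6/2/0",
    "AAACCGTGCTTCCG-1": "35/17/8/3/1/0",
}

print(root_to_leaf(paths["AAACATACAACCAC-1"]))
# [0, 2, 6, 14, 29, 60, 72]
print(deepest_shared_node(paths["AAACATTGAGCTAC-1"],
                          paths["TTTCTACTTCCTCG-1"]))
# 13
print(deepest_shared_node(paths["AAACATACAACCAC-1"],
                          paths["AAACCGTGCTTCCG-1"]))
# 0
```

Two cells whose deepest shared node is far from the root were separated late in the tree, while a shared node of 0 means they were split at the very first partition.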


After running the following block, open your browser and go to localhost:9987.

In [None]:
#Path to your local copy of too-many-cells-interactive (adjust it to your system)
path_to_tmci = "/Users/javier/Documents/too-many-cells-interactive"
#Column for cell annotations
ca_column = "cell_annotations"
#Port to use
port = 9987
tmc_obj.visualize_with_tmc_interactive(path_to_tmci,
                                       ca_column,
                                       port)

Once the app is running, just type in your browser 
        localhost:9987
The app will start loading after pressing Enter.
Press Enter to continue ...
#0 building with "desktop-linux" instance using docker driver

#1 [node internal] load build definition from Dockerfile
#1 transferring dockerfile: 1.29kB done
#1 DONE 0.0s

#2 [node internal] load metadata for docker.io/library/node:18.7-buster
#2 DONE 0.2s

#3 [node internal] load .dockerignore
#3 transferring context: 90B done
#3 DONE 0.0s

#4 [node  1/15] FROM docker.io/library/node:18.7-buster@sha256:347e3f2e173cfdc2b2dda9f2e5ca43eea4fb470b03bc54a07c8968742e2ba337
#4 DONE 0.0s

#5 [node internal] load build context
#5 transferring context: 4.51kB done
#5 DONE 0.0s

#6 [node 14/15] WORKDIR /usr/app/
#6 CACHED

#7 [node  6/15] COPY ./react ./react
#7 CACHED

#8 [node  8/15] RUN chown -R 501:20 /usr/app
#8 CACHED

#9 [node  4/15] RUN id -u 501 || useradd --create-home --shell /bin/bash -g 20 -u 501 tmc-user
#9 CACHED

#10 [node  3/15]

 Container too-many-cells-interactive-postgres-1  Running


TRUNCATE TABLE
yarn run v1.22.19
$ node dist/server.js
The app is running!
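
If the page does not load, it can help to check whether anything is listening on the chosen port. The sketch below uses only the standard library; `port_is_open` is an illustrative helper, not part of toomanycells. Since the cell above appears to keep running while the app is up, you would likely run this check from a separate Python session.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage with the port chosen above:
# if port_is_open("localhost", 9987):
#     import webbrowser
#     webbrowser.open("http://localhost:9987")
```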


That's all folks.
Javier Ruiz-Ramirez @ UHN April 2024