# TreeAlign with clone labels as input
## Introduction
TreeAlign can also take three inputs: 1. a gene by cell `scRNA count matrix`, 2. a gene by cell `scDNA copy number matrix`, and 3. a `clone label` table that gives the clone label of each cell in the scDNA data. TreeAlign then assigns each cell in the scRNA count matrix to one of the scDNA clones.
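To make the expected layout concrete, here is a minimal sketch that builds toy versions of the three inputs with pandas. It assumes, as the loading code below states, that both matrices have genes as rows and cells as columns, and that the clone table has `cell_id` and `clone_id` columns. The gene names, cell names, and values are made up for illustration, and the `toy_` names are hypothetical.

```python
import pandas as pd

genes = ["GENE1", "GENE2", "GENE3"]  # hypothetical gene names

# scRNA read counts: rows are genes, columns are scRNA cells
toy_expr = pd.DataFrame(
    [[5, 0, 3, 2], [1, 4, 0, 7], [2, 2, 6, 1]],
    index=genes,
    columns=["rna_1", "rna_2", "rna_3", "rna_4"],
)

# scDNA copy numbers: rows are genes, columns are scDNA cells
toy_cnv = pd.DataFrame(
    [[2, 2, 3], [2, 4, 4], [1, 2, 2]],
    index=genes,
    columns=["dna_1", "dna_2", "dna_3"],
)

# clone labels: one row per scDNA cell
toy_clone = pd.DataFrame({
    "cell_id": ["dna_1", "dna_2", "dna_3"],
    "clone_id": ["A", "B", "B"],
})

print(toy_expr.shape, toy_cnv.shape, toy_clone.shape)
```

Note that the scRNA and scDNA matrices describe different cells, so only their gene (row) labels are expected to overlap.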

## Loading data

```python
from treealign import CloneAlignClone

import pandas as pd

# scRNA read count matrix: each row is a gene,
# each column is a cell
expr = pd.read_csv("../data/example_expr.csv", index_col=0)

# scDNA copy number matrix: each row is a gene,
# each column is a cell
cnv = pd.read_csv("../data/example_gene_cnv.csv", index_col=0)

# clone label for each cell in the scDNA data
clone = pd.read_csv("../data/example_cell_clone.csv")
```

```python
clone
```

|     | cell_id | clone_id |
|-----|---------|----------|
| 0 | 081LA_DLP_UNSORTED-128673A-R14-C25 | C |
| 1 | 081LA_DLP_UNSORTED-128673A-R14-C44 | A |
| 2 | 081LA_DLP_UNSORTED-128673A-R14-C46 | A |
| 3 | 081LA_DLP_UNSORTED-128673A-R14-C60 | B |
| 4 | 081LA_DLP_UNSORTED-128673A-R15-C12 | C |
| ... | ... | ... |
| 175 | 081LA_DLP_UNSORTED-128673A-R32-C49 | C |
| 176 | 081LA_DLP_UNSORTED-128673A-R32-C50 | A |
| 177 | 081LA_DLP_UNSORTED-128673A-R32-C54 | A |
| 178 | 081LA_DLP_UNSORTED-128673A-R32-C56 | C |
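
Before running TreeAlign it can help to check that the three tables line up. The helper below, `check_inputs`, is a hypothetical sketch and not part of the treealign package. It reports how many genes `expr` and `cnv` share, which scDNA cells in `cnv` have no clone label, and how many labelled cells each clone has. The `min_cells=20` default mirrors the `min_clone_cell_count=20` default listed by `help(CloneAlignClone)`; whether TreeAlign filters small clones in exactly this way is an assumption. On the loaded data you would call `check_inputs(expr, cnv, clone)`; the demo uses made-up toy tables.

```python
import pandas as pd

def check_inputs(expr, cnv, clone, min_cells=20):
    """Hypothetical pre-flight check for TreeAlign-style inputs."""
    shared_genes = expr.index.intersection(cnv.index)
    # scDNA cells (cnv columns) missing from the clone table
    unlabelled = cnv.columns.difference(clone["cell_id"])
    clone_sizes = clone["clone_id"].value_counts()
    small_clones = clone_sizes[clone_sizes < min_cells].index.tolist()
    return {
        "n_shared_genes": len(shared_genes),
        "unlabelled_cells": list(unlabelled),
        "clone_sizes": clone_sizes.to_dict(),
        "small_clones": sorted(small_clones),
    }

# toy inputs: genes as rows, cells as columns
toy_expr = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                        index=["G1", "G2", "G3"], columns=["r1", "r2"])
toy_cnv = pd.DataFrame([[2, 3, 2], [2, 2, 4]],
                       index=["G1", "G2"], columns=["d1", "d2", "d3"])
toy_clone = pd.DataFrame({"cell_id": ["d1", "d2"], "clone_id": ["A", "A"]})

report = check_inputs(toy_expr, toy_cnv, toy_clone)
print(report)
```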


## Running TreeAlign with clone labels

```python
# construct the CloneAlignClone object for data preprocessing
# (repeat=1 overrides the default repeat=10)
obj = CloneAlignClone(expr, cnv, clone, repeat=1)

# run TreeAlign to assign scRNA cells to the scDNA clones
obj.assign_cells_to_clones()
```

```
Start run clonealign for 3 clones:
cnv gene count: 498
expr cell count: 1000
Start Inference.

..................................................................................................................................................ELBO converged at iteration 147
```

```
(    cell_id clone_id
 0         0        C
 1         1        C
 2         2        A
 3         3        C
 4         4        B
 ..      ...      ...
 995     995        A
 996     996        B
 997     997        B
 998     998        A
 999     999        B
 
 [1000 rows x 2 columns],
          gene  gene_type_score
 0      NMNAT1     9.961327e-01
 1    C1orf127     4.503274e-10
 2      MAD2L2     9.990683e-01
 3    KIAA2013     9.865901e-01
 4       PLOD1     9.247180e-01
 ..        ...              ...
 493      ASPN     8.546441e-03
 494     CENPP     9.996878e-01
 495      ECM2     6.237036e-08
 496        GK     1.954867e-02
 497      TAB3     9.925644e-01
 
 [498 rows x 2 columns],
      0
 0    C
 1    C
 2    A
 3    C
 4    B
 ..  ..
 995  A
 996  B
 997  B
 998  A
 999  B
 
 [1000 rows x 1 columns],
                      0
 NMNAT1    9.961327e-01
 C1orf127  4.503274e-10
 MAD2L2    9.990683e-01
 KIAA2013  9.865901e-01
 PLOD1     9.247180e-01
 ...                ...
 ASPN
```

```python
# to view more details about the parameters you can customize when you run TreeAlign
help(CloneAlignClone)
```

```
Help on class CloneAlignClone in module treealign.clonealign_clone:

class CloneAlignClone(treealign.clonealign.CloneAlign)
 |  Method resolution order:
 |      CloneAlignClone
 |      treealign.clonealign.CloneAlign
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, expr, cnv, clone, normalize_cnv=True, cnv_cutoff=10, model_select='gene', repeat=10, min_clone_cell_count=20, min_clone_assign_prob=0.8, min_clone_assign_freq=0.7, min_consensus_gene_freq=0.2, max_temp=1.0, min_temp=0.5, anneal_rate=0.01, learning_rate=0.1, max_iter=400, rel_tol=5e-05)
 |      initialize CloneAlignClone object
 |      :param expr: expr read count matrix. row is gene, column is cell. (pandas.DataFrame)
 |      :param cnv: cnv matrix. row is gene, column is cell. (pandas.DataFrame)
 |      :param clone: groupings of cnv cells. (pandas.DataFrame)
 |      :param normalize_cnv: whether to normalized cnv matrix by min or not. (bool)
 |      :param cnv_cutoff: set cnv higher than cnv_c
```

## Getting results
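`obj.generate_output()` returns four objects. The two main outputs of TreeAlign are: 1. `clone_assign`, a table giving the clone to which each cell in the scRNA data is assigned, and 2. `s_score`, a table giving each gene a score between 0 and 1 that reflects the probability of a dosage effect, i.e. that the gene's expression tracks its copy number. The remaining two objects, `clone_assign_raw` and `s_score_raw`, are not used below.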

```python
clone_assign, s_score, clone_assign_raw, s_score_raw = obj.generate_output()
```

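```python
# clone assignment for each cell in the scRNA data
clone_assign
```

|     | cell_id | clone_id |
|-----|---------|----------|
| 0 | 0 | C |
| 1 | 1 | C |
| 2 | 2 | A |
| 3 | 3 | C |
| 4 | 4 | B |
| ... | ... | ... |
| 995 | 995 | A |
| 996 | 996 | B |
| 997 | 997 | B |
| 998 | 998 | A |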
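
A common next step is to see how the scRNA cells are distributed across clones. The sketch below uses a hypothetical helper, `summarize_assignments`, on a made-up table with the same `cell_id` and `clone_id` columns as `clone_assign`; on the real output you would call `summarize_assignments(clone_assign)`.

```python
import pandas as pd

def summarize_assignments(assign):
    """Count scRNA cells per clone and their fraction of all assigned cells."""
    counts = assign["clone_id"].value_counts().sort_index()
    return pd.DataFrame({
        "n_cells": counts,
        "fraction": (counts / counts.sum()).round(3),
    })

# made-up stand-in with the same columns as clone_assign
toy_assign = pd.DataFrame({
    "cell_id": [0, 1, 2, 3, 4, 5],
    "clone_id": ["C", "C", "A", "C", "B", "A"],
})

summary = summarize_assignments(toy_assign)
print(summary)
```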


```python
# the probability of having dosage effects for each gene
s_score
```

|     | gene | gene_type_score |
|-----|------|-----------------|
| 0 | NMNAT1 | 9.961327e-01 |
| 1 | C1orf127 | 4.503274e-10 |
| 2 | MAD2L2 | 9.990683e-01 |
| 3 | KIAA2013 | 9.865901e-01 |
| 4 | PLOD1 | 9.247180e-01 |
| ... | ... | ... |
| 493 | ASPN | 8.546441e-03 |
| 494 | CENPP | 9.996878e-01 |
| 495 | ECM2 | 6.237036e-08 |
| 496 | GK | 1.954867e-02 |
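
Because most scores sit close to 0 or 1, a simple cutoff separates genes with a likely dosage effect from the rest. The sketch below uses a hypothetical helper, `split_genes`, with a cutoff of 0.5; that threshold is an illustrative assumption, not a value prescribed by TreeAlign. The rows are copied from the `s_score` table above; on the real output you would call `split_genes(s_score)`.

```python
import pandas as pd

def split_genes(scores, cutoff=0.5):
    """Split genes by gene_type_score into (likely dosage effect, likely none)."""
    mask = scores["gene_type_score"] > cutoff
    return (scores.loc[mask, "gene"].tolist(),
            scores.loc[~mask, "gene"].tolist())

# a few rows copied from the s_score table above
toy_scores = pd.DataFrame({
    "gene": ["NMNAT1", "C1orf127", "MAD2L2", "ASPN", "ECM2", "GK"],
    "gene_type_score": [9.961327e-01, 4.503274e-10, 9.990683e-01,
                        8.546441e-03, 6.237036e-08, 1.954867e-02],
})

dosage, other = split_genes(toy_scores)
print("likely dosage effect:", dosage)  # ['NMNAT1', 'MAD2L2']
print("likely no dosage effect:", other)
```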
