# TreeAlign with allele-specific information
## Introduction
TreeAlign is a model for integrating scRNA and scDNA data. It can take total copy number information, allele-specific copy number information, or both, and uses them to assign cells from scRNA data to a phylogenetic tree constructed from scDNA data.

## Loading data

In [1]:
from treealign import CloneAlignClone
from treealign import CloneAlignTree

import pandas as pd
from Bio import Phylo

In [2]:
# load total copy number input

# scRNA read count matrix where each row represents a gene, 
# each column represents a cell
expr = pd.read_csv("../data/example_expr.csv", index_col=0)

# scDNA copy number matrix where each row represents a gene,
# each column represents a cell
cnv = pd.read_csv("../data/example_gene_cnv.csv", index_col=0)
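Both matrices are indexed by gene, so checking how many gene identifiers they share can catch mismatched naming before any model fitting. The following is a minimal sketch on toy data: the gene names, cell names, and counts are made up for illustration, and TreeAlign may apply its own filtering internally.

```python
import pandas as pd

# Toy stand-ins for the expr and cnv matrices (genes x cells); all values are made up.
expr_toy = pd.DataFrame(
    {"rna_cell_1": [5, 0, 12], "rna_cell_2": [3, 1, 7]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)
cnv_toy = pd.DataFrame(
    {"dna_cell_1": [2, 3, 1], "dna_cell_2": [2, 4, 1]},
    index=["GENE_B", "GENE_C", "GENE_D"],
)

# Genes present in both matrices.
shared_genes = expr_toy.index.intersection(cnv_toy.index)
print(f"{len(shared_genes)} shared genes: {list(shared_genes)}")

# Restrict both matrices to the shared genes for a like-for-like comparison.
expr_shared = expr_toy.loc[shared_genes]
cnv_shared = cnv_toy.loc[shared_genes]
```

On the real inputs, the same check would be `expr.index.intersection(cnv.index)`. A very small overlap usually points to different gene identifier conventions in the two files.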

In [3]:
# load allele specific input

# b allele frequency matrix
# each row represents a snp
# each column represents a cell
# The number in the matrix is the b allele frequency at the given snp and cell
hscn = pd.read_csv("../data/example_snp_baf.csv", index_col=0)

# reference allele count matrix from scRNA
# each row represents a snp
# each column represents a cell
snv_allele = pd.read_csv("../data/example_snp_allele.csv", index_col=0)

# total count matrix at SNPs from scRNA
# each row represents a snp
# each column represents a cell
snv_total = pd.read_csv("../data/example_snp_total.csv", index_col=0)
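The two scRNA SNP matrices are expected to describe the same SNPs and cells, with the reference allele count never exceeding the total count. A quick consistency check, sketched here on toy data, can confirm this; the SNP and cell names are hypothetical, and this check is not part of TreeAlign itself.

```python
import pandas as pd

# Toy stand-ins for snv_allele and snv_total (SNPs x cells); all values are made up.
allele_toy = pd.DataFrame(
    {"cell_1": [2, 0], "cell_2": [1, 3]}, index=["snp_1", "snp_2"]
)
total_toy = pd.DataFrame(
    {"cell_1": [4, 0], "cell_2": [1, 6]}, index=["snp_1", "snp_2"]
)

# Both matrices should cover the same SNPs and the same cells.
assert allele_toy.index.equals(total_toy.index)
assert allele_toy.columns.equals(total_toy.columns)

# Reference allele counts cannot exceed total counts.
assert (allele_toy <= total_toy).all().all()

# Observed reference allele fraction; entries with zero coverage become NaN.
ref_fraction = allele_toy / total_toy.where(total_toy > 0)
print(ref_fraction)
```

Entries with zero total coverage are left as NaN rather than 0, so that uncovered SNPs are not mistaken for SNPs with no reference reads.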

In [4]:
# load phylogenetic tree used for clone assignment

# phylogenetic tree constructed with scDNA data in newick format
tree = Phylo.read("../data/example_hdbscan.newick", "newick")
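Before running the model, it can help to look at how many cells sit under each child of the root, since TreeAlign works on clades of the tree. The sketch below parses a small made-up Newick string with Biopython; the cell names are hypothetical, and the real tree would come from the `Phylo.read` call above.

```python
from io import StringIO

from Bio import Phylo

# A tiny made-up tree with five leaves (scDNA cells).
newick_toy = "((cell_a,cell_b),(cell_c,(cell_d,cell_e)));"
tree_toy = Phylo.read(StringIO(newick_toy), "newick")

# Total number of leaves in the tree.
n_cells = tree_toy.count_terminals()
print(f"tree has {n_cells} terminals")

# Number of leaves under each child clade of the root.
child_sizes = [child.count_terminals() for child in tree_toy.root.clades]
print(f"child clade sizes at the root: {child_sizes}")
```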

## Running TreeAlign with tree

In [5]:
# construct CloneAlignTree object for data preprocessing
# run TreeAlign with both total copy number & allele specific datasets

# `repeat` is set to 1 here for demonstration purposes; in practice, a value larger than 5 is preferable.
obj = CloneAlignTree(tree=tree, expr=expr, cnv=cnv, hscn=hscn, snv_allele=snv_allele, snv=snv_total, repeat=1)

# it is possible to run TreeAlign with total copy number data only
# obj = CloneAlignTree(tree=tree, expr=expr, cnv=cnv, repeat=1)

# it is also possible to run TreeAlign with allele specific data only
# obj = CloneAlignTree(tree=tree, hscn=hscn, snv_allele=snv_allele, snv=snv_total, repeat=1)

# running TreeAlign to assign cells to phylogenetic subclades
obj.assign_cells_to_tree()




Start processing 
At node_0, one of the child clade is node_1 with 341 terminals. 
At node_0, one of the child clade is node_341 with 721 terminals. 
There are 0 genes in matrices. 
Start run clonealign for clade: node_0
hscn snp count: 1355
snv allele matrix cell count: 1000
Start Inference.




ELBO converged at iteration 312
Clonealign finished!
CloneAlign Tree finishes at clade: node_0 with correct frequency 1.0




Start processing 
At node_1, one of the child clade is node_2 with 77 terminals. 
At node_1, one of the child clade is node_78 with 264 terminals. 
There are 0 genes in matrices. 
Start run clonealign for clade: node_1
hscn snp count: 1291
snv allele matrix cell count: 238
Start Inference.
ELBO converged at iteration 240
Clonealign finished!
CloneAlign Tree finishes at clade: node_1 with correct frequency 1.0




Start processing 
At node_2, one of the child clade is node_3 with 26 terminals. 
At node_2, one of the child clade is node_28 with 51 terminals. 
There are 0 genes in matrices. 
Start run clonealign for clade: node_2
hscn snp count: 1242
snv allele matrix cell count: 37
Start Inference.
ELBO converged at iteration 230
Clonealign finished!
CloneAlign Tree finishes at clade: node_2 with correct frequency 1.0




Start processing 
At node_3, there are les

In [None]:
# to view more details about parameters you can customize when you run TreeAlign
help(CloneAlignTree)

## Getting results
The output of TreeAlign includes:

1. a table indicating the subclade to which each cell in the scRNA data is assigned;
2. for each gene, a score between 0 and 1 reflecting dosage effects.

`generate_output()` also returns a third table, `allele_assign_prob_df`.

In [None]:
clone_assign_df, gene_type_score_df, allele_assign_prob_df = obj.generate_output()

In [None]:
# subclade assignment for each cell in scRNA data
clone_assign_df

In [None]:
# the probability of having dosage effects for each gene
gene_type_score_df
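A common next step is to count how many scRNA cells were assigned to each subclade. The sketch below uses a made-up table: the column names `cell_id` and `clone_id` are assumptions for illustration, so check `clone_assign_df.columns` for the actual names before adapting it.

```python
import pandas as pd

# Hypothetical assignment table; the column names "cell_id" and "clone_id"
# are assumptions and may differ from what generate_output() returns.
assign_toy = pd.DataFrame(
    {
        "cell_id": ["c1", "c2", "c3", "c4"],
        "clone_id": ["node_1", "node_1", "node_341", "node_1"],
    }
)

# Number of cells assigned to each subclade.
cells_per_clade = assign_toy["clone_id"].value_counts()
print(cells_per_clade)
```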