# TreeAlign with phylogenetic tree as input
## Introduction
TreeAlign is a model for integrating scRNA and scDNA data. It takes three inputs: (1) a gene-by-cell `scRNA count matrix`, (2) a gene-by-cell `scDNA copy number matrix`, and (3) a `phylogenetic tree` constructed from the scDNA data. TreeAlign assigns each cell in the scRNA count matrix to a subclade of the phylogenetic tree.

## Loading data

In [2]:
from treealign import CloneAlignClone
from treealign import CloneAlignTree

import pandas as pd
from Bio import Phylo

# scRNA read count matrix where each row represents a gene, 
# each column represents a cell
expr = pd.read_csv("../data/example_expr.csv", index_col=0)

# scDNA copy number matrix where each row represents a gene,
# each column represents a cell
cnv = pd.read_csv("../data/example_gene_cnv.csv", index_col=0)

# phylogenetic tree constructed with scDNA data in newick format
tree = Phylo.read("../data/example_phylogeny.newick", "newick")
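
Before fitting, it can help to confirm that the three inputs refer to the same genes and cells. The sketch below is not part of TreeAlign: `summarize_overlap` is a hypothetical helper, and it assumes that gene names index the rows of both matrices and that the tree's leaves are named after the scDNA cell IDs used as columns of the copy number matrix.

```python
def summarize_overlap(expr_genes, cnv_genes, cnv_cells, tree_leaves):
    """Report how well the gene and cell labels of the inputs line up.

    Hypothetical helper for illustration; it only compares labels and
    does not call any TreeAlign code.
    """
    expr_genes, cnv_genes = set(expr_genes), set(cnv_genes)
    cnv_cells, tree_leaves = set(cnv_cells), set(tree_leaves)
    return {
        # genes present in both the expression and copy number matrices
        "shared_genes": len(expr_genes & cnv_genes),
        "genes_only_in_expr": len(expr_genes - cnv_genes),
        "genes_only_in_cnv": len(cnv_genes - expr_genes),
        # scDNA cells without a matching leaf in the tree, and vice versa
        "cnv_cells_missing_from_tree": sorted(cnv_cells - tree_leaves),
        "tree_leaves_missing_from_cnv": sorted(tree_leaves - cnv_cells),
    }


# Toy labels standing in for the real inputs (made up for this example)
report = summarize_overlap(
    expr_genes=["MAD2L2", "PLOD1", "RAB4A"],
    cnv_genes=["PLOD1", "RAB4A", "NOL8"],
    cnv_cells=["cell_a", "cell_b", "cell_c"],
    tree_leaves=["cell_a", "cell_b"],
)
print(report)
```

With the objects loaded above, the same check would presumably be `summarize_overlap(expr.index, cnv.index, cnv.columns, [leaf.name for leaf in tree.get_terminals()])`, provided the leaf names and column labels are stored in the same form.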



## Running TreeAlign with tree

In [None]:
# construct CloneAlignTree object for data preprocessing
obj = CloneAlignTree(expr, cnv, tree, repeat=1, min_gene_diff=50)

# running TreeAlign to assign cells to phylogenetic subclades
obj.assign_cells_to_tree()

In [11]:
# view the parameters you can customize when running TreeAlign
help(CloneAlignTree.__init__)

Help on function __init__ in module treealign.clonealign_tree:

__init__(self, expr, cnv, tree, normalize_cnv=True, cnv_cutoff=10, model_select='gene', repeat=10, min_cell_count_expr=20, min_cell_count_cnv=20, min_gene_diff=300, level_cutoff=10, min_proceed_freq=0.7, min_clone_assign_prob=0.8, min_clone_assign_freq=0.7, min_consensus_gene_freq=0.8, max_temp=1.0, min_temp=0.5, anneal_rate=0.01, learning_rate=0.1, max_iter=400, rel_tol=5e-05)
    initialize CloneAlignTree object
    :param expr: expr read count matrix. row is gene, column is cell. (pandas.DataFrame)
    :param cnv: cnv matrix. row is gene, column is cell. (pandas.DataFrame)
    :param tree: phylogenetic tree of cells (Bio.Phylo.BaseTree.Tree)
    :param normalize_cnv: whether to normalized cnv matrix by min or not. (bool)
    :param cnv_cutoff: set cnv higher than cnv_cutoff to cnv_cutoff. (int)
    :param model_select: "gene" for the extended clonealign model or "default" for the original clonelign model (str)
    :para

## Getting results
TreeAlign produces two outputs: (1) a table giving the subclade to which each cell in the scRNA data is assigned, and (2) for each gene, a score between 0 and 1 reflecting the probability that the gene shows dosage effects.

In [None]:
clone_assign, s_score = obj.generate_output()

In [8]:
# subclade assignment for each cell in scRNA data
clone_assign

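|     | cell_id | clone_id |
| --- | ------- | -------- |
| 0   | 0       | node_37  |
| 1   | 1       | node_37  |
| 2   | 2       | node_89  |
| 3   | 3       | node_37  |
| 4   | 4       | node_9   |
| ... | ...     | ...      |
| 993 | 995     | node_89  |
| 994 | 996     | node_9   |
| 995 | 997     | node_9   |
| 996 | 998     | node_89  |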
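
A quick way to read the assignment table is to count how many cells landed in each subclade. The sketch below uses only the rows printed above rather than the full table, and `cells_per_subclade` is a hypothetical helper, not a TreeAlign function.

```python
from collections import Counter


def cells_per_subclade(clone_ids):
    """Count how many cells were assigned to each subclade label."""
    return dict(Counter(clone_ids))


# clone_id values from the rows displayed above (not the full table)
shown_clone_ids = [
    "node_37", "node_37", "node_89", "node_37", "node_9",
    "node_89", "node_9", "node_9", "node_89",
]

counts = cells_per_subclade(shown_clone_ids)
print(counts)
```

Since the helper accepts any iterable of labels, `cells_per_subclade(clone_assign["clone_id"])` should work on the full table; `clone_assign["clone_id"].value_counts()` is the pandas equivalent.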


In [9]:
# the probability of having dosage effects for each gene
s_score

|     | gene     | gene_type_score |
| --- | -------- | --------------- |
| 0   | C1orf127 | 1.403111e-09    |
| 1   | MAD2L2   | 9.875857e-01    |
| 2   | KIAA2013 | 9.988436e-01    |
| 3   | PLOD1    | 9.949160e-01    |
| 4   | RAB4A    | 9.985823e-01    |
| ... | ...      | ...             |
| 339 | NFIL3    | 9.893179e-01    |
| 340 | NOL8     | 9.928191e-01    |
| 341 | ASPN     | 1.000000e+00    |
| 342 | CENPP    | 9.209363e-01    |
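
One possible way to use these scores is to split genes into those that appear to show dosage effects and those that do not. The sketch below uses only the rows printed above; the 0.5 cutoff and the helper `split_by_dosage_score` are assumptions made for illustration, not a threshold or function defined by TreeAlign.

```python
def split_by_dosage_score(scores, threshold=0.5):
    """Split genes by score; the 0.5 default is an arbitrary assumption."""
    dependent = sorted(g for g, s in scores.items() if s >= threshold)
    independent = sorted(g for g, s in scores.items() if s < threshold)
    return dependent, independent


# gene_type_score values from the rows displayed above (not the full table)
shown_scores = {
    "C1orf127": 1.403111e-09,
    "MAD2L2": 9.875857e-01,
    "KIAA2013": 9.988436e-01,
    "PLOD1": 9.949160e-01,
    "RAB4A": 9.985823e-01,
    "NFIL3": 9.893179e-01,
    "NOL8": 9.928191e-01,
    "ASPN": 1.000000e+00,
    "CENPP": 9.209363e-01,
}

dependent, independent = split_by_dosage_score(shown_scores)
print("likely dosage-dependent:", dependent)
print("likely dosage-independent:", independent)
```

For the full table, the input dictionary could be built with `dict(zip(s_score["gene"], s_score["gene_type_score"]))`.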
