## Make and plot node-specific markers
This notebook takes a user-specified node of the cluster hierarchy, creates a list of marker genes comparing the clusters on the two branches below that node, and plots these markers as ball plots.

In [None]:
import pandas as pd
import numpy as np
import cellstates as cs
from cellstates.chelpers import marker_scores
import scipy.io as sio
import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import pickle as pkl

In [None]:
path='/scicore/home/doetsch/GROUP/scigrp/vargasSingleCell3_2males_added/'
pathold='/scicore/home/doetsch/GROUP/scigrp/vargasSingleCell3/'

In [None]:
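with open("varagasSingleCell3_2males_added.pkl", 'rb') as pklfile:
    # objects must be loaded in the same order in which they were dumped
    df = pkl.load(pklfile)
    clusters = pkl.load(pklfile)
    hierarchy_df = pkl.load(pklfile)
    score_df = pkl.load(pklfile)
    annotation = pkl.load(pklfile)
    n_scale = pkl.load(pklfile)
    lmbd = pkl.load(pklfile)
data = df.to_numpy().astype(int)
geneids = df.index  # gene names used to label the marker scores (assumes genes are the rows of df)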
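
The loading cell relies on the pickle file holding several objects written one after another with repeated `pickle.dump` calls. Each `pkl.load` returns the next object, so the loads must follow the order in which the objects were written. The sketch below illustrates this with an in-memory buffer and made-up objects; it assumes, but does not verify, that the real file was written in the order the cell reads it.

```python
import io
import pickle as pkl

# Hypothetical stand-ins for the objects stored in the real .pkl file.
objects = {"df": [[1, 0], [3, 2]], "clusters": [0, 0, 1], "lmbd": 0.5}

buf = io.BytesIO()  # in-memory file instead of a file on disk
for obj in objects.values():
    pkl.dump(obj, buf)  # each call appends one pickled object

buf.seek(0)
# Each pkl.load returns the next object, so reads must follow the dump order.
loaded = [pkl.load(buf) for _ in objects]

# A further load past the last object raises EOFError.
try:
    pkl.load(buf)
    exhausted = False
except EOFError:
    exhausted = True
```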

In [None]:
clst = cs.Cluster(data, lmbd, clusters, max_clusters=max(clusters)+1, num_threads=12, n_cache=1000)

In [None]:
colordict={"Tom":"#1f77b4","Adam":"#17bec7","Viole":"#e377c2","Ana":"#d62728","Eve":"#a62728","Fiona":"#d41f7d","John":"#4287f5","Melvin":"#03255c"}
colors = list(map(colordict.get, np.unique(annotation)))
print(np.unique(annotation))
cl, clsizes = np.unique(clusters, return_counts=True)
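
`colors` follows the order returned by `np.unique(annotation)`, which is sorted alphabetically, not the order of `colordict`. Any label missing from `colordict` silently maps to `None`. A quick check with a made-up annotation vector ("Zed" is a hypothetical label not in the dictionary):

```python
import numpy as np

# Hypothetical annotation vector (one sample label per cell).
annotation = np.array(["Tom", "Ana", "Tom", "Eve", "Zed"])
colordict = {"Tom": "#1f77b4", "Ana": "#d62728", "Eve": "#a62728"}

labels = np.unique(annotation)             # sorted: ['Ana' 'Eve' 'Tom' 'Zed']
colors = list(map(colordict.get, labels))  # one color per sorted label
# labels without an entry in colordict end up as None in colors
missing = [lab for lab, col in zip(labels, colors) if col is None]
```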

In [None]:
with open("/scicore/home/doetsch/GROUP/scigrp/vargasSingleCell3/utils.py") as f:
    exec(f.read())

This cell cuts the hierarchy at a given level of superclustering and builds the tree `t` that the next cell searches. It also renders the plotHierarchy PDF, which you need if that plot hasn't been made yet. Adjust the `nc` variable in the first line.

In [None]:
nc = 96  # number of (super)clusters at which to cut the hierarchy
merged_clusters = cs.clusters_from_hierarchy(hierarchy_df, cluster_init=clusters, steps=-nc+1)
newick_string = cs.hierarchy_to_newick(hierarchy_df[-nc+1:], merged_clusters, cell_leaves=False)
# Tree, TextFace and get_TreeStyle_attributes are assumed to be provided by utils.py
t = Tree(newick_string, format=1)
ts = get_TreeStyle_attributes(t, merged_clusters, annotation, colors=colors, leaf_scale=0.05, normalize=True, showInternalNodeNames=True)
# label each leaf as "merged<nc>C<i>_<original leaf name>"
name_dict = {leaf: "merged"+str(nc)+"C"+str(i)+"_"+leaf for i, leaf in enumerate(t.iter_leaf_names())}
for l in t.iter_leaves():
    l.add_face(TextFace(name_dict[l.name], fsize=60), column=2)
# ts.show_leaf_name = True
t.render(path+'nb/plots/plotHierarchy'+str(nc)+'.pdf', tree_style=ts)
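
Why `nc` appears as `-nc + 1`: a binary tree with `nc` leaves has `nc - 1` internal nodes, so `hierarchy_df[-nc+1:]` keeps the last `nc - 1` merges, which form the tree drawn above the `nc` superclusters. The toy sketch below replays a merge list until `nc` clusters remain. It assumes the hierarchy is an ordered list of pairwise `(keep, absorb)` merges; `cut_hierarchy` is a hypothetical helper, a simplification rather than the actual `cellstates` data structure or implementation.

```python
def cut_hierarchy(labels, merges, nc):
    """Apply merges in order until only nc clusters remain (hypothetical helper).

    labels: initial cluster label per cell
    merges: ordered list of (keep, absorb) pairs; absorb is relabelled to keep
    """
    labels = list(labels)
    n_clusters = len(set(labels))
    n_steps = n_clusters - nc  # merges to apply; the last nc - 1 stay unapplied
    for keep, absorb in merges[:n_steps]:
        labels = [keep if lab == absorb else lab for lab in labels]
    return labels


# Toy data: 5 initial clusters, so a full hierarchy has 4 merges.
cells = [0, 0, 1, 2, 3, 3, 4]
merges = [(1, 2), (3, 4), (0, 1), (0, 3)]
cut = cut_hierarchy(cells, merges, nc=3)
remaining = merges[len(set(cells)) - 3:]  # the merges forming the tree above 3 leaves
```

With `nc=3` the first two merges are applied, leaving clusters 0, 1 and 3, and the two unapplied merges are exactly the `nc - 1` internal nodes of the tree over those three superclusters.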

Here you specify the node you are interested in. Node names typically look like "I3" and can be looked up in the plotHierarchy PDF. The cell then writes the top 100 markers, ranked by absolute marker score, to a text file.

In [None]:
nodeOfInterest = "I3"
node = t.search_nodes(name=nodeOfInterest)[0]
children = node.get_children()
if len(children) != 2:
    raise ValueError(nodeOfInterest+" is not a binary split ("+str(len(children))+" children)")
# for each child, collect the cluster IDs of all leaves below it
# (strip the one-character prefix from the leaf name)
subs = [[int(leaf.name[1:]) for leaf in child] for child in children]
scores = marker_scores(clst, subs[0], subs[1])
scores = pd.Series(scores, index=geneids)
# rank genes by absolute marker score and keep the top 100
sortedindex = scores.abs().sort_values(ascending=False)[:100].index
with open(path+"nb/plots/"+nodeOfInterest+"_type3_markers.txt", 'w') as split_topmarkers:
    split_topmarkers.write("\n".join(map(str, sortedindex)))
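
The marker cell collects, for each of the node's two children, the cluster IDs of all leaves below it, then ranks genes by the absolute value of their score so that strong markers on either side of the split are kept (assuming the sign of the score indicates which branch a gene marks). A toy version in plain Python and pandas; the nested-tuple tree, the `leaves` helper, the "C<id>" leaf names and the scores are all made up for illustration.

```python
import pandas as pd

# Toy tree: an internal node is a tuple of children, a leaf is a name like "C3".
node = (("C0", ("C1", "C2")), ("C3", "C4"))


def leaves(tree):
    """Yield all leaf names below a (sub)tree."""
    if isinstance(tree, str):
        yield tree
    else:
        for child in tree:
            yield from leaves(child)


if len(node) != 2:
    raise ValueError("not a binary split")
# strip the one-character prefix to get integer cluster IDs per branch
subs = [[int(name[1:]) for name in leaves(child)] for child in node]

# Hypothetical marker scores for five genes.
scores = pd.Series([0.2, -3.1, 1.5, -0.4, 2.7], index=["g1", "g2", "g3", "g4", "g5"])
# keep the 3 genes with the largest |score|, regardless of sign
top = scores.abs().sort_values(ascending=False)[:3].index
```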

This cell makes ball plots of the marker list generated above at a given level of superclustering, which can be adjusted in the first line. Two PDFs are written: a plain one and one with expression added (`addExpression=True`).

In [None]:
superclustering = 96
import os
os.environ['QT_QPA_PLATFORM'] = 'offscreen'  # let Qt render without a display
markerfile = path+"nb/plots/"+nodeOfInterest+"_type3_markers.txt"
prefix = nodeOfInterest+"_type3_markers_"+str(superclustering)
with open(markerfile, 'r') as genelist:
    makeBallPlot(superclustering, genelist, prefix+".pdf", n_scale=n_scale, plotpath=path+"nb/plots/")
with open(markerfile, 'r') as genelist:
    makeBallPlot(superclustering, genelist, prefix+"_expression.pdf", addExpression=True, n_scale=n_scale, plotpath=path+"nb/plots/")
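
`makeBallPlot` is defined in utils.py, and this notebook does not show what it draws. A common form of such a plot is a dot plot in which each ball's size encodes the fraction of cells in a cluster that express a gene and its colour encodes mean expression. The sketch below draws one from a toy count matrix; it is a hypothetical illustration of that plot type, not the utils.py implementation, and the data and gene names are made up.

```python
import io

import matplotlib
matplotlib.use("agg")  # headless backend, as in the notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical toy counts: 3 genes x 6 cells, and a cluster label per cell.
counts = np.array([[0, 2, 3, 0, 0, 1],
                   [5, 4, 0, 0, 1, 0],
                   [0, 0, 0, 6, 7, 5]])
cell_clusters = np.array([0, 0, 0, 1, 1, 1])
genes = ["g1", "g2", "g3"]

cluster_ids = np.unique(cell_clusters)
# genes x clusters: fraction of cells with nonzero counts, and mean count
frac = np.stack([(counts[:, cell_clusters == c] > 0).mean(axis=1) for c in cluster_ids], axis=1)
mean = np.stack([counts[:, cell_clusters == c].mean(axis=1) for c in cluster_ids], axis=1)

fig, ax = plt.subplots()
# one ball per (gene, cluster): x = cluster, y = gene row
x, y = np.meshgrid(cluster_ids, np.arange(len(genes)))
sc = ax.scatter(x.ravel(), y.ravel(), s=300 * frac.ravel(), c=mean.ravel(), cmap="viridis")
ax.set_xticks(cluster_ids)
ax.set_yticks(np.arange(len(genes)))
ax.set_yticklabels(genes)
ax.set_xlabel("cluster")
fig.colorbar(sc, ax=ax, label="mean count")
# write to an in-memory buffer; pass a file path to save a PDF on disk instead
fig.savefig(io.BytesIO(), format="pdf")
```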