### Notebook for running SCOT on SNARE-seq Cell Mixture Data
Access to the raw dataset: Gene Expression Omnibus accession no. GSE126074  
The SNARE-seq data in the `/data` folder contains the version with the dimensionality reduction techniques from the original SNARE-seq paper applied (https://www.nature.com/articles/s41587-019-0290-0)  
The SCOT software was updated on 20 September 2020. It now outputs error statements for convergence issues. When it runs into numerical instabilities during convergence, it returns `None, None` instead of `X_new, y_new`. If you run into such an error, please try a larger epsilon value for the entropic regularization.  
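
The advice about retrying with a larger epsilon can be automated. The sketch below is only an illustration: `align_with_epsilon_backoff`, its `factor` and `max_tries` parameters, and the stand-in `fake_align` are hypothetical names that are not part of SCOT. It assumes only what is stated above, namely that a failed run returns `None, None`.

```python
import numpy as np

def align_with_epsilon_backoff(align_fn, X, y, k, e, factor=2.0, max_tries=5):
    """Call align_fn(X, y, k, e), multiplying e by `factor` after each failed run.

    A run counts as failed when align_fn returns (None, None).
    Returns the aligned pair together with the epsilon that worked.
    """
    for _ in range(max_tries):
        X_new, y_new = align_fn(X, y, k, e)
        if X_new is not None and y_new is not None:
            return X_new, y_new, e
        e *= factor  # larger entropic regularization for the next attempt
    raise RuntimeError("No convergence after %d attempts" % max_tries)

# Hypothetical stand-in for an aligner: it "fails" for small epsilon values.
def fake_align(X, y, k, e):
    if e < 8e-4:
        return None, None
    return X, y

X_demo = np.zeros((4, 2))
y_demo = np.ones((4, 3))
X_al, y_al, e_used = align_with_epsilon_backoff(fake_align, X_demo, y_demo, k=2, e=5e-4)
print("epsilon that converged:", e_used)
```

In this notebook, `align_fn` could be a small wrapper such as `lambda X, y, k, e: scot(X, y, k, e, mode="connectivity", metric="correlation")`.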

**Note** This version of the notebook runs a new setting for SCOT: correlation is used as the metric for building the kNN graphs, and the connectivity information from these graphs goes into the adjacency matrices fed to the optimal transport algorithm.  
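
To make the graph construction in the note concrete, here is a minimal NumPy sketch of a kNN connectivity matrix built from correlation distance (1 minus the Pearson correlation between cells). It is an illustration under stated assumptions, not SCOT's own code: the function name `correlation_knn_connectivity` is hypothetical, and SCOT's internals may differ, for example in how self-neighbors or graph symmetry are handled.

```python
import numpy as np

def correlation_knn_connectivity(data, k):
    """Return an (n, n) 0/1 matrix whose row i marks the k nearest cells to cell i.

    Distance is 1 - Pearson correlation between rows (cells) of `data`.
    The result is directed: row i lists the neighbors of cell i.
    """
    dist = 1.0 - np.corrcoef(data)      # pairwise correlation distance
    np.fill_diagonal(dist, np.inf)      # a cell is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    n = data.shape[0]
    adj = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbors.ravel()] = 1    # connectivity only, no edge weights
    return adj

rng = np.random.default_rng(0)
toy_cells = rng.normal(size=(30, 10))   # 30 synthetic cells, 10 features
toy_adj = correlation_knn_connectivity(toy_cells, k=5)
print(toy_adj.shape, toy_adj.sum(axis=1)[:5])
```

Because correlation ignores per-cell offset and scale, a graph built this way depends on the shape of each cell's feature profile rather than on its magnitude.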

If you have any questions, e-mail: ritambhara@brown.edu, pinar_demetci@brown.edu, rebecca_santorella@brown.edu  

In [1]:
import numpy as np  # used directly below (np.load, np.exp, np.mean)
import src.utils as ut
import src.evals as evals
from src.scot import *

In [12]:
X=np.exp(np.load("data/scrna_feat.npy")) #Unlike the other notebook, we don't need to correct the log transformation. 
# log transformation makes little difference when considering correlations in kNN graphs as opposed to Euclidean distances.
y=np.load("data/scatac_feat.npy")
print("Dimensions of input datasets are: ", "X= ", X.shape, " y= ", y.shape)

Dimensions of input datasets are:  X=  (1047, 10)  y=  (1047, 19)


In [13]:
X=ut.unit_normalize(X)
y=ut.unit_normalize(y)

## If you'd like to apply z-score normalization instead:
# X=ut.zscore_standardize(X)
# y=ut.zscore_standardize(y)
# Note that zscore_standardize doesn't yield results as good as unit normalization on this dataset,
# and the MMD-MA and UnionCom comparisons also used unit (L2) normalization.
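
For readers who want to see what the two normalization options do, the sketch below reimplements them in plain NumPy. It assumes that unit normalization rescales each cell (row) to unit L2 norm and that z-scoring standardizes each feature (column). The names `l2_normalize_rows` and `zscore_columns` are hypothetical, and the behavior of `ut.unit_normalize` and `ut.zscore_standardize` may differ in detail.

```python
import numpy as np

def l2_normalize_rows(data):
    """Scale every row (cell) so that its Euclidean norm is 1."""
    norms = np.linalg.norm(data, axis=1, keepdims=True)
    norms[norms == 0] = 1.0             # leave all-zero rows unchanged
    return data / norms

def zscore_columns(data):
    """Shift and scale every column (feature) to mean 0 and standard deviation 1."""
    std = data.std(axis=0)
    std[std == 0] = 1.0                 # leave constant columns at 0 after centering
    return (data - data.mean(axis=0)) / std

rng = np.random.default_rng(1)
toy = rng.normal(loc=3.0, scale=2.0, size=(8, 4))
toy_unit = l2_normalize_rows(toy)
toy_z = zscore_columns(toy)
print(np.linalg.norm(toy_unit, axis=1))  # per-row norms
print(toy_z.mean(axis=0))                # per-column means
```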

In [None]:
# Set hyperparameters of the algorithm:
k=50
e=0.0005 
# Other values to try for very similar alignment results:
# k=25 with e=0.0018, 0.00182, 0.00185, 0.002, or k=30 with  
# Combinations of k between 20 and 30 and e between 0.0015 and 0.0040 seem to yield the best results on this dataset,
# so if you'd like to perform hyperparameter tuning, you can set up a grid over these ranges.
X_new,y_new= scot(X, y, k, e, mode="connectivity", metric="correlation")

It.  |Err         
-------------------
    0|2.351419e-03|
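
The comments in the cell above suggest tuning over a grid of k from 20 to 30 and e from 0.0015 to 0.0040. A minimal sketch of such a search follows. The helper `grid_search` and the toy `toy_align` and `toy_score` functions are hypothetical stand-ins so that the example runs on its own. It assumes a score where lower is better (as with FOSCTTM) and skips runs that return `None, None`.

```python
import itertools
import numpy as np

def grid_search(align_fn, score_fn, X, y, ks, es):
    """Try every (k, e) pair and return (best_score, best_k, best_e).

    Lower scores are treated as better. Runs that return (None, None)
    are skipped. Returns None if no run succeeded.
    """
    best = None
    for k, e in itertools.product(ks, es):
        X_new, y_new = align_fn(X, y, k, e)
        if X_new is None or y_new is None:
            continue                    # this setting did not converge
        score = score_fn(X_new, y_new)
        if best is None or score < best[0]:
            best = (score, k, float(e))
    return best

# Grid spanning the ranges mentioned in the cell above.
ks = range(20, 31, 5)                   # 20, 25, 30
es = np.linspace(0.0015, 0.0040, 6)     # 0.0015, 0.0020, ..., 0.0040

# Toy stand-ins: the "alignment" just encodes k and e so the score can read them.
def toy_align(X, y, k, e):
    return np.full((1, 1), float(k)), np.full((1, 1), float(e))

def toy_score(X_new, y_new):
    return abs(X_new[0, 0] - 25.0) + 1000.0 * abs(y_new[0, 0] - 0.002)

best = grid_search(toy_align, toy_score, None, None, ks, es)
print("best (score, k, e):", best)
```

In this notebook, `align_fn` could wrap the `scot(...)` call shown above and `score_fn` could return `np.mean(evals.calc_domainAveraged_FOSCTTM(X_new, y_new))`. That score relies on knowing which cells correspond across the two domains, so it is only available for benchmark data like this one.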


##### Evaluate results:

In [None]:
fracs=evals.calc_domainAveraged_FOSCTTM(X_new, y_new)
print("Average FOSCTTM score for this alignment is: ", np.mean(fracs))
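
For reference, FOSCTTM stands for "fraction of samples closer than the true match": for each cell, it is the fraction of cells in the other domain that lie closer to it than its true counterpart does, so 0 is a perfect score. The sketch below is one way to compute it with NumPy. It assumes that row i of both aligned matrices is the same cell and that Euclidean distance is used; `foscttm` and `domain_averaged_foscttm` are hypothetical names, and `evals.calc_domainAveraged_FOSCTTM` may differ in details such as tie handling.

```python
import numpy as np

def foscttm(a, b):
    """For each row i of `a`, the fraction of rows of `b` closer to it than b[i]."""
    # Pairwise Euclidean distances, shape (n, n); fine for ~1000 cells.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    true_match = np.diag(d)[:, None]    # distance to the true counterpart
    closer = (d < true_match).sum(axis=1)
    return closer / (a.shape[0] - 1)

def domain_averaged_foscttm(a, b):
    """Average the per-cell fractions computed from both directions."""
    return (foscttm(a, b) + foscttm(b, a)) / 2.0

# Toy check: a perfect alignment versus one where the outer cells are swapped.
a_demo = np.array([[0.0], [1.0], [2.0]])
perfect = domain_averaged_foscttm(a_demo, a_demo.copy())
swapped = domain_averaged_foscttm(a_demo, a_demo[::-1].copy())
print(perfect, swapped)
```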

In [None]:
import matplotlib.pyplot as plt
legend_label="SCOT alignment FOSCTTM \n average value: "+str(np.mean(fracs))
plt.plot(np.arange(len(fracs)), np.sort(fracs), "r--", label=legend_label)
plt.legend()
plt.xlabel("Cells")
plt.ylabel("Sorted FOSCTTM")
plt.show()