# scATAC-seq

# VIPCCA scRNA-seq + scATAC-seq tutorial
This tutorial walks through loading, preprocessing, VIPCCA integration, and visualization of scRNA-seq and scATAC-seq datasets.

We obtained a [scRNA-seq dataset](http://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_10k_v3/pbmc_10k_v3_filtered_feature_bc_matrix.h5) consisting of gene expression measurements on 33,538 genes in 11,769 cells and a [scATAC-seq dataset](http://cf.10xgenomics.com/samples/cell-atac/1.0.1/atac_v1_pbmc_10k/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5) consisting of 89,796 open chromatin peaks on 8,728 nuclei. Both were produced with the 10X Genomics Chromium system from PBMCs.

### Preprocessing with R
For the scRNA-seq data, we identified 13 cell types using a [standard workflow](https://www.dropbox.com/s/3f3p5nxrn5b3y4y/pbmc_10k_v3.rds?dl=1) in Seurat: 460 B cell progenitor, 2,992 CD14+ Monocytes, 328 CD16+ Monocytes, 1,596 CD4 Memory, 1,047 CD4 Naïve, 383 CD8 effector, 337 CD8 Naïve, 74 Dendritic cell, 592 Double negative T cell, 544 NK cell, 68 pDC, 52 Platelets, and 599 pre-B cell.

For the scATAC-seq data, we removed cells with fewer than 5,000 total peak counts, leaving a final set of 7,866 cells for analysis. See the Signac website for [up-to-date vignettes](https://satijalab.org/signac/articles/pbmc_vignette.html) and documentation for analysing scATAC-seq data.
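
To make the filtering rule concrete, here is a minimal Python sketch of the same idea on toy data. The barcodes, counts, and the helper `filter_cells` are hypothetical and only illustrate the 5,000-count threshold; the tutorial itself performs this step in R.

```python
# Toy peak counts per cell: keys are hypothetical cell barcodes,
# values are that cell's counts across all peaks.
peak_counts = {
    "cell_a": [3000, 2500, 100],  # total 5,600
    "cell_b": [1200, 800, 50],    # total 2,050
    "cell_c": [4000, 1000, 0],    # total 5,000
}

MIN_TOTAL_COUNTS = 5000  # threshold quoted in the text


def filter_cells(counts, min_total):
    """Drop cells with fewer than min_total counts summed over all peaks."""
    return {cell: row for cell, row in counts.items() if sum(row) >= min_total}


kept = filter_cells(peak_counts, MIN_TOTAL_COUNTS)
print(sorted(kept))  # ['cell_a', 'cell_c']
```

A cell with exactly 5,000 counts is kept, because only cells with fewer than 5,000 are removed.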

```R
library(Seurat)
library(ggplot2)
library(patchwork)

peaks <- Read10X_h5("../data/atac_v1_pbmc_10k_filtered_peak_bc_matrix.h5")
# Create a gene activity matrix from the peak matrix and GTF, using chromosomes 1:22, X, and Y.
# Peaks that fall within gene bodies, or 2kb upstream of a gene, are counted toward that gene.
activity.matrix <- CreateGeneActivityMatrix(
    peak.matrix = peaks, 
    annotation.file = "../data/Homo_sapiens.GRCh37.82.gtf", 
    seq.levels = c(1:22, "X", "Y"), 
    upstream = 2000, verbose = TRUE
)
```
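
The idea behind a gene activity matrix can be sketched in a few lines of Python for a single cell. This is an illustration under stated assumptions, not Seurat's implementation: the function `gene_activity`, the toy coordinates, and the strand-aware handling of "upstream" are all choices made for this sketch.

```python
def gene_activity(peaks, genes, upstream=2000):
    """Sum, per gene, the counts of peaks that overlap the gene body
    or the `upstream` bp before its start (toy, single-cell version).

    peaks: list of (chrom, start, end, count)
    genes: list of (name, chrom, start, end, strand)
    """
    activity = {}
    for name, g_chrom, g_start, g_end, strand in genes:
        # Assumption: "upstream" is strand-aware, so the window is
        # extended to the left on '+' genes and to the right on '-' genes.
        if strand == "+":
            lo, hi = g_start - upstream, g_end
        else:
            lo, hi = g_start, g_end + upstream
        total = 0
        for p_chrom, p_start, p_end, count in peaks:
            # Two intervals overlap when each starts before the other ends.
            if p_chrom == g_chrom and p_start <= hi and p_end >= lo:
                total += count
        activity[name] = total
    return activity


# Hypothetical genes and peaks, for illustration only.
genes = [
    ("GENE_A", "1", 10_000, 20_000, "+"),
    ("GENE_B", "1", 50_000, 60_000, "-"),
]
peaks = [
    ("1", 8_500, 9_000, 4),    # within 2 kb upstream of GENE_A
    ("1", 15_000, 15_400, 7),  # inside the GENE_A body
    ("1", 30_000, 30_500, 9),  # intergenic, not counted
    ("1", 61_000, 61_500, 2),  # within 2 kb upstream of GENE_B
    ("2", 15_000, 15_400, 5),  # different chromosome, not counted
]

activity = gene_activity(peaks, genes)
print(activity)
```

Repeating this for every cell gives a genes-by-cells matrix that shares its feature space with the scRNA-seq data, which is what makes the two modalities comparable in the integration step.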

After filtering, we convert the Seurat object to AnnData via h5Seurat using the SeuratDisk R package. In this case, an atac.h5ad file is generated alongside atac.h5Seurat.

```R
library(SeuratDisk)

# `atac` is the filtered scATAC-seq Seurat object (its construction is not shown here)
SaveH5Seurat(atac, filename = "atac.h5Seurat")
Convert("atac.h5Seurat", dest = "h5ad")
```

### Importing VIPCCA

```python
import vipcca as vip
import vipcca.preprocessing as pp
import matplotlib
import scanpy as sc
matplotlib.use('TkAgg')

# Command for Jupyter notebooks only
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from matplotlib.axes._axes import _log as matplotlib_axes_logger
matplotlib_axes_logger.setLevel('ERROR')
```

```
Using TensorFlow backend.
```


### Loading data in python

```python
# Replace these paths with the location of your own .h5ad files
adata1 = pp.read_sc_data("/Users/zhongyuanke/data/vipcca/atac/rna.h5ad")
adata2 = pp.read_sc_data("/Users/zhongyuanke/data/vipcca/atac/geneact.h5ad")
```


### Data preprocessing
Here, we filter and normalize each dataset separately and concatenate them into one AnnData object. For more details, see the preprocessing API.

```python
adata_all = pp.preprocessing([adata1, adata2])
```

```
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
```
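
As a rough picture of what "normalize each dataset separately, then concatenate" means, the following pure-Python sketch works on toy dictionaries. It is not the implementation of `pp.preprocessing`: the helper names, the `target_sum` of 10,000, and the restriction to shared genes are assumptions made for illustration, and the filtering part is not reproduced.

```python
import math


def normalize_log1p(cells, target_sum=10_000):
    """Scale each cell to `target_sum` total counts, then apply log(1 + x)."""
    out = []
    for counts in cells:
        total = sum(counts.values())
        scale = target_sum / total if total else 0.0
        out.append({gene: math.log1p(c * scale) for gene, c in counts.items()})
    return out


def concatenate(datasets):
    """Stack datasets on their shared genes and record each cell's batch."""
    shared = sorted(set.intersection(*[set(ds[0]) for ds in datasets]))
    rows, batches = [], []
    for batch_id, ds in enumerate(datasets):
        for cell in ds:
            rows.append([cell[gene] for gene in shared])
            batches.append(str(batch_id))  # plays the role of a batch label
    return shared, rows, batches


# Hypothetical toy data: two RNA cells and one gene-activity cell.
rna = [
    {"CD3E": 5, "MS4A1": 0, "LYZ": 15},
    {"CD3E": 0, "MS4A1": 8, "LYZ": 2},
]
gene_act = [{"CD3E": 2, "LYZ": 6, "NKG7": 2}]

shared, rows, batches = concatenate(
    [normalize_log1p(rna), normalize_log1p(gene_act)]
)
print(shared)   # ['CD3E', 'LYZ']
print(batches)  # ['0', '0', '1']
```

The batch label kept for every cell is what later lets the model tell the two technologies apart, in the same spirit as the `_batch` column passed to `split_by` in the integration step.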


### VIPCCA Integration

```python
import os
# Suppress TensorFlow info and warning log messages
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

handle = vip.vipcca.VIPCCA(adata_all=adata_all,
                           res_path='./results/atac/',
                           mode='CVAE',
                           split_by="_batch",
                           epochs=20,
                           lambda_regulizer=2,
                           batch_input_size=64,
                           batch_input_size2=14,
                           )
adata_transform = handle.fit_transform()
```

```
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "vae_mlp"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_input (InputLayer)      (None, 2000)         0                                            
__________________________________________________________________________________________________
batch_input1 (InputLayer)       (None, 64)           0                                            
__________________________________________________________________________________________________
encoder_mlp (Model)             [(None, 16), (None,  285088      encoder_input[0][0]              
                                                                 batch_input1[0][0]               
__________________________________________________________________________________________________
batch_input

... storing '_batch' as categorical
... storing 'celltype' as categorical
... storing 'tech' as categorical
```


### Cell type prediction

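```python
# Note: `network` is not imported in any of the cells above;
# its import statement is not shown in this tutorial.
ann_atac = network.findNeighbors(adata_transform)
```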
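
One common way to predict cell types after integration is nearest-neighbor label transfer in the shared latent space: each unlabeled scATAC-seq cell takes the majority label of its closest labeled scRNA-seq cells. The sketch below illustrates that generic technique on toy 2-D coordinates. Whether `network.findNeighbors` works this way is not stated here, and the function `knn_predict` and all data values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(ref_points, ref_labels, query_points, k=3):
    """Assign each query point the majority label of its k nearest references."""
    predictions = []
    for q in query_points:
        # Rank reference cells by Euclidean distance to the query cell.
        order = sorted(
            range(len(ref_points)),
            key=lambda i: math.dist(q, ref_points[i]),
        )
        votes = Counter(ref_labels[i] for i in order[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy latent coordinates for labeled RNA cells (two well-separated groups).
rna_latent = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]
rna_labels = ["NK cell"] * 3 + ["CD14+ Monocytes"] * 3

# Toy latent coordinates for unlabeled ATAC cells.
atac_latent = [(0.1, 0.1), (5.1, 5.0)]

predicted = knn_predict(rna_latent, rna_labels, atac_latent, k=3)
print(predicted)  # ['NK cell', 'CD14+ Monocytes']
```

In practice the coordinates would come from the integrated embedding rather than hand-written tuples, and a larger `k` trades sensitivity to rare cell types for robustness to noise.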

### UMAP Visualization
