# Example pipeline and files to process 10X Visium data

We use the [Adult Mouse Olfactory Bulb](https://www.10xgenomics.com/datasets/adult-mouse-olfactory-bulb-1-standard-1) dataset from the 10X Genomics website as the example. The dataset has about 5,000 cells, and the full example takes about **1 hour** on our workstation (AMD Ryzen Threadripper PRO 5965WX CPU; training uses the CPU only).

In [None]:
# Download the BAM file from the 10X Genomics website
!wget https://cf.10xgenomics.com/samples/spatial-exp/2.0.0/Visium_Mouse_Olfactory_Bulb/Visium_Mouse_Olfactory_Bulb_possorted_genome_bam.bam
# Build the reference for this dataset
!python ../../build_reference.py --species Mouse

In [None]:
# Run get_cb.py to extract all cell barcodes from the BAM file
# For your own data, change the input BAM file path and the output cell barcode file path
!python get_cb.py
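
The contents of `get_cb.py` are not shown here. As a rough sketch of what barcode extraction can look like, the code below collects every distinct value of the `CB` tag from a BAM file with `pysam`. The function names `collect_barcodes` and `barcodes_from_bam`, and the one-barcode-per-line output format, are assumptions made for illustration; the actual script may filter or format barcodes differently.

```python
def collect_barcodes(reads, tag="CB"):
    """Return the sorted distinct barcode values stored under `tag`.

    `reads` is any iterable of objects exposing has_tag()/get_tag(),
    such as the pysam.AlignedSegment objects yielded by an AlignmentFile.
    Reads without the tag are skipped.
    """
    barcodes = set()
    for read in reads:
        if read.has_tag(tag):
            barcodes.add(read.get_tag(tag))
    return sorted(barcodes)


def barcodes_from_bam(bam_path, out_path, tag="CB"):
    """Write one barcode per line (assumed format) and return the count."""
    import pysam  # imported lazily so collect_barcodes has no dependencies

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # until_eof=True streams the whole file and does not need an index
        barcodes = collect_barcodes(bam.fetch(until_eof=True), tag)
    with open(out_path, "w") as handle:
        handle.writelines(f"{barcode}\n" for barcode in barcodes)
    return len(barcodes)
```

Keeping the tag logic in `collect_barcodes` separate from file handling makes it easy to check on a few mock reads before streaming a full BAM file.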

## Index the downloaded BAM file

In [None]:
import pysam
pysam.index("Visium_Mouse_Olfactory_Bulb_possorted_genome_bam.bam")

## Prepare your `path_to_bam.txt`, `path_to_cb.txt` and `sample_list.txt` files

- `sample_list.txt`: contains the sample names, one sample per row.
- `path_to_bam.txt`: contains the path to each sample's BAM file.
- `path_to_cb.txt`: contains the path to each sample's cell barcode list.
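
As a minimal sketch, the three files could be written with a small helper like the one below. It assumes one entry per line in every file, with the samples in the same order across all three files. The helper name `write_input_lists` is hypothetical and not part of MATES.

```python
from pathlib import Path


def write_input_lists(samples, out_dir="."):
    """Write sample_list.txt, path_to_bam.txt and path_to_cb.txt.

    `samples` is a list of (sample_name, bam_path, barcode_path) tuples.
    Assumption: one entry per line, same sample order in all three files.
    """
    out_dir = Path(out_dir)
    file_names = ("sample_list.txt", "path_to_bam.txt", "path_to_cb.txt")
    for column, file_name in enumerate(file_names):
        # Pick the matching field from every sample tuple
        lines = [f"{sample[column]}\n" for sample in samples]
        (out_dir / file_name).write_text("".join(lines))
```

For this example, a call such as `write_input_lists([("mouse_olfactory_bulb", "Visium_Mouse_Olfactory_Bulb_possorted_genome_bam.bam", "cell_barcodes.txt")])` would produce the three files in the working directory. The sample name and `cell_barcodes.txt` are placeholders here; use whichever output path `get_cb.py` writes to.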


## Run MATES on the example 10X Visium data

In [None]:
from MATES import bam_processor, data_processor, MATES_model, TE_quantifier

# Step 1: Generate the coverage vectors
bam_processor.split_count_10X_data('exclusive', 'sample_list.txt', 'path_to_bam.txt', 'path_to_cb.txt',
                                   bc_ind='CB', ref_path='TE_nooverlap.csv')

# Step 2: Generate the training and prediction samples
data_processor.calculate_UM_region('exclusive', '10X', 'sample_list.txt', bin_size=5, proportion=80,
                                   bc_path_file='path_to_cb.txt', cut_off=50)
data_processor.generate_training_sample('10X', 'sample_list.txt', bin_size=5, proportion=80, cut_off=50)
data_processor.generate_prediction_sample('exclusive', '10X', 'sample_list.txt', bin_size=5, proportion=80,
                                          ref_path='TE_nooverlap.csv', bc_path_file='path_to_cb.txt', cut_off=50)

# Step 3: Train the model and run prediction
MATES_model.train('10X', 'sample_list.txt', bin_size=5, proportion=80, BATCH_SIZE=256,
                  AE_LR=1e-4, MLP_LR=1e-6, AE_EPOCHS=2, MLP_EPOCHS=2, DEVICE='cpu')
MATES_model.prediction('exclusive', '10X', 'sample_list.txt', bin_size=5, proportion=80,
                       AE_trained_epochs=2, MLP_trained_epochs=2, DEVICE='cpu')

# Step 4: Quantify the TEs
TE_quantifier.unique_TE_MTX('exclusive', '10X', 'sample_list.txt', 20, bc_path_file='path_to_cb.txt')
TE_quantifier.finalize_TE_MTX('10X', 'sample_list.txt')