# Mapping - Three Steps

## Step 1. Prepare Mapping Config


### Command
```shell
yap default-mapping-config
```

### Input
This step takes several pieces of information about the library and assembles them into a mapping config file. Each input is explained below.

#### Mode `--mode`
- `mc` for regular snmC-seq2 and snmC-seq3 libraries
- `mct` for snmCT-seq and snmC2T-seq libraries

#### Barcode version `--barcode_version`
- `V1` for 8 random indexes
- `V2` for 384 random indexes

#### Bismark Reference `--bismark_ref`
Read the [Bismark documentation](https://rawgit.com/FelixKrueger/Bismark/master/Docs/Bismark_User_Guide.html) to prepare a **bismark-bowtie2** mapping index using the `bismark_genome_preparation` command.

#### Genome FASTA `--genome_fasta`
The **SAME** fasta file you used for `bismark_genome_preparation`.

#### STAR Reference (mct only) `--star_ref`
Read the [STAR documentation](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf) to prepare a STAR mapping index. In addition to the FASTA file, STAR also needs a GTF file. For human and mouse, the [GTF from GENCODE](https://www.gencodegenes.org/) is recommended.

#### GTF (mct only) `--gtf`
The **SAME** GTF file you used for STAR index building.
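The choices above can be sanity-checked before the config is generated. The following is only a sketch: `check_mapping_inputs` is a hypothetical helper, not part of yap, and it assumes the rules described in this section (two modes, two barcode versions, STAR reference and GTF needed for `mct`).

```shell
# Hypothetical helper (not part of yap): validate the choices described above.
# Arguments: mode, barcode version, STAR reference path, GTF path.
check_mapping_inputs() {
    mode=$1
    barcode_version=$2
    star_ref=$3
    gtf=$4

    case $mode in
        mc|mct) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac

    case $barcode_version in
        V1|V2) ;;
        *) echo "unknown barcode version: $barcode_version" >&2; return 1 ;;
    esac

    # mct libraries also carry RNA reads, so both STAR inputs must be given.
    if [ "$mode" = mct ] && { [ -z "$star_ref" ] || [ -z "$gtf" ]; }; then
        echo "mct mode needs both --star_ref and --gtf" >&2
        return 1
    fi
    return 0
}

check_mapping_inputs mc V2 "" "" && echo "mc inputs look consistent"
check_mapping_inputs mct V2 "" "" || echo "mct inputs are incomplete"
```

The helper only checks that the combination of values is consistent; it does not verify that the reference paths exist or were built from the same FASTA and GTF files.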

#### NOMe (NOMe treatment only) `--nome`
If the library is NOMe-treated, the mapping config changes in two places, affecting ALLC generation and the mapping summary:

1. `[callMethylation] num_upstr_bases = 1`: records one additional base upstream of the cytosine in the mC context column of ALLC files, which allows GpC sites to be distinguished.
2. `[callMethylation] mc_stat_feature` and `mc_stat_alias` are changed: the mapping summary then reports methylation for different contexts, such as GpCH, HpCG, and HpCH.
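One way to confirm that a config carries the NOMe setting is to read `num_upstr_bases` from the `[callMethylation]` section. The sketch below runs against a hand-written two-line fragment, not real yap output; only the section and key names come from the description above.

```shell
# Hand-written fragment for illustration only; a real config has more keys.
config=$(mktemp)
cat > "$config" <<'EOF'
[callMethylation]
num_upstr_bases = 1
EOF

# Track the current [section] and print the value of the key inside it.
num_upstr_bases=$(awk -F' *= *' '
    /^\[/ { section = $0 }
    section == "[callMethylation]" && $1 == "num_upstr_bases" { print $2 }
' "$config")
rm -f "$config"

if [ "$num_upstr_bases" = "1" ]; then
    echo "num_upstr_bases = 1: matches the NOMe setting"
else
    echo "num_upstr_bases = ${num_upstr_bases:-unset}: not the NOMe setting"
fi
```

To check a real file, point `config` at your own `mapping_config.ini` instead of the temporary fragment.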
    

### Output

The mapping config is generated from the inputs and printed to standard output. Redirect the output into a `mapping_config.ini` file and use that file in step 2.
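As a sketch of the redirect, the command for an mct library might look like the following. The flags are the ones documented in this section; every path is a placeholder to replace with your own, and the `command -v` guard is only there so the snippet does nothing harmful on a machine without yap.

```shell
# Placeholder paths (assumptions); replace with your own reference locations.
BISMARK_REF=/path/to/bismark_ref
GENOME_FASTA=/path/to/genome.fa
STAR_REF=/path/to/star_ref
GTF=/path/to/annotation.gtf
CONFIG=mapping_config.ini

if command -v yap >/dev/null 2>&1; then
    # The config is printed to standard output, so redirect it into a file.
    yap default-mapping-config \
        --mode mct \
        --barcode_version V2 \
        --bismark_ref "$BISMARK_REF" \
        --genome_fasta "$GENOME_FASTA" \
        --star_ref "$STAR_REF" \
        --gtf "$GTF" > "$CONFIG"
else
    echo "yap not found on PATH; skipping config generation" >&2
fi
```

For a NOMe-treated library, `--nome` would be added to the same command.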

## Step 2. Prepare Mapping Commands


### Command
```shell
yap default-mapping-config
```

### Input



### Output



```shell
yap default-mapping-config -h

usage: yap default-mapping-config [-h] --mode {mct,mc} --barcode_version
                                  {V1,V2} --bismark_ref BISMARK_REF
                                  --genome_fasta GENOME_FASTA --star_ref
                                  STAR_REF --gtf GTF [--nome]

optional arguments:
  -h, --help            show this help message and exit
  --mode {mct,mc}       Library mode (default: None)
  --barcode_version {V1,V2}
                        Barcode version, V1 for 8 random index, V2 for 384
                        random index (default: None)
  --bismark_ref BISMARK_REF
                        Path to the bismark reference (default: None)
  --genome_fasta GENOME_FASTA
                        Path to the genome fasta file (default: None)
  --star_ref STAR_REF   Path to the STAR reference (default: None)
  --gtf GTF             Path to the GTF annotation file (default: None)
  --nome                Does this library have NOMe treatment? (default:
```

## Detailed Mapping Steps

![Mapping Steps](files/MappingPipeline.png)

## Output Directory Structure


- Each sub-directory contains:
    1. `*.command.txt`: the list of commands that generate the data files.
    2. `*.records.csv`: a CSV file listing the output files that should exist after the corresponding commands finish.
    3. Data files (after execution): the actual data files appear once the commands finish successfully.
    4. `*.stats.csv` (after `yap mapping-summary`): the stats file for this step.
- The `qsub/` directory also contains a copy of every command list. In qsub mode, the qsub log files appear in the corresponding sub-directories of `qsub/`.
- `MappingSummary.csv.gz` (after `yap mapping-summary`): a single flat CSV file with the final cell-level summary of all necessary mapping stats.
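The naming convention above makes it easy to see which steps already have a records file and a stats file. The sketch below relies only on the `*.command.txt` / `*.records.csv` / `*.stats.csv` suffixes. It runs on a small mock directory built with `mktemp`, using file names copied from the listing in this document, so it makes no assumption about the contents of any file.

```shell
# Mock output directory; in practice, set out_dir to your real output path.
out_dir=$(mktemp -d)
mkdir -p "$out_dir/bismark_bam"
touch "$out_dir/bismark_bam/bismark_mapping.command.txt" \
      "$out_dir/bismark_bam/bismark_mapping.records.csv" \
      "$out_dir/bismark_bam/bismark_mapping.stats.csv" \
      "$out_dir/bismark_bam/final_bam.command.txt" \
      "$out_dir/bismark_bam/final_bam.records.csv"

report=""
for cmd_file in "$out_dir"/*/*.command.txt; do
    step=${cmd_file%.command.txt}   # path without the suffix
    name=${step##*/}                # step name only
    if [ -f "$step.records.csv" ]; then records=yes; else records=no; fi
    if [ -f "$step.stats.csv" ]; then stats=yes; else stats=no; fi
    line="$name records=$records stats=$stats"
    echo "$line"
    report="$report$line;"
done
rm -rf "$out_dir"
```

A step that reports `stats=no` is not necessarily incomplete: in the listing in this document, some steps such as `final_bam` have no stats file at all.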

```shell
tree /example/output_dir/for/normal/mc/mapping/

/home/hanliu/tmp/final-bp/
├── allc
│   ├── generate_allc.command.txt
│   ├── generate_allc.records.csv
│   └── generate_allc.stats.csv
├── bismark_bam
│   ├── bismark_bam_qc.command.txt
│   ├── bismark_bam_qc.records.csv
│   ├── bismark_bam_qc.stats.csv
│   ├── bismark_mapping.command.txt
│   ├── bismark_mapping.records.csv
│   ├── bismark_mapping.stats.csv
│   ├── final_bam.command.txt
│   ├── final_bam.records.csv
│   ├── select_dna_reads.command.txt
│   ├── select_dna_reads.records.csv
│   └── select_dna_reads.stats.csv
├── fastq
│   ├── demultiplex.command.txt
│   ├── demultiplex.records.csv
│   ├── demultiplex.stats.csv
│   ├── fastq_dataframe.csv
│   ├── fastq_qc.command.txt
│   ├── fastq_qc.records.csv
│   ├── fastq_qc.stats.csv
│   ├── merge_lane.command.txt
│   └── merge_lane.records.csv
├── MappingSummary.csv.gz
└── qsub
    ├── bismark_bam_allc
    │   ├── allc_commands.txt
    │   ├── bam_qc_commands.txt
    │   ├── bismark_commands.txt
```

### mCT Library Specific Directory
- For an mCT library, there is an additional sub-directory called `star_bam` in both the root output directory and the `qsub` directory. It stores the STAR-mapped RNA BAM files.
- The mapping summary file also has additional fields for RNA stats and for DNA/RNA read selection stats.
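Because `star_bam` only appears for mCT libraries, its presence can serve as a quick check of which kind of output directory you are looking at. This is a sketch under that assumption: `library_type` is a hypothetical helper, not a yap command, and the two directories are mocks created with `mktemp`.

```shell
# Hypothetical helper (not part of yap): guess the library type from the layout.
library_type() {
    if [ -d "$1/star_bam" ]; then
        echo mct
    else
        echo mc
    fi
}

# Mock two output directories using the sub-directory names from this document.
demo=$(mktemp -d)
mkdir -p "$demo/mc_run/allc" "$demo/mc_run/bismark_bam" "$demo/mc_run/fastq"
mkdir -p "$demo/mct_run/allc" "$demo/mct_run/bismark_bam" \
         "$demo/mct_run/fastq" "$demo/mct_run/star_bam"

mc_result=$(library_type "$demo/mc_run")
mct_result=$(library_type "$demo/mct_run")
echo "mc_run looks like: $mc_result"
echo "mct_run looks like: $mct_result"
rm -rf "$demo"
```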

```shell
tree -d /example/output_dir/for/mct/mapping/

/home/hanliu/tmp/final/
├── allc
├── bismark_bam
├── fastq
├── qsub
│   ├── bismark_bam_allc
│   ├── fastq
│   └── star_bam
└── star_bam

8 directories
```
