# Run Mapping Command

## Command
```shell
yap mapping
```

In [2]:
!yap mapping -h

usage: yap mapping [-h] --input_fastq_pattern INPUT_FASTQ_PATTERN --output_dir
                   OUTPUT_DIR --config_path CONFIG_PATH
                   [--fastq_dataframe_path FASTQ_DATAFRAME_PATH]
                   [--mode {command_only,qsub,local}]

optional arguments:
  -h, --help            show this help message and exit
  --fastq_dataframe_path FASTQ_DATAFRAME_PATH
                        FASTQ dataframe path, this is optional, if the library
                        is demultiplexed using SampleSheet generated by yap
                        make-sample-sheet, FASTQ dataframe will be generated
                        automatically. (default: None)
  --mode {command_only,qsub,local}
                        Run mode, if command_only, will only generate command
                        files but not execute; if qsub, will execute with SGE
                        qsub system; if local, will execute with current
                        system, only use this for debugg
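
As a minimal sketch of how the flags in the help text fit together, the snippet below assembles a `yap mapping` call in `command_only` mode. The three paths and the `.ini` extension are placeholders, and passing the FASTQ pattern as a single quoted glob is an assumption that the help text does not confirm. `run_mapping` and `DRY_RUN` are helper names introduced here for illustration only; by default the command is printed rather than executed.

```shell
# Placeholder values (hypothetical); replace them with your own paths.
INPUT_FASTQ_PATTERN='/path/to/fastq/*.fastq.gz'
OUTPUT_DIR='/path/to/output_dir'
CONFIG_PATH='/path/to/mapping_config.ini'
MODE='command_only'   # one of: command_only, qsub, local

run_mapping() {
    # Collect the full command line as this function's positional parameters.
    set -- yap mapping \
        --input_fastq_pattern "$INPUT_FASTQ_PATTERN" \
        --output_dir "$OUTPUT_DIR" \
        --config_path "$CONFIG_PATH" \
        --mode "$MODE"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (default): only print the command for review.
        echo "$*"
    else
        # Real run: execute the command with each argument passed intact.
        "$@"
    fi
}

run_mapping
```

Setting `DRY_RUN=0` before calling `run_mapping` executes the command instead of printing it.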

## Mapping mode

### Notes:
- All modes generate the same commands for every step; the only difference is whether and how the commands are executed.
- In all modes, the list of commands is saved in the output directory for your records.

### command_only
- This mode only generates the list of commands for each mapping step, then exits. You can execute these commands yourself to get the same results as the other modes.
- The file `{output_dir}/command_order.txt` gives the order of execution.

#### Order of execution (IMPORTANT)
- The order of execution within each command file doesn't matter, because each command in a file handles one individual cell.
- The order of execution between command files matters. Each step depends on the output of the previous step.
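
The ordering rule above can be sketched as a small shell helper. This is an illustration under stated assumptions, not part of yap: it assumes `command_order.txt` lists one command file path per line (absolute, or relative to the current directory) and that each command file holds one shell command per line. Neither format is specified in this document, so check your own files first. `run_in_order` is a hypothetical name.

```shell
# run_in_order ORDER_FILE
# Run every command file listed in ORDER_FILE from top to bottom, and stop
# at the first failing command so later steps never see incomplete input.
run_in_order() {
    order_file=$1
    while IFS= read -r command_file; do
        # Skip blank lines in the order file.
        [ -n "$command_file" ] || continue
        echo "Running step: $command_file"
        # Commands inside one file are independent (one per cell); they run
        # one after another here only to keep the sketch simple.
        while IFS= read -r cmd; do
            [ -n "$cmd" ] || continue
            # </dev/null keeps the command from consuming the loop's input.
            sh -c "$cmd" </dev/null || { echo "Failed: $cmd" >&2; return 1; }
        done < "$command_file"
    done < "$order_file"
}
```

A call would look like `run_in_order /path/to/output_dir/command_order.txt`. Because commands within a file are independent, the inner loop is the natural place to hand work to a job scheduler or a parallel runner.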

### qsub (inhouse only)
This mode automatically submits jobs using the SGE qsub system.

#### Important notes
- This mode is only intended to work on our own SGE qsub system and may not work on other systems.
- Other users should use the `command_only` mode to get the command list, then execute the commands according to their own system settings.

### local
This mode automatically executes all commands on the local machine. ONLY use it for testing and debugging. For example, you can select a few files to run in local mode and check that every step executes successfully.


## Detailed mapping steps

![Mapping Steps](files/MappingPipeline.png)

## Output Directory Structure


- Each sub-directory contains:
    1. `*.command.txt`: the list of commands that generate the data files.
    2. `*.records.csv`: a CSV file listing the output files that should exist after the corresponding commands finish.
    3. Data files (after execution): the actual data files appear after the commands execute successfully.
    4. `*.stats.csv` (after `yap mapping-summary`): the stats file for this step.
- The `qsub/` directory also contains a copy of all command lists. In qsub mode, the qsub log files appear in each sub-directory of `qsub/`.
- `MappingSummary.csv.gz` (after `yap mapping-summary`): a single flat CSV file holding the final cell-level summary of all necessary mapping stats.
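
The naming convention above makes it easy to get a quick overview of an output directory. The helper below is a rough sketch that relies only on the `*.command.txt`, `*.records.csv`, and `*.stats.csv` suffixes described in the list; `list_steps` is a hypothetical name, and counting non-empty lines as commands assumes one command per line.

```shell
# list_steps OUTPUT_DIR
# For every {step}.command.txt one level below OUTPUT_DIR, print a
# tab-separated line: step path, number of commands, whether the records
# file exists, whether the stats file exists.
list_steps() {
    output_dir=${1%/}   # drop a trailing slash, if any
    for command_file in "$output_dir"/*/*.command.txt; do
        # If the glob matched nothing, the literal pattern is left; skip it.
        [ -e "$command_file" ] || continue
        prefix=${command_file%.command.txt}
        # Count non-empty lines (assumed: one command per line).
        n_commands=$(grep -c . "$command_file")
        records=no
        stats=no
        [ -e "$prefix.records.csv" ] && records=yes
        [ -e "$prefix.stats.csv" ] && stats=yes
        printf '%s\t%s\t%s\t%s\n' \
            "${prefix#"$output_dir"/}" "$n_commands" "$records" "$stats"
    done
}
```

A missing stats file is not necessarily an error: stats files only appear after `yap mapping-summary`, and some steps may not produce one.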

In [18]:
!tree /example/output_dir/for/normal/mc/mapping/

/home/hanliu/tmp/final-bp/
├── allc
│   ├── generate_allc.command.txt
│   ├── generate_allc.records.csv
│   └── generate_allc.stats.csv
├── bismark_bam
│   ├── bismark_bam_qc.command.txt
│   ├── bismark_bam_qc.records.csv
│   ├── bismark_bam_qc.stats.csv
│   ├── bismark_mapping.command.txt
│   ├── bismark_mapping.records.csv
│   ├── bismark_mapping.stats.csv
│   ├── final_bam.command.txt
│   ├── final_bam.records.csv
│   ├── select_dna_reads.command.txt
│   ├── select_dna_reads.records.csv
│   └── select_dna_reads.stats.csv
├── fastq
│   ├── demultiplex.command.txt
│   ├── demultiplex.records.csv
│   ├── demultiplex.stats.csv
│   ├── fastq_dataframe.csv
│   ├── fastq_qc.command.txt
│   ├── fastq_qc.records.csv
│   ├── fastq_qc.stats.csv
│   ├── merge_lane.command.txt
│   └── merge_lane.records.csv
├── MappingSummary.csv.gz
└── qsub
    ├── bismark_bam_allc
    │   ├── allc_commands.txt
    │   ├── bam_qc_commands.txt
    │   ├── bismark_commands.txt
   

### mCT Library Specific Directory
- For an mCT library, there is an additional sub-directory called `star_bam` in both the root output directory and the `qsub` directory. It stores the STAR-mapped RNA BAM files.
- The mapping summary file also has additional fields for RNA stats and for DNA and RNA selection stats.
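
Since the set of summary fields differs between library types, one quick way to see which fields a given `MappingSummary.csv.gz` contains is to print its header. The sketch below assumes the first line of the file is a header row and that no column name contains a comma; `list_summary_columns` is a hypothetical helper name.

```shell
# list_summary_columns SUMMARY_FILE
# Decompress the gzipped CSV on the fly and print one column name per line.
list_summary_columns() {
    gzip -dc "$1" | head -n 1 | tr ',' '\n'
}
```

For example, `list_summary_columns /path/to/output_dir/MappingSummary.csv.gz` run on an mC output and on an mCT output shows which fields the mCT summary adds.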

In [2]:
!tree -d /example/output_dir/for/mct/mapping/

/home/hanliu/tmp/final/
├── allc
├── bismark_bam
├── fastq
├── qsub
│   ├── bismark_bam_allc
│   ├── fastq
│   └── star_bam
└── star_bam

8 directories
