# Prepare Sample Sheet (start from sequencing)

## Related Commands
```shell
# Print the plate info template
yap default-plate-info

# Make a bcl2fastq sample sheet from the plate info file
yap make-sample-sheet
```


## Step 1: Prepare a PlateInfo file

### What is a PlateInfo file?
- A plain text file with experimental, library, and barcoding information.
- This file must be created manually for each library.
- The main content of this file is the **barcoding information for each plate in the library**, which the pipeline uses to **demultiplex and name the single-cell files**.
- There are two barcoding versions: V1 is the initial 8-random-index version, and V2 is the 384-random-index version.


### Get the plate_info.txt template
```shell
yap default-plate-info
```

#### V1 (8-random-index) plate info template

!yap default-plate-info -v V1

# Executing default-plate-info...
#                               .__
#   ___________    _____ ______ |  |   ____
#  /  ___/\__  \  /     \\____ \|  | _/ __ \
#  \___ \  / __ \|  Y Y  \  |_> >  |_\  ___/
# /____  >(____  /__|_|  /   __/|____/\___  >
#      \/      \/      \/|__|             \/
#        .__                   __  ._.
#   _____|  |__   ____   _____/  |_| |
#  /  ___/  |  \_/ __ \_/ __ \   __\ |
#  \___ \|   Y  \  ___/\  ___/|  |  \|
# /____  >___|  /\___  >\___  >__|  __
#      \/     \/     \/     \/      \/
#
# ____   ________
# \   \ /   /_   |
#  \   Y   / |   |
#   \     /  |   |
#    \___/   |___|
#
#
# PlateInfo template of single cell sequencing demultiplex
#
# This file template contain 3 sections.
#
# [CriticalInfo]
# [LibraryInfo]
# [PlateInfo]
#
# The final sample id will be values of each part concatenated by "-" in the following order
# [Values in LibraryInfo] + [Additional values in PlateInfo] + [Sample UID determined by librar

#### V2 (384-random-index) plate info template

!yap default-plate-info -v V2

# Executing default-plate-info...
#                               .__
#   ___________    _____ ______ |  |   ____
#  /  ___/\__  \  /     \\____ \|  | _/ __ \
#  \___ \  / __ \|  Y Y  \  |_> >  |_\  ___/
# /____  >(____  /__|_|  /   __/|____/\___  >
#      \/      \/      \/|__|             \/
#        .__                   __  ._.
#   _____|  |__   ____   _____/  |_| |
#  /  ___/  |  \_/ __ \_/ __ \   __\ |
#  \___ \|   Y  \  ___/\  ___/|  |  \|
# /____  >___|  /\___  >\___  >__|  __
#      \/     \/     \/     \/      \/
#
# ____   ____________
# \   \ /   /\_____  \
#  \   Y   /  /  ____/
#   \     /  /       \
#    \___/   \_______ \
#                    \/
#
# PlateInfo template of single cell sequencing demultiplex
#
# This file template contain 3 sections.
#
# [CriticalInfo]
# [LibraryInfo]
# [PlateInfo]
#
# The final sample id will be values of each part concatenated by "-" in the following order
# [Values in LibraryInfo] + [Additional values in Pl
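
The concatenation rule stated in the template header can be sketched in Python. This is only an illustration and not the pipeline's own code: `build_sample_id` and the example values are hypothetical, and only the "-" separator and the ordering of the parts are taken from the template text.

```python
def build_sample_id(library_values, plate_values, sample_uid):
    """Join all parts with "-" in the order given by the template header."""
    parts = [str(p) for p in (*library_values, *plate_values, sample_uid)]
    # An empty part would produce "--" in the ID, so reject it early.
    if any(p == "" for p in parts):
        raise ValueError("sample id parts must not be empty")
    return "-".join(parts)


# Hypothetical values, for illustration only
library_values = ["180101", "CEMBA", "mm"]
plate_values = ["Plate1"]
print(build_sample_id(library_values, plate_values, "A1"))
# prints: 180101-CEMBA-mm-Plate1-A1
```

Because "-" is the separator, values that themselves contain "-" make the ID harder to split back into its parts later.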

## Step 2: Run `yap make-sample-sheet`

- `yap make-sample-sheet` takes a V1 or V2 plate info file and generates bcl2fastq sample sheets.
- The sample sheets and the sample name pattern are generated automatically so that the pipeline can parse cell information during and after mapping.

- See usage below:

!yap make-sample-sheet -h

usage: yap make-sample-sheet [-h] --plate_info_path PLATE_INFO_PATH
                             --output_prefix OUTPUT_PREFIX
                             [--header_path HEADER_PATH]

optional arguments:
  -h, --help            show this help message and exit

Required inputs:
  --plate_info_path PLATE_INFO_PATH
                        Path of the plate information file. (default: None)
  --output_prefix OUTPUT_PREFIX
                        Output prefix, will generate 2 sample sheets, 1 for
                        miseq, 1 for novaseq (default: None)

Optional inputs:
  --header_path HEADER_PATH
                        Path to the sample sheet header that contains
                        sequencer info. Will use default if not provided.
                        (default: None)
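
When the command is driven from a script, the arguments in the help text above can be assembled programmatically. The helper below is a hypothetical sketch (the name `make_sample_sheet_cmd` is not part of yap); it uses only the three flags shown in the usage message and does not run anything by itself.

```python
import shlex


def make_sample_sheet_cmd(plate_info_path, output_prefix, header_path=None):
    """Build the argument list for `yap make-sample-sheet`."""
    cmd = [
        "yap", "make-sample-sheet",
        "--plate_info_path", str(plate_info_path),
        "--output_prefix", str(output_prefix),
    ]
    # --header_path is optional; yap uses its default header when omitted
    if header_path is not None:
        cmd += ["--header_path", str(header_path)]
    return cmd


cmd = make_sample_sheet_cmd("example_plate_info.txt", "Pool_EXAMPLE")
print(shlex.join(cmd))
# To execute it (assumes yap is installed and on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.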


## Example 

### V1 Index Library (e.g. snmC-seq2, snmCT-seq using V1 index)
- This example contains eight plates.
- Every two plates share a primer quarter.
- Possible primer_quarter values are:
    - SetB_Q1, SetB_Q2, SetB_Q3, SetB_Q4
    - Set1_Q1, Set1_Q2, Set1_Q3, Set1_Q4
- Each primer_quarter may appear no more than twice within the same NovaSeq run.
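
The V1 constraints listed above can be checked before generating the sample sheet. The following is a minimal sketch, assuming the primer_quarter values have already been read from the plate info file into a list; `check_primer_quarters` is a hypothetical helper, not a yap function.

```python
from collections import Counter

# The eight primer_quarter values listed above
ALLOWED_PRIMER_QUARTERS = {
    f"{primer_set}_Q{quarter}"
    for primer_set in ("SetB", "Set1")
    for quarter in range(1, 5)
}


def check_primer_quarters(primer_quarters, max_uses=2):
    """Return a list of problem descriptions; an empty list means OK."""
    problems = []
    for name, n in sorted(Counter(primer_quarters).items()):
        if name not in ALLOWED_PRIMER_QUARTERS:
            problems.append(f"unknown primer_quarter: {name}")
        elif n > max_uses:
            problems.append(f"{name} used {n} times (max {max_uses})")
    return problems


# Eight plates, every two plates sharing one primer quarter
plates = ["SetB_Q1", "SetB_Q1", "SetB_Q2", "SetB_Q2",
          "SetB_Q3", "SetB_Q3", "SetB_Q4", "SetB_Q4"]
print(check_primer_quarters(plates))  # prints: []
```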

### V2 Index Library (e.g. snmC-seq3, snmCT-seq using V2 index)
- This example contains four plates.
- Every plate has six multiplex groups.
- All primer names must be unique within the same NovaSeq run.
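
The uniqueness requirement for V2 primer names can be verified with a short check. This sketch assumes the primer names have been collected into a list; the names used here are placeholders and do not reflect the real V2 primer naming scheme.

```python
from collections import Counter


def duplicated_primer_names(primer_names):
    """Return the primer names that occur more than once, sorted."""
    counts = Counter(primer_names)
    return sorted(name for name, n in counts.items() if n > 1)


# Four plates x six multiplex groups = 24 primer names (placeholder names)
names = [f"primer_{i:02d}" for i in range(1, 25)]
print(duplicated_primer_names(names))                  # prints: []
print(duplicated_primer_names(names + ["primer_03"]))  # prints: ['primer_03']
```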

### Example Output

#### Command

!yap make-sample-sheet --plate_info_path example_plate_info.txt --output_prefix Pool_EXAMPLE

# Executing make-sample-sheet...
# make-sample-sheet finished.


#### Output

The output contains two sample sheets: one for MiSeq and one for NovaSeq.

!head -n 25 Pool_EXAMPLE.miseq.sample_sheet.csv

[Header],,,,,,,,,,
IEMFileVersion,4,,,,,,,,,
Date,,,,,,,,,,
Workflow,GenerateFASTQ,,,,,,,,,
Application,HiSeq_FASTQ_Only,,,,,,,,,
Assay,TruSeq_HT,,,,,,,,,
Description,,,,,,,,,,
Chemistry,,,,,,,,,,
,,,,,,,,,,
[Reads],,,,,,,,,,
151,,,,,,,,,,
151,,,,,,,,,,
,,,,,,,,,,
[Settings],,,,,,,,,,
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,,,,,,,,,
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT,,,,,,,,,
,,,,,,,,,,
[Data],,,,,,,,,,
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A1,,Plate,,,CGTAGAACAG,,CTGTTAGCGG,Pool_72_73_9A_10C,hanliu@salk.edu
180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A2,,Plate,,,CTGGCATATT,,CGGAAGATAA,Pool_72_73_9A_10C,hanliu@salk.edu
180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A3,,Plate,,,AGAACCTCGC,,ACACTTCGTT,P

!head -n 25 Pool_EXAMPLE.novaseq.sample_sheet.csv

[Header],,,,,,,,,,
IEMFileVersion,4,,,,,,,,,
Date,,,,,,,,,,
Workflow,GenerateFASTQ,,,,,,,,,
Application,HiSeq_FASTQ_Only,,,,,,,,,
Assay,TruSeq_HT,,,,,,,,,
Description,,,,,,,,,,
Chemistry,,,,,,,,,,
,,,,,,,,,,
[Reads],,,,,,,,,,
151,,,,,,,,,,
151,,,,,,,,,,
,,,,,,,,,,
[Settings],,,,,,,,,,
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,,,,,,,,,
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT,,,,,,,,,
,,,,,,,,,,
[Data],,,,,,,,,,
Lane,Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
1,180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A1,,Plate,,,CGTAGAACAG,,CTGTTAGCGG,Pool_72_73_9A_10C,hanliu@salk.edu
1,180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A2,,Plate,,,CTGGCATATT,,CGGAAGATAA,Pool_72_73_9A_10C,hanliu@salk.edu
1,180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A3,,Plate,,,AGAACCTCGC,,A
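
To inspect the generated sample sheets programmatically, the `[Data]` section can be read with Python's standard `csv` module. This is a minimal sketch rather than part of yap: `read_data_section` is a hypothetical helper, and the shortened sheet below reuses the first data row of the NovaSeq output shown above.

```python
import csv
import io


def read_data_section(lines):
    """Return the rows of the [Data] section as dicts keyed by column name."""
    rows = list(csv.reader(lines))
    # Locate the "[Data]" marker; the row after it holds the column names
    start = next(
        (i for i, row in enumerate(rows) if row and row[0] == "[Data]"), None
    )
    if start is None:
        raise ValueError("no [Data] section found")
    header = rows[start + 1]
    # Skip rows that are empty or contain only empty fields
    return [dict(zip(header, row)) for row in rows[start + 2:] if any(row)]


sheet = """[Header],,,,,,,,,,
IEMFileVersion,4,,,,,,,,,
[Data],,,,,,,,,,
Lane,Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
1,180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A1,,Plate,,,CGTAGAACAG,,CTGTTAGCGG,Pool_72_73_9A_10C,hanliu@salk.edu
"""

records = read_data_section(io.StringIO(sheet))
print(records[0]["Lane"], records[0]["index"], records[0]["index2"])
# prints: 1 CGTAGAACAG CTGTTAGCGG
```

For a file on disk, pass an open file handle instead of the `io.StringIO` object, for example `open("Pool_EXAMPLE.novaseq.sample_sheet.csv", newline="")`. The same function also works for the MiSeq sheet, which has no `Lane` column.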