# Make Sample Sheet (in-house)

## Step 1: Prepare a PlateInfo file

### What is a PlateInfo file?
- A plain text file with experimental, library, and barcoding information.
- The user needs to create this file for each library.
- The main content of this file is the **barcoding information for each plate in the library**, which the pipeline uses to **demultiplex and name the single-cell files** correctly.
- The barcoding information also lets the pipeline automatically determine the original well (row and column) of the plate where each cell was located.
- If your project uses a special design across rows or columns, you can integrate it yourself **after** mapping, based on the row/column information the pipeline provides.
- The initial 8-random-index barcoding version is V1; the new 384-random-index barcoding version is V2.


### Get the plate_info.txt template
```shell
yap default-plate-info
```

#### V1 (8-random-index) plate info

In [1]:
!yap default-plate-info -v V1

# Executing default-plate-info...
#                               .__
#   ___________    _____ ______ |  |   ____
#  /  ___/\__  \  /     \\____ \|  | _/ __ \
#  \___ \  / __ \|  Y Y  \  |_> >  |_\  ___/
# /____  >(____  /__|_|  /   __/|____/\___  >
#      \/      \/      \/|__|             \/
#        .__                   __  ._.
#   _____|  |__   ____   _____/  |_| |
#  /  ___/  |  \_/ __ \_/ __ \   __\ |
#  \___ \|   Y  \  ___/\  ___/|  |  \|
# /____  >___|  /\___  >\___  >__|  __
#      \/     \/     \/     \/      \/
#
# ____   ________
# \   \ /   /_   |
#  \   Y   / |   |
#   \     /  |   |
#    \___/   |___|
#
#
# PlateInfo template of single cell sequencing demultiplex
#
# This file template contain 3 sections.
#
# [CriticalInfo]
# [LibraryInfo]
# [PlateInfo]
#
# The final sample id will be values of each part concatenated by "-" in the following order
# [Values in LibraryInfo] + [Additional values in PlateInfo] + [Sample UID determined by librar

#### V2 (384-random-index) plate info

In [2]:
!yap default-plate-info -v V2

# Executing default-plate-info...
#                               .__
#   ___________    _____ ______ |  |   ____
#  /  ___/\__  \  /     \\____ \|  | _/ __ \
#  \___ \  / __ \|  Y Y  \  |_> >  |_\  ___/
# /____  >(____  /__|_|  /   __/|____/\___  >
#      \/      \/      \/|__|             \/
#        .__                   __  ._.
#   _____|  |__   ____   _____/  |_| |
#  /  ___/  |  \_/ __ \_/ __ \   __\ |
#  \___ \|   Y  \  ___/\  ___/|  |  \|
# /____  >___|  /\___  >\___  >__|  __
#      \/     \/     \/     \/      \/
#
# ____   ____________
# \   \ /   /\_____  \
#  \   Y   /  /  ____/
#   \     /  /       \
#    \___/   \_______ \
#                    \/
#
# PlateInfo template of single cell sequencing demultiplex
#
# This file template contain 3 sections.
#
# [CriticalInfo]
# [LibraryInfo]
# [PlateInfo]
#
# The final sample id will be values of each part concatenated by "-" in the following order
# [Values in LibraryInfo] + [Additional values in Pl

## Step 2: Run `yap make-sample-sheet`

- `yap make-sample-sheet` recognizes V1 and V2 plate info files automatically. The sample sheet and name pattern differ between V1 and V2, but all subsequent commands (`yap mapping` and `yap mapping-summary`) recognize the version as well.

- Usage
```shell
yap make-sample-sheet [-h] --plate_info_paths PLATE_INFO_PATHS
                             [PLATE_INFO_PATHS ...] --output_prefix
                             OUTPUT_PREFIX [--header_path HEADER_PATH]
```

In [3]:
!yap make-sample-sheet -h

usage: yap make-sample-sheet [-h] --plate_info_paths PLATE_INFO_PATHS
                             [PLATE_INFO_PATHS ...] --output_prefix
                             OUTPUT_PREFIX [--header_path HEADER_PATH]

optional arguments:
  -h, --help            show this help message and exit

Required inputs:
  --plate_info_paths PLATE_INFO_PATHS [PLATE_INFO_PATHS ...]
                        Space separated paths of plate infos, at least one
                        file should be provided. If multiple files provided,
                        will check barcode compatibility. (default: None)
  --output_prefix OUTPUT_PREFIX
                        Output prefix, will generate 2 sample sheets, 1 for
                        miseq, 1 for novaseq (default: None)

Optional inputs:
  --header_path HEADER_PATH
                        Path to the sample sheet header that contains
                        sequencer info. Will use default if not provided.
                        (defau

### Good example (V1 Example)

#### plate_info.txt

##### Library No.1

In [3]:
with open('plate_info.txt') as f:
    for line in f:
        line = line.strip()
        if line.startswith('#') or line == '':
            continue
        print(line)

[CriticalInfo]
n_random_index=8
input_plate_size=384
pool_id=Pool_73
tube_label=Pool_72_73_9A_10C
email=hanliu@salk.edu
[LibraryInfo]
lib_comp_date=180101
project=CEMBA
organism=mm
dev_stage_age=P56
tissue_cell_type=1A
exp_cond=1
bio_rep=1
tech_rep=1
lib_type=snmC-seq2
sequencer=NovaSeq
se_pe=pe
read_length=150
requested_by=HL
[PlateInfo]
plate_id	primer_quarter
CEMBA190530_9C_1	SetB_Q1
CEMBA190530_9C_2	SetB_Q1
CEMBA190530_9C_3	SetB_Q2
CEMBA190530_9C_4	SetB_Q2
CEMBA190620_9C_1	SetB_Q3
CEMBA190620_9C_2	SetB_Q3
CEMBA190620_9C_3	SetB_Q4
CEMBA190620_9C_4	SetB_Q4


##### Library No.2

In [4]:
with open('plate_info2.txt') as f:
    for line in f:
        line = line.strip()
        if line.startswith('#') or line == '':
            continue
        print(line)

[CriticalInfo]
n_random_index=8
input_plate_size=384
pool_id=Pool_73
tube_label=Pool_72_73_9A_10C
email=hanliu@salk.edu
[LibraryInfo]
lib_comp_date=200120
project=CEMBA
organism=mm
dev_stage_age=P56
tissue_cell_type=1A
exp_cond=1
bio_rep=1
tech_rep=1
lib_type=snmC-seq2
sequencer=NovaSeq
se_pe=pe
read_length=150
requested_by=HL
[PlateInfo]
plate_id	primer_quarter
CEMBA200120_9T_1	Set1_Q1
CEMBA200120_9C_2	Set1_Q1
CEMBA200120_9C_3	Set1_Q2
CEMBA200120_9C_4	Set1_Q2
CEMBA200120_9C_1	Set1_Q3
CEMBA200120_9C_2	Set1_Q3
CEMBA200120_9C_3	Set1_Q4
CEMBA200120_9C_4	Set1_Q4
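
Both files above follow the same three-section layout. As a rough illustration of that layout only (not the parser `yap` uses internally), the sketch below splits such a file into its `[CriticalInfo]`, `[LibraryInfo]`, and `[PlateInfo]` parts. The function name `parse_plate_info` is hypothetical, and the sketch assumes that the table columns are separated by whitespace and that values contain no spaces.

```python
def parse_plate_info(lines):
    """Split plate info lines into (critical, library, plates).

    Hypothetical helper for illustration; not part of the yap package.
    """
    critical, library, plates = {}, {}, []
    section = None
    header = None
    for raw in lines:
        line = raw.strip()
        # Skip comments and blank lines, as in the notebook cells above
        if not line or line.startswith('#'):
            continue
        # A line such as "[LibraryInfo]" starts a new section
        if line.startswith('[') and line.endswith(']'):
            section = line[1:-1]
            continue
        if section in ('CriticalInfo', 'LibraryInfo'):
            key, _, value = line.partition('=')
            target = critical if section == 'CriticalInfo' else library
            target[key.strip()] = value.strip()
        elif section == 'PlateInfo':
            # Assumption: columns are whitespace separated
            fields = line.split()
            if header is None:
                header = fields  # first row holds the column names
            else:
                plates.append(dict(zip(header, fields)))
    return critical, library, plates


example = """
[CriticalInfo]
n_random_index=8
input_plate_size=384
[LibraryInfo]
lib_comp_date=180101
project=CEMBA
[PlateInfo]
plate_id primer_quarter
CEMBA190530_9C_1 SetB_Q1
CEMBA190530_9C_2 SetB_Q1
"""

critical, library, plates = parse_plate_info(example.splitlines())
print(critical['n_random_index'], library['project'], len(plates))
```

Keeping the key/value sections as dictionaries and the plate table as a list of rows mirrors how the file is written, which makes it easy to inspect a plate info file before passing it to `yap make-sample-sheet`.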


#### Command
- The command accepts multiple plate_info.txt files, as long as their barcodes are compatible with each other.
- The output is two sample sheets: one for MiSeq and one for NovaSeq.

In [5]:
!yap make-sample-sheet --plate_info_paths plate_info.txt plate_info2.txt --output_prefix Pool_EXAMPLE

# Executing make-sample-sheet...
# make-sample-sheet finished.


#### Results

In [6]:
!cat Pool_EXAMPLE.miseq.sample_sheet.csv | grep CEMBA | wc -l

768


In [7]:
!head -n 25 Pool_EXAMPLE.miseq.sample_sheet.csv

[Header],,,,,,,,,,
IEMFileVersion,4,,,,,,,,,
Date,,,,,,,,,,
Workflow,GenerateFASTQ,,,,,,,,,
Application,HiSeq_FASTQ_Only,,,,,,,,,
Assay,TruSeq_HT,,,,,,,,,
Description,,,,,,,,,,
Chemistry,,,,,,,,,,
,,,,,,,,,,
[Reads],,,,,,,,,,
151,,,,,,,,,,
151,,,,,,,,,,
,,,,,,,,,,
[Settings],,,,,,,,,,
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,,,,,,,,,
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT,,,,,,,,,
,,,,,,,,,,
[Data],,,,,,,,,,
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A1,,Plate,,,CGTAGAACAG,,CTGTTAGCGG,Pool_72_73_9A_10C,hanliu@salk.edu
180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A2,,Plate,,,CTGGCATATT,,CGGAAGATAA,Pool_72_73_9A_10C,hanliu@salk.edu
180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A3,,Plate,,,AGAACCTCGC,,ACACTTCGTT,P
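
The `Sample_ID` values above follow the concatenation rule stated in the template header: the `[LibraryInfo]` values, then the plate values, then the sample UID, all joined by "-". The minimal sketch below reproduces the first ID shown above. The function `build_sample_id` is a hypothetical helper written for illustration, not the pipeline's own code, and the value lists are copied by hand from Library No.1.

```python
def build_sample_id(library_values, plate_values, sample_uid):
    """Join all parts with "-" in the order given by the template header.

    Hypothetical helper for illustration; not part of the yap package.
    """
    return '-'.join([*library_values, *plate_values, sample_uid])


# Values of [LibraryInfo] from Library No.1, in file order
library_values = ['180101', 'CEMBA', 'mm', 'P56', '1A', '1', '1', '1',
                  'snmC-seq2', 'NovaSeq', 'pe', '150', 'HL']
# The two plates that share primer quarter SetB_Q1
plate_values = ['CEMBA190530_9C_1', 'CEMBA190530_9C_2']

sample_id = build_sample_id(library_values, plate_values, 'A1')
print(sample_id)
```

Because a value such as `snmC-seq2` itself contains "-", splitting a sample ID on "-" does not cleanly recover the original fields.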

In [8]:
!cat Pool_EXAMPLE.novaseq.sample_sheet.csv | grep CEMBA | wc -l

3072


In [9]:
!head -n 25 Pool_EXAMPLE.novaseq.sample_sheet.csv

[Header],,,,,,,,,,
IEMFileVersion,4,,,,,,,,,
Date,,,,,,,,,,
Workflow,GenerateFASTQ,,,,,,,,,
Application,HiSeq_FASTQ_Only,,,,,,,,,
Assay,TruSeq_HT,,,,,,,,,
Description,,,,,,,,,,
Chemistry,,,,,,,,,,
,,,,,,,,,,
[Reads],,,,,,,,,,
151,,,,,,,,,,
151,,,,,,,,,,
,,,,,,,,,,
[Settings],,,,,,,,,,
Adapter,AGATCGGAAGAGCACACGTCTGAACTCCAGTCA,,,,,,,,,
AdapterRead2,AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT,,,,,,,,,
,,,,,,,,,,
[Data],,,,,,,,,,
Lane,Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,I5_Index_ID,index2,Sample_Project,Description
1,180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A1,,Plate,,,CGTAGAACAG,,CTGTTAGCGG,Pool_72_73_9A_10C,hanliu@salk.edu
1,180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A2,,Plate,,,CTGGCATATT,,CGGAAGATAA,Pool_72_73_9A_10C,hanliu@salk.edu
1,180101-CEMBA-mm-P56-1A-1-1-1-snmC-seq2-NovaSeq-pe-150-HL-CEMBA190530_9C_1-CEMBA190530_9C_2-A3,,Plate,,,AGAACCTCGC,,A
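
One way to account for the two row counts above (768 and 3072) is sketched below. It rests on two assumptions that are not stated in the output itself: that each primer quarter contributes 96 index pairs, and that the NovaSeq sheet repeats every sample once per lane across 4 lanes (only the `Lane` column and lane 1 are visible in the truncated output).

```python
# Assumption: one primer quarter corresponds to 96 index pairs
INDEXES_PER_PRIMER_QUARTER = 96
# Assumption: the NovaSeq sheet lists every sample once per lane, for 4 lanes
NOVASEQ_LANES = 4

# Primer quarters used across plate_info.txt and plate_info2.txt
primer_quarters = {
    'SetB_Q1', 'SetB_Q2', 'SetB_Q3', 'SetB_Q4',
    'Set1_Q1', 'Set1_Q2', 'Set1_Q3', 'Set1_Q4',
}

miseq_rows = len(primer_quarters) * INDEXES_PER_PRIMER_QUARTER
novaseq_rows = miseq_rows * NOVASEQ_LANES
print(miseq_rows, novaseq_rows)  # 768 3072
```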

### Bad example

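- The command checks barcode compatibility, plate name uniqueness, and so on.
- In the example below, the same plate_info.txt is passed twice, so each primer quarter ends up with 4 plates instead of 2 and the command fails.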

In [10]:
!yap make-sample-sheet --plate_info_paths plate_info.txt plate_info.txt --output_prefix Pool_EXAMPLE

# Executing make-sample-sheet...
Traceback (most recent call last):
  File "/gale/netapp/home/hanliu/anaconda3/bin/yap", line 10, in <module>
    sys.exit(main())
  File "/gale/netapp/home/hanliu/anaconda3/lib/python3.6/site-packages/cemba_data/__main__.py", line 1261, in main
    func(**args_vars)
  File "/gale/netapp/home/hanliu/anaconda3/lib/python3.6/site-packages/cemba_data/mapping/prepare_sample_sheet.py", line 204, in make_sample_sheet
    raise ValueError(f'{primer_quarter} have {n_plate} plates in the table, that is impossible.')
ValueError: SetB_Q3 have 4 plates in the table, that is impossible.
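
The kind of check that produces this error can be sketched as follows. This is a simplified illustration, not the code in `prepare_sample_sheet.py`: the function `check_plate_table` is hypothetical, and the limit of 2 plates per primer quarter is an assumption inferred from the V1 examples above.

```python
from collections import Counter


def check_plate_table(plate_rows, plates_per_quarter=2):
    """Validate (plate_id, primer_quarter) rows pooled from all plate info files.

    Hypothetical helper; the limit of 2 plates per primer quarter is an
    assumption based on the V1 examples in this document.
    """
    # Each primer quarter should be shared by exactly `plates_per_quarter` plates
    quarter_counts = Counter(quarter for _, quarter in plate_rows)
    for quarter, n_plate in quarter_counts.items():
        if n_plate != plates_per_quarter:
            raise ValueError(
                f'{quarter} has {n_plate} plates, expected {plates_per_quarter}.')
    # Plate names should be unique within a primer quarter
    pair_counts = Counter(plate_rows)
    duplicated = sorted(pair for pair, n in pair_counts.items() if n > 1)
    if duplicated:
        raise ValueError(f'Duplicated plate rows: {duplicated}')


library1 = [
    ('CEMBA190530_9C_1', 'SetB_Q1'), ('CEMBA190530_9C_2', 'SetB_Q1'),
    ('CEMBA190530_9C_3', 'SetB_Q2'), ('CEMBA190530_9C_4', 'SetB_Q2'),
    ('CEMBA190620_9C_1', 'SetB_Q3'), ('CEMBA190620_9C_2', 'SetB_Q3'),
    ('CEMBA190620_9C_3', 'SetB_Q4'), ('CEMBA190620_9C_4', 'SetB_Q4'),
]

check_plate_table(library1)  # passes silently

try:
    # Passing the same file twice doubles every primer quarter
    check_plate_table(library1 + library1)
except ValueError as err:
    print(f'Rejected: {err}')
```

Running the check on the pooled rows of all input files, rather than file by file, is what lets it catch conflicts that only appear when several libraries are combined.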
