masikol edited this page Oct 18, 2022 · 4 revisions

pub

The name stands for "Pick Up Barcodes".

Description

The script automatically picks sequencing barcodes.

"pub.R" uses the PAM (partitioning around medoids) algorithm from the R package cluster to cluster the barcodes and select the ones that are most dissimilar from the others.

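To illustrate the idea of picking medoids, here is a minimal Python sketch. It is not the code of "pub.R" and not the full PAM build/swap procedure: it is a simplified alternating k-medoids, and it assumes Hamming distance as the dissimilarity measure, which this page does not confirm. The names hamming and pick_medoids are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pick_medoids(seqs, k, max_iter=100):
    """Return k medoid sequences found by a simple alternating k-medoids.

    Assumption: Hamming distance is the dissimilarity; pub.R may use another.
    """
    if not 0 < k <= len(seqs):
        raise ValueError("k must be between 1 and the number of sequences")
    n = len(seqs)
    dist = [[hamming(a, b) for b in seqs] for a in seqs]
    medoids = list(range(k))  # naive start: the first k sequences
    for _ in range(max_iter):
        # Assignment step: attach every non-medoid to its nearest medoid.
        clusters = {m: [m] for m in medoids}
        for i in range(n):
            if i not in clusters:
                nearest = min(medoids, key=lambda m: dist[i][m])
                clusters[nearest].append(i)
        # Update step: move a medoid only if another member is strictly better.
        new_medoids = []
        for m, members in clusters.items():
            def cost(c):
                return sum(dist[c][j] for j in members)
            new_medoids.append(min(members, key=lambda c: (cost(c), c != m)))
        new_medoids.sort()
        if new_medoids == medoids:
            break  # converged: no medoid changed
        medoids = new_medoids
    return [seqs[i] for i in medoids]
```

For example, pick_medoids(["AAAA", "AAAT", "CCCC", "CCCG"], 2) returns one representative from each of the two obvious groups.
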
After picking the barcodes, it checks color balance, i.e. it makes sure that no sequencing cycle has the same laser/LED "shining" for all selected barcodes. See this page for details about color balance.
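
A color balance check of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the code of "pub.R": it assumes a four-channel Illumina chemistry in which A and C are read with the red laser and G and T with the green laser, and the names CHANNEL and unbalanced_cycles are hypothetical. Other instruments use other schemes, so the mapping may need to change.

```python
# Assumed base-to-laser mapping (four-channel chemistry); not taken from pub.R.
CHANNEL = {"A": "red", "C": "red", "G": "green", "T": "green"}


def unbalanced_cycles(barcodes):
    """Return 0-based cycle indices at which all barcodes use one channel."""
    if not barcodes:
        raise ValueError("at least one barcode is required")
    length = len(barcodes[0])
    if any(len(b) != length for b in barcodes):
        raise ValueError("barcodes must have equal length")
    bad = []
    # zip(*barcodes) yields the bases read at each sequencing cycle.
    for cycle, bases in enumerate(zip(*barcodes)):
        channels = {CHANNEL[base.upper()] for base in bases}
        if len(channels) < 2:
            bad.append(cycle)
    return bad
```

With the four barcodes TTACCGAC, AGTGACCT, TCGGATTC and CAAGGTAC the function returns an empty list, while four copies of CTCTCTAT are flagged at every cycle.
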

Dependencies

The script is written in R, so you need an R interpreter to run it.

Usage

Rscript pub.R <csv_file_with_barcodes> <number_of_samples>

Run Rscript pub.R -h to see the help message in the console.

Example of usage

Pick 28 barcodes from the file my_favorite_barcodes.csv.

Rscript pub.R my_favorite_barcodes.csv 28

Format of input files

"pub.R" accepts input files in either the "single" or the "double" format.

"Single" file format

A two-column CSV file. A header is desirable but not mandatory.

First column -- barcode name. Second column -- barcode sequence.

Example:

I7_Index_ID,index
P1-A1,TTACCGAC
P2-A2,AGTGACCT
P3-A3,TCGGATTC
P4-A4,CAAGGTAC

"Double" file format

A four-column CSV file. A header is desirable but not mandatory.

First column -- i7 barcode name. Second column -- i7 barcode sequence. Third column -- i5 barcode name. Fourth column -- i5 barcode sequence.

Example:

I7_Index_ID,index,I5_Index_ID,index2
N701,TAAGGCGA,S502,CTCTCTAT
N702,CGTACTAG,S502,CTCTCTAT
N703,AGGCAGAA,S502,CTCTCTAT
N704,TCCTGAGC,S502,CTCTCTAT
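
The two formats can be told apart by their column count. The Python sketch below shows one way to do that; it is not how "pub.R" is known to work. The function name read_barcodes is hypothetical, and the header rule (the first row is a header if its sequence fields contain anything other than A, C, G, T or N) is an assumption that would misfire on a header such as "TAG".

```python
import csv
import io


def read_barcodes(text):
    """Parse barcode CSV text; return (format_name, data_rows)."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    widths = {len(row) for row in rows}
    if widths == {2}:
        fmt = "single"
    elif widths == {4}:
        fmt = "double"
    else:
        raise ValueError("expected 2 or 4 columns in every row")

    def is_sequence(field):
        field = field.strip().upper()
        return field != "" and set(field) <= set("ACGTN")

    # Sequence columns are the 2nd (and, in the double format, the 4th).
    sequence_columns = range(1, len(rows[0]), 2)
    # Assumed header rule: drop the first row if it holds no valid sequences.
    if not all(is_sequence(rows[0][c]) for c in sequence_columns):
        rows = rows[1:]
    return fmt, rows
```

Called on the "double" example above, read_barcodes returns "double" together with the four data rows, with the header row dropped.
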

"Muting" lines in barcode files

You can "mute" (or comment out, if you wish) lines in your barcode file by starting them with #. Muted lines will be ignored by "pub.R".

Example (barcode P2-A2 will be ignored by "pub.R"):

I7_Index_ID,index
P1-A1,TTACCGAC
# P2-A2,AGTGACCT
P3-A3,TCGGATTC
P4-A4,CAAGGTAC
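
The effect of muting can be reproduced with a short Python filter, shown here only as an illustration. The name drop_muted is hypothetical, and ignoring whitespace before the # is an assumption; "pub.R" may be stricter.

```python
def drop_muted(lines):
    """Return the lines that are not muted with a leading '#'."""
    # Assumption: whitespace before '#' is ignored when detecting muted lines.
    return [line for line in lines if not line.lstrip().startswith("#")]


lines = [
    "I7_Index_ID,index",
    "P1-A1,TTACCGAC",
    "# P2-A2,AGTGACCT",
    "P3-A3,TCGGATTC",
    "P4-A4,CAAGGTAC",
]
kept = drop_muted(lines)  # the P2-A2 line is dropped, the rest are kept
```
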