noamteyssier/scispeak

scispeak

A Rust parser that converts sci-seq-v3 reads into kallisto-compatible formats.

A CLI tool that whitelist-filters sci-seq-v3 reads and converts them to a 10X-style format.

Overview

This tool filters sci-seq reads against their respective barcode whitelists and writes FASTQ files in the style of 10X reads.

It parses the sci-seq-v3 format, identifies the cell barcodes and UMIs, and writes a new file that resembles the 10X sequence construct, so the reads can be used with tools that have not yet adopted the sci-seq format.
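
The whitelist filtering idea can be illustrated with a minimal sketch. This is not scispeak's actual implementation: the function names load_whitelist and passes_whitelist are hypothetical, the whitelist is assumed to be a newline-separated list of barcodes, and only exact matches are accepted.

```rust
use std::collections::HashSet;

/// Build a whitelist from newline-separated barcodes.
/// ASSUMPTION: one barcode per line; blank lines are skipped.
fn load_whitelist(text: &str) -> HashSet<Vec<u8>> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(|line| line.as_bytes().to_vec())
        .collect()
}

/// Return true when the barcode is an exact member of the whitelist.
/// ASSUMPTION: exact matching only; no mismatch correction is attempted.
fn passes_whitelist(whitelist: &HashSet<Vec<u8>>, barcode: &[u8]) -> bool {
    whitelist.contains(barcode)
}
```

A hash set keeps each lookup at roughly constant cost, which matters when every read in a large FASTQ file has to be checked.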

sci-rna-seq3 Sequencing Construct

The sci-rna-seq3 sequencing construct is organized in the following way:

            ┌─'illumina_p5:29'
            ├─'i5:10'
            ├─'truseq_read_1_adapter:33'
            │                            ┌─'hairpin_barcode:10'
            │                            ├─'hairpin_adapter:6'
            ├─read_1─────────────────────┤
            │                            ├─'umi:8'
──RNA───────┤                            └─'cell_bc:10'
            ├─'poly_T:98'
            ├─'read_2:98'
            │                            ┌─'ME:19'
            ├─i7_primer──────────────────┤
            │                            └─'s7:15'
            ├─'i7:10'
            └─'illumina_p7:24'

Visualization from seqspec.

And so the resulting R1 and R2 files boil down to:

# R1
[linker][adapter][umi][barcode]

# R2
[cDNA]
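
As a rough illustration of the R1 layout, the sketch below slices an R1 sequence into its four fields using the lengths shown in the construct diagram (10, 6, 8, and 10 bases). The type R1Fields and the function split_r1 are hypothetical names, and the sketch assumes the fields are fixed-length and start at the first base of the read, which may not hold for every library.

```rust
// Field lengths taken from the construct diagram.
// ASSUMPTION: every field has a fixed length and R1 starts at the hairpin barcode.
const HAIRPIN_BARCODE_LEN: usize = 10;
const HAIRPIN_ADAPTER_LEN: usize = 6;
const UMI_LEN: usize = 8;
const CELL_BC_LEN: usize = 10;

/// Borrowed views into one R1 sequence (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct R1Fields<'a> {
    hairpin_barcode: &'a [u8],
    hairpin_adapter: &'a [u8],
    umi: &'a [u8],
    cell_bc: &'a [u8],
}

/// Split an R1 sequence into its fields, or return None if the read is too short.
fn split_r1(seq: &[u8]) -> Option<R1Fields<'_>> {
    let total = HAIRPIN_BARCODE_LEN + HAIRPIN_ADAPTER_LEN + UMI_LEN + CELL_BC_LEN;
    if seq.len() < total {
        return None;
    }
    let (hairpin_barcode, rest) = seq.split_at(HAIRPIN_BARCODE_LEN);
    let (hairpin_adapter, rest) = rest.split_at(HAIRPIN_ADAPTER_LEN);
    let (umi, rest) = rest.split_at(UMI_LEN);
    // Any bases beyond the expected construct are ignored.
    let cell_bc = &rest[..CELL_BC_LEN];
    Some(R1Fields {
        hairpin_barcode,
        hairpin_adapter,
        umi,
        cell_bc,
    })
}
```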

Usage

This is a single-command CLI tool. It requires only the R1 and R2 file paths:

scispeak \
    -i data/SRR7827205_sample_R1.fastq.gz \
    -I data/SRR7827205_sample_R2.fastq.gz;

It can be accelerated by using multiple compression threads with the -t flag:

scispeak \
    -i data/SRR7827205_sample_R1.fastq.gz \
    -I data/SRR7827205_sample_R2.fastq.gz \
    -t 8;

It can also write a log file with the -l flag to keep matching statistics:

scispeak \
    -i data/SRR7827205_sample_R1.fastq.gz \
    -I data/SRR7827205_sample_R2.fastq.gz \
    -t 8 \
    -l;

Outputs

This program writes three files per run:

  1. <args.prefix>_R1.fastq.gz: A FASTQ file with the [barcode][UMI] construct for all reads passing the whitelist.
  2. <args.prefix>_R2.fastq.gz: An unaltered FASTQ file of the R2 records for all reads passing the whitelist.
  3. <args.prefix>_log.json: A log file containing the filtering statistics of the run.
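
To make the [barcode][UMI] output construct concrete, here is a minimal sketch of rewriting one R1 record. It is not taken from scispeak's source: the function names reorder and to_10x_style_record are hypothetical, the offsets assume the fixed field lengths from the construct diagram, and only the trailing 10-base barcode is used. Whether scispeak also folds the hairpin barcode into the output barcode is not stated here.

```rust
// ASSUMPTION: the UMI starts after the 10-base hairpin barcode and 6-base adapter.
const UMI_START: usize = 16;
const UMI_LEN: usize = 8;
const BARCODE_LEN: usize = 10;

/// Reorder one per-base field (sequence or quality) from [..][umi][barcode]
/// to [barcode][umi]. Returns None if the field is too short.
fn reorder(field: &[u8]) -> Option<Vec<u8>> {
    let barcode_start = UMI_START + UMI_LEN;
    let end = barcode_start + BARCODE_LEN;
    if field.len() < end {
        return None;
    }
    let mut out = Vec::with_capacity(BARCODE_LEN + UMI_LEN);
    out.extend_from_slice(&field[barcode_start..end]);
    out.extend_from_slice(&field[UMI_START..barcode_start]);
    Some(out)
}

/// Format a four-line FASTQ record whose sequence and quality are both reordered,
/// so each quality score stays attached to its base.
fn to_10x_style_record(name: &str, seq: &[u8], qual: &[u8]) -> Option<String> {
    if seq.len() != qual.len() {
        return None;
    }
    let new_seq = reorder(seq)?;
    let new_qual = reorder(qual)?;
    Some(format!(
        "@{}\n{}\n+\n{}\n",
        name,
        String::from_utf8_lossy(&new_seq),
        String::from_utf8_lossy(&new_qual)
    ))
}
```

The quality string is sliced with the same offsets as the sequence, which keeps the record valid for downstream FASTQ parsers.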
