qcbc

qcbc is a Python package for quality control of synthetic barcode sequences used in orthogonal sequencing-based assays.

Installation

The latest release can be installed with

pip install qcbc

The development version can be installed with

pip install git+https://github.com/pachterlab/qcbc

Run qcbc on your own barcode list: Open In Colab

Usage

qcbc consists of five subcommands:

$ qcbc
usage: qcbc [-h] [--verbose] <CMD> ...

qcbc 0.0.2: Format sequence specification files

positional arguments:
  <CMD>
    ambiguous  find barcodes with shared subsequence
    content    compute base distribution (A,T,C,G counts/frequencies)
    homopolymer
               compute homopolymer distribution (length > 2)
    pdist      compute pairwise distance
    volume     compute size of barcode space

Barcode files are expected to contain one barcode per line: the barcode sequence, then a tab, then the name associated with the barcode. For example:

$ cat barcodes.txt
AGCAGTTACAG tag1
CTTGTACCCAG tag2

$ cat -t barcodes.txt
AGCAGTTACAG^Itag1
CTTGTACCCAG^Itag2

Note that cat -t file.txt displays tab characters as ^I, so it can be used to verify that the file is set up properly.
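
The same check can be done in Python. The following is a minimal sketch, not part of qcbc: the helper name read_barcodes is hypothetical, and it assumes exactly two tab-separated fields per line (sequence, then name), as described above.

```python
import io


def read_barcodes(handle):
    """Parse 'sequence<TAB>name' lines into a list of (sequence, name) tuples.

    Hypothetical helper for illustration; this is not qcbc's own parser.
    """
    records = []
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 tab-separated fields, got {len(fields)}"
            )
        seq, name = fields
        records.append((seq, name))
    return records


# In-memory stand-in for barcodes.txt
example = io.StringIO("AGCAGTTACAG\ttag1\nCTTGTACCCAG\ttag2\n")
print(read_barcodes(example))
```

A file that uses spaces instead of tabs raises a ValueError that names the offending line.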

qcbc ambiguous: find barcodes with shared subsequence

Find barcodes that share subsequences of a given length.

qcbc ambiguous -l <length> <bc_file>
  • optionally, -rc can be used to check the reverse complement of the subsequences.
  • <length> corresponds to the subsequence length used to evaluate ambiguity between barcodes.
  • <bc_file> corresponds to the barcode file.

Examples

# check ambiguous barcodes by subsequences of length 3
$ qcbc ambiguous -l 3 barcodes.txt
CAG	tag1,tag1,tag2
TAC	tag1,tag2
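
To make the output above easier to interpret, here is a rough Python sketch of the idea: slide a window of the given length over each barcode and report subsequences that occur in more than one barcode. The function name shared_subsequences is hypothetical, and the rule "report only subsequences seen in at least two distinct barcodes" is an assumption inferred from the example, not a description of qcbc's code. Reverse complements (-rc) are not handled.

```python
from collections import defaultdict


def shared_subsequences(barcodes, length):
    """Map each subsequence of `length` to the barcode names containing it.

    Illustrative sketch only. Names repeat when a subsequence occurs more
    than once in the same barcode; subsequences found in a single barcode
    are dropped (an assumption based on the example output).
    """
    hits = defaultdict(list)
    for seq, name in barcodes:
        for i in range(len(seq) - length + 1):
            hits[seq[i:i + length]].append(name)
    return {sub: names for sub, names in hits.items() if len(set(names)) > 1}


barcodes = [("AGCAGTTACAG", "tag1"), ("CTTGTACCCAG", "tag2")]
for sub, names in shared_subsequences(barcodes, 3).items():
    print(sub, ",".join(names), sep="\t")
```

For the two example barcodes this reports CAG (twice in tag1, once in tag2) and TAC (once in each), in line with the qcbc output shown above.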

qcbc content: compute base distribution

Compute the base distribution within each barcode.

qcbc content <bc_file>
  • optionally, specify --frequency to return the base distribution as fractions instead of counts.
  • optionally, specify --entropy to return the entropy of the base distribution relative to the maximum entropy.
  • <bc_file> corresponds to the barcode file.

Examples

$ qcbc content -e barcodes.txt
name	seq	ent
tag1	AGCAGTTACAG	0.67
tag2	CTTGTACCCAG	0.67
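
As an illustration of what these quantities mean, the sketch below computes base counts, base fractions, and a normalized entropy for one barcode. It is not qcbc's implementation: the function name base_content is hypothetical, and the normalization (Shannon entropy in bits divided by the 2-bit maximum for four bases) is an assumption. qcbc may use a different logarithm base or normalization, so the entropy printed here is not expected to match the ent column above.

```python
import math
from collections import Counter


def base_content(seq, alphabet="ACGT"):
    """Return (counts, fractions, normalized_entropy) for one barcode.

    Illustrative sketch: entropy is Shannon entropy in bits divided by
    log2(len(alphabet)), so it lies between 0 and 1. This normalization
    is an assumption, not taken from qcbc.
    """
    tally = Counter(seq)
    counts = {base: tally.get(base, 0) for base in alphabet}
    total = len(seq)
    fractions = {base: n / total for base, n in counts.items()}
    entropy_bits = -sum(p * math.log2(p) for p in fractions.values() if p > 0)
    return counts, fractions, entropy_bits / math.log2(len(alphabet))


counts, fractions, entropy = base_content("AGCAGTTACAG")
print(counts)
print({base: round(p, 2) for base, p in fractions.items()})
print(round(entropy, 2))
```

Under this normalization a barcode made of a single repeated base scores 0, and a barcode with all four bases equally represented scores 1.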

qcbc homopolymer: compute homopolymer distribution

Find the number of homopolymers of length two or greater.

qcbc homopolymer <bc_file>
  • <bc_file> corresponds to the barcode file.

Examples

$ qcbc homopolymer barcodes.txt
name	seq	homopolymer_length
tag1	AGCAGTTACAG	1,0,0,0,0,0,0,0,0,0
tag2	CTTGTACCCAG	1,1,0,0,0,0,0,0,0,0
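
The comma-separated column can be read as a histogram of run lengths. The sketch below assumes that position i in the list counts runs of exactly i + 2 identical bases (lengths 2 through the barcode length); that reading is inferred from the example output rather than from qcbc's source, and the function name homopolymer_counts is hypothetical.

```python
from itertools import groupby


def homopolymer_counts(seq):
    """Count runs of identical bases, bucketed by run length.

    Illustrative sketch: index 0 holds the number of runs of length 2,
    index 1 the number of runs of length 3, and so on up to len(seq).
    """
    counts = [0] * max(len(seq) - 1, 0)
    for _, run in groupby(seq):
        run_length = len(list(run))
        if run_length >= 2:
            counts[run_length - 2] += 1
    return counts


for seq, name in [("AGCAGTTACAG", "tag1"), ("CTTGTACCCAG", "tag2")]:
    print(name, seq, ",".join(map(str, homopolymer_counts(seq))), sep="\t")
```

Under this reading, tag1 has one run of length 2 (TT), and tag2 has one run of length 2 (TT) and one of length 3 (CCC).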

qcbc pdist: compute pairwise distance

Compute the pairwise Hamming distance between barcodes.

qcbc pdist <bc_file>
  • optionally, -rc can be used to check the reverse complement of the subsequences.
  • <bc_file> corresponds to the barcode file.

Examples

$ qcbc pdist barcodes.txt
AGCAGTTACAG	tag1	CTTGTACCCAG	tag2	8.0
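
The Hamming distance is the number of positions at which two equal-length sequences differ. A minimal Python sketch of the pairwise computation follows; the function names are hypothetical, and the code is meant to illustrate the metric, not to mirror qcbc's implementation (which prints the distance as a float).

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distances(barcodes):
    """Hamming distance for every unordered pair of (sequence, name) records."""
    return [
        (s1, n1, s2, n2, hamming(s1, s2))
        for (s1, n1), (s2, n2) in combinations(barcodes, 2)
    ]


barcodes = [("AGCAGTTACAG", "tag1"), ("CTTGTACCCAG", "tag2")]
for row in pairwise_distances(barcodes):
    print(*row, sep="\t")
```

The two example barcodes differ at their first eight positions and agree on the final CAG, giving a distance of 8.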

qcbc volume: compute size of barcode space

Compute the fraction of barcode space occupied by the given barcodes.

qcbc volume <bc_file>
  • <bc_file> corresponds to the barcode file.

Examples

$ qcbc volume barcodes.txt
2 out of 4,194,304 possible unique barcodes representing 0.0000%
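
The denominator is the number of distinct sequences of the barcode length over four bases: for 11-base barcodes, 4^11 = 4,194,304. The sketch below reproduces that arithmetic. It assumes all barcodes share one length and a four-letter alphabet; the function name barcode_volume is hypothetical and the code is not taken from qcbc.

```python
def barcode_volume(barcodes, alphabet_size=4):
    """Return (n_unique, n_possible, fraction) for equal-length barcodes.

    Illustrative sketch: n_possible is alphabet_size ** barcode_length.
    """
    seqs = {seq for seq, _ in barcodes}
    lengths = {len(seq) for seq in seqs}
    if len(lengths) != 1:
        raise ValueError("expected a non-empty set of barcodes of equal length")
    possible = alphabet_size ** lengths.pop()
    return len(seqs), possible, len(seqs) / possible


barcodes = [("AGCAGTTACAG", "tag1"), ("CTTGTACCCAG", "tag2")]
used, possible, fraction = barcode_volume(barcodes)
print(f"{used:,} out of {possible:,} possible unique barcodes representing {100 * fraction:.4f}%")
```

With only two barcodes the occupied fraction is about 0.00005%, which rounds to 0.0000% at four decimal places.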

Contributing

Thank you for wanting to improve qcbc. If you find a bug in qcbc, please create an issue. The issue should contain

  • the qcbc command that was run,
  • the error message, and
  • the qcbc and python version.

If you'd like to add assay sequence specifications or modify the qcbc tool, please do the following:

  1. Fork the project.
     # Press "Fork" at the top right of the GitHub page
  2. Clone the fork and create a branch for your feature.
     git clone https://github.com/<USERNAME>/qcbc.git
     cd qcbc
     git checkout -b cool-new-feature
  3. Make changes, add files, and commit.
     # make changes, add files, and commit them
     git add path/to/file1.py path/to/file2.py
     git commit -m "I made these changes"
  4. Push changes to GitHub.
     git push origin cool-new-feature
  5. Submit a pull request.

If you are unfamiliar with pull requests, you can find more information on the GitHub help page.
