kamhonhoi/HybBCSeq

Latest version: November 10, 2017

Overview

HybBCSeq is a suite of bioinformatics tools for processing and analyzing next-generation sequencing (NGS) data generated by the Hybridoma Barcoded Sequencing workflow. The tools:

  • Demultiplex and bin NGS data to its well of origin
  • Clean the NGS data to screen for productive antibody variable domain sequences
  • Report the antibody variable domain sequences for each well

Prerequisites

  • Ubuntu 16.04
  • Python 2.7
  • Git 2.7.4 (or the latest build)
  • pip 9.0.1 (or the latest build)
  • virtualenv 15.1.0 (or the latest build)
  • flash 1.2.11 (included in the HybBCSeq-working directory)

Installation

  • To install and upgrade pip:

    sudo apt-get update
    sudo apt-get install python-pip
    sudo pip install --upgrade pip

  • To install virtualenv:

    sudo apt-get update
    sudo apt-get install python-virtualenv

  • To clone the repository:

    git clone https://github.com/kamhonhoi/HybBCSeq.git

  • To set up the virtual environment and install the necessary packages (run from the HybBCSeq/ folder):

    python venv_setup.py
    source HybBCSeq-venv/bin/activate
    pip install -r requirements.txt
    deactivate

Usage

The steps below assume that you start in the HybBCSeq/ folder.

  1. Retrieve the raw NGS sequence files (.gz extension) from the source sequencer location and place them in the HybBCSeq-working/samples directory

  2. In order to run the provided scripts, activate virtualenv with the following command:

    source HybBCSeq-venv/bin/activate
    
    • Note: to end the virtualenv session, run the command deactivate
    • Change into the HybBCSeq-working directory (i.e. cd HybBCSeq-working)
  3. Merge paired-end reads and label the output files with the desired prefix

    • Program used: flash
      • Usage example:
      ./flash -r 300 -f 500 -s 50 samples/NGS-R1.fastq.gz samples/NGS-R2.fastq.gz -o samples/NGS-merged
      
      • Arguments explained:
        • -r : sequence read length per read direction (for MiSeq 2x300, set the read length to 300)
        • -f : expected length of the merged read fragment
        • -s : standard deviation of the expected fragment length
        • Locations of the NGS R1 and R2 sequence files
        • -o : output location and custom prefix
      • Outputs: please refer to the flash help for explanations of the generated files; in particular, the file with the .extendedFrags.fastq extension is the merged file needed for the next step
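When the merge step is scripted, the flash call shown above can be assembled programmatically. The following is a minimal sketch and is not part of HybBCSeq: the function name `build_flash_command` and its parameter names are hypothetical, the defaults simply mirror the usage example, and only the options described above (r, f, s and o) are used.

```python
def build_flash_command(r1, r2, out_prefix, read_len=300, frag_len=500, frag_sd=50):
    """Return the flash argument list for one pair of read files.

    Hypothetical helper; the defaults mirror the README usage example.
    """
    return [
        "./flash",
        "-r", str(read_len),   # read length per read direction
        "-f", str(frag_len),   # expected merged fragment length
        "-s", str(frag_sd),    # standard deviation of the fragment length
        r1, r2,                # R1 and R2 sequence files
        "-o", out_prefix,      # output location and custom prefix
    ]


if __name__ == "__main__":
    cmd = build_flash_command(
        "samples/NGS-R1.fastq.gz",
        "samples/NGS-R2.fastq.gz",
        "samples/NGS-merged",
    )
    # Print the command only; nothing is executed here.
    print(" ".join(cmd))
```

The returned list can be passed to `subprocess.check_call` from inside the HybBCSeq-working directory, where the bundled flash binary is located.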
  4. Demultiplexing the merged sequences to wells

    • Script used: BarcodedSeq-demultiplex.py
      • Usage example:
      python BarcodedSeq-demultiplex.py barcodes.fna samples/NGS-merged.extendedFrags.fastq
      
      • Arguments explained:
        • barcodes.fna : the FASTA file containing the barcodes for the rows and columns (i.e. VH_barcodes.fna or VK_barcodes.fna)
        • merged_sequence file : location of the merged sequence file
      • Outputs: -demux.csv is the file needed for the next step; -demux.log is the log file for the demultiplexing process; -demux-unfoundBC.csv contains the sequences without detectable barcodes; -unfoundflag.fna is the FASTA file containing the sequences without a detectable flag.
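Before demultiplexing, it can be useful to inspect the barcode FASTA file that is passed as the first argument. The sketch below is a generic FASTA reader, not the parser used by BarcodedSeq-demultiplex.py; the function name `read_fasta` and the example record names are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {record name: sequence} dict."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            # Sequence lines may wrap; append them to the current record.
            records[name] += line.upper()
    return records


if __name__ == "__main__":
    # Hypothetical records; the real names and sequences are in
    # VH_barcodes.fna / VK_barcodes.fna.
    example = [">Row_A", "acgtacgt", ">Col_1", "TTGGCCAA"]
    for rec_name, seq in sorted(read_fasta(example).items()):
        print("%s\t%s\t%d" % (rec_name, seq, len(seq)))
```

To read a real file, pass an open file handle, for example `read_fasta(open("VH_barcodes.fna"))`.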
  5. Cleaning up the demultiplexed sequences

    • Script used: BarcodedSeq-cleanup.py
      • Usage example:
      python BarcodedSeq-cleanup.py motif.MotifT samples/demux.csv
      
      • Arguments explained:
        • motif.MotifT : Location of the probability table flanking the mouse variable domain
        • demux.csv : Location of the demultiplexed CSV file
      • Outputs: -cleaned.csv is the cleaned file for the next step
      • Note: If the "undefined symbol: PyFPE_jbuf" error is encountered, please refer to the Troubleshooting section for a fix.
  6. Consolidating cleaned demultiplexed sequences

    • Script used: BarcodedSeq-consolidate.py
      • Usage example:
      python BarcodedSeq-consolidate.py -mr 2 -ml 300 samples/cleaned.csv
      
      • Arguments explained:
        • -mr : minimum read count for a sequence to be considered in subsequent analysis
        • cleaned.csv : location of the cleaned demultiplexed file
      • Outputs: -cons.csv is the file needed for the next step; -cons-parametersLog.txt records the argument values used
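To make the effect of a minimum read count concrete, the sketch below applies such a threshold to toy data. It is illustrative only and is not the code of BarcodedSeq-consolidate.py: the function name `filter_by_min_reads`, the (well, sequence) tuple layout, and the well labels are assumptions, as is the choice of an inclusive cut-off.

```python
from collections import Counter


def filter_by_min_reads(reads, min_reads=2):
    """Count identical (well, sequence) pairs and keep those seen often enough.

    Hypothetical helper; the inclusive threshold (>=) is an assumption.
    """
    counts = Counter(reads)
    return {key: n for key, n in counts.items() if n >= min_reads}


if __name__ == "__main__":
    # Toy reads: (well label, sequence). Labels and sequences are made up.
    toy_reads = [
        ("A1", "ACGT"),
        ("A1", "ACGT"),
        ("A1", "ACGA"),
        ("B2", "TTTT"),
    ]
    kept = filter_by_min_reads(toy_reads, min_reads=2)
    for (well, seq), n in sorted(kept.items()):
        print("%s\t%s\t%d" % (well, seq, n))
```

Raising the threshold discards more low-count sequences, which are more likely to be sequencing errors, but it can also leave sparsely covered wells without any sequence.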
  7. Reporting the representative sequences for each well

    • Script used: BarcodedSeq-report.py
      • Usage example:
      python BarcodedSeq-report.py -n 20 samples/cons.csv
      
      • Arguments explained:
        • -n : the number of iterations; a higher number increases yield at the expense of representative sequence quality
        • cons.csv : location of the consolidated file
      • Outputs: -report.csv is the final report file; -report.log reports the number of wells reported
  8. Deactivate virtual environment when complete

    deactivate
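Steps 4 to 7 can also be chained from a small driver script. The following is a sketch under stated assumptions, not a tool shipped with HybBCSeq: the function names `pipeline_commands` and `run_pipeline` are hypothetical, and the file names are the placeholders from the usage examples above, so they must be replaced with the actual output names produced by each step. As written, the script performs a dry run that only prints the commands.

```python
import subprocess


def pipeline_commands():
    """Return the commands for steps 4 to 7, copied from the usage examples.

    File names are placeholders and must be adapted to the real outputs.
    """
    return [
        ["python", "BarcodedSeq-demultiplex.py", "barcodes.fna",
         "samples/NGS-merged.extendedFrags.fastq"],
        ["python", "BarcodedSeq-cleanup.py", "motif.MotifT", "samples/demux.csv"],
        ["python", "BarcodedSeq-consolidate.py", "-mr", "2", "-ml", "300",
         "samples/cleaned.csv"],
        ["python", "BarcodedSeq-report.py", "-n", "20", "samples/cons.csv"],
    ]


def run_pipeline(commands, runner=subprocess.check_call):
    """Run each command in order; check_call stops at the first failure."""
    for cmd in commands:
        print("Running: " + " ".join(cmd))
        runner(cmd)


if __name__ == "__main__":
    # Dry run: the runner does nothing, so the commands are only printed.
    run_pipeline(pipeline_commands(), runner=lambda cmd: None)
```

To execute the steps for real, call `run_pipeline(pipeline_commands())` from the HybBCSeq-working directory with the virtual environment activated.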
    

Troubleshooting

  • To fix the "undefined symbol: PyFPE_jbuf" error message
    • Change directory to: HybBCSeq/HybBCSeq-bin/without_fpectl/
    • Run the following command: python package_patch.py
    • The script will patch the custom package to work with your environment

License

Please refer to the LICENSE file.

Citation

Please cite: Chen, Y., Journal of Immunological Methods (2018), https://doi.org/10.1016/j.jim.2018.01.004
