GitHub - kamhonhoi/HybBCSeq: HybBCSeq is a suite of bioinformatics tools for the Hybridoma Barcoded Sequencing Workflow.

Latest version: November 10, 2017

Overview

HybBCSeq is a suite of bioinformatics tools for processing and analyzing next-generation sequencing (NGS) data generated by the Hybridoma Barcoded Sequencing workflow. The tools:

Demultiplex and bin NGS data to the well of origin
Clean the NGS data to screen for productive antibody variable domain sequences
Report the antibody variable domain sequences for each well

Prerequisite

Ubuntu 16.04
Python 2.7
Git 2.7.4 (or the latest build)
Pip 9.0.1 (or the latest build)
virtualenv 15.1.0 (or the latest build)
flash 1.2.11 (included in the HybBCSeq-working directory)

Installation

To install and upgrade Pip

sudo apt-get update
sudo apt-get install python-pip
sudo pip install --upgrade pip

To install virtualenv

sudo apt-get update
sudo apt-get install python-virtualenv

Perform git clone with the following command:

git clone https://github.com/kamhonhoi/HybBCSeq.git

To set up the virtual environment and install the necessary packages (run from the HybBCSeq/ folder):

python venv_setup.py
source HybBCSeq-venv/bin/activate
pip install -r requirements.txt
deactivate

Usage

The steps below assume you start in the HybBCSeq/ folder.

Retrieve the raw NGS sequence files (.gz extension) from the sequencer and place them in the HybBCSeq-working/samples directory
Activate the virtualenv before running the provided scripts:
```
source HybBCSeq-venv/bin/activate
```
- Note: to end the virtualenv session, run deactivate
- Change into the HybBCSeq-working directory (i.e. cd HybBCSeq-working)
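Before merging, it can help to confirm that every R1 file in samples/ has a matching R2 file. The following is a minimal sketch, not part of HybBCSeq: the function name `pair_read_files` is hypothetical, and it assumes mates differ only by "R1" versus "R2" in the file name, as in the NGS-R1.fastq.gz / NGS-R2.fastq.gz example used in the merge step.

```python
def pair_read_files(filenames):
    """Group FASTQ file names into (R1, R2) pairs.

    Assumption: mates differ only by "R1" versus "R2" in the file name.
    Returns the list of pairs plus any R1 files whose mate is missing.
    """
    names = set(filenames)
    pairs, unpaired = [], []
    for name in sorted(names):
        if "R1" not in name:
            continue  # R2 files are picked up through their R1 mate
        mate = name.replace("R1", "R2", 1)
        if mate in names:
            pairs.append((name, mate))
        else:
            unpaired.append(name)
    return pairs, unpaired
```

For example, `pair_read_files(os.listdir("samples"))` (after `import os`) would list the pairs found in the samples directory.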
Merge paired-end reads and relabel the output files as desired
- Program used: flash
  - Usage example:
```
./flash -r 300 -f 500 -s 50 samples/NGS-R1.fastq.gz samples/NGS-R2.fastq.gz -o samples/NGS-merged
```
  - Arguments explained:
    - -r : sequence read length per read direction (for MiSeq 2x300, set the read length to 300)
    - -f : expected length of the merged read fragment
    - -s : standard deviation of the fragment length
    - R1 and R2 files : locations of the NGS R1 and R2 sequence files
    - -o : output location and custom prefix
  - Outputs: refer to the flash help for explanations of the generated files; the file with the .extendedFrags.fastq extension is the merged file needed for the next step
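When several samples need merging, the flash call above can be wrapped in a small helper. This is a sketch under stated assumptions rather than part of HybBCSeq: the function names `build_flash_command` and `merge_pairs` are hypothetical, and only the flash options shown in the usage example (-r, -f, -s, -o) are used.

```python
import subprocess


def build_flash_command(r1, r2, out_prefix, read_len=300, frag_len=500,
                        frag_sd=50, flash_path="./flash"):
    """Return the argument list for one flash merge run."""
    return [
        flash_path,
        "-r", str(read_len),  # read length per read direction
        "-f", str(frag_len),  # expected merged fragment length
        "-s", str(frag_sd),   # standard deviation of the fragment length
        "-o", out_prefix,     # output location and custom prefix
        r1, r2,
    ]


def merge_pairs(r1, r2, out_prefix, **kwargs):
    """Run flash and return the path of the merged FASTQ file."""
    subprocess.check_call(build_flash_command(r1, r2, out_prefix, **kwargs))
    return out_prefix + ".extendedFrags.fastq"
```

Calling `merge_pairs("samples/NGS-R1.fastq.gz", "samples/NGS-R2.fastq.gz", "samples/NGS-merged")` from the HybBCSeq-working directory would run the same command as the usage example and return the name of the merged file for the next step.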
Demultiplexing the merged sequences to wells
- Script used: BarcodedSeq-demultiplex.py
  - Usage example:
```
python BarcodedSeq-demultiplex.py barcodes.fna samples/NGS-merged.extendedFrags.fastq
```
  - Arguments explained:
    - barcodes.fna : the FASTA file containing the row and column barcodes (i.e. VH_barcodes.fna or VK_barcodes.fna)
    - merged sequence file : location of the merged sequence file
  - Outputs: -demux.csv is the file needed for the next step; -demux.log is the log file for the demultiplexing process; -demux-unfoundBC.csv contains sequences without detectable barcodes; -unfoundflag.fna is the FASTA file containing sequences without a detectable flag.
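To illustrate the idea of binning reads to wells by barcode, here is a simplified sketch. It is not the logic of BarcodedSeq-demultiplex.py: the function names are hypothetical, the flag detection mentioned above is not modeled, and the sketch assumes (purely for illustration) that the row barcode is an exact prefix of the read and the column barcode an exact suffix.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dict."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def assign_well(read, row_barcodes, col_barcodes):
    """Return (row, column) for a read, or None if a barcode is not found.

    Assumption: row barcode = exact prefix, column barcode = exact suffix.
    """
    row = next((n for n, bc in row_barcodes.items() if read.startswith(bc)), None)
    col = next((n for n, bc in col_barcodes.items() if read.endswith(bc)), None)
    if row is None or col is None:
        return None
    return (row, col)


def bin_reads(reads, row_barcodes, col_barcodes):
    """Bin reads by well; reads without both barcodes are set aside."""
    wells, unassigned = {}, []
    for read in reads:
        well = assign_well(read, row_barcodes, col_barcodes)
        if well is None:
            unassigned.append(read)
        else:
            wells.setdefault(well, []).append(read)
    return wells, unassigned
```

The `unassigned` list plays the role that the -demux-unfoundBC.csv output plays in the real script: it collects reads that could not be placed in a well.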
Cleaning up the demultiplexed sequences
- Script used: BarcodedSeq-cleanup.py
  - Usage example:
```
python BarcodedSeq-cleanup.py  motif.MotifT  samples/demux.csv
```
  - Arguments explained:
    - motif.MotifT : location of the probability table for the regions flanking the mouse variable domain
    - demux.csv : location of the demultiplexed CSV file
  - Outputs: -cleaned.csv is the cleaned file needed for the next step
  - Note: if an "undefined symbol: PyFPE_jbuf" error is encountered, refer to the Troubleshooting section for a fix.
Consolidating cleaned demultiplexed sequences
- Script used: BarcodedSeq-consolidate.py
  - Usage example:
```
python BarcodedSeq-consolidate.py -mr 2 -ml 300 samples/cleaned.csv
```
  - Arguments explained:
    - -mr : minimum read count required for inclusion in subsequent analysis
    - cleaned.csv : location of the cleaned demultiplexed file
  - Outputs: -cons.csv is the file needed for the next step; -cons-parametersLog.txt records the argument values used
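The effect of a minimum read count threshold such as -mr can be pictured with a short sketch. This is an illustration only, not the code of BarcodedSeq-consolidate.py: the function name `consolidate` and the (well, sequence) record layout are assumptions, and the -ml option is not modeled because this README does not describe it.

```python
from collections import Counter


def consolidate(records, min_reads=2):
    """Count identical (well, sequence) records and drop rare ones.

    records: iterable of (well, sequence) tuples, one per cleaned read.
    Returns {(well, sequence): count} for counts >= min_reads.
    """
    counts = Counter(records)
    return dict((key, n) for key, n in counts.items() if n >= min_reads)
```

With `min_reads=2`, a sequence seen only once in a well is discarded, which removes many sequencing errors at the cost of losing genuinely rare sequences.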
Reporting the representative sequences for each well
- Script used: BarcodedSeq-report.py
  - Usage example:
```
python BarcodedSeq-report.py -n 20 samples/cons.csv
```
  - Arguments explained:
    - -n : the number of iterations; a higher number increases yield at the expense of representative sequence quality
    - cons.csv : location of the consolidated file
  - Outputs: -report.csv is the final report file; -report.log records the number of wells reported
Deactivate virtual environment when complete
```
deactivate
```
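The four script calls above can also be chained from one driver. The following is a hedged sketch, not part of HybBCSeq: the function names are hypothetical, and the intermediate file paths are passed in explicitly because the exact output names depend on the prefix chosen at the merge step. It should be run from HybBCSeq-working with the virtualenv active.

```python
import subprocess


def pipeline_commands(barcodes, merged_fastq, motif, demux_csv, cleaned_csv,
                      cons_csv, min_reads=2, ml=300, iterations=20):
    """Return the four script invocations in the order they must run."""
    return [
        ["python", "BarcodedSeq-demultiplex.py", barcodes, merged_fastq],
        ["python", "BarcodedSeq-cleanup.py", motif, demux_csv],
        ["python", "BarcodedSeq-consolidate.py",
         "-mr", str(min_reads), "-ml", str(ml), cleaned_csv],
        ["python", "BarcodedSeq-report.py", "-n", str(iterations), cons_csv],
    ]


def run_pipeline(commands):
    """Run each command in turn, stopping at the first failure."""
    for argv in commands:
        subprocess.check_call(argv)  # raises CalledProcessError on failure
```

Stopping at the first failure matters here because each step consumes the previous step's output file.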

Troubleshooting

To fix the "undefined symbol: PyFPE_jbuf" error message
- Change directory to: HybBCSeq/HybBCSeq-bin/without_fpectl/
- Run the following command: python package_patch.py
- The script will patch the custom package to work with your environment

License

Please refer to the LICENSE file.

Citation

Please cite: Chen, Y., Journal of Immunological Methods (2018), https://doi.org/10.1016/j.jim.2018.01.004
