Skip to content

Phage Library Sequence Evaluation, a tool to analyse protein sequence coverage of phage display libraries

License

Notifications You must be signed in to change notification settings

stevenshave/PuLSE

Repository files navigation

PuLSE (Phage Library Sequence Evaluation) v1.4

Quality control and quantification of peptide sequences explored by phage display libraries.


As the complexity of phage display libraries increases, the ability to screen all theoretical library members is lost. Quality control must therefore be carried out on DNA base occurrence frequency derived from a subset of library reads taken from next generation sequencing.

PuLSE is designed to read in fastq next generation sequencing data describing reads from a phage display library. Output is positional base frequencies, as well as protein residue translation position counts and normalised frequencies. Using this output it is easy to identify skewed libraries or positions greatly enriched for certain bases. Output is in the form of an HTML formatted report and easily parsable tab delimited text file.

Dependencies

  • An HTML5 compatible web browser to view generated reports. (Tested with Google Chrome 57.0.2987.133 64-bit and Mozilla Firefox 52.0.2 64-bit).
  • To decompress the sample sequencing data, a means to decompress the gz data is required. On Linux this can be achieved with gunzip and on windows 7-Zip or similar. However, on linux, PuLSE is able to read gz compressed fastq files, making prior decompression unnecessary.
  • On linux, the common gzlib is used to read compressed input.
  • The linux version of PuLSE makes use of the ZLib wrapper zstr also included in this source distribution (see src/zstr). zstr is licensed under the MIT license and Copyright (c) 2015 Matei David, Ontario Institute for Cancer Research https://github.com/mateidavid/zstr.

To compile from source

  • A C++14 (C++1y) compiler. Pulse has been tested using GCC 5.4.0 and Clang 3.8.0.
  • To build from the Makefile, you must have GNU Make installed (tested with GNU Make 4.1)

Pre compiled binaries


Pre compiled binaries accompany the PuLSE distribution package for:

  • 64-bit Linux (Ubuntu 16.04 and compatible).
  • 32 & 64-bit Windows 7 and compatible.
Platform 32/64 bit Sha256 sum
Ubuntu Linux 16.04 64-bit e8b39ab6247b47da73730ca368209bb990848af72d797a0f82d63a39522450b7
Windows 7 64-bit deb4a1188a2bc634d9b3ba55051cbc18eb098296ba89eef2aa088d536738e368
Windows 7 32-bit 8bee802465f387726248681c04f4e2b719163831c8c93a20556b529c1936e17e

Pre compiled binaries can be found in the 'binaries sub folder'. NOTE: Windows binaries require the Visual C++ Redistributable for Visual Studio 2015 to be installed. Obtain MSVCP140.DLL - https://www.microsoft.com/en-gb/download/details.aspx?id=48145

Compilation


Windows

Microsoft Visual Studio 15 project and solution files are included in the PuLSE distribution. Alternatively, a C++14 capable compiler such as ICC or MinG can be used to compile the source code using the manual Linux compilation instructions below.

Linux

Pulse was developed and extensively tested on Ubuntu Linux 16.04, and as such is supplied with a configure script to build a makefile compatible with the target system. To use the build system:

./configure make sudo make install

The final make install is not necessary if you would like to run PuLSE from the build directory, or relocate the executable yourself. Make install will copy the executable to /usr/local/bin/

Alternatively, you may manually compile PuLSE via:

GCC

g++ -o pulse src/PuLSE.cpp -Wall -O3 -std=c++1y -lz

clang

clang++ -o pulse src/PuLSE.cpp -Wall -O3 -std=c++1y -lz

Usage


If compiling from source, it is a good idea to first run PuLSE-test before running PuLSE on your sequencing data. This will ensure that the system is working properly. PuLSE is designed to be run on output from next generation sequencing in the fastq file format and can be invoked as follows:

pulse inFile.fastq libraryDefinition[triplet residue][...]

inFile.fastq

inFile.fastq is the data output from next generation sequencing of the phage library being profiled.

libraryDefinition

libraryDefinition is a string of characters used by PuLSE to identify flanking DNA bases surrounding the randomised library position. With both upstream and downstream matches made, the randomised sequence between these markers is considered a library member and included in profiling statistics. The definition takes the form of first, DNA bases encoding the upstream forward marker, the dynamic portion of the library, and finally the downstream forward marker. The definition is reversed and complimentary bases generated in order to deal with reverse library reads in the NGS data. The full length of the specified (and reverse, complimentary) upstream marker is always used. However, in the case of downstream markers, only the first 3 DNA bases are used. The example data accompanying the PuLSE distribution uses 'CGTTGCXXXXXXXXXXXXXXXTGTGCT' as the library definition, specifying that the randomised sequence of 15 DNA bases denoted by 'X' (5 amino acids) as being flanked by CGTTGC and TGTGCT. PuLSE fully supports the use of IUPAC nucleotide codes as follows:

IUPAC nucleotide code Base
A Adenine
C Cytosine
G Guanine
T Thymine
R A or G
Y C or T
S G or C
W A or T
K G or T
M A or C
B C or G or T
D A or G or T
H A or C or T
V A or C or G
N any base

Note that in addition to the above nucleotide codes, X may be used, and is equivalent to 'N' (any base).

[triplet residue] (Optional parameter)

This optional parameter allows PuLSE to operate on non-standard DNA triplet to amino acid mappings. By default, PuLSE uses the following mapping of DNA triplets to amino acid residue single letter codes:

UUU->F, UUC->F, UUA->L, UUG->L, CUU->L, CUC->L, CUA->L, CUG->L, AUU->I, AUC->I, AUA->I, AUG->M, GUU->V, GUC->V, GUA->V, GUG->V, UCU->S, UCC->S, UCA->S, UCG->S, AGU->S, AGC->S, CCU->P, CCC->P, CCA->P, CCG->P, ACU->T, ACC->T, ACA->T, ACG->T, GCU->A, GCC->A, GCA->A, GCG->A, UAU->Y, UAC->Y, UAA->*, UAG->*, UGA->*, CAU->H, CAC->H, CAA->Q, CAG->Q, GAA->E, GAG->E, AAU->N, AAC->N, AAA->K, AAG->K, GAU->D, GAC->D, UGU->C, UGC->C, UGG->W, CGU->R, CGC->R, CGA->R, CGG->R, AGA->R, AGG->R, GGU->G, GGC->G, GGA->G, GGG->G

A custom mapping may be inserted by first specifying the DNA triplet to be modified, then the single letter amino acid code as the product of the triplet. A common option for phage display systems with nonsense suppression is the alteration of the triplet UAG->* to UAG->Q. The change is expressed by replacing [triplet residue] with:

UAG Q

Note, that multiple changes may be made to the mappings, by continuing to specify mapping on the command line.

Example

The PuLSE distribution is supplied with an example dataset, containing NGS data obtained from sequencing a cyclic 5-mer phage display library containing 5 randomised amino acid positions flanked by cystine residues an expressed in a system with nonsense suppression. This data is supplied with the PuLSE distribution and is compressed with xz compression. Before use, it must be decompressed (on Linux, you may use unxz, or on windows 7-Zip). The PuLSE library definition for this library is as follows: CGTTGCXXXXXXXXXXXXXXXTGTGCT. Nonsense suppression is in the form of the UAG triplet remapped to produce a glutamate residue (Q).

To run PuLSE on the included dataset, supply the library definition and remap UAG to Q, we invoke PuLSE with the following command line:

On windows:

pulse sample-pulse-5merCyclic-CGTTGCXXXXXXXXXXXXXXXTGTGCT.fastq CGTTGCXXXXXXXXXXXXXXXTGTGCT UAG Q

On linux:

pulse sample-pulse-5merCyclic-CGTTGCXXXXXXXXXXXXXXXTGTGCT.fastq.gz CGTTGCXXXXXXXXXXXXXXXTGTGCT UAG Q

Alternatively, under linux you may run the bash script runExampleDataset.bash.

PuLSE will then output a HTML report with the name sample-pulse-5merCyclic-CGTTGCXXXXXXXXXXXXXXXTGTGCT.html and a simple tab delimited parsable text file report sample-pulse-5merCyclic-CGTTGCXXXXXXXXXXXXXXXTGTGCT.txt

About

Phage Library Sequence Evaluation, a tool to analyse protein sequence coverage of phage display libraries

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages