SCOPE++

SCOPE++ is a C++ program that uses Hidden Markov Models to accurately identify homopolymers in cDNA sequences. It can be used to trim poly(A)/poly(T) tails or to identify A, C, G, T, or N homopolymer sequences.

Installation

First, make sure that autotools is installed.

Then run ./configure; make; sudo make install

If you don't have root permission, create a directory that you own and install there instead (sudo is not needed in this case):

./configure --prefix=<your directory>; make; make install

Getting Started

To make sure that the tool is working, run the following command:

./scope -i test.fq -o test_out.fa

You should see output similar to the following (the file names will reflect the arguments you passed):

Input file: ./example/test.fastq
Output file: test.out
File type: illumina
Input File Format: fasta
Output File Format: fasta
polyType: A
Filter Width: 12
Edge MinLength: 4
Boundary States: 2
Mininum Length: 10
Maxinum Training Set: 1000
Laplacian Smoothing Parameter: 1
Details: 0
Zero Based: 0
Print Everything: 0
Print Best Alignment: 0
Building model
model finalized
Number of sequences with homopolymers 54
Number of sequences without homopolymers 39
Number of trashed sequences 0
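
The last three lines of the output summarize how many sequences fell into each category. As a rough sketch, and assuming only that the summary lines look like the sample above, the counts can be pulled out of a captured log in Python. The helper name parse_summary and the pattern are illustrative and not part of SCOPE++.

```python
import re

# Matches the three summary lines shown in the sample output above.
# The exact wording is taken from that sample; other versions may differ.
SUMMARY_PATTERN = re.compile(
    r"^Number of (sequences with homopolymers"
    r"|sequences without homopolymers"
    r"|trashed sequences)\s+(\d+)\s*$"
)


def parse_summary(log_text):
    """Return a dict mapping each summary category to its count (hypothetical helper)."""
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_PATTERN.match(line.strip())
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


sample_log = """Building model
model finalized
Number of sequences with homopolymers 54
Number of sequences without homopolymers 39
Number of trashed sequences 0
"""

counts = parse_summary(sample_log)
total = sum(counts.values())
# Fraction of all processed sequences in which a homopolymer was found.
fraction_with = counts["sequences with homopolymers"] / total
print(f"{fraction_with:.1%} of {total} sequences contain a homopolymer")
```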

Parameters

    Input:
       -i [input file] (required)
             The fasta or fastq input file
       --input_format [input file format] 
             (default = fasta)
             fasta or fastq
    Output:
       -o [output file] (required)
             A fasta file containing masked homopolymer tails
       --print_all [output options]
             Prints all sequences to the file.
             Otherwise, only sequences with detected
             poly(A) tails are printed
       --out_format [output file format] 
             (default = fasta)
             fasta or fastq
       --details [output details]
             outputs more information including alignment scores,
             homopolymer length, and percent identity
       -z [zero index]
             Prints output coordinates using zero-based indexing and half-open intervals.
             By default, coordinates use one-based indexing and closed intervals
    Search Type:
       --homopolymer_type homopolymer type [N|A|G|C|T]
             e.g. option A is a poly(A) tail
             (default = A)
       --poly
             Searches for poly(A) or poly(T)
       --trim
             Trims poly(A)/poly(T) tails
    Tool parameters:
       --filter_width filter width
             Size of the sliding window
             (default = 8 base pairs)
       --minLength Minimum homopolymer length
             (default = 10 base pairs)
       --minIdentity Minimum identity
             The minimum percent identity a homopolymer can have
             (default = 70)
       --edge_minLength Edge boundary minimum length
             (default = 6)
       --edge_states Number of states at boundaries
             (default = 1)
       --sampling_frequency Sampling frequency
             Determines how often sequences are sampled for training
             (default = 1)
       --numTrain Number of training sequences
             (default = 1000)
       --left_gap Left gap
             Minimum distance from the beginning of the poly(A)/poly(T) to the read end
       --right_gap Right gap
             Minimum distance from the end of the poly(A)/poly(T) to the read end
       --no_retrain
             Disables Baum-Welch training
       --numThreads Number of threads
    Help:
       --help help
       --version version information
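
The -z option switches between two coordinate conventions: one-based closed intervals (the default) and zero-based half-open intervals. The sketch below shows how the two relate, in case downstream tools expect one or the other. The function names are illustrative and not part of SCOPE++, and the example read and coordinates are made up.

```python
def one_closed_to_zero_half_open(start, end):
    """Convert a 1-based closed interval [start, end] to 0-based half-open [start - 1, end)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def zero_half_open_to_one_closed(start, end):
    """Convert a non-empty 0-based half-open interval [start, end) to 1-based closed [start + 1, end]."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


# Made-up read: 20 bases of sequence followed by a 10-base poly(A) tail.
read = "ACGTACGTACGTACGTACGT" + "A" * 10

# In one-based closed coordinates the tail covers positions 21 through 30.
one_based = (21, 30)
zero_based = one_closed_to_zero_half_open(*one_based)

# Zero-based half-open coordinates can be used directly as a Python slice.
tail = read[zero_based[0]:zero_based[1]]
print(zero_based, tail)
```

Both conventions describe the same bases; only the start coordinate shifts by one, and the interval length is end - start in the half-open form versus end - start + 1 in the closed form.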

More thorough descriptions of the parameters can be found in the other README.
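
To give a feel for how the --minLength and --minIdentity thresholds interact, here is a minimal sketch that treats percent identity as the share of bases in a candidate region that match the homopolymer base. This is an assumption made for illustration: SCOPE++ may derive identity differently (for example, from its HMM alignment), and the function names below are hypothetical.

```python
def percent_identity(sequence, base):
    """Percentage of positions in `sequence` equal to `base` (case-insensitive)."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    target = base.upper()
    matches = sum(1 for c in sequence.upper() if c == target)
    return 100.0 * matches / len(sequence)


def passes_filters(sequence, base="A", min_length=10, min_identity=70.0):
    """Apply length and identity cutoffs, mirroring the listed values 10 and 70."""
    return (
        len(sequence) >= min_length
        and percent_identity(sequence, base) >= min_identity
    )


# Made-up candidate tail: 20 bases, 18 of which are A.
candidate = "AAAAGAAAAAAATAAAAAAA"
print(percent_identity(candidate, "A"))  # 18 of 20 bases match
print(passes_filters(candidate))         # long enough and above 70% identity
print(passes_filters("AAAAA"))           # too short for a minimum length of 10
```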
