added 5' trimming and discard Ns to README and sickle.xml

commit 865d9a05e69c680f80e59349ed8d09482d46c222 1 parent 997e252
@najoshi authored
Showing with 34 additions and 10 deletions.
  1. +22 −10 README.md
  2. +12 −0 sickle.xml
32 README.md
@@ -3,22 +3,27 @@
## About
Most modern sequencing technologies produce reads that have
-deteriorating quality towards the 3'-end. Incorrectly called bases
-here negatively impact assembles, mapping, and downstream
+deteriorating quality towards the 3'-end and some towards the 5'-end as well. Incorrectly called bases
+in both regions negatively impact assembles, mapping, and downstream
bioinformatics analyses.
Sickle is a tool that uses sliding windows along with quality and
length thresholds to determine when quality is sufficiently low to
-trim the 3'-end of reads. It will also discard reads based upon the
+trim the 3'-end of reads and also determines when the quality is
+sufficiently high enough to trim the 5'-end of reads. It will also discard reads based upon the
length threshold. It takes the quality values and slides a window
across them whose length is 0.1 times the length of the read. If this
length is less than 1, then the window is set to be equal to the
length of the read. Otherwise, the window slides along the quality
-values until the average quality in the window drops below the
-threshold. At that point the algorithm determines where in the window
-the drop occurs and cuts both the read and quality strings there.
-However, if the cut point is less than the minimum length threshold,
-then the read is discarded entirely.
+values until the average quality in the window rises above the threshold, at
+which point the algorithm determines where within the window the rise occurs
+and cuts the read and quality there for the 5'-end cut. Then when the average quality
+in the window drops below the threshold, the algorithm determines where in the window
+the drop occurs and cuts both the read and quality strings there for the 3'-end cut.
+However, if the length of the remaining sequence is less than the minimum length threshold,
+then the read is discarded entirely. 5'-end trimming can be disabled.
+
+Sickle also has an option to discard reads with any Ns in them.
Sickle supports four types of quality values: Illumina, Solexa, Phred,
and Sanger. Note that the Solexa quality setting is an approximation
@@ -63,12 +68,14 @@ specific to those commands:
`sickle se` takes an input fastq file and outputs a trimmed version of
that file. It also has options to change the length and quality
-thresholds for trimming.
+thresholds for trimming, as well as disabling 5'-trimming and enabling removal
+of sequences with Ns.
#### Examples
sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq
sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -q 33 -l 40
+ sickle se -f input_file.fastq -t illumina -o trimmed_output_file.fastq -x -n
### Sickle Paired End (`sickle pe`)
@@ -76,7 +83,8 @@ thresholds for trimming.
trimmed paired-end files as well as a "singles" file. The "singles"
file contains reads that passed filter in one of the paired-end files
but not the other. You can also change the length and quality
-thresholds for trimming.
+thresholds for trimming, as well as disable 5'-trimming and enable removal
+of sequences with Ns.
#### Examples
@@ -88,3 +96,7 @@ thresholds for trimming.
-o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
-s trimmed_singles_file.fastq -q 12 -l 15
+ sickle pe -f input_file1.fastq -r input_file2.fastq -t sanger \
+ -o trimmed_output_file1.fastq -p trimmed_output_file2.fastq \
+ -s trimmed_singles_file.fastq -n
+
12 sickle.xml
@@ -20,6 +20,14 @@
-l $length_threshold
#end if
+ #if str($disable_five_prime) == 'disable_five_prime_true':
+ -x
+ #end if
+
+ #if str($discard_n) == 'discard_n_true':
+ -n
+ #end if
+
2> /dev/null
</command>
@@ -53,6 +61,10 @@
<param name="length_threshold" value="20" type="integer" optional="true" label="Length Threshold">
<validator type="in_range" min="0" message="Minimum value is 0"/>
</param>
+
+ <param name="disable_five_prime" type="boolean" truevalue="disable_five_prime_true" falsevalue="disable_five_prime_false" checked="false" label="Disable 5'-end trimming"/>
+
+ <param name="discard_n" type="boolean" truevalue="discard_n_true" falsevalue="discard_n_false" checked="false" label="Discard any sequence with any number of Ns"/>
</inputs>
<outputs>
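
Note on the README change above: the two-sided sliding-window trimming it describes can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not sickle's actual C implementation. The function name `trim_read`, its parameters, and the default values are hypothetical. The sketch also assumes that "where the rise occurs" means the first base in the window at or above the threshold, that "where the drop occurs" means the first base in the window below it, and that the N check applies to the untrimmed read.

```python
from typing import List, Optional, Tuple


def trim_read(seq: str, quals: List[int], qual_threshold: int = 20,
              min_length: int = 20, trim_five_prime: bool = True,
              discard_n: bool = False) -> Optional[Tuple[str, List[int]]]:
    """Return the trimmed (sequence, qualities), or None if the read is discarded.

    Hypothetical helper illustrating the README text; not sickle's own code.
    """
    n = len(quals)
    if n == 0:
        return None
    # Assumption: the N check is applied to the whole, untrimmed read.
    if discard_n and "N" in seq.upper():
        return None

    # Window length is 0.1 times the read length; if that is less than 1,
    # the window spans the whole read.
    window = int(0.1 * n)
    if window < 1:
        window = n

    # With 5'-trimming disabled, the read always starts at position 0.
    start = None if trim_five_prime else 0
    end = n
    for i in range(n - window + 1):
        win = quals[i:i + window]
        avg = sum(win) / window
        if start is None:
            if avg >= qual_threshold:
                # 5' cut: first base in this window at or above the threshold.
                start = i + next(j for j, q in enumerate(win)
                                 if q >= qual_threshold)
        elif avg < qual_threshold:
            # 3' cut: first base in this window below the threshold.
            end = i + next(j for j, q in enumerate(win) if q < qual_threshold)
            break

    if start is None:
        # The average quality never reached the threshold.
        return None
    end = max(end, start)  # guard against a 3' cut landing before the 5' cut
    if end - start < min_length:
        return None
    return seq[start:end], quals[start:end]
```

For a 20-base read with two low-quality bases at each end, a call such as `trim_read(seq, quals, qual_threshold=20, min_length=5)` keeps the 16 bases in the middle. Passing `trim_five_prime=False` plays the role of `-x` and `discard_n=True` the role of `-n` in this sketch.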