Analysis code accompanying the paper "Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements." PLoS One. (2014).
README
alignment_stats.sh
downsample_melanogaster.sh
error_rates.R
extract_deletions.sh
generate_deletion_profile.pl
mol-combined-6.frg
run_minimus2.sh
run_wgs.sge
te-data.csv
te-glm.R
wgs_specfile.spec

README

# de novo genome assembly
run_wgs.sge -- command used to run the Celera Assembler (CA)
wgs_specfile.spec -- specfile upon which CA depends
mol-combined-6.frg -- frg file that gives additional parameters and points to the fastq file used for assembly
run_minimus2.sh -- script used to run minimus2

# analysis of long-read data
extract_deletions.sh -- script that reformats the CIGAR string and calls a Perl script to calculate the positions of deletions in alignments
generate_deletion_profile.pl -- Perl script required by extract_deletions.sh to get deletion positions from the CIGAR string
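
As a rough illustration of the idea behind these two scripts, the sketch below walks a CIGAR string and reports where deletions fall on the reference. It is not the code in generate_deletion_profile.pl; the function name `deletion_positions` and the `ref_start` parameter are hypothetical, and it assumes standard SAM CIGAR operations.

```python
import re

# CIGAR operations that advance the position on the reference (SAM convention).
REF_CONSUMING = set("MDN=X")
# One CIGAR token is a run length followed by a single operation letter.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def deletion_positions(cigar, ref_start=1):
    """Return a list of (reference_position, length) for each deletion.

    `ref_start` is the reference coordinate of the first aligned base
    (hypothetical parameter; in SAM this would be the POS field).
    """
    deletions = []
    ref_pos = ref_start
    for length, op in CIGAR_TOKEN.findall(cigar):
        length = int(length)
        if op == "D":
            # The deletion starts at the current reference position.
            deletions.append((ref_pos, length))
        if op in REF_CONSUMING:
            ref_pos += length
    return deletions


print(deletion_positions("10M2D5M1I3M1D4M", ref_start=100))
# -> [(110, 2), (120, 1)]
```

Insertions and soft clips do not move the reference coordinate, which is why only the operations in `REF_CONSUMING` advance `ref_pos`.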

# analysis of assembly
alignment_stats.sh -- commands used to generate alignment results for Table 2
te-glm.R -- R script that fits the GLM and GLMM used to test which features of TEs are important for assembly
te-data.csv -- data from the FlyTE database used as predictors in the GLM/GLMM

# assemble down-sampled datasets
downsample_melanogaster.sh -- script to perform downsampling, generating a fastq file and frg file for input to the Celera Assembler
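
The README does not say how downsample_melanogaster.sh selects reads, so the following is only a minimal sketch of one common approach: keep each FASTQ record independently with a fixed probability. The function name `downsample_fastq` and its `fraction` and `seed` parameters are hypothetical, it assumes 4-line FASTQ records, and it does not generate the frg file.

```python
import random
from itertools import islice


def downsample_fastq(lines, fraction, seed=0):
    """Yield 4-line FASTQ records, keeping each with probability `fraction`.

    `lines` is any iterable of text lines (for example an open file).
    A fixed `seed` makes the subsample reproducible.
    """
    rng = random.Random(seed)
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        if rng.random() < fraction:
            yield record


# Tiny in-memory example with three reads.
example = []
for i in range(3):
    example += [f"@read{i}\n", "ACGT\n", "+\n", "IIII\n"]

kept = list(downsample_fastq(example, fraction=0.5, seed=1))
print(len(kept), "of 3 reads kept")
```

Sampling each record independently gives a subsample whose size only approximates `fraction` times the input; a script that needs an exact read count would have to sample differently.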


# see the assess-assembly repository for the script that assesses presence/absence of genomic features (i.e., the response variable for the GLMM)