Analysis code accompanying the paper "Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements." PLoS One. (2014).
# de novo genome assembly
run_wgs.sge -- command used to run the Celera Assembler (CA)
wgs_specfile.spec -- specfile upon which CA depends
mol-combined-6.frg -- frg file that gives additional parameters and points to the fastq used for assembly
-- script used to run minimus2

# analysis of long-read data
-- script that reformats the CIGAR string and calls a perl script to calculate positions of deletions in alignments
-- perl script required by the script above to get deletion positions from the CIGAR string
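
For orientation, the deletion-position step can be sketched as follows. This is a minimal Python illustration, not the perl script from this repository: the function name `deletion_positions` and its return format are hypothetical, and it assumes a 1-based leftmost mapping position and standard SAM CIGAR operators.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletion_positions(pos, cigar):
    """Return a list of (reference_start, length) tuples, one per D operation.

    pos is the 1-based leftmost mapping position of the alignment
    (the POS field of a SAM record); cigar is the CIGAR string.
    """
    deletions = []
    ref = pos  # reference coordinate of the next aligned base
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "D":
            deletions.append((ref, length))
        if op in REF_CONSUMING:
            ref += length
    return deletions


print(deletion_positions(100, "10M2D5M1I3M3D4M"))  # [(110, 2), (120, 3)]
```

Insertions and clipped bases do not advance the reference coordinate, which is why only `M`, `D`, `N`, `=` and `X` move `ref` forward.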

# analysis of assembly
-- commands used to generate alignment results for Table 2
te_glm.R -- R script to generate GLM and GLMM to test features of TEs important for assembly
te_data.csv -- data from FlyTE database used as predictors in GLM/GLMM

# assemble down-sampled datasets
-- script to perform downsampling, generating a fastq file and frg file for input to the Celera Assembler
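
As a rough illustration of the downsampling idea, the sketch below keeps each fastq record with a fixed probability. It is an assumption-laden example, not the script from this repository: the names `fastq_records` and `downsample` are hypothetical, it assumes strict 4-line fastq records, and it does not generate the frg file.

```python
import random
from itertools import islice


def fastq_records(lines):
    """Yield 4-line fastq records from an iterable of lines (e.g. an open file)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:  # end of input, or a truncated trailing record
            return
        yield record


def downsample(lines, fraction, seed=0):
    """Yield the lines of each record kept with probability `fraction`.

    A fixed seed makes the subsample reproducible.
    """
    rng = random.Random(seed)
    for record in fastq_records(lines):
        if rng.random() < fraction:
            yield from record
```

A possible use, with placeholder file names: `open("sub.fastq", "w").writelines(downsample(open("all.fastq"), 0.5, seed=1))`. Because each record is kept independently, the number of reads retained varies around the target fraction rather than matching it exactly.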

# see assess-assembly repository for script to assess presence/absence of genomic features (i.e. the response variable for the GLMM)
