Skip to content

Latest commit

 

History

History

example

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 

Example of using ScatTR

Running the example

In this example, we are estimating the copy number of a single TR in a simulated WGS sample. To run the example, make sure you have scattr or cargo installed in your PATH:

./run.sh

Input files

In the input/ directory, there are example input files to run ScatTR. In general, running ScatTR requires the following input files:

  • A tab-delimeted catalog containing TR loci of interest: catalog.tsv
    • The file is expected to have columns: id, contig, start, end, and motif
  • Aligned reads of the sample and their index: input/sample.bam and input/sample.bam.bai
  • A reference genome and its index: input/reference.fa and input/reference.fa.fai

Output files

In the output/ directory, there are output files generated by ScatTR based on the example input files. ScatTR produces the following output files:

  • The extracted bag of reads: sample.bag.bam
  • The extracted depth and insert distributions: sample.stats.json
    • By default, ScatTR produces plots of these distributions (sample.depth_distr.png and sample.insert_distr.png). The --no-plot option disables this behavior
  • The optimization problem definitions: sample.defs.json
    • This file information about all the reads associated with each TR and their relative positions with the decoy references (refer to manuscript methods section for more details)
  • The estimated copy numbers: sample.genotypes.json
    • This file contains estimates for each TR locus in addition to a 95% confidence interval

Generating input files

Generating the input files requires having python, art_illumina, bwa, samtools in your PATH. The input files are generated by running:

./scripts/make_inputs.sh

The script creates a random reference genome (~200 kbp in length) that is used to generate reads for a heteroyzygous TR expansion with normal allele copy number being 3 and with expanded copy number being 100. The motif of the TR is CAGATA.