A new software to calculate ecDNA from WGS bam file.

sssimonyang/ecDNA-finder

ecDNA-finder is a modified and improved version of CircleMap, which is released under the MIT license.

ecDNA-finder is partly based on the same idea as CircleMap. It reuses that idea and adapts it to detect ecDNA (which is larger than eccDNA, the focus of CircleMap) efficiently.

Three files, extract.py, utils.py, and mates.py, are modifications of extract_circle_SV_reads.py, utils.py, and repeats.py, respectively, from CircleMap/circlemap/.

Thanks to @iprada for developing CircleMap, which greatly facilitated the development of ecDNA-finder.

Prepare

For a basic run, you only need to prepare a coordinate-sorted BAM file and a queryname-sorted BAM file.

Use the following command to convert a coordinate-sorted BAM file into a queryname-sorted one.

samtools sort -n -@ {threads_num} -o {queryname-sorted} {coordinate-sorted}
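The same conversion can be scripted from Python when many samples need preparing. The sketch below only wraps the samtools command shown above with subprocess; the function names and the default thread count are hypothetical, and it assumes samtools is available on the PATH.

```python
import subprocess


def build_sort_command(coordinate_bam, queryname_bam, threads=4):
    """Return the samtools command that sorts a BAM file by read name."""
    return [
        "samtools", "sort",
        "-n",                  # sort by query name instead of coordinate
        "-@", str(threads),    # number of additional sorting threads
        "-o", queryname_bam,   # output (queryname-sorted) BAM
        coordinate_bam,        # input (coordinate-sorted) BAM
    ]


def sort_by_queryname(coordinate_bam, queryname_bam, threads=4):
    """Run samtools; raises CalledProcessError if samtools exits non-zero."""
    command = build_sort_command(coordinate_bam, queryname_bam, threads)
    subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting problems when file paths contain spaces.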

Required packages

  • numpy
  • pandas
  • pysam

All of them can be easily installed with pip or conda.

Run

Basic run

python main.py -coord {coordinate-sorted} -query {queryname-sorted} -dir {dirname}

Other important parameters you may need

-cutoff: the default value is 0, which means that round(depth_average / 20) is used as the cutoff for peaks and split-read mates. depth_average is calculated automatically. The minimum allowed value is 1. You can set this value yourself to change the results, but the running time will vary accordingly.
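As a rough illustration of the rule above, the effective cutoff could be derived as follows. This is a minimal sketch, not the code in main.py: the function name effective_cutoff is hypothetical, and it assumes the minimum of 1 is applied after rounding.

```python
def effective_cutoff(cutoff, depth_average):
    """Return the cutoff that would be applied under the stated rule.

    cutoff == 0 means "derive it from the average depth".
    """
    if cutoff == 0:
        # Default: one twentieth of the average depth, rounded.
        cutoff = round(depth_average / 20)
    # Assumption: values below 1 are raised to the minimum of 1.
    return max(1, cutoff)
```

Under these assumptions, an average depth of 60 gives a cutoff of 3, while an average depth of 5 falls back to the minimum of 1.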

Results

Results are placed in a new directory named {dirname} under the current directory. The circ_results.tsv file is tab-delimited.
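Because the output is plain tab-delimited text, it can be loaded with the standard csv module. The sketch below is only an example of reading the file: the function name read_circ_results is hypothetical, and since the columns are not documented here, the rows are returned as raw lists of strings without assuming a header.

```python
import csv
from pathlib import Path


def read_circ_results(dirname):
    """Read {dirname}/circ_results.tsv and return its rows as lists of strings."""
    path = Path(dirname) / "circ_results.tsv"
    with path.open(newline="") as handle:
        # delimiter="\t" matches the tab-delimited layout of the file.
        return list(csv.reader(handle, delimiter="\t"))
```

Since pandas is already a dependency, pandas.read_csv with sep="\t" is an equally reasonable choice.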

Help guide

usage: main.py [-h] -coord COORDINATE -query QUERYNAME -dir DIRNAME
                   [-cutoff CUTOFF]

ecDNA-finder

optional arguments:
  -h, --help            show this help message and exit
  -coord COORDINATE, --coordinate COORDINATE
                        bam file sorted by coordinate
  -query QUERYNAME, --queryname QUERYNAME
                        bam file sorted by queryname
  -dir DIRNAME, --dirname DIRNAME
                        result directory
  -cutoff CUTOFF, --cutoff CUTOFF
                        seed interval cutoff and support read cutoff
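For readers who want to drive the tool from another script, an argparse parser consistent with the help text above might look like the following. It is reconstructed from the help output alone and is not the parser in main.py; the integer type for -cutoff is an assumption based on its default of 0.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(prog="main.py", description="ecDNA-finder")
    parser.add_argument("-coord", "--coordinate", required=True,
                        help="bam file sorted by coordinate")
    parser.add_argument("-query", "--queryname", required=True,
                        help="bam file sorted by queryname")
    parser.add_argument("-dir", "--dirname", required=True,
                        help="result directory")
    # Assumption: the cutoff is an integer; 0 selects the automatic value.
    parser.add_argument("-cutoff", "--cutoff", type=int, default=0,
                        help="seed interval cutoff and support read cutoff")
    return parser
```

Calling build_parser().parse_args(["-coord", "a.bam", "-query", "b.bam", "-dir", "out"]) yields a namespace with coordinate, queryname, dirname, and cutoff attributes.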

Pbs_file

A PBS job script recommended for lab use:

#!/bin/sh
#PBS -N PBS_ecDNA
#PBS -l nodes=1:ppn=5
#PBS -l walltime=16:00:00
#PBS -S /bin/bash
#PBS -q normal_3
#PBS -o /public/home/zhangjing1/yangjk/ecDNA/result/ecDNA_out
#PBS -e /public/home/zhangjing1/yangjk/ecDNA/result/ecDNA_out

start='------------START------------'$(date "+%Y %h %d %H:%M:%S")'------------START------------'
echo $start
echo $start >&2
dirname=SRR8236745
cd /public/home/zhangjing1/yangjk/ecDNA/result/
coordinate=/public/home/zhangjing1/yangjk/data/bam/${dirname}/sorted_coordinate.bam
queryname=/public/home/zhangjing1/yangjk/data/bam/${dirname}/sorted_query_name.bam
/public/home/liuxs/anaconda3/envs/ecDNA/bin/python /public/home/zhangjing1/yangjk/ecDNA/code/main.py -coord ${coordinate} -query ${queryname} -dir ${dirname}
end='-------------END-------------'$(date "+%Y %h %d %H:%M:%S")'-------------END-------------'
echo $end
echo $end >&2
