
List of segmental duplications #2

Closed
lemieuxl opened this issue Nov 2, 2018 · 2 comments
Comments


lemieuxl commented Nov 2, 2018

When creating the list of segmental duplications by following your instructions, the resulting file seems to be missing a column.

The input file downloaded at this step looks fine (genomicSuperDups.bed).

$ head -n 3 genomicSuperDups.bed 
chr1	10000	87112	0.00713299
chr1	10000	20818	0.0186603
chr1	10000	19844	0.0173215

It looks like I'm losing the last column at the cut step of the command in the README file.

$ awk '{print $1,$2; print $1,$3}' genomicSuperDups.bed | \
>  sort -k1,1 -k2,2n | uniq | \
>  awk 'chrom==$1 {print chrom"\t"pos"\t"$2} {chrom=$1; pos=$2}' | \
>  bedtools intersect -a genomicSuperDups.bed -b - | \
>  bedtools sort | \
>  bedtools groupby -c 4 -o min | \
>  awk 'BEGIN {i=0; s[0]="+"; s[1]="-"} {if ($4!=x) i=(i+1)%2; x=$4; print $0"\t0\t"s[i]}' | \
>  bedtools merge -s -c 4 -o distinct | \
>  cut -f1-3,5 | head -n 3
chr1	10000	10485
chr1	10485	18392
chr1	18392	87112
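For context, the first three stages of the pipeline just split overlapping intervals at every breakpoint. A toy run with made-up coordinates (not the README's data) shows the idea:

```shell
# Two overlapping made-up intervals, chr1:100-400 and chr1:200-300:
printf 'chr1\t100\t400\nchr1\t200\t300\n' |
  # emit every start and end position as a breakpoint...
  awk '{print $1,$2; print $1,$3}' |
  sort -k1,1 -k2,2n | uniq |
  # ...then pair consecutive positions on the same chromosome into intervals
  awk 'chrom==$1 {print chrom"\t"pos"\t"$2} {chrom=$1; pos=$2}'
# -> chr1:100-200, chr1:200-300, chr1:300-400 (tab-separated)
```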

Is it possible that the order of the fields changed in newer versions of bedtools (at the merge step of the command)? Removing the cut command gives me what looks like the proper content.

$ zcat dup.grch37.bed.gz | head -n 3
1	10000	10485	0.00713299
1	10485	18392	0.00579252
1	18392	87112	0.00457824
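If the column layout did change, the silent part is that cut does not complain about a requested field that is missing. A minimal sketch with a made-up merged line (assuming, as I suspect, that newer bedtools merge no longer appends the strand column):

```shell
# 4-column layout (no strand column, as newer bedtools merge seems to emit):
printf 'chr1\t10000\t10485\t0.00713299\n' | cut -f1-3,5
# field 5 does not exist, so the divergence in column 4 is silently dropped

# 5-column layout (strand appended as column 4, as older versions may have done):
printf 'chr1\t10000\t10485\t+\t0.00713299\n' | cut -f1-3,5
# here the divergence is in column 5, so the value survives
```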

I'm using bedtools version 2.27.1 (so it shouldn't be affected by the groupby bug).

freeseek (Owner) commented Nov 2, 2018

The output you obtained looks correct to me. The final list is not a list of segmental duplications but rather a list of intervals, each annotated with the smallest divergence value among the segmental duplications overlapping it, so that where segmental duplications overlap, only the smallest divergence is preserved. I will see whether I can clarify the tutorial; I am not sure this filter is very important.
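The min-divergence idea can be sketched without bedtools: for lines covering the same interval, keep only the smallest value in column 4 (what `bedtools groupby -c 4 -o min` does in the pipeline). A toy example with made-up intervals:

```shell
# Three made-up intervals; the first two cover the identical region
# with different divergence values:
printf 'chr1\t100\t200\t0.02\nchr1\t100\t200\t0.01\nchr1\t300\t400\t0.05\n' |
  # sort so the smallest divergence comes first within each interval...
  sort -k1,1 -k2,2n -k4,4g |
  # ...then keep only the first line per interval
  awk '!seen[$1"\t"$2"\t"$3]++'
# -> chr1:100-200 keeps 0.01; chr1:300-400 keeps 0.05
```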

lemieuxl (Author) commented Nov 2, 2018

Thanks for your help!
I'll monitor the README file for updates.
