
List of segmental duplications #2

Closed
lemieuxl opened this issue Nov 2, 2018 · 2 comments
Comments


lemieuxl commented Nov 2, 2018

When creating the list of segmental duplications by following your instructions, the resulting file seems to be missing a column.

The input file downloaded at this step looks fine (genomicSuperDups.bed).

$ head -n 3 genomicSuperDups.bed 
chr1	10000	87112	0.00713299
chr1	10000	20818	0.0186603
chr1	10000	19844	0.0173215

It looks like I'm losing the last column at the cut step of the command in the README file.

$ awk '{print $1,$2; print $1,$3}' genomicSuperDups.bed | \
>  sort -k1,1 -k2,2n | uniq | \
>  awk 'chrom==$1 {print chrom"\t"pos"\t"$2} {chrom=$1; pos=$2}' | \
>  bedtools intersect -a genomicSuperDups.bed -b - | \
>  bedtools sort | \
>  bedtools groupby -c 4 -o min | \
>  awk 'BEGIN {i=0; s[0]="+"; s[1]="-"} {if ($4!=x) i=(i+1)%2; x=$4; print $0"\t0\t"s[i]}' | \
>  bedtools merge -s -c 4 -o distinct | \
>  cut -f1-3,5 | head -n 3
chr1	10000	10485
chr1	10485	18392
chr1	18392	87112
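For context, the first three stages of the pipeline just split overlapping intervals at every breakpoint. A toy run with made-up coordinates (not the README's data) shows the idea:

```shell
# Two overlapping made-up intervals, chr1:100-400 and chr1:200-300:
printf 'chr1\t100\t400\nchr1\t200\t300\n' |
  # emit every start and end position as a breakpoint...
  awk '{print $1,$2; print $1,$3}' |
  sort -k1,1 -k2,2n | uniq |
  # ...then pair consecutive positions on the same chromosome into intervals
  awk 'chrom==$1 {print chrom"\t"pos"\t"$2} {chrom=$1; pos=$2}'
# -> chr1:100-200, chr1:200-300, chr1:300-400 (tab-separated)
```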

Is it possible that the order of the fields changed in newer versions of bedtools (at the merge step of the command)? Removing the cut command gives me what looks like the proper content.

$ zcat dup.grch37.bed.gz | head -n 3
1	10000	10485	0.00713299
1	10485	18392	0.00579252
1	18392	87112	0.00457824
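If the column layout did change, the silent part is that cut does not complain about a requested field that is missing. A minimal sketch with a made-up merged line (assuming, as I suspect, that newer bedtools merge no longer appends the strand column):

```shell
# 4-column layout (no strand column, as newer bedtools merge seems to emit):
printf 'chr1\t10000\t10485\t0.00713299\n' | cut -f1-3,5
# field 5 does not exist, so the divergence in column 4 is silently dropped

# 5-column layout (strand appended as column 4, as older versions may have done):
printf 'chr1\t10000\t10485\t+\t0.00713299\n' | cut -f1-3,5
# here the divergence is in column 5, so the value survives
```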

I'm using bedtools version 2.27.1 (so it shouldn't be affected by the groupby bug).

freeseek (Owner) commented Nov 2, 2018

The output you obtained looks correct to me. The final list is not a list of segmental duplications but rather a list of intervals, each annotated with the smallest divergence value among the segmental duplications overlapping it, so that where segmental duplications overlap, only the smallest divergence is preserved. I will see whether I can clarify the tutorial; I am not sure this filter is very important.
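The min-divergence idea can be sketched without bedtools: for lines covering the same interval, keep only the smallest value in column 4 (what `bedtools groupby -c 4 -o min` does in the pipeline). A toy example with made-up intervals:

```shell
# Three made-up intervals; the first two cover the identical region
# with different divergence values:
printf 'chr1\t100\t200\t0.02\nchr1\t100\t200\t0.01\nchr1\t300\t400\t0.05\n' |
  # sort so the smallest divergence comes first within each interval...
  sort -k1,1 -k2,2n -k4,4g |
  # ...then keep only the first line per interval
  awk '!seen[$1"\t"$2"\t"$3]++'
# -> chr1:100-200 keeps 0.01; chr1:300-400 keeps 0.05
```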

lemieuxl (Author) commented Nov 2, 2018

Thanks for your help!
I'll monitor the README file for updates.
