Controlling granularity #4
The linear gap model in edyeet is probably not suitable for this target. If
you apply it as part of pggb, with smoothxg afterwards, it should be fine.
But directly inducing the graph with seqwish is going to generate a mess if
that's the only step.
I would suggest using wfmash. With it you may need to increase the -l
and -a parameters or turn off adaptive banding to get the best alignment.
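
To illustrate why a linear gap model tends to behave this way, here is a minimal Python sketch. It is not edyeet or wfmash code, and the penalty values are made up for illustration. Under a linear cost, one 500 bp gap and 500 scattered 1 bp gaps score the same, so nothing favors a single clean gap; an affine cost with a gap-open penalty separates the two.

```python
import re

# Extended CIGAR operations: '=' match, 'X' mismatch, 'I' insertion, 'D' deletion.
CIGAR_OP = re.compile(r"(\d+)([=XID])")

def gap_runs(cigar):
    """Return the length of every insertion/deletion run in an extended CIGAR."""
    return [int(n) for n, op in CIGAR_OP.findall(cigar) if op in "ID"]

def linear_cost(runs, per_base=1):
    """Linear model: every gapped base costs the same, however the gaps are arranged."""
    return per_base * sum(runs)

def affine_cost(runs, gap_open=6, gap_extend=1):
    """Affine model: each gap run pays an opening penalty plus a per-base extension."""
    return sum(gap_open + gap_extend * n for n in runs)

# Illustrative penalties only; these are not the defaults of any real aligner.
one_big_gap = "1000=500I1000="
scattered = "1000=" + "1I1=" * 500 + "500="

print(linear_cost(gap_runs(one_big_gap)), linear_cost(gap_runs(scattered)))  # 500 500
print(affine_cost(gap_runs(one_big_gap)), affine_cost(gap_runs(scattered)))  # 506 3500
```

The tail of the first CIGAR quoted below, with its long series of 1-2 bp insertions between short matches, looks like the "scattered" case.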
On Fri, Oct 9, 2020, 21:57 Eugene Goltsman ***@***.***> wrote:
Hi Erik,
I'm using edyeet to induce a graph (with seqwish) on a small set of sequences
that contain mostly large indels (~4-6kb). It seems like in this case
edyeet is trying too hard to do base-level alignment where it should've
either terminated or opened a large gap. In the first case below, there is
a 5 kb inverted duplication (I know it because it was synthetically
introduced) at pos 7544324 on Accn1, but the aligner is attempting to
extend the alignment past the breakpoint following the initial ~50kb match.
Similarly, in the second case a 5kb inversion occurs at pos 7,573,027, but
instead of terminating the alignment, edyeet is pushing through the area of
virtually no identity. This leads to tiny graph segments and structures
that later get called as bogus variants. I tried raising the -p cutoff to
95%, but that results in the entire 50kb block containing the inversion not
being reported. It seems that this cutoff applies across the entire block.
Is there anything else you could suggest tweaking that works at a local
level, sort of like a gap extension vs mismatch penalty in Smith-Waterman?
Thanks!
Accn1 75071545 7500000 7550000 + Accn2 75021975 7490030 7539604 49550 50000 23 id:f:0.99538 ma:i:49550 mm:i:13 ni:i:423 nd:i:11 ns:i:14 ed:i:461 al:i:50011 se:f:0.00921797 cg:Z:44326=10D4998=5I2=7I1=7I1=6I1=1I2=3I1=1I2=1I2=1I2=1X1=1X1I1=1I2=1X1=11I1=8I2=2I1=3I1=7I1=7I1=1I1=2I1=4I1=2I1=1I1=1I1=3I1=3I2=3I1=11I1=1I1=3I1=2I1=2I2=5I1=6I1=9I1=3I1=1I1=1I1=9I1=3I1=6I2=2I2=3I1=1I1=4I3=4I2=3I1=1I1=1I2=2I1=1I3=1I1=1D2=1I1=2I1=3I1=1X2=3I3=3I1=2I4=5I2=1I1=1I1=1I3=4I2=5I2=2I1=2I1=2I3=5I1=5I1=1I2=3I3=13I1=1I2=4I1=3I3=6I1=4I2=2I2=2I1=3I2=5I2=1I1=1I1=2I1=5I5=7I1=1I1=2I1=1I1=1I2=1X5I1=1I1=2I1=2I2=5I2=3I2=7I1=1I3=1I2=13I5=1X2=2I1=2I2=3I3=1X1=2I3=1X3I1=2I2=1I3=3I2=3I1=2I4=3I1=1I2=2I1=3I1=3I2=2I2=1X7I1=1I2=1I1=1I1=6I1=1I1=5I1=5I1=3I1=5I1=1X1=1I1=1I1=2I5=1X2I1=1I2=1I1=2X5I1=1I1=1I1=1I2=1I2=1I1=1I2=14I
Accn1 75071545 7550000 7600000 + Accn2 75021975 7545041 7594371 47271 50000 14 id:f:0.958261 ma:i:47271 mm:i:1450 ni:i:609 nd:i:609 ns:i:670 ed:i:3338 al:i:50609 se:f:0.0659566cg:Z:23026=1X2D1=1X1=1D1=1I1=2D1=1X1D2=1X1=1D1=1D1=1D1=1X1D3=1X2=1D1=1X1D1=1D1=1X3=1X1D1=1X1D4=1X1=2I3=2D1=1D4=1I3=2X4=4X1=2I1=1I1=1X2=1X2=1I1=1X2=1I2=1I1=1I1=3X1=1X1=2I2=2X2=1X3=3X1D3=2X1=1X1I1=2X1=2X1=1X2I1=2X6=1X1=1X2=1D3=1I1=1X2I1=3I5=3X2=1X1D2=1X2=1X1=1X1D1=1D2=1I1=1I4=1X1=1D1=1D2=2I1=3X1I2=1I1=1D1=1X1=1D1=1X1I1=1D2=1X1D1=1I2=2X4=1X3=2X1I4=2X2I3=2I3=1X1=1X1D2=3X1=1D1=1X1I2=1D2=1I2=1D1=1D1=1I1=1X1=1X1=2D1=1D2=2D6=1X3D1=1X1=2X2=1I1=1X1=1I3=1X1=1X2=1X2D1=2D1=1D3=1X1=1D2=2X1D1=1X1=1I5=1I2=3X3=1X1=1D1=1I2=2X1=1I1=3X1=1X1I1=1X1=1D2=1X1I1=2X2=2X1=1X1=1I1=1X1D1=1I2=1X1I1=1X1=1X1=1X1I2=1X3=2X1=5D4=1X1=2X1=1I6=1X1D1=1I2=2X2=2X2I1=2X2=1X1=1D1=1D1=1X1D1=1I2=1X1I1=2X3=1X2=1X1=1I2=1D1=1I2=1D4=1D1=5D3=2D1=1X1=1X1=1X1=2X2=1X2D1=1D3=1I1=1X3=2X1D2=1X1=1X2=1X1D1=1X2D1=1X4=2X1=3X2D2=1X1=1D2=1X2=1X3=1X1D2=1X1D1=1D1=1X2=1D1=1X1D4=1X1D1=1X2=1D2=2D1=1X2=1I3=2X1I1=2X3=2D3=2X1=1I2=1X1D3=1X1=1X1=2X1D1=1D1=2D2=2X1D2=1D1=1X1=2X1=1I2=1D1=2D1=1D3=1I1=2D3=3X1=2X1D4=1I1=1I2=2I5=1X1=1I2=1X1D3=1X2D2=1X2=2X1D2=2X3=1X1=1X1D1=1X1=2D2=4D1=1X2=1X1=1I2=1D1=2D1=1D2=1X1=2D3=1D3=1D1=1I2=2X1=1X1D1=1X1=2X1=2X1I2=1I1=1X1=1X1=2I3=1D2=2X2=1D3=1I1=1X1=2X2=2D1=1D2=1X1=1X1=1D1=3D1=1D2=2X1=2X1=1X3=1I2=1X1=2X2=2X1=1D1=3X3D1=1D2=1X2=1I1=1X3=1X2=1X2=1X1=2D5=1X1=1X1=1X1D2=1X1=1D1=1X1D4=1I2=1X2=1D4=3X1D2=1D1=1X1=3X1=2X3=2X1I2=2D3=2X1=1X1I2=1X1D1=1I2=1I1=1X1I3=3X1=4X1D1=1D1=1X1=3X3D1=1X3=1X1=2D1=1X3=1X1D1=1X2=1D1=1X1=3X1=1X1=2D1=2X3=1D2=1X2=1X1=1D1=3D1=2D1=1X1D4=1X1D2=1X1=1X1I3=1X2=1X1=1X3=2X1D2=1X1D2=1I2=2X2=1D1=1X1=3X2I1=2X2=1X2=4D5=1D2=1X2=1X1I2=2X1=1X1D3=3X2=1X2=1D2=1X1=1D1=1I1=1X1=1X1=1X1=1X1=1X1I2=1I2=2X3=2I4=1X1=1D4=2I1=1X1I2=1X2=1X2=2X2=1X1=2X1=1X1=1D2=4I2=1I3=1X1D4=1X1=1X2=2I3=1X1=1X1=1D1=1X1=1X2=1I2=1D1=1X1=1X2D3=2D2=1X2=1X1I2=1I1=1X1=1X1I3=3X2=1X1=2X1=1X1=1X1=2D1=1D1=2D1=1D3=2X1D3=1X1=1X2=1D2=1I1=1X1=1I2=1I2=1I1=1D1=1X3=3X1=1X1=1X4=1D2
=1X1=3X1=1X1=2X3=1X1=1D2=3D6=1X1=1X1=1X2=1D2=2X1=1X1I1=1D1=1D3=1X2=1X1=3X1=1I2=1I4=1X1=3I1=1I1=1X1=1X1=1X2=1D1=1X1I2=1D3=2D1=1X1=1I1=1D1=1D2=1X2=1X3=1D1=2X1D1=3I2=1X2=1X1=1X1=1I2=1I1=1I1=1X3=1X2=1X1=1X1=1X1=1X1=1X2=1D3=1D1=3I3=1X1=2X2=1X1=2X2=4X2=1X1I1=1X1=1I1=1X1=1X1D1=1I3=2X1=1I1=1I1=1D2=2X1=1X3=1X1=1X1=1X3=1X2D1=2D2=2X2=3X2=1X1D1=1D2=2X2=1X1=1D2=1I2=1D1=1X1D1=2D1=1D2=2X4=1X2D1=1D1=1I2=1X1=1I2=1X2=1I1=1I2=1X4=2D2=1I1=1X1=1X3=2X2=1X2=1D2=1X1=4X1=1I2=1X1=1X1=2X1D2=2D4=1D2=1D2=3D3=2X1D1=1I1=1D2=1X2=2X2=1I1=1I1=1X1=1D1=1X2=1X1=1I3=1X1=1X2=2X2=1X1=2X1D1=1D1=3X1=1I3=1X1=1I2=1I1=1I2=1I2=1X2=1X1=1I4=1I1=1I2=1I1=1X4=3I2=3X2=1D2=1D1=1I1=1X3=1D1=1X3=1X1I2=1I1=1I1=1I2=1X2D1=2D2=1X1=1I2=1X1=1D4=1I1=3X1=2X1=1X1D1=1X2=1X4=1X1=1X1I2=1X1=1I1=1I3=2X2=1X1=2X2D3=1X2D4=3X3=2I1=1X2=1X1=2I2=1X1=1X4=1X1=1X1D2=3X2=1X4=1I1=1X3I2=2X1I2=2X1=1I1=2I5=1X1D1=1X1=1I2=1I2=1I1=1X2=1X1=1I1=1I3=2I2=1X1=2I1=2I2=1I3=3I1=2X1=1I3=1X2I1=1I2=3X2=1X2I1=1X1=1I1=1D4=1I2=1X2I1=1X1=1X1=2I1=1I1=1I4=1X1=1X1=1I1=2I1=3I3=2X2=1I2=1I1=2I3=1X3=1X1=1I2=1X2=1X3D2=1I1=1D2=1X4=6X4=2X2I2=1I4=1X2=2X1D1=1D1=1X1=1D1=1D1=3X3=1X1=1X1I1=1X2=1I1=1X5=4D4=1X1=1D1=1X1=2X1=6X1=2X2=1X1I1=1X4=1X2D6=2X1D4=2X1=1D2=1X1=1X1=2D2=2D1=1X5=2X1I1=1I4=2X4=1X1I2=2X1=1D2=1X1I2=1I1=2D2=2D4=1X2=1X1I1=1X1I1=1I1=1X1=1I1=1I1=1I1=3X3=1I1=1X2=1I1=1I1=1X1=1X1=1X1D4=2X3I5=1X2I4=5X1=1X1=1X2=1X1D4=2X1=4X1=1I1=2X1=1X5=1I1=1X2=1X1=2X1=2X1=4X1D1=2X1=1D1=2X2=1X1=1X1D2=1D1=1D3=1X4=1X1=1D1=1X2=1X1D2=1D2=1D1=2X1D1=2D2=1D1=1X1D1=1X1=1D2=1X1D1=1X1=1D1=2D3=1X2D1=1D1=1D5=1I2=1X1D2=1I1=2X1=2X4=1D1=1X1D3=2D2=1X1=2D2=1X1I1=1X1=2X1I1=2X2=1I3=1X2=3X1=1X2=1X1=1X1I1=1X2=1D2=1X1=3X1=1X4=2X1=6X2=1I1=1X1I2=1D1=3X1=1I1=2D2=1X1=1I1=2X1=1X1=1X1=1X4=1X1D1=1X2=2X1=3X1=2X1D1=1X1D1=1X1=1X1=1X1=3X1=1D1=1X1=1X2=2D2=1X1D1=1D3=1X1I4=1I3=1D1=1X1=1D4=2D2=1X2D1=1D1=1D1=1D4=1X1D1=1D3=1X1=1X1I2=1X1I2=1X1D2=1X2=1I1=1I4=1X1=2X1=1X2=1I1=2X2=2X2=2X1=1D2=1X1=2X1=1X4=1D1=1D2=1X2=1X1I2=1X1D2=2X1=1D4=1X2I4=1I1=1I1=1I1=1X1=1I1=3I5=1X1I1=1I3=1D4=1X1D3=1X2=1I1=3I2=1X1=1X1=1X1=2X1I1=1X1=1X1=1X1=1X1=2X1I1=3
X1I1=2X2=1X1=1X1=1I3=1X1=1X1=1X1=2X1=1D1=1X2=2X1=1D1=1X2I1=1I2=1I4=7X4=1X2D1=1D1=3X1=1X2=1I2=1X1=1X1D1=1X2=1X1=3X2=1X3=1D2=2X1=2X1D1=1X1=1X1D2=2I1=1X2=2I3=1X3=2I2=2X1=2X1=1X1=1I2=1D1=1D5=1I1=1X1=3I3=1I1=1I1=1X1I1=1X1=1I2=1X1=1X2I3=2X1I1=3I3=2I2=1X1I2=1X1=1X1=1I4=1X3=1I1=1I2=1X1I1=1X2=2X2=2X1=4X2I1=2X1=2X1=1X2=1X1=1D5=1X1=2X1=1D1=4X1=2X4=1X1I2=1X1=1X1=5X4=1X2D5=2X3D4=1X1=2X1=1I1=1D1=1D2=1X1=1D3=3X1=1D2=1D1=2X1=1D1=2D1=1X1=1D1=1X4=2I2=2X1I2=1X1D3=2X1I2=1X1D4=2X5=2X1D1=1D4=1X1=2I2=1I1=1X1I1=1X3=2X1=1I3=2X1I6=1X2I4=1X1=1X1=1D1=2X1=6X1=2X1=1X1=1I1=1X4=4I5=1X2=2I3=1I2=1X1=1X1I1=1X1D1=1X1D1=1X1=1I1=1X1=1X1=1D1=1X4=1X2=1X2D2=1D2=6X4=1X2=1I1=1D2=1X1=3I1=1X3=1X2=1D1=1X3=2D1=1D2=1D2=2X3=1X2D1=2D2=2D1=1X4=1D1=1D1=2D1=1X1=1X1=1X2D3=1D3=1I1=1D1=1X1=1X2=3X2=1X1=2D2=3D1=1D1=2X1=1D2=2D1=1D2=2X1=1D3=4D5=1D1=1X2=2D1=1D3=1D1=1X1D1=1X1I5=2D2=2X1D2=2X1D2=1X1=4D4=1X2=3X2=1X1=1X1I4=1X1=1X2=2D1=1X2=1X3=2D1=3X4=2X2=1I1=1D2=1D2=3I1=2I3=1X1=1D1=1D2=2X1=1D4=1X2=1X1=2X1=1X1=2X1D1=1X1I4=1I1=1X2=1D1=1X2=2I1=1X2I3=1X1=2D3=2D2=1X1=1I3=1X1=1D1=1I2=1I2=3X3=1X2=1X1=1X1=1D2=1X1=1D3=1I1=1X3=1X1D2=1X2=1X2=2D1=1D3=1X2=5X1=2D1=1D2=2D2=1X1=1X3=1X1=1D2=1X1=1I1=1X2=1D2=2X2=1X1D2=1I1=1D1=2X1I3=3I2=1I2=1I4=2I2=2X1=2X2=1I1=1D1=4X1=1X3=1I1=1X2=2X3=1X1=1X1=1D2=2I4=1X2=1X1=1D2=1D3=1X1D2=1D1=1X2=2X3=2I1=1X1=1I1=1X1I2=1D1=1X3=1D2=1I1=2I3=1X2=4X1=3I2=2I2=1X1I1=3I3=1X1=1X1=1X3=1X1=2X2=1I1=1D2=2X1=1D2=1D1=2X1=1D1=1I1=1X1=1X1D2=4X2=2X1=1X2=2X1=1X3=2X1D1=1D2=1I2=1X1=1X1=1X1=1X1=1X2=1X3=1X2=2D3=2X1=1D2=1X2=3D1=2X2=2I2=1X2=1X2=1I1=1I2=1X1D1=2I3=1I2=1X1D1=1I2=1X1=1X1=1X1=1X1=1D1=1D2=2D2=1D3=3X1=1X1D2=1X3=1I1=1I1=1X1D1=2X2=1I2=1X1=1X1=1X6=3I2=1I1=1X3=2X1=1X1=3X1=1X2=1I4=1X1=1X1=3X3=1X1=1I2=1D1=1D3=1X1D1=1D3=1I1=1X1=1X3=2X1I3=1I1=2I1=1I1=1I1=1X1=2X1=1X1I1=1X2=3X3=1X1D1=1X2=1D1=1X2=1X1D2=1X1=1I2=1X1=1I1=3X1I2=1D1=1X1=1X1=1D2=1I2=1X1=1X3=1X1I4=1X1I4=1D1=4D2=1X1=1I1=2X1=1X2=2X2=1X2=1X2=1X1=2D1=1D4=1X1I4=2D3=2X2=1D2=1X1D1=1X1=1X1=1X1=1X1=1D1=1X1=1I3=1I1=1X2=3X3=1X1I1=2X2=1X1D2=1X2=1I5=3X1=3X1I2=1D1=3X1I1=1I1=1I2=2X3=1D1=1X1I2=2X1I3
=1X1=1X2=1X3=1X1D1=1X2=1X1I4=1X1I1=2I2=1X1=1X3I3=1I1=1I3=2X1=1I1=1X1I1=3X1=1X2=1I1=1X1=1X1=1I2=1X2=1X2I3=1X1=5X1=2X1I2=5X1=1X1I1=1D1=1D1=2X3=3I2=1X1D1=2X3=2I2=1I1=1D2=1D1=2X1=3X1=1X1=1I2=3X2=1I2=1I2=1X4=1D2=1X1=2I1=1X2=2X1=1I1=1X5=1X1I1=1I2=1X2=1X3=1X1=1D2=1X2=1I1=3X1I1=2I1=3X2=2X1=1I2=1X3=1D1=2X1=2X2=1X2=4I1=1I1=1X2=1X1=2X2=2X1I5=1D1=1D1=1D3=2X1=2D3=1X1=2I1=1X1=1X1=2X1I2=1D2=1X1=2X2I1=1I3=1I3=1X1I1=1I4=2X1I2=1X1I1=1X1=1I1=4I2=1X1=2X1=2I1=1I3=2X2=2X1I2=1X2=1X2I3=1X1I2=1D1=1X5=2D2=1D1=1D4=3X1=2X1I3=2I3=1D1=1I2=1I1=3X1=2X1=1I1=1I2=2X1I2=1X2=2X3I1=1X1=1I3=1X1I3=1I1=2D4=3X1=2X1=1X1I3=1D2=1X2=2I1=1I2=1X1=1X1I4=1X1I1=1I2=1X1=1X1=2I2=1X2=1I1=1X2=1X3=1X1=1I1=5X1=2I4=1X1=1X1I1=1X2I2=1X1=1X2=2X1I3=1X1=1D3=1X2=1I1=1X1D1=1D2=1X1=3X1=3X3=1I1=1X1D1=1I2=1X1=2X1D1=1X1=1X1=1I1=1I2=1X3=1X1=1I1=3X1I4=1X1=2X1=1X2I1=1X2=1I1=1I1=1X1I1=1I2=1I3=1D1=2X1=1X5=2X1D1=2X1I2=2X1D2=1X1=3X1I1=1I1=1D2=1X2=1D1=1I1=1I1=1I1=2X1=1I1=1I1=1D3=3X1=1X1=3D1=2X2=1D2=1X1=1I2=3X2=1D6=1X1D1=2X2=1X1I1=1I3=1I1=1X1=3I2=1X1I1=1X3=1D1=1X2=1X1=1X1D1=1X1=1X1I1=2I5=2I3=1I1=1X1D1=2X2=2X3=1X1=1D1=1I1=2X1I1=1I1=2I2=2X1=1I3=2D3=2X2D4=2X1D3=1X4=2X2=1D1=1X1I2=1I1=1X1D2=1X1I1=1I2=1X1=1X1I1=2X1=2X1=2D1=1D5=2D2=1X2=1X2I2=1X2=1X1=1X1=2X1I5=1D1=1X4D1=1D3=1I2=1X1=1X6=2X1=1X2D1=2X1=2X1=1X1D1=2X3=3X1=1X2=1I2=2X2=2D1=1X1=1I1=1X1D1=1D1=1D2=1D2=1X1=1D2=1X2=1X1=1D1=2D1=4X4=2X3=1D4=1I2=2I2=2D1=1X4=2X2=2I2=1X2=1X1I1=1I1=1I1=1X3=1X1I2=1X1I3=1X1=1I1=3X2=4I1=1X2I21303=670I
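
One way to see where an alignment like the second record degrades is to compute identity in fixed windows along the cg:Z: CIGAR. The following is a minimal Python sketch, assuming an extended CIGAR that uses only =, X, I, and D and is passed as a single unbroken string. The function name, window size, and threshold are hypothetical; this is a post-hoc diagnostic, not a feature of edyeet, seqwish, or smoothxg.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([=XID])")

def windowed_identity(cigar, window=1000):
    """Return (start_column, identity) for consecutive windows of alignment columns.

    Identity is the fraction of '=' columns in the window. Positions are
    alignment columns, not query or target coordinates.
    """
    out = []
    col = 0       # first alignment column of the current window
    matches = 0   # '=' columns seen in the current window
    filled = 0    # total columns seen in the current window
    for n, op in CIGAR_OP.findall(cigar):
        n = int(n)
        while n > 0:
            take = min(n, window - filled)  # split long runs at window edges
            if op == "=":
                matches += take
            filled += take
            n -= take
            if filled == window:
                out.append((col, matches / window))
                col += window
                matches = filled = 0
    if filled:  # trailing partial window
        out.append((col, matches / filled))
    return out

# Toy CIGAR: clean match, a 1000-column stretch of alternating match/insertion, clean match.
toy = "2000=" + "1=1I" * 500 + "1000="
low = [start for start, ident in windowed_identity(toy) if ident < 0.9]
print(low)  # [2000]
```

Windows that fall well below the identity of their neighbors mark candidate places where the alignment could have been terminated or split.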
Thanks! But what is different about running pggb vs. running edyeet followed by seqwish and smoothxg? From the description on the pggb page, it seems to run precisely these three tools.
Oh it's the same thing. I was just referring to the whole process. The
seqwish graph is very "literal" in representing the raw alignments. That
can make it hard to work with. You know this though.
On Sat, Oct 10, 2020, 07:32 Eugene Goltsman ***@***.***> wrote:
Thanks! But what is different about running pggb vs running edyeet
followed by seqwish and smoothxg? From the description on the pggb page it
seems like it runs these three tools precisely.
Ok, gotcha. Are there perhaps more aggressive smoothing settings in smoothxg that you would tweak to help get this properly represented in the graph?