Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Irregular last results #2

Closed
stephenturner opened this issue Jul 8, 2015 · 4 comments
Closed

Irregular last results #2

stephenturner opened this issue Jul 8, 2015 · 4 comments

Comments

@stephenturner
Copy link

Hi Joseph. I ran across this tool and wanted to give it a spin. I have a plasmid assembly and the dot plot tells me it's circular, but the ends are messy.

I'm trying to run apc on a single 90kb contig and I get the note that I have irregular LAST results. I tried running the commands that appear to be in the perl script, and I look at my alignment file, and this is what I see:

$ lastdb tmp ../hyb-plas120-nanopgm.fasta

$ lastal -s1 -x300 -f0 -T1 tmp ../hyb-plas120-nanopgm.fasta
# LAST version 548
#
# a=7 b=1 A=7 B=1 c=100000 e=40 d=15 x=300 y=9 z=300
# R=10 u=0 s=1 T=1 m=10 l=1 n=10 k=1 i=8192 w=1000 t=0.910239 g=1 j=3 Q=0
# tmp
# Reference sequences=1 normal letters=94654
#
#    A  C  G  T
# A  1 -1 -1 -1
# C -1  1 -1 -1
# G -1 -1  1 -1
# T -1 -1 -1  1
#
# Coordinates are 0-based.  For - strand matches, coordinates
# in the reverse complement of the 2nd sequence are used.
#
# score name1   start1  alnSize1    strand1 seqSize1    name2   start2  alnSize2    strand2 seqSize2    blocks
#
# batch 0
94654   plasmid_120_nanopore_pgm_corrected_pilon    0   94654   +   94654   plasmid_120_nanopore_pgm_corrected_pilon    0   94654   +   94654   94654
# Query sequences=1

That is, a single line, not 3, which is what the script was expecting. Any idea what's going on here? Thanks.

@jfass
Copy link
Owner

jfass commented Jul 8, 2015

Hey Stephen,

Thanks for trying it! This would suggest to me that there is no overlap at
the ends (which really deserves a better error message!). Have you tried
aligning your plasmid sequence to itself using Gepard
http://www.helmholtz-muenchen.de/en/icb/software/gepard/index.html? It'll
create a dotplot, which should pretty clearly show any self-overlap
(diagonals in the upper right and lower left corners) ...

~Joe

On Wed, Jul 8, 2015 at 10:59 AM, Stephen Turner notifications@github.com
wrote:

Hi Joseph. I ran across this tool and wanted to give it a spin. I have a
plasmid assembly and the dot plot tells me it's circular, but the ends are
messy.

I'm trying to run apc on a single 90kb contig and I get the note that I
have irregular LAST results. I tried running the commands that appear to be
in the perl script, and I look at my alignment file, and this is what I see:

$ lastdb tmp ../hyb-plas120-nanopgm.fasta

$ lastal -s1 -x300 -f0 -T1 tmp ../hyb-plas120-nanopgm.fasta

LAST version 548

a=7 b=1 A=7 B=1 c=100000 e=40 d=15 x=300 y=9 z=300

R=10 u=0 s=1 T=1 m=10 l=1 n=10 k=1 i=8192 w=1000 t=0.910239 g=1 j=3 Q=0

tmp

Reference sequences=1 normal letters=94654

A C G T

A 1 -1 -1 -1

C -1 1 -1 -1

G -1 -1 1 -1

T -1 -1 -1 1

Coordinates are 0-based. For - strand matches, coordinates

in the reverse complement of the 2nd sequence are used.

score name1 start1 alnSize1 strand1 seqSize1 name2 start2 alnSize2 strand2 seqSize2 blocks

batch 0

94654 plasmid_120_nanopore_pgm_corrected_pilon 0 94654 + 94654 plasmid_120_nanopore_pgm_corrected_pilon 0 94654 + 94654 94654

Query sequences=1

That is, a single line, not 3, which is what the script was expecting. Any
idea what's going on here? Thanks.


Reply to this email directly or view it on GitHub
#2.

Joseph Fass
Lead Data Analyst
UC Davis Genome Center - Bioinformatics Core
http://bioinformatics.ucdavis.edu/
jnfass@ucdavis.edu
phone ~ 530.752.2698

@stephenturner
Copy link
Author

Joe,

I did actually. I'm still puzzling over how to interpret (and deal with) the very large overlap, and the excess diagonals I see toward the end of the sequence (repetitive sequence?). But if I'm interpreting this correctly, the top-right and bottom-left diagonals suggest circularity, albeit with messy ends.

image

@jfass
Copy link
Owner

jfass commented Jul 9, 2015

Now that's bizarre. I'm having trouble wrapping my head around that graph.
It seems like there may be a (long) tandem duplication that's been
collapsed into one copy toward the end of the contig. Does that make sense?
So the diagonal that starts next to the "021_dim" on the y-axis indicates
that the penultimate sequence (let's say, "tuvw" in the alphabet that
covers the whole contig) overlaps the first sequence in the contig
("abcd"), but the following sequence "xyz" doesn't match "efg" ...
instead it matches "uvw" again. So either some sequence has been
incorrectly duplicated at the end of the contig, or a tandem duplication
(of 10-20kbp) has been collapsed at the start of the contig. If I'm seeing
it correctly.

How'd your plasmid name get flipped on the y-axis, btw?

What do you think? If there's a way to give a more meaningful error
message, to help figure out what's going on here, I'd be happy to add it
in. Actually, Keith Bradnam rewrote my awful Perl, and I have yet to apply
his suggestions. He also found another tool or two ... something from AMOS,
I believe, and a more recent tool called "Circulator" (which I can't find
at the moment. I just stumbled across this
https://github.com/sheikki/identifyCircularContigs/blob/master/identifyCircularContigs.
But I have no idea if any of them will be able to figure out your situation.

~Joe

On Thu, Jul 9, 2015 at 2:19 AM, Stephen Turner notifications@github.com
wrote:

Joe,

I did actually. I'm still puzzling over how to interpret (and deal with)
the very large overlap, and the excess diagonals I see toward the end of
the sequence (repetitive sequence?). But if I'm interpreting this
correctly, the top-right and bottom-left diagonals suggest circularity,
albeit with messy ends.

[image: image]
https://cloud.githubusercontent.com/assets/460076/8591991/fa56f698-25f9-11e5-84bb-631ca1531c98.png


Reply to this email directly or view it on GitHub
#2 (comment).

Joseph Fass
Lead Data Analyst
UC Davis Genome Center - Bioinformatics Core
http://bioinformatics.ucdavis.edu/
jnfass@ucdavis.edu
phone ~ 530.752.2698

@stephenturner
Copy link
Author

Thanks for those suggestions. I actually ran into circlator (no "u") prior to apc, but with outputs undocumented it's not immediately useful. I took identifyCircularContigs for a spin, but need to dust off my perl to get this working.

But you're right, I've got some odd things happening at the end of my assembly. This was a single-contig nanopore-only assembly from celera polished with ion torrent reads using pilon. May warrant another minION run with newer flow cells.

RE: axis, just used the most recent version of gepard and it was like this out of the box.

Closing this issue because I believe apc did just what it should have when it reported irregular last results. Only suggestion: echo to screen the command used to run lastdb/lastal. Or add a -verbose option to echo to screen all the commands issued.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants