Funannotate train crash at Pasa step with iso-seq + RNA-seq #326
Comments
It would be helpful to post the failing part of the log files for train and for PASA. |
Of course, see attachment. |
Looks like maybe transcripts didn't get indexed, i.e.
Is this only a problem when you use Illumina and PacBio? |
I deleted the pasa directory and reran funannotate train, this time removing the iso-seq reads (don't know if that's enough, if not I can rerun the whole thing overnight):
The error file was too big, so I sent you a google-drive link by email |
Okay, thanks. I will try to take a look after work. One other follow-up: do the RNA-seq tests run without error on your system? i.e.
funannotate test -i rna-seq --cpus 2
|
Thanks Jon! I'm running the test; the prediction is still going, but the training proceeded without error. I'll run the training overnight with only the Illumina reads and see if it helps isolate the issue to including the iso-seq.
Eyal Ben-David, PhD
Post-doctoral Scholar
Lab of Prof. Leonid Kruglyak
Department of Human Genetics
David Geffen School of Medicine
UCLA
|
The error in the log file seems to say that an alignment exists but the sequence doesn't. This is probably due to re-running the command with different input, i.e. funannotate will re-use any data that is available, so if you ran with Illumina + PacBio (which failed) and then re-ran with only Illumina, it will still re-use the alignments if they are present. Let's see if the Illumina-only run works. The other thing to try would be just the PacBio -- it will help me fix this if I know that the error only happens when you pass both kinds of RNA-seq data. |
Any updates here? Did you find a solution that worked? I'm recalling something about PacBio fasta headers that might be causing some problems with PASA. |
Thanks Jon, I ended up just using the Illumina data and dropping the iso-seq, since that seemed to work. |
I'd like to figure out what was causing the error -- any chance you'd be able to subsample the data to a smaller reproducible dataset that still causes the error? That is, of course, if you'd be willing to share this smaller test set with me confidentially so I can determine what the error is. |
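For anyone in a similar position, here is a minimal sketch of how a FASTA file could be subsampled into a smaller test set. It is not part of funannotate; the function names `read_fasta` and `subsample` are hypothetical, and there is no guarantee that a random subset still triggers the error.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line, []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def subsample(records, fraction, seed=0):
    """Keep each record with probability `fraction`; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < fraction]
```

Fixing the seed matters here: if a subset does reproduce the crash, the same subset can be regenerated later.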
I totally understand, and I would if I could. This isn't my own dataset, unfortunately. I will retry annotating a newer assembly version in a few weeks; please let me know if I should close the issue in the meantime.
|
I'll leave this open for now -- but I would still like to hear whether this is still an issue with the latest release. Thanks! |
Sorry for the late response. I asked but my collaborators are uncomfortable with sharing any data for debugging. I don't agree with their reluctance but it is what it is.
|
Another small issue, it's possible that because the output of the Isoseq QC pipeline is fasta, and because funannotate train is importing it as if it were fastq, I lose half the reads (at least it reports half the reads in the log files). |
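One way the halved count could arise, sketched below as an assumption rather than a description of funannotate's actual parser: a single-line FASTA record spans 2 lines, while a FASTQ record spans 4, so counting `lines // 4` on such a FASTA file reports half the records. The function names are hypothetical.

```python
def sniff_format(first_line):
    """Guess the format from the first character of the first line."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"
    raise ValueError("unrecognized sequence format")


def count_fasta_records(lines):
    """Count records by counting header lines."""
    return sum(1 for line in lines if line.startswith(">"))


def naive_fastq_count(lines):
    """Count records the way a 4-lines-per-record FASTQ reader would."""
    return sum(1 for _ in lines) // 4
```

Comparing the two counts on the iso-seq file would show whether the reads are really lost or only miscounted in the log.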
Looks like the same problem -- what does a grep of the sequence headers for |
Thanks for the help. I've run it again without the Isoseq data but when I finish I'll go back and try to debug it looking at that transcript. |
This is the output of alignment_assembly_loading.out for this cluster. I couldn't find any other reference to asmbl_16018 (the number has changed since I had to rerun it):
|
Don't know if this helps, but searching for these chains in the blat gff yields the following lines:
transcript/3505 (from __all_transcripts fasta):
transcript/2537 is the following:
|
Are there forward slashes in the fasta header?? ie |
Yes, based on the __all_transcripts fasta there are 97 transcripts with a forward slash in their names. Running the training pipeline without the iso-seq data doesn't result in any transcripts with that naming convention. |
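A check like this can be done with grep; an equivalent minimal Python sketch follows. The function name `headers_with_slash` is hypothetical, and the file path is whatever transcripts fasta is being inspected.

```python
def headers_with_slash(lines):
    """Return FASTA header names (without the leading '>') that contain '/'."""
    return [
        line.rstrip("\n")[1:]
        for line in lines
        if line.startswith(">") and "/" in line
    ]


def count_in_file(path):
    """Count headers containing '/' in the FASTA file at `path`."""
    with open(path) as handle:
        return len(headers_with_slash(handle))
```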
I would just rename the isoseq data with something simple and it will likely work. |
Thank you, I will try it and report back here. FYI, I just used the fasta output file from the standard iso-seq3 pipeline from PacBio, so if this is indeed the issue I'd recommend either adding a preprocessing step or documentation about this.
|
Yeah if that fixes it then I can just add a fasta header rename step. There is one already but it must not be working properly. |
After replacing the forward slashes with underscores in the Iso-seq fastq file, funannotate train completes successfully. Thank you for your support, closing the issue. |
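The workaround described above can be sketched in a few lines of Python. This is an illustration only, not the rename step that funannotate itself applies; `rename_headers` and `rename_fasta_file` are hypothetical names, and the input is assumed to be FASTA.

```python
def rename_headers(lines):
    """Yield FASTA lines with '/' replaced by '_' in header lines only."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            line = line.replace("/", "_")
        yield line


def rename_fasta_file(in_path, out_path):
    """Write a copy of the FASTA at `in_path` with sanitized headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in rename_headers(src):
            dst.write(line + "\n")
```

Only header lines are touched, so sequence lines pass through unchanged. The renamed file would then be passed to --pacbio_isoseq in place of the original.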
Okay, added this via 1ee07cb. Also updated the parser to something simpler, so should be a little faster in processing the long reads. |
I've installed the latest funannotate docker image (changed manually the path to 1.6.0 in the installation instructions).
Am running the following command to assemble a genome using pacbio iso-seq and single end RNA seq as evidence:
funannotate train -i Dba.dovetail.masked.fasta -s *.fastq.gz --pacbio_isoseq DbaIsoSeq.fasta --species "Dunaliella" --cpus 2 --max_intronlen 30000 -o out
However, I'm getting a crash in Pasa with the following error: