Problem with Evidencemodeler #528

LemoAlex · 2021-01-11T08:28:37Z

Hello funannotate users,

I am currently using funannotate v1.8.4, installed through Docker, and funannotate check and the tests work without issues.

I am trying to run funannotate predict on a fish genome assembly.

So, when I run:

funannotate-docker predict -i ~softmasked.genome.fasta -o ./output1 -s "Species name" --transcript_evidence Transcriptome.fasta --optimize_augustus --other_gff /home/alexandre/funannotate/Species.transdecoder.gff3 --protein_evidence uniprot.reviewed.fasta uniprot-reviewed.fasta --organism other --rna_bam ~/funannotate/alignment.bam --weights codingquarry:1 --cpus 4

Everything runs smoothly until the EvidenceModeler part. Then I get this message:

funannotate-EVM.log
EVM: partitioning input to ~ 35 genes per partition
Traceback (most recent call last):
  File "/venv/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py", line 433, in <module>
    partitions=args.no_partitions)
  File "/venv/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py", line 203, in create_partitions
    k, len(SeqRecords[k])))
  File "/venv/lib/python3.7/site-packages/Bio/File.py", line 248, in __getitem__
    record = self._proxy.get(self._offsets[key])
KeyError: 'scaffold_1'
[Jan 11 08:24 AM]: Evidence modeler has failed, exiting
Traceback (most recent call last):
  File "/venv/bin/funannotate", line 713, in <module>
    main()
  File "/venv/bin/funannotate", line 703, in main
    mod.main(arguments)
  File "/venv/lib/python3.7/site-packages/funannotate/predict.py", line 1730, in main
    os.remove(EVM_out)
FileNotFoundError: [Errno 2] No such file or directory: '~/output1/predict_misc/evm.round1.gff3'

The EVM logfile (attached) does not show any error, so I am a bit confused with what's going on here.

Thanks for the help,
Best,
Alexandre
funannotate-EVM.log

nextgenusfs · 2021-01-11T16:41:14Z

This line from the EVM log file seems odd:

[01/11/21 08:23:44]: 9,557 total contigs; skipping -51,760 contigs with no genes

Do you have the predict logfile that I could look at as well?

LemoAlex · 2021-01-11T17:01:00Z

Yes, here it is attached.
funannotate-predict.log

Thanks for your help.

Alexandre

nextgenusfs · 2021-01-11T18:15:45Z

Hmm, okay, thanks. I can't quite tell, but it looks like the --species argument may not be getting passed properly on the command line, i.e. if you look at the command printed in the log file:

/venv/bin/funannotate predict -i /home/alexandre/funannotate/fish.masked.fa -o ./output1 -s Species name--transcript_evidence /home/alexandre/funannotate/Alignment/Tran.fa --optimize_augustus --other_gff /home/alexandre/funannotate/Tran.fa.transdecoder.gff3 --protein_evidence uniprot-catfish-reviewed.fasta uniprot-zebrafish-reviewed.fasta --organism other --rna_bam /home/alexandre/funannotate/sorted.bam --weights codingquarry:1 --cpus 4

I don't know how that would necessarily cause problems with EVM per se, but it seems like it may just be a typo? In your initial command above there is clearly a space.

-s Species name--transcript_evidence /home/alexandre/funannotate/Alignment/Tran.fa
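As a quick way to see how a quoted species name should reach the program, here is a minimal Python sketch using `shlex.split`, which follows POSIX shell quoting rules. The command string is a shortened, hypothetical version of the one in the first post, and this only illustrates shell tokenization; it says nothing about how funannotate itself prints or parses the command.

```python
import shlex

# Hypothetical, shortened version of the command from the first post.
cmd = 'funannotate predict -i genome.fa -s "Species name" --cpus 4'

# shlex.split applies POSIX shell quoting rules, so the quoted
# species name stays together as a single argument.
args = shlex.split(cmd)
species = args[args.index("-s") + 1]

print(args)
print(species)
```

If the species name arrives as one token like this, the missing space in the logged command is more likely a cosmetic issue in how the command was printed.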

So assuming the above is not related to the error, you can try running the EVM command from that same directory; maybe that will yield more info on stdout, i.e.:

funannotate-docker /venv/bin/python /venv/lib/python3.7/site-packages/funannotate/aux_scripts/funannotate-runEVM.py -w /home/alexandre/funannotate/output1/predict_misc/weights.evm.txt -c 4 -g /home/alexandre/funannotate/output1/predict_misc/gene_predictions.gff3 -d /home/alexandre/funannotate/output1/predict_misc/EVM -f /home/alexandre/funannotate/output1/predict_misc/genome.softmasked.fa -l ./output1/logfiles/funannotate-EVM.log -m 10 -o /home/alexandre/funannotate/output1/predict_misc/evm.round1.gff3 --EVM_HOME /venv/opt/evidencemodeler-1.1.1 -p /home/alexandre/funannotate/output1/predict_misc/protein_alignments.gff3 -t /home/alexandre/funannotate/output1/predict_misc/transcript_alignments.gff3

nextgenusfs · 2021-01-11T18:18:51Z

Actually, that will probably fail based on what I have in the bash script. You can create a new bash wrapper like this that will just run the image (it is the same, it just doesn't include the call to funannotate):

#!/usr/bin/env bash

realpath() {
  OURPWD=$PWD
  cd "$(dirname "$1")"
  LINK=$(readlink "$(basename "$1")")
  while [ "$LINK" ]; do
    cd "$(dirname "$LINK")"
    LINK=$(readlink "$(basename "$1")")
  done
  REALPATH="$PWD/$(basename "$1")"
  cd "$OURPWD"
  echo "$REALPATH"
}

timezone() {
    if [ "$(uname)" == "Darwin" ]; then
        TZ=$(readlink /etc/localtime | sed 's#/var/db/timezone/zoneinfo/##')
    else
        TZ=$(readlink /etc/timezone)
    fi
    echo $TZ
}

# Only allocate tty if one is detected. See - https://stackoverflow.com/questions/911168
if [[ -t 0 ]]; then IT+=(-i); fi
if [[ -t 1 ]]; then IT+=(-t); fi

USER="$(id -u $(logname)):$(id -g $(logname))"
WORKDIR="$(realpath .)"
MOUNT="type=bind,source=${WORKDIR},target=${WORKDIR}"
TZ="$(timezone)"

exec docker run --rm "${IT[@]}" --user "${USER}" -e TZ="${TZ}" --workdir "${WORKDIR}" --mount "${MOUNT}" nextgenusfs/funannotate:latest "$@"

nextgenusfs · 2021-01-11T18:51:37Z

Here is a generalized version of this bash script -- you can run it with any Docker container: https://github.com/nextgenusfs/dw/

LemoAlex · 2021-01-11T22:35:58Z

Hello again,

Thanks for the answers.
I tried removing the spaces in the species name, but I still get the same error.

I also tried running the EVM step using the bash script through dw, but again I get exactly the same output as when running the whole pipeline. I also get (I had it before as well) a single file called genes.1.bed in the predict_misc/EVM folder. It feels like EVM can't get past the first scaffold; could this be possible?

Thanks,
Alexandre

nextgenusfs · 2021-01-11T22:38:11Z

~~I suppose it could be running out of RAM. Can you increase the RAM allocated to docker?~~

Never mind, I saw in your log file that it is already 264 GB.

When you call this, are all of the files you are passing to the Docker container located in the same run directory?

Another thing to try would be to start the Docker image interactively and then run the EVM workflow from inside it, i.e. docker run -it -v {need to mount filesystem folders} nextgenusfs/funannotate /bin/bash

And then lastly, I assume the test dataset runs on your system?

funannotate-docker test -t rna-seq --cpus XX

nextgenusfs · 2021-01-11T22:48:43Z

One other thing to try would be to delete all of the EVM temp files and then add --no-evm-partitions to your predict command (I just realized it's not in the help menu). This will run the partitioning differently, in case that is what is causing EVM to die.

nextgenusfs · 2021-01-11T23:18:08Z

But going back to my original thought about the EVM log file, this line seems strange:

[01/11/21 08:23:44]: 9,557 total contigs; skipping -51,760 contigs with no genes

What is happening in the code is this:

    # sort the results by contig and position
    ChrGeneCounts = {}
    sortedResults = natsorted(Results, key=lambda x: (x[0], x[1]))
    with open(bedGenes, 'w') as outfile:
        for x in sortedResults:
            outfile.write('{}\t{}\t{}\t{}\t{}\t{}\n'.format(x[0], x[1], x[2],
                                                            x[3], x[4], x[5]))
            if not x[0] in ChrGeneCounts:
                ChrGeneCounts[x[0]] = 1
            else:
                ChrGeneCounts[x[0]] += 1
    ChrNoGenes = len(SeqRecords) - len(ChrGeneCounts)
    lib.log.debug('{:,} total contigs; skipping {:,} contigs with no genes'.format(len(SeqRecords), ChrNoGenes))

This suggests something is wrong with the input files (something I've not seen before): it is reporting a negative number (-51,760) of contigs with no genes associated with them, which should not be possible.
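To make the oddity concrete, the subtraction in the snippet above can be inverted using the two numbers from the log line. This is only arithmetic on the logged values, not output from funannotate; the variable names are made up for the illustration.

```python
# Values taken from the EVM log line quoted above.
total_contigs = 9557        # len(SeqRecords)
reported_no_genes = -51760  # ChrNoGenes

# ChrNoGenes = len(SeqRecords) - len(ChrGeneCounts),
# so len(ChrGeneCounts) = len(SeqRecords) - ChrNoGenes.
contigs_with_genes = total_contigs - reported_no_genes

print(f"{contigs_with_genes:,} contig names carry gene models")
```

In other words, the evidence files reference about 61k distinct sequence names while the genome has fewer than 10k contigs, so many of those names cannot be genome contigs.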

This suggests that something is wrong with the headers in one of these input files -- can you validate that the input files have appropriate FASTA/sequence headers? For example, do the sequence names in the custom GFF that you are passing match the genome FASTA headers? And do the headers in the BAM file match as well?
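One way to run that check for the GFF is sketched below in plain Python. It is a minimal, stdlib-only sketch and not part of funannotate: the function names are hypothetical, it assumes the FASTA ID is the first word after `>`, and that column 1 of the GFF3 holds the sequence name. The toy data stands in for real files.

```python
def fasta_ids(lines):
    """Return the set of sequence IDs (first word after '>') in FASTA lines."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def gff_seqids(lines):
    """Return the set of column-1 sequence names in GFF3 lines."""
    ids = set()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        ids.add(line.split("\t", 1)[0])
    return ids


def unmatched_seqids(fasta_lines, gff_lines):
    """Sequence names used in the GFF3 that are absent from the FASTA."""
    return gff_seqids(gff_lines) - fasta_ids(fasta_lines)


# Toy data: one GFF3 record sits on a transcript, not on a genome contig.
genome = [">scaffold_1 len=10\n", "ACGTACGTAC\n", ">scaffold_2\n", "GGGG\n"]
gff = [
    "##gff-version 3\n",
    "scaffold_1\ttool\tgene\t1\t9\t.\t+\t.\tID=g1\n",
    "TRINITY_DN1_c0_g1_i1\ttransdecoder\tgene\t1\t300\t.\t+\t.\tID=g2\n",
]
missing = unmatched_seqids(genome, gff)
print(sorted(missing))
```

With real data, the same functions accept open file handles, for example `unmatched_seqids(open("genome.fa"), open("other.gff3"))` (file names here are placeholders). Any name it reports is a GFF record that cannot be placed on the assembly.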

LemoAlex · 2021-01-11T23:38:57Z

> For example, the custom GFF that you are passing do they match the genome FASTA headers?

Ok, maybe the problem is there! My GFF file comes from Transdecoder, but I used the transcriptome as an input. So obviously, the transcriptome and the genome don't have the same headers. Could the problem come from there? What could I use as an alternative then?

Thanks,

Alexandre

nextgenusfs · 2021-01-11T23:42:26Z

So if the transcripts aren't aligned to the genome reference, then the GFF shouldn't be passed to --other_gff. If you have transcripts from Transdecoder that you want to align, you can pass those in FASTA format to --transcript_evidence -- this option takes multiple space-delimited inputs.
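For readers unfamiliar with space-delimited multi-value options, the sketch below shows the general pattern with Python's `argparse`. The parser is hypothetical and is not funannotate's actual argument definition, and the second file name is a placeholder.

```python
import argparse

# Hypothetical parser; not funannotate's actual argument definitions.
parser = argparse.ArgumentParser()
parser.add_argument("--transcript_evidence", nargs="+", default=[])
parser.add_argument("--cpus", type=int, default=1)

# Two FASTA files after one flag, separated by spaces on the command line.
args = parser.parse_args(
    ["--transcript_evidence", "Transcriptome.fasta", "transdecoder.cds.fasta",
     "--cpus", "4"]
)

print(args.transcript_evidence)
```

Each file after the flag is collected into one list until the next option begins, which is the behaviour the comment above describes.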

Maybe it's not obvious -- but the pipeline might work a lot better if you let funannotate train run Trinity/PASA/Transdecoder. That way those tools get run in a way that produces formats funannotate knows.

LemoAlex · 2021-01-25T12:40:29Z

Hi,

Sorry for the long delay. Just to let you know that I ran it as you suggested and I was able to finish the whole pipeline successfully, so thank you!

Best,
Alexandre

nextgenusfs pushed a commit that referenced this issue Jan 12, 2021

predict; parse --other_gff contigs compare to assembly, warn user if do not match and exit #528

2cbf7fe

nextgenusfs closed this as completed Apr 17, 2021
