GeMoMa error #53

Open
sujianed opened this issue Jan 23, 2024 · 8 comments
Labels
GeMoMa Everything what concerns GeMoMa

Comments

@sujianed

code:
java -jar /share/work/biosoft/GeMoMa/GeMoMa-1.9/GeMoMa-1.9.jar CLI GeMoMaPipeline
threads=200 AnnotationFinalizer.r=NO p=false o=true tblastn=false
t=contig.hardmasked.fa outdir=gemoma
s=own i=ll a=Leptobrachium_leishanense.longest_isoform.gff3 g=ll.fa.gz
s=own i=xt a=Xenopus_tropicalis.longest_isoform.gff3 g=xt.fa.gz

error:Searching for the new GeMoMa updates ...
You are using the latest GeMoMa version.

Parameters of tool "GeMoMa pipeline" (GeMoMaPipeline, version: 1.9):
t - target genome (Target genome file (FASTA), type = fasta,fa,fas,fna,fasta.gz,fa.gz,fas.gz,fna.gz) = contig.hardmasked.fa
The following parameter(s) can be used multiple times:
s (1) - species (data for reference species, range={own, pre-extracted}, default = own) = own
Parameters for selection "own":
i (1) - ID (ID to distinguish the different reference species, OPTIONAL) = ll
a (1) - annotation (Reference annotation file (GFF or GTF), which contains gene models annotated in the reference genome, type = gff,gff3,gtf,gff.gz,gff3.gz,gtf.gz) = Leptobrachium_leishanense.longest_isoform.gff3
g (1) - genome (Reference genome file (FASTA), type = fasta,fa,fas,fna,fasta.gz,fa.gz,fas.gz,fna.gz) = ll.fa.gz
w (1) - weight (the weight can be used to prioritize predictions from different input files; each prediction will get an additional attribute sumWeight that can be used in the filter, valid range = [0.0, 1000.0], default = 1.0, OPTIONAL) = 1.0
ai (1) - annotation info (annotation information of the reference, tab-delimted file containing at least the columns transcriptName, GO and .*defline, type = tabular, OPTIONAL) = null
Parameters for selection "pre-extracted":
i (1) - ID (ID to distinguish the different reference species, OPTIONAL) = null
c (1) - cds parts (The query CDS parts file (protein FASTA), i.e., the CDS parts that have been searched in the target genome using for instance BLAST or mmseqs, type = fasta,fa,fas,fna) = null
a (1) - assignment (The assignment file, which combines CDS parts to proteins, type = tabular, OPTIONAL) = null
w (1) - weight (the weight can be used to prioritize predictions from different input files; each prediction will get an additional attribute sumWeight that can be used in the filter, valid range = [0.0, 1000.0], default = 1.0, OPTIONAL) = 1.0
ai (1) - annotation info (annotation information of the reference, tab-delimited file containing at least the columns transcriptName, GO and .*defline, type = tabular, OPTIONAL) = null
s (2) - species (data for reference species, range={own, pre-extracted}, default = own) = own
Parameters for selection "own":
i (2) - ID (ID to distinguish the different reference species, OPTIONAL) = mm
a (2) - annotation (Reference annotation file (GFF or GTF), which contains gene models annotated in the reference genome, type = gff,gff3,gtf,gff.gz,gff3.gz,gtf.gz) = Mus_musculus.longest_isoform.gff3
g (2) - genome (Reference genome file (FASTA), type = fasta,fa,fas,fna,fasta.gz,fa.gz,fas.gz,fna.gz) = mm.fa.gz
w (2) - weight (the weight can be used to prioritize predictions from different input files; each prediction will get an additional attribute sumWeight that can be used in the filter, valid range = [0.0, 1000.0], default = 1.0, OPTIONAL) = 1.0
ai (2) - annotation info (annotation information of the reference, tab-delimted file containing at least the columns transcriptName, GO and .*defline, type = tabular, OPTIONAL) = null
Parameters for selection "pre-extracted":
i (2) - ID (ID to distinguish the different reference species, OPTIONAL) = null
c (2) - cds parts (The query CDS parts file (protein FASTA), i.e., the CDS parts that have been searched in the target genome using for instance BLAST or mmseqs, type = fasta,fa,fas,fna) = null
a (2) - assignment (The assignment file, which combines CDS parts to proteins, type = tabular, OPTIONAL) = null
w (2) - weight (the weight can be used to prioritize predictions from different input files; each prediction will get an additional attri
...skipping...
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
at java.base/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
at projects.gemoma.GeMoMaPipeline$1.run(GeMoMaPipeline.java:609)
at projects.gemoma.GeMoMaPipeline$FlaggedRunnable.run(GeMoMaPipeline.java:1409)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
GeMoMa for species 0 (ll) split=114 throws an Exception
GeMoMa for species 1 (mm) split=25 throws an Exception
GeMoMa for species 0 (ll) split=108 throws an Exception
GeMoMa for species 1 (mm) split=3 throws an Exception
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133)
at java.base/java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1454)
at projects.gemoma.GeMoMaPipeline$1.run(GeMoMaPipeline.java:609)
at projects.gemoma.GeMoMaPipeline$FlaggedRunnable.run(GeMoMaPipeline.java:1409)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
GeMoMa for species 0 (ll) split=196 throws an Exception
GeMoMa for species 1 (mm) split=31 throws an Exception
GeMoMa for species 1 (mm) split=10 throws an Exception
GeMoMa for species 1 (mm) split=90 throws an Exception
GeMoMa for species 1 (mm) split=99 throws an Exception
GeMoMa for species 0 (ll) split=10 throws an Exception
GeMoMa for species 0 (ll) split=85 throws an Exception
GeMoMa for species 1 (mm) split=57 throws an Exception
GeMoMa for species 0 (ll) split=113 throws an Exception
GeMoMa for species 0 (ll) split=153 throws an Exception
Error, cannot open file gemoma/final_annotation.gff at /share/work/biosoft/EVidenceModeler/latest/EvmUtils/misc/GeMoMa_gff_to_gff3.pl line 23.

@JensKeilwagen
Contributor

Hi sujianed,

thanks a lot for your interest in GeMoMa. I would be happy to help you.
Unfortunately, it seems that you have cut too much information. In addition, the error log and the standard output seem to be mixed. Could you please send more information, split into standard output and error log?
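One way to keep the two streams apart is plain shell redirection. The following is only a minimal sketch, assuming a POSIX shell; the helper name `run_split_logs` and the log file names are hypothetical and not part of GeMoMa:

```shell
# Run any command with standard output and standard error written to
# separate files. The helper name and log file names are placeholders.
run_split_logs() {
    out_log=$1
    err_log=$2
    shift 2
    "$@" >"$out_log" 2>"$err_log"
}

# Hypothetical usage with the command line from the first post:
# run_split_logs gemoma.stdout.log gemoma.stderr.log \
#     java -jar /share/work/biosoft/GeMoMa/GeMoMa-1.9/GeMoMa-1.9.jar CLI GeMoMaPipeline \
#     threads=200 AnnotationFinalizer.r=NO p=false o=true tblastn=false \
#     t=contig.hardmasked.fa outdir=gemoma \
#     s=own i=ll a=Leptobrachium_leishanense.longest_isoform.gff3 g=ll.fa.gz \
#     s=own i=xt a=Xenopus_tropicalis.longest_isoform.gff3 g=xt.fa.gz
```

Both files can then be attached to the issue in full, without a pager truncating them.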

best regards, Jens

@JensKeilwagen JensKeilwagen added the GeMoMa Everything what concerns GeMoMa label Jan 23, 2024
@sujianed
Author

I can't seem to find the error information. When I use the plants Populus trichocarpa, Oryza sativa Japonica Group, Arabidopsis thaliana, Glycine max, and Vitis vinifera, it's OK. Note: all data were downloaded from Ensembl.
java -jar /share/work/biosoft/GeMoMa/GeMoMa-1.9/GeMoMa-1.9.jar CLI GeMoMaPipeline
threads=10 AnnotationFinalizer.r=NO p=false o=true tblastn=false
t=contig.hardmasked.fa outdir=gemoma
s=own i=ath a=Arabidopsis_thaliana.longest_isoform.gff3 g=$database/homo_protein/ath.fa.gz
s=own i=gma a=Glycine_max.longest_isoform.gff3 g=$database/homo_protein/gma.fa.gz
s=own i=vin a=Vitis_vinifera.longest_isoform.gff3 g=$database/homo_protein/vin.fa.gz
s=own i=ptr a=Populus_trichocarpa.longest_isoform.gff3 g=$database/homo_protein/ptr.fa.gz
s=own i=osa a=Oryza_sativa.longest_isoform.gff3 g=$database/homo_protein/osa.fa.gz

But when I use the animals Danio_rerio, Homo_sapiens, Leptobrachium_leishanense, Mus_musculus, and Xenopus_tropicalis, it's not OK:
java -jar /share/work/biosoft/GeMoMa/GeMoMa-1.9/GeMoMa-1.9.jar CLI GeMoMaPipeline
threads=200 AnnotationFinalizer.r=NO p=false o=true tblastn=false
t=contig.hardmasked.fa outdir=gemoma
s=own i=dr a=Danio_rerio.longest_isoform.gff3 g=dr.fa.gz
s=own i=hs a=Homo_sapiens.longest_isoform.gff3 g=hs.fa.gz
s=own i=ll a=Leptobrachium_leishanense.longest_isoform.gff3 g=ll.fa.gz
s=own i=mm a=Mus_musculus.longest_isoform.gff3 g=mm.fa.gz
s=own i=xt a=Xenopus_tropicalis.longest_isoform.gff3 g=xt.fa.gz

Oddly enough, when I use "java -jar /share/work/biosoft/GeMoMa/GeMoMa-1.9/GeMoMa-1.9.jar CLI GeMoMaPipeline
threads=200 AnnotationFinalizer.r=NO p=false o=true tblastn=false
t=contig.hardmasked.fa outdir=gemoma
s=own i=ll a=Leptobrachium_leishanense.longest_isoform.gff3 g=ll.fa.gz" it's OK

"java -jar /share/work/biosoft/GeMoMa/GeMoMa-1.9/GeMoMa-1.9.jar CLI GeMoMaPipeline
threads=200 AnnotationFinalizer.r=NO p=false o=true tblastn=false
t=contig.hardmasked.fa outdir=gemoma
s=own i=xt a=Xenopus_tropicalis.longest.longest_isoform.gff3 g=xt.fa.gz" it's OK

@JensKeilwagen
Contributor

JensKeilwagen commented Jan 24, 2024

Hi sujianed,

I'm sorry that you still have problems and it is hard for me to follow.

However, I realized

  1. That you could run GeMoMa in principle. Hence, it should be no problem with installing GeMoMa, which is great.
  2. That you could run GeMoMa on a part of the complete input. You wrote that it finished successfully with input Leptobrachium_leishanense.longest_isoform.gff3 and also with input Xenopus_tropicalis.longest.longest_isoform.gff3.
  3. That you reported a problem with input "ll" and "mm" (cf. first post, interestingly the command line (ll, xt) and the output (ll, mm) are not matching). Hence, I would assume there is a problem with input "mm".

I would recommend running GeMoMa independently for each input to see whether one particular input causes the problem.
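Such per-input runs could be scripted roughly as follows. This is a sketch under assumptions: the IDs and file names are copied from the commands in this thread, while the per-reference `outdir` names, the log file names, `threads=10`, the `DRY_RUN` switch, and the function name are hypothetical choices. It also assumes that GeMoMaPipeline returns a non-zero exit status on failure, which is not confirmed here.

```shell
# Reference IDs and annotation prefixes as used in the commands of this thread.
REFERENCES="dr:Danio_rerio hs:Homo_sapiens ll:Leptobrachium_leishanense mm:Mus_musculus xt:Xenopus_tropicalis"
GEMOMA_JAR=/share/work/biosoft/GeMoMa/GeMoMa-1.9/GeMoMa-1.9.jar

# Run the pipeline once per reference, each with its own output directory.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
run_each_reference() {
    for pair in $REFERENCES; do
        id=${pair%%:*}        # part before the colon, e.g. "mm"
        species=${pair#*:}    # part after the colon, e.g. "Mus_musculus"
        set -- java -jar "$GEMOMA_JAR" CLI GeMoMaPipeline \
            threads=10 AnnotationFinalizer.r=NO p=false o=true tblastn=false \
            t=contig.hardmasked.fa "outdir=gemoma_${id}" \
            s=own "i=${id}" "a=${species}.longest_isoform.gff3" "g=${id}.fa.gz"
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$*"
        else
            # Keep standard output and error log apart for each reference.
            "$@" >"gemoma_${id}.stdout.log" 2>"gemoma_${id}.stderr.log" \
                || echo "GeMoMaPipeline failed for reference ${id}" >&2
        fi
    done
}

run_each_reference
```

Setting `DRY_RUN=0` executes the commands instead of printing them; a reference that fails on its own is then the first candidate to inspect.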

If you would send more information from the log, we could probably track the error more easily.
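As a quick first look at a long log, the "throws an Exception" lines shown in the first post can be tallied per reference ID. A minimal sketch, assuming the log lines have exactly the format seen above; the function name and the log file name in the usage comment are hypothetical:

```shell
# Count "throws an Exception" lines per reference ID in a GeMoMa log file.
# Prints one line per ID: the ID, a tab, and the number of failed splits.
count_failed_splits() {
    sed -n 's/^GeMoMa for species [0-9][0-9]* (\([^)]*\)) split=[0-9][0-9]* throws an Exception.*$/\1/p' "$1" \
        | sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

# Hypothetical usage:
# count_failed_splits gemoma.stderr.log
```

The counts only show where exceptions were reported; the first stack trace in the complete log is still needed to find the actual cause.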

best regards, Jens

@sujianed
Author

sujianed commented Feb 2, 2024

@sujianed
Author

sujianed commented Feb 2, 2024

##################################

mkdir 08.GeMoMa
cd 08.GeMoMa

# Proteins of homologous species, GeMoMa alignment

ln -s /work/annotion_edta/edta_anno/09.RepeatFinal/contig.hardmasked.fa contig.hardmasked.fa

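# Danio rerio (zebrafish)
# Homo sapiens (human)
# Lithobates catesbeianus (American bullfrog): not available on NCBI, only Aquarana catesbeiana
# Mus musculus (house mouse)
# Nanorana parkeri (High Himalaya frog)
# Xenopus laevis (African clawed frog)
# Xenopus tropicalis (tropical clawed frog)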
ln -s /work/data/homo_protein/ensemble/v102/Danio_rerio.GRCz11.102.chr.gff3.gz dr.gff3.gz
ln -s /work/data/homo_protein/ensemble/v102/Danio_rerio.GRCz11.dna.toplevel.fa.gz dr.fa.gz
ln -s /work/data/homo_protein/ensemble/v102/Homo_sapiens.GRCh38.102.chr.gff3.gz hs.gff3.gz
ln -s /work/data/homo_protein/ensemble/v102/Homo_sapiens.GRCh38.dna.toplevel.fa.gz hs.fa.gz
ln -s /work/data/homo_protein/ensemble/v102/Leptobrachium_leishanense.ASM966780v1.102.chr.gff3.gz ll.gff3.gz
ln -s /work/data/homo_protein/ensemble/v102/Leptobrachium_leishanense.ASM966780v1.dna.toplevel.fa.gz ll.fa.gz
ln -s /work/data/homo_protein/ensemble/v102/Mus_musculus.GRCm38.102.chr.gff3.gz mm.gff3.gz
ln -s /work/data/homo_protein/ensemble/v102/Mus_musculus.GRCm38.dna.toplevel.fa.gz mm.fa.gz
ln -s /work/data/homo_protein/ensemble/v102/Xenopus_tropicalis.Xenopus_tropicalis_v9.1.102.chr.gff3.gz xt.gff3.gz
ln -s /work/data/homo_protein/ensemble/v102/Xenopus_tropicalis.Xenopus_tropicalis_v9.1.dna.toplevel.fa.gz xt.fa.gz

agat_sp_filter_feature_by_attribute_value.pl --gff dr.gff3.gz --attribute biotype --value protein_coding -t '!' -o Danio_rerio.protein_coding.gff3
agat_sp_keep_longest_isoform.pl --gff Danio_rerio.protein_coding.gff3 -o Danio_rerio.longest_isoform.gff3

agat_sp_filter_feature_by_attribute_value.pl --gff hs.gff3.gz --attribute biotype --value protein_coding -t '!' -o Homo_sapiens.protein_coding.gff3
agat_sp_keep_longest_isoform.pl --gff Homo_sapiens.protein_coding.gff3 -o Homo_sapiens.longest_isoform.gff3

agat_sp_filter_feature_by_attribute_value.pl --gff ll.gff3.gz --attribute biotype --value protein_coding -t '!' -o Leptobrachium_leishanense.protein_coding.gff3
agat_sp_keep_longest_isoform.pl --gff Leptobrachium_leishanense.protein_coding.gff3 -o Leptobrachium_leishanense.longest_isoform.gff3

agat_sp_filter_feature_by_attribute_value.pl --gff mm.gff3.gz --attribute biotype --value protein_coding -t '!' -o Mus_musculus.protein_coding.gff3
agat_sp_keep_longest_isoform.pl --gff Mus_musculus.protein_coding.gff3 -o Mus_musculus.longest_isoform.gff3

agat_sp_filter_feature_by_attribute_value.pl --gff xt.gff3.gz --attribute biotype --value protein_coding -t '!' -o Xenopus_tropicalis.protein_coding.gff3
agat_sp_keep_longest_isoform.pl --gff Xenopus_tropicalis.protein_coding.gff3 -o Xenopus_tropicalis.longest_isoform.gff3

# GeMoMa homology-based gene prediction
java -jar /share/work/biosoft/GeMoMa/GeMoMa-1.9/GeMoMa-1.9.jar CLI GeMoMaPipeline \
threads=200 AnnotationFinalizer.r=NO p=false o=true tblastn=false \
t=contig.hardmasked.fa outdir=gemoma \
s=own i=dr a=Danio_rerio.longest_isoform.gff3 g=dr.fa.gz \
s=own i=hs a=Homo_sapiens.longest_isoform.gff3 g=hs.fa.gz \
s=own i=ll a=Leptobrachium_leishanense.longest_isoform.gff3 g=ll.fa.gz \
s=own i=mm a=Mus_musculus.longest_isoform.gff3 g=mm.fa.gz \
s=own i=xt a=Xenopus_tropicalis.longest_isoform.gff3 g=xt.fa.gz

# Convert the data
GeMoMa_gff_to_gff3.pl gemoma/final_annotation.gff > gemoma.evm.format.gff3
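The conversion step runs even when the pipeline has not written `gemoma/final_annotation.gff`, which matches the "cannot open file" message at the end of the log in the first post. A possible guard is sketched below; the function name is hypothetical, and it assumes `GeMoMa_gff_to_gff3.pl` is on the `PATH` as in the script:

```shell
# Convert the GeMoMa output only if the final annotation exists and is
# non-empty; otherwise report the missing file and return a non-zero status.
convert_if_present() {
    gff=$1
    out=$2
    if [ -s "$gff" ]; then
        GeMoMa_gff_to_gff3.pl "$gff" >"$out"
    else
        echo "Missing or empty $gff; check the GeMoMaPipeline log before converting" >&2
        return 1
    fi
}

# Hypothetical usage:
# convert_if_present gemoma/final_annotation.gff gemoma.evm.format.gff3
```

This way the Perl error no longer hides the earlier pipeline failure.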

@sujianed
Author

sujianed commented Feb 2, 2024

@JensKeilwagen
Contributor

Hi sujianed,

Thanks a lot for the information. Unfortunately, I'm not sure whether you still have the problem or whether you have solved it.

I did not see the content of the output file:

Uploading 08.GeMoMa.sh.o.txt…
For me, this appears as a link to the issue.

best regards, Jens

@JensKeilwagen
Contributor

Hi sujianed,
any news from your side?
Could I close the issue?
best regards, Jens
