
Add annotation workflow with Braker3 #768

Merged
mvdbeek merged 46 commits into galaxyproject:main from rlibouba:add_braker3_workflow
Dec 8, 2025


@rlibouba
Collaborator

Hello,
I would like to propose this workflow for annotating a genome, based on the GTN tutorial "Genome annotation with Braker3".
Thank you! Have a nice day!
Romane

@github-actions

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 1
Passed 0
Error 1
Failure 0
Skipped 0
Errored Tests
  • ❌ Genome_annotation_with_braker3.ga_0

    Execution Problem:

    • Failed to run workflow, at least one job is in [error] state.
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BUSCO lineage:

        • step_state: scheduled
      • Step 2: Genome sequence masked:

        • step_state: scheduled
      • Step 11: BUSCO on the predicted protein sequences:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is paused

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "b11a35c8096311f08a577c1e5258711c"
              adv {"contig_break": "10", "evalue": "0.001", "limit": "3"}
              busco_mode {"__current_case__": 1, "mode": "tran"}
              chromInfo "/tmp/tmpd6ow920l/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              lineage {"__current_case__": 1, "lineage_dataset": "mucorales_odb10", "lineage_mode": "select_lineage"}
              lineage_conditional {"__current_case__": 0, "cached_db": "v5", "selector": "cached"}
              outputs ["short_summary", "image", "gff", "missing"]
      • Step 3: Alignments from RNA-seq:

        • step_state: scheduled
      • Step 4: Protein sequences:

        • step_state: scheduled
      • Step 5: Fasta Statistics:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is ok

            Command Line:

            • python '/tmp/shed_dir/toolshed.g2.bx.psu.edu/repos/iuc/fasta_stats/0dbb995c7d35/fasta_stats/fasta-stats.py' --fasta '/tmp/tmpd6ow920l/files/a/1/4/dataset_a146fbe2-8e88-4748-a103-fbaf427b6931.dat' --stats_output '/tmp/tmpd6ow920l/job_working_directory/000/4/outputs/dataset_ff264281-2a08-4cf4-93f8-ba97f261687f.dat'

            Exit Code:

            • 0

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "b11a35c8096311f08a577c1e5258711c"
              chromInfo "/tmp/tmpd6ow920l/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              gaps_option false
              genome_size None
      • Step 6: BUSCO on the genome sequences:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is queued

            Command Line:

            • busco --in '/tmp/tmpd6ow920l/files/a/1/4/dataset_a146fbe2-8e88-4748-a103-fbaf427b6931.dat' --mode 'geno' --out busco_galaxy --cpu ${GALAXY_SLOTS:-4} --evalue 0.001 --limit 3 --contig_break 10  --offline --download_path /cvmfs/data.galaxyproject.org/byhand/busco/v5  --lineage_dataset 'mucorales_odb10'  --miniprot  && mkdir BUSCO_summaries && ls -l busco_galaxy/run_*/ && cp busco_galaxy/short_summary.*.txt BUSCO_summaries/ && generate_plot.py -wd BUSCO_summaries -rt specific  && echo "##gff-version 3" > busco_output.gff && cat busco_galaxy/run_*/busco_sequences/*busco_sequences/*.gff >> busco_output.gff 2> /dev/null || true

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "b11a35c8096311f08a577c1e5258711c"
              adv {"contig_break": "10", "evalue": "0.001", "limit": "3"}
              busco_mode {"__current_case__": 0, "mode": "geno", "use_augustus": {"__current_case__": 1, "use_augustus_selector": "miniprot"}}
              chromInfo "/tmp/tmpd6ow920l/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              lineage {"__current_case__": 1, "lineage_dataset": "mucorales_odb10", "lineage_mode": "select_lineage"}
              lineage_conditional {"__current_case__": 0, "cached_db": "v5", "selector": "cached"}
              outputs ["short_summary", "image", "gff", "missing"]
      • Step 7: Braker3:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is error

            Command Line:

            • if [ -z "$GENEMARK_PATH" ] ; then echo "GeneMark is not installed on this Galaxy server." >&2 ; exit 1 ; fi && if [ ! -f "$GENEMARK_PATH/gmes/gmes_petap.pl" ] ; then echo "GeneMark is not installed properly on this Galaxy server." >&2 ; exit 1 ; fi &&   export PATH="$GENEMARK_PATH/../tools/:$PATH" &&  cp -r "$AUGUSTUS_CONFIG_PATH/" augustus_dir/ && export AUGUSTUS_CONFIG_PATH=`pwd`/augustus_dir/ &&  braker.pl --genome '/tmp/tmpd6ow920l/files/a/1/4/dataset_a146fbe2-8e88-4748-a103-fbaf427b6931.dat'   --bam /tmp/tmpd6ow920l/files/7/e/0/dataset_7e0f0928-399f-4896-a80d-a81cc5b3f027.dat  --prot_seq /tmp/tmpd6ow920l/files/a/c/8/dataset_ac8778fd-ee01-41ea-b719-3d29852470af.dat   --gff3  --fungus   --rounds 5          --alternatives-from-evidence=true    --gc_probability 0.001 --downsampling_lambda 2   --threads  ${GALAXY_SLOTS:-2} --useexisting

            Exit Code:

            • 1

            Standard Error:

            • /tmp/tmpd6ow920l/job_working_directory/000/6/tool_script.sh: line 23: /gmes/gmes_petap.pl: No such file or directory
              

            Standard Output:

            • braker.pl version 3.0.8
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "b11a35c8096311f08a577c1e5258711c"
              advanced {"alternatives_from_evidence": true, "eval": null, "eval_pseudo": null, "filterOutShort": false}
              augustus {"AUGUSTUS_ab_initio": false, "crf": false, "keepCrf": false, "rounds": "5"}
              chromInfo "/tmp/tmpd6ow920l/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              dev {"downsampling_lambda": "2", "gc_probability": "0.001", "gm_max_intergenic": null, "min_contig": null, "splice_sites": null}
              evidences {"bam": {"values": [{"id": 2, "src": "hda"}]}, "prot_seq": {"values": [{"id": 3, "src": "hda"}]}}
              genemark {"fungus": true}
              output_format "gff3"
              softmasking true
              species None
      • Step 8: GFFRead:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is paused

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "b11a35c8096311f08a577c1e5258711c"
              chr_replace None
              chromInfo "/tmp/tmpd6ow920l/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              decode_url false
              expose false
              filtering None
              full_gff_attribute_preservation false
              gffs {"__current_case__": 0, "gff_fmt": "none"}
              maxintron None
              merging {"__current_case__": 0, "merge_sel": "none"}
              reference_genome {"__current_case__": 2, "fa_outputs": ["-y pep.fa"], "genome_fasta": {"values": [{"id": 1, "src": "hda"}]}, "ref_filtering": null, "source": "history"}
              region {"__current_case__": 0, "region_filter": "none"}
      • Step 9: JBrowse:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is paused

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "fasta"
              __workflow_invocation_uuid__ "b11a35c8096311f08a577c1e5258711c"
              action {"__current_case__": 0, "action_select": "create"}
              chromInfo "/tmp/tmpd6ow920l/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              gencode "1"
              jbgen {"aboutDescription": "", "defaultLocation": "", "hideGenomeOptions": false, "shareLink": true, "show_menu": true, "show_nav": true, "show_overview": true, "show_tracklist": true, "trackPadding": "20"}
              plugins {"BlastView": true, "ComboTrackSelector": false, "GCContent": false}
              reference_genome {"__current_case__": 1, "genome": {"values": [{"id": 1, "src": "hda"}]}, "genome_type_select": "history"}
              standalone "minimal"
              track_groups [{"__index__": 0, "category": "Braker3 annotation", "data_tracks": [{"__index__": 0, "data_format": {"__current_case__": 2, "annotation": {"values": [{"id": 10, "src": "hda"}]}, "data_format_select": "gene_calls", "index": false, "jb_custom_config": {"option": []}, "jbcolor_scale": {"color_score": {"__current_case__": 0, "color": {"__current_case__": 0, "color_select": "automatic"}, "color_score_select": "none"}}, "jbmenu": {"track_menu": []}, "jbstyle": {"max_height": "600", "style_classname": "feature", "style_description": "note,description", "style_height": "10px", "style_label": "product,name,id"}, "match_part": {"__current_case__": 1, "match_part_select": false}, "override_apollo_drag": "False", "override_apollo_plugins": "False", "track_config": {"__current_case__": 3, "html_options": {"topLevelFeatures": null}, "track_class": "NeatHTMLFeatures/View/Track/NeatFeatures"}, "track_visibility": "default_off"}}]}]
              uglyTestingHack ""
      • Step 10: OMArk:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is paused

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "b11a35c8096311f08a577c1e5258711c"
              chromInfo "/tmp/tmpd6ow920l/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              database "LUCA-v2.0.0.h5"
              dbkey "?"
              input_iso None
              omark_mode false
              outputs "detail_sum"
              r None
              t None
    • Other invocation details
      • error_message

        • Failed to run workflow, at least one job is in [error] state.
      • history_id

        • 85e23c94bcb17d8c
      • history_state

        • error
      • invocation_id

        • 85e23c94bcb17d8c
      • invocation_state

        • scheduled
      • workflow_id

        • 85e23c94bcb17d8c
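In this first run, the Braker3 job exited with code 1 because GeneMark could not be found. The Braker3 command line opens with two shell guards: it fails if `$GENEMARK_PATH` is empty, and it fails if `$GENEMARK_PATH/gmes/gmes_petap.pl` is not a regular file. The Python sketch below mirrors those two tests, for example as a pre-flight check on a test server. The function name `check_genemark` is hypothetical and is not part of the Braker3 Galaxy wrapper; only the two error messages are copied from the logged command.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def check_genemark(env: Mapping[str, str]) -> Optional[str]:
    """Return an error message if GeneMark looks unusable, else None.

    Mirrors the shell tests at the start of the logged Braker3 command:
    [ -z "$GENEMARK_PATH" ] and [ ! -f "$GENEMARK_PATH/gmes/gmes_petap.pl" ].
    """
    genemark_path = env.get("GENEMARK_PATH", "")
    if not genemark_path:
        # Equivalent of the -z test: variable unset or empty.
        return "GeneMark is not installed on this Galaxy server."
    # Equivalent of the -f test: the script must exist as a regular file.
    if not (Path(genemark_path) / "gmes" / "gmes_petap.pl").is_file():
        return "GeneMark is not installed properly on this Galaxy server."
    return None


if __name__ == "__main__":
    problem = check_genemark(os.environ)
    print(problem or "GeneMark found")
```

Taking the environment as a mapping (rather than reading `os.environ` directly) keeps the check easy to exercise against a fake environment.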

@rlibouba rlibouba marked this pull request as draft March 25, 2025 10:32
@github-actions

Test Results (powered by Planemo)

Test Summary

Test State Count
Total 1
Passed 0
Error 1
Failure 0
Skipped 0
Errored Tests
  • ❌ Genome_annotation_with_braker3.ga_0

    Execution Problem:

    • Failed to run workflow, at least one job is in [error] state.
      

    Workflow invocation details

    • Invocation Messages

    • Steps
      • Step 1: BUSCO lineage:

        • step_state: scheduled
      • Step 2: Genome sequence masked:

        • step_state: scheduled
      • Step 3: Alignments from RNA-seq:

        • step_state: scheduled
      • Step 4: Protein sequences:

        • step_state: scheduled
      • Step 5: BUSCO on the genome sequences:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is error

            Command Line:

            • ln -s '/tmp/tmp87j16dob/files/7/6/2/dataset_762691bf-c0ae-4fc1-9e71-cf0076d3667a.dat' input.fa &&  busco --in 'input.fa' --mode 'geno' --out busco_galaxy --cpu ${GALAXY_SLOTS:-4} --evalue 0.001 --limit 3 --contig_break 10 --offline --download_path '/cvmfs/data.galaxyproject.org/byhand/busco/v5'  --lineage_dataset 'mucorales_odb10'  --miniprot  && mkdir BUSCO_summaries && cp busco_galaxy/short_summary.*.txt BUSCO_summaries/ && generate_plot.py -wd BUSCO_summaries -rt specific  && echo "##gff-version 3" > busco_output.gff && (cat busco_galaxy/run_*/busco_sequences/*busco_sequences/*.gff >> busco_output.gff 2> /dev/null || true)

            Exit Code:

            • 127

            Standard Error:

            • /tmp/tmp87j16dob/job_working_directory/000/4/tool_script.sh: line 10: busco: command not found
              

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "8d70ba5e477d11f0bb097c1e524172af"
              adv {"contig_break": "10", "evalue": "0.001", "limit": "3"}
              busco_mode {"__current_case__": 0, "mode": "geno", "use_augustus": {"__current_case__": 1, "use_augustus_selector": "miniprot"}}
              cached_db "v5"
              chromInfo "/tmp/tmp87j16dob/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              lineage {"__current_case__": 1, "lineage_dataset": "mucorales_odb10", "lineage_mode": "select_lineage"}
              outputs ["short_summary", "image", "gff", "missing"]
              test None
      • Step 6: Braker3:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is queued

            Command Line:

            • if [ -z "$GENEMARK_PATH" ] ; then echo "GeneMark is not installed on this Galaxy server." >&2 ; exit 1 ; fi && if [ ! -f "$GENEMARK_PATH/gmes/gmes_petap.pl" ] ; then echo "GeneMark is not installed properly on this Galaxy server." >&2 ; exit 1 ; fi &&   export PATH="$GENEMARK_PATH/../tools/:$PATH" &&  cp -r "$AUGUSTUS_CONFIG_PATH/" augustus_dir/ && export AUGUSTUS_CONFIG_PATH=`pwd`/augustus_dir/ &&  braker.pl --genome '/tmp/tmp87j16dob/files/7/6/2/dataset_762691bf-c0ae-4fc1-9e71-cf0076d3667a.dat'   --bam /tmp/tmp87j16dob/files/1/a/4/dataset_1a4ac51e-fa2b-411b-9f1c-acfe154f5d82.dat  --prot_seq /tmp/tmp87j16dob/files/c/6/1/dataset_c6147020-bcb8-4710-a846-bdcb5a5b5482.dat   --gff3     --rounds 5          --alternatives-from-evidence=true    --gc_probability 0.001 --downsampling_lambda 2   --threads  ${GALAXY_SLOTS:-2} --useexisting

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "8d70ba5e477d11f0bb097c1e524172af"
              advanced {"alternatives_from_evidence": true, "eval": null, "eval_pseudo": null, "filterOutShort": false}
              augustus {"AUGUSTUS_ab_initio": false, "crf": false, "keepCrf": false, "rounds": "5"}
              chromInfo "/tmp/tmp87j16dob/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              dev {"downsampling_lambda": "2", "gc_probability": "0.001", "gm_max_intergenic": null, "min_contig": null, "splice_sites": null}
              evidences {"bam": {"values": [{"id": 2, "src": "hda"}]}, "prot_seq": {"values": [{"id": 3, "src": "hda"}]}}
              genemark {"fungus": false}
              output_format "gff3"
              softmasking true
              species None
      • Step 7: JBrowse:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is new

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "fasta"
              __workflow_invocation_uuid__ "8d70ba5e477d11f0bb097c1e524172af"
              action {"__current_case__": 0, "action_select": "create"}
              chromInfo "/tmp/tmp87j16dob/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              gencode "1"
              jbgen {"aboutDescription": "", "defaultLocation": "", "hideGenomeOptions": false, "shareLink": true, "show_menu": true, "show_nav": true, "show_overview": true, "show_tracklist": true, "trackPadding": "20"}
              plugins {"BlastView": true, "ComboTrackSelector": false, "GCContent": false}
              reference_genome {"__current_case__": 1, "genome": {"values": [{"id": 1, "src": "hda"}]}, "genome_type_select": "history"}
              standalone "minimal"
              track_groups [{"__index__": 0, "category": "Helixer Annotation", "data_tracks": [{"__index__": 0, "data_format": {"__current_case__": 2, "annotation": {"values": [{"id": 9, "src": "hda"}]}, "data_format_select": "gene_calls", "index": false, "jb_custom_config": {"option": []}, "jbcolor_scale": {"color_score": {"__current_case__": 0, "color": {"__current_case__": 0, "color_select": "automatic"}, "color_score_select": "none"}}, "jbmenu": {"track_menu": []}, "jbstyle": {"max_height": "600", "style_classname": "feature", "style_description": "note,description", "style_height": "10px", "style_label": "product,name,id"}, "match_part": {"__current_case__": 1, "match_part_select": false}, "override_apollo_drag": "False", "override_apollo_plugins": "False", "track_config": {"__current_case__": 3, "html_options": {"topLevelFeatures": null}, "track_class": "NeatHTMLFeatures/View/Track/NeatFeatures"}, "track_visibility": "default_off"}}]}]
              uglyTestingHack ""
      • Step 8: GFFRead:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is new

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "8d70ba5e477d11f0bb097c1e524172af"
              chr_replace None
              chromInfo "/tmp/tmp87j16dob/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              decode_url false
              expose false
              filtering None
              full_gff_attribute_preservation false
              gffs {"__current_case__": 0, "gff_fmt": "none"}
              maxintron None
              merging {"__current_case__": 0, "merge_sel": "none"}
              reference_genome {"__current_case__": 2, "fa_outputs": ["-y pep.fa"], "genome_fasta": {"values": [{"id": 1, "src": "hda"}]}, "ref_filtering": null, "source": "history"}
              region {"__current_case__": 0, "region_filter": "none"}
      • Step 9: OMArk:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is new

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "8d70ba5e477d11f0bb097c1e524172af"
              chromInfo "/tmp/tmp87j16dob/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              database "LUCA-v2.0.0.h5"
              dbkey "?"
              input_iso None
              omark_mode false
              outputs "detail_sum"
              r None
              t None
      • Step 10: BUSCO on the predicted protein sequences:

        • step_state: scheduled

        • Jobs
          • Job 1:

            • Job state is new

            Traceback:

            Job Parameters:

            • Job parameter Parameter value
              __input_ext "input"
              __workflow_invocation_uuid__ "8d70ba5e477d11f0bb097c1e524172af"
              adv {"contig_break": "10", "evalue": "0.001", "limit": "3"}
              busco_mode {"__current_case__": 1, "mode": "tran"}
              cached_db "v5"
              chromInfo "/tmp/tmp87j16dob/galaxy-dev/tool-data/shared/ucsc/chrom/?.len"
              dbkey "?"
              lineage {"__current_case__": 1, "lineage_dataset": "mucorales_odb10", "lineage_mode": "select_lineage"}
              outputs ["short_summary", "image", "gff", "missing"]
              test None
    • Other invocation details
      • error_message

        • Failed to run workflow, at least one job is in [error] state.
      • history_id

        • 1af438c5f7138931
      • history_state

        • error
      • invocation_id

        • 1af438c5f7138931
      • invocation_state

        • scheduled
      • workflow_id

        • 1af438c5f7138931
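The second run failed earlier, in the BUSCO step: the job exited with status 127 and stderr reported `busco: command not found`. A POSIX shell uses exit status 127 for a command it cannot find, so the `busco` executable was most likely not on the job's PATH (its dependency was not resolved on the test server). The sketch below shows a hypothetical pre-flight check with `shutil.which` for executables named in the logged command lines; `missing_executables` and `shell_exit_code` are illustrative helpers, not part of Galaxy or Planemo.

```python
import shutil
import subprocess
from typing import Iterable, List


def missing_executables(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be resolved on PATH."""
    return [name for name in names if shutil.which(name) is None]


def shell_exit_code(command: str) -> int:
    """Run a command through /bin/sh and return its exit status."""
    return subprocess.run(["sh", "-c", command], capture_output=True).returncode


if __name__ == "__main__":
    # Executables taken from the BUSCO and Braker3 command lines above.
    tools = ["busco", "generate_plot.py", "braker.pl"]
    print("missing:", missing_executables(tools))
    # An unknown command makes the shell exit with 127,
    # the same status the failed BUSCO job returned.
    print(shell_exit_code("no-such-command-xyz"))
```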

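The BUSCO command line also differs between the two runs. In the first report the trailing `|| true` follows `... && cat ... >> busco_output.gff 2> /dev/null`, while in the second it is grouped as `(cat ... >> busco_output.gff 2> /dev/null || true)`. In POSIX shells `&&` and `||` have equal precedence and group left to right, so the ungrouped form tolerates a failure anywhere in the chain (including `busco` itself), not only in the final `cat`. The Python sketch below demonstrates this with stand-in commands, where `false` plays the role of a failing first step; it illustrates shell semantics only and does not reproduce the Galaxy wrapper.

```python
import subprocess


def exit_status(command: str) -> int:
    """Return the exit status of a command list run by /bin/sh."""
    return subprocess.run(["sh", "-c", command]).returncode


# `false` stands in for a failing first step such as busco.
unparenthesized = "false && echo summary && cat /nonexistent 2> /dev/null || true"
parenthesized = "false && echo summary && (cat /nonexistent 2> /dev/null || true)"

if __name__ == "__main__":
    # Parsed as ((false && echo && cat) || true): the failure is masked.
    print(exit_status(unparenthesized))  # 0
    # Only the cat step may fail, so the first failure propagates.
    print(exit_status(parenthesized))  # 1
```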
@rlibouba rlibouba force-pushed the add_braker3_workflow branch from fc2d489 to d8d6290 July 16, 2025 13:10
@rlibouba rlibouba marked this pull request as ready for review July 17, 2025 08:00
@mvdbeek
Member

mvdbeek commented Jul 30, 2025

You seem to have Helixer changes in here; let me know if you want to remove those.

@rlibouba
Collaborator Author

Hi @mvdbeek, I had a conflict, so I tried a rebase to fix it.

@galaxyproject galaxyproject deleted 3 comments from github-actions bot Sep 16, 2025
@galaxyproject galaxyproject deleted 4 comments from github-actions bot Sep 18, 2025
@rlibouba
Collaborator Author

I wanted to let users choose which database to use.

@mvdbeek mvdbeek requested a review from Delphine-L September 18, 2025 15:15
@rlibouba
Collaborator Author

Hello @Delphine-L, do you think these latest changes are good?

@rlibouba
Collaborator Author

Hello @mvdbeek and @Delphine-L, do you think this PR can be merged? Or are there other changes that need to be made?
Have a nice day!

Contributor

Copilot AI left a comment

Pull request overview

This PR adds a new genome annotation workflow using Braker3 to the IWC repository. The workflow integrates RNA-seq and protein data to annotate eukaryotic genomes, with quality evaluation using BUSCO and OMArk, sequence extraction with GFFRead, and visualization via JBrowse.
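The pipeline described above can be written down as a dependency graph and scheduled with Python's standard `graphlib` module (Python 3.9+). The edges below are inferred from the step names and job parameters in the test reports (for example, GFFRead and JBrowse both take the genome as history dataset 1); the assumption that OMArk and the protein-level BUSCO read the GFFRead protein output is not stated in the PR. This is a sketch of the step ordering, not the `.ga` file's actual connection map.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps whose outputs it consumes (assumed edges).
WORKFLOW = {
    "Fasta Statistics": {"Genome sequence masked"},
    "BUSCO on the genome sequences": {"Genome sequence masked", "BUSCO lineage"},
    "Braker3": {
        "Genome sequence masked",
        "Alignments from RNA-seq",
        "Protein sequences",
    },
    "GFFRead": {"Braker3", "Genome sequence masked"},
    "JBrowse": {"Braker3", "Genome sequence masked"},
    "OMArk": {"GFFRead"},
    "BUSCO on the predicted protein sequences": {"GFFRead", "BUSCO lineage"},
}


def execution_order(graph):
    """Return one valid order in which the steps can be scheduled."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in execution_order(WORKFLOW):
        print(step)
```

Under these edges, GFFRead, JBrowse, OMArk and the protein-level BUSCO all sit downstream of Braker3, which is consistent with those jobs staying paused in the first test report after Braker3 failed.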

Key changes:

  • New workflow for genome annotation with Braker3 that combines GeneMark-ETP and AUGUSTUS predictions
  • Comprehensive quality evaluation with BUSCO (on both genome and predicted proteins) and OMArk
  • Support for both fungal and non-fungal genomes with configurable parameters
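The fungal/non-fungal switch is visible in the two Braker3 command lines in the test reports: with `genemark {"fungus": true}` the wrapper passes `--fungus` to `braker.pl`, and with `"fungus": false` the flag is absent. The hypothetical helper below assembles the same argument list from those parameters. Its name and parameter names are illustrative; the flags and default values are copied from the logged commands, and the GeneMark/AUGUSTUS environment setup that precedes `braker.pl` is left out.

```python
from typing import List


def braker_args(
    genome: str,
    bam: str,
    prot_seq: str,
    fungus: bool,
    rounds: int = 5,
    threads: int = 2,
) -> List[str]:
    """Build a braker.pl argument list mirroring the logged command lines."""
    args = [
        "braker.pl",
        "--genome", genome,
        "--bam", bam,
        "--prot_seq", prot_seq,
        "--gff3",
    ]
    if fungus:
        # Only present when the workflow's fungus parameter is true.
        args.append("--fungus")
    args += [
        "--rounds", str(rounds),
        "--alternatives-from-evidence=true",
        "--gc_probability", "0.001",
        "--downsampling_lambda", "2",
        "--threads", str(threads),
        "--useexisting",
    ]
    return args


if __name__ == "__main__":
    print(" ".join(braker_args("genome.fa", "rnaseq.bam", "proteins.fa", fungus=True)))
```

Returning a list rather than a single string avoids shell quoting issues if the arguments are later passed to `subprocess.run`.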

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Summary per file:
Genome_annotation_with_braker3.ga: Main workflow file defining the Braker3 annotation pipeline with quality evaluation steps
Genome_annotation_with_braker3-tests.yml: Test configuration using Zenodo-hosted test data (mucorales genome)
README.md: Documentation explaining workflow purpose, inputs, steps, and expected outputs
CHANGELOG.md: Version history documenting the initial release
.dockstore.yml: Dockstore configuration with correct file paths and author information

@mvdbeek
Member

mvdbeek commented Dec 5, 2025

@copilot open a new pull request to apply changes based on the comments in this thread

Member

@mvdbeek mvdbeek left a comment

Looks great as always, thank you @rlibouba!

Co-authored-by: Marius van den Beek <m.vandenbeek@gmail.com>
@mvdbeek mvdbeek enabled auto-merge December 5, 2025 16:45
@mvdbeek mvdbeek merged commit 6fc5430 into galaxyproject:main Dec 8, 2025
7 checks passed

5 participants