Improve activity and run_application metadata#2113

Merged: GernotMaier merged 22 commits into main from settings-directories on Apr 15, 2026

@GernotMaier
Contributor

Several important improvements regarding metadata / reproducibility.

Best to look at the simulation model parameter setting changes related to this PR:

Each run of a simtools application now generates an activity_id (UUID7). This ID is also written to the activity:id field in the metadata:

  activity:
    name: plot_tabular_data
    type: software
    id: 019d82c8-a267-7353-bdf1-8c8fb71ebe29
    software:
      name: simtools
      version: 0.29.1.dev142+g3150e8719
    start: '2026-04-12T17:41:21+00:00'
    end: '2026-04-12T17:41:21+00:00'

With simtools-run-application, we execute several simtools applications one after another as part of the setting workflows.

  • Change the directory structure in the input directories of the setting workflows and replace dates with UUID7s.
  • These IDs are workflow IDs and are propagated to the metadata.
  • Add the list of activities to the model parameter metadata file as context:associated_activities, e.g.:
    associated_activities:
    - name: simtools-derive-photon-electron-spectrum
      activity_id: 019d82c8-a266-7ba9-b064-c637329871a6
    - name: simtools-plot-tabular-data
      activity_id: 019d82c8-a267-7353-bdf1-8c8fb71ebe29
    - name: simtools-submit-model-parameter-from-external
      activity_id: 019d82c8-a268-78a1-aead-59360a46c1cd
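
As an illustration of how the per-step activity IDs could end up under context:associated_activities, here is a minimal sketch. The function name `add_associated_activities` and the `cta` → `context` nesting are assumptions made for this example, not the actual simtools implementation.

```python
def add_associated_activities(metadata, steps):
    """Attach (application name, activity ID) pairs to a metadata dictionary.

    Hypothetical helper; the key layout cta -> context is an assumption.
    """
    context = metadata.setdefault("cta", {}).setdefault("context", {})
    context["associated_activities"] = [
        {"name": name, "activity_id": activity_id} for name, activity_id in steps
    ]
    return metadata


# Names and IDs taken from the example in the PR description.
steps = [
    ("simtools-derive-photon-electron-spectrum", "019d82c8-a266-7ba9-b064-c637329871a6"),
    ("simtools-plot-tabular-data", "019d82c8-a267-7353-bdf1-8c8fb71ebe29"),
]
metadata = add_associated_activities({}, steps)
```

Keeping the entries as a list preserves the order in which the workflow steps were executed.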

The activity ID added to the model parameter metadata is the workflow ID described above, e.g.:

  activity:
    name: setting_workflow
    type: software
    id: 019d776b-e24c-741d-bc05-e3f6f7ec77c7
    software:
      name: simtools
      version: 0.29.1.dev142+g3150e8719
    start: '2026-04-12T17:41:20+00:00'
    end: '2026-04-12T17:41:23+00:00'
    runtime_environment: null

Add a runtime_environment field; this entry is copied from the input configuration to the model parameter metadata file.
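
A rough sketch of how such a workflow-level activity entry might be assembled, including the copy of runtime_environment from the input configuration. The helper name `build_workflow_activity` and its arguments are hypothetical, and the software block is left out for brevity.

```python
from datetime import datetime, timezone


def build_workflow_activity(workflow_id, start, end, input_config):
    """Assemble a workflow-level activity entry (hypothetical helper)."""
    return {
        "name": "setting_workflow",
        "type": "software",
        "id": workflow_id,
        "start": start.isoformat(timespec="seconds"),
        "end": end.isoformat(timespec="seconds"),
        # Copied as-is from the input configuration; None if not given.
        "runtime_environment": input_config.get("runtime_environment"),
    }


start = datetime(2026, 4, 12, 17, 41, 20, tzinfo=timezone.utc)
end = datetime(2026, 4, 12, 17, 41, 23, tzinfo=timezone.utc)
activity = build_workflow_activity(
    "019d776b-e24c-741d-bc05-e3f6f7ec77c7", start, end, input_config={}
)
```

With an empty input configuration, runtime_environment stays None, which corresponds to the `null` shown in the example above.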

Additionally, the log file generation has been simplified somewhat.

Technical notes:

  • Use UUID7, as these IDs are sortable. From Python 3.14 on, the standard uuid module can generate version 7 UUIDs. As we allow Python >=3.12, we need to add a dependency (yes, we use the uuid6 package to generate UUID7).
  • Generalized the UUID generation in general.py to allow an easy change in case we decide on a different ID scheme in the future.
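
To illustrate why UUID7 values sort by creation time, here is a standard-library-only sketch that follows the RFC 9562 bit layout (48-bit millisecond timestamp, then version, variant and random bits). This is not the simtools implementation, which relies on the uuid6 package; the function names `get_uuid7` and `uuid7_datetime` are assumptions for this example.

```python
import os
import time
import uuid
from datetime import datetime, timezone


def get_uuid7(timestamp_ms=None):
    """Return a version 7 UUID as a string (illustrative sketch only)."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    # 48-bit Unix timestamp in milliseconds, followed by 80 random bits.
    value = (timestamp_ms & ((1 << 48) - 1)) << 80
    value |= int.from_bytes(os.urandom(10), "big")
    # Overwrite the 4 version bits with 0b0111 and the 2 variant bits with 0b10.
    value &= ~(0xF << 76)
    value |= 0x7 << 76
    value &= ~(0x3 << 62)
    value |= 0x2 << 62
    return str(uuid.UUID(int=value))


def uuid7_datetime(value):
    """Return the creation time encoded in the first 48 bits of a UUID7."""
    timestamp_ms = uuid.UUID(value).int >> 80
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


first = get_uuid7(timestamp_ms=1_000)
second = get_uuid7(timestamp_ms=2_000)
# Activity ID from the plot_tabular_data example in the PR description.
created = uuid7_datetime("019d82c8-a267-7353-bdf1-8c8fb71ebe29")
```

Because the timestamp occupies the leading bits, comparing the string forms of two IDs created in different milliseconds orders them by creation time, and the creation time can be read back from the ID.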

Contributor

Copilot AI left a comment

Pull request overview

This PR improves simtools reproducibility metadata by introducing UUID7-based activity identifiers for single-application runs and multi-step simtools-run-application workflows, and by propagating workflow context/runtime environment into model-parameter metadata.

Changes:

  • Add UUID7 generation/helpers and propagate activity_id through settings, CLI parsing, logging, and metadata collection.
  • Enhance simtools-run-application to track per-step activities and inject workflow-level activity + context:associated_activities into produced model-parameter metadata files.
  • Extend metadata/workflow schemas to include runtime_environment and associated_activities, and add/adjust unit tests accordingly.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/simtools/utils/general.py | Add UUID7 generator wrapper plus path/placeholder helper utilities used by workflows. |
| src/simtools/settings.py | Introduce config.activity_id lifecycle (args-provided or auto-generated). |
| src/simtools/schemas/metadata.metaschema.yml | Extend CTA metadata schema with activity.runtime_environment and context.associated_activities. |
| src/simtools/schemas/application_workflow.metaschema.yml | Add workflow-level activity_id field to the workflow schema. |
| src/simtools/runners/simtools_runner.py | Generate/propagate workflow + per-step activity IDs; update model-parameter metadata with workflow context and associated activities. |
| src/simtools/data_model/workflow_metadata.py | New module to build workflow activity metadata and update model-parameter metadata files. |
| src/simtools/data_model/metadata_collector.py | Use UUID7 for product IDs; allow overriding activity start/end; optionally copy runtime environment and associated activities. |
| src/simtools/configuration/configurator.py | Switch activity ID generation to UUID7 wrapper. |
| src/simtools/configuration/commandline_parser.py | Add --activity_id CLI argument. |
| src/simtools/applications/run_application.py | Align CLI arg naming (config_file) and runner invocation; adjust initialization behavior for orchestrator use. |
| src/simtools/applications/plot_tabular_data.py | Support __SETTING_WORKFLOW__ placeholder replacement for workflow-style paths. |
| src/simtools/applications/db_add_value_from_json_to_db.py | Use UUID7 wrapper for test DB version generation. |
| src/simtools/applications/db_add_file_to_db.py | Use UUID7 wrapper for test DB name suffix generation. |
| src/simtools/application_control.py | Include activity ID in log filename and in file log line prefix. |
| pyproject.toml | Add uuid6 dependency to support UUID7 on Python < 3.14. |
| environment.yml | Add uuid6 to dev environment; adjust micromamba update command. |
| docs/source/api-reference/data_model.md | Document new workflow_metadata module in API reference. |
| docs/changes/2112.feature.md | Add changelog fragment describing UUID7/activity metadata improvements. |
| tests/unit_tests/utils/test_general.py | Add unit tests for UUID7/path extraction and recursive placeholder replacement helpers. |
| tests/unit_tests/test_settings.py | Add tests ensuring activity_id is set from args or generated when missing. |
| tests/unit_tests/test_application_control.py | Update logging test to assert activity ID appears in file logs. |
| tests/unit_tests/runners/test_simtools_runner.py | Update runner tests for new return values, activity propagation, and per-app log file naming. |
| tests/unit_tests/data_model/test_workflow_metadata.py | Add tests for building workflow activity metadata and updating model-parameter metadata files. |
| tests/unit_tests/configuration/test_configurator.py | Adjust configurator test to account for activity_id now being present by default. |
| tests/unit_tests/configuration/test_commandline_parser.py | Add test coverage for --activity_id parsing. |
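
The recursive placeholder replacement mentioned for general.py and plot_tabular_data.py could look roughly like the sketch below. The function name `replace_placeholder`, the example configuration and the replacement value are assumptions for illustration; only the `__SETTING_WORKFLOW__` placeholder itself comes from the PR.

```python
def replace_placeholder(obj, placeholder, value):
    """Recursively replace a placeholder in all strings of a nested structure."""
    if isinstance(obj, str):
        return obj.replace(placeholder, value)
    if isinstance(obj, dict):
        return {
            key: replace_placeholder(item, placeholder, value)
            for key, item in obj.items()
        }
    if isinstance(obj, list):
        return [replace_placeholder(item, placeholder, value) for item in obj]
    # Numbers, booleans, None and other types are returned unchanged.
    return obj


# Hypothetical configuration with workflow-style paths.
config = {
    "plot": {"file_name": "__SETTING_WORKFLOW__/spectrum.ecsv"},
    "tables": ["__SETTING_WORKFLOW__/table.ecsv", 42],
}
resolved = replace_placeholder(config, "__SETTING_WORKFLOW__", "some/workflow/dir")
```

The sketch returns a new structure and leaves the input configuration untouched.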

Comment thread src/simtools/application_control.py
@GernotMaier marked this pull request as ready for review on April 13, 2026, 08:59
Collaborator

@tobiaskleiner left a comment

Thanks @GernotMaier, this is a good refactor for uuids and metadata. I have added a few comments about consistency and some potential checks.

  )

- simtools_runner.run_applications(app_context.args, app_context.logger)
+ simtools_runner.run_applications(app_context.args)
Collaborator

Is this intentional to remove the logger injection here?

Contributor Author

Yes, to keep the use of the logger in simtools_runner consistent with other modules.

anyOf:
- type: string
- type: "null"
- type: number
Collaborator

why number? UUID should not be numeric

Contributor Author

Fixed.

anyOf:
- type: string
- type: "null"
- type: number
Collaborator

also here

Contributor Author

I am not changing this, as this is an older version of the metaschema that was used before we consistently applied UUIDs. I don't want to break anything (although that is unlikely).

metadata_args["instrument"] = workflow_context.get("instrument")

collector = MetadataCollector(metadata_args, clean_meta=False)
return collector.get_top_level_metadata().get("cta", {}).get("activity", {})
Collaborator

Can we assume cta is there?

Contributor Author

You are right that in general we do not hardwire the observatory name. This case is a bit different, as we only have one metadata schema (SimtoolsOutputMetadata), which is for CTA (and has this key included). Using a different metadata schema would require a bit of work, and we can do this when it becomes relevant.

for config in configurations:
    app = config.get("application")
    if not config.get("run_application"):
        logger.info(f"Skipping application: {app}")
Collaborator

Would it make sense to skip the application, but still continue with the workflow?

Contributor Author

But this is exactly what happens: each config is a step in the workflow. What we do here is skip single steps (there is functionality, e.g. on the command line, to say: run steps 1, 3, 5 of the workflow).
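
The behaviour described in this reply can be sketched as follows: a disabled step is skipped and the loop carries on with the next step. The function `run_workflow`, the `run_step` callback and the step names are hypothetical; only the `application` and `run_application` keys are taken from the snippet under review.

```python
def run_workflow(configurations, run_step):
    """Run each enabled step; disabled steps are skipped and the loop continues."""
    executed = []
    for config in configurations:
        app = config.get("application")
        if not config.get("run_application"):
            print(f"Skipping application: {app}")
            continue
        run_step(config)
        executed.append(app)
    return executed


# Hypothetical three-step workflow with the second step disabled.
configurations = [
    {"application": "step-1", "run_application": True},
    {"application": "step-2", "run_application": False},
    {"application": "step-3", "run_application": True},
]
executed = run_workflow(configurations, run_step=lambda config: None)
```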

Name of the associated activity.
anyOf:
- type: string
- type: "null"
Collaborator

activity_name is required, but here you allow "null" as value which makes this a bit pointless in my opinion. Should these fields be strictly required as non-null strings instead?

Contributor Author

No, there is a difference - the field is required here, meaning it is a reminder that either there was no activity (e.g., something is set by hand) or we don't know the activity id.
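
The distinction between a required key and a nullable value can be made concrete with a small standard-library check. This is only a sketch of the idea, not the schema validation used by simtools; the function name is hypothetical and the field names `name` and `activity_id` are taken from the associated_activities example in the PR description.

```python
def check_associated_activity(entry):
    """Check that both keys are present; None is accepted as a value."""
    for key in ("name", "activity_id"):
        # The key itself must exist, even if nothing is known about its value.
        if key not in entry:
            raise KeyError(f"Missing required field: {key}")
        # A present value must be either a string or an explicit None.
        if entry[key] is not None and not isinstance(entry[key], str):
            raise TypeError(f"Field {key} must be a string or None")
    return True


# Explicit None values pass: the fields are present but unknown.
unknown_activity_ok = check_associated_activity({"name": None, "activity_id": None})
```

An entry that omits one of the keys raises a KeyError, whereas an entry with explicit None values is accepted.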

Identifier of the associated activity.
anyOf:
- type: string
- type: "null"
Collaborator

also here

Contributor Author

see above

Comment thread src/simtools/utils/general.py Outdated
return env_values


def uuid():
Collaborator

So this replaces the built-in uuid in Python? Maybe better to name it differently.

Contributor Author

Good point - renamed it to get_uuid()

Comment thread src/simtools/utils/general.py Outdated
if len(subdirs) == 0:
    raise ValueError(f"Could not find subdirectory under '{anchor}'")

return "/".join(subdirs)
Collaborator

Path(*subdirs) is better

Contributor Author

Fixed.

Comment thread src/simtools/utils/general.py Outdated
ValueError
If anchor is not present or no subdirectories are found after the anchor.
"""
path = Path(path).resolve()
Collaborator

Just be careful here: this also resolves symlinks.

Contributor Author

Good point - I actually want to keep the original path and not resolve it. Removed the .resolve().
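
Taking the two review points on this helper together (return `Path(*subdirs)` and do not resolve the path), the path extraction might look like the sketch below. The function name `get_subdirs_after_anchor` and the example path are assumptions; only the error message for the empty case is taken from the snippet under review.

```python
from pathlib import Path


def get_subdirs_after_anchor(path, anchor):
    """Return the part of path below the directory named anchor.

    The path is not resolved, so symlinks are left as given.
    """
    parts = Path(path).parts
    if anchor not in parts:
        raise ValueError(f"Anchor '{anchor}' not found in '{path}'")
    subdirs = parts[parts.index(anchor) + 1 :]
    if len(subdirs) == 0:
        raise ValueError(f"Could not find subdirectory under '{anchor}'")
    return Path(*subdirs)


# Hypothetical input directory layout.
relative = get_subdirs_after_anchor("input/settings/telescope/workflow-id", "settings")
```

Returning a `Path` rather than a string joined with "/" leaves the choice of separator to pathlib.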

@GernotMaier
Contributor Author

@tobiaskleiner - thanks for the careful review! I've implemented / commented on everything, let me know if this is fine.

@tobiaskleiner
Collaborator

Thanks @GernotMaier for the answers and the fixes. Can be merged now.

@GernotMaier merged commit f3a667f into main on Apr 15, 2026
16 checks passed
@GernotMaier deleted the settings-directories branch on April 15, 2026, 08:33