Improve activity and run_application metadata #2113
Pull request overview
This PR improves simtools reproducibility metadata by introducing UUID7-based activity identifiers for single-application runs and multi-step simtools-run-application workflows, and by propagating workflow context/runtime environment into model-parameter metadata.
Changes:
- Add UUID7 generation/helpers and propagate `activity_id` through settings, CLI parsing, logging, and metadata collection.
- Enhance `simtools-run-application` to track per-step activities and inject workflow-level activity + `context:associated_activities` into produced model-parameter metadata files.
- Extend metadata/workflow schemas to include `runtime_environment` and `associated_activities`, and add/adjust unit tests accordingly.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/simtools/utils/general.py` | Add UUID7 generator wrapper plus path/placeholder helper utilities used by workflows. |
| `src/simtools/settings.py` | Introduce `config.activity_id` lifecycle (args-provided or auto-generated). |
| `src/simtools/schemas/metadata.metaschema.yml` | Extend CTA metadata schema with `activity.runtime_environment` and `context.associated_activities`. |
| `src/simtools/schemas/application_workflow.metaschema.yml` | Add workflow-level `activity_id` field to the workflow schema. |
| `src/simtools/runners/simtools_runner.py` | Generate/propagate workflow + per-step activity IDs; update model-parameter metadata with workflow context and associated activities. |
| `src/simtools/data_model/workflow_metadata.py` | New module to build workflow activity metadata and update model-parameter metadata files. |
| `src/simtools/data_model/metadata_collector.py` | Use UUID7 for product IDs; allow overriding activity start/end; optionally copy runtime environment and associated activities. |
| `src/simtools/configuration/configurator.py` | Switch activity ID generation to UUID7 wrapper. |
| `src/simtools/configuration/commandline_parser.py` | Add `--activity_id` CLI argument. |
| `src/simtools/applications/run_application.py` | Align CLI arg naming (`config_file`) and runner invocation; adjust initialization behavior for orchestrator use. |
| `src/simtools/applications/plot_tabular_data.py` | Support `__SETTING_WORKFLOW__` placeholder replacement for workflow-style paths. |
| `src/simtools/applications/db_add_value_from_json_to_db.py` | Use UUID7 wrapper for test DB version generation. |
| `src/simtools/applications/db_add_file_to_db.py` | Use UUID7 wrapper for test DB name suffix generation. |
| `src/simtools/application_control.py` | Include activity ID in log filename and in file log line prefix. |
| `pyproject.toml` | Add `uuid6` dependency to support UUID7 on Python < 3.14. |
| `environment.yml` | Add `uuid6` to dev environment; adjust micromamba update command. |
| `docs/source/api-reference/data_model.md` | Document new `workflow_metadata` module in API reference. |
| `docs/changes/2112.feature.md` | Add changelog fragment describing UUID7/activity metadata improvements. |
| `tests/unit_tests/utils/test_general.py` | Add unit tests for UUID7/path extraction and recursive placeholder replacement helpers. |
| `tests/unit_tests/test_settings.py` | Add tests ensuring `activity_id` is set from args or generated when missing. |
| `tests/unit_tests/test_application_control.py` | Update logging test to assert activity ID appears in file logs. |
| `tests/unit_tests/runners/test_simtools_runner.py` | Update runner tests for new return values, activity propagation, and per-app log file naming. |
| `tests/unit_tests/data_model/test_workflow_metadata.py` | Add tests for building workflow activity metadata and updating model-parameter metadata files. |
| `tests/unit_tests/configuration/test_configurator.py` | Adjust configurator test to account for `activity_id` now being present by default. |
| `tests/unit_tests/configuration/test_commandline_parser.py` | Add test coverage for `--activity_id` parsing. |
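
For readers unfamiliar with the recursive placeholder replacement mentioned in the summary, here is a minimal sketch of the idea. The function name `replace_placeholder` and the example configuration are hypothetical and not taken from the PR; only the `__SETTING_WORKFLOW__` placeholder string comes from the summary.

```python
def replace_placeholder(value, placeholder, replacement):
    """Return a copy of a nested structure with the placeholder replaced in all strings.

    Hypothetical helper for illustration; not the simtools implementation.
    """
    if isinstance(value, str):
        return value.replace(placeholder, replacement)
    if isinstance(value, dict):
        return {
            key: replace_placeholder(item, placeholder, replacement)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Rebuild the same container type from the processed items
        return type(value)(
            replace_placeholder(item, placeholder, replacement) for item in value
        )
    # Numbers, None, booleans, etc. are returned unchanged
    return value


# Assumed example configuration (not from the PR)
config = {"plot": {"file": "__SETTING_WORKFLOW__/table.ecsv", "columns": ["a", 1]}}
updated = replace_placeholder(config, "__SETTING_WORKFLOW__", "some/workflow/dir")
```

The sketch returns a new structure instead of modifying the input, so the original configuration stays available for logging or metadata.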
tobiaskleiner left a comment
Thanks @GernotMaier, this is a good refactor for uuids and metadata. I have added a few comments about consistency and some potential checks.

    )

    - simtools_runner.run_applications(app_context.args, app_context.logger)
    + simtools_runner.run_applications(app_context.args)

Is this intentional to remove the logger injection here?
yes - to have the usage of logger in simtools_runner consistent with other modules.

    anyOf:
      - type: string
      - type: "null"
      - type: number

Why number? A UUID should not be numeric.

    anyOf:
      - type: string
      - type: "null"
      - type: number

I am not changing this, as this is an older version of the metaschema that was used before we consistently applied UUIDs. I don't want to break something (although unlikely).

    metadata_args["instrument"] = workflow_context.get("instrument")

    collector = MetadataCollector(metadata_args, clean_meta=False)
    return collector.get_top_level_metadata().get("cta", {}).get("activity", {})

Can we assume cta is there?
You are right that in general we do not hardwire the observatory name. This case is a bit different, as we only have one metadata schema (SimtoolsOutputMetadata), which is for CTA (and has this key included). Using a different metadata schema would require a bit of work, and we can do this when it becomes relevant.

    for config in configurations:
        app = config.get("application")
        if not config.get("run_application"):
            logger.info(f"Skipping application: {app}")

Would it make sense to skip the application, but still continue with the workflow?
But this is exactly what happens: each config is a step in the workflow. What we do here is skip single steps (there is functionality, e.g. on the command line, to say: run steps 1, 3, 5 of the workflow).
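
A minimal sketch of the behaviour described in this reply, assuming the loop simply continues with the next step after a skipped one. `run_steps` and the `run_step` callable are hypothetical names used for illustration; only the `application` and `run_application` keys and the log message come from the diff above.

```python
import logging

logger = logging.getLogger(__name__)


def run_steps(configurations, run_step):
    """Run every workflow step whose 'run_application' flag is set.

    Hypothetical sketch: skipped steps are logged and the loop moves on,
    so the remaining steps of the workflow are still executed.
    """
    executed = []
    for config in configurations:
        app = config.get("application")
        if not config.get("run_application"):
            logger.info("Skipping application: %s", app)
            continue  # skip this step only, not the rest of the workflow
        run_step(config)
        executed.append(app)
    return executed


# Assumed example: step 2 is deselected, steps 1 and 3 still run
steps = [
    {"application": "step-1", "run_application": True},
    {"application": "step-2", "run_application": False},
    {"application": "step-3", "run_application": True},
]
seen = []
executed = run_steps(steps, seen.append)
```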

    Name of the associated activity.
    anyOf:
      - type: string
      - type: "null"

activity_name is required, but here you allow "null" as value which makes this a bit pointless in my opinion. Should these fields be strictly required as non-null strings instead?
No, there is a difference: the field is required, so it must always be present. A null value is then an explicit reminder that either there was no activity (e.g., something was set by hand) or we don't know the activity ID.

    Identifier of the associated activity.
    anyOf:
      - type: string
      - type: "null"

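
To illustrate the "required but nullable" distinction discussed in this thread, here is a plain-Python sketch rather than an actual schema validation. The key name `activity_id` and the function `validate_associated_activity` are assumptions made for this example; only `activity_name` is named in the review.

```python
# "activity_id" is an assumed key name; only "activity_name" appears in the review
REQUIRED_KEYS = ("activity_name", "activity_id")


def validate_associated_activity(entry):
    """Check that required keys are present; each value may be a string or None.

    Hypothetical sketch of the semantics, not the simtools schema validator.
    """
    for key in REQUIRED_KEYS:
        if key not in entry:
            # A missing key is an error: the author must state the value explicitly
            raise KeyError(f"Missing required key: {key}")
        value = entry[key]
        if value is not None and not isinstance(value, str):
            raise TypeError(f"{key} must be a string or None, got {type(value).__name__}")
    return True


# Explicit None is accepted: "no activity" or "activity unknown"
ok_unknown = validate_associated_activity({"activity_name": None, "activity_id": None})
```

In this reading, omitting the key is rejected while an explicit null is accepted, which is the difference the reply points out.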

        return env_values


    def uuid():

So this replaces the built-in uuid in Python? Maybe better to name it differently.
Good point - renamed it to get_uuid()

    if len(subdirs) == 0:
        raise ValueError(f"Could not find subdirectory under '{anchor}'")

    return "/".join(subdirs)

Path(*subdirs) is better

    ValueError
        If anchor is not present or no subdirectories are found after the anchor.
    """
    path = Path(path).resolve()

Just be careful here: this also resolves symlinks.
Good point - I actually want to keep the original path and not resolve it. Removed the .resolve().
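
The two suggestions from these threads (building the result from path components and not resolving the path) could look roughly like the sketch below. The function name `subpath_after_anchor` is hypothetical, and `PurePosixPath` is used here only so the example behaves the same on every platform; the actual helper in the PR may differ.

```python
from pathlib import PurePosixPath


def subpath_after_anchor(path, anchor):
    """Return the path components that follow the 'anchor' directory.

    Hypothetical sketch; raises ValueError if the anchor is not present
    or if nothing follows it.
    """
    # No resolve(): symlinks and relative paths are kept exactly as given
    parts = PurePosixPath(path).parts
    if anchor not in parts:
        raise ValueError(f"Anchor '{anchor}' not found in '{path}'")
    subdirs = parts[parts.index(anchor) + 1 :]
    if len(subdirs) == 0:
        raise ValueError(f"Could not find subdirectory under '{anchor}'")
    # Join via the path class instead of "/".join(...)
    return PurePosixPath(*subdirs)


# Assumed example path (not from the PR)
result = subpath_after_anchor("repo/workflows/telescope/parameter", "workflows")
```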
@tobiaskleiner - thanks for the careful review! I've implemented / commented on everything, let me know if this is fine.

Thanks @GernotMaier for the answers and the fixes. Can be merged now.




Several important improvements regarding metadata / reproducibility.
Best to look at the simulation model parameter setting changes related to this PR:
- Every run of a simtools application now generates an `activity_id` (UUID7). This is also written into the `activity:id` field in the metadata.
- For `simtools-run-application`, we execute several simtools applications one after another as part of the setting workflows. The activities of the individual steps are recorded in `context:associated_activities`. The activity ID added to the model parameter metadata is the workflow-level ID.
- Add a `runtime_environment` field to copy this entry from the input configuration to the model parameter metadata file.
- Additionally, simplify the log file generation.
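
As a rough illustration of how the fields named above fit together, the sketch below writes a workflow-level activity ID, the per-step activities, and an optional runtime environment into a metadata dictionary. The function `add_workflow_context`, the entry key `activity_id`, and all example values are assumptions for this example; the nesting under `cta` follows the `activity` and `context` fields mentioned in this PR, but the real metadata layout may contain more.

```python
import copy


def add_workflow_context(metadata, workflow_activity_id, step_activities,
                         runtime_environment=None):
    """Return a copy of 'metadata' with workflow-level activity information added.

    Hypothetical sketch, not the simtools workflow_metadata module.
    'step_activities' is a list of (activity_name, activity_id) pairs.
    """
    updated = copy.deepcopy(metadata)
    cta = updated.setdefault("cta", {})
    activity = cta.setdefault("activity", {})
    activity["id"] = workflow_activity_id  # the workflow ID, not a step ID
    if runtime_environment is not None:
        activity["runtime_environment"] = runtime_environment
    cta.setdefault("context", {})["associated_activities"] = [
        {"activity_name": name, "activity_id": step_id}  # assumed entry keys
        for name, step_id in step_activities
    ]
    return updated


# Placeholder values only; real IDs would be UUID7 strings
original = {"cta": {"activity": {"name": "example"}}}
updated = add_workflow_context(
    original,
    "workflow-id",
    [("step-a", "step-a-id"), ("step-b", "step-b-id")],
    runtime_environment={"image": "example-image"},
)
```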
Technical notes:
- The standard-library `uuid` package can generate UUID version 7 only from Python 3.14 on. As we allow Python >= 3.12, we need to add a dependency (yes, we use `uuid6` to generate UUID7).