Feature: Configuration - support wildcards and regex for array definitions #202

@gupichon

Description, motivation and use case

The current configuration requires explicitly listing all element IDs in array or device definitions. This becomes fragile and hard to maintain when new elements are added or naming conventions evolve.

Two complementary improvements are proposed:

  1. Pattern-based selection inside elements lists (wildcards, regular expressions and exclusions).
  2. Python-based configuration macros, allowing scripted generation of configuration blocks directly from YAML.

This issue focuses on extending the configuration layer to support:

  • Wildcards and regular expressions inside elements.
  • A simple negation syntax for exclusions.
  • A scripting mechanism (elements_code) allowing dynamic generation of configuration entries using embedded Python.

The goal is to improve scalability, maintainability and expressiveness of the configuration layer while preserving full backward compatibility.


Part 1 — Pattern-based element selection

Proposed solution

Keep the existing elements key unchanged and make the element string itself expressive.

Supported syntax:

Syntax        Meaning
pattern       inclusion using exact match or wildcard
~pattern      exclusion using exact match or wildcard
re:<regex>    inclusion using regular expression
~re:<regex>   exclusion using regular expression

Examples:

elements:
  - BPM_C*-*
  - ~BPM_C04-*
  - re:^BPM_C1[0-9]-[0-9]{2}$
  - ~re:^BPM_C10-.*

Interpretation

Given the following example:

elements:
  - BPM_C*-*
  - ~BPM_C04-*
  - re:^BPM_C1[0-9]-[0-9]{2}$
  - ~re:^BPM_C10-.*

The evaluation proceeds as follows:

  1. Include all BPM_C*-*
  2. Remove those from cell C04
  3. Add elements matching the regex
  4. Remove elements matching the exclusion regex
  5. Sort the resulting list according to the reference ordering defined in the accelerator devices section

This ensures that array definitions remain stable and consistent with the canonical accelerator device ordering.
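
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pyAML implementation: the function name `resolve_elements`, the short `reference` list, and the choice of `re.search` (rather than a full match) for regex rules are all assumptions made for the example.

```python
import re
from fnmatch import fnmatchcase

def resolve_elements(rules, reference):
    """Apply include/exclude rules in order, then return matches in reference order."""
    selected = set()
    for rule in rules:
        negate = rule.startswith("~")
        body = rule[1:] if negate else rule
        if body.startswith("re:"):
            regex = re.compile(body[3:])
            matched = {name for name in reference if regex.search(name)}
        else:
            # A glob without wildcards behaves as an exact match here.
            matched = {name for name in reference if fnmatchcase(name, body)}
        if negate:
            selected -= matched
        else:
            selected |= matched
    # Filtering the reference list both sorts and de-duplicates the result.
    return [name for name in reference if name in selected]

# Hypothetical reference ordering standing in for the accelerator devices section.
reference = ["BPM_C03-01", "BPM_C04-01", "BPM_C04-02",
             "BPM_C10-01", "BPM_C11-01", "CORR_C03-01"]
rules = ["BPM_C*-*", "~BPM_C04-*", "re:^BPM_C1[0-9]-[0-9]{2}$", "~re:^BPM_C10-.*"]
result = resolve_elements(rules, reference)
print(result)  # ['BPM_C03-01', 'BPM_C11-01']
```

Keeping the selection as a set and filtering the reference list at the end means the output order never depends on the order in which rules happened to match.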


Resolution rules

  • elements remains a list of strings.
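  • Each entry is matched against the device list independently of the other entries; the per-entry results are then combined in order, as described under "Resolution semantics".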

Rule grammar:

rule := ["~"] ("re:" <regex> | <glob>)
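
As an illustration of this grammar, a single entry could be split into its parts as follows. The `Rule` dataclass and `parse_rule` function are hypothetical names introduced for this sketch; rejecting an empty pattern is also an assumption, since the issue does not specify it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    negated: bool  # True when the entry starts with "~"
    kind: str      # "regex" or "glob"
    pattern: str   # text left after stripping "~" and "re:"

def parse_rule(entry: str) -> Rule:
    """Parse one `elements` entry according to: ["~"] ("re:" <regex> | <glob>)."""
    negated = entry.startswith("~")
    body = entry[1:] if negated else entry
    if body.startswith("re:"):
        kind, pattern = "regex", body[3:]
    else:
        kind, pattern = "glob", body
    if not pattern:
        # Assumed behavior: an entry such as "~" or "re:" is a configuration error.
        raise ValueError(f"empty pattern in rule {entry!r}")
    return Rule(negated, kind, pattern)

print(parse_rule("~re:^BPM_C10-.*"))
print(parse_rule("BPM_C*-*"))
```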

Resolution semantics:

  • Rules are evaluated sequentially.
  • Positive rules add matching elements.
  • Negative rules remove matching elements.

Matching operations always use the reference ordering defined in the devices section of the accelerator configuration.

The resulting list:

  • preserves the reference ordering of matched elements
  • removes duplicates deterministically

Example:

elements:
  - BPM_C*
  - ~BPM_C04*

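Because rules are applied in order, a positive rule placed after an exclusion can selectively reinsert elements that the exclusion removed, and the result still follows the accelerator reference ordering. The example above shows only the exclusion step.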


Matching rules

Pattern interpretation:

  • re: prefix → regular expression
  • string containing * or ? → wildcard (glob)
  • otherwise → exact element ID
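
The three-way interpretation above could be turned into a predicate along these lines. This is a sketch under assumptions: `make_matcher` is a hypothetical helper, it expects any leading `~` to have been stripped already, and it uses `re.search` and case-sensitive `fnmatchcase`, neither of which the issue prescribes.

```python
import re
from fnmatch import fnmatchcase
from typing import Callable

def make_matcher(pattern: str) -> Callable[[str], bool]:
    """Return a predicate for one pattern (negation prefix already removed)."""
    if pattern.startswith("re:"):
        regex = re.compile(pattern[3:])
        return lambda name: regex.search(name) is not None
    if "*" in pattern or "?" in pattern:
        return lambda name: fnmatchcase(name, pattern)
    # No prefix and no wildcard: treat the string as an exact element ID.
    return lambda name: name == pattern

is_c1x = make_matcher("re:^BPM_C1[0-9]-[0-9]{2}$")
print(is_c1x("BPM_C12-03"), is_c1x("BPM_C04-01"))
```

Using plain equality for the last branch avoids glob metacharacters such as `[` being interpreted in what the user meant as a literal ID.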

Resolution must:

  • Use the device order defined in the accelerator devices section as the canonical ordering.
  • Be deterministic.
  • Clearly define behavior for 0 matches (error or empty contribution — must be specified and consistent).
  • Remove duplicates deterministically.

Full backward compatibility shall be preserved.


Part 2 — Python-based configuration macros

Motivation

While wildcards and regex improve selection, some configurations require:

  • Nested loops
  • Multiple parameterized ranges
  • Systematic naming patterns
  • Complex device definitions
  • Computed attribute values

For such cases, pattern matching is insufficient. We therefore propose a controlled macro mechanism using embedded Python code.

This was discussed as an additional proposal.


Proposed solution: elements_code

Introduce a new optional key:

elements_code: |
    <python code>

The embedded Python code:

  • Must return either:

    • A dict (single configuration entry), or
    • A list[dict] (multiple entries).
  • If a list is returned, it is expanded into the surrounding list.

  • The macro block is replaced by the returned structure before normal parsing continues.

This mechanism follows the same philosophy as existing file-expansion logic: detection → execution → replacement → continue parsing.
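
One detail worth noting: a macro that ends with `return` is only valid Python inside a function body. A possible way to honor that, shown here as a sketch rather than the intended implementation, is to wrap the code in a generated function before executing it. The names `run_macro` and `__macro__` are hypothetical, and this version does no sandboxing at all.

```python
import textwrap

def run_macro(code: str):
    """Wrap the macro text in a function so that `return` is legal, then call it."""
    source = "def __macro__():\n" + textwrap.indent(code, "    ")
    namespace: dict = {}
    exec(compile(source, "<elements_code>", "exec"), namespace)
    return namespace["__macro__"]()

# Macro text as it would arrive from a YAML block scalar.
code = """\
out = []
for cell in range(1, 3):
    out.append({"name": f"BPM_C{cell:02d}-01"})
return out
"""
entries = run_macro(code)
print(entries)  # [{'name': 'BPM_C01-01'}, {'name': 'BPM_C02-01'}]
```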


Example — BPM generation

devices:
  - elements_code: |
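      out: list[dict] = []
      # Half-open (start, stop) cell ranges: cells 4..32 first, then cells 1..3
      ranges = [(4, 33), (1, 4)]
      for start, stop in ranges:
          for cell in range(start, stop):
              for elem in range(1, 11):  # elements 01..10 in each cell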
                  bpm = {
                      "type": "pyaml.bpm.bpm",
                      "name": f"BPM_C{cell:02d}-{elem:02d}",
                      "model": {
                          "type": "pyaml.bpm.bpm_simple_model",
                          "x_pos": {
                              "type": "tango.pyaml.attribute_read_only",
                              "attribute": f"srdiag/bpm/c{cell:02d}-{elem:02d}/SA_HPosition",
                              "unit": "m",
                          },
                          "y_pos": {
                              "type": "tango.pyaml.attribute_read_only",
                              "attribute": f"srdiag/bpm/c{cell:02d}-{elem:02d}/SA_VPosition",
                              "unit": "m",
                          },
                      },
                  }
                  out.append(bpm)
      return out

The resulting list is injected into the configuration tree and parsed normally.


Execution semantics

The macro mechanism must:

  • Execute before object construction.

  • Run in a controlled execution environment.

  • Provide:

    • A minimal safe namespace.
    • Optional helper utilities (e.g. range, math).
  • Enforce that the return type is either dict or list[dict].

  • Raise a clear configuration error otherwise.
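
A rough sketch of the namespace and return-type requirements follows. Everything named here is an assumption for illustration: `ConfigurationError`, `SAFE_BUILTINS`, and `run_macro_checked` do not come from pyAML, and normalizing a single dict to a one-element list is a design choice of the sketch. Replacing `__builtins__` limits what the macro can reach by accident, but it is not a security sandbox.

```python
import math
import textwrap

class ConfigurationError(Exception):
    """Hypothetical error type raised for invalid macro results."""

# Assumed whitelist; the actual set of helpers would need to be agreed on.
SAFE_BUILTINS = {"range": range, "len": len, "enumerate": enumerate, "zip": zip,
                 "list": list, "dict": dict, "str": str, "int": int, "float": float}

def run_macro_checked(code: str) -> list:
    """Run a macro with a reduced namespace and validate what it returns."""
    source = "def __macro__():\n" + textwrap.indent(code, "    ")
    namespace = {"__builtins__": SAFE_BUILTINS, "math": math}
    exec(compile(source, "<elements_code>", "exec"), namespace)
    result = namespace["__macro__"]()
    if isinstance(result, dict):
        return [result]  # single entry, normalized to a list
    if isinstance(result, list) and all(isinstance(item, dict) for item in result):
        return result
    raise ConfigurationError(
        f"elements_code must return dict or list[dict], got {type(result).__name__}"
    )

print(run_macro_checked('return {"name": "A"}'))  # [{'name': 'A'}]
```

With this namespace, a macro that calls `open(...)` fails with `NameError`, and an `import` statement fails because `__import__` is not exposed.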

The parsing process becomes:

  1. Load YAML.
  2. Detect elements_code.
  3. Execute code.
  4. Replace macro block with returned structure.
  5. Continue parsing recursively.
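
Steps 2 to 5 can be illustrated on an already-loaded YAML tree (plain dicts and lists). This sketch assumes macros appear only as list items of the form `{elements_code: ...}`, as in the devices example; `expand` is a hypothetical name, and the executor is passed in as a callable so the traversal can be shown with a stub.

```python
def expand(node, run_macro):
    """Recursively replace `elements_code` list items with what the macro returns."""
    if isinstance(node, dict):
        return {key: expand(value, run_macro) for key, value in node.items()}
    if isinstance(node, list):
        expanded = []
        for item in node:
            if isinstance(item, dict) and set(item) == {"elements_code"}:
                result = run_macro(item["elements_code"])
                generated = result if isinstance(result, list) else [result]
                # Splice generated entries in place, preserving their order,
                # and keep expanding in case they contain further macros.
                expanded.extend(expand(entry, run_macro) for entry in generated)
            else:
                expanded.append(expand(item, run_macro))
        return expanded
    return node

# Stub executor standing in for real macro execution.
def fake_macro(code):
    return [{"name": "BPM_C01-01"}, {"name": "BPM_C01-02"}]

tree = {"devices": [{"name": "QF1"}, {"elements_code": "<python code>"}, {"name": "QD1"}]}
expanded = expand(tree, fake_macro)
print([d["name"] for d in expanded["devices"]])
# ['QF1', 'BPM_C01-01', 'BPM_C01-02', 'QD1']
```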

Determinism and reproducibility

To preserve reproducibility:

  • Execution must be deterministic.
  • No implicit access to external state unless explicitly allowed.
  • No hidden side effects.
  • Ordering of generated elements must be preserved as returned.

Backward compatibility

  • Existing YAML files remain valid.
  • elements_code is optional.
  • No change to current schema validation for standard entries.
  • Pattern-based elements resolution remains independent.

Considered alternatives

Separate external Python generation script

Instead of embedding Python inside YAML, users could generate YAML files using standalone Python scripts.

Pros:

  • Clear separation of code and configuration.

Cons:

  • Breaks self-contained configuration principle.
  • Harder to track provenance.
  • More complex CI workflows.

The embedded macro approach keeps configuration declarative while allowing structured generation when necessary.

External generation will always remain possible, independently of the macro mechanism proposed here. Nothing in this proposal prevents users from generating YAML files externally using Python (or any other language) before loading them into pyAML.


Dedicated DSL instead of Python

A domain-specific language could replace Python.

Pros:

  • Safer, constrained syntax.

Cons:

  • Reinvents control structures.
  • Higher implementation complexity.
  • Lower expressiveness.

Given that Python is already the core language of the project, leveraging it is consistent with the project philosophy.

