Duplicate entries in YAML mappings (dicts) get implicitly overwritten #272

@albestro

This is the example that caused me problems; it was not immediately clear to me why it was problematic.

my-uenv:
  ...
  views:
    default:
      link: run
      uenv:
        env_vars:
          prepend_path:
            - PATH: /user-environment/paraview/bin
          set:
            - PARAVIEW_PLUGINS_DIR: /user-environment/paraview-plugins
          prepend_path:
            - LD_LIBRARY_PATH: /user-environment/paraview/lib
            - LD_LIBRARY_PATH: /user-environment/paraview/lib64

Moreover, stackinator didn't complain about it: it just went on and actually produced a uenv, but the uenv's environment variables were not all set as expected. The problem is the duplicate prepend_path entry.

IMHO this should raise an error (or at least a warning) about this problem.

YAML Spec

The PyYAML library, which stackinator uses for reading YAML files, is not fully compliant with the YAML spec, which states (since YAML 1.0):

A mapping is an unordered set of key/value node pairs, with the restriction that each of the keys is unique.

In partial defense of PyYAML, the same section of the YAML spec adds:

This restriction has non-trivial implications [...] Since YAML mappings require key uniqueness, representations must include a mechanism for testing the equality of nodes. This is non-trivial since YAML presentations allow various ways to write a given scalar.

PyYAML currently handles duplicate keys by silently accepting them: each later value overwrites the earlier one, and no error or warning is raised.
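To make the overwriting concrete, here is a minimal sketch that loads a trimmed-down version of the example above with PyYAML's `yaml.safe_load`. The `DOC` string is only an illustrative excerpt of the recipe, not the full file that stackinator reads.

```python
import yaml  # PyYAML

# Trimmed-down excerpt of the recipe above (illustrative only).
DOC = """
env_vars:
  prepend_path:
    - PATH: /user-environment/paraview/bin
  set:
    - PARAVIEW_PLUGINS_DIR: /user-environment/paraview-plugins
  prepend_path:
    - LD_LIBRARY_PATH: /user-environment/paraview/lib
    - LD_LIBRARY_PATH: /user-environment/paraview/lib64
"""

# No error is raised: the second prepend_path replaces the first one.
data = yaml.safe_load(DOC)
prepend = data["env_vars"]["prepend_path"]

# Only the LD_LIBRARY_PATH entries survive; the PATH entry is gone.
for entry in prepend:
    print(entry)
```

The PATH entry from the first prepend_path block is dropped without any diagnostic, which matches the behaviour described in this issue.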

Solutions

As said, I think the YAML spec should be enforced. I don't think there are, and IMHO there shouldn't be, cases where duplicate entries are useful.

The solutions I currently see for enforcing the YAML spec are:

  • customize PyYAML to raise an error for duplicate entries: see the approach proposed in a comment on yaml/pyyaml#165 ("Duplicate keys are not handled properly"), which might just work for our use case
  • explore other libraries that might conform to the YAML spec. I read about:
    • ruamel.yaml
    • ruyaml
    • I am not sure about the differences between them, but they describe themselves as derived from PyYAML, with the note that "many of the bugs filed against PyYAML, but that were never acted upon, have been fixed in".
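As a rough illustration of the first option, the sketch below subclasses `yaml.SafeLoader` and overrides `construct_mapping` so that a repeated key raises a `ConstructorError`. It is not necessarily the code from the linked comment: the names `UniqueKeyLoader`, `MERGE_TAG`, and `load_unique` are hypothetical, and skipping merge keys (`<<`) is an assumption about what the desired behaviour would be.

```python
import yaml  # PyYAML
from yaml.constructor import ConstructorError

# Tag PyYAML assigns to the "<<" merge key; skipped in the duplicate check.
MERGE_TAG = "tag:yaml.org,2002:merge"


class UniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader variant that rejects duplicate mapping keys (hypothetical name)."""

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            seen = set()
            for key_node, _ in node.value:
                if key_node.tag == MERGE_TAG:
                    continue
                key = self.construct_object(key_node, deep=True)
                try:
                    duplicate = key in seen
                except TypeError:
                    # Unhashable key: let PyYAML report it in its usual way.
                    continue
                if duplicate:
                    raise ConstructorError(
                        "while constructing a mapping", node.start_mark,
                        f"found duplicate key {key!r}", key_node.start_mark,
                    )
                seen.add(key)
        # Delegate the actual construction to the stock implementation.
        return super().construct_mapping(node, deep=deep)


def load_unique(text):
    """Load YAML text, failing on duplicate keys (hypothetical helper)."""
    return yaml.load(text, Loader=UniqueKeyLoader)


try:
    load_unique("a: 1\nb: 2\na: 3\n")
except ConstructorError as exc:
    # The error carries the line and column of the offending key.
    print(exc)
```

Whether stackinator should turn this into a hard error or only a warning is the open question of this issue; the sketch only shows that detection is possible without switching libraries.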

I opened this issue to decide if/how we would like to proceed.
