Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor and fix up recursive resolve and processing functions #126

Merged
merged 13 commits into from
Jun 24, 2024

Conversation

achilleas-k
Copy link
Member

@achilleas-k achilleas-k commented Jun 21, 2024

This PR refactors and fixes the recursive resolve functionality to match the spec. All but one of the examples in test/data/base/ pass successfully. The 05-target-in-include.yaml example fails because the _process() function in command.py expects a target in the entrypoint and errors if it doesn't find one.

Commit messages should tell the whole story, but let me highlight a few things here I think are the most important:

  • The new ParserState, originally introduced by @mvo5 in another PR and pulled in here, tracks two things: the path to the file being processed and a reference to the global defines or a sub-part of them depending on the state. When a sub-block of a defines block is being processed, the ParserState.defines references the sub-block being processed, so variable definitions can be added to that sub-block directly. It will also, eventually, probably carry a stack of directives being processed so we can do things like disallow includes under defines.
  • The process_include() function loads a yaml file and sends it through resolve() before returning the contents. I think this should eventually be the entrypoint for the recursion. In other words, the path to the entrypoint yaml file specified on the command line can be sent directly through to process_include() and it should all work out.
    • However, this means that we wont be able to (nor I think we would have to) do any preprocessing of the entrypoint. I think eventually it might be good to load the file, start processing, and validate after everything is included and resolved.
  • Target processing is currently ignored. I don't think this is particularly difficult to take care of. I think a good approach to target processing and selection might be to process all targets found while running through the resolve() recursion, then print the target that matches the one specified on the command line (or the default if there's only one). Before printing we would have a tree that contains everything we loaded and resolved, so we could print the flattened document with all targets included and variables resolved and we could also run some validation over it.
  • The process_defines() is different from all the other ones in that it modifies the context in-place and doesn't return anything. I wonder if things would be simpler if every resolver/process function worked this way and modified the tree in-place.

@achilleas-k
Copy link
Member Author

I just realised that the only failing test now is the target-in-include test that fails during the initial validation that requires a target in the entrypoint.

@achilleas-k achilleas-k force-pushed the transform-all-the-things branch 3 times, most recently from f4f56e1 to e734408 Compare June 21, 2024 20:41
@supakeen
Copy link
Member

supakeen commented Jun 21, 2024

I just realised that the only failing test now is the target-in-include test that fails during the initial validation that requires a target in the entrypoint.

Feel free to drop it in this PR if that turns it green; we decided in #112 to do so. It only needs the removal of the check in ensure and to move target selection to after the resolve in command.py:compile :)

@achilleas-k achilleas-k marked this pull request as ready for review June 22, 2024 12:49
@achilleas-k achilleas-k changed the title Transform all the things Refactor recursive resolve and processing functions Jun 22, 2024
@achilleas-k achilleas-k changed the title Refactor recursive resolve and processing functions Refactor and fix up recursive resolve and processing functions Jun 22, 2024
@achilleas-k
Copy link
Member Author

Disabled test (without removing it) and marked as ready.

Some of the functions in directive.py would will require resolving
values recursively.  For example, include() should run the loaded tree
through the top-level resolve() function to resolve the whole content of
the included file recursively.

Merge the two modules so that we don't end up with circular imports.
src/otk/command.py Outdated Show resolved Hide resolved
src/otk/parserstate.py Outdated Show resolved Hide resolved
src/otk/parserstate.py Outdated Show resolved Hide resolved
Copy link
Member

@supakeen supakeen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really just naming things otherwise great stuff <3

This commit adds a new `traversal.State` that is part of the `resolve()`
recursion. It is needed to keep (recursive) state about what state the
parsing is in. Currently this is used to keep track of what include is
used so that nested includes work and also so that errors can know in
what file the issue happened.
supakeen
supakeen previously approved these changes Jun 24, 2024
mvo5
mvo5 previously approved these changes Jun 24, 2024
Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This is very nice indeed - I have a bunch of small nitpicks/ideas/suggestions but if we want to move fast we can ignore them all and just merge as is (I went over this commit by commit so some comments are just related to the commit order/granularity).

src/otk/traversal.py Show resolved Hide resolved
src/otk/context.py Show resolved Hide resolved
continue

if key.startswith("otk.include"):
# TODO: disallow this (see https://github.com/osbuild/otk/issues/116)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the plan here is that we introduce it and then revert in a single commit so that (if we actually change our mind) we can get it back trivially ? Did I get this right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, mostly. But also my thinking was that this is a spec change that came from a discussion while the implementation was happening. I wanted to isolate the change so that it has more visibility and stands on its own (e.g. a PR that only changes this, linked from the issue, that can be read on its own).

Also, I wasn't sure at the time if it needed some state tracking (something to tell us that we're in a defines block). Now I realise since defines can't branch out to any other directives (except other defines), it's not really necessary. But we should also think about what happens with other directives in general.

src/otk/transform.py Show resolved Hide resolved
src/otk/transform.py Show resolved Hide resolved
# Replace any variables in a value immediately before doing anything
# else, so that variables defined in strings are considered in the
# processing of all directives.
if isinstance(val, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if a test for this could be easily extracted from the example given in the commit message for this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the example tests covers this (09-include-with-var). Or would you rather also have a small unit test for it?

test/data/error/00-invalid.err Outdated Show resolved Hide resolved
@achilleas-k achilleas-k dismissed stale reviews from mvo5 and supakeen via 815d989 June 24, 2024 10:56
achilleas-k and others added 5 commits June 24, 2024 13:14
The defines property of the State will be used to keep track of
subblocks of variable definitions when processing otk.define directives.

The copy() method creates a new State with replaced or copied
properties.

This will enable us to create State objects with references to
sub-objects of the global defines variables.  While processing defines
sub-objects, the State's defines can be written to to define values
nested under the key being processed, while using the global Context
object to resolve variable names inside defines.
Doesn't do anything for now, but we pass it around because we're going
to need it.

Co-authored-by: Michael Vogt <michael.vogt@gmail.com>
This might be reverted if we can find a nicer way of handling it, but it
will be very convenient to have direct access to the defines dictionary.
The new process_defines() function handles defines recursively with the
following properties:
- An otk.define nested under another otk.define is ignored (as if the
  directive was omitted) and the defines below it are processed
  normally.
- Defines are set in the global context immediately, meaning that a
  variable can be referenced as soon as it's set.  For example, the
  following is valid:
    otk.define.ones:
        one: 1
        eins: "${one}"  # sets eins = 1
- Variable references in values are resolved immediately, before being
  set on the context.
- Subblocks are supported and variables are added and available via
  their global name while blocks are processed.
- Variable references are always resolved from the global context.  For
  example:
    otk.define.nested:
      alpha: 1
      subblock:
        alpha: 2
        a1: "${alpha}"  # = 1, from the global context
        a2: "${subblock.alpha}"  # = 2
- Includes are resolved during processing of the defines block.
    - This is subject to change.
- Object (dictionary) values under keys that are already defined and
  are also objects (dictionaries) are merged.  For example, the
  following:
    otk.define.multi_1:
      subblock:
        alpha: 1
        beta: 2
    otk.define.multi_2:
      subblock:
        gamma: 3
  results in {"subblock": {"alpha": 1, "beta": 2, "gamma": 3}}

The process_defines() function has no return value.  It updates the
subblock directly, and the global ctx.defines indirectly as soon as it
finds a value to set.  This is convenient for variable reference lookup
during processing.

Add tests for process_defines():

Replace define() tests and move them to the test_transform module since
we moved the process_defines() function to the transform module now.
Since the process_defines() function doesn't return anything, we use the
ctx.defines dictionary to evaluate the test.
The new process_include() function loads a yaml file and immediately
runs it through resolve before returning it, processing directives and
variable references as they are found.
The path to the file is resolved relative to the file that included it.
Replace any variables in a string value immediately before doing
anything else, so that the resolved values are used in the processing of
all directives.  This adds support for specifying a directive with a
string value that can contain variables, such as:

  otk.include: "${distro}-vars.yaml"
- Remove elses and elifs when not necessary to make the linter happy.
  No need for else after return or continue.
- Use process_defines() instead of resolve() when a define directive is
  found.  The function updates the global context so there is no need
  to modify the tree after calling it.
- Change the generic Exception() for siblings to a KeyError().
- Move the processing of the external directive above the sibling check.
  Externals can now have siblings.
- When processing an include, copy the tree, remove the directive, and
  update the copy of the tree before returning it.  This is necessary
  otherwise the resulting tree will still contain the include directive.
Add a test that produces an error with an expected message.  For now it
only tests that a file not found error includes the name of the source
file in the message.
Copy link
Contributor

@mvo5 mvo5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thank you!

@supakeen supakeen added this pull request to the merge queue Jun 24, 2024
Merged via the queue into osbuild:main with commit 842ea69 Jun 24, 2024
3 checks passed
@achilleas-k achilleas-k deleted the transform-all-the-things branch June 24, 2024 11:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants