Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split up yaml loaders into multiple files #23774

Merged
merged 3 commits into from May 9, 2019

Conversation

ties
Copy link
Contributor

@ties ties commented May 9, 2019

Breaking Change:

Some implementation details of homeassistant.util.yaml are no longer public (their path changed). Some of these may not be technically internal. All exports from homeassistant.util.yaml that were accesses from outside the tests are still available using their original name. See commit messages for exact details.

API changes:

  • homessistant.util.yaml.represent_odict is not exposed as part of homeassistant.util.yaml but can be accessed from homeassistant.util.yaml.dumper.represent_odict
  • homeasistant.util.yaml.{NodeListClass, NodeStrClass} are not exposed as parts of `homeassistant.util.
  • homeassistant.util.yaml does not longer export imported (and monkeypatched) yaml module

Note that NodeStrClass does not appear to be actually used because the condition in yaml.py#L82 never matches - this is seen in test coverage.

Description:

homeassistant.util.yaml has been split up in preparation for more cleanup, following the structure from Ansible.

I intended to add a feature to the yaml loader to expand environment variables in strings (such as in docker-compose.yml). When considering this I noticed that the homeassistant.util.yaml was one large monolithic file that monkey patches system yaml.

Steps

The first improvement I would like to propose is contained in this PR and splits up the yaml code, following the structure in Ansible where the custom yaml loaders are split up into multiple files with clear roles.

A second step would be to no longer monkey patch yaml.yaml but to create a custom SafeLoader such as in Ansible. Since this is an important change I would propose to do this in a separate iteration.

A (optional) third step would be to add "expand environment variables in strings" to make values (e.g. database connection strings) configurable using environment variables that are expanded using the common ${VAR} syntax instead of !!env and templating the value outside of the configuration. Or to use jinja2 for strings in the configuration such as happens in ansible.

Checklist:

  • The code change is tested and works locally.
  • Local tests pass with tox. Your PR cannot be merged unless tests pass
  • There is no commented out code in this PR.
  • I have followed the development checklist

If user exposed functionality or configuration variables are added/changed:

If the code communicates with devices, web services, or third-party tools:

  • The manifest file has all fields filled out correctly. Update and include derived files by running python3 -m script.hassfest.
  • New or updated dependencies have been added to requirements_all.txt by running python3 -m script.gen_requirements_all.
  • Untested files have been added to .coveragerc.

If the code does not interact with devices:

  • Tests have been added to verify that the new code works.

Move parts of yaml loader out of the single large file and start
to create the structure of the yaml loaders in Ansible [0].

[0]: https://github.com/ansible/ansible/tree/devel/lib/ansible/parsing/yaml
@ties ties requested a review from a team as a code owner May 9, 2019 10:28
  * Move code around to finish the migration
  * Update the mocks so that `open` is patched in
    `homeassistant.util.yaml.loader` instead of
    `homeassistant.util.yaml`.
  * Updated mypy ignores
  * Updated external API of `homeasistant.util.yaml`, see below:

Checked what part of the api of `homeassistant.util.yaml` was actually
called from outside the tests and added an `__ALL__` that contains only
these elements.

Updated the tests so that references to internal parts of the API (e.g.
the yaml module imported into `homeassistant.util.yaml.loader`) are
referenced directly from `homeassistant.util.yaml.loader`.

In `tests/test_yaml.py` the import `yaml` refers to
`homeassistant.util.yaml` and `yaml_loader` refers to `~.loader`.

Future work that remains for the next iteration is to create a custom
SafeConstructor and refers to that instead of monkey patching `yaml` with
custom loaders.
@ties
Copy link
Contributor Author

ties commented May 9, 2019

I think this kind of a tricky PR but think the code gets a lot more structured because of it. It is a purely structural change.

I realized that I did not check other test cases (i.e. test/components for mocks that need to be updated, will do that over lunch) and reopen later.

@ties ties closed this May 9, 2019
@ties ties reopened this May 9, 2019
@ties ties requested a review from kellerza as a code owner May 9, 2019 13:03
@awarecan
Copy link
Contributor

awarecan commented May 9, 2019

FYI, there is an open PR #23761 to update PyYaml, you may want to coordinate with @BKPepe

Another reminder, our ultimate goal is to replace PyYaml by ruamel.yaml

@ties
Copy link
Contributor Author

ties commented May 9, 2019

a heads up: Part of the complete test suite is currently unstable on my laptop and a Debian Buster machine so I am relying on circleci to give the heads up that all is ok.

@ties
Copy link
Contributor Author

ties commented May 9, 2019

FYI, there is an open PR #23761 to update PyYaml, you may want to coordinate with @BKPepe

I'll follow that one, thanks for the heads up! I think that I caught all the places directly using homeassistant.util.yaml.yaml.load in one of these commits, that could be beneficial for that PR as well.

Another reminder, our ultimate goal is to replace PyYaml by ruamel.yaml

I was aware of that and think this work is orthogonal to that. If this restructuring (and the structure as in Ansible) is actually clearer to other developers.

Is there a way to add telemetry so we can see how often these special loaders are actually used? In my opinion some of them seem to have niche use cases.

@ties ties mentioned this pull request May 9, 2019
5 tasks
@OttoWinter
Copy link
Member

A (optional) third step would be to add "expand environment variables in strings" to make values (e.g. database connection strings) configurable using environment variables [...]

I don't think we should expand environment variables in all strings in the config - we shouldn't change the value of any YAML object at all without the user intentionally requesting to do so with a YAML tag.

Reason is: There are probably many real use-cases (like shell script component) where they should not be expanded. If we would expand them, that might be a very un-intuitive annoyance. Plus it would be a bad breaking change. Plus I don't see why !env is bad - yes it's verbose but that's good!

Another option would be a system esphome uses: substitutions (https://esphome.io/guides/configuration-types.html#config-substitutions) - only keys that are explicitly defined by the user are expanded. Everything else that looks like a substitution but is not defined as one, is not expanded.

Or to use jinja2 for strings

Problem is HA templates also use jinja2 - if we evaluate jinja2 at load time it would fundamentally mess with HA templating.

If you want to pursue it further, I would suggest opening an architecture issue (https://github.com/home-assistant/architecture).

Copy link
Member

@balloob balloob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This clean up looks great, thanks !

I do agree that adding further substitutions to our YAML should be treated with care, as there are many follow-up parts of the code that parse the YAML as templates.

@balloob balloob merged commit 4004867 into home-assistant:dev May 9, 2019
@ties ties deleted the feature/refactor_yaml branch May 9, 2019 20:42
@ties
Copy link
Contributor Author

ties commented May 13, 2019

I was away for the weekend so this comes a bit late - but I'll expand on the environment variables.

I don't think we should expand environment variables in all strings in the config - we shouldn't change the value of any YAML object at all without the user intentionally requesting to do so with a YAML tag.

Reason is: There are probably many real use-cases (like shell script component) where they should not be expanded. If we would expand them, that might be a very un-intuitive annoyance. Plus it would be a bad breaking change. Plus I don't see why !env is bad - yes it's verbose but that's good!

Thanks for the quick feedback and the explanation of your considerations!

It is a product change I would definitely not consider before having access to telemetry. Changing semantics without a good reason is annoying. I would only consider this for single line strings and only if it conflicts in a limited number of cases.

The use case is the general "configure through environment variables" pattern that is the part of the 12 factor app pattern that I appreciate.

My use case is building a connection string to influxdb and mysql from the environment variables that docker compose creates. This needs multiple tokens (postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@hass-postgres/${POSTGRESS_DB}) which !env_var does not support. os.path.expandvars could be coerced into doing what I want.

If you want to pursue it further, I would suggest opening an architecture issue (https://github.com/home-assistant/architecture).

That is the way. There are many paths to take (adding a (version) tag to the yaml, only applying to single line strings, only apply to some type of multiline quoting, only apply within quotes, etc). For now I hardcoded the connection string in my config 🙃

@balloob balloob mentioned this pull request Jun 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants