Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrappers for lazy loading/computing configuration in relation to other configuration #7

Open
RonnyPfannschmidt opened this issue Jul 6, 2022 · 12 comments

Comments

@RonnyPfannschmidt
Copy link

i have a setup where credentials are loaded from vault into the configuration,
this is done via lazy loading, so that even if there is a full configuration, only the necessary roundtrips are made

if there was a tool to have config nodes load deferred when they are used (not merged) that would be a help

@cjw296
Copy link
Member

cjw296 commented Jul 6, 2022

When providing a lazy-loaded config, what would be a nice API to let you specify that?

@RonnyPfannschmidt
Copy link
Author

RonnyPfannschmidt commented Jul 6, 2022

a key here is that not the "config" is lazy loaded, but rather a specific value/subtree

# myconfig.yaml
users:
   myuser:
     username: !from_vault {path: a/b, key: username}
     username: !from_vault {path: a/b, key: password}

then i would like to have something like

class VaultResolver(ConfigResolver):
   path: str
   key: str

   def resolve(self, config: Config, path: Path):
      return config.vault_loader.value_from(self.path, self.key)

@cjw296
Copy link
Member

cjw296 commented Jul 7, 2022

Are you able to share your current implementation of from_vault and your preferred way of registering it with pyyaml?

Also, in your example, is self.config a configurator Config object? If so, how do you see .vault_loader coming into existence?

@RonnyPfannschmidt
Copy link
Author

that as a typo, its only config

ideally a way to override the default yaml loader with one thats extended would be the mechanism of choice, as pyyaml and ruamel.yaml differ, i left it out

class VaultResolver(ConfigResolver):
   path: str
   key: str

   def resolve(self, config: Config, path: Path):
      return config.vault_loader.value_from(self.path, self.key)


class MyLoader(yamllib.Loader):
  pass

MyLoader.register(...., VaultResolver)

def vault_loader_for_settings(config):
  ...


def get_settings(given_vault_loader = None)
   config = config.from_file(..., parser=MyLoader.load) #insert correct call here
   config.vault_loader = given_vault_loader or vault_loader_for_settings()
   return config

@RonnyPfannschmidt
Copy link
Author

the current impl of the vault loader looks like

class VaultSecretFetcher:
    @classmethod
    def from_settings(cls, settings):
        mountpoint = settings.get("REDACTED_VAULT_MOUNT_POINT") or "secret"
        if settings.get("REDACTED_VAULT_LOADER_ENABLED"):
            client = _create_vault_client(settings)
            _login_and_renew_token(client, settings)
        else:
            client = None
        return cls(mountpoint=mountpoint, client=client, settings=settings)

    loaded_secrets: Dict[str, Dict[str, object]]

    def __init__(self, mountpoint: str, client: Client, settings: Settings):
        self._client = client
        self._mount_point = mountpoint
        self._settings = settings
        self.loaded_secrets = {}

    def _get_path_secret_from_vault(self, path):
        if self._client is None:
            raise InvalidPath(
                f"Unable to load path '{self._mount_point}/{path}' when vault client is disabled"
            )
        if not self._client.is_authenticated():
            _login_and_renew_token(self._client, self._settings)
        try:
            data = self._client.secrets.kv.read_secret_version(path, mount_point=self._mount_point)
        except InvalidPath:
            # Give more details in the InvalidPath error message
            raise InvalidPath(f"Unable to load path '{self._mount_point}/{path}'")
        else:
            return data.get("data", {}).get("data", {})

    def _get_path_secret(self, path):
        if path not in self.loaded_secrets:
            self.loaded_secrets[path] = self._get_path_secret_from_vault(path)
        return self.loaded_secrets[path]

    def get_value_from_vault(self, path, key):
        data = self._get_path_secret(path)
        if key not in data:
            raise VaultError(f"key '{key}' not found at path '{self._mount_point}/{path}'")
        log.debug("loaded vault secret from %s/%s", self._mount_point, path)
        return data[key]

@RonnyPfannschmidt
Copy link
Author

@cjw296 i think it may be a good idea to have the resolving of lazy objects be part of a wrapper

the patterns that can be implemented are

  1. resolve all lazy values in a part of the configuration as either new dictionary, or a new config object
  2. have a wrapper that looks like config but resolves lazy objects dynamically (resolver is responsible for caching)
  3. have a wrapper that looks like config and replaces dynamic references on first use with the resolved value

@cjw296
Copy link
Member

cjw296 commented Jul 13, 2022

I've been playing around with this a fair bit, observations so far:

  • I'm keen to preserve .data always being the raw thing that came back from the config. ConfigNode is always just an API for working with that. origin of data indication #9 introduces some problems around that, but I want to leave that problem separate.
  • The places where items or attrs need to be resolved feel like the place the lazy resolving should happen. There's more of these than I'd like, but I think some refactoring can make that less painful 🙂 However, this way, loading and/or caching can remain encapsulated in whatever lazy object .data ends up being.

Thinking through your points:

  1. I reckon Config is going to end up with some kind of visitor helper or pattern, this will cover this case as well as helping with case insensitive loading/lookup as opt-in #10.
  2. If the loading of lazy objects happens as described in my second point above, I reckon we can get this: The lazy object / resolver gets asked for the value each time, and it gets to decide if it wants to cache. I was excited when I thought we could use the descriptor __get__ for this, unfortunately almost none of the ways .data is obtained from a container use this. 🤦
  3. This feels like it could end up brittle and also breaks my contract in the first bullet above: .data and everything it contains is "your" stuff and configurator shouldn't get involved with it.

@cjw296
Copy link
Member

cjw296 commented Jul 13, 2022

Oh, and yes, plugging in parsers in a more flexible way will need to be part of implementing this. Here's my current sketch test case:

    def test_lazy_load(self, dir):
        yaml = pytest.importorskip("yaml")

        class LazyVault:
            def __init__(self, key):
                self.key = key

            @classmethod
            def from_yaml(cls, loader, node):
                return cls(loader.construct_mapping(node)['key'])

            def load_config(self):
                assert self.key == 'someuser'
                return {'name': 'Some User', 'password': '...'}

        class Loader(yaml.Loader):
            pass

        Loader.add_constructor('!from_vault', LazyVault.from_yaml)

        path = dir.write('lazy.yml', '''
        users:
        - !from_vault {key: someuser}
        ''')
        config = Config.from_path(path, parser=lambda p: yaml.load(p, Loader))
        compare(config.users[0].data, expected={'name': 'Some User', 'password': '...'})

        # make sure clone still works:
        config.clone()
        compare(config.users.data[0], expected=LazyVault(key='someuser'))

@RonnyPfannschmidt
Copy link
Author

A key reason why I wanted to suggest a wrapper is that it would allow item/attributes access to be handled transparent while allowing the wrapper to be the location of both resolvers and caches separated from the other objects

So the config itself would be raw data, and a resolving config would handle the rest

I'll type up a example once im back to the computer

@cjw296
Copy link
Member

cjw296 commented Jul 15, 2022

Yeah, I tried moving towards having ConfigNode instances used much more and it started getting messy quickly.
So, here's another idea, which I think will work well for both #9 and #7:
#12

I'm planning on adding DataMapping and DataSequence, but I think DataValue is actually the most invasive one so wanted to start there.

Thoughts?

@RonnyPfannschmidt
Copy link
Author

I'm currently under the impression that sequences and maps can be easily extended with a shadow sequence that tracks data origins +yaml/toml/json loaders that provide them

My impression is, that this would be painful for values as they would require a pretty magical wrapper

A key reason why I think collections should track is, that it would play nicely with integration of merges

@cjw296
Copy link
Member

cjw296 commented Jul 15, 2022

I think we're agreeing with each other? Thankfully, looks like the wrapper for values doesn't actually end up being that magical :-) In fact, I think the necessary changes are now in #12 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants