Question/proposal: Lazy sources #181

Closed
Ohad31415 opened this issue Oct 30, 2023 · 6 comments

@Ohad31415

Some settings sources, such as one that fetches a settings key from a key-value store, are better skipped when a higher-priority source can already fill that field, to save time, API calls, etc.

Take this for example:

from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    foo: str
    bar: str

We have a key-value source that tries to fetch values for the foo and bar fields. If an environment variable already provides a value for foo, it would be good to be able to skip that lookup in the lower-priority source.
This is not what happens now, since BaseSettings evaluates all sources at init time and unpacks them into a unified dict.

Can we have such a "lazy" source, evaluated only if needed?

@hramezani
Member

@Ohad31415 Thanks for reporting this!

Yeah, right now there is no config option to skip the evaluation of some settings sources.
You could do it by overriding BaseSettings._settings_build_values, but that is not a good idea.

BTW, any idea/contribution here is welcome.

@hramezani added the enhancement and feature request labels and removed the unconfirmed label on Oct 30, 2023
@hramezani
Member

You can change the priority of sources or even leave some of them out. If you remove a source from the result returned by settings_customise_sources, it won't be evaluated.

@hramezani removed the enhancement and feature request labels on Nov 8, 2023
@jessemyers-lettuce

@hramezani I don't think changing the priority of sources works. If I'm reading the source correctly, sources are applied using:

return deep_update(*reversed([source() for source in sources]))

which has the effect of invoking every source. That means that if a source is slow and you want it to be lazy, it will still be called, even if another source provides the value.
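To illustrate, here is a minimal stand-alone sketch of that merge step. It does not use pydantic-settings: env_source, slow_source and shallow_update are hypothetical stand-ins (the real deep_update also merges nested dicts). The point is that the list comprehension calls every source before any merging happens, so priority only decides which value wins.

```python
calls = []  # records which sources were actually invoked


def env_source():
    calls.append("env")
    return {"foo": "from-env"}


def slow_source():
    # Hypothetical stand-in for a slow key-value store lookup.
    calls.append("slow")
    return {"foo": "from-store", "bar": "from-store"}


def shallow_update(*mappings):
    # Simplified stand-in for deep_update: later mappings win.
    merged = {}
    for mapping in mappings:
        merged.update(mapping)
    return merged


sources = [env_source, slow_source]  # highest priority first
values = shallow_update(*reversed([source() for source in sources]))
```

Here values["foo"] comes from env_source, yet slow_source has still been called.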

I'd love to be wrong here because I have a source that is slow and really want a way to only call it if higher priority sources come up empty.

@hramezani
Member

Yeah, it invokes every settings source class returned from settings_customise_sources. So, with a custom settings_customise_sources you can ignore some sources.

But in general I agree with you that this is not a lazy source; you can only exclude sources.

> I'd love to be wrong here because I have a source that is slow and really want a way to only call it if higher priority sources come up empty.

What do you mean by that? What if a higher-priority source returns a non-empty result, but the result dict does not contain data for all the required fields? In that case, if we ignore the other sources, pydantic-settings can't find data for the remaining fields.

@jessemyers-lettuce

> what if a higher priority source returns non-empty result but the result dict does not contain all the required field data

I can think of two possible answers.

First, you could provide context to the source() call, telling it what the merged dictionary contains so far and letting the source decide whether to load data or not.

Second, you could evaluate the model iteratively and only proceed to the next highest source if data is missing. This approach is weaker because it requires you to make a judgement call about the definition of missing and that's bound to break someone's use case.
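Both ideas can be sketched in plain Python. This is not how pydantic-settings works today: the Source signature, build_with_context and build_iteratively are all hypothetical names made up for illustration, and "missing" is naively defined as "key not present".

```python
from typing import Callable, Dict, List

# Hypothetical source signature: receives the values merged so far.
Source = Callable[[Dict[str, str]], Dict[str, str]]
calls: List[str] = []  # records what each source actually did


def env_source(state: Dict[str, str]) -> Dict[str, str]:
    calls.append("env")
    return {"foo": "from-env"}


def store_source(state: Dict[str, str]) -> Dict[str, str]:
    # First approach: the source inspects the state and fetches only missing keys.
    missing = [key for key in ("foo", "bar") if key not in state]
    calls.append("store:" + ",".join(missing))
    return {key: "from-store" for key in missing}


def build_with_context(sources: List[Source]) -> Dict[str, str]:
    state: Dict[str, str] = {}
    for source in sources:  # highest priority first
        for key, value in source(dict(state)).items():
            state.setdefault(key, value)  # earlier sources keep priority
    return state


def build_iteratively(sources: List[Source], required: List[str]) -> Dict[str, str]:
    # Second approach: stop as soon as no required key counts as missing.
    state: Dict[str, str] = {}
    for source in sources:
        if all(key in state for key in required):
            break
        for key, value in source(dict(state)).items():
            state.setdefault(key, value)
    return state
```

With build_with_context([env_source, store_source]), the store is asked only for bar. With build_iteratively and required=["foo"], the store is never called at all, which shows how much the outcome depends on the chosen definition of missing.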

Taking a step back, let me elaborate on my use case. I wrote a source that loads data from AWS SSM Parameter Store. We use this in the context of Lambda functions because Lambda does not have a built-in way to inject secrets as environment variables. We use Annotated to provide metadata to the settings so they know where in SSM to look, something like:

class FooSettings(OurSettingsBaseClass):
    bar: Annotated[SecretStr, FromParameter("/path/to/bar")]

We also have software running in AWS ECS which does support environment-based injection of secrets. We also have some shared library software that defines common settings used in both ECS and Lambdas and we currently provide settings classes for use in both cases. Rather than make different versions of settings for each case -- which is maybe what we should have done -- we extend the base class and let the sources resolve whether the secret comes from SSM or the environment.

The nit here is that we spend time querying SSM even if we have the data locally (e.g. from the environment or statically in a unit test). This adds observable latency.

Naively, I was hoping I could write the source to only load if the data it needed was not already available.
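One workaround I can imagine is a source that peeks at the environment itself before querying the remote store. The sketch below is only an illustration: EnvAwareParameterSource and the injected fetch callable are hypothetical, and it assumes each field maps to an upper-cased environment variable with no prefix, which duplicates knowledge that really belongs to the environment source.

```python
import os
from typing import Callable, Dict


class EnvAwareParameterSource:
    """Hypothetical source: queries a remote store only for fields not set in the environment."""

    def __init__(self, paths: Dict[str, str], fetch: Callable[[str], str]) -> None:
        self.paths = paths  # field name -> parameter path
        self.fetch = fetch  # injected lookup, e.g. a wrapper around an SSM client

    def __call__(self) -> Dict[str, str]:
        values: Dict[str, str] = {}
        for field_name, path in self.paths.items():
            if field_name.upper() in os.environ:
                continue  # assume the environment source will supply this field
            values[field_name] = self.fetch(path)
        return values
```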

@hramezani
Member

We would need to make the source classes aware of the data provided by other sources, so that a source can decide whether to load its own.

With the current pydantic-settings structure, source classes are not aware of each other. So your approach of writing a custom source is probably the best for now.
