Question/proposal: Lazy sources #181

Ohad31415 · 2023-10-30T10:01:34Z

Some settings sources, such as one that fetches a settings key from a key-value store, are better skipped when a higher-priority source can already fill that field. Skipping them saves time, API calls, etc.

Take this for example:

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    foo: str
    bar: str

We have a key-value source that tries to fetch values for the foo and bar fields. If an environment variable already provides a value for foo, it would be good to be able to skip that lookup in the next source.
This is not what happens now, since BaseSettings evaluates all sources at init time and unpacks them into a unified dict.

Can we have such a "lazy" source, evaluated only if needed?

hramezani · 2023-10-30T13:35:40Z

@Ohad31415 Thanks for reporting this!

Yeah, right now there is no configuration option to skip the evaluation of some settings sources.
You could do it by overriding BaseSettings._settings_build_values, but that is not a good idea.

BTW, any idea/contribution here is welcome.

hramezani · 2023-11-08T12:34:50Z

You can change the priority of sources or even leave them out. If you remove a source class from the result of the settings_customise_sources function, it won't be evaluated.

jessemyers-lettuce · 2023-12-12T20:09:26Z

@hramezani I don't think changing the priority of sources works. If I'm reading the source correctly, sources are applied using:

return deep_update(*reversed([source() for source in sources]))

which has the effect of invoking every source. That means that if a source is slow and you want it to be lazy, it will still be called, even if another source provides the value.
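To make the effect concrete, here is a minimal standard-library sketch of that merge pattern. The `merge` helper is a simplified stand-in written for this example, not the library's own deep_update, and both source functions are hypothetical:

```python
from typing import Any, Callable

calls: list[str] = []  # records which sources were actually invoked


def merge(*mappings: dict[str, Any]) -> dict[str, Any]:
    """Simplified stand-in for deep_update: later mappings override earlier ones."""
    merged: dict[str, Any] = {}
    for mapping in mappings:
        for key, value in mapping.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge(merged[key], value)
            else:
                merged[key] = value
    return merged


def env_source() -> dict[str, Any]:
    calls.append("env")
    return {"foo": "from-env", "bar": "from-env"}


def slow_source() -> dict[str, Any]:
    calls.append("slow")  # imagine a network round trip here
    return {"foo": "from-store", "bar": "from-store"}


# Highest priority first.
sources: list[Callable[[], dict[str, Any]]] = [env_source, slow_source]

# Same shape as the quoted line: the list comprehension calls every
# source before any merging happens.
values = merge(*reversed([source() for source in sources]))
```

Here the environment supplies both fields, yet `calls` still records the slow source, because the merge only decides which value wins after every source has already run.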

I'd love to be wrong here because I have a source that is slow and really want a way to only call it if higher priority sources come up empty.

hramezani · 2023-12-13T12:12:09Z

Yeah, it invokes every settings source class that is returned from settings_customise_sources. So, with a custom settings_customise_sources, you can leave some sources out.

But in general I agree with you that this is not a lazy source. You can only exclude sources.

> I'd love to be wrong here because I have a source that is slow and really want a way to only call it if higher priority sources come up empty.

What do you mean by that? What if a higher-priority source returns a non-empty result, but the result dict does not contain data for all the required fields? In that case, if we ignore the other sources, pydantic-settings can't find data for the remaining fields.

jessemyers-lettuce · 2023-12-13T16:10:33Z

> what if a higher priority source returns non-empty result but the result dict does not contain all the required field data

I can think of two possible answers.

First, you could provide context to the source() call, telling it what the merged dictionary contains so far and letting the source decide whether to load data or not.

Second, you could evaluate the model iteratively and only proceed to the next highest source if data is missing. This approach is weaker because it requires you to make a judgement call about the definition of missing and that's bound to break someone's use case.
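Both ideas can be sketched in plain Python. This is only an illustration under assumptions: the `Source` signature (a callable that receives the values merged so far), both merge functions, and the example sources are hypothetical and not part of pydantic-settings:

```python
from typing import Any, Callable

# Hypothetical signature: a source receives the values merged so far.
Source = Callable[[dict[str, Any]], dict[str, Any]]


def merge_with_context(sources: list[Source]) -> dict[str, Any]:
    """First idea: call sources from highest to lowest priority,
    passing each one a copy of what is already known."""
    merged: dict[str, Any] = {}
    for source in sources:
        for key, value in source(dict(merged)).items():
            merged.setdefault(key, value)  # higher-priority values win
    return merged


def merge_until_complete(sources: list[Source], required: set[str]) -> dict[str, Any]:
    """Second idea: stop calling sources once every required key has a value."""
    merged: dict[str, Any] = {}
    for source in sources:
        if required.issubset(merged):
            break  # nothing is missing, so lower-priority sources are skipped
        for key, value in source(dict(merged)).items():
            merged.setdefault(key, value)
    return merged


fetched: list[str] = []  # keys the slow source actually looked up


def env_source(known: dict[str, Any]) -> dict[str, Any]:
    return {"foo": "from-env"}


def store_source(known: dict[str, Any]) -> dict[str, Any]:
    missing = [name for name in ("foo", "bar") if name not in known]
    fetched.extend(missing)  # stands in for one remote lookup per key
    return {name: f"from-store:{name}" for name in missing}


result = merge_with_context([env_source, store_source])
```

In the first function the source itself decides what to skip, so `store_source` looks up only bar. In the second, the `required` set is exactly the judgement call about what counts as missing.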

Taking a step back, let me elaborate on my use case. I wrote a source that loads data from AWS SSM Parameter Store. We use this in the context of Lambda functions because Lambda does not have a built-in way to inject secrets as environment variables. We use Annotated to provide metadata to the settings so they know where in SSM to look, something like:

class FooSettings(OurSettingsBaseClass):
    bar: Annotated[SecretStr, FromParameter("/path/to/bar")]

We also have software running in AWS ECS which does support environment-based injection of secrets. We also have some shared library software that defines common settings used in both ECS and Lambdas and we currently provide settings classes for use in both cases. Rather than make different versions of settings for each case -- which is maybe what we should have done -- we extend the base class and let the sources resolve whether the secret comes from SSM or the environment.

The nit here is that we spend time querying SSM even if we have the data locally (e.g. from the environment or statically in a unit test). This adds observable latency.

Naively, I was hoping I could write the source to only load if the data it needed was not already available.

hramezani · 2023-12-13T16:21:29Z

We would need to make the source classes aware of the data provided by other sources, so that a source can decide whether to load its own.

With the current pydantic-settings structure, source classes are not aware of each other. So your approach of writing a source that checks for itself is probably the best option for now.
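A source that checks for itself might look roughly like the sketch below. Everything here is an assumption made for illustration: the `LazyParameterSource` class, its `fetch` callback, and the rule that a field's environment variable is the upper-cased field name with no prefix. It is not the source from the comment above and not a pydantic-settings API:

```python
import os
from typing import Callable, Mapping, Optional


class LazyParameterSource:
    """Hypothetical source: fetch only the parameters whose env var is unset."""

    def __init__(
        self,
        paths: Mapping[str, str],      # field name -> parameter path
        fetch: Callable[[str], str],   # slow lookup, e.g. a remote call
        environ: Optional[Mapping[str, str]] = None,
    ) -> None:
        self.paths = paths
        self.fetch = fetch
        self.environ = os.environ if environ is None else environ

    def __call__(self) -> dict[str, str]:
        values: dict[str, str] = {}
        for field, path in self.paths.items():
            # Assumption: the env var is the upper-cased field name.
            if field.upper() in self.environ:
                continue  # the environment will supply this field
            values[field] = self.fetch(path)
        return values


looked_up: list[str] = []  # paths the fake lookup was asked for


def fake_fetch(path: str) -> str:
    looked_up.append(path)
    return f"secret-at-{path}"


source = LazyParameterSource(
    {"bar": "/path/to/bar", "baz": "/path/to/baz"},
    fake_fetch,
    environ={"BAR": "local"},
)
result = source()
```

The trade-off is that the source duplicates knowledge of how the environment source names its variables, so any prefix or case-sensitivity setting would have to be mirrored by hand.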

pydantic-hooky bot assigned dmontagu Oct 30, 2023

pydantic-hooky bot added the unconfirmed label Oct 30, 2023

hramezani added the enhancement and feature request labels and removed the unconfirmed label Oct 30, 2023

hramezani removed the enhancement and feature request labels Nov 8, 2023

hramezani closed this as completed Nov 8, 2023
