-
-
Notifications
You must be signed in to change notification settings - Fork 115
Feature: Add lazy load support in GCP #718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thanks @fede-bello for the PR. I think we only want it for I think it doesn't make sense to have lazy loading for the env source or dotenv. People usually initialize settings on application startup and they usually do it once. |
|
Do we want it for other cloud providers? AWS or Azure? Or just GCP that it's what I was able to test? |
230dc09 to
482b847
Compare
16e6fa8 to
e401bc4
Compare
Let's do it for GCP now because you can test it and probably maintain it later. |
ced9069 to
a4e0d5b
Compare
| 1. **Initialization**: Settings are created with minimal overhead. Sources return empty dictionaries instead of eagerly fetching all values. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions:
- What happens if other sources' values have more priority than GCP settings source?
- What happens if the value provided by a source is not a valid value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Not sure in what sense you mean. If I understand your question correctly, higher priority sources shadow lower ones. So if theres a key in a higher priority source, that one will be loaded and the one for GCP won't be consulted
- Trying to access will return None. It will enter the
get_field_valuemethod inEnvSettingsSource, thefield_valuewill not be found and will return None
PS: left a fix with an issue with the model dump that wasn't loading the lazy fields
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to access will return None. It will enter the get_field_value method in EnvSettingsSource, the field_value will not be found and will return None
I mean if the value is not a valid value for the field. like you defined an int field but the value is string. or you put some limitation on the string length.
a4e0d5b to
ec75ebd
Compare
ec75ebd to
3894a89
Compare
| env_ignore_empty: bool | None = None, | ||
| env_parse_none_str: str | None = None, | ||
| env_parse_enums: bool | None = None, | ||
| lazy_load: bool | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we keep this here? we agreed to enable lazy loading for GCP secret source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it there so eventually is easier to implement for other sources providers and it's not a only-gcp fix. It the parameter is not passed to the base class the logic shouldn't change
|
@fede-bello I think this is going to be complicated. Like we need to add and maintain a lot of code, and the lazy loading values from GCP is not important IMHO. As What do you think? |
|
I don't know if there's a more straightforward way to implement the feature The thing is that we live far from our gcp instances are, and each secret fetch can take up to 2-3 seconds. It's not the worst but it can get annoying when having a lot of secrets and performing short tests. A 20 second start up just to try a path that doesn't use a secret Maybe we can add a wrapper to the gcp ourselves and not even have to modify the library, but seems like a issue that more people might have in the future |
|
Hey @hramezani. Just some context, I'm working with @fede-bello on a feature that needs a solution to the issue he described in his last comment; that's why you see some comments from me from time to time :). I agree this functionality doesn't super align with Pydantic's ethos of validating data and erroring out. It's also been an issue with FastAPI apis we've had to produce where we have some Implementing lazy loading for this cloud provider is a bit of a slippery slope as well. Adding one lazy loader may open up to implementing all sources as lazy, and thay will probably bring issues in the future, since Pydantic is designed with front-loading and validation in mind. However, I'd be surprised if we're the only ones facing this limitation. Aside from the latency Fede described (imagine having a CLI and having to wait many seconds to see the
Aside from the caveats I mentioned, I don't see why doing this #713 (comment) wouldn't work for us. Maybe this is just a matter of documenting this "hack" and its caveats, instead of altering all the core and increasing the maintenance burden on you guys, which isn't ideal. In essence what we're looking for is a post-process hook that runs just before returning the retrieved attribute and has access to the instance. That's why my suggestion was an override to the |
|
Hey @sebastian-correa. Thanks for your explanation.
Yeah, the whole point of
I was working on a code base like the thing that you described. at the end we solved the problem by splitting our settings and not loading unrelated values in containers. each container only receives the needed values. then only containers that needs values from e.g. GCP secret have to wait for it.
Aside from the complexity and future maintenance problems, the fact that we can't validate the value provided in GCP secret(I mentioned it in the first paragraph) is important for me because having a lazy gcp settings source means ignoring validation for gcp secret source. Probably not related, take a look at pydantic validator doc. pre field and model validator might be helpful for you |
|
@hramezani yes, setting (secret) splitting can solve the issue, I agree. In fact, our settings are split in reality (not a single file). To clarify: I didn't mean for my #713 (comment) to be incorporated into the codebase. I was looking for validation of the code from someone who probably knows the internals better than we do, to see if the implementation makes sense. If it does, we'd be providing this internally to users without any changes to Pydantic or def _setting_is_cloud_secret(name: str, value: str) -> bool: # noqa: ARG001 needed to comply with interface.
"""Check if the given setting (as name and value) is special.
Currently, a setting is special if its value starts with "__cloud_secret:".
Args:
name (str): Setting name.
value (str): Setting value.
Returns:
bool: `True` if special.
"""
return value.startswith("__cloud_secret:")
def _maybe_return_from_gcp_and_update(getter: Callable) -> Callable:
"""If the value got by `getter` `_setting_is_cloud_secret`, replace it from from GCP.
This should decorate methos of classes, as the returned funtion expects `self` as the first arg.
Args:
getter (Callable): The getter, used to find the current value.
Returns:
Callable: A callable, to replace the previous one, that performs the
`_setting_is_cloud_secret` on the name and current value of the setting, and fetches
the setting from GCP if it is special.
"""
# This is intended as a decorator of `Dynabox` and `BoxList`, so the first arg must be `self`.
@wraps(getter)
def check_and_replace(self, item, *args, **kwargs):
current_value = getter(self, item, *args, **kwargs)
if isinstance(current_value, str) and _setting_is_cloud_secret(item, current_value):
_, name, version = current_value.split(":")
new_value = _get_secret_from_gcp(name, version)
self[item] = new_value
return new_value
return current_value
return check_and_replace
# Monkey patch Dynaconf's container classes to include our custom post-processing.
dynaconf.utils.boxing.DynaBox.__getattr__ = _maybe_return_from_gcp_and_update(
dynaconf.utils.boxing.DynaBox.__getattr__
)
dynaconf.utils.boxing.DynaBox.__getitem__ = _maybe_return_from_gcp_and_update(
dynaconf.utils.boxing.DynaBox.__getitem__
)
dynaconf.vendor.box.box_list.BoxList.__getitem__ = _maybe_return_from_gcp_and_update(
dynaconf.vendor.box.box_list.BoxList.__getitem__
)
dynaconf.utils.boxing.DynaBox.get = _maybe_return_from_gcp_and_update(
dynaconf.utils.boxing.DynaBox.get
)The bit about hooks was a suggestion. It seems like Pydantic could provide some mechanism to hook into access, with validation incorporated. But, again, this would probably go against Pydantic's ethos. It would also add some overhead to access which may be undesirable. Model and field validators hook into the validation process, not the access process. Unfortunately, those wouldn't help us solve the issue. |
Lazy Loading Support for Pydantic Settings Sources
Summary
This PR implements lazy loading for settings sources, deferring field value resolution until fields are accessed rather than fetching all values during initialization. This enables significant performance improvements for expensive operations such as API calls to cloud secret managers.
Solves #713
What Changed
Problem
Currently, all settings sources eagerly fetch values for every field during
Settingsinstantiation, even if those fields are never accessed. This is problematic for expensive operations:Solution: LazyMapping
The implementation introduces a
LazyMappingclass, a dict-like mapping that:__getitem__()When
lazy_load=True:__call__()LazyMappingis stored onsource._lazy_mappingTest Coverage
Note: Integration tests were performed for GCP Secret Manager. The fix is implemented at the
PydanticBaseEnvSettingsSourceclass level, so all inheriting providers automatically support lazy loading. The parameter was only added for GCP Secret Manager, but extending it to new providers should be as simple as adding the parameter.Why LazyMapping
Backward Compatibility
lazy_loaddefaults toFalse, preserving eager loading behavior.Alternative Approaches Considered
I don’t think this is the most intuitive implementation, and I initially wanted something simpler. However, I've discussed some other options and nothing convinced me:
Lazy attribute access on Settings (
__getattr__)settings.db_password)Separate
LazySettingsclassProperty-based field access
Async initialization
async def __init__()to fetch values asynchronouslyawait. Too invasive.