feat: Add the ability to add extra settings sources #2107

Merged
merged 7 commits into from Feb 11, 2021

Conversation

kozlek
Contributor

@kozlek kozlek commented Nov 9, 2020

Change Summary

  • refactor BaseSettings internal logic
  • expose filter_relevant_env_vars and load_env_vars_from_source functions to ease creation of external sources plugins
  • add extra_settings_sources ModelConfig key for BaseSettings

More info in the linked issue.

Related issue number

#2106

Checklist

  • Unit tests for the changes exist
  • Tests pass on CI and coverage remains at 100%
  • Documentation reflects the changes where applicable
  • changes/<pull request or issue id>-<github username>.md file added describing change
    (see changes/README.md for details)

@codecov

codecov bot commented Nov 9, 2020

Codecov Report

Merging #2107 (25f0767) into master (13a5c7d) will decrease coverage by 0.11%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##            master    pydantic/pydantic#2107      +/-   ##
===========================================
- Coverage   100.00%   99.88%   -0.12%     
===========================================
  Files           21       22       +1     
  Lines         4199     4351     +152     
  Branches       854      875      +21     
===========================================
+ Hits          4199     4346     +147     
- Misses           0        5       +5     
Impacted Files Coverage Δ
pydantic/env_settings.py 100.00% <100.00%> (ø)
pydantic/types.py 100.00% <0.00%> (ø)
pydantic/fields.py 100.00% <0.00%> (ø)
pydantic/schema.py 100.00% <0.00%> (ø)
pydantic/typing.py 100.00% <0.00%> (ø)
pydantic/_hypothesis_plugin.py 95.68% <0.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Member

@samuelcolvin samuelcolvin left a comment

I think this could be useful, but I think it needs to be rethought. It will also need tests to pass and lots of docs.

)

def filter_relevant_env_vars(self, env_vars: Mapping[str, Optional[str]]) -> Dict[str, Optional[str]]:
Member

this method can be private. This also makes sure that it can't conflict with settings.

Member

same with most of the methods below.

Member

can you move this down to where the code was before? If so hopefully we can reduce the number of changes and keep history easy to track.

Contributor Author

I'm going to revert this modification as the new implementation will only use load_env_vars_from_source.
Also, load_env_vars_from_source will be made private to avoid conflicts as you requested.

for field in self.__fields__.values():
    for env_name in field.field_info.extra['env_names']:
        value = loader(env_name)
        if value != undefined:
Member

surely we should use None or KeyError here?

Contributor Author

I would prefer to keep undefined, as it lets a source return a value of any type, including None.
As for the KeyError check, I think it's too specific to dict-based sources: with a REST API based source, a call to loader can trigger an API call that simply returns a flat value.
What do you think?
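
To illustrate the argument for a sentinel, here is a minimal sketch. It is not the PR's code: `UNDEFINED`, `load_from_source` and `remote_loader` are hypothetical names, and the in-memory dict only stands in for a remote source that answers one name at a time.

```python
from typing import Any, Callable, Dict, Iterable

# Hypothetical sentinel: a unique object that no source can return by accident.
UNDEFINED = object()


def load_from_source(env_names: Iterable[str], loader: Callable[[str], Any]) -> Dict[str, Any]:
    """Collect every name the loader knows about, keeping explicit None values."""
    values: Dict[str, Any] = {}
    for env_name in env_names:
        value = loader(env_name)
        if value is not UNDEFINED:  # identity check, so None and '' still count as values
            values[env_name] = value
    return values


# Stand-in for a source that answers one name at a time (e.g. a remote API call).
_remote = {'timeout': None, 'host': 'example.test'}


def remote_loader(name: str) -> Any:
    return _remote.get(name, UNDEFINED)


result = load_from_source(['timeout', 'host', 'port'], remote_loader)
```

With this shape, `timeout` is kept with its explicit `None` value while `port`, which the source does not know, is skipped. A `None` or `KeyError` convention could not express both cases for a callable source.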

@@ -44,19 +45,58 @@ def _build_values(
_env_file_encoding: Optional[str] = None,
_secrets_dir: Union[Path, str, None] = None,
) -> Dict[str, Any]:
extra_settings = [source(self) for source in reversed(self.__config__.extra_settings_sources)]
Member

why the reverse?

Also shouldn't this be

Suggested change
- extra_settings = [source(self) for source in reversed(self.__config__.extra_settings_sources)]
+ extra_settings = [self.load_env_vars_from_source(source) for source in reversed(self.__config__.extra_settings_sources)]

?

I think this would reduce the logic and complexity in your custom loaders/sources.

Contributor Author

As explained in my latest comment, the settings source sequence is expressed from the highest priority to the lowest one.
We need to reverse it to send the settings dicts to deep_update from the lowest priority to the highest one.

In the next revision, I'll move to your implementation as all sources can be treated as case 2 sources.
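
The reason for the reversal can be shown with a small sketch. The `deep_update` below is a local stand-in written for this example (an assumption about how such a merge behaves, not a copy of pydantic's helper), and the three dicts are made-up source outputs.

```python
from typing import Any, Dict, List


def deep_update(base: Dict[str, Any], *updates: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `updates` into a copy of `base`; later dicts win."""
    merged = dict(base)
    for update in updates:
        for key, value in update.items():
            if isinstance(merged.get(key), dict) and isinstance(value, dict):
                merged[key] = deep_update(merged[key], value)
            else:
                merged[key] = value
    return merged


# Declared from highest to lowest priority, as in the discussion above.
sources_by_priority: List[Dict[str, Any]] = [
    {'db': {'host': 'from-init'}},                  # highest priority
    {'db': {'host': 'from-env', 'port': 5432}},
    {'db': {'user': 'from-file'}, 'debug': False},  # lowest priority
]

# Reverse so the highest-priority dict is applied last and overwrites the rest.
merged = deep_update({}, *reversed(sources_by_priority))
```

Because later arguments overwrite earlier ones, `db.host` ends up as `'from-init'`, while `port`, `user` and `debug` are filled in from the lower-priority sources.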

@samuelcolvin
Member

please see pydantic/pydantic-settings#32 and its PR #2154.

We need a way to make customising the priority of settings sources easier as well as adding new ones. I would propose something like this:

from typing import Tuple, Any, Callable
from pydantic import BaseSettings

SettingsSourceCallable = Callable[[str], Any]

class Settings(BaseSettings):
    foo: str
    bar: str

    class Config:
        @classmethod
        def customise_sources(
            cls,
            init_settings: SettingsSourceCallable,
            env_settings: SettingsSourceCallable,
            file_secret_settings: SettingsSourceCallable,
        ) -> Tuple[SettingsSourceCallable, ...]:
            return init_settings, env_settings, file_secret_settings

Here customise_sources performs the default behaviour, but you could change the behaviour by altering the order of the functions returned, or adding your own. Each function would then be passed to load_env_vars_from_source to build dicts which are then merged using deep_update like currently in _build_values.

Advantages of this approach:

  • customise_sources (or whatever we end up calling it) is a public function without the risk of clashing with fields
  • this allows (almost) complete flexibility with a reasonably simple interface

Disadvantages:

@kozlek
Contributor Author

kozlek commented Nov 30, 2020

First of all, thanks for your feedback 🙏

I agree with your solution involving a customise_sources classmethod. It gives the user full control, and the added complexity is not a concern since most people will stick with the standard env priority & sources.

SettingsSourceCallable implementation

About the implementation of SettingsSourceCallable, I thought we could split sources into two main categories:

  1. Sources that can provide a full dict with all of the source's variables. This is the case for the .env file or a JSON config file: it is more efficient to load and parse the whole file once than to read it for each settings field.
  2. Sources that load variables one by one. This is the case for Docker secrets or any proprietary API that doesn't provide a list endpoint: we may want to load a variable only when it is strictly necessary, i.e. when the variable cannot be found in higher-priority sources.
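
The difference between the two categories can be sketched as follows. This only illustrates the lazy lookup described in case 2; it is not how the PR merges sources, and every name here (`UNDEFINED`, `bulk_source_factory`, `slow_source`, `resolve`, `calls`) is hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence

UNDEFINED = object()  # hypothetical "not found" sentinel

calls: List[str] = []  # records which names hit the slow source


def bulk_source_factory(data: Dict[str, Any]) -> Callable[[str], Any]:
    """Case 1: the whole mapping is loaded once, lookups are then cheap."""
    return lambda name: data.get(name, UNDEFINED)


def slow_source(name: str) -> Any:
    """Case 2: each lookup is expensive, e.g. one API call per name."""
    calls.append(name)
    return {'token': 'from-api', 'region': 'from-api'}.get(name, UNDEFINED)


def resolve(names: Sequence[str], sources: Sequence[Callable[[str], Any]]) -> Dict[str, Any]:
    """Ask sources in priority order and stop at the first one that has a value."""
    resolved: Dict[str, Any] = {}
    for name in names:
        for source in sources:
            value = source(name)
            if value is not UNDEFINED:
                resolved[name] = value
                break
    return resolved


file_source = bulk_source_factory({'region': 'from-file'})
settings = resolve(['region', 'token'], [file_source, slow_source])
```

Here the slow source is only consulted for `token`, because `region` was already found in the higher-priority bulk source.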

In the first revision of the PR, I left the filter_relevant_env_vars and load_env_vars_from_source methods public, as utilities to help users build custom sources.
filter_relevant_env_vars separates the useful variables from the irrelevant ones thanks to its access to self.__fields__. If we don't filter out the extra variables, we are forced to use extra='ignore'. We also gain access to the field instance and the is_complex property, which is useful for loading nested settings.
load_env_vars_from_source uses the same access to load only the necessary fields.

However, I see now we can simplify the workflow if the custom source handles the "bulk loading" itself (case 1).

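import json
from functools import cached_property
from pathlib import Path
from typing import Any

from pydantic import BaseSettings
from pydantic.fields import ModelField

# `undefined` is the "value not found" sentinel discussed in this PR.

class JSONConfigSource:
    def __init__(self, path: Path = Path('config.json')):
        self.config_path = path

    @cached_property
    def json_config(self) -> dict:
        # Parse the file once; later lookups reuse the cached dict.
        return json.loads(self.config_path.read_text())

    def __call__(self, env_name: str, field: ModelField, settings: BaseSettings) -> Any:
        return self.json_config.get(env_name, undefined)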

In this case, all sources can be treated as case 2 and we can simplify the workflow as suggested. We still have to slightly edit the load_env_vars_from_source method to handle the is_complex case, but that's not complicated.
Even if most source implementations can work with only the env_name, I think we should pass both the field and settings instances to the SettingsSourceCallable, as they provide useful context. I probably won't use them in my own implementations, but others may.

Settings priority

As you mentioned in your comments, settings priority seems to be important for some people.
I think your solution fits their needs perfectly and goes even further, as it also makes it possible to disable some of the built-in sources!
As for how the desired order is expressed, I think we agree that settings sources are declared from the highest to the lowest priority.
That's why we have to reverse the sequence of settings before passing them to the deep_update method.

Custom source config

Built-in sources like .env or docker secrets can be configured directly using BaseSettings(_env_file=".env", _secrets_dir="/secrets/").
For now, custom sources can be configured using class-based callables:

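from pathlib import Path
from typing import Any, Callable, Tuple

from pydantic import BaseSettings
from .ext_sources import JSONConfigSource  # defined earlier

SettingsSourceCallable = Callable[[str], Any]  # as proposed above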

class Settings(BaseSettings):
    foo: str
    bar: str

    class Config:
        @classmethod
        def customise_sources(
            cls,
            init_settings: SettingsSourceCallable,
            env_settings: SettingsSourceCallable,
            file_secret_settings: SettingsSourceCallable,
        ) -> Tuple[SettingsSourceCallable, ...]:
            json_config_settings = JSONConfigSource(path=Path("config.json"))
            # here we disable both env_settings and file_secret_settings
            return init_settings, json_config_settings

settings = Settings()

That's not ideal but it works so we can start like this.

Based on your comments and the related issues, the whole feature appears clearer to me. Let me know if you want to add / change something else. I'll try to do the necessary changes by the end of the week, including docs & tests 🙂

@kozlek
Contributor Author

kozlek commented Dec 4, 2020

By defining a common type for all settings sources (SettingsSourceCallable = Callable[[str], Any]), we treat them all as equal.
But as of today, this is not true.

Init settings

init_settings has several particularities that make it different from the other sources:

  • all init variables must be loaded even if they are not defined in the model fields: depending on the extra setting, the model will raise a ValidationError (default, Extra.forbid) or ignore the fields.
  • init variables are loaded using field.alias while the other sources use env_names.
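
The two differences listed above can be sketched in plain Python. `FieldSpec`, `load_init` and `load_env` are hypothetical stand-ins written for this illustration; they are not pydantic's `ModelField` or the PR's loading code.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a model field; not pydantic's ModelField."""
    alias: str
    env_names: Tuple[str, ...]


def load_init(init_kwargs: Mapping[str, Any]) -> Dict[str, Any]:
    # Init kwargs are already keyed by alias and are passed through whole,
    # so unknown keys survive and the model's `extra` setting can act on them.
    return dict(init_kwargs)


def load_env(fields: Sequence[FieldSpec], environ: Mapping[str, str]) -> Dict[str, Any]:
    # Only names listed in env_names are looked up; anything else is dropped.
    values: Dict[str, Any] = {}
    for field in fields:
        for env_name in field.env_names:
            if env_name in environ:
                values[field.alias] = environ[env_name]
                break
    return values


fields = [FieldSpec(alias='foo', env_names=('app_foo', 'foo'))]
from_init = load_init({'foo': 'a', 'unknown': 1})
from_env = load_env(fields, {'app_foo': 'b', 'unrelated': 'x'})
```

The init source keeps `unknown`, which is what lets `Extra.forbid` raise later, whereas the env-style source silently drops `unrelated`.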

Beyond simply breaking unit tests, this would introduce breaking changes, which we might want to avoid.
Several solutions are possible:

  1. Accept the following breaking changes:
       • Extra init kwargs will always be ignored (the extra setting will be useless)
       • Init kwargs will have to be named using the env_name rather than the field.name (not really ideal 😕)
  2. Treat InitSettingsSource as a special case: either force it to be the first settings source (not acceptable for "BaseSettings: Customization of field value priority via Config", pydantic-settings#32) or always load all variables for this source. This source would also use a custom load_env_from_source method that uses field.name instead of env_names.
  3. Add field.name to the env_names, so both env_names and field.name will be used: this could lead to variables being loaded unexpectedly after upgrading pydantic.
  4. Move the loading strategy inside the SettingsSourceCallable: this will work well and increase the , but it will break the DRY pattern offered by the load_env_from_source method.

For now, I'll start my implementation with solution 4), as it is the most complete and the closest to the current source code.

@kozlek kozlek force-pushed the extra-settings-sources branch 2 times, most recently from c21de41 to e8d65f6 Compare December 5, 2020 01:25
@kozlek
Contributor Author

kozlek commented Dec 5, 2020

I've modified my implementation to include a way of customising the priority of settings sources.
I had to make some tradeoffs to avoid breaking changes, as explained in my previous post; let me know if you want me to change something.
I also added some docs with examples explaining how to reorder the settings and add / remove settings sources.

Member

@samuelcolvin samuelcolvin left a comment

this is looking great, just a few things to fix.

If the default order of priority doesn't match your needs, it's possible to change it by overriding a config method:

```py
class Settings(BaseSettings):
Member

can you move this code (and the rest below) into python files in examples.

Member

you might need to add one or two files instead of ...

Contributor Author

Yep, it's cleaner this way 🙂

@michaeloliverx

This is a really useful feature. I would love to be able to load configuration via a local .toml config file while allowing overriding via environment variables.

Member

@PrettyWood PrettyWood left a comment

This customization will be very handy and is very explicit! LGTM! 👍
Samuel may still have some remarks though

Member

@PrettyWood PrettyWood left a comment

You just forgot to test your custom __repr__ for ...SettingsSource and to add a change file

@kozlek
Contributor Author

kozlek commented Jan 20, 2021

> This customization will be very handy and is very explicit! LGTM! 👍
> Samuel may still have some remarks though

Thanks 🙏

> You just forgot to test your custom __repr__ for ...SettingsSource and to add a change file

You're right, I added the missing test and a change file to my latest commit. Tell me if we are missing anything else 🙂

Co-authored-by: Eric Jolibois <em.jolibois@gmail.com>
@samuelcolvin samuelcolvin merged commit 1155de8 into pydantic:master Feb 11, 2021
@samuelcolvin
Member

thanks so much. I've extended the docs slightly as the original docs on this were pretty minimal.

@kozlek
Contributor Author

kozlek commented Feb 11, 2021

No problem, thanks for taking the time to merge this one 👌
Don't hesitate to ping me about settings-related issues & feature requests, it will be a pleasure to help when I have some time 😉

@DomWeldon

As a thank-you to everyone on this thread, I spent yesterday browsing the master branch and saw this hook before I saw this issue. I was just testing why it didn't work now and saw that it was because this feature isn't released yet!

It's a really nice implementation and fits my needs exactly - thanks. Looking forward to this hitting a new release.

In the meantime, my draft implementation - to pull some secrets from SSM in prod - is at the gist below.

https://gist.github.com/DomWeldon/ce7e070283d97368cd9abc5be71b247d

5 participants