Settings: Custom parsing of environment variables (non-JSON) #1458

Closed
pjbull opened this issue Apr 29, 2020 · 15 comments · Fixed by #4406
Labels
feature request help wanted Pull Request welcome

Comments

@pjbull

pjbull commented Apr 29, 2020

Settings: Custom parsing of environment variables (non-JSON)

Often we want complex environment variables that are not represented as JSON. One example is a list of items. It's not uncommon to comma-delimit lists like this in bash:

import os
from typing import List

from pydantic import BaseSettings

os.environ['options'] = "a,b,c"

class Settings(BaseSettings):
    options: List

s = Settings()

This results in a JSONDecodeError. Writing a list of items as valid JSON is error-prone and not human-friendly to read in the context of environment variables.

Workaround

One workaround is to store the variable as valid JSON, which is tricky to type correctly in lots of systems where you have to enter environment variables:

OPTIONS='["a","b","c"]'

Another (simplified) workaround is to set json_loads, but it's not very elegant since json_loads doesn't know which field it is parsing, which could be error-prone:

import json
import os
from typing import List

from pydantic import BaseSettings

os.environ['options'] = "a,b,c"
    
def list_parse_fallback(v):
    try:
        return json.loads(v)
    except Exception:
        return v.split(",")

class Settings(BaseSettings):
    options: List
        
    class Config:
        json_loads = list_parse_fallback
       
s = Settings()

I can see a couple of options for implementing the fix:

1. Store the parsing method in the field info extra:

parse_func = lambda x: x.split(",")

class Settings(BaseSettings):
    options: List = Field(..., env_parse=parse_func)

If we take this approach, I think that we can update this logic branch:
https://github.com/samuelcolvin/pydantic/blob/42395056e18dfaa3ef299373374ab3b12bb196ac/pydantic/env_settings.py#L170-L174

Adding something like the following:

if field.is_complex():
    if field.extra.get("env_parse", None) is not None:
        try:
            env_val = field.extra["env_parse"](env_val)  # type: ignore
        except ValueError as e:
            raise SettingsError(f'error with custom parsing function for "{env_name}"') from e
    else:
        try:
            env_val = self.__config__.json_loads(env_val)  # type: ignore
        except ValueError as e:
            raise SettingsError(f'error parsing JSON for "{env_name}"') from e
d[field.alias] = env_val

2. Add a new config option just for Settings for overriding how env vars are parsed

Another implementation option is to add a new property like Settings.Config.parse_env_var, which takes the field and the raw value so that it can be overridden to dispatch to different parsing methods based on the field's name or other properties. (Currently, just overriding json_loads means you are passed a value without knowing where it will be stored, so you have to test for and handle all of the possible settings values.)

class BaseSettings:
    ...
    class Config:
        ...
        @classmethod
        def parse_env_var(cls, field, raw_val):
            return cls.json_loads(raw_val)

Then the following line is the only change that is needed:
https://github.com/samuelcolvin/pydantic/blob/master/pydantic/env_settings.py#L62

That line changes to self.__config__.parse_env_var(field, env_val).
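
With that hook in place, a user could then override parsing per field while keeping the JSON fallback for everything else. A sketch, using the signature proposed above (the comma-splitting is just illustrative, and the override only has an effect once such a hook actually exists):

from typing import List

from pydantic import BaseSettings


class Settings(BaseSettings):
    options: List[str] = []
    limits: dict = {}

    class Config:
        @classmethod
        def parse_env_var(cls, field, raw_val):
            # comma-split just the "options" field,
            # fall back to JSON for every other complex field
            if field.name == 'options':
                return raw_val.split(',')
            return cls.json_loads(raw_val)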

3. Call field validators on the raw string instead of trying to load from json first

Change the same line to:

env_val, ee = field.validate(env_val)

# collect ee and raise errors

Pros:

  • Adding validators for fields is well documented / understood

Cons:

  • Breaks existing JSON functionality if those fields already have validators
  • Mixes up the abstractions in that the functions would now do parsing and validation

4. Custom (de)serialization

Let fields implement custom serialization/deserialization methods. Currently there is json_encoders but not an equivalent json_decoders for use per-field.

There's some discussion of this here: #951
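
Purely to illustrate the shape this could take, a per-field counterpart to json_encoders might look like this (json_decoders does not exist in pydantic; the name and mapping below are hypothetical):

from typing import List

from pydantic import BaseSettings


class Settings(BaseSettings):
    options: List[str] = []

    class Config:
        # hypothetical counterpart to json_encoders, keyed by field name
        json_decoders = {
            'options': lambda raw: raw.split(','),
        }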

5. Something else

Other ideas? Happy to implement a different suggestion.


Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.5.1
            pydantic compiled: True
                 install path: /Users/bull/miniconda3/envs/sandbox/lib/python3.7/site-packages/pydantic
               python version: 3.7.6 (default, Jan  8 2020, 13:42:34)  [Clang 4.0.1 (tags/RELEASE_401/final)]
                     platform: Darwin-19.4.0-x86_64-i386-64bit
     optional deps. installed: []

@samuelcolvin
Member

I think we should remove the if field.is_complex() bit of BaseSettings and replace it with a universal validator validate('*', pre=True) which tries decoding JSON, but in the case of lists also supports comma-separated values.

That way you could easily override the validator if you so wished.

I think this should be backwards compatible so could be done before v2. PR welcome.
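
A rough sketch of what that universal validator could look like on BaseSettings, assuming the JSON-decoding branch is removed from _build_environ first (the comma fallback only kicks in for sequence-shaped fields; names are illustrative):

from typing import Any, List

from pydantic import BaseSettings, validator
from pydantic.fields import SHAPE_LIST, SHAPE_SET, SHAPE_TUPLE, ModelField


class SettingsSketch(BaseSettings):
    options: List[str] = []

    @validator('*', pre=True)
    def _decode_complex_env(cls, v: Any, field: ModelField) -> Any:
        # only raw strings destined for complex fields need decoding
        if isinstance(v, str) and field.is_complex():
            try:
                return cls.__config__.json_loads(v)
            except ValueError:
                # fall back to comma-separated values for list-like fields
                if field.shape in (SHAPE_LIST, SHAPE_SET, SHAPE_TUPLE):
                    return [item.strip() for item in v.split(',')]
                raise
        return v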

samuelcolvin added the help wanted Pull Request welcome label Apr 30, 2020
@jayqi

jayqi commented May 2, 2020

Hi @samuelcolvin, I took a stab at your suggested implementation of moving the json decoding out of the _build_environ method and into a universal validator. Here is what that looks like (before trying to add in any parsing of comma-separated lists):

class BaseSettings(BaseModel):

    ...

    @validator('*', pre=True)
    def validate_env_vars(cls, v, config, field):
        if isinstance(v, str) and field.is_complex():
            try:
                return config.json_loads(v)  # type: ignore
            except ValueError as e:
                raise SettingsError(f'error parsing JSON for "{field}"') from e
        return v

Unfortunately, this implementation ran into a problem with two tests: test_nested_env_with_basemodel and test_nested_env_with_dict.

These two tests have a field top that is an inner mapping with fields apple and banana. These two fields are specified in different places and stitched together: the dict {'apple': 'value'} is passed into the constructor and the string '{"banana": "secret_value"}' is set as an environment variable. In the previous implementation, everything happens inside __init__: _build_environ runs first, which decodes the banana JSON into a dict, and then the two sources are stitched together with the deep_update function.

Moving the JSON decoding into a validator breaks this functionality because the validator only runs after the input sources have already been merged in __init__. Because the banana value is still a string at that point, deep_update isn't able to merge the two objects and the apple dict simply replaces it.
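
For reference, a minimal sketch of the scenario those tests exercise (assuming default env naming, so the variable is just top): the env var supplies banana, the constructor supplies apple, and the current implementation merges the two before validation.

import os

from pydantic import BaseModel, BaseSettings

os.environ['top'] = '{"banana": "secret_value"}'


class Top(BaseModel):
    apple: str
    banana: str


class Settings(BaseSettings):
    top: Top


# _build_environ JSON-decodes the env var before deep_update runs,
# so both sources end up merged into one dict
s = Settings(top={'apple': 'value'})
assert s.top.apple == 'value' and s.top.banana == 'secret_value'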

Option A: Drop support for merging nested objects

One option would be to drop support for merging nested objects. Complex environment variables seem like an unusual use case to start with, and needing to merge partially specified ones from different sources seems even more unusual.

I discussed this with @pjbull and this is what we like the most: it would simplify the code and remove the need for a deep update (a shallow update would be enough).

Option B: Merging input streams after the universal decoding validator

Currently input streams are merged in __init__ before running the validators. We could try an approach where the universal validator that does decoding runs on each of the input streams first, and then they get merged after.

This doesn't really fit into the existing flow very neatly and would involve making some part of the validation flow more complex.

Option C: Keeping decoding in _build_environ and use a different approach

Such as the approaches that @pjbull brainstormed.

@chbndrhnns
Contributor

I faced this issue today as well. I wanted to parse something like this:

REDIS_SENTINELS=192.168.0.1 192.168.0.2

using such a settings class:

class S(BaseSettings):
    sentinels: List[str] = Field(..., env='REDIS_SENTINELS')

    @validator('sentinels', pre=True)
    def validate(cls, val):
        return val.split(' ')

@pikeas

pikeas commented Aug 4, 2020

Ditto, I'd like to validate a string env var and transform it into a valid submodel. For example, MY_DB=user:pass@server?max_connections=100 -> a settings model containing a DB submodel with valid DSN and other settings. Currently, it seems I can only pass MY_DB as a stringified JSON object.
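
For concreteness, the only form that currently works for a submodel is a JSON object (DBSettings and the field names below are just illustrative):

import os

from pydantic import BaseModel, BaseSettings, Field

# currently the env var has to be a JSON object...
os.environ['MY_DB'] = '{"dsn": "postgres://user:pass@server", "max_connections": 100}'
# ...rather than the more natural "user:pass@server?max_connections=100"


class DBSettings(BaseModel):
    dsn: str
    max_connections: int = 10


class Settings(BaseSettings):
    my_db: DBSettings = Field(..., env='MY_DB')


assert Settings().my_db.max_connections == 100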

@pjbull
Author

pjbull commented Aug 4, 2020

Given the complexity of these potential use cases, I'm kind of liking something like proposal (1) in the first comment in this thread. I may have some time on Friday to implement it if that approach is interesting.

@zdens

zdens commented Oct 27, 2020

Hi all,
I also ran into this issue today.

Taking this example into account, after almost a day of digging I realized that the validator is not fired when the class attribute is of a List type. If the attribute type is str, for example, the validator works just fine.

I used this code:

import os
from typing import List
from pydantic import BaseSettings, validator

os.environ['test_var'] = 'test_val'


class S1(BaseSettings):
    test_var: str
    
    @validator('test_var', pre=True)
    def val_func(cls, v):
        print('this validator is called: {}'.format(v))
        return v

class S2(BaseSettings):
    test_var: List[str]

    @validator('test_var', pre=True)
    def val_func(cls, v):
        print('this validator is called: {}'.format(v))
        return [v]

Then instantiating s1 = S1() prints this validator is called: test_val, while s2 = S2() raises an error and prints nothing:

>>> s2 = S2()
Traceback (most recent call last):
  File "pydantic/env_settings.py", line 118, in pydantic.env_settings.BaseSettings._build_environ
  File "/home/den/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/den/anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/den/anaconda3/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "pydantic/env_settings.py", line 35, in pydantic.env_settings.BaseSettings.__init__
  File "pydantic/env_settings.py", line 48, in pydantic.env_settings.BaseSettings._build_values
  File "pydantic/env_settings.py", line 120, in pydantic.env_settings.BaseSettings._build_environ
pydantic.env_settings.SettingsError: error parsing JSON for "test_var"

Are there any errors in my example code?

@jayqi

jayqi commented Oct 29, 2020

@zdens per some of the discussion earlier in the thread (kind of mixed in with the proposed changes), the reason you don't see the validator firing is that the failing code is the part that parses the environment variables, and that runs before the validators.

Your environment variable for the list needs to be valid JSON, like this:

OPTIONS='["a","b","c"]'

That's why you see a JSONDecodeError.
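
Applied to the S2 example above, that looks like this (note the validator then receives the already-decoded list, so it should return v rather than [v]):

import os
from typing import List

from pydantic import BaseSettings, validator

os.environ['test_var'] = '["test_val"]'  # JSON, so _build_environ can decode it


class S2(BaseSettings):
    test_var: List[str]

    @validator('test_var', pre=True)
    def val_func(cls, v):
        print('this validator is called: {}'.format(v))
        return v  # already a list after JSON decoding


s2 = S2()  # prints: this validator is called: ['test_val']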

@zdens

zdens commented Oct 29, 2020

@jayqi,
thank you for your reply!
Now I understand why this thread was started.

@pmsoltani

As a workaround (until something solid is implemented) I changed the following:

CORS_ORIGINS: List[AnyHttpUrl] = Field(..., env="CORS_ORIGINS")

into this:

CORS_ORIGINS: Union[str, List[AnyHttpUrl]] = Field(..., env="CORS_ORIGINS")

Which, admittedly, is not elegant but enables pydantic to fire up the validator function:

    @validator("CORS_ORIGINS", pre=True)
    def _assemble_cors_origins(cls, cors_origins):
        if isinstance(cors_origins, str):
            return [item.strip() for item in cors_origins.split(",")]
        return cors_origins

My .env file:

# Use comma (,) to separate URLs
CORS_ORIGINS=http://localhost:3000,http://localhost:8000
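
For what it's worth, this seems to work because a Union-typed field is not treated as "complex", so _build_environ passes the raw string through to the validator instead of attempting json.loads on it. A quick check, assuming the field and validator above live on a class called Settings and the variable is set in the environment:

import os

os.environ['CORS_ORIGINS'] = 'http://localhost:3000,http://localhost:8000'

settings = Settings()
assert [str(url) for url in settings.CORS_ORIGINS] == [
    'http://localhost:3000',
    'http://localhost:8000',
]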

@samuelcolvin
Member

I've just discovered how annoying this can be myself while working on a GitHub action for pydantic 🙈.

I'd love someone to fix this, otherwise I will soon.

@tiangolo
Member

tiangolo commented May 3, 2021

I plan to work on #1848 again soon (re-implement/finish it), but only after finishing #2721.

@pjbull
Author

pjbull commented May 3, 2021

🎉 I think that if #1848 goes in then we can check for those types here:
https://github.com/samuelcolvin/pydantic/blob/42395056e18dfaa3ef299373374ab3b12bb196ac/pydantic/env_settings.py#L170-L174

And then we can keep the JSON fallback behavior as well.

@jooola

jooola commented Jan 1, 2022

I suggest that this issue should also consider adding the ability to parse environment variables into dictionaries directly:

import os
from typing import Optional, Dict
from pydantic import BaseSettings, Field

os.environ["COLORS"] = "red:#FF0000,blue:#0000FF,green:#00FF00"


class Example(BaseSettings):
    colors: Optional[Dict[str, str]] = Field(None, env="COLORS")

# The result would be:
# colors == {"red": "#FF0000", "blue": "#0000FF", "green": "#00FF00"}
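
In the meantime, a workaround in the same spirit as the Union trick above seems to get there today (widening the annotation so the raw string reaches a pre-validator; the colon/comma parsing is just a sketch):

import os
from typing import Dict, Optional, Union

from pydantic import BaseSettings, Field, validator

os.environ["COLORS"] = "red:#FF0000,blue:#0000FF,green:#00FF00"


class Example(BaseSettings):
    # adding str to the annotation keeps _build_environ from attempting
    # a JSON decode, so the raw string reaches the validator below
    colors: Optional[Union[str, Dict[str, str]]] = Field(None, env="COLORS")

    @validator("colors", pre=True)
    def _split_colors(cls, v):
        if isinstance(v, str):
            return dict(item.split(":", 1) for item in v.split(","))
        return v


assert Example().colors == {"red": "#FF0000", "blue": "#0000FF", "green": "#00FF00"}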

@filipe-cantarelli

It seems that with the addition of SecretsSettingsSource the parsing code got a little bit duplicated.

I think that now it makes more sense to follow Option 1 described by @pjbull, but also to delegate the whole parsing logic to the Field, i.e. field.parse(value), instead of having the following code in every source:

if field.is_complex():
    try:
        secret_value = settings.__config__.json_loads(secret_value)
    except ValueError as e:
        raise SettingsError(f'error parsing JSON for "{env_name}"') from e

I could try this implementation if desired @samuelcolvin.
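
Roughly, the shared step could look like this, whether as a field.parse() method or a helper each source calls (parse_field_value is a hypothetical name; env_parse is the per-field hook from Option 1):

import json
from typing import Any


def parse_field_value(field: Any, raw_val: str, json_loads=json.loads) -> Any:
    """Hypothetical shared parsing step each settings source would call."""
    if not field.is_complex():
        return raw_val
    # per-field hook from Option 1, if present; otherwise fall back to JSON
    custom = field.field_info.extra.get('env_parse')
    if custom is not None:
        return custom(raw_val)
    return json_loads(raw_val)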

@rt87

rt87 commented Aug 18, 2022

+1, this JSON requirement for everything that is of a non-basic type is somewhat annoying.

acmiyaguchi added a commit to acmiyaguchi/pydantic that referenced this issue Aug 19, 2022
acmiyaguchi added a commit to acmiyaguchi/pydantic that referenced this issue Aug 19, 2022
samuelcolvin added a commit that referenced this issue Aug 22, 2022
…se_env_var in Config object (#4406)

* Fix #1458 - Allow for custom parsing of environment variables via env_parse

* Add docs for env_parse usage

* Add changes file for #3977

* fixup: remove stray print statement

* Revert env_parse property on field

* Add parse_env_var classmethod in nested Config

* Update documentation for parse_env_var

* Update changes file.

* fixup: linting in example

* Rebase and remove quotes around imported example

* fix example

* my suggestions

* remove unnecessary Field(env_parse=_parse_custom_dict)

Co-authored-by: Samuel Colvin <s@muelcolvin.com>