Settings: Custom parsing of environment variables (non-JSON) #1458

Closed
pjbull opened this issue Apr 29, 2020 · 15 comments · Fixed by #4406
Labels
feature request help wanted Pull Request welcome

Comments

@pjbull

pjbull commented Apr 29, 2020

Settings: Custom parsing of environment variables (non-JSON)

Often we want complex environment variables that are not represented as JSON. One example is a list of items. It's not uncommon to comma-delimit lists like this in bash:

import os
from typing import List

from pydantic import BaseSettings

os.environ['options'] = "a,b,c"

class Settings(BaseSettings):
    options: List

s = Settings()

This results in a JSONDecodeError. Writing a list of items as valid JSON is error-prone and not human-friendly to read in the context of environment variables.

Workaround

One workaround is to store the variable as valid JSON, which is tricky to type correctly in lots of systems where you have to enter environment variables:

OPTIONS='["a","b","c"]'

Another (simplified) workaround is to set json_loads, but it's not very elegant since json_loads doesn't know which field it is parsing, which could be error-prone:

import json
import os
from typing import List

from pydantic import BaseSettings

os.environ['options'] = "a,b,c"
    
def list_parse_fallback(v):
    try:
        return json.loads(v)
    except Exception:
        return v.split(",")

class Settings(BaseSettings):
    options: List
        
    class Config:
        json_loads = list_parse_fallback
       
s = Settings()

I can see a couple of options for implementing the fix:

1. Store the parsing method in the field info extra:

parse_func = lambda x: x.split(",")

class Settings(BaseSettings):
    options: List = Field(..., env_parse=parse_func)

If we take this approach, I think that we can update this logic branch:
https://github.com/samuelcolvin/pydantic/blob/42395056e18dfaa3ef299373374ab3b12bb196ac/pydantic/env_settings.py#L170-L174

Adding something like the following:

if field.is_complex():
    if field.extra.get("env_parse", None) is not None:
        try:
            env_val = field.extra["env_parse"](env_val)  # type: ignore
        except ValueError as e:
            raise SettingsError(f'error with custom parsing function for "{env_name}"') from e
    else:
        try:
            env_val = self.__config__.json_loads(env_val)  # type: ignore
        except ValueError as e:
            raise SettingsError(f'error parsing JSON for "{env_name}"') from e
d[field.alias] = env_val

2. Add a new config option just for Settings for overriding how env vars are parsed

Another implementation option is to add a new property like Settings.Config.parse_env_var, which takes the field and the raw value so that it can be overridden to dispatch to different parsing methods based on the field's name or other properties. (Currently, just overriding json_loads means you are passed a value without knowing where it will be stored, so you have to test for and handle all of the possible settings values.)

class BaseSettings:
    ...
    class Config:
        ...
        @classmethod
        def parse_env_var(cls, field, raw_val):
            return cls.json_loads(raw_val)

Then the following line is the only change that is needed:
https://github.com/samuelcolvin/pydantic/blob/master/pydantic/env_settings.py#L62

That line changes to self.__config__.parse_env_var(field, env_val).
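
With that hook in place, a user could then override parsing per field while keeping the JSON fallback for everything else. A sketch, using the signature proposed above (the comma-splitting is just illustrative, and the override only has an effect once such a hook actually exists):

from typing import List

from pydantic import BaseSettings


class Settings(BaseSettings):
    options: List[str] = []
    limits: dict = {}

    class Config:
        @classmethod
        def parse_env_var(cls, field, raw_val):
            # comma-split just the "options" field,
            # fall back to JSON for every other complex field
            if field.name == 'options':
                return raw_val.split(',')
            return cls.json_loads(raw_val)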

3. Call field validators on the raw string instead of trying to load from json first

Change the same line to:

env_val, ee = field.validate(env_val)

# collect ee and raise errors

Pros:

  • Adding validators for fields is well documented / understood

Cons:

  • Breaks existing JSON functionality if those fields already have validators
  • Mixes up the abstractions in that the functions would now do parsing and validation

4. Custom (de)serialization

Let fields implement custom serialization/deserialization methods. Currently there is json_encoders but not an equivalent json_decoders for use per-field.

There's some discussion of this here: #951
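
Purely to illustrate the shape this could take, a per-field counterpart to json_encoders might look like this (json_decoders does not exist in pydantic; the name and mapping below are hypothetical):

from typing import List

from pydantic import BaseSettings


class Settings(BaseSettings):
    options: List[str] = []

    class Config:
        # hypothetical counterpart to json_encoders, keyed by field name
        json_decoders = {
            'options': lambda raw: raw.split(','),
        }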

5. Something else

Other ideas? Happy to implement a different suggestion.


Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.5.1
            pydantic compiled: True
                 install path: /Users/bull/miniconda3/envs/sandbox/lib/python3.7/site-packages/pydantic
               python version: 3.7.6 (default, Jan  8 2020, 13:42:34)  [Clang 4.0.1 (tags/RELEASE_401/final)]
                     platform: Darwin-19.4.0-x86_64-i386-64bit
     optional deps. installed: []

@samuelcolvin
Member

I think we should remove the if field.is_complex() bit of BaseSettings and replace it with a universal validator validate('*', pre=True) which tries decoding JSON, but in the case of lists also supports comma-separated values.

That way you could easily override the validator if you so wished.

I think this should be backwards compatible so could be done before v2. PR welcome.
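
A rough sketch of what that universal validator could look like on BaseSettings, assuming the JSON-decoding branch is removed from _build_environ first (the comma fallback only kicks in for sequence-shaped fields; names are illustrative):

from typing import Any, List

from pydantic import BaseSettings, validator
from pydantic.fields import SHAPE_LIST, SHAPE_SET, SHAPE_TUPLE, ModelField


class SettingsSketch(BaseSettings):
    options: List[str] = []

    @validator('*', pre=True)
    def _decode_complex_env(cls, v: Any, field: ModelField) -> Any:
        # only raw strings destined for complex fields need decoding
        if isinstance(v, str) and field.is_complex():
            try:
                return cls.__config__.json_loads(v)
            except ValueError:
                # fall back to comma-separated values for list-like fields
                if field.shape in (SHAPE_LIST, SHAPE_SET, SHAPE_TUPLE):
                    return [item.strip() for item in v.split(',')]
                raise
        return v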

samuelcolvin added the help wanted Pull Request welcome label Apr 30, 2020
@jayqi

jayqi commented May 2, 2020

Hi @samuelcolvin, I took a stab at your suggested implementation of moving the json decoding out of the _build_environ method and into a universal validator. Here is what that looks like (before trying to add in any parsing of comma-separated lists):

class BaseSettings(BaseModel):

    ...

    @validator('*', pre=True)
    def validate_env_vars(cls, v, config, field):
        if isinstance(v, str) and field.is_complex():
            try:
                return config.json_loads(v)  # type: ignore
            except ValueError as e:
                raise SettingsError(f'error parsing JSON for "{field}"') from e
        return v

Unfortunately, this implementation ran into a problem with two tests: test_nested_env_with_basemodel and test_nested_env_with_dict.

These two tests have a field top that is an inner mapping with fields apple and banana. These two fields are specified in different places and stitched together: the dict {'apple': 'value'} is passed into the constructor and the string '{"banana": "secret_value"}' is set as an environment variable. In the previous implementation, everything happens inside __init__: _build_environ runs first, which decodes the banana JSON into a dict, and then the two sources are stitched together with the deep_update function.

Moving the JSON decoding into a validator breaks this functionality because the validator only runs after the input sources have already been merged in __init__. Because the banana value is still a string at that point, deep_update isn't able to merge the two objects and the apple dict simply replaces it.
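
For reference, a minimal sketch of the scenario those tests exercise (assuming default env naming, so the variable is just top): the env var supplies banana, the constructor supplies apple, and the current implementation merges the two before validation.

import os

from pydantic import BaseModel, BaseSettings

os.environ['top'] = '{"banana": "secret_value"}'


class Top(BaseModel):
    apple: str
    banana: str


class Settings(BaseSettings):
    top: Top


# _build_environ JSON-decodes the env var before deep_update runs,
# so both sources end up merged into one dict
s = Settings(top={'apple': 'value'})
assert s.top.apple == 'value' and s.top.banana == 'secret_value'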

Option A: Drop support for merging nested objects

One option would be to drop support for merging nested objects. Complex environment variables seem like an unusual use case to start with, and needing to merge partially specified ones from different sources seems even more unusual.

I discussed this with @pjbull and this is what we like the most: it would simplify the code and remove the need for a deep update (a shallow update would be enough).

Option B: Merging input streams after the universal decoding validator

Currently input streams are merged in __init__ before running the validators. We could try an approach where the universal validator that does decoding runs on each of the input streams first, and then they get merged after.

This doesn't really fit into the existing flow very neatly and would involve making some part of the validation flow more complex.

Option C: Keeping decoding in _build_environ and use a different approach

Such as the approaches that @pjbull brainstormed.

@chbndrhnns
Contributor

I faced this issue today as well. I wanted to parse something like this:

REDIS_SENTINELS=192.168.0.1 192.168.0.2

using such a settings class:

class S(BaseSettings):
    sentinels: List[str] = Field(..., env='REDIS_SENTINELS')

    @validator('sentinels', pre=True)
    def validate(cls, val):
        return val.split(' ')

@pikeas

pikeas commented Aug 4, 2020

Ditto, I'd like to validate a string env var and transform it into a valid submodel. For example, MY_DB=user:pass@server?max_connections=100 -> a settings model containing a DB submodel with valid DSN and other settings. Currently, it seems I can only pass MY_DB as a stringified JSON object.
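
For concreteness, the only form that currently works for a submodel is a JSON object (DBSettings and the field names below are just illustrative):

import os

from pydantic import BaseModel, BaseSettings, Field

# currently the env var has to be a JSON object...
os.environ['MY_DB'] = '{"dsn": "postgres://user:pass@server", "max_connections": 100}'
# ...rather than the more natural "user:pass@server?max_connections=100"


class DBSettings(BaseModel):
    dsn: str
    max_connections: int = 10


class Settings(BaseSettings):
    my_db: DBSettings = Field(..., env='MY_DB')


assert Settings().my_db.max_connections == 100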

@pjbull
Author

pjbull commented Aug 4, 2020

Given the complexity of these potential use cases, I'm kind of liking something like proposal (1) in the first comment in this thread. I may have some time on Friday to implement it if that approach is interesting.

@zdens

zdens commented Oct 27, 2020

Hi all,
I also ran into this issue today.

Taking this example into account, after almost a day of digging I realized that the validator is not fired when the class attribute is of a List type. If the attribute type is str, for example, the validator works just fine.

I used this code:

import os
from typing import List
from pydantic import BaseSettings, validator

os.environ['test_var'] = 'test_val'


class S1(BaseSettings):
    test_var: str
    
    @validator('test_var', pre=True)
    def val_func(cls, v):
        print('this validator is called: {}'.format(v))
        return v

class S2(BaseSettings):
    test_var: List[str]

    @validator('test_var', pre=True)
    def val_func(cls, v):
        print('this validator is called: {}'.format(v))
        return [v]

Then instantiating s1 = S1() prints this validator is called: test_val, while s2 = S2() raises an error and prints nothing:

>>> s2 = S2()
Traceback (most recent call last):
  File "pydantic/env_settings.py", line 118, in pydantic.env_settings.BaseSettings._build_environ
  File "/home/den/anaconda3/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/den/anaconda3/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/den/anaconda3/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "pydantic/env_settings.py", line 35, in pydantic.env_settings.BaseSettings.__init__
  File "pydantic/env_settings.py", line 48, in pydantic.env_settings.BaseSettings._build_values
  File "pydantic/env_settings.py", line 120, in pydantic.env_settings.BaseSettings._build_environ
pydantic.env_settings.SettingsError: error parsing JSON for "test_var"

Are there any errors in my example code?

@jayqi

jayqi commented Oct 29, 2020

@zdens per some of the discussion earlier in the thread (kind of mixed in with the proposed changes), the reason you don't see the validator firing is that the failing code is the part that parses the environment variables, and that runs before the validators.

Your environment variable for the list needs to be valid JSON, like this:

OPTIONS='["a","b","c"]'

That's why you see a JSONDecodeError.
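
Applied to the S2 example above, that looks like this (note the validator then receives the already-decoded list, so it should return v rather than [v]):

import os
from typing import List

from pydantic import BaseSettings, validator

os.environ['test_var'] = '["test_val"]'  # JSON, so _build_environ can decode it


class S2(BaseSettings):
    test_var: List[str]

    @validator('test_var', pre=True)
    def val_func(cls, v):
        print('this validator is called: {}'.format(v))
        return v  # already a list after JSON decoding


s2 = S2()  # prints: this validator is called: ['test_val']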

@zdens

zdens commented Oct 29, 2020

@jayqi,
thank you for your reply!
Now I understand why this thread was started.

@pmsoltani

As a workaround (until something solid is implemented) I changed the following:

CORS_ORIGINS: List[AnyHttpUrl] = Field(..., env="CORS_ORIGINS")

into this:

CORS_ORIGINS: Union[str, List[AnyHttpUrl]] = Field(..., env="CORS_ORIGINS")

Which, admittedly, is not elegant but enables pydantic to fire up the validator function:

    @validator("CORS_ORIGINS", pre=True)
    def _assemble_cors_origins(cls, cors_origins):
        if isinstance(cors_origins, str):
            return [item.strip() for item in cors_origins.split(",")]
        return cors_origins

My .env file:

# Use comma (,) to separate URLs
CORS_ORIGINS=http://localhost:3000,http://localhost:8000
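
For what it's worth, this seems to work because a Union-typed field is not treated as "complex", so _build_environ passes the raw string through to the validator instead of attempting json.loads on it. A quick check, assuming the field and validator above live on a class called Settings and the variable is set in the environment:

import os

os.environ['CORS_ORIGINS'] = 'http://localhost:3000,http://localhost:8000'

settings = Settings()
assert [str(url) for url in settings.CORS_ORIGINS] == [
    'http://localhost:3000',
    'http://localhost:8000',
]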

@samuelcolvin
Member

I've just discovered how annoying this can be myself while working on a GitHub action for pydantic 🙈.

I'd love someone to fix this, otherwise I will soon.

@tiangolo
Member

tiangolo commented May 3, 2021

I plan to work on #1848 again soon (re-implement/finish it), but only after finishing #2721.

@pjbull
Author

pjbull commented May 3, 2021

🎉 I think that if #1848 goes in then we can check for those types here:
https://github.com/samuelcolvin/pydantic/blob/42395056e18dfaa3ef299373374ab3b12bb196ac/pydantic/env_settings.py#L170-L174

And then we can keep the JSON fallback behavior as well.

@jooola

jooola commented Jan 1, 2022

I suggest that this issue should also consider adding the ability to parse environment variables into dictionaries directly:

import os
from typing import Optional, Dict
from pydantic import BaseSettings, Field

os.environ["COLORS"] = "red:#FF0000,blue:#0000FF,green:#00FF00"


class Example(BaseSettings):
    colors: Optional[Dict[str, str]] = Field(None, env="COLORS")

# The result would be:
# colors == {"red": "#FF0000", "blue": "#0000FF", "green": "#00FF00"}
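
In the meantime, a workaround in the same spirit as the Union trick above seems to get there today (widening the annotation so the raw string reaches a pre-validator; the colon/comma parsing is just a sketch):

import os
from typing import Dict, Optional, Union

from pydantic import BaseSettings, Field, validator

os.environ["COLORS"] = "red:#FF0000,blue:#0000FF,green:#00FF00"


class Example(BaseSettings):
    # adding str to the annotation keeps _build_environ from attempting
    # a JSON decode, so the raw string reaches the validator below
    colors: Optional[Union[str, Dict[str, str]]] = Field(None, env="COLORS")

    @validator("colors", pre=True)
    def _split_colors(cls, v):
        if isinstance(v, str):
            return dict(item.split(":", 1) for item in v.split(","))
        return v


assert Example().colors == {"red": "#FF0000", "blue": "#0000FF", "green": "#00FF00"}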

@filipe-cantarelli

It seems that with the addition of SecretsSettingsSource the parsing code got a little bit duplicated.

I think that now it makes more sense to follow Option 1 described by @pjbull, but also to delegate the whole parsing logic to the Field, i.e. field.parse(value), instead of having the following code in every source:

if field.is_complex():
    try:
        secret_value = settings.__config__.json_loads(secret_value)
    except ValueError as e:
        raise SettingsError(f'error parsing JSON for "{env_name}"') from e

I could try this implementation if desired @samuelcolvin.
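
Roughly, the shared step could look like this, whether as a field.parse() method or a helper each source calls (parse_field_value is a hypothetical name; env_parse is the per-field hook from Option 1):

import json
from typing import Any


def parse_field_value(field: Any, raw_val: str, json_loads=json.loads) -> Any:
    """Hypothetical shared parsing step each settings source would call."""
    if not field.is_complex():
        return raw_val
    # per-field hook from Option 1, if present; otherwise fall back to JSON
    custom = field.field_info.extra.get('env_parse')
    if custom is not None:
        return custom(raw_val)
    return json_loads(raw_val)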

@rt87

rt87 commented Aug 18, 2022

+1, this JSON requirement for everything that is of a non-basic type is somewhat annoying.

acmiyaguchi added a commit to acmiyaguchi/pydantic that referenced this issue Aug 19, 2022
acmiyaguchi added a commit to acmiyaguchi/pydantic that referenced this issue Aug 19, 2022
samuelcolvin added a commit that referenced this issue Aug 22, 2022
…se_env_var in Config object (#4406)

* Fix #1458 - Allow for custom parsing of environment variables via env_parse

* Add docs for env_parse usage

* Add changes file for #3977

* fixup: remove stray print statement

* Revert env_parse property on field

* Add parse_env_var classmethod in nested Config

* Update documentation for parse_env_var

* Update changes file.

* fixup: linting in example

* Rebase and remove quotes around imported example

* fix example

* my suggestions

* remove unnecessary Field(env_parse=_parse_custom_dict)

Co-authored-by: Samuel Colvin <s@muelcolvin.com>