Move json parsing for complex field to pydantic #16

Closed

hramezani opened this issue Apr 19, 2023 · 9 comments
@hramezani
Member

hramezani commented Apr 19, 2023

Currently, there is logic for parsing env variable values with json.loads whenever a field is complex.
We could move this parsing into Pydantic by using a hook that automatically inserts the JSON validator into the core schema when the field type is dict, list, set, etc.

Here is the suggested code by @samuelcolvin:

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Type[Any], handler: Callable[[Any], core_schema.CoreSchema]
    ) -> core_schema.CoreSchema:
        # Leave fields that are already wrapped in a JSON schema untouched.
        class WalkCoreSchemaIgnoreJson(WalkCoreSchema):
            def handle_json_schema(self, schema: core_schema.JsonSchema) -> core_schema.CoreSchema:
                return schema

        # Wrap list/dict schemas so JSON strings get decoded by pydantic-core.
        def insert_json_decoding(s: core_schema.CoreSchema) -> core_schema.CoreSchema:
            if s['type'] in ('list', 'dict'):
                return core_schema.json_schema(s)
            return s

        schema = handler(source)
        return WalkCoreSchemaIgnoreJson(insert_json_decoding, apply_before_recurse=False).walk(schema)
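
For reference, wrapping an inner schema in core_schema.json_schema is what makes pydantic-core decode a JSON string before validating it against the inner schema. A standalone sketch (not part of the suggested hook, just illustrating the wrapping):

    from pydantic_core import SchemaValidator, core_schema

    # Inner schema for list[int]; wrapping it in json_schema means a JSON
    # string is decoded first, then validated against the inner schema.
    inner = core_schema.list_schema(core_schema.int_schema())
    validator = SchemaValidator(core_schema.json_schema(inner))

    assert validator.validate_python('[1, 2, 3]') == [1, 2, 3]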
@hramezani
Member Author

@dmontagu helped me put together a version of WalkCoreSchema that can handle list, dict, set, etc.
But it is not enough for pydantic-settings. Here are the problems:

  • There is env_nested_delimiter logic that has to be applied to complex fields. For example:
    class SubValue(BaseSettings):
        v4: str

    class TopValue(BaseSettings):
        v1: str
        sub: SubValue

    class Cfg(BaseSettings):
        top: TopValue

    env.set('top', '{"v1": "json-1", "sub": {"v5": "xx"}}')
    env.set('top__sub__v5', '5')

In pydantic-settings v1, top.sub=5. First, get_field_value returns '{"v1": "json-1", "sub": {"v5": "xx"}}' for top, and then prepare_field_value json.loads that value and updates it with the nested env vars using deep_update (see the sketch after this list).
By wrapping the field with json_schema, we only have the string value returned by get_field_value, and the parsing happens in the model itself. So we can't use deep_update to merge the nested env vars into the value.

  • How can we handle a complex validation_alias? For example:
foo: str = Field(validation_alias=AliasChoices('foo', AliasPath('foo1', 'bar', 1)))
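
To make the v1 merging concrete, here is a rough sketch of the json.loads + deep_update step (deep_update written out inline as a plain recursive dict merge, not the actual pydantic-settings helper):

    import json

    def deep_update(base: dict, update: dict) -> dict:
        # Recursively merge `update` into `base`: nested dicts are merged,
        # anything else is overwritten by the higher-priority value.
        merged = dict(base)
        for key, value in update.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_update(merged[key], value)
            else:
                merged[key] = value
        return merged

    env_value = json.loads('{"v1": "json-1", "sub": {"v5": "xx"}}')
    nested_envs = {'sub': {'v5': '5'}}  # derived from top__sub__v5=5
    assert deep_update(env_value, nested_envs) == {'v1': 'json-1', 'sub': {'v5': '5'}}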

@samuelcolvin
Member

Yeah, I see the problem.

I don't think aliases will work.

Options:

  • Change the logic so you can use JSON or dot-separated paths, but not both
  • If the config flag is set, parse the JSON in pydantic-settings and deep merge, like we used to; make BaseSettings pass a mapping object to the validator which takes care of merging the different inputs?
  • Make a new validator in core which takes (json_str, overrides_mapping) as the input type

I think option two is best

@hramezani
Member Author

Another problem with the new approach. Consider the following code:

    class Settings(BaseSettings):
        top: Dict[str, str]

    env.set('top', '{"banana": "secret_value"}')
    s = Settings(top={'apple': 'value'})
    assert s.top == {'apple': 'value', 'banana': 'secret_value'}

The assertion above fails because InitSettingsSource returns {'top': {'apple': 'value'}} (the value of top is a dict) while EnvSettingsSource returns {'top': '{"banana": "secret_value"}'} (the value of top is a str), and these two values can't be deep merged.
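
To spell out the shape mismatch (the values below are copied from the example; the comments are mine):

    init_value = {'top': {'apple': 'value'}}            # from InitSettingsSource: already a dict
    env_value = {'top': '{"banana": "secret_value"}'}   # from EnvSettingsSource: still a raw JSON str

    # A deep merge of a dict with a str has no sensible result: one value simply
    # overwrites the other, so {'apple': 'value', 'banana': 'secret_value'} is
    # never produced unless pydantic-settings json.loads the env value first,
    # which is exactly the parsing this approach tries to move out.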

I am going to reject the new approach because:

  • The problems I mentioned above: I am afraid the hacks needed to fix them would make the code complex
  • We can't get rid of JSON parsing completely; we would still need to handle it in some special cases
  • The walking logic itself can be complex

What do you think, @samuelcolvin?

@samuelcolvin
Member

Agreed, sorry for the mistaken suggestion.

@samuelcolvin
Member

But we do need a config flag for whether to parse JSON, and we do need to inspect the core_schema to see if a field is "complex".

@hramezani
Member Author

But we do need a config flag for whether to parse JSON

Why do we need the flag?

and we do need to inspect the core_schema to see if a field is "complex".

We still need to handle the complex-field logic. I am still trying to find a way to do it without inspecting the core_schema.

@samuelcolvin
Member

Why do we need the flag?

There's an issue somewhere on pydantic (can't find it right now) where people completely legitimately want to parse things in their own way using validators, but the JSON parsing is getting in the way. Users need a way to disable that.

But maybe it's sufficient to have the process_field method; then they can subclass the source and disable JSON parsing.

@hramezani
Member Author

Yes, they can override prepare_field_value and have the unparsed data available there.
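
For anyone reading along, a minimal sketch of what that could look like with the pydantic-settings v2 source API (RawEnvSettingsSource and the comma-separated parsing are illustrative, not part of the library):

    from typing import Any

    from pydantic import field_validator
    from pydantic.fields import FieldInfo
    from pydantic_settings import BaseSettings, EnvSettingsSource


    class RawEnvSettingsSource(EnvSettingsSource):
        def prepare_field_value(
            self, field_name: str, field: FieldInfo, value: Any, value_is_complex: bool
        ) -> Any:
            # Skip the built-in JSON parsing for complex fields and hand the
            # raw string straight to the model.
            return value


    class Settings(BaseSettings):
        tags: list[str] = []

        @field_validator('tags', mode='before')
        @classmethod
        def parse_tags(cls, v: Any) -> Any:
            # Parse the unparsed env value in a custom way, e.g. comma-separated.
            return v.split(',') if isinstance(v, str) else v

        @classmethod
        def settings_customise_sources(
            cls,
            settings_cls,
            init_settings,
            env_settings,
            dotenv_settings,
            file_secret_settings,
        ):
            # Replace the default env source with the JSON-free variant.
            return (
                init_settings,
                RawEnvSettingsSource(settings_cls),
                dotenv_settings,
                file_secret_settings,
            )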

@hramezani
Member Author

We decided not to use this approach.
