Move json parsing for complex field to pydantic #16

Closed

hramezani opened this issue Apr 19, 2023 · 9 comments
@hramezani
Member

hramezani commented Apr 19, 2023

Currently, there is logic for parsing env variable values with json.loads whenever a field is complex.
We could move this parsing into Pydantic by using a hook that automatically inserts the JSON validator into the core schema when the field type is dict, list, set, etc.

Here is the suggested code by @samuelcolvin:

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Type[Any], handler: Callable[[Any], core_schema.CoreSchema]
    ) -> core_schema.CoreSchema:
        # Leave fields that are already wrapped in a JSON schema untouched.
        class WalkCoreSchemaIgnoreJson(WalkCoreSchema):
            def handle_json_schema(self, schema: core_schema.JsonSchema) -> core_schema.CoreSchema:
                return schema

        # Wrap list/dict schemas so JSON strings get decoded by pydantic-core.
        def insert_json_decoding(s: core_schema.CoreSchema) -> core_schema.CoreSchema:
            if s['type'] in ('list', 'dict'):
                return core_schema.json_schema(s)
            return s

        schema = handler(source)
        return WalkCoreSchemaIgnoreJson(insert_json_decoding, apply_before_recurse=False).walk(schema)
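
For reference, wrapping an inner schema in core_schema.json_schema is what makes pydantic-core decode a JSON string before validating it against the inner schema. A standalone sketch (not part of the suggested hook, just illustrating the wrapping):

    from pydantic_core import SchemaValidator, core_schema

    # Inner schema for list[int]; wrapping it in json_schema means a JSON
    # string is decoded first, then validated against the inner schema.
    inner = core_schema.list_schema(core_schema.int_schema())
    validator = SchemaValidator(core_schema.json_schema(inner))

    assert validator.validate_python('[1, 2, 3]') == [1, 2, 3]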
@hramezani
Member Author

@dmontagu helped me put together a version of WalkCoreSchema that can handle list, dict, set, etc.
But it is not enough for pydantic-settings. Here are the problems:

  • There is env_nested_delimiter logic that has to be applied to complex fields. For example:
    class SubValue(BaseSettings):
        v4: str

    class TopValue(BaseSettings):
        v1: str
        sub: SubValue

    class Cfg(BaseSettings):
        top: TopValue

    env.set('top', '{"v1": "json-1", "sub": {"v5": "xx"}}')
    env.set('top__sub__v5', '5')

In pydantic-settings v1, top.sub=5. First, get_field_value returns '{"v1": "json-1", "sub": {"v5": "xx"}}' for top, and then prepare_field_value json.loads that value and updates it with the nested env vars using deep_update (see the sketch after this list).
By wrapping the field with json_schema, we only have the string value returned by get_field_value, and the parsing happens in the model itself. So we can't use deep_update to merge the nested env vars into the value.

  • How can we handle a complex validation_alias? For example:
foo: str = Field(validation_alias=AliasChoices('foo', AliasPath('foo1', 'bar', 1)))
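
To make the v1 merging concrete, here is a rough sketch of the json.loads + deep_update step (deep_update written out inline as a plain recursive dict merge, not the actual pydantic-settings helper):

    import json

    def deep_update(base: dict, update: dict) -> dict:
        # Recursively merge `update` into `base`: nested dicts are merged,
        # anything else is overwritten by the higher-priority value.
        merged = dict(base)
        for key, value in update.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_update(merged[key], value)
            else:
                merged[key] = value
        return merged

    env_value = json.loads('{"v1": "json-1", "sub": {"v5": "xx"}}')
    nested_envs = {'sub': {'v5': '5'}}  # derived from top__sub__v5=5
    assert deep_update(env_value, nested_envs) == {'v1': 'json-1', 'sub': {'v5': '5'}}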

@samuelcolvin
Member

Yeah, I see the problem.

I don't think aliases will work.

Options:

  • Change the logic so you can use JSON or dot-separated paths, but not both
  • If the config flag is set, parse the JSON in pydantic-settings and deep merge, like we used to; make BaseSettings pass a mapping object to the validator which takes care of merging the different inputs?
  • Make a new validator in core which takes (json_str, overrides_mapping) as the input type

I think option two is best

@hramezani
Member Author

Another problem with the new approach. Consider the following code:

    class Settings(BaseSettings):
        top: Dict[str, str]

    env.set('top', '{"banana": "secret_value"}')
    s = Settings(top={'apple': 'value'})
    assert s.top == {'apple': 'value', 'banana': 'secret_value'}

The assertion above fails because InitSettingsSource returns {'top': {'apple': 'value'}} (the value of top is a dict) while EnvSettingsSource returns {'top': '{"banana": "secret_value"}'} (the value of top is a str), and these two values can't be deep merged.
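
To spell out the shape mismatch (the values below are copied from the example; the comments are mine):

    init_value = {'top': {'apple': 'value'}}            # from InitSettingsSource: already a dict
    env_value = {'top': '{"banana": "secret_value"}'}   # from EnvSettingsSource: still a raw JSON str

    # A deep merge of a dict with a str has no sensible result: one value simply
    # overwrites the other, so {'apple': 'value', 'banana': 'secret_value'} is
    # never produced unless pydantic-settings json.loads the env value first,
    # which is exactly the parsing this approach tries to move out.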

I am going to reject the new approach because:

  • The problems I mentioned above: I am afraid the hacks needed to fix them would make the code complex
  • We can't get rid of JSON parsing completely; we would still need to handle it in some special cases
  • The walking logic itself can be complex

What do you think, @samuelcolvin?

@samuelcolvin
Member

Agreed, sorry for the mistaken suggestion.

@samuelcolvin
Member

But we do need a config flag for whether to parse JSON, and we do need to inspect the core_schema to see if a field is "complex".

@hramezani
Member Author

But we do need a config flag for whether to parse JSON

Why do we need the flag?

and we do need to inspect the core_schema to see if a field is "complex".

We still need to handle the complex-field logic. I am still trying to find a way to do it without inspecting the core_schema.

@samuelcolvin
Member

Why do we need the flag?

There's an issue somewhere on pydantic (can't find it right now) where people completely legitimately want to parse things in their own way using validators, but the JSON parsing is getting in the way. Users need a way to disable that.

But maybe it's sufficient to have the process_field method; then they can subclass the source and disable JSON parsing.

@hramezani
Member Author

Yes, they can override prepare_field_value and have the unparsed data available there.
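
For anyone reading along, a minimal sketch of what that could look like with the pydantic-settings v2 source API (RawEnvSettingsSource and the comma-separated parsing are illustrative, not part of the library):

    from typing import Any

    from pydantic import field_validator
    from pydantic.fields import FieldInfo
    from pydantic_settings import BaseSettings, EnvSettingsSource


    class RawEnvSettingsSource(EnvSettingsSource):
        def prepare_field_value(
            self, field_name: str, field: FieldInfo, value: Any, value_is_complex: bool
        ) -> Any:
            # Skip the built-in JSON parsing for complex fields and hand the
            # raw string straight to the model.
            return value


    class Settings(BaseSettings):
        tags: list[str] = []

        @field_validator('tags', mode='before')
        @classmethod
        def parse_tags(cls, v: Any) -> Any:
            # Parse the unparsed env value in a custom way, e.g. comma-separated.
            return v.split(',') if isinstance(v, str) else v

        @classmethod
        def settings_customise_sources(
            cls,
            settings_cls,
            init_settings,
            env_settings,
            dotenv_settings,
            file_secret_settings,
        ):
            # Replace the default env source with the JSON-free variant.
            return (
                init_settings,
                RawEnvSettingsSource(settings_cls),
                dotenv_settings,
                file_secret_settings,
            )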

@hramezani
Member Author

We decided not to use this approach.
