Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split multiple types into anyOf subschema #20

Merged
merged 4 commits into from
May 9, 2023

Conversation

6293
Copy link
Contributor

@6293 6293 commented May 9, 2023

Closes #16

@6293 6293 requested a review from a team as a code owner May 9, 2023 07:51
@@ -175,6 +176,20 @@ pub enum JsonSchemaType {
Null,
}

impl From<JsonSchemaType> for InstanceType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have a conversion function for the opposite direction on crate::diff_walker::schemars_to_own. The reasoning at the time was that we don't want to make the conversion public API, but I actually have no strong opinion on it.

please either:

  • change this to be a simple function in crate::diff_walker
  • change schemars_to_own to be a impl From<InstanceType> for JsonSchemaType

the choice is yours, but they should be next to each other to ensure they are consistent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the latter idea


/// Split a schema into multiple schemas, one for each type in the multiple type.
/// Returns the new schema and whether the schema was changed.
fn split_types(schema_object: &mut SchemaObject) -> (SchemaObject, bool) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you can just do *schema_object = new_object, and change function signature to (&mut SchemaObject) -> bool, then you can avoid a clone() in case you don't need to change the schema

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for the insight

@untitaker
Copy link
Member

This is a completely unrealistic schema (I hope), but I'm just curious, do you think it is possible to handle this case and produce no diff?

        let lhs = json! {{
            "anyOf": [
                {"properties": {"foo": {"type": "integer"}}},
                {"properties": {"foo": {"type": "string"}}}
            ]
        }};
        let rhs = json! {{
            "properties": {
                "foo": {
                    "type": ["integer", "string"]
                }
            }
        }};

@untitaker untitaker merged commit 1c6b1cb into getsentry:main May 9, 2023
2 checks passed
@untitaker
Copy link
Member

Thank you!

@6293
Copy link
Contributor Author

6293 commented May 9, 2023

This is a completely unrealistic schema (I hope), but I'm just curious, do you think it is possible to handle this case and produce no diff?

We will need another internal conversion, though we should not convert them into rhs-ish style like the following:

{"properties": {"foo": {"anyOf": [{"type": "integer"}, {"type": "string"}]}}}

Because converting lhs into this style is not always possible. Consider more complex example:

"anyOf": [
    {"properties": {"foo": {"type": "integer"}, "bar": {"type": "integer"}}},
    {"properties": {"foo": {"type": "integer"}, "bar": {"type": "string"}}},
    {"properties": {"foo": {"type": "string"}, "bar": {"type": "integer"}}},
]

In this case, foo can be string only if bar is integer. Thus we are not able to write like

{"properties": {"foo": {"anyOf": [???]}, "bar": {"anyOf": [???]}}}

@6293
Copy link
Contributor Author

6293 commented May 9, 2023

Better conversion would be from rhs to lhs. e.g.

"anyOf": [{
    "properties": {
        "foo": {
            "type": ["integer", "string"]
        },
        "bar": {
            "type": ["integer", "string"]
        }
    }
}]

to

"anyOf": [
    {"properties": {"foo": {"type": "integer"}, "bar": {"type": "integer"}}},
    {"properties": {"foo": {"type": "integer"}, "bar": {"type": "string"}}},
    {"properties": {"foo": {"type": "string"}, "bar": {"type": "integer"}}},
    {"properties": {"foo": {"type": "string"}, "bar": {"type": "string"}}},
]

@untitaker
Copy link
Member

a fun problem to think about :) i am not sure if it is worth solving. if we go with the conversion from rhs to lhs as per your last comment, we should have some sort of limit in place to prevent combinatorial explosion and OOM. some good performance tests would be in https://github.com/getsentry/sentry-kafka-schemas/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better support for anyOf
2 participants