Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roundtrip of Substrait Plan doesn't provide the same output. #203

Open
vibhatha opened this issue Nov 16, 2023 · 7 comments
Open

Roundtrip of Substrait Plan doesn't provide the same output. #203

vibhatha opened this issue Nov 16, 2023 · 7 comments

Comments

@vibhatha
Copy link
Contributor

I wrote a test case to evaluate the following and once we do the roundrip we don't get the same plan.
I observed the diff and seems like the order of extension_uris comes in the opposite order

INPUT

    "extensionUris": [
        {
            "extensionUriAnchor": 2,
            "uri": "/functions_arithmetic.yaml"
        },
        {
            "extensionUriAnchor": 1,
            "uri": "/functions_string.yaml"
        }
    ],

OUTPUT

    "extensionUris": [
        {
            "extensionUriAnchor": 1,
            "uri": "/functions_string.yaml"
        },
        {
            "extensionUriAnchor": 2,
            "uri": "/functions_arithmetic.yaml"
        }
Plan.Builder builder = io.substrait.proto.Plan.newBuilder();
String jsonPlan = readJsonFile(("my_substrait_plan_in.json"));
JsonFormat.parser().merge(jsonPlan, builder);
var inputPlan =  builder.build();
io.substrait.plan.Plan plan = new ProtoPlanConverter(EXTENSION_COLLECTION).from(inputPlan);
PlanProtoConverter planProtoConverter = new PlanProtoConverter();
io.substrait.proto.Plan roundtripPlan = planProtoConverter.toProto(plan);
var roundtripJson = JsonFormat.printer().print(roundtripPlan);
ObjectMapper objectMapper = new ObjectMapper();
JsonNode originalTree = objectMapper.readTree(jsonPlan);
JsonNode roundTripTree = objectMapper.readTree(roundtripJson);
assert !originalTree.equals(roundTripTree);

I don't think this would do any harm execution wise, but one would expect it to be similar in property order.

@EpsilonPrime
Copy link
Member

The original source likely ordered the functions in alphabetical order and the Java code is likely ordering based on the order that the functions were discovered.

@vibhatha
Copy link
Contributor Author

Also I did check, the io.substrait.plan.Plan of corresponding plans would yields to false when we use the equals method.

@vibhatha
Copy link
Contributor Author

@EpsilonPrime seems like it. Would you recommend that this needs fixing?

@EpsilonPrime
Copy link
Member

One way to address this would be to implement a "normalize plan" method. It could sort the functions in place, populate a map with the old and new values, and then walk the protobuf replacing function references with the updated value. It could also do other normalization. I have something like that in substrait-cpp:

https://github.com/substrait-io/substrait-cpp/blob/main/src/substrait/textplan/converter/ReferenceNormalizer.cpp#L279

@vbarua
Copy link
Member

vbarua commented Nov 16, 2023

There's nothing in the spec that constrains that extensions need to be ordered in a specific way. I'm hesitant to officially encode any behaviour or expectations of ordering. That's not to say we can't provide a utility to normalize plans, it just wouldn't be based on anything in the spec.

Also I did check, the io.substrait.plan.Plan of corresponding plans would yields to false when we use the equals method.

If we consider these plans to be equal, this would be a good argument for a smarter equals method.

@vibhatha
Copy link
Contributor Author

@vbarua I agree with the standard, this was a mere observation. I think for this case what we should provide is a smarter equals method. Do you think this would be a good feature to be added to a future release?

@danepitkin
Copy link
Contributor

I think a smarter equality comparison would be a good approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants