Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unknown enum handling #853

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Conversation

osa1
Copy link
Member

@osa1 osa1 commented Jun 22, 2023

This fixes handling of unknown enum values in repeated and map fields and in
top-level fields.

Current behavior:

  • Unknown enums in map and repeated fields throw an exception.
  • Unknown enums in top-level fields reset the field value to the default one.

With this PR we just "ignore" the values:

  • Unknown enums in map and repeated fields are skipped.
  • Unknown enums in top-level fields do not reset the field's value to the
    default.

Fixes #849.

@osa1 osa1 requested a review from sigurdm June 22, 2023 08:12
@sigurdm
Copy link
Collaborator

sigurdm commented Jun 22, 2023

I think we have to add a new value for "unknown" to each enum, then we can also (perhaps?) migrate to using real dart enums. That would give us exhaustiveness checking!

See https://b.corp.google.com/issues/287930910 for more context.

We can still do this PR first - but we should consider it.

@sigurdm
Copy link
Collaborator

sigurdm commented Jun 22, 2023

...Of course it is not trivial to use actual dart enums, because we want to store the integer value behind the "unknown" for re-serialization. I can only see how this could work by having a getter that translates the enum to a dart enum, but that is syntactically breaking.

We could also try to think in sealed classes - not sure how well that would work.

@osa1
Copy link
Member Author

osa1 commented Jun 22, 2023

It looks like resetting the top-level fields when the value is an unknown enum was deliberate as we have tests about it, but it doesn't make sense to me as resetting a field means we're not really ignoring the value (i.e. it has an effect on the message). As far as I can see proto3 spec doesn't mention this case, so I'll check other implementations and see what they do about this.

Java is skipping the field: https://github.com/protocolbuffers/protobuf/blob/f71a9263f9380836c4fc356b67710203e691bc95/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L1491-L1493

@sigurdm
Copy link
Collaborator

sigurdm commented Jun 22, 2023

Now I realize this PR is in terms of proto3 json decoding. Here we (usually) only have the name of the enum value, so we cannot translate it to the index...

Resetting vs skipping:

  • this only makes a difference when merging messages, right?
  • both behaviors seem questionable under certain circumstances, but so does stuffing the enum names into the encoding - it seems to break some fundamental properties that are upheld by the binary protocol. Not much we can do about that.
  • I guess following java is fine. I cannot remember why I implemented clearing the field.

@osa1
Copy link
Member Author

osa1 commented Jun 29, 2023

this only makes a difference when merging messages, right?

If you mean as in mergeFromMessage, then no, this happens when merging a JSON representing a message. E.g.

myMessage.mergeFromProto3Json(..., ignoreUnknownFields: true)

myMessage can be an empty message (in which case skip vs. reset does not matter, though I should check if the current behavior also resets the hazzer) but it may also be a non-empty message with the enum field already set. In that case skipping keeps the current value (i.e. actually ignores the unknown value), resetting resets the value to the default (does not really ignore the unknown value).

both behaviors seem questionable under certain circumstances, but so does stuffing the enum names into the encoding - it seems to break some fundamental properties that are upheld by the binary protocol. Not much we can do about that.

I think ignoring makes sense. Why do you say it's questionable? The flag says "ignore" so it should ignore.

I agree, proto3 JSON format is a bit strange..

@sigurdm
Copy link
Collaborator

sigurdm commented Jun 29, 2023

myMessage can be an empty message (in which case skip vs. reset does not matter, ...

That's what I meant :) .

I think ignoring makes sense. Why do you say it's questionable? The flag says "ignore" so it should ignore.

Well - one thing is what the flag says, another one is what is ergonomic for actual use. If I have one proto represented by

{
  "enumField": "A"
}

('A' is a known enum value).

And I merge it with another proto

{
  "enumField": "B"
}

If these were messages in the binary protocol, the unknown value would be preserved, and we would be able to see that from the merged proto.

I find it somehow surprising that now m.enumField still returns A. It seems that the fact that the merged field carries a value, although unknown should be reflected in the merge. I cannot think of a use-case where that is the semantics I want.

But again, if that is what other implementations do, I guess we should just follow along.

@osa1
Copy link
Member Author

osa1 commented Jun 29, 2023

I find it somehow surprising that now m.enumField still returns A

proto3 JSON (at least as we implement it in this library, AFAICS this is unspecified) does not support unknown fields, so the behavior you describe is not possible (you can't preserve the unknown value). You have these two options instead: (1) fail (the default) (2) ignore the unknown value (opt-in, with the flag). The default is (1). Binary format has (3) which is store the unknown field in unknown field set.

@sigurdm
Copy link
Collaborator

sigurdm commented Jun 29, 2023

Yeah - I argue that we get closer to option 3 by resetting.

But that is all splitting hairs. I don't hope anyone is actually using the proto3 json format for anything serious 🙈

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Repeated enum does not honor ignoreUnknownFields
2 participants