Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

protojson: fail to unmarshal JSON with reserved fields #1606

Closed
timofurrer opened this issue Apr 9, 2024 · 19 comments
Closed

protojson: fail to unmarshal JSON with reserved fields #1606

timofurrer opened this issue Apr 9, 2024 · 19 comments
Assignees

Comments

@timofurrer
Copy link

timofurrer commented Apr 9, 2024

What version of protobuf and what language are you using?
Version: Go, v1.5.4

What did you do?

We have a message with a reserved field and we want to unmarshal some JSON that has this field.

message Foo {
  string bar = 1;
  reserved "should_be_ignored";
}
package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protojson"
)

func main() {
	input := []byte("{\"bar\": \"teststring\", \"should_be_ignored\": 42}")
	foo := &Foo{};
	err := protojson.Unmarshal(input, foo)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Foo: %v\n", foo)
	return
}

Running this results in an unmarsahling error like unknown field "should_be_ignored".

What did you expect to see?

The JSON unmarshals correctly, but ignores the reserved fields

What did you see instead?

The decoder fails to unmarshal complaining about not knowing the reserved field.

Anything else we should know about your project / environment?

@puellanivis
Copy link
Collaborator

At least adding a basic protobuf definition with input that repros would be helpful for understanding.

@timofurrer
Copy link
Author

@puellanivis I've updated the description.

@puellanivis
Copy link
Collaborator

Hm, it works if you set protojson.UnmarshalOptions{ DiscardUnknown: true }

I suppose the question is, should reserved field names be discarded as “known”. From the text of https://protobuf.dev/programming-guides/proto3/#fieldreserved:

To make sure JSON and TextFormat instances of your message can still be parsed, also add the deleted field name to a reserved list.

This suggests that the JSON should continue to parse with the reserved field name.

@timofurrer
Copy link
Author

Yes, I'm aware that I can discard unknown fields, but that would discard "all" unknown fields - which is not what I want.

This suggests that the JSON should continue to parse with the reserved field name.

I agree - so that means that protojson doesn't behave according to this, right?

@puellanivis
Copy link
Collaborator

This suggests that the JSON should continue to parse with the reserved field name.

I agree - so that means that protojson doesn't behave according to this, right?

We’ll need more input from contributors, but I think from my reading that this is a misbehavior. I’ve also advised someone based on this guidance that I suspected reserving a field name would be that it would be recognized as known but ignored, not breaking like this.

@timofurrer
Copy link
Author

We’ll need more input from contributors

@puellanivis anything we (or I) can do to get that "input" ?

@puellanivis
Copy link
Collaborator

You could work on a gerrit change request yourself. I think the in-house Google devs are otherwise busy with editions support.

@lfolger
Copy link
Contributor

lfolger commented Apr 23, 2024

Thanks for reporting this. I think this should be fixed.
The behavior is documented in the spec and the go implementation is not following it.
I double checked and the C++ implementation for example follows the spec.

I'll try to prepare a change for this and test is in Google first to see if anyone relies on the old behavior (I don't expect anyone to but you never know).

@lfolger lfolger self-assigned this Apr 23, 2024
@timofurrer
Copy link
Author

@lfolger that's good news that you'll have a look at this - it's highly appreciated, thank you 🤝

@lfolger
Copy link
Contributor

lfolger commented Apr 23, 2024

While looking into this I found out that this is already the behavior today. Could you clarify the steps to reproduce?

These are the tests that I added and they succeeded without errors: https://go-review.googlesource.com/c/protobuf/+/581095

@lfolger
Copy link
Contributor

lfolger commented Apr 23, 2024

I just realized this is about protojson. Let me try to add a test for that as well.

@lfolger
Copy link
Contributor

lfolger commented Apr 23, 2024

Unfortunately, the C++ implementation has the same issue. Usually the C++ implementation is right and diverging from it means in most cases diverging from the spec. I reached out to the core protobuf team to clarify. I'll report back once I know more.

Sorry for the back and forth.

@lfolger
Copy link
Contributor

lfolger commented Apr 26, 2024

Short update. There is still an ongoing discussion internally about what the behavior should be. I will come back to this once I know more.

@lfolger
Copy link
Contributor

lfolger commented May 2, 2024

AFter discussion it turned out that the current behavior is a "bug" in the Text proto parser and intended behavior in the JSON format. I call this "bug" because the implementation is more permissive that it has to be which most users don't mind.

A documentation update to the official documentation will follow shortly.

If you like the behavior to be changed, you would need to file a feature request against the protobuf.

Just to manage expexctations, I'm not very hopeful that they will accept this but you should give it a try nonetheless. Maybe there will be enough users and use cases that need this and they will accept it. If that happens, we are happy to add this to the Go implementation.

I'm closing this issue for now but feel free to reopen it if you have more questions.

@lfolger lfolger closed this as completed May 2, 2024
@ash2k
Copy link

ash2k commented May 2, 2024

Hi all! I've been following this issue. I don't think this has been mentioned, I'd like to point out an inconsistency between binary proto unmarshaling and JSON->proto message unmarshling. Reserved fields are ignored in the binary format and you don't get an error. I think this makes sense. I wonder why is it not the same for any other format that can be unmarshaled into a proto message?

If it is how it is and the behavior is not going to change, then:

  • what is the suggested approach to read the previously written JSON messages when a field has been removed?
  • "Text proto parser has a bug" - perhaps this issue can be used to fix it.
  • https://protobuf.dev/programming-guides/proto3/#fieldreserved needs to be updated to state that the user will get an error. This means we are changing the spec because of how implementations behave (not the other way around) 🤔

@ash2k
Copy link

ash2k commented May 3, 2024

Yet another way to think about it: currently reserved is a no-op for JSON unmarshaling. To me it is a signal that something is not right - either the spec defines something useless (if we forget about the binary format, with which JSON is not consistent) or the implementation doesn't follow the spec.

p.s. Would be great to hear some of the counter-arguments from the internal discussion. To be honest, it's a bit annoying that this is being discussed behind closed doors. There are lots and lots of users of the format and libraries and there is an open issue tracker and this project is open source in the first place. Why all that if not for transparency and collaboration? So, I'm sure a bit more transparency would be greatly appreciated by the community. Thank you.

@puellanivis
Copy link
Collaborator

The semantics of it has pretty much always been documentation of fields to not re-use because they’ve been removed. I believe it is an error during the compilation if a field attempts to reuse either a field number or field name, so it is not strictly a fully no-op.

@timofurrer
Copy link
Author

timofurrer commented May 3, 2024

The spec literally mentions that reserved can be used so that JSON can still be parsed (from last sentence in second paragraph here):

If you update a message type by entirely deleting a field, or commenting it out, future developers can reuse the field number when making their own updates to the type. This can cause severe issues, as described in Consequences of Reusing Field Numbers.

To make sure this doesn’t happen, add your deleted field number to the reserved list. To make sure JSON and TextFormat instances of your message can still be parsed, also add the deleted field name to a reserved list.

I agree with @ash2k here that it's kinda frustrating to fix a bug by changing the spec.

Assuming there is NO way that this gets fixed (which would be really sad), would you be open to add an opt-in config to the json marshaller to enable this behavior? .. and make json parsing align to the behavior to binary parsing?

@lfolger
Copy link
Contributor

lfolger commented May 3, 2024

"Text proto parser has a bug" - perhaps this issue can be used to fix it

First of all, when I said the Go implementation has a bug in the text parser, I meant that it accepts messages with unkown but reserved fields. To be clear only C++ and Go do this. All other implementation we know don't do this. This means JSON and text parsers behave the same (except for C++/Go).

Would be great to hear some of the counter-arguments from the internal discussion. To be honest, it's a bit annoying that this is being discussed behind closed doors.

About the discussion not being public, I was not even part of that discussion. I think if you want a public discussion, you need to file a feature request against the protobuf main repository which handles changes to the spec. It was partially me fault in that I reached out to them internally rather than just asking you to file a bug against the proto repo directly. Sorry about that.

I'd like to point out an inconsistency between binary proto unmarshaling and JSON->proto message unmarshling. Reserved fields are ignored in the binary format and you don't get an error

The accepting of unknown field in the binary format works very different. It accepts all unknown fields. It is completely independent of the reserved fields. You can get the same behavior by using the appropriate unmarshaloptions for protojson and prototext.

Again, there is no point in discussion this here. We, the Go protobuf maintainers, are not responsible for the spec. If you would like to see the spec changed, file a feature request against the proto buf repository. All we can do is to follow the spec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants