Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/protojson: serialization of FieldMask does not roundtrip #1141

Closed
dsnet opened this issue Jun 4, 2020 · 3 comments
Closed

encoding/protojson: serialization of FieldMask does not roundtrip #1141

dsnet opened this issue Jun 4, 2020 · 3 comments
Assignees

Comments

@dsnet
Copy link
Member

dsnet commented Jun 4, 2020

Serialization should preserve these properties:

  • Successful unmarshal should result in a successful marshal
  • Successful marshal should result in a successful unmarshal
  • Marshaling and unmarshaling should result in semantically equal messages

These properties seem broken for FieldMask where marshal permits marshaling a FieldMask with an empty path where it becomes impossible to distinguish during unmarshaling:

m1 := &fmpb.FieldMask{Paths: []string{}}
b1, err := protojson.Marshal(m1)
fmt.Printf("%s %v\n", b1, err) // "", nil

m2 := &fmpb.FieldMask{Paths: []string{""}}
b2, err := protojson.Marshal(m2)
fmt.Printf("%s %v\n", b2, err) // "", nil

fmt.Println(bytes.Equal(b1, b2)) // true
fmt.Println(proto.Equal(m1, m2)) // false

m := new(fmpb.FieldMask)
err = protojson.Unmarshal([]byte(`""`), m)
fmt.Printf("%q %v\n", m.Paths, err) // [], nil

Also, when deserializing, we fail to reject inputs that are already in snake case:

m := new(fmpb.FieldMask)
err := protojson.Unmarshal([]byte(`"snake_case"`), m)
fmt.Printf("%q %v\n", m.Paths, err) // ["snake_case"], nil

This should result in an error.

Concretely, we should:

  • During marshal, reject any path that is empty.
  • During unmarshal, reject an input that contains underscores.

\cc @cybrcodr

@cybrcodr cybrcodr self-assigned this Jun 8, 2020
@cybrcodr
Copy link
Contributor

cybrcodr commented Jun 9, 2020

I agree, marshaling an empty string does not make sense. I overlooked that one.

As for unmarshaling path with _, I had intended to make it lenient as it wasn't clear if it will break existing usages. I've observed that C++ implementation is not consistent with regards to the specs - protocolbuffers/protobuf#5913. Returning an error on paths with _ character will, however, make the recommended conformance test Recommended.Proto3.JsonInput.FieldMaskInvalidCharacter pass. So, I think I'm ok with that as well.

Our current implementation already does validation on marshaling to ensure that it can round-trip. Perhaps it should be more strict in doing validation for both marshaling and unmarshaling. I further propose that when marshaling, each path should be a valid protobuf name, and it should also be reversible (not all valid protobuf name are reversible). When unmarshaling, the resulting repeated field paths should only contain valid protobuf name and input should not contain _. Wdyt?

This will only be done for JSON serialization as textproto does not do any validation for well known types. While I am not a fan of doing validation in general during serialization, the JSON description for these well-known types have specified such, just that some are more precise than others.

I'll send a CL out for review.

@sunnyagg
Copy link

sunnyagg commented Sep 17, 2021

@cybrcodr Wanted to understand the reasoning on not allowing _ in the field mask lists values.

This is causing issues in patch requests, where attributes of resource are allowed to be reffered in snake_case but same keyword throws errors if referred in field_mask

for example patch like this

{
   "devices":[
      {
         "name":"aname",
         "display_name":"newdisplayname"
      }
   ],
   "update_mask":"name,display_name"
}

results in error

{
    "code": 3,
    "message": "proto: (line 8:18): google.protobuf.FieldMask.paths contains invalid path: \"display_name\"",
    "details": []
}

This seems a bit contradictory.

@cybrcodr
Copy link
Contributor

The serialized JSON path values in FieldMask should be a converted value and not the proto field name.

While the current developer guide for JSON encoding of Field Masks simply shows conversion of proto field name to lower-camel naming convention, the actual details of the conversion rules are in the conformance tests like Recommended.Proto3.JsonInput.FieldMaskInvalidCharacter implies that _ is not allowed. There are other tests in there, but basically the rules are as follows.

The conversion from field name to its camelCase version is:

  • All letters after “_” are turned to uppercase.
  • All “_”s are removed.

The conversion from camelCase to field name is:

  • All uppercase letters are converted to lowercase with a preceding “_” added.

The conversion should fail if the name would end up differently after a round-trip. More specifically, the conversion from field name to camelCase should fail if:

  • There are uppercase letters in the name.
  • Any letter after a “_” is not a lowercase letter.

The conversion from camelCase to field name should fail if there are “_”s in the name.

If you think this is wrong, I suggest taking this up with the Protobuf team as the Go implementation is simply following the rules here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants