Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spanner): add support for JSON data type #4104

Merged
merged 15 commits into from Aug 24, 2021

Conversation

hengfengli
Copy link
Contributor

@hengfengli hengfengli commented May 14, 2021

Add support for JSON data type.

What:

  • add a new JSON data type, NullJSON, and its array support
  • encode a NullJSON to a json proto value
  • decode a json proto value to a specified type
  • add custom types support for NullJSON and its array support

Issues:

  • Array<JSON>: Unable to encode []Message{msg, msg} to Array<JSON> because we can't decide whether it should be mapped to a JSON or a Array<JSON>. (Resolved by only allowing to encode NullJSON so the Go types, NullJSON and Array<NullJSON>, will be encoded as Cloud Spanner types, JSON and Array<JSON>.)
  • Encode & decode NullJSON: when we encode NullJSON, we can store a struct, e.g., Message, in its Value field. However, when we decode to a NullJSON, the type of Value field would be map[string]interface{} or []interface{}. This inconsistency is confusing.
  • encodeStruct: it seems that the json support conflicts with the current encodeStruct method. (Resolved by only allowing to encode NullJSON. No changes to the default behaviour.)

@hengfengli hengfengli added api: spanner Issues related to the Cloud Spanner API. do not merge Indicates a pull request not ready for merge, due to either quality or timing. labels May 14, 2021
@hengfengli hengfengli requested a review from olavloite May 14, 2021
@hengfengli hengfengli self-assigned this May 14, 2021
@hengfengli hengfengli requested review from skuruppu and a team as code owners May 14, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label May 14, 2021
spanner/value.go Outdated Show resolved Hide resolved
spanner/value.go Outdated Show resolved Hide resolved
spanner/value.go Outdated Show resolved Hide resolved
spanner/value.go Outdated Show resolved Hide resolved
spanner/value.go Outdated Show resolved Hide resolved
spanner/value.go Outdated Show resolved Hide resolved
spanner/value.go Outdated Show resolved Hide resolved
@olavloite
Copy link
Contributor

olavloite commented May 14, 2021

Array: Unable to encode []Message{msg, msg} to Array because we can't decide whether it should be mapped to a JSON or a Array.

I ran into the same problem in NodeJS, and there I made the following choice:

  1. Top-level arrays that contain something that should be encoded to JSON, will be encoded as ARRAY<JSON>.
  2. If you want a top-level array to be encoded to JSON instead of ARRAY<JSON>, then you have to specify the value as a string.

@olavloite
Copy link
Contributor

olavloite commented May 14, 2021

encodeStruct: it seems that the json support conflicts with the current encodeStruct method.

Could you elaborate a little on what you mean with this? If I remember correctly, the Go client will try to encode a Go struct type to a Spanner STRUCT. Is that what is now conflicting with this implementation, as Go struct types will now by default be encoded as JSON?

@hengfengli
Copy link
Contributor Author

hengfengli commented May 15, 2021

  1. Top-level arrays that contain something that should be encoded to JSON, will be encoded as ARRAY<JSON>.

So you mean you check whether or not every top-level element in an array can be encode to JSON. If yes, then it will be encoded as ARRAY<JSON>.

  1. If you want a top-level array to be encoded to JSON instead of ARRAY<JSON>, then you have to specify the value as a string.

The definition of encodeValue is:

func encodeValue(v interface{}) (*proto3.Value, *sppb.Type, error) {
}

If you specify the value to a string, then it will be encoded to a string, right? How can you tell encodeValue that this string should be encoded as a JSON?

@hengfengli
Copy link
Contributor Author

hengfengli commented May 15, 2021

encodeStruct: it seems that the json support conflicts with the current encodeStruct method.

Could you elaborate a little on what you mean with this? If I remember correctly, the Go client will try to encode a Go struct type to a Spanner STRUCT. Is that what is now conflicting with this implementation, as Go struct types will now by default be encoded as JSON?

Yes, correct.

// Check if it can be marshalled to a json string.
b, err := json.Marshal(v)
if err == nil {
pb.Kind = stringKind(string(b))
pt = jsonType()
return pb, pt, nil
}
if !isStructOrArrayOfStructValue(v) {
return nil, nil, errEncoderUnsupportedType(v)
}
typ := reflect.TypeOf(v)
// Value is a Go struct value/ptr.
if (typ.Kind() == reflect.Struct) ||
(typ.Kind() == reflect.Ptr && typ.Elem().Kind() == reflect.Struct) {
return encodeStruct(v)
}
// Value is a slice of Go struct values/ptrs.
if typ.Kind() == reflect.Slice {
return encodeStructArray(v)
}

As you can see, when value is a Go struct value, it can be encoded as a JSON. The code to convert Go struct can never be reached any more. It tries to encode a struct to json first. But I can't put encodeStruct first as well because the json code can never reached.

This issue leads to TestBindParamsDynamic fails.

The issue becomes how encodeValue can tell when converting to a Spanner struct or a JSON data type. The input is always a Go struct. Unless we give up to convert a Go struct to JSON directly. Instead, we always ask users to embed the Go struct into NullJSON and then encode it, which can avoid this issue.

@olavloite
Copy link
Contributor

olavloite commented May 17, 2021

  1. Top-level arrays that contain something that should be encoded to JSON, will be encoded as ARRAY<JSON>.

So you mean you check whether or not every top-level element in an array can be encode to JSON. If yes, then it will be encoded as ARRAY<JSON>.

No, not exactly that. What I did in NodeJS was to assume that if the type is array, then the value should be encoded as some type of Cloud Spanner array. The array element type is then determined by the same encode function when the individual array elements are passed in to it. If the user has created an array with different types in it, then this will fail, but it is not checked beforehand.

  1. If you want a top-level array to be encoded to JSON instead of ARRAY<JSON>, then you have to specify the value as a string.

The definition of encodeValue is:

func encodeValue(v interface{}) (*proto3.Value, *sppb.Type, error) {
}

If you specify the value to a string, then it will be encoded to a string, right? How can you tell encodeValue that this string should be encoded as a JSON?

Hmmm... Yeah, that will probably not work for Go in that case (at least not without some extensive refactoring). In Node, the user specifies the Cloud Spanner type themselves, so they pass in both a value and the Cloud Spanner type. So that made it possible for the user to pass in a string value and still set the type to be JSON. The encoded value (that is; the proto value) is encoded as string for both 'normal' strings and JSON, so that does not make any difference.

@olavloite
Copy link
Contributor

olavloite commented May 17, 2021

encodeStruct: it seems that the json support conflicts with the current encodeStruct method.

Could you elaborate a little on what you mean with this? If I remember correctly, the Go client will try to encode a Go struct type to a Spanner STRUCT. Is that what is now conflicting with this implementation, as Go struct types will now by default be encoded as JSON?

Yes, correct.

// Check if it can be marshalled to a json string.
b, err := json.Marshal(v)
if err == nil {
pb.Kind = stringKind(string(b))
pt = jsonType()
return pb, pt, nil
}
if !isStructOrArrayOfStructValue(v) {
return nil, nil, errEncoderUnsupportedType(v)
}
typ := reflect.TypeOf(v)
// Value is a Go struct value/ptr.
if (typ.Kind() == reflect.Struct) ||
(typ.Kind() == reflect.Ptr && typ.Elem().Kind() == reflect.Struct) {
return encodeStruct(v)
}
// Value is a slice of Go struct values/ptrs.
if typ.Kind() == reflect.Slice {
return encodeStructArray(v)
}

As you can see, when value is a Go struct value, it can be encoded as a JSON. The code to convert Go struct can never be reached any more. It tries to encode a struct to json first. But I can't put encodeStruct first as well because the json code can never reached.

This issue leads to TestBindParamsDynamic fails.

The issue becomes how encodeValue can tell when converting to a Spanner struct or a JSON data type. The input is always a Go struct. Unless we give up to convert a Go struct to JSON directly. Instead, we always ask users to embed the Go struct into NullJSON and then encode it, which can avoid this issue.

@zoercai @skuruppu I think you need to chime in here as well. This is going to be a bit of a problem in the Go client, unless we accept doing a breaking change here. The short description of the problem is:

  1. Go has a built-in struct type. So you can easily create a type like for example this:
type User struct {
  UserId string
  UserName string
}
  1. If an instance of a struct value is passed in to the encodeValue function in the current Go client, it will automatically be encoded as a Cloud Spanner STRUCT type. The user (currently) cannot specify the type that they want the value to be encoded to, it is always determined automatically purely based on the type of the input value.
  2. The introduction of the JSON type in Cloud Spanner would change the behavior above from automatically encoding such a value to a STRUCT to encoding it to JSON.

What do we want to do about this? I can think of a couple of possibilities:

  1. Accept the above and mark this as a breaking change. This would make it impossible to encode a Go struct to a Cloud Spanner STRUCT.
  2. Change the default behavior so a struct will by default be encoded as a JSON value, but also add an additional function that allows the user to specify the type that the value should be encoded to. This would also be a breaking change, as existing users that rely on Go struct values being encoded as STRUCT would need to change their code.
  3. Keep the current default behavior so a Go struct will still by default be encoded as a STRUCT, and add an additional function that allows the user to specify the type that the value should be encoded to.

@hengfengli Please feel free to add any other option you might see that I've missed, or to correct me if I misunderstood anything here.

I'm not sure how easy/hard it would be to implement either of options 2 or 3. I think that either of these options could also be used to solve the problem of encoding top-level JSON arrays. As also noted in the NodeJS PR: It is not possible to determine based purely on the value whether a top-level JSON array should be encoded as JSON or ARRAY<JSON>.

@zoercai
Copy link
Contributor

zoercai commented May 18, 2021

I like the sound of option 3, assuming allowing users to specify the type would allow the STRUCT to be converted into a JSON type, and assuming it's not a breaking change like option 2. But I am not familiar with Go so I wouldn't know if it's a clean solution for implementation.

@skuruppu
Copy link
Contributor

skuruppu commented May 20, 2021

@olavloite thanks for raising this. Unfortunately, when it comes to making breaking changes, our hands are tied. We essentially can't make changes that require users to change their code.

So we have to go with option 3.

@hengfengli
Copy link
Contributor Author

hengfengli commented May 21, 2021

@olavloite Thanks for a lot for summarizing the problem and providing possible options. I would vote for option 3 as well.

The only question is that if we add this additional function, it means that the encoding will not happen automatically and the user has to call this function to encode the value or find a way to pass type info to encode_value.

What if we don't add this additional function, instead, we only support theNullJSON type for JSON. This means users always need to create a NullJSON instead of directly passing a struct for a json column. NullJSON can be treated as a wrapper with type info. This can avoid the breaking change as well.

Also, with NullJSON, the problem to map to JSON or ARRAY<JSON> is solved as well. NullJSON maps to JSON and an array of NullJSON maps to ARRAY<JSON>.

@olavloite
Copy link
Contributor

olavloite commented May 21, 2021

@olavloite Thanks for a lot for summarizing the problem and providing possible options. I would vote for option 3 as well.

The only question is that if we add this additional function, it means that the encoding will not happen automatically and the user has to call this function to encode the value or find a way to pass type info to encode_value.

What if we don't add this additional function, instead, we only support theNullJSON type for JSON. This means users always need to create a NullJSON instead of directly passing a struct for a json column. NullJSON can be treated as a wrapper with type info. This can avoid the breaking change as well.

Also, with NullJSON, the problem to map to JSON or ARRAY<JSON> is solved as well. NullJSON maps to JSON and an array of NullJSON maps to ARRAY<JSON>.

That is certainly also a possibility. I think the only drawback is that it would make JSON the only type that does not have a default mapping to a 'normal' Go type. Besides that, I don't really see any problems with it, and at least it would keep the encode/decode code relatively simple.

@hengfengli
Copy link
Contributor Author

hengfengli commented May 21, 2021

@olavloite Thanks for a lot for summarizing the problem and providing possible options. I would vote for option 3 as well.
The only question is that if we add this additional function, it means that the encoding will not happen automatically and the user has to call this function to encode the value or find a way to pass type info to encode_value.
What if we don't add this additional function, instead, we only support theNullJSON type for JSON. This means users always need to create a NullJSON instead of directly passing a struct for a json column. NullJSON can be treated as a wrapper with type info. This can avoid the breaking change as well.
Also, with NullJSON, the problem to map to JSON or ARRAY<JSON> is solved as well. NullJSON maps to JSON and an array of NullJSON maps to ARRAY<JSON>.

That is certainly also a possibility. I think the only drawback is that it would make JSON the only type that does not have a default mapping to a 'normal' Go type. Besides that, I don't really see any problems with it, and at least it would keep the encode/decode code relatively simple.

We may get more types in the near future. So 1-1 mapping between Cloud Spanner and Go types cannot be held for a long time.

I will go to the NullJSON path. Let me know if you think adding an additional function would be better. But I just don't know how users can use this additional function to pass types in.

Copy link
Contributor

@olavloite olavloite left a comment

This looks mostly good to me, with a couple of nits on adding some tests for some corner cases.

I'm fine with the choice of only supporting NullJSON when encoding JSON types. While reviewing this PR, I did however notice one small thing that could cause a little confusion: The current implementation does support decoding JSON values to Go Struct types, but it does not support encoding Go Struct types to JSON. That asymmetry could cause confusion, so we should be sure to note it in the documentation.

One possible other solution for the problem regarding whether to encode a Go Struct to a Spanner STRUCT or JSON could be to use the same strategy as for the loss of precision check for NUMERIC; configure it through a global setting. I'm not really sure I would be a proponent for such a solution, as adding too many global settings will not make the client easy to use. But in this case I'm also not sure whether the feature of encoding Go Struct to a Spanner STRUCT is something that is used very much, as Spanner STRUCT types cannot be used as the type of a column.

spanner/integration_test.go Show resolved Hide resolved
spanner/integration_test.go Show resolved Hide resolved
spanner/value.go Show resolved Hide resolved
spanner/value_test.go Show resolved Hide resolved
spanner/value_test.go Show resolved Hide resolved
spanner/value_test.go Outdated Show resolved Hide resolved
@hengfengli
Copy link
Contributor Author

hengfengli commented May 24, 2021

This looks mostly good to me, with a couple of nits on adding some tests for some corner cases.

I'm fine with the choice of only supporting NullJSON when encoding JSON types. While reviewing this PR, I did however notice one small thing that could cause a little confusion: The current implementation does support decoding JSON values to Go Struct types, but it does not support encoding Go Struct types to JSON. That asymmetry could cause confusion, so we should be sure to note it in the documentation.

That's my worry as well. To avoid confusion, I'll take a look to see if decoding JSON values to Go Struct can be disallowed.

One possible other solution for the problem regarding whether to encode a Go Struct to a Spanner STRUCT or JSON could be to use the same strategy as for the loss of precision check for NUMERIC; configure it through a global setting. I'm not really sure I would be a proponent for such a solution, as adding too many global settings will not make the client easy to use. But in this case I'm also not sure whether the feature of encoding Go Struct to a Spanner STRUCT is something that is used very much, as Spanner STRUCT types cannot be used as the type of a column.

I agree with you that I would prefer not to add more global settings. I believe users still use STRUCT for query parameter binding but not sure how often it does. But I don't expect a breaking change to change the behaviour from Go Struct to Spanner STRUCT. In the future, we may need to add more types and possibly map Go Struct to another type.

@olavloite
Copy link
Contributor

olavloite commented Jun 1, 2021

@hengfengli Is this ready for another review?

@hengfengli
Copy link
Contributor Author

hengfengli commented Jun 1, 2021

@hengfengli Is this ready for another review?

Yes, please take another review. Thanks a lot.

spanner/value.go Show resolved Hide resolved
@hengfengli hengfengli removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Jun 25, 2021
@zoercai
Copy link
Contributor

zoercai commented Aug 12, 2021

Looks like this PR still needs an owners approval, could an owner please review this? @skuruppu @googleapis/yoshi-go-admins

spanner/value.go Outdated Show resolved Hide resolved
@hengfengli hengfengli merged commit ade8ab1 into googleapis:master Aug 24, 2021
4 checks passed
@hengfengli hengfengli deleted the add-json-support branch Aug 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Cloud Spanner API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants