Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ensure fields are JSON encoded correctly #1407

Closed
vchudnov-g opened this issue Aug 12, 2022 · 10 comments · Fixed by #1436
Closed

ensure fields are JSON encoded correctly #1407

vchudnov-g opened this issue Aug 12, 2022 · 10 comments · Fixed by #1436
Assignees
Labels
priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@vchudnov-g
Copy link
Contributor

vchudnov-g commented Aug 12, 2022

The best way of verifying correct behavior is probably to implement integration tests against the Showcase compliance suite, which test various data types.

Context: boolean fields seem to be not serialized correctly. I assigned True to a proto.Field as a kwarg in the message constructor:

request = showcase_v1beta1.EnumRequest(unknown_enum = True)

and the Showcase server returns with an error. The Showcase server logs read:

error reading query params: terminal field "unknownEnum" of field path "unknownEnum" is of type "bool" with value string "True", which could not be parsed: could not parse "True" as a bool

which is mostly due to True being sent capitalized rather than lower-cased, as the JSON spec requires.

This is a blocker for launching REGAPIC.

@vchudnov-g vchudnov-g added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Aug 12, 2022
@atulep atulep self-assigned this Aug 15, 2022
@vam-google
Copy link
Contributor

@vchudnov-g I dont understand this request. Python is a dynamic language, thus it does not check for type and it is expected. Plus the json serialization/deserialization is in protobuf code, and it seems to be a conscious decision made by protobuf team to not do static type check during serialization. This request seems to be agains the whole architecture of protobuf and gapic data model and the dynamic nature of Python language.

I'll close this request for now. Feel free to reopen, but please provide more justification why you expect static type analysis on the client side for Python.

Another thought here: even if we choose to do something about it, it would be super hard, and would require change in protobuf probably, which does not seem feasible.

@vam-google
Copy link
Contributor

vam-google commented Aug 26, 2022

I also checked the serializaiton process on talent-v4 API:

    a = gct_job.Job.ProcessingOptions(disable_street_address_resolution = True)
    json = gct_job.Job.ProcessingOptions.to_json(a)

This produces the following json:

'{
  "disableStreetAddressResolution": true,
  "htmlSanitization": 0
}'

Booleans are serialized properly. this must be the case, otherwise Compute client would be completely broken (there are > 300 boolean fields in compute surface)

@atulep
Copy link
Contributor

atulep commented Aug 27, 2022

I can confirm I get the same error as Victor when testing on showcase GetEnum method.

The bug is somewhere in these lines taken from the generated get_transport transport's function:

            request_kwargs = compliance.EnumRequest.to_dict(request)
            transcoded_request = path_template.transcode(
                http_options, **request_kwargs)

            uri = transcoded_request['uri']
            method = transcoded_request['method']

            # Jsonify the query params
            query_params = json.loads(compliance.EnumRequest.to_json(
                compliance.EnumRequest(transcoded_request['query_params']),
                including_default_value_fields=False,
                use_integers_for_enums=False
            ))

So query_params are wrong. I'll try to play with transcoding logic to reproduce this.

Added integration tests in #1424.

@vchudnov-g
Copy link
Contributor Author

Sorry, just saw that you closed this, @vam-google . Reopening with clarifications: I don't "expect static type analysis on the client side". What I do expect is the JSON encoding to be correct, and it seems like the booleans get capitalized on the query params. You mention good points that we're serializing correctly in the request bodies, so something must be going on in the query params.

@vchudnov-g vchudnov-g reopened this Aug 29, 2022
@atulep
Copy link
Contributor

atulep commented Aug 29, 2022

The issue is json.loads -- the function converts JSON string into a Python dict, so true from JSON becomes True in Python.

(Pdb) compliance.EnumRequest.to_json(compliance.EnumRequest(transcoded_request['query_params']), including_default_value_fields=False,  use_integers_for_enums=False)
'{\n  "unknownEnum": true\n}'
(Pdb) json.loads(compliance.EnumRequest.to_json(compliance.EnumRequest(transcoded_request['query_params'])))
{'unknownEnum': True}

Technically json.loads does exactly what it says. To fix the problem, we can convert True to a string "true".

There are two places to do that:

  1. Client code directly (i.e. template of the method call in the rest transport).
  2. to_json in Protoplus.

I'm leaning towards 1) because currently issue specifically happens in one concrete example, and to_json actually does its job correctly (it's just combination of json.loads and to_json that breaks).

@vchudnov-g let me know your thoughts.

@vchudnov-g
Copy link
Contributor Author

I think json.loads() is doing what it says: loading the JSON into a Python data types. I don't think we want to replace that with a string. It seems to me the problem is here:

            response = getattr(self._session, method)(
                "{host}{uri}".format(host=self._host, uri=uri),
                timeout=timeout,
                headers=headers,
                params=rest_helpers.flatten_query_params(query_params),
                )

flatten_query_params() returns a list of (name, value) tuples, so that should still return a Python bool. So the problem then is when invoking getattr(self._session, method)(…,params=XXX). That method is the one translating Python values to the query string values, and it is there that the value of a Python bool should be lower-cased when writing to the resulting query string. I think that's where we should fix it. Where is this method invocation defined? I haven't looked.

That said, please wait a bit to do non-trivial changes because of the following: this is the query string, not the request body, so technically it is not JSON. Let me check whether Google servers can take a capitalized bool in the query string; if they can, then maybe we can do nothing on the client side. If they can't, then yes, I think we should fix the method() invocation.

WDYT?

@atulep
Copy link
Contributor

atulep commented Aug 30, 2022

Where is this method invocation defined?

It's a part of google_auth -- method of AuthorizedSession: https://googleapis.dev/python/google-auth/1.7.0/_modules/google/auth/transport/requests.html#AuthorizedSession.

So we should open an issue in their repo...

@vchudnov-g
Copy link
Contributor Author

vchudnov-g commented Sep 1, 2022

OK, I looked at the code, and it seems like the request call eventually winds up in yet another library, here, which expects "a dictionary of strings". So I'm assuming that means string→string, and that it uses just the Python string representation of Boolean types. So now I take back what I said earlier: I think the change does belong in flatten_query_params(), and specifically here.

If we want to be strict about backwards compatibility, we can add an optional parameter strict to the top -level flatten_query_params (default False), and pass it to its sub-functions, so that when it gets to _flatten_value() it being set causes the lower-casing of string representations of boolean. That said, one could make an argument that if an API sever accepts capitalized True and False, then it would also already accept lower-cased versions (which are the standard and also what other languages use) so maybe we don't need to complicate things with an extra flag.....

Thoughts? Do you want to make this change, @atulep or do you want me to? I know you've got a bunch of other things on your plate as well, and I don't mind getting experience on the other repos.

@vchudnov-g
Copy link
Contributor Author

FWIW, Google servers can take capitalized booleans, but I still think we should fix this for cross-language consistency, which immediately affects Python.

@vchudnov-g
Copy link
Contributor Author

For reference, the above fix was not quite correct. The correct fix was implemented in #1447

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants