Fix a bug that converts int64 to string when converting Protobuf to JSON #5010

lichenran1234 · 2021-11-04T22:39:59Z

Signed-off-by: Chenran Li <chenran.li@databricks.com>

What changes are proposed in this pull request?

According to issue #4037, the JSON returned by the endpoints has creation_timestamp and last_updated_timestamp as strings, not numbers. This differs from what the official docs describe.

The reason is that we call Google's MessageToJson API to convert protobuf to JSON, which implicitly converts int64/fixed64/uint64 fields to strings. The maintainers consider this a feature, not a bug (see the discussion).
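The client-side symptom can be sketched with the standard library alone. The response bodies below are hypothetical: only the field name creation_timestamp comes from the issue, the values are made up for illustration.

```python
import json

# Hypothetical response bodies; only the field name is taken from the issue.
body_before_fix = '{"name": "my-model", "creation_timestamp": "1636065599000"}'
body_as_documented = '{"name": "my-model", "creation_timestamp": 1636065599000}'

before = json.loads(body_before_fix)
documented = json.loads(body_as_documented)

# The documented shape is a number, so arithmetic works directly.
seconds = documented["creation_timestamp"] // 1000

# The pre-fix shape is a str, so the same arithmetic raises TypeError.
try:
    before["creation_timestamp"] // 1000
    arithmetic_failed = False
except TypeError:
    arithmetic_failed = True

print(type(before["creation_timestamp"]).__name__, arithmetic_failed)
```

A client written against the documented schema would hit this kind of error unless it casts the value itself.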

According to the bug reporter, this bug doesn't exist in Azure ML mlflow server (which is essentially our Databricks mlflow server). That's because we are using ScalaPB's ToJson() API for all the Databricks endpoints, and it doesn't convert int64 to string.

There is no way to make the MessageToJson API leave int64 values as numbers, and there are no other good Python proto-to-JSON libraries. So to fix this bug, we have to choose between:

(what I'm doing in this PR) manually converting the int64/uint64/fixed64 fields back to numbers after calling MessageToJson
(too risky, so I chose not to do this) writing our own customized MessageToJson API
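The first option can be illustrated with a deliberately simplified, standard-library-only sketch. It is not the code in this PR: the set of field names is hard-coded here as an assumption, whereas the diff under review inspects FieldDescriptor types, and the sketch only handles top-level fields. The helper name restore_int64_fields is hypothetical.

```python
import json

# Stand-in for the text MessageToJson would return; values are hypothetical.
json_from_message_to_json = json.dumps(
    {"name": "my-model", "creation_timestamp": "1636065599000", "tags": ["a"]}
)

# Assumption: the 64-bit integer field names are known up front.
INT64_FIELD_NAMES = {"creation_timestamp", "last_updated_timestamp"}


def restore_int64_fields(json_text, int64_field_names):
    """Parse JSON text and turn string-encoded int64 fields back into ints."""
    parsed = json.loads(json_text)
    for name in int64_field_names:
        if name in parsed:
            parsed[name] = int(parsed[name])
    return json.dumps(parsed, indent=2)


fixed = json.loads(restore_int64_fields(json_from_message_to_json, INT64_FIELD_NAMES))
print(fixed["creation_timestamp"])
```

Nested messages and repeated fields need a recursive treatment, which is what the review comments further down discuss.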

How is this patch tested?

unit tests

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Signed-off-by: Chenran Li <chenran.li@databricks.com>

lichenran1234 · 2021-11-04T23:11:57Z

tests/utils/test_proto_json_utils.py

@@ -28,6 +29,75 @@ def test_message_to_json():
        "lifecycle_stage": "active",


This test case did not catch the bug because the message it uses has no int64 fields.

Signed-off-by: Chenran Li <chenran.li@databricks.com>

jinzhang21

Thanks for making the change! A couple minor comments.

jinzhang21 · 2021-11-05T00:40:34Z

tests/utils/test_proto_json_utils.py

+            },
+        ],
+    }
+


Add another line to convert the json back to proto and dict and assert it works?

Done. Thanks!

jinzhang21 · 2021-11-05T00:43:16Z

mlflow/utils/proto_json_utils.py

-    return MessageToJson(message, preserving_proto_field_name=True)
+
+    # Google's MessageToJson API converts int64/fixed64/unit64 proto fields to JSON strings.
+    json_dict_with_int64_converted_to_str = json.loads(


json_dict_with_int64_converted_to_str -> json_dict_with_int64_as_str to be consistent with json_dict_with_int64_as_numbers below

Done. Thanks!

jinzhang21 · 2021-11-05T00:43:58Z

mlflow/utils/proto_json_utils.py



 def message_to_json(message):
    """Converts a message to JSON, using snake_case for field names."""
-    return MessageToJson(message, preserving_proto_field_name=True)
+
+    # Google's MessageToJson API converts int64/fixed64/unit64 proto fields to JSON strings.


Add context to why, e.g. citing the comment protocolbuffers/protobuf#2954 (comment)

Good point! Done
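For context, the usual rationale for string-encoding 64-bit integers in JSON is that many JSON consumers store numbers as IEEE 754 doubles, which cannot represent every 64-bit integer exactly. This is assumed to be the reasoning behind the linked comment, which is not reproduced in this thread. A minimal sketch of the precision loss:

```python
# 2**53 + 1 is the smallest positive integer a double cannot represent exactly.
big = 2**53 + 1

as_double = float(big)          # what a double-based JSON parser would hold
round_tripped = int(as_double)  # the value after passing through a double

lost_precision = round_tripped != big

# Encoding the value as a string keeps every digit intact.
recovered = int(str(big))

print(big, round_tripped, lost_precision, recovered == big)
```

Python's own json module parses integers as arbitrary-precision int, so converting back to numbers is lossless on the Python side.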

jinzhang21 · 2021-11-05T00:47:18Z

mlflow/utils/proto_json_utils.py

+    return json_dict
+
+
+def _merge_json_dicts(from_dict, to_dict):


Would a package work better here? Not sure about its quality though. https://pypi.org/project/jsonmerge/

Unfortunately that package has some weird bugs: it dropped lots of fields when merging.

Np. Just to clarify options. Custom implementation is fine.

jinzhang21 · 2021-11-05T00:48:32Z

mlflow/utils/proto_json_utils.py

+        if field.label == FieldDescriptor.LABEL_REPEATED:
+            json_value = []
+            for v in value:
+                json_value.append(ftype(v))
+        else:
+            json_value = ftype(value)


json_value = [ftype(v) for v in value] if field.label == FieldDescriptor.LABEL_REPEATED else ftype(value)

Done, thanks!

jinzhang21 · 2021-11-05T00:52:13Z

mlflow/utils/proto_json_utils.py

+            FieldDescriptor.TYPE_INT64,
+            FieldDescriptor.TYPE_UINT64,
+            FieldDescriptor.TYPE_FIXED64,


What about these types?

TYPE_FIXED32 = 7
TYPE_INT32 = 5
TYPE_SFIXED32 = 15
TYPE_SFIXED64 = 16
TYPE_SINT32 = 17
TYPE_SINT64 = 18
TYPE_UINT32 = 13
TYPE_UINT64 = 4

And CPP types:
CPPTYPE_INT32 = 1
CPPTYPE_INT64 = 2
CPPTYPE_UINT32 = 3
CPPTYPE_UINT64 = 4

https://googleapis.dev/python/protobuf/latest/google/protobuf/descriptor.html

For int32 types, according to the doc, they won't be converted to JSON strings so we don't need to add them here.

The CPP types come from a different enum, FieldDescriptor::CppType rather than FieldDescriptor::Type. They are used for field.cpp_type, but here we only care about field.type.

I added two more int64 types (TYPE_SFIXED64 and TYPE_SINT64) to cover all the int64 types.
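The resulting type check can be sketched as follows. To keep the example runnable without protobuf installed, the FieldDescriptor.TYPE_* numbers are mirrored in a local enum; FieldType and needs_conversion are hypothetical names used only for this illustration.

```python
from enum import IntEnum


class FieldType(IntEnum):
    """Subset of FieldDescriptor.TYPE_* values, mirrored for illustration."""

    INT64 = 3
    UINT64 = 4
    INT32 = 5
    FIXED64 = 6
    FIXED32 = 7
    UINT32 = 13
    SFIXED32 = 15
    SFIXED64 = 16
    SINT32 = 17
    SINT64 = 18


# The five 64-bit integer types that get string-encoded and need converting back.
INT64_TYPES = frozenset(
    {
        FieldType.INT64,
        FieldType.UINT64,
        FieldType.FIXED64,
        FieldType.SFIXED64,
        FieldType.SINT64,
    }
)


def needs_conversion(field_type):
    """Return True if a field of this type should be converted back to a number."""
    return field_type in INT64_TYPES


print(sorted(t.name for t in INT64_TYPES))
```

The 32-bit types are left out on purpose, since they are already emitted as JSON numbers.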

Signed-off-by: Chenran Li <chenran.li@databricks.com>

jinzhang21

LGTM! Thanks for the change, Chenran! Some tests are failing. Please fix those.

jinzhang21 · 2021-11-05T03:15:24Z

mlflow/utils/proto_json_utils.py

+    converted from proto messages
+    """
+
+    for key in from_dict:


for key in from_dict: -> for key, value in from_dict.items():

Done, thanks!

jinzhang21 · 2021-11-05T03:15:55Z

mlflow/utils/proto_json_utils.py

+        value = to_dict[key]
+        if isinstance(value, dict):
+            _merge_json_dicts(from_dict[key], to_dict[key])
+        elif isinstance(value, list):


Could value be a tuple?

Good question! According to this page, a Python dict constructed from a JSON string cannot contain tuples, because JSON arrays are always decoded to lists.
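This is easy to confirm with the standard json module: a tuple is written as a JSON array, and an array always comes back as a list.

```python
import json

original = {"values": (1, 2, 3)}   # tuple on the way in
encoded = json.dumps(original)     # tuples are serialized as JSON arrays
decoded = json.loads(encoded)      # arrays are decoded as lists

print(encoded)
print(type(decoded["values"]).__name__)
```

So a check for list alone is enough when walking a dict produced by json.loads.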

jinzhang21 · 2021-11-05T03:19:34Z

mlflow/utils/proto_json_utils.py

+            for i in range(len(value)):
+                if isinstance(value[i], dict):
+                    _merge_json_dicts(from_dict[key][i], to_dict[key][i])
+                else:
+                    to_dict[key][i] = from_dict[key][i]


Enumerate seems to be simpler here:

for i, v in enumerate(value):
    if isinstance(v, dict):
        _merge_json_dicts(v, to_dict[key][i])
    else:
        to_dict[key][i] = v

Good point! Done
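Putting the review suggestions together, the recursive merge might look roughly like the sketch below. It is reconstructed from the snippets quoted in this thread and is not the merged code. Two assumptions are labeled in the comments: keys missing from the target are skipped, and lists in both dicts have the same length and order. The function is named merge_json_dicts here to make clear it is an illustration.

```python
def merge_json_dicts(from_dict, to_dict):
    """Overwrite values in to_dict with the matching values from from_dict."""
    for key, value in from_dict.items():
        if key not in to_dict:
            continue  # assumption: ignore keys the target does not have
        if isinstance(value, dict):
            merge_json_dicts(value, to_dict[key])
        elif isinstance(value, list):
            # assumption: both lists have the same length and ordering
            for i, v in enumerate(value):
                if isinstance(v, dict):
                    merge_json_dicts(v, to_dict[key][i])
                else:
                    to_dict[key][i] = v
        else:
            to_dict[key] = value
    return to_dict


# Hypothetical data: the full JSON dict with int64 values as strings ...
as_str = {
    "name": "my-model",
    "creation_timestamp": "1636065599000",
    "latest_versions": [{"version": "1", "creation_timestamp": "1636065600000"}],
    "ids": ["1", "2"],
}
# ... and a dict holding only the int64 fields, as numbers.
as_numbers = {
    "creation_timestamp": 1636065599000,
    "latest_versions": [{"creation_timestamp": 1636065600000}],
    "ids": [1, 2],
}

merged = merge_json_dicts(as_numbers, as_str)
print(merged)
```

Non-int64 fields such as name and version pass through untouched, because they only exist in the target dict.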

dbczumar · 2021-11-05T04:22:34Z

@harupy FYI

harupy · 2021-11-05T06:30:42Z

mlflow/utils/proto_json_utils.py

+    return json_dict
+
+
+def _merge_json_dicts(from_dict, to_dict):


Hi @lichenran1234, can we add a test for _merge_json_dicts to make sure it works properly?

Thanks for the comment, Harupy! Usually we don't write unit tests for private functions. That applies especially to _merge_json_dicts: its code logically belongs inside message_to_json, and I only extracted it for readability. So I guess we should follow the Test Behavior, Not Implementation principle.

Can you think of more test cases for message_to_json if you are worried that _merge_json_dicts may not work properly?

I agree with @lichenran1234 that we shouldn't test private functions. But perhaps add more field types to the unit test to verify that the public API works for all kinds of fields? For example, we might want to verify that the following aspects are translated correctly:

default values (I worry about this one)

extensions (and this one)

proto maps

oneof

enums

@lichenran1234 Makes sense to not test _merge_json_dicts, thanks for the knowledge!

Thanks @harupy and @jinzhang21 ! I added a new test proto message so I can test this function extensively. I also added support for proto maps.
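The thread does not say why proto maps needed extra support. One plausible reason, offered here purely as an assumption, is that JSON object keys are always strings, so a map with integer keys comes back with string keys after a JSON round trip:

```python
import json

int_keyed = {1: "one", 2: "two"}

# json.dumps coerces int keys to strings; json.loads cannot know to undo that.
round_tripped = json.loads(json.dumps(int_keyed))

# Restoring the original keys needs outside knowledge of the key type.
restored = {int(k): v for k, v in round_tripped.items()}

print(list(round_tripped), list(restored))
```

Any code that matches keys between a number-typed dict and a JSON-derived dict has to account for this difference.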

Signed-off-by: Chenran Li <chenran.li@databricks.com>

harupy · 2021-11-06T04:39:09Z

mlflow/protos/protos_for_test/test_message.proto

@@ -0,0 +1,55 @@
+syntax = "proto2";


Is it possible to move this file under tests directory?

Good point! Done!

jinzhang21

Thanks a lot for addressing this issue, @lichenran1234 ! Could you please move the test protos to mlflow/tests/protos as @harupy suggested? The rest LGTM! Feel free to merge after the change.

Signed-off-by: Chenran Li <chenran.li@databricks.com>

harupy

LGTM!

Fix an issue with the timestamps in the endpoint response

65aef60

Signed-off-by: Chenran Li <chenran.li@databricks.com>

github-actions bot added the area/model-registry and rn/none labels Nov 4, 2021

lichenran1234 changed the title ~~Fix an issue with the timestamps in the endpoint response~~ Fix a bug that converts int64 to string when converting Protobuf to JSON Nov 4, 2021

lichenran1234 commented Nov 4, 2021

minor

3d77588

Signed-off-by: Chenran Li <chenran.li@databricks.com>

lichenran1234 requested review from dbczumar and jinzhang21 November 4, 2021 23:19

lint

fc49861

Signed-off-by: Chenran Li <chenran.li@databricks.com>

jinzhang21 reviewed Nov 5, 2021

lichenran1234 added 2 commits November 4, 2021 17:58

lint

51c4fe7

Signed-off-by: Chenran Li <chenran.li@databricks.com>

address comments

17a39a6

Signed-off-by: Chenran Li <chenran.li@databricks.com>

lichenran1234 requested a review from jinzhang21 November 5, 2021 02:50

jinzhang21 reviewed Nov 5, 2021

dbczumar requested a review from harupy November 5, 2021 04:22

harupy reviewed Nov 5, 2021

harupy mentioned this pull request Nov 5, 2021

[Proxied artifact operations] Implement REST API endpoints and artifact repository #4946

Merged

27 tasks

address comments

39c628f

Signed-off-by: Chenran Li <chenran.li@databricks.com>

lichenran1234 requested review from harupy and jinzhang21 November 5, 2021 18:05

lichenran1234 added 5 commits November 5, 2021 11:08

fix unit tests

b724829

Signed-off-by: Chenran Li <chenran.li@databricks.com>

add a test proto, also add support for proto maps

269ddd4

Signed-off-by: Chenran Li <chenran.li@databricks.com>

minor

833ae5f

Signed-off-by: Chenran Li <chenran.li@databricks.com>

update comments

d051227

Signed-off-by: Chenran Li <chenran.li@databricks.com>

minor

a6ec42e

Signed-off-by: Chenran Li <chenran.li@databricks.com>

harupy reviewed Nov 6, 2021

jinzhang21 approved these changes Nov 10, 2021

lichenran1234 added 2 commits November 10, 2021 11:13

Merge remote-tracking branch 'origin' into timestamp

45bd2e0

Signed-off-by: Chenran Li <chenran.li@databricks.com>

resolve merge conflicts

720652e

Signed-off-by: Chenran Li <chenran.li@databricks.com>

lichenran1234 added 6 commits November 10, 2021 11:28

move proto files to tests/

47695e4

Signed-off-by: Chenran Li <chenran.li@databricks.com>

minor

c9a3470

Signed-off-by: Chenran Li <chenran.li@databricks.com>

fix tests

90c85c3

Signed-off-by: Chenran Li <chenran.li@databricks.com>

minor

4707fdb

Signed-off-by: Chenran Li <chenran.li@databricks.com>

minor

a8d2f64

Signed-off-by: Chenran Li <chenran.li@databricks.com>

minor

cf6ca42

Signed-off-by: Chenran Li <chenran.li@databricks.com>

harupy approved these changes Nov 10, 2021

lichenran1234 merged commit a20bb49 into mlflow:master Nov 11, 2021

lichenran1234 deleted the timestamp branch November 11, 2021 00:32
