Proto2 protobuf cannot handle enums with value higher than 31 during serialization in Python #10950
Comments
Thank you for the bug report. This sounds like a potentially serious bug. Would you mind providing a reproducible example? |
Yes, a repro would be very helpful here. I was not able to reproduce it:
// test.proto
syntax = "proto2";
enum E {
  ONE = 1;
  THIRTY_ONE = 31;
  THIRTY_TWO = 32;
  ONE_HUNDRED = 100;
}
message M {
  optional E e = 1;
}
# test.py
from test_pb2 import M, E
import unittest

class TestEnumValueSerialize(unittest.TestCase):
    def do_test_val(self, val):
        # Round-trip the enum value through serialization and parsing.
        m1 = M(e = val)
        serialized = m1.SerializeToString()
        m2 = M()
        m2.ParseFromString(serialized)
        self.assertEqual(val, m2.e)

    def test_enum_values(self):
        self.do_test_val(E.ONE)
        self.do_test_val(E.THIRTY_ONE)
        self.do_test_val(E.THIRTY_TWO)
        self.do_test_val(E.ONE_HUNDRED)

if __name__ == '__main__':
    unittest.main()
Output:
|
Hello, here is the repro with output. Note that I have also seen the issue with holes in the numeric ranges, but the total number of values was still larger than 31.
// test.proto
syntax = "proto2";
package Test;
message Person {
  enum PhoneType {
    blaa = 0;
    HOME = 1;
    WORK = 2;
    a = 3;
    b = 4;
    c = 5;
    d = 6;
    e = 7;
    f = 8;
    g = 9;
    h = 10;
    i = 11;
    j = 12;
    k = 13;
    l = 14;
    m = 15;
    n = 16;
    o = 17;
    p = 18;
    q = 19;
    r = 20;
    s = 21;
    t = 22;
    u = 23;
    v = 24;
    w = 25;
    x = 26;
    y = 27;
    z = 28;
    aa = 29;
    ab = 30;
    ac = 31;
    ad = 32;
    ae = 33;
  }
  optional PhoneType TaskType = 1;
  optional string Name = 2;
}
# main.py
import test_pb2
person = test_pb2.Person()
person.TaskType = 32
person.Name = "Name1"
serialized = person.SerializeToString()
personDe = test_pb2.Person()
print("a", person)
print("---")
personDe.ParseFromString(serialized)
print("b", personDe)
print("---")
person2 = test_pb2.Person()
person2.TaskType = 31
person2.Name = "Name2"
serialized2 = person2.SerializeToString()
personDe2 = test_pb2.Person()
print("a", person2)
print("---")
personDe2.ParseFromString(serialized2)
print("b", personDe2)
Output:
|
This must be a Windows-specific bug. When I try your repro on Linux, I get:
I suspect this is related to |
I also cannot reproduce the bug on (64-bit) Linux. |
I can also confirm that I cannot reproduce it inside WSL. It does indeed seem to be Windows specific (both machines I tried this on were either Windows 11 or Windows 10). |
Can you print out (I don't have quick access to a Windows machine to verify). |
len(serialized): 9 |
Great, thanks. It appears to be serializing correctly. When we parse it though, the decoder thinks the value (32) is not in the enum for some reason. In proto2, when you parse an unrecognized enum value, it goes into the unknown field set, which is what appears to be happening here. The only question now is why the decoder thinks that 32 is an unknown value for this enum. |
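The reported length of 9 is consistent with the enum value being present on the wire. The following is a minimal sketch that hand-encodes the repro's `Person` message using the standard protobuf wire format (varint tags, length-delimited strings). The helper names `encode_varint` and `encode_person` are hypothetical, and the actual bytes were not posted in this thread, so treat the hex output as what a correct encoder would be expected to produce rather than as captured data.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_person(task_type: int, name: str) -> bytes:
    """Hand-encode Person{TaskType=task_type, Name=name} (hypothetical helper)."""
    name_bytes = name.encode("utf-8")
    field1 = encode_varint((1 << 3) | 0) + encode_varint(task_type)  # field 1, wire type 0 (varint)
    field2 = encode_varint((2 << 3) | 2) + encode_varint(len(name_bytes)) + name_bytes  # field 2, wire type 2
    return field1 + field2


payload = encode_person(32, "Name1")
print(len(payload), payload.hex())
```

Two bytes for the enum field plus seven for the string give 9 bytes. If the enum had been dropped during serialization, only the 7-byte string field would remain, which is why the length points at the parser rather than the serializer.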
This is the code performing the check. The bug is likely here or nearby, but I can't see it: https://github.com/protocolbuffers/upb/blob/f45eeec625a1b5b4514683aba4b38d1864f0e211/upb/decode.c#L410-L440 In particular, this check looks correct to me. We cast |
Another helpful thing I can point out is that this is a regression introduced in 4.21.x. Protobuf 3.20.3 deserializes the message correctly. |
That makes perfect sense, as 4.21.x introduced a completely rewritten native extension. More info here: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates |
I would be mildly surprised if the bug turned out to be in decode.c since that code has not changed much at all. I'm wondering whether this may be related to the mini descriptors. |
It's true that decode.c has not changed much, but we also have much less usage on Windows than other platforms. Keep in mind that we're releasing from the 21.x branch, which has not gotten any of the recent changes around MiniDescriptors: https://github.com/protocolbuffers/upb/tree/21.x/upb |
I have a root cause and a fix for the bug. The problem was in the The fix is to cast the argument to
--- a/upb/port_def.inc
+++ b/upb/port_def.inc
@@ -96,8 +96,8 @@
/* Hints to the compiler about likely/unlikely branches. */
#if defined (__GNUC__) || defined(__clang__)
-#define UPB_LIKELY(x) __builtin_expect((x),1)
-#define UPB_UNLIKELY(x) __builtin_expect((x),0)
+#define UPB_LIKELY(x) __builtin_expect((bool)(x),1)
+#define UPB_UNLIKELY(x) __builtin_expect((bool)(x),0)
#else
#define UPB_LIKELY(x) (x)
#define UPB_UNLIKELY(x) (x) |
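One plausible reading of why this cast matters: GCC documents `__builtin_expect` as taking a `long`, and `long` is 32 bits on 64-bit Windows, so a 64-bit expression passed through the macro loses its upper bits. The sketch below simulates that in Python. It assumes the decoder tests small enum values against a 64-bit bitmask of the form `(1 << v) & mask`; that shape is an assumption about the linked decode.c code, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def as_32bit_long(value: int) -> int:
    """Keep only the low 32 bits, emulating conversion to a 32-bit long."""
    return value & MASK32


def as_bool(value: int) -> int:
    """Emulate a C (bool) cast: any non-zero value becomes 1."""
    return int(value != 0)


def enum_known_buggy(v: int, mask: int) -> bool:
    # Assumed shape of the check before the fix: the 64-bit result is truncated.
    return v < 64 and as_32bit_long((1 << v) & mask) != 0


def enum_known_fixed(v: int, mask: int) -> bool:
    # After the fix: the value is collapsed to 0 or 1 before any narrowing.
    return v < 64 and as_bool((1 << v) & mask) != 0


# Bitmask for the repro's PhoneType enum, which defines values 0 through 33.
mask = sum(1 << n for n in range(34))

for v in (31, 32, 33):
    print(v, enum_known_buggy(v, mask), enum_known_fixed(v, mask))
```

Under these assumptions, 31 is recognized in both versions while 32 and 33 are recognized only after the cast, which matches the threshold reported in this issue and the absence of the bug on Linux, where `long` is 64 bits.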
protocolbuffers/protobuf#10950 PiperOrigin-RevId: 488818173
…buf commit protocolbuffers/protobuf#10950 PiperOrigin-RevId: 489095346
A fix for this has just been pushed so I am closing. |
What version of protobuf and what language are you using?
Version: protoc 3.21.9
Language: Python - protobuf 4.21.9
OS: Windows 11
Interpreter: Python 3.11.0
What did you do?
What did you expect to see
The deserialized protobuf has the enum field set to the value larger than 31
What did you see instead?
The deserialized protobuf has the field cleared (the serialized form does not contain the value at all).
Anything else we should know about your project / environment
I can provide a minimal reproduction example if desired, but it was pretty simple to reproduce. The issue was also reproducible on protobuf lib versions as low as 4.21.3 and with grpcio-tools as low as 1.49.1. Lastly, switching the syntax to proto3 does fix the issue; however, that is currently not viable for us.