-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace domain model with Protobuf/gogo-generated model #856
Conversation
model/keyvalue.go
Outdated
// Key string `json:"key"` | ||
// VType ValueType `json:"vType"` | ||
// VStr string `json:"vStr,omitempty"` | ||
// VNum int64 `json:"vNum,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the KeyValue struct is now less memory efficient because vNum
or split into vBool/vInt64/vFloat64
cc4e0e4
to
bbb46db
Compare
Makefile
Outdated
@@ -92,7 +92,8 @@ cover: nocover | |||
@echo pre-compiling tests | |||
@time go test -i $(ALL_PKGS) | |||
@./scripts/cover.sh $(shell go list $(TOP_PKGS)) | |||
go tool cover -html=cover.out -o cover.html | |||
egrep -v 'jaeger.pb.*.go' cover.out > cover-nogen.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
egrep
is deprecated on many platforms. Use grep -E
instead.
examples/proto/main.go
Outdated
s := grpc.NewServer( | ||
// grpc.Creds(credentials.NewServerTLSFromCert(&insecure.Cert)), | ||
// grpc.UnaryInterceptor(grpc_validator.UnaryServerInterceptor()), | ||
// grpc.StreamInterceptor(grpc_validator.StreamServerInterceptor()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's up here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example is WIP, I may remove it from the final PR, just experimenting with service JSON from grpc endpoint.
examples/proto/main.go
Outdated
context.Background(), | ||
dialAddr, | ||
grpc.WithInsecure(), | ||
// grpc.WithTransportCredentials(credentials.NewClientTLSFromCert(insecure.CertPool, "")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question.
examples/proto/main.go
Outdated
// err = serveOpenAPI(mux) | ||
// if err != nil { | ||
// log.Fatalln("Failed to serve OpenAPI UI") | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question.
examples/proto/server/server.go
Outdated
pbJaeger.Span{ | ||
TraceID: pbJaeger.TraceID{Low: 123}, | ||
// SpanID: []byte{1, 2, 3, 4, 5, 6, 7, 8}, | ||
// SpanID: 456, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why all of this repetition of SpanID
?
{ | ||
"traceID": "1", | ||
"spanID": "2", | ||
"operationName": "test-general-conversion", | ||
"startTime": "2017-01-26T16:46:31.639875139-05:00", | ||
"duration": 5000, | ||
"duration": "5000ns", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a huge fan of strings for duration. double
/float64
is good.
{ | ||
"key": "peer.service", | ||
"vType": "BINARY", | ||
"vBinary": "AAAAAAAAMDk=" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of random question: Is all binary base64 encoded or can it be complete junk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's always a base64 string according to proto3-lang. I still prefer usage of oneof
which should remove the vType
field from JSON since the (de-)serializer knows that there can only be one field active.
model/keyvalue_test.go
Outdated
err = json.Unmarshal([]byte(`{"key":"x","vType":"BAD","vNum":123}`), &kv3) | ||
assert.EqualError(t, err, "not a valid ValueType string BAD") | ||
} | ||
// func TestValueTypeToFromJSON(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to be uncommented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love to see this replace thrift in the long run :) Having to vendor thrift is pretty ugly right now since there is no Apache-managed release for C#, only for 0.9.3 which does not support .NET Core.
BINARY = 4; | ||
}; | ||
|
||
message KeyValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reminds me somewhat of Value from struct.proto. Please use the oneof variant. It's easier to maintain and a lot better to work with in most languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oneof is terrible in Go in both usability and performance. Why do you find them easier to use? You need a switch statement either way, and in most cases you need an additional cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In C# you have the switch in both cases, either doing your own enum or using the oneof, and you get the enum generated and maintained by the code generator. Performance-wise it's the same as every other non-oneof field (also on the wire), just that the code sets you a second field which enum was selected, which you would have set yourself either way.
What casts do you need in Go? This is how it looks in C#: https://developers.google.com/protocol-buffers/docs/reference/csharp-generated#oneof
You get an additional enum AvatarOneofCase
, you get the property AvatarCase
which is baiscally your vType
, you get ImageUrl
and ImageData
as if they would have been regular fields (no difference at all).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Go the oneof is modeled as an interface field (example: gogo/protobuf#168), which means it requires memory allocation even for primitive values, with a fairly ugly syntax like
kv := KeyValue{
Key: "key",
Value: Value_Int{Int: 123},
}
and on reading you need to switch on the type (I was wrong, it does not require additional cast).
The memory allocations are the main turn-off of oneof for me, especially for such frequently used thing as KeyValue type. For ProcessID/ProcessMap I could see us using a oneof, but again the syntax is quite ugly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, in C# it looks like this:
var kv = new KeyValue{
Key = "key",
Int = 123,
}
For reading:
switch(kv.ValueCase) {
case ValueOneofCase.Int:
var int = kv.Int;
...
}
No extra types or anything :) Would be interesting how other languages handle this.
model/proto/jaeger.proto
Outdated
// message KeyValue2 { | ||
// string key = 1; | ||
// oneof value { | ||
// string vStr = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not compliant to naming scheme. proto uses snake_case see documentation. I would also drop the v and just use speaking names. The json-gateway should make them camelCase automatically I assume. Also all proto-implementations to naming automatically. But it's best to stay with the default for proto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the naming scheme defined? I only see examples in snake case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the link in the documentation. Everything from Google uses snake_case (with underscores). vStr is camelCase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
model/proto/jaeger.proto
Outdated
}; | ||
|
||
message SpanRef { | ||
TraceID traceID = 1 [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not compliant to naming scheme: trace_id
model/proto/jaeger.proto
Outdated
TraceID traceID = 1 [ | ||
(gogoproto.nullable) = false | ||
]; | ||
uint64 spanID = 2 [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not compliant to naming scheme: span_id
Maybe to it as message? I don't know if other languages also have the customtype notation.
model/proto/jaeger.proto
Outdated
(gogoproto.nullable) = false, | ||
(gogoproto.customtype) = "SpanID" // alias to uint64 | ||
]; | ||
SpanRefType refType = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not compliant to naming scheme: ref_type
]; | ||
} | ||
|
||
message Span { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it's all mixy with the naming :) trace_id, span_id, start_time and process_id. operation_name is the only outlier here, but the correct one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
model/proto/jaeger.proto
Outdated
|
||
message Trace { | ||
message ProcessMapping { | ||
string processID = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not compliant to naming scheme: process_id
model/proto/jaeger.proto
Outdated
]; | ||
} | ||
repeated Span spans = 1; | ||
repeated ProcessMapping processMap = 2 [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not compliant to naming scheme: process_map
repeated Log logs = 9 [ | ||
(gogoproto.nullable) = false | ||
]; | ||
Process process = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
process
and processID
are XOR, so it should be a oneof. Has no impact on JSON but makes using them easier.
If we end up using this in the clients, we need a better way to bundle the new protobuf dependencies. The so-called "well-known types" are not included in the default protobuf package. Installing Go packages is never ideal for C# or any other language. We should provide them ourselves seeing as that is what all these Go packages are doing anyway. |
Is gogoproto a separate compiler that is compatbile to regular proto or is it a incompatible fork? In C# and Java, the well-knowns are included in the base package. The protos are only in the Very interesting that that's not the case for all languages. They sould be at least part of the build chain. |
@Falco20019 appreciate the review. Left a couple questions for you. I don't mind using snake_case in protobuf there's an established convention. I just need to see how that would affect the json, I'd rather avoid major changes there if we have a chance of using it with the UI. |
I would assume that you can use options to tell the JSON name when using grpc-gateway. Haven't used it though, only the C# implementation of the JSON serializer... Just had a look at the go oneof implementation. Looks like it generates a separate struct for each field, which is, I assume, what you meant with worse performance. From the usability and readability perspective, it seems not too much worse I think. You would have to do the switch either way (using oneof or an enum) and since it generates the getter functions, you can access it just like you would have before. Could it be, that gogoproto is just doing this differently? |
model/proto/jaeger.proto
Outdated
option (gogoproto.compare) = true; | ||
|
||
string key = 1; | ||
ValueType vType = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need valueType
? Could we simply check which field is set when deserializing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably depends on the language. Have we considered using the Google well known protobuf type Any
? I assume that is an optimized format.
Any serializes as byte array with type string. So could be a lot bigger for small data like int |
I wonder if the builtin types would need a url. Maybe it is optimized. |
Sorry I guess I should have said Value. Our spans can just contain an instance of Struct for tags and that would work as well. List of the well-known types: https://developers.google.com/protocol-buffers/docs/reference/google.protobuf. |
Yep, I mentioned in one review comment that it reminds me a lot of that proto. But that’s a oneof too, so same „problem“ with performance and usability in Go. Not sure how costly those structs are since Google uses Go internally. They build the language and the format, so I would guess it’s not that memory heavy. But I don’t know too much about Go to judge that |
@Falco20019 Go tries to avoid unsafe language constructs, so there are no unions. Hence the use of multiple struct fields. However, I surprised gogoprotobof didn't generate pointers to the field types. Maybe the premise is that allocation is more expensive than unused space? IDK. |
Also, Google's support for Go using Protobuf is questionable at best. The standard Protobuf package leaves a lot to be desired. That's the whole reason gogoprotobuf, a fork of the original Go Protobuf package, has become more popular than the original. If you look at the Protobuf code, it is clear that the real performance optimizations occur in C++ clients (best example is arena allocation). |
82bb5f9
to
17f0397
Compare
Never mind, I see the actual code is more complex to mimic a union. Sort of resembles the pointer solution. |
Too many comments about I changed the naming to snake_case, one negative side effect is that traceID/spanID in JSON are now camelCase traceId/spanId (more UI impact). But at least it's canonical proto JSON format. The bigger open question for me is using |
IMHO it is worth reviewing the options available to generate efficient code for I don't think using bytes makes sense for trace ID because the type is fixed. Bytes makes sense for binary strings. EDIT: Reread and now understand the issue with integers slightly better. Again worth looking into options. |
@isaachier not following you. What do you propose about oneof? What do you mean by "bytes ... are fixed"? I was going to change Span like this:
and // TraceID is a random 128bit identifier for a trace
type TraceID [16]byte
// SpanID is a random 64bit identifier for a span.
type SpanID [8]byte This avoids extra memory allocations. Unfortunately, it also requires me to re-implement marshaling methods that are otherwise available for |
Codecov Report
@@ Coverage Diff @@
## master #856 +/- ##
=====================================
Coverage 100% 100%
=====================================
Files 121 121
Lines 5997 5959 -38
=====================================
- Hits 5997 5959 -38
Continue to review full report at Codecov.
|
Ready for review |
Only checked the |
Somewhat separate issue, but if we use this format, would it be possible to have the backend accept both Protobuf and JSON? That way a new client can choose to use Protobuf as a dependency or just to use pure JSON. |
Also, is there any reason not to move |
Makefile
Outdated
--go_out=$$GOPATH/src/github.com/jaegertracing/jaeger/model/prototest/ \ | ||
model/proto/jaeger_test.proto | ||
|
||
proto-install: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.PHONY
?
go install \ | ||
./vendor/github.com/gogo/protobuf/protoc-gen-gogo \ | ||
./vendor/github.com/grpc-ecosystem/grpc-gateway/protoc-gen-grpc-gateway \ | ||
./vendor/github.com/grpc-ecosystem/grpc-gateway/protoc-gen-swagger |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get this on my Debian desktop when trying this command:
$ make proto-install
go install \
./vendor/github.com/gogo/protobuf/protoc-gen-gogo \
./vendor/github.com/grpc-ecosystem/grpc-gateway/protoc-gen-grpc-gateway \
./vendor/github.com/grpc-ecosystem/grpc-gateway/protoc-gen-swagger
vendor/github.com/grpc-ecosystem/grpc-gateway/protoc-gen-grpc-gateway/main.go:18:2: cannot find package "github.com/golang/glog" in any of:
/home/ihier/proj/go/src/github.com/jaegertracing/jaeger/vendor/github.com/golang/glog (vendor tree)
/usr/local/go/src/github.com/golang/glog (from $GOROOT)
/home/ihier/proj/go/src/github.com/golang/glog (from $GOPATH)
make: *** [Makefile:352: proto-install] Error 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After finally getting it to work and generating Python code, I get this clearly invalid import:
from github.com.gogo.protobuf.gogoproto import gogo_pb2 as github_dot_com_dot_gogo_dot_protobuf_dot_gogop
roto_dot_gogo__pb2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same with C++:
#include "github.com/grpc-ecosystem/grpc-gateway/protoc-gen-swagger/options/annotations.pb.h"
#include "github.com/gogo/protobuf/gogoproto/gogo.pb.h"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may have to strip them post-generation. E.g. in python these two extra imports are not actually used.
That is the intent.
In the future, too early for it now, introduces unnecessary dev friction. I would like to get a gRPC channel between agent & collector working first before moving proto files to idl. |
As long as this works for Go, LGTM. Other languages can be solved later or use JSON. |
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Yuri Shkuro <ys@uber.com>
4094b14
to
29a1709
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR was too small
For #773
There are a lot of changes to the JSON files that contain the representation of previous domain model. Because gogo/protpbuf/jsonpb renders the model slightly differently, they had to be changed. However, I kept the model/json fixtures intact everywhere, so that this PR does not have any impact on the UI. In other words we can continue developing the gRPC & REST API using the new model, and once that's available the UI can be upgraded to the new model, then the
model/json
can be moved inside Elasticsearch storage plugin (I figured it's safer not to change that format and keep the converters, but scope them to ES only).TODO:
jaeger.pb.go
file is placed undermodel
, but it has a lot less coverage. Makefile can grep its lines out of the coverage summary.