Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checks in marshal/unmarshal for presence of required fields #48

Merged
merged 18 commits into from May 15, 2015

Conversation

Projects
None yet
4 participants
@jmtuley
Copy link
Contributor

commented Apr 29, 2015

  • Whitespace changes are the result of running make all
  • Fixes #26
  • Based on work previously started by @anton-povarov
  • Includes tests in test/required

This will help Cloud Foundry's Dropsonde library to use the generated, non-reflective Marshal/Unmarshal functions safely.

Dwayne Schultz added some commits Apr 29, 2015

Dwayne Schultz John Tuley
Add checks in marshal/unmarshal for presence of required fields
* Whitespace changes are the result of running `make all`
* Fixes #26
* Based on work previously started by @anton-povarov
* Includes tests in test/required

Signed-off-by: John Tuley <jtuley@pivotal.io>
Dwayne Schultz John Tuley
Improve compatibility
* Remove protobuf.NewRequiredNotSetError() and instantiate inline instead
* Use protoPkg.Use() instead of hardcoded import

Signed-off-by: John Tuley <jtuley@pivotal.io>
Dwayne Schultz John Tuley
Use import helper
Signed-off-by: John Tuley <jtuley@pivotal.io>

jmtuley pushed a commit to cloudfoundry/dropsonde that referenced this pull request Apr 29, 2015

John Tuley
Revert "Update dropsonde-protocol, use faster Marshal/Unmarshal"
Forking Go projects requires too much manual intervention; will restore
when gogo/protobuf#48 is merged.

This reverts commit 1339fa2.
@@ -64,6 +66,11 @@ func (m *Foo) Unmarshal(data []byte) error {
}
fieldNum := int32(wire >> 3)
wireType := int(wire & 0x7)

if fn := int(fieldNum - 1); fn <= 0 {

This comment has been minimized.

Copy link
@anton-povarov

anton-povarov Apr 30, 2015

Contributor

generated code seems incorrect here, as fn < 0 would lead to panic on next line.
this should never happen normally (protobuf wouldn't allow field id == 0), but it might be possible if a malicious message is constructed by hand ?

This comment has been minimized.

Copy link
@jmtuley

jmtuley Apr 30, 2015

Author Contributor

@anton-povarov could you help us fix that, please? We copied the Unmarshal changes from your suggestion, as found in #26. I don't think I completely understand what's going on here. It looks like that line is being written by this line.

Thank you!

This comment has been minimized.

Copy link
@anton-povarov

anton-povarov May 2, 2015

Contributor

Just as a quick fix, i would generate something like (fn <= 0 && fn >=0), this is somewhat ugly, but should be enough to make it safe.

so the line becomes

p.P(if fn := int(fieldNum-, strconv.Itoa(rfMin), ); (fn >= 0 && fn <=, strconv.Itoa(rfMax-rfMin), ) {)

To explain what's going on: the code makes a bitmask of all fields, and sets bits to 1 for the fields that are present. Also tries to save on the size of a bitmask, by finding required fields with lowest id and highest id. rfMin/rfMax are those ids. Since array indexes start with 0, the code then adjusts incoming field ids into array offsets + bit offsets.

In hindsight this "optimization" seems unnecessary for 99.9% of uses (i.e. overengineering), so it might be simpler to just remove it, and the code would become much simpler.
Another suggestion: move code that sets "this field is present" bits, into specific "case" statements, and generate the code doing it only for required fields and do all the math in advance, this should speed up the parsing code.

This comment has been minimized.

Copy link
@anton-povarov

anton-povarov May 2, 2015

Contributor

Or we could do even better!

  1. count all required fields + get their names
  2. generate a bitmask each function with specific bits for each required field
  3. generate code specific to the field to set the corresponding bit in that bitmask
  4. generate code that check that all bits are set after parsing is done

This way - the bitmask would grow with the number of required fields, not max field id in the message.

Sorry that i'm just making suggestions here, and showing no code, got carried away :)

This comment has been minimized.

Copy link
@jmtuley

jmtuley May 2, 2015

Author Contributor

I think I get it. I'm going to try to state the full logic, to see if I'm right:

  1. Generate a bitmask for keeping track of which required fields are present.
  2. Unmarshal. At each field, fill out the mask if that field is required.
  3. After unmarshaling, check the mask for any missing fields. If found, return an error.

Is that correct? If it is, may I ask what is the benefit of fully unmarshalling, instead of bailing the moment an error is found? The latter is the approach we took in our changes to MarshalTo, but maybe that was naïve.

This comment has been minimized.

Copy link
@anton-povarov

anton-povarov May 2, 2015

Contributor

Yes, you understand correctly.

As for unmarshalling: i see no other way honestly. At the time we start parsing - we have no idea what's in the message. As fields come in - we get more info, but still can only answer the question "is this specific field present", but if it's required - that's what we want, and if it's not - we don't care.
And since fields are not always sent in order on the wire - we need to make sense of the whole message before checking.
An alternative approach might be to make 1st pass over the message - to get field types+offsets (handling sub-messages properly) then check and then 2nd pass to copy the data.
But it's much more complex and i would assume - slower for the common case when message is fully correct.

Anyway.
I've updated my implementation with the suggestion given in previous message (simpler code, should be more efficient when parsing as well, since we do more stuff at codegen time). Feel free to use it if you like.
anton-povarov@c65b714

@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 4, 2015

Sorry I have been a bit under the weather. It seems like this is on the right track.

@anton-povarov that is super over optimised, but since one of gogoprotobuf's goals is being fast, I like, I like it a lot :)

@jmtuley could you please just call proto.Marshal and proto.Unmarshal in your tests? These will call the generated code for you and malloc the appropriately sized byte slice.
Thank you for getting this rolling again :)

anton-povarov and others added some commits Feb 9, 2015

Update artifacts from `make all`
Based partially on previous commit, `make all` generated new code.
Checking that in.
Use `proto.Marshal`/`proto.Unmarshal`
Uses generic/reflective methods from `proto` package instead of
generated methods on struct itself.
@jmtuley

This comment has been minimized.

Copy link
Contributor Author

commented May 5, 2015

@anton-povarov I cherry-picked your commit into my fork, so please take a quick look and verify that I did that correctly. 😄 Thank you for your help in getting that corrected and optimized!

@awalterschulze I've made the change you requested. I hope you're feeling better!

@anton-povarov

This comment has been minimized.

Copy link
Contributor

commented May 5, 2015

LGTM :)

@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 5, 2015

The big picture LGTM.
Now I am just going to be a little pedantic about the tests and then we can merge :)

@@ -59,6 +59,10 @@ func (e *RequiredNotSetError) Error() string {
return fmt.Sprintf("proto: required field %q not set", e.field)
}

func NewRequiredNotSetError(field string) *RequiredNotSetError {

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

Please move this to encode_gogo.go
I try to gogoprotobuf's extra code as seperate as possible from the original goprotobuf code.

@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 5, 2015

The filemode should not need to change protoc-gen-gogo/main.go 100644 → 100755

@@ -827,6 +851,28 @@ func (p *unmarshal) Generate(file *generator.FileDescriptor) {
p.P(`}`)
p.Out()
p.P(`}`)

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

Please remove all these whitespace newlines below.

@@ -697,6 +711,7 @@ func (p *unmarshal) Generate(file *generator.FileDescriptor) {
if len(message.Field) > 0 {
p.P(`wireType := int(wire & 0x7)`)
}
p.P()

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

remove this line please

if !ok {
panic("field is required, but no bit registered")
}

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

remove this line please

@@ -827,6 +851,28 @@ func (p *unmarshal) Generate(file *generator.FileDescriptor) {
p.P(`}`)
p.Out()
p.P(`}`)

p.P()

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

remove this line please

@@ -305,7 +308,8 @@ func (m *B) Unmarshal(data []byte) error {
if postIndex > l {
return io.ErrUnexpectedEOF
}
m.G = append(m.G, github_com_gogo_protobuf_test_custom.Uint128{})

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

This is weird, did I not regenerate example before.
This seems like an old change?

This comment has been minimized.

Copy link
@jmtuley

jmtuley May 5, 2015

Author Contributor

I'm not sure if it's an old change or not, but this is the code generated by make all on my machine (OS X 10.10.3, protoc 2.6.1 installed via homebrew). I can revert it.

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

I regenerated the code on master now.
So if you pull in master again, this diff should go away.
My bad.

required string theRequiredString = 1;
optional string theOptionalString = 2;
repeated string theRepeatedStrings = 3;
}

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

Please copy test.NidOptNative and test.NinOptNative and change optional to required.

Create a test that imports test.NidOptNative and calls test.NewPopulatedNidOptNative and Marshal the message.
Then take those bytes and Unmarshal them into required.NidOptNative.
This should never return an error.

Create another test that imports test.NinOptNative and calls test.NewPopulatedNinOptNative.
Now this might be a challenge.
I don't know how familiar you are with reflect, but I will write this test myself if you say that you don't have enough time.
Using reflect set a random field to Nil.
Then Marshal test.NinOptNative and Unmarshal the bytes into required.NinOptNative.
This should always return an error.

Finally Create a test using required.NinOptNative and required.NewPopulatedNinOptNative.
Again set a random field to Nil, easiest with reflect.
Then Marshal, this should always return an error.

This comment has been minimized.

Copy link
@jmtuley

jmtuley May 5, 2015

Author Contributor

I'm unable to do exactly as you ask.

When I add a message

message NidOptNative {
    required double Field1 = 1 [(gogoproto.nullable) = false];
}

to requiredexample.proto, I get the following compile error:

 make
(protoc --proto_path=../../../../../:../../protobuf/:. --gogo_out=. requiredexample.proto)
2015/05/05 08:18:56 protoc-gen-gogo: error:bad Go source code was generated: 1070:57: expected operand, found ']' (and 2 more errors)
--gogo_out: protoc-gen-gogo: Plugin failed with status code 1.
make: *** [regenerate] Error 1

Should I just remove the [(gogoproto.nullable) = false] tags (which would make required.NidOptNative identical to required.NinOptNative), or do you want to figure out why required fields clash with setting nullable to false?

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

I just tried that.
This seems to only be an error on this branch.
So I suspect it has to do with the newly generated code.
I suspect we have found a bug :)

This comment has been minimized.

Copy link
@jmtuley

jmtuley May 5, 2015

Author Contributor

Since the generated Go source is discarded, do you have any recommendation for how to debug this bug?

Thanks!

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

I think I might have enabled this on the proto3 branch.

    rem := g.Buffer
    g.Buffer = new(bytes.Buffer)
    g.generateHeader()
    g.generateImports()
    g.Write(rem.Bytes())

    // Reformat generated code.
    fset := token.NewFileSet()
    ast, err := parser.ParseFile(fset, "", g, parser.ParseComments)
    if err != nil {
        g.Fail("bad Go source code was generated:", err.Error())
        return
    }

Simply print out g.Buffer.Bytes() if the error occurs.

dataOut := RequiredExample{
TheRequiredString: proto.String("present"),
}
encodedMessage, _ := dataOut.Marshal()

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

please check the error

"testing"
)

func TestMarshalToErrorsWhenRequiredFieldIsNotPresent(t *testing.T) {

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

less newlines would be nice :)

This comment has been minimized.

Copy link
@jmtuley

jmtuley May 5, 2015

Author Contributor

Sorry, I'm used to a coding style where we break up the code more. I apologize for not doing a better job conforming to the style in the library.

This comment has been minimized.

Copy link
@awalterschulze

awalterschulze May 5, 2015

Member

Its not really a big problem at all :)

@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 5, 2015

Thank you this really looks like it is going to work :)

jmtuley and others added some commits May 5, 2015

Add test for unmarshalling populated optional fields as required
Produce correct output for marshalTo with required fields
Change bitmask to boolean map for required field missing detection

Signed-off-by: Georg Apitz <gapitz@pivotal.io>
@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 8, 2015

How is it going?

@geapi

This comment has been minimized.

Copy link
Contributor

commented May 8, 2015

We are about to finish the last two tests you requested.

@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 8, 2015

cool I am very excited :)

Add tests for required fields
* Marshal with missing optional field, unmarshal to type where missing field is required
* Marshal with missing required field

Signed-off-by: Georg Apitz <gapitz@pivotal.io>
@geapi

This comment has been minimized.

Copy link
Contributor

commented May 8, 2015

@awalterschulze

Thanks for your help and direction. With the last commit, we've added the requested tests. Is there anything else you'd like from us, or is this ready to merge?

@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 9, 2015

The tests look great :)

I am just not sure about the new map, instead of the previous probably more efficient bit implementation.
I don't think it needs to be that efficient, but a map is very inefficient.
Why did you change it?

@anton-povarov

This comment has been minimized.

Copy link
Contributor

commented May 9, 2015

Same question here, adding more allocations and hash lookups to a function that should be fast is counter intuitive imo.

@jmtuley

This comment has been minimized.

Copy link
Contributor Author

commented May 9, 2015

We changed it due to a bug in the other implementation that we couldn't
figure out.

As we wrote the tests, we found that after field 6, all the masks were 0,
which obviously want going to work. Based on Walter's comment that a bit
mask was probably over optimized, we chose to solve the issue with
straightforward code, i.e. the map implementation, that we could understand
and didn't get in the way if getting the tests green.

There's also not that much extra allocation, since we allocate the whole
hash at once. Using a map allowed us to range at the final check, which
avoids the bound checking that happens in array lookups.

I don't think we are likely to correctly implement the mask logic, but
won't be offended if someone does manage it and adds the commit to our pull
request. I'm sorry that we were unable to correct the initial
implementation.

On Sat, May 9, 2015, 01:30 Anton Povarov notifications@github.com wrote:

Same question here, adding more allocations and hash lookups to a function
that should be fast is counter intuitive imo.


Reply to this email directly or view it on GitHub
#48 (comment).

@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 9, 2015

Ok no problem. I'm sure I (if Anton doesn't have time) can optimise this.

I was wondering if you could add one more test please.
I want to test nested messages.

So just add a new message which has a repeated NinOptNative.
And have one of those messages not the last or the first, have a nil field which is required.

Marshal and/or Unmarshal should give an error, which ever you have time for.

If you add the final test, I (if Anton doesn't have time) wil optimise the map.

@anton-povarov

This comment has been minimized.

Copy link
Contributor

commented May 9, 2015

Please do point at the test that is failing, i'll fix.

@jmtuley

This comment has been minimized.

Copy link
Contributor Author

commented May 9, 2015

I'm afraid I'm not in a position to do much right now; I just had wrist
surgery and so can't type effectively. (This is being painfully tapped out
on a phone.)

I'll do more on Thursday when the bandages come off.

On Sat, May 9, 2015, 12:56 Anton Povarov notifications@github.com wrote:

Please do point at the test that is failing, i'll fix.


Reply to this email directly or view it on GitHub
#48 (comment).

@anton-povarov

This comment has been minimized.

Copy link
Contributor

commented May 12, 2015

Righto, this one turned out to be pretty nasty, but easier than figuring out github for a newbie like me :)
The fix: https://gist.github.com/anton-povarov/c485ded98b51cb6605a0

The issue was with operator precedence, << has higher precedence than %.
so for fields 6 and above it would first shift left (making the value a multiple of 64 and always larger) and then take remainder, making it 0.

This diff also changes number formatting for bits set/check from decimal to hex, i find it easier to read and the code is less cluttered with parens.

NB @awalterschulze : all_test.go:1283 fails go vet check on master :)

@anton-povarov

This comment has been minimized.

Copy link
Contributor

commented May 12, 2015

As to performance comparison, figured i'd just benchmark it the regular way.
Can't say i've spent too much time on it, just kept on copy-pasting code from test/required/requiredexample.pb.go till it stopped complaining.

antoxa@antoxa-suse:~/_Dev/go/src/antoxa> go test -bench . -benchtime 10s ./1_test.go
testing: warning: no tests to run
PASS
BenchmarkMap     2000000              6644 ns/op
BenchmarkBitmask        20000000               989 ns/op
ok      command-line-arguments  41.237s

full code: https://gist.github.com/anton-povarov/6a3733d77da9ac837ba9

@jmtuley

This comment has been minimized.

Copy link
Contributor Author

commented May 12, 2015

Well, that's a little faster. :-) Good work!

On Tue, May 12, 2015, 12:10 Anton Povarov notifications@github.com wrote:

As to performance comparison, figured i'd just benchmark it the regular
way.
Can't say i've spent too much time on it, just keps on copy-pasting code
from test/required/requiredexample.pb.go till it stopped complaining.

antoxa@antoxa-suse:~/_Dev/go/src/antoxa> go test -bench . -benchtime 10s ./1_test.go
testing: warning: no tests to run
PASS
BenchmarkMap 2000000 6644 ns/op
BenchmarkBitmask 20000000 989 ns/op
ok command-line-arguments 41.237s

full code: https://gist.github.com/anton-povarov/6a3733d77da9ac837ba9


Reply to this email directly or view it on GitHub
#48 (comment).

geapi added some commits May 14, 2015

Apply @anton-povarov's patch for bitmasks for missing required fields
Signed-off-by: John Tuley <jtuley@pivotal.io>
Add test for nested NinOptNative
Signed-off-by: John Tuley <jtuley@pivotal.io>
@geapi

This comment has been minimized.

Copy link
Contributor

commented May 14, 2015

@awalterschulze: We applied Anton's patch and added a test for the nested NinOptNative, please let us know if there is anything else we can do.

@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 15, 2015

Ok this looks perfect! :)

@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 15, 2015

@anton-povarov my go vet works, which go version are you running?

@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 15, 2015

I am running go version go1.4 darwin/amd64

@anton-povarov

This comment has been minimized.

Copy link
Contributor

commented May 15, 2015

1.4.2

antoxa@antoxa-suse:~/_Dev/go/src/github.com/gogo/protobuf> go vet ./...
proto/all_test.go:1283: result of fmt.Sprintf call not used
exit status 1
antoxa@antoxa-suse:~/_Dev/go/src/github.com/gogo/protobuf> go version
go version go1.4.2 linux/amd64
@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 15, 2015

That should also be an issue in golang/protobuf.
You could report it there and I'll merge it when they fix it or even if they refuse to fix it.

awalterschulze added a commit that referenced this pull request May 15, 2015

Merge pull request #48 from jmtuley/master
Add checks in marshal/unmarshal for presence of required fields

@awalterschulze awalterschulze merged commit b9e369e into gogo:master May 15, 2015

@awalterschulze

This comment has been minimized.

Copy link
Member

commented May 15, 2015

Awesome Thank you for everyone's work.
I really appreciate it.
This was great team work 👍

@anton-povarov

This comment has been minimized.

Copy link
Contributor

commented May 15, 2015

Glad to see this merged, thanks guys :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.