Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checks in marshal/unmarshal for presence of required fields #48

Merged
merged 18 commits into from May 15, 2015
Merged

Add checks in marshal/unmarshal for presence of required fields #48

merged 18 commits into from May 15, 2015

Conversation

@jmtuley
Copy link
Contributor

@jmtuley jmtuley commented Apr 29, 2015

  • Whitespace changes are the result of running make all
  • Fixes #26
  • Based on work previously started by @anton-povarov
  • Includes tests in test/required

This will help Cloud Foundry's Dropsonde library to use the generated, non-reflective Marshal/Unmarshal functions safely.

Dwayne Schultz added 4 commits Apr 29, 2015
* Whitespace changes are the result of running `make all`
* Fixes #26
* Based on work previously started by @anton-povarov
* Includes tests in test/required

Signed-off-by: John Tuley <jtuley@pivotal.io>
* Remove protobuf.NewRequiredNotSetError() and instantiate inline instead
* Use protoPkg.Use() instead of hardcoded import

Signed-off-by: John Tuley <jtuley@pivotal.io>
Signed-off-by: John Tuley <jtuley@pivotal.io>
jmtuley pushed a commit to cloudfoundry/dropsonde that referenced this issue Apr 29, 2015
Forking Go projects requires too much manual intervention; will restore
when gogo/protobuf#48 is merged.

This reverts commit 1339fa2.
@@ -64,6 +66,11 @@ func (m *Foo) Unmarshal(data []byte) error {
}
fieldNum := int32(wire >> 3)
wireType := int(wire & 0x7)

if fn := int(fieldNum - 1); fn <= 0 {
Copy link
Contributor

@anton-povarov anton-povarov Apr 30, 2015

generated code seems incorrect here, as fn < 0 would lead to panic on next line.
this should never happen normally (protobuf wouldn't allow field id == 0), but it might be possible if a malicious message is constructed by hand ?

Copy link
Contributor Author

@jmtuley jmtuley Apr 30, 2015

@anton-povarov could you help us fix that, please? We copied the Unmarshal changes from your suggestion, as found in #26. I don't think I completely understand what's going on here. It looks like that line is being written by this line.

Thank you!

Copy link
Contributor

@anton-povarov anton-povarov May 2, 2015

Just as a quick fix, i would generate something like (fn <= 0 && fn >=0), this is somewhat ugly, but should be enough to make it safe.

so the line becomes

p.P(if fn := int(fieldNum-, strconv.Itoa(rfMin), ); (fn >= 0 && fn <=, strconv.Itoa(rfMax-rfMin), ) {)

To explain what's going on: the code makes a bitmask of all fields, and sets bits to 1 for the fields that are present. Also tries to save on the size of a bitmask, by finding required fields with lowest id and highest id. rfMin/rfMax are those ids. Since array indexes start with 0, the code then adjusts incoming field ids into array offsets + bit offsets.

In hindsight this "optimization" seems unnecessary for 99.9% of uses (i.e. overengineering), so it might be simpler to just remove it, and the code would become much simpler.
Another suggestion: move code that sets "this field is present" bits, into specific "case" statements, and generate the code doing it only for required fields and do all the math in advance, this should speed up the parsing code.

Copy link
Contributor

@anton-povarov anton-povarov May 2, 2015

Or we could do even better!

  1. count all required fields + get their names
  2. generate a bitmask each function with specific bits for each required field
  3. generate code specific to the field to set the corresponding bit in that bitmask
  4. generate code that check that all bits are set after parsing is done

This way - the bitmask would grow with the number of required fields, not max field id in the message.

Sorry that i'm just making suggestions here, and showing no code, got carried away :)

Copy link
Contributor Author

@jmtuley jmtuley May 2, 2015

I think I get it. I'm going to try to state the full logic, to see if I'm right:

  1. Generate a bitmask for keeping track of which required fields are present.
  2. Unmarshal. At each field, fill out the mask if that field is required.
  3. After unmarshaling, check the mask for any missing fields. If found, return an error.

Is that correct? If it is, may I ask what is the benefit of fully unmarshalling, instead of bailing the moment an error is found? The latter is the approach we took in our changes to MarshalTo, but maybe that was naïve.

Copy link
Contributor

@anton-povarov anton-povarov May 2, 2015

Yes, you understand correctly.

As for unmarshalling: i see no other way honestly. At the time we start parsing - we have no idea what's in the message. As fields come in - we get more info, but still can only answer the question "is this specific field present", but if it's required - that's what we want, and if it's not - we don't care.
And since fields are not always sent in order on the wire - we need to make sense of the whole message before checking.
An alternative approach might be to make 1st pass over the message - to get field types+offsets (handling sub-messages properly) then check and then 2nd pass to copy the data.
But it's much more complex and i would assume - slower for the common case when message is fully correct.

Anyway.
I've updated my implementation with the suggestion given in previous message (simpler code, should be more efficient when parsing as well, since we do more stuff at codegen time). Feel free to use it if you like.
anton-povarov@c65b714

@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 4, 2015

Sorry I have been a bit under the weather. It seems like this is on the right track.

@anton-povarov that is super over optimised, but since one of gogoprotobuf's goals is being fast, I like, I like it a lot :)

@jmtuley could you please just call proto.Marshal and proto.Unmarshal in your tests? These will call the generated code for you and malloc the appropriately sized byte slice.
Thank you for getting this rolling again :)

anton-povarov and others added 4 commits May 5, 2015
Based partially on previous commit, `make all` generated new code.
Checking that in.
Uses generic/reflective methods from `proto` package instead of
generated methods on struct itself.
@jmtuley
Copy link
Contributor Author

@jmtuley jmtuley commented May 5, 2015

@anton-povarov I cherry-picked your commit into my fork, so please take a quick look and verify that I did that correctly. 😄 Thank you for your help in getting that corrected and optimized!

@awalterschulze I've made the change you requested. I hope you're feeling better!

@anton-povarov
Copy link
Contributor

@anton-povarov anton-povarov commented May 5, 2015

LGTM :)

@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 5, 2015

The big picture LGTM.
Now I am just going to be a little pedantic about the tests and then we can merge :)

@@ -59,6 +59,10 @@ func (e *RequiredNotSetError) Error() string {
return fmt.Sprintf("proto: required field %q not set", e.field)
}

func NewRequiredNotSetError(field string) *RequiredNotSetError {
Copy link
Member

@awalterschulze awalterschulze May 5, 2015

Please move this to encode_gogo.go
I try to gogoprotobuf's extra code as seperate as possible from the original goprotobuf code.

@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 5, 2015

The filemode should not need to change protoc-gen-gogo/main.go 100644 → 100755

@@ -827,6 +851,28 @@ func (p *unmarshal) Generate(file *generator.FileDescriptor) {
p.P(`}`)
p.Out()
p.P(`}`)

Copy link
Member

@awalterschulze awalterschulze May 5, 2015

Please remove all these whitespace newlines below.

@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 5, 2015

Thank you this really looks like it is going to work :)

jmtuley and others added 2 commits May 5, 2015
Produce correct output for marshalTo with required fields
Change bitmask to boolean map for required field missing detection

Signed-off-by: Georg Apitz <gapitz@pivotal.io>
@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 8, 2015

How is it going?

@geapi
Copy link
Contributor

@geapi geapi commented May 8, 2015

We are about to finish the last two tests you requested.

@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 8, 2015

cool I am very excited :)

* Marshal with missing optional field, unmarshal to type where missing field is required
* Marshal with missing required field

Signed-off-by: Georg Apitz <gapitz@pivotal.io>
@geapi
Copy link
Contributor

@geapi geapi commented May 8, 2015

@awalterschulze

Thanks for your help and direction. With the last commit, we've added the requested tests. Is there anything else you'd like from us, or is this ready to merge?

@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 9, 2015

The tests look great :)

I am just not sure about the new map, instead of the previous probably more efficient bit implementation.
I don't think it needs to be that efficient, but a map is very inefficient.
Why did you change it?

@anton-povarov
Copy link
Contributor

@anton-povarov anton-povarov commented May 9, 2015

Same question here, adding more allocations and hash lookups to a function that should be fast is counter intuitive imo.

@jmtuley
Copy link
Contributor Author

@jmtuley jmtuley commented May 9, 2015

We changed it due to a bug in the other implementation that we couldn't
figure out.

As we wrote the tests, we found that after field 6, all the masks were 0,
which obviously want going to work. Based on Walter's comment that a bit
mask was probably over optimized, we chose to solve the issue with
straightforward code, i.e. the map implementation, that we could understand
and didn't get in the way if getting the tests green.

There's also not that much extra allocation, since we allocate the whole
hash at once. Using a map allowed us to range at the final check, which
avoids the bound checking that happens in array lookups.

I don't think we are likely to correctly implement the mask logic, but
won't be offended if someone does manage it and adds the commit to our pull
request. I'm sorry that we were unable to correct the initial
implementation.

On Sat, May 9, 2015, 01:30 Anton Povarov notifications@github.com wrote:

Same question here, adding more allocations and hash lookups to a function
that should be fast is counter intuitive imo.


Reply to this email directly or view it on GitHub
#48 (comment).

@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 9, 2015

Ok no problem. I'm sure I (if Anton doesn't have time) can optimise this.

I was wondering if you could add one more test please.
I want to test nested messages.

So just add a new message which has a repeated NinOptNative.
And have one of those messages not the last or the first, have a nil field which is required.

Marshal and/or Unmarshal should give an error, which ever you have time for.

If you add the final test, I (if Anton doesn't have time) wil optimise the map.

@anton-povarov
Copy link
Contributor

@anton-povarov anton-povarov commented May 9, 2015

Please do point at the test that is failing, i'll fix.

@jmtuley
Copy link
Contributor Author

@jmtuley jmtuley commented May 9, 2015

I'm afraid I'm not in a position to do much right now; I just had wrist
surgery and so can't type effectively. (This is being painfully tapped out
on a phone.)

I'll do more on Thursday when the bandages come off.

On Sat, May 9, 2015, 12:56 Anton Povarov notifications@github.com wrote:

Please do point at the test that is failing, i'll fix.


Reply to this email directly or view it on GitHub
#48 (comment).

@anton-povarov
Copy link
Contributor

@anton-povarov anton-povarov commented May 12, 2015

Righto, this one turned out to be pretty nasty, but easier than figuring out github for a newbie like me :)
The fix: https://gist.github.com/anton-povarov/c485ded98b51cb6605a0

The issue was with operator precedence, << has higher precedence than %.
so for fields 6 and above it would first shift left (making the value a multiple of 64 and always larger) and then take remainder, making it 0.

This diff also changes number formatting for bits set/check from decimal to hex, i find it easier to read and the code is less cluttered with parens.

NB @awalterschulze : all_test.go:1283 fails go vet check on master :)

@anton-povarov
Copy link
Contributor

@anton-povarov anton-povarov commented May 12, 2015

As to performance comparison, figured i'd just benchmark it the regular way.
Can't say i've spent too much time on it, just kept on copy-pasting code from test/required/requiredexample.pb.go till it stopped complaining.

antoxa@antoxa-suse:~/_Dev/go/src/antoxa> go test -bench . -benchtime 10s ./1_test.go
testing: warning: no tests to run
PASS
BenchmarkMap     2000000              6644 ns/op
BenchmarkBitmask        20000000               989 ns/op
ok      command-line-arguments  41.237s

full code: https://gist.github.com/anton-povarov/6a3733d77da9ac837ba9

@jmtuley
Copy link
Contributor Author

@jmtuley jmtuley commented May 12, 2015

Well, that's a little faster. :-) Good work!

On Tue, May 12, 2015, 12:10 Anton Povarov notifications@github.com wrote:

As to performance comparison, figured i'd just benchmark it the regular
way.
Can't say i've spent too much time on it, just keps on copy-pasting code
from test/required/requiredexample.pb.go till it stopped complaining.

antoxa@antoxa-suse:~/_Dev/go/src/antoxa> go test -bench . -benchtime 10s ./1_test.go
testing: warning: no tests to run
PASS
BenchmarkMap 2000000 6644 ns/op
BenchmarkBitmask 20000000 989 ns/op
ok command-line-arguments 41.237s

full code: https://gist.github.com/anton-povarov/6a3733d77da9ac837ba9


Reply to this email directly or view it on GitHub
#48 (comment).

geapi added 2 commits May 14, 2015
Signed-off-by: John Tuley <jtuley@pivotal.io>
Signed-off-by: John Tuley <jtuley@pivotal.io>
@geapi
Copy link
Contributor

@geapi geapi commented May 14, 2015

@awalterschulze: We applied Anton's patch and added a test for the nested NinOptNative, please let us know if there is anything else we can do.

@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 15, 2015

Ok this looks perfect! :)

@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 15, 2015

@anton-povarov my go vet works, which go version are you running?

@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 15, 2015

I am running go version go1.4 darwin/amd64

@anton-povarov
Copy link
Contributor

@anton-povarov anton-povarov commented May 15, 2015

1.4.2

antoxa@antoxa-suse:~/_Dev/go/src/github.com/gogo/protobuf> go vet ./...
proto/all_test.go:1283: result of fmt.Sprintf call not used
exit status 1
antoxa@antoxa-suse:~/_Dev/go/src/github.com/gogo/protobuf> go version
go version go1.4.2 linux/amd64

@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 15, 2015

That should also be an issue in golang/protobuf.
You could report it there and I'll merge it when they fix it or even if they refuse to fix it.

awalterschulze added a commit that referenced this issue May 15, 2015
Add checks in marshal/unmarshal for presence of required fields
@awalterschulze awalterschulze merged commit b9e369e into gogo:master May 15, 2015
@awalterschulze
Copy link
Member

@awalterschulze awalterschulze commented May 15, 2015

Awesome Thank you for everyone's work.
I really appreciate it.
This was great team work 👍

@anton-povarov
Copy link
Contributor

@anton-povarov anton-povarov commented May 15, 2015

Glad to see this merged, thanks guys :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

4 participants