
bson: Added Encoder and Decoder types for stream encoding/decoding. #127

Merged: 5 commits into globalsign:development on Apr 6, 2018

Conversation

@maxnoel commented Mar 22, 2018

These types are analogous to those found in json and yaml. They allow us to operate on io.Readers/io.Writers instead of raw byte slices. Streams are expected to be sequences of concatenated BSON documents: *.bson files from MongoDB dumps, for example.
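
For context, here is a minimal usage sketch of the proposed stream API. This is an illustration rather than code from the PR itself; it assumes a json-style interface where Decode fills one document per call and returns io.EOF at the end of the stream, and the file name is made up.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/globalsign/mgo/bson"
)

func main() {
	// A *.bson file from a MongoDB dump: a plain concatenation of documents.
	f, err := os.Open("collection.bson")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	dec := bson.NewDecoder(f)
	for {
		var doc bson.M
		err := dec.Decode(&doc)
		if err == io.EOF {
			break // end of the concatenated stream
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(doc["_id"])
	}
}
```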

@domodwyer

Hi @maxnoel

Adding stream support makes sense! Thanks for the PR!

Really clear to read, but I'm a little concerned that a maliciously crafted BSON document could be problematic:

  • Forcing a call to make([]byte, tailSize) with a negative tailSize causing a makeslice panic
    First 4 bytes: 0xFF, 0xFF, 0xFF, 0xFF

  • Passing a malformed BSON document causing the BSON decoder to panic (a behaviour I hate, but can't change for compatibility reasons)

  • Forcing a 2GB memory allocation from a 4 byte file
    First 4 bytes: 0xFF, 0xFF, 0xFF, 0x7F

They're solvable though! The first just needs a bounds check, and the second could be mitigated by having a recover() panic handler that returns an exported ErrCorrupt or something.

For the third, I think the only realistic option is to include a warning about the possibility in the documentation so at least it's not a surprise - something along the lines of "be careful with untrusted data". I can imagine this being a problem for small, low-RAM cloud instances more than anything.

Would you mind including test cases for the above? Thanks again!

Dom
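
A hypothetical shape for the requested test cases might look like the following. The names and assertions are illustrative only, not the tests that eventually landed; it assumes the Decoder rejects out-of-range size headers with an error rather than panicking.

```go
package bson_test

import (
	"bytes"
	"testing"

	"github.com/globalsign/mgo/bson"
)

// Feeds the malicious 4-byte size headers described above to the Decoder and
// checks that Decode fails instead of panicking or over-allocating.
func TestDecodeRejectsMaliciousSizeHeaders(t *testing.T) {
	cases := map[string][]byte{
		"negative size": {0xFF, 0xFF, 0xFF, 0xFF}, // -1 as little-endian int32
		"2GB size":      {0xFF, 0xFF, 0xFF, 0x7F}, // math.MaxInt32
	}
	for name, header := range cases {
		dec := bson.NewDecoder(bytes.NewReader(header))
		var out bson.M
		if err := dec.Decode(&out); err == nil {
			t.Errorf("%s: expected Decode to return an error, got nil", name)
		}
	}
}
```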

@szank commented Mar 26, 2018

Please consider this:
bson is a package under mgo, so we might just apply mgo/MongoDB restrictions to the bson package: namely, no document can be larger than 16 MiB. This would solve the problem with malformed files where the first four bytes encode a very large integer. Opinions?

@maxnoel (Author) commented Mar 26, 2018

Makes sense. I'll make the required modifications tomorrow and restrict valid document sizes to [5 B, 16 MiB].

Strangely, the BSON spec defines the document size header as a signed int32, but:
- No document can be smaller than 5 bytes (size header + null terminator).
- MongoDB constrains BSON documents to 16 MiB at most.

Therefore, documents whose header doesn't obey those limits are discarded and Decode returns ErrInvalidDocumentSize.

In addition, we're reusing the handleErr panic handler in Decode to protect from unwanted panics in Unmarshal.
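
To make that flow concrete, here is a rough stand-alone sketch of the decoding path described above. It is my own illustration, not the code that was merged: the real PR reuses the package-internal handleErr rather than the generic recover shown here, and the limit constants were still unexported at this point in the thread.

```go
package bsonstream // illustrative only; the real code lives in the bson package

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"

	"github.com/globalsign/mgo/bson"
)

const (
	minDocumentSize = 5                // 4-byte size header + null terminator
	maxDocumentSize = 16 * 1024 * 1024 // MongoDB's 16 MiB document cap
)

// ErrInvalidDocumentSize is returned when the size header falls outside
// [minDocumentSize, maxDocumentSize].
var ErrInvalidDocumentSize = errors.New("bson: invalid document size")

// Decoder reads a stream of concatenated BSON documents from r.
type Decoder struct {
	r io.Reader
}

func NewDecoder(r io.Reader) *Decoder { return &Decoder{r: r} }

// Decode reads one document, enforcing the size limits above and converting
// any panic raised during unmarshalling into an error.
func (d *Decoder) Decode(v interface{}) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("bson: corrupt document: %v", r)
		}
	}()

	var header [4]byte
	if _, err = io.ReadFull(d.r, header[:]); err != nil {
		return err // io.EOF here means a clean end of stream
	}
	size := int32(binary.LittleEndian.Uint32(header[:]))
	if size < minDocumentSize || size > maxDocumentSize {
		return ErrInvalidDocumentSize
	}

	buf := make([]byte, size)
	copy(buf, header[:])
	if _, err = io.ReadFull(d.r, buf[4:]); err != nil {
		return err
	}
	return bson.Unmarshal(buf, v)
}
```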

@maxnoel (Author) commented Mar 27, 2018

Done! I reused the handleErr panic handler instead of writing my own, which means I can't return a specific ErrCorruptDocument error. Let me know if you'd rather I write my own.

@maxnoel (Author) commented Apr 2, 2018

Do you need anything else?

@domodwyer

Hi @maxnoel

I'm getting through the backlog this afternoon - a quick glance looks great, should have a proper review in an hour or so 👍

Thanks very much!

Review comment on bson/stream.go (outdated):
)

// ErrInvalidDocumentSize is an error returned when a BSON document's header
// contains a size smaller than minDocumentSize or greater than maxDocumentSize.

minDocumentSize and maxDocumentSize aren't visible to the user.
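
One way to address that note, and Dom's follow-up below, is to export the limits so the doc comment refers to identifiers users can actually see. A sketch follows; the exported names match the final commit message, but the doc comments and values here are my own assumptions.

```go
package bson // a fragment in the spirit of bson/stream.go

import "errors"

// MinDocumentSize is the size of the smallest valid BSON document: a 4-byte
// length header followed by a single null terminator byte.
const MinDocumentSize = 5

// MaxDocumentSize mirrors MongoDB's 16 MiB cap on a single document.
const MaxDocumentSize = 16 * 1024 * 1024

// ErrInvalidDocumentSize is returned by Decode when a document's size header
// is smaller than MinDocumentSize or greater than MaxDocumentSize.
var ErrInvalidDocumentSize = errors.New("invalid document size")
```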

@domodwyer

Hi @maxnoel

Could you export the two consts referenced in the description and we can get this merged 👍

Great addition, it should make mgo easier to use for many people. And thanks for the slightly paranoid defensive programming ;)

Dom

@maxnoel (Author) commented Apr 3, 2018

Done!

@domodwyer merged commit b5611a5 into globalsign:development Apr 6, 2018

@domodwyer

Thanks @maxnoel - we really appreciate it!

@maxnoel deleted the development branch April 6, 2018 11:54

@maxnoel (Author) commented Apr 6, 2018

No problems, glad I could contribute. Post-mortem analysis of Mongo dumps is something I do fairly frequently at work, and now that I'm doing it in Go instead of Python, copy/pasting my own decoder every time got old quickly ;)

@domodwyer mentioned this pull request Apr 23, 2018
libi pushed a commit to libi/mgo that referenced this pull request Dec 1, 2022
…lobalsign#127)

* bson: Added Encoder and Decoder types for stream encoding/decoding.

Those types are analogous to those found in json and yaml. They allow us to operate on io.Readers/io.Writers instead of raw byte slices. Streams are expected to be sequences of concatenated BSON documents: *.bson files from MongoDB dumps, for example.

* Stream: NewEncoder and NewDecoder now return pointers.

JSON and YAML do that too, so let's be consistent.

* Stream decoder: added checks on document size limits, and panic handler.

Strangely, the BSON spec defines the document size header as a signed int32, but:
- No document can be smaller than 5 bytes (size header + null terminator).
- MongoDB constrains BSON documents to 16 MiB at most.

Therefore, documents whose header doesn't obey those limits are discarded and Decode returns ErrInvalidDocumentSize.

In addition, we're reusing the handleErr panic handler in Decode to protect from unwanted panics in Unmarshal.

* Exported MinDocumentSize and MaxDocumentSize consts.