
bson: Added Encoder and Decoder types for stream encoding/decoding. #127

Merged: 5 commits into globalsign:development on Apr 6, 2018

Conversation

@maxnoel commented Mar 22, 2018

These types are analogous to those found in json and yaml. They allow us to operate on io.Readers/io.Writers instead of raw byte slices. Streams are expected to be sequences of concatenated BSON documents: *.bson files from MongoDB dumps, for example.
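
For context, here is a minimal usage sketch of the proposed stream API. This is an illustration rather than code from the PR itself; it assumes a json-style interface where Decode fills one document per call and returns io.EOF at the end of the stream, and the file name is made up.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	"github.com/globalsign/mgo/bson"
)

func main() {
	// A *.bson file from a MongoDB dump: a plain concatenation of documents.
	f, err := os.Open("collection.bson")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	dec := bson.NewDecoder(f)
	for {
		var doc bson.M
		err := dec.Decode(&doc)
		if err == io.EOF {
			break // end of the concatenated stream
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(doc["_id"])
	}
}
```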

@domodwyer

Hi @maxnoel

Adding stream support makes sense! Thanks for the PR!

Really clear to read, but I'm a little concerned that a maliciously crafted BSON document could be problematic:

  • Forcing a call to make([]byte, tailSize) with a negative tailSize causing a makeslice panic
    First 4 bytes: 0xFF, 0xFF, 0xFF, 0xFF

  • Passing a malformed BSON document causing the BSON decoder to panic (a behaviour I hate, but can't change for compatibility reasons)

  • Forcing a 2GB memory allocation from a 4 byte file
    First 4 bytes: 0xFF, 0xFF, 0xFF, 0x7F

They're solvable though! The first just needs a bounds check, and the second could be mitigated by having a recover() panic handler that returns an exported ErrCorrupt or something.

For the third, I think the only realistic option is to include a warning about the possibility in the documentation so at least it's not a surprise - something along the lines of "be careful with untrusted data". I can imagine this being a problem for small, low-RAM cloud instances more than anything.

Would you mind including test cases for the above? Thanks again!

Dom
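
A hypothetical shape for the requested test cases might look like the following. The names and assertions are illustrative only, not the tests that eventually landed; it assumes the Decoder rejects out-of-range size headers with an error rather than panicking.

```go
package bson_test

import (
	"bytes"
	"testing"

	"github.com/globalsign/mgo/bson"
)

// Feeds the malicious 4-byte size headers described above to the Decoder and
// checks that Decode fails instead of panicking or over-allocating.
func TestDecodeRejectsMaliciousSizeHeaders(t *testing.T) {
	cases := map[string][]byte{
		"negative size": {0xFF, 0xFF, 0xFF, 0xFF}, // -1 as little-endian int32
		"2GB size":      {0xFF, 0xFF, 0xFF, 0x7F}, // math.MaxInt32
	}
	for name, header := range cases {
		dec := bson.NewDecoder(bytes.NewReader(header))
		var out bson.M
		if err := dec.Decode(&out); err == nil {
			t.Errorf("%s: expected Decode to return an error, got nil", name)
		}
	}
}
```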

@szank commented Mar 26, 2018

Please consider this:
bson is a package under mgo, so we might just apply mgo/MongoDB restrictions to the bson package: namely, no document can be larger than 16 MiB. This would solve the problem with malformed files where the first four bytes encode a very large integer. Opinions?

@maxnoel (Author) commented Mar 26, 2018

Makes sense. I'll make the required modifications tomorrow and restrict valid document sizes to [5 B, 16 MiB].

Strangely, the BSON spec defines the document size header as a signed int32, but:
- No document can be smaller than 5 bytes (size header + null terminator).
- MongoDB constrains BSON documents to 16 MiB at most.

Therefore, documents whose header doesn't obey those limits are discarded and Decode returns ErrInvalidDocumentSize.

In addition, we're reusing the handleErr panic handler in Decode to protect from unwanted panics in Unmarshal.
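
To make that flow concrete, here is a rough stand-alone sketch of the decoding path described above. It is my own illustration, not the code that was merged: the real PR reuses the package-internal handleErr rather than the generic recover shown here, and the limit constants were still unexported at this point in the thread.

```go
package bsonstream // illustrative only; the real code lives in the bson package

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"

	"github.com/globalsign/mgo/bson"
)

const (
	minDocumentSize = 5                // 4-byte size header + null terminator
	maxDocumentSize = 16 * 1024 * 1024 // MongoDB's 16 MiB document cap
)

// ErrInvalidDocumentSize is returned when the size header falls outside
// [minDocumentSize, maxDocumentSize].
var ErrInvalidDocumentSize = errors.New("bson: invalid document size")

// Decoder reads a stream of concatenated BSON documents from r.
type Decoder struct {
	r io.Reader
}

func NewDecoder(r io.Reader) *Decoder { return &Decoder{r: r} }

// Decode reads one document, enforcing the size limits above and converting
// any panic raised during unmarshalling into an error.
func (d *Decoder) Decode(v interface{}) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("bson: corrupt document: %v", r)
		}
	}()

	var header [4]byte
	if _, err = io.ReadFull(d.r, header[:]); err != nil {
		return err // io.EOF here means a clean end of stream
	}
	size := int32(binary.LittleEndian.Uint32(header[:]))
	if size < minDocumentSize || size > maxDocumentSize {
		return ErrInvalidDocumentSize
	}

	buf := make([]byte, size)
	copy(buf, header[:])
	if _, err = io.ReadFull(d.r, buf[4:]); err != nil {
		return err
	}
	return bson.Unmarshal(buf, v)
}
```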

@maxnoel (Author) commented Mar 27, 2018

Done! I reused the handleErr panic handler instead of writing my own, which means I can't return a specific ErrCorruptDocument error. Let me know if you'd rather I write my own.

@maxnoel (Author) commented Apr 2, 2018

Do you need anything else?

@domodwyer

Hi @maxnoel

I'm getting through the backlog this afternoon - a quick glance looks great, should have a proper review in an hour or so 👍

Thanks very much!

Review comment on bson/stream.go (outdated):
)

// ErrInvalidDocumentSize is an error returned when a BSON document's header
// contains a size smaller than minDocumentSize or greater than maxDocumentSize.

minDocumentSize and maxDocumentSize aren't visible to the user.
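
One way to address that note, and Dom's follow-up below, is to export the limits so the doc comment refers to identifiers users can actually see. A sketch follows; the exported names match the final commit message, but the doc comments and values here are my own assumptions.

```go
package bson // a fragment in the spirit of bson/stream.go

import "errors"

// MinDocumentSize is the size of the smallest valid BSON document: a 4-byte
// length header followed by a single null terminator byte.
const MinDocumentSize = 5

// MaxDocumentSize mirrors MongoDB's 16 MiB cap on a single document.
const MaxDocumentSize = 16 * 1024 * 1024

// ErrInvalidDocumentSize is returned by Decode when a document's size header
// is smaller than MinDocumentSize or greater than MaxDocumentSize.
var ErrInvalidDocumentSize = errors.New("invalid document size")
```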

@domodwyer

Hi @maxnoel

Could you export the two consts referenced in the description and we can get this merged 👍

Great addition, it should make mgo easier to use for many people. And thanks for the slightly paranoid defensive programming ;)

Dom

@maxnoel (Author) commented Apr 3, 2018

Done!

@domodwyer merged commit b5611a5 into globalsign:development Apr 6, 2018

@domodwyer

Thanks @maxnoel - we really appreciate it!

@maxnoel deleted the development branch April 6, 2018 11:54

@maxnoel (Author) commented Apr 6, 2018

No problems, glad I could contribute. Post-mortem analysis of Mongo dumps is something I do fairly frequently at work, and now that I'm doing it in Go instead of Python, copy/pasting my own decoder every time got old quickly ;)

@domodwyer mentioned this pull request Apr 23, 2018
libi pushed a commit to libi/mgo that referenced this pull request Dec 1, 2022
…lobalsign#127)

* bson: Added Encoder and Decoder types for stream encoding/decoding.

Those types are analogous to those found in json and yaml. They allow us to operate on io.Readers/io.Writers instead of raw byte slices. Streams are expected to be sequences of concatenated BSON documents: *.bson files from MongoDB dumps, for example.

* Stream: NewEncoder and NewDecoder now return pointers.

JSON and YAML do that too, so let's be consistent.

* Stream decoder: added checks on document size limits, and panic handler.

Strangely, the BSON spec defines the document size header as a signed int32, but:
- No document can be smaller than 5 bytes (size header + null terminator).
- MongoDB constrains BSON documents to 16 MiB at most.

Therefore, documents whose header doesn't obey those limits are discarded and Decode returns ErrInvalidDocumentSize.

In addition, we're reusing the handleErr panic handler in Decode to protect from unwanted panics in Unmarshal.

* Exported MinDocumentSize and MaxDocumentSize consts.