add support for ZStandard & gzip compression, use structure.Compression field #256

b5 · 2021-06-04T13:53:35Z

Overhaul compression package to define supported formats, Reader & Writer constructors
add a new dependency on github.com/klauspost/compress to support zstd & gzip formats
adjust dsio Readers & Writers to wrap with Compression / Decompression
add a new BodyFilename method to dataset.Structure that interprets things like Structure{ Format: "csv", Compression: "gzip" } -> body.csv.gzip
adjust detect package to pick up on multi-extension compression formats, converting filenames like body.csv.gzip -> Structure{ Format: "csv", Compression: "gzip" }
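
A minimal sketch of the filename mapping described above, for illustration only. The `Structure` type here is a hypothetical two-field stand-in for dataset.Structure, and the method body is an assumption about the behavior, not the code in this PR:

```go
package main

// Structure is a hypothetical, pared-down stand-in for dataset.Structure.
// Only the two fields mentioned in the description are kept.
type Structure struct {
	Format      string
	Compression string
}

// BodyFilename builds "body.<format>[.<compression>]".
// An empty Format or Compression contributes no extension.
func (s Structure) BodyFilename() string {
	name := "body"
	if s.Format != "" {
		name += "." + s.Format
	}
	if s.Compression != "" {
		name += "." + s.Compression
	}
	return name
}
```

Under these assumptions, Structure{ Format: "csv", Compression: "gzip" } yields body.csv.gzip, and a Structure with no Compression yields body.csv.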

starting to think about *actually* supporting compressed data, starting with zstandard

add support for reading & writing compressed CBOR,CSV,JSON data via compression package

extract data & compression formats from a filename string by examining file extensions. Assumes that when multiple extensions are present they come in the order: filename.[data_format].[compression_format]

BREAKING CHANGE: detect.ExtensionDataFormat -> detect.FormatFromFilename, now returns 3 values: dataFormat, compressionFormat, error
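
A rough sketch of the extension parsing this commit message describes, using only the standard library. The function name `formatFromFilename`, the `compressionExts` set, and the error messages are all assumptions for illustration. The real detect.FormatFromFilename may recognize different extensions and validate data formats differently:

```go
package main

import (
	"errors"
	"path/filepath"
	"strings"
)

// compressionExts is a hypothetical set of recognized compression extensions.
var compressionExts = map[string]bool{"gzip": true, "zst": true}

// formatFromFilename assumes the order filename.[data_format].[compression_format].
// It peels off a trailing compression extension if one is present, then treats
// the next extension as the data format.
func formatFromFilename(name string) (dataFormat, compressionFormat string, err error) {
	ext := strings.ToLower(strings.TrimPrefix(filepath.Ext(name), "."))
	if ext == "" {
		return "", "", errors.New("no file extension provided")
	}
	if compressionExts[ext] {
		compressionFormat = ext
		// Drop the compression extension and look at what precedes it.
		name = name[:len(name)-len(ext)-1]
		ext = strings.ToLower(strings.TrimPrefix(filepath.Ext(name), "."))
		if ext == "" {
			return "", compressionFormat, errors.New("no data format extension before compression extension")
		}
	}
	return ext, compressionFormat, nil
}
```

With this sketch, body.csv.gzip maps to a data format of csv and a compression format of gzip, while body.json maps to json with an empty compression format.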

detect/detect.go

compression/compression.go

ramfox · 2021-06-04T19:07:04Z

compression/compression_test.go

+			}
+
+			if result.String() != plainText {
+				t.Errorf("compression roun trip result mismatch.\nwant: %s\ngot: %s", plainText, result.String())


roun -> round

Arqu

Aside from Kasey's comments, I don't have much to add code-wise and generally I'd give it a thumbs up. I like the maybeWrap(De)Compressor layer to smooth things out.

However, the thing I'm slightly concerned about is the work this implies further up the stack. While this should not break anything with current usage, I wonder about the product implications and expectations. Specifically the /get endpoints. I'm not 100% sure they will pick up the correct extension here, and even if they do, what are the expectations: return the original binary? Or further, what if it's a compressed file and they request a regular body? Things balloon further with larger body sizes.

Anyways, I think we should decide whether this should be reserved for hard mode users or whether it should bubble up.

👍

b5 · 2021-06-04T20:19:13Z

with current usage, I wonder about the product implications and expectations. Specifically the /get endpoints.

great question. To date we've designed the Qri binary so that it always interprets the data being requested, even if no format conversion is involved (by interpret I mean create a dsio.EntryReader & read individual entries until complete). Adding compression support here should only affect what types of persisted data Qri can interpret & persist, meaning if a user asks for data, they're always getting structured data in the format they asked for, and by default that format doesn't include compression.

We use a pattern a bunch up in the main qri codebase where we create a new dataset.Structure for writing responses, pull things from the Structure we're reading from (like Schema and Format), then dsio.Copy from the original source to our modified write destination. It's my expectation that pattern will stick, and compression won't be added to the destination dataset.Structure unless the user explicitly asks for it.

Arqu

Thx for the explainer on that one. I guess I'm good with the PR once those nitpicks are addressed.

b5 · 2021-06-04T20:56:18Z

ok, ran the qri test suite atop this branch just to be sure, everything looks good so far!

b5 added 3 commits June 3, 2021 13:01

feat(compress): overhaul compression package, support zstandard

ffd2363

starting to think about *actually* supporting compressed data, starting with zstandard

feat(compression): add support for gzip

5276241

feat(dsio): use structure.Compression field with Readers & Writers

071a80a

add support for reading & writing compressed CBOR,CSV,JSON data via compression package

b5 added the feat A code change that adds functionality label Jun 4, 2021

b5 self-assigned this Jun 4, 2021

docs: fix golint errors

25aae29

b5 force-pushed the b5/feat_compres_zstd branch from a551234 to 25aae29 Compare June 4, 2021 13:57

b5 added 2 commits June 4, 2021 12:45

feat(dataset): BodyFilename produces strings with data & compression …

073a3e8

…format

b5 changed the title ~~add support for zest & gzip compression, using structure.Compression field~~ add support for ZStandard & gzip compression, use structure.Compression field Jun 4, 2021

b5 requested review from dustmop, Arqu and ramfox June 4, 2021 17:29