S3 loader

Enes Aldemir edited this page Aug 20, 2017 · 1 revision

Overview

The Snowplow S3 Loader consumes records from an Amazon Kinesis stream or NSQ topic, and writes them to S3.

Two output file formats are supported:

  • LZO
  • GZip

LZO

The records are treated as raw byte arrays. Elephant Bird's BinaryBlockWriter class is used to serialize them as a Protocol Buffers array (so it is clear where one record ends and the next begins) before compressing them.

The compression process generates both compressed .lzo files and small .lzo.index files (splittable LZO). Each index file contains the byte offsets of the LZO blocks in the corresponding compressed file, so a reader can seek directly to any block and the blocks can be processed in parallel.
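To illustrate why record framing and a block index matter, here is a minimal Python sketch. It is not Elephant Bird's wire format and does not use LZO: it assumes a simple 4-byte length prefix per record and uses zlib from the standard library as a stand-in compressor. The function names write_blocks and read_block and the records_per_block parameter are hypothetical, chosen only for this example.

```python
import struct
import zlib


def write_blocks(records, records_per_block=2):
    """Pack records into compressed blocks and return (data, offsets).

    Simplified illustration only: zlib stands in for LZO, and the
    length-prefix framing is an assumption, not Elephant Bird's format.
    """
    data = bytearray()
    offsets = []  # plays the role of the .lzo.index file
    for i in range(0, len(records), records_per_block):
        chunk = records[i:i + records_per_block]
        # Length-prefix each record so boundaries survive arbitrary bytes.
        payload = b"".join(struct.pack(">I", len(r)) + r for r in chunk)
        block = zlib.compress(payload)
        offsets.append(len(data))  # byte offset where this block starts
        data += struct.pack(">I", len(block)) + block
    return bytes(data), offsets


def read_block(data, offset):
    """Decode one block given only its byte offset, independently of the others."""
    (size,) = struct.unpack_from(">I", data, offset)
    payload = zlib.decompress(data[offset + 4:offset + 4 + size])
    records, pos = [], 0
    while pos < len(payload):
        (n,) = struct.unpack_from(">I", payload, pos)
        records.append(payload[pos + 4:pos + 4 + n])
        pos += 4 + n
    return records
```

Because each offset is enough to decode its block, separate workers could each be handed a subset of the offsets, which is the property the index files provide for splittable LZO.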

GZip

The records are treated as byte arrays containing UTF-8 encoded strings (whether CSV, JSON or TSV). Each record is written to the file followed by a newline, which acts as the record separator. This format can be used with the Snowplow Kinesis Enriched stream, among other streams.
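The newline-delimited layout described above can be sketched with Python's standard gzip module. This is an illustration of the file layout under stated assumptions, not the loader's own code: write_gzip and read_gzip are hypothetical helper names, and the records are assumed to be UTF-8 bytes that do not themselves contain newlines.

```python
import gzip
import io


def write_gzip(records):
    """Gzip an iterable of UTF-8 byte records, one record per line."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for record in records:
            gz.write(record + b"\n")  # newline separates records
    return buf.getvalue()


def read_gzip(blob):
    """Decompress and split back into one string per record."""
    text = gzip.decompress(blob).decode("utf-8")
    # Every record ends with "\n", so the final split element is empty.
    return text.split("\n")[:-1]
```

A record that contained an embedded newline would be read back as two records, which is why this layout suits single-line formats such as TSV or one JSON object per line.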

See also the setup guide.
