Skip to content

Latest commit

 

History

History
137 lines (103 loc) · 5.52 KB

README.md

File metadata and controls

137 lines (103 loc) · 5.52 KB

onyx-amazon-s3

Onyx plugin for Amazon S3.

Installation

In your project file:

[org.onyxplatform/onyx-amazon-s3 "0.14.5.1-SNAPSHOT"]

Functions

Input Task

In your peer boot-up namespace:

(:require [onyx.plugin.s3-input])

Catalog entry:

{:onyx/name task-name
 :onyx/plugin :onyx.plugin.s3-input/input
 :onyx/type :input
 :onyx/medium :s3
 :onyx/batch-size 20
 :onyx/max-peers 1
 :s3/bucket "mybucket"
 :s3/prefix "filter-prefix/example/"
 :s3/deserializer-fn :my.ns/deserializer-fn
 :s3/buffer-size-bytes 10000000
 :onyx/doc "Reads segments from keys in an S3 bucket."}

Lifecycle entry:

{:lifecycle/task <<TASK_NAME>>
 :lifecycle/calls :onyx.plugin.s3-input/s3-input-calls}

Attributes

key type description
:s3/bucket string The name of the s3 bucket to read objects from.
:s3/deserializer-fn keyword A namespaced keyword pointing to a fully qualified function that will deserialize from bytes to segments. Currently only reading from newline separated values is supported, thus the serializer must deserialize line by line.
:s3/prefix string Filter the keys to be read by a supplied prefix.
:s3/file-key string When set, includes the S3 key of file from which the segment's line was read under this key.
Output Task

In your peer boot-up namespace:

(:require [onyx.plugin.s3-output])

Catalog entry:

{:onyx/name <<TASK_NAME>>
 :onyx/plugin :onyx.plugin.s3-output/output
 :s3/bucket <<BUCKET_NAME>>
 :s3/encryption :none
 :s3/serializer-fn :my.ns/serializer-fn
 :s3/key-naming-fn :onyx.plugin.s3-output/default-naming-fn
 :s3/prefix "filter-prefix/example/"
 :s3/prefix-separator "/"
 :s3/serialize-per-element? false
 :s3/max-concurrent-uploads 20
 :onyx/type :output
 :onyx/medium :s3
 :onyx/batch-size 20
 :onyx/doc "Writes segments to s3 files, one file per batch"}

Segments received by this task must be serialized to bytes by the :s3/serializer-fn, into a file per batch, placed at a key in the bucket which is named via the function defined at :s3/key-naming-fn. This function takes an event map and returns a string. Using the default naming function, :onyx.plugin.s3-output/default-naming-fn, will name keys in the following format in UTC time format: "yyyy-MM-dd-hh.mm.ss.SSS_batch_BATCH_UUID".

You can define :s3/encryption to be :aes256 if your S3 bucket has encryption enabled. The default value is :none.

When :s3/serialize-per-element? is set to true, the serializer will be called on each individual segmnt, rather than the whole batch, and will be separated by the string value set in :s3/serialize-per-element-separator.

An alternative upload mode can be selected via :s3/multi-upload documented below. When this option is used segments will be partitioned into different objects via a grouping key.

Lifecycle entry:

{:lifecycle/task <<TASK_NAME>>
 :lifecycle/calls :onyx.plugin.s3-output/s3-output-calls}

Attributes

key type description
:s3/bucket string The name of the s3 bucket to write to
:s3/serializer-fn keyword A namespaced keyword pointing to a fully qualified function that will serialize the batch of segments to bytes
:s3/key-naming-fn keyword A namespaced keyword pointing to a fully qualified function that be supplied with the Onyx event map, and produce an s3 key for the batch.
:s3/prefix string A prefix to prepend to the keys generated by :s3/key-naming-fn.
:s3/prefix-separator string A separator to add after :s3/prefix and before the result of :s3/key-naming-fn. Defaults to "/".
:s3/multi-upload boolean Flag that causes the plugin to group the batch of segments by :s3/prefix-key, and upload an object per group.
:s3/prefix-key any Used with s3/multi-upload. Key to batch segments into prefixed objects via e.g. [{:a 3 :k "batch1"}{:a 2 :k "batch2"}] will cause two objects to be uploaded under prefix "batch1" and "batch2".
:s3/content-type string Optional content type for value
:s3/encryption keyword Optional server side encryption setting. One of :sse256 or :none.
:s3/region string The S3 region to write objects to.
:s3/endpoint-url string The S3 endpoint-url to connect to (for S3-compatible storage solutions).
:s3/max-concurrent-uploads integer Maximum number of simultaneous uploads.
:s3/serialize-per-element? boolean Flag for whether to serialize as an entire batch, or serialize per element and separate by newline characters.
:s3/serialize-per-element-separator string String to separate per element strings with. Defaults to newline charactor.

Acknowledgments

Many thanks to AdGoji for allowing this work to be open sourced and contributed back to the Onyx Platform community.

Contributing

Pull requests into the master branch are welcomed.

License

Copyright © 2017 Distributed Masonry LLC

Distributed under the Eclipse Public License, the same as Clojure.