This repository has been archived by the owner on Mar 10, 2019. It is now read-only.

mfikes/self-host-etl-pipeline

self-host-etl-pipeline

A translation of "Building ETL pipelines with Clojure and transducers" to self-hosted ClojureScript.

Motivation: I was curious whether this code could be cleanly translated to self-hosted ClojureScript using Planck's IO facilities, and how much of a performance difference transducers would make in that environment.

The code is in the etl-pipeline.core namespace.

Usage

Start Planck with src on its classpath:

$ planck -c src

Load the code and change to the namespace:

(require 'etl-pipeline.core)
(in-ns 'etl-pipeline.core)

Create a dummy JSON file:

(create-file)

Time processing without transducers:

(time (process ["/tmp/dummy.json"]))
(time (process (repeat 8 "/tmp/dummy.json")))

Time processing with transducers:

(time (process-with-transducers ["/tmp/dummy.json"]))
(time (process-with-transducers (repeat 8 "/tmp/dummy.json")))

Comparison

Processing without transducers:

1 file:
Clojure: 2857.870524 msecs
Self-host: 8620.306281 msecs

8 files:
Clojure: 29106.211138 msecs
Self-host: 72213.714800 msecs

Processing with transducers:

1 file:
Clojure: 2595.401761 msecs
Self-host: 7374.490957 msecs

8 files:
Clojure: 19478.215058 msecs
Self-host: 60890.650729 msecs
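
To make the numbers above easier to compare, here is a minimal sketch that turns them into ratios. The timings are copied from the lists above; the names `timings`, `slowdown`, and `speedup` are made up for this illustration and are not part of etl-pipeline.core. The approximate results in the comments are rounded to two decimal places.

```clojure
;; Timings in msecs, copied from the comparison above.
(def timings
  {:plain       {:one-file    {:clojure 2857.870524  :self-host 8620.306281}
                 :eight-files {:clojure 29106.211138 :self-host 72213.714800}}
   :transducers {:one-file    {:clojure 2595.401761  :self-host 7374.490957}
                 :eight-files {:clojure 19478.215058 :self-host 60890.650729}}})

(defn slowdown
  "How many times longer self-host took than Clojure for one measurement."
  [{:keys [clojure self-host]}]
  (/ self-host clojure))

(defn speedup
  "How many times faster the transducer version ran on a platform and workload."
  [platform workload]
  (/ (get-in timings [:plain workload platform])
     (get-in timings [:transducers workload platform])))

;; Self-host relative to Clojure
(slowdown (get-in timings [:plain :one-file]))          ; about 3.02
(slowdown (get-in timings [:plain :eight-files]))       ; about 2.48
(slowdown (get-in timings [:transducers :one-file]))    ; about 2.84
(slowdown (get-in timings [:transducers :eight-files])) ; about 3.13

;; Gain from transducers on each platform
(speedup :clojure :one-file)      ; about 1.10
(speedup :clojure :eight-files)   ; about 1.49
(speedup :self-host :one-file)    ; about 1.17
(speedup :self-host :eight-files) ; about 1.19
```

On these single runs, self-host takes roughly 2.5 to 3.1 times as long as Clojure, and transducers help most in the 8-file Clojure case.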

Interestingly, Planck uses about 1 1/2 cores without transducers but only 1 core with them. (Perhaps this reflects JavaScriptCore collecting garbage in the non-transducer case.)
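
The garbage-collection guess is plausible because a pipeline built from sequence functions allocates an intermediate lazy sequence at every step, whereas a transducer pipeline does not. The following is a minimal sketch of that difference, not the code in etl-pipeline.core: `valid?`, `enrich`, `process-lazily`, `xform`, and `process-with-xform` are hypothetical names, and the records are made-up maps rather than parsed JSON.

```clojure
;; Hypothetical stand-ins for per-record work; not the repository's functions.
(defn valid? [record] (even? (:id record)))
(defn enrich [record] (assoc record :score (* 2 (:id record))))

(defn process-lazily
  "Each step returns a new lazy sequence that the next step consumes,
  so intermediate sequences are allocated and later garbage collected."
  [records]
  (->> records
       (filter valid?)
       (map enrich)
       (map :score)
       (reduce + 0)))

(def xform
  "The same three steps composed into a single transducer."
  (comp (filter valid?)
        (map enrich)
        (map :score)))

(defn process-with-xform
  "One pass over the input, with no intermediate sequences between steps."
  [records]
  (transduce xform + 0 records))

;; Made-up input: ten records with ids 0 through 9.
(def records (map (fn [id] {:id id}) (range 10)))

(process-lazily records)     ; => 40
(process-with-xform records) ; => 40
```

Both functions return the same result; they differ only in how much short-lived data they create along the way.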
