itty bitty scripts and notes for setting up data engineering pipelines
Setting up environment:

The scripts directory holds the setup scripts. I use ZSH, so ecofriendly installs it for me. Scripts can be run straight from a URL, in the following format:

bash <(curl -s
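As a rough sketch of the same idea, here is a small wrapper for running a script from a URL. It is not part of this repo: the function name `run_remote_script` and the example URL in the comment are placeholders I made up for illustration. Instead of process substitution it downloads to a temporary file first, so that a failed download is reported rather than silently run as an empty script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not one of the repo's scripts).
# Usage: run_remote_script <url> [args...]
# Example (placeholder URL): run_remote_script "https://example.com/path/to/script.sh"
run_remote_script() {
  local url="$1" tmp status
  shift
  tmp="$(mktemp)" || return 1
  # -f: fail on HTTP errors, -sS: quiet but still print errors, -L: follow redirects
  curl -fsSL "$url" -o "$tmp"
  status=$?
  if [ "$status" -eq 0 ]; then
    # Run the downloaded script, passing through any extra arguments.
    bash "$tmp" "$@"
    status=$?
  else
    echo "download failed: $url" >&2
  fi
  rm -f "$tmp"
  return "$status"
}
```

The one-liner above is shorter to type; the wrapper is only worth it when you want a non-zero exit status on a bad URL.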

Other scripts provide semi-automated installations for Hadoop, Kafka, ZooKeeper, Storm, Spark, and Samza. Note that these only download files to certain directories; configuration properties still have to be edited manually.
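To give a feel for what "download files to certain directories" amounts to, here is a minimal sketch of a download-and-unpack step. It is an illustration only, not the code in the scripts directory: `fetch_and_unpack` is a made-up name, and the tarball URL and destination directory are whatever you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not taken from the repo's install scripts).
# Usage: fetch_and_unpack <tarball-url> <destination-dir>
fetch_and_unpack() {
  local url="$1" dest="$2" archive status
  archive="$(mktemp)" || return 1
  mkdir -p "$dest" || { rm -f "$archive"; return 1; }
  # Download the .tar.gz, then unpack it into the destination directory.
  curl -fsSL "$url" -o "$archive" && tar -xzf "$archive" -C "$dest"
  status=$?
  rm -f "$archive"
  return "$status"
}
```

Nothing here touches configuration; after unpacking, the properties files under the destination directory still need to be edited by hand.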

The remaining directories correspond to pipeline sections.

Other handy reference material