itty bitty scripts and notes for setting up data engineering pipelines
ingestion
processing
scripts
.gitignore
README.md
ports.md

painkiller medication to relieve headaches. ingest when necessary.

Setting up environment:

The scripts directory contains setup scripts. I use ZSH, so ecofriendly.sh installs it for me. It's easy to run a script straight from a URL. The format looks like the following:

bash <(curl -s https://raw.githubusercontent.com/katychuang/insight-sandbox/master/scripts/ecofriendly.sh)
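
One caveat with the one-liner above: if the download fails, `curl -s` stays quiet and bash simply runs an empty script. A minimal sketch of a more defensive variant follows. The function names `raw_url` and `fetch_and_run` are hypothetical helpers made up for illustration, not part of this repo; the only script name known from this README is `ecofriendly.sh`.

```shell
#!/usr/bin/env bash
# Base URL copied from the one-liner above.
BASE_URL="https://raw.githubusercontent.com/katychuang/insight-sandbox/master/scripts"

# raw_url NAME: print the raw URL for a file in the scripts directory.
# (hypothetical helper, not part of the repo)
raw_url() {
  printf '%s/%s\n' "$BASE_URL" "$1"
}

# fetch_and_run NAME: download the script to a temp file and run it with bash,
# but only if the download succeeded. (hypothetical helper, not part of the repo)
fetch_and_run() {
  local tmp status
  tmp="$(mktemp)" || return 1
  # -f: fail on HTTP errors, -sS: quiet but still print errors, -L: follow redirects
  if curl -fsSL "$(raw_url "$1")" -o "$tmp"; then
    bash "$tmp"
    status=$?
  else
    echo "download failed: $1" >&2
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

After sourcing the snippet, `fetch_and_run ecofriendly.sh` should behave like the one-liner, except that a failed download is reported instead of silently doing nothing. Downloading to a file first also leaves room to read the script before running it.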

Other scripts available include semi-automated installations for Hadoop, Kafka, ZooKeeper, Storm, Spark, and Samza. Note that these only download files into certain directories. Configuration properties still have to be edited manually.

The remaining directories (ingestion and processing) correspond to pipeline sections.

Other handy reference material