
Bert

A microframework for simple ETL solutions.

Architecture

At its core, bert-etl uses DynamoDB Streams to communicate between Lambda functions. bert-etl.yaml controls how the initial Lambda function is invoked: by periodic events, SNS topics, or S3 bucket events (planned). Passing an event to bert-etl is straightforward from Zappa or from a generic AWS Lambda function you've hooked up to API Gateway.

At the moment there are no plans to attach API Gateway to bert-etl.yaml, because there is already great software (like Zappa) that does this.
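For example, a plain Lambda function behind API Gateway (or a Zappa view) could forward its incoming event to the first deployed bert-etl job with boto3. This is only a sketch: the deployed function name and the payload shape the job expects are assumptions for illustration.

```python
# Hedged sketch: forwarding an API Gateway event to the first bert-etl job.
# The function name 'bert-etl-demo-sync_sounds' and the payload shape are
# hypothetical; adjust them to whatever your deployment actually produces.
import json

import boto3

lambda_client = boto3.client('lambda')


def handler(event, context):
    # Fire-and-forget ('Event') invocation of the initial bert-etl lambda function
    lambda_client.invoke(
        FunctionName='bert-etl-demo-sync_sounds',  # hypothetical deployed job name
        InvocationType='Event',
        Payload=json.dumps({'body': event.get('body')}).encode('utf-8'),
    )
    return {'statusCode': 202, 'body': json.dumps({'queued': True})}
```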

Getting Started

Let's begin with an example that loads data from a file server and then loads it into NumPy arrays.

$ virtualenv -p $(which python3) env
$ source env/bin/activate
$ pip install bert-etl
$ pip install librosa # for demo project
$ docker run -p 6379:6379 -d redis # bert-etl runs on redis to share data across CPUs
$ bert-runner.py -n demo
$ PYTHONPATH='.' bert-runner.py -m demo -j sync_sounds -f
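
The demo module driven by these commands might look roughly like the sketch below. The decorator and helper names mirror the terms used later in this README (bert.binding, comm_binder, work_queue, done_queue, ologger), but the exact import paths and signatures shown here are assumptions rather than confirmed API, and the sound files are hypothetical.

```python
# demo/jobs.py -- illustrative sketch only; decorator and helper signatures
# are assumptions based on the names this README uses.
import librosa
import numpy as np

from bert import binding, utils


@binding.follow('noop')  # assumption: the first job in a pipeline follows 'noop'
def sync_sounds() -> None:
    work_queue, done_queue, ologger = utils.comm_binders(sync_sounds)
    for sound_path in ('kick.wav', 'snare.wav'):  # hypothetical files from the file server
        ologger.info(f'Queuing {sound_path}')
        done_queue.put({'path': sound_path})


@binding.follow(sync_sounds)  # assumption: chains this job after sync_sounds
def load_sounds() -> None:
    work_queue, done_queue, ologger = utils.comm_binders(load_sounds)
    for details in work_queue:
        # librosa.load returns a NumPy array of samples and the sample rate
        samples, sample_rate = librosa.load(details['path'])
        ologger.info(f"{details['path']}: {samples.shape[0]} samples at {sample_rate} Hz")
        done_queue.put({'path': details['path'], 'mean': float(np.mean(samples))})
```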

Release Notes

0.3.0

  • Added Error Management. When an error occurs, bert-runner will log the error and re-run the job. If the same error happens often enough, the job will be aborted

0.2.1

  • Added Release Notes

0.2.0

  • Added Redis Service auto-run. Using Docker, Redis will be pulled and started in the background
  • Added Redis Service channels; sometimes you'll want to run two ETL jobs on the same machine

Fund Bounty Target Upgrades

Bert provides a boilerplate framework that lets you write concurrent ETL code using Python's multiprocessing module. One function starts the process, piping data into a Redis backend that is then consumed by the next function. The queues are respectively named for the scope of the function: the Work (start) queue and the Done (end) queue. Please consider contributing to Bert Bounty Targets to improve this documentation.

https://www.patreon.com/jbcurtin
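
The Work/Done naming above boils down to a chain of producer and consumer queues. A plain-Python analogue of that pattern (illustrative only, not bert's internal code) looks like this; in bert, the chaining and the Redis backend are managed for you.

```python
# Illustrative analogue of the Work (start) -> Done (end) queue pattern,
# using only the standard library's multiprocessing module.
import multiprocessing


def start(done_queue):
    # First function: pipe data into the queue the next function consumes
    for idx in range(10):
        done_queue.put({'idx': idx})
    done_queue.put(None)  # sentinel: signal that the stream is finished


def end(work_queue):
    # Second function: the previous function's Done queue is this one's Work queue
    while (details := work_queue.get()) is not None:
        print(f"processed item {details['idx']}")


if __name__ == '__main__':
    queue = multiprocessing.Queue()
    producer = multiprocessing.Process(target=start, args=(queue,))
    consumer = multiprocessing.Process(target=end, args=(queue,))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
```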

Roadmap

  • Create configuration file, bert-etl.yaml
  • Support conda venv
  • Support pyenv venv
  • Support DynamoDB flush
  • Support multiple invocations per AWS account
  • Support undeploying AWS Lambda functions
  • Support Bottle functions in AWS Lambda

Tutorial Roadmap

  • Introduce Bert API
  • Explain bert.binding
  • Explain comm_binder
  • Explain work_queue
  • Explain done_queue
  • Explain ologger
  • Explain DEBUG and how turning it off allows for x-concurrent processes
  • Show an example of how to load time-series data, calculate the mean, and display the result
  • Expand the example to show how to scale the application implicitly
  • Show how to run locally using Redis
  • Show how to run locally without Redis, using DynamoDB instead
  • Show how to run remotely using AWS Lambda and DynamoDB
  • Talk about DynamoDB and eventual consistency