The purpose of this project is to enhance my Clojure skills and use them to resolve a data pipeline problem.
There is a requirement to transfer data from a relational database to a data lake in near real-time.
The project proposes using Debezium to capture changes (CDC - such as inserts, updates, and deletions) in the source database. The captured data will be processed and transformed by Clojure core.async and then loaded into delta.io format on the data lake.
other???
Core.Async in Use - Timothy Baldridge ETL with Clojure and Datomic grammarly - Building ETL Pipelines with Clojure and Transducers
Task | Status |
---|---|
Install SQL Server 2022 (16.x) on Ubuntu 22.04 local machine | DONE |
Install PostreSQL on Ubuntu 22.04 local machine | DONE |
Configure CDC on SQL Server | DONE |
Configure CDC on PostreSQL | TODO |
Embed Debezium Engine in Clojure App | TODO |
Read and process Debezium message in Clojure | TODO |
core.async | TODO |
Transducers | TODO |
Delta.io | TODO |
Azure Cloud Terraform | TODO |
Azure Cloud CICD | TODO |
Download from https://github.com/rtburger/my-cdc-project-test
FIXME: explanation
Run the project directly, via :exec-fn
:
$ clojure -X:run-x
Hello, Clojure!
Run the project, overriding the name to be greeted:
$ clojure -X:run-x :name '"Someone"'
Hello, Someone!
Run the project directly, via :main-opts
(-m rtburger.my-cdc-project-test
):
$ clojure -M:run-m
Hello, World!
Run the project, overriding the name to be greeted:
$ clojure -M:run-m Via-Main
Hello, Via-Main!
Run the project's tests (they'll fail until you edit them):
$ clojure -T:build test
Run the project's CI pipeline and build an uberjar (this will fail until you edit the tests to pass):
$ clojure -T:build ci
This will produce an updated pom.xml
file with synchronized dependencies inside the META-INF
directory inside target/classes
and the uberjar in target
. You can update the version (and SCM tag)
information in generated pom.xml
by updating build.clj
.
If you don't want the pom.xml
file in your project, you can remove it. The ci
task will
still generate a minimal pom.xml
as part of the uber
task, unless you remove version
from build.clj
.
Run that uberjar:
$ java -jar target/net.clojars.rtburger/my-cdc-project-test-0.1.0-SNAPSHOT.jar
FIXME: listing of options this app accepts.
...
...
Copyright © 2023 rtburger
EPLv1.0 is just the default for projects generated by deps-new
: you are not
required to open source this project, nor are you required to use EPLv1.0!
Feel free to remove or change the LICENSE
file and remove or update this
section of the README.md
file!
Distributed under the Eclipse Public License version 1.0.