GitHub - koenvantomme/datasticks.com: Datasticks - The Lightweight, 100% Open Source Alternative to Continuously Train and Deploy Streaming ML and AI Pipelines

100% Open Source, Continuously Train and Deploy Streaming ML and AI Pipelines

Live Demo

Click here for a live demo.

Note: Do not load any sensitive data into this environment!

Docker Images

Click here for all Docker images

Related Training and Workshops

Click here for a related project used for workshops

Set Up Kubernetes Cluster

Follow the instructions here.

Set Up Kubernetes Client CLI

Follow the instructions here.

Clone this Repo including Submodules

git clone --recursive https://github.com/fluxcapacitor/datasticks.com

Pull Latest Tips of Submodules

cd datasticks.com

git submodule update --recursive --remote && git pull --recurse-submodules
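To confirm that every submodule was actually checked out, you can filter the output of `git submodule status`. The sketch below relies on git's documented prefixes: `-` means not initialized, `+` means the checked-out commit differs from the one recorded in the superproject, and `U` means merge conflicts. Because `--remote` moves submodules to their upstream tips, a `+` is expected after the command above. The function name `list_unready_submodules` is hypothetical.

```shell
#!/bin/sh
# Sketch: read `git submodule status` output on stdin and print
# "<state> <path>" for every submodule that is not cleanly checked out.
list_unready_submodules() {
  while IFS= read -r line; do
    # The second whitespace-separated field is the submodule path.
    path=$(printf '%s\n' "$line" | awk '{ print $2 }')
    case "$line" in
      -*) echo "uninitialized $path" ;;
      +*) echo "modified $path" ;;
      U*) echo "conflict $path" ;;
    esac
  done
}

# Usage, from inside the cloned datasticks.com directory:
#   git submodule status | list_unready_submodules
```

An empty result means every submodule matches the commit the superproject expects. Any `uninitialized` entry means the `--recursive` clone did not complete.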

Deploy Datasticks to Kubernetes Cluster

./datasticks-up.sh
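The deploy script returns before every container is up. Here is a minimal polling sketch that waits until no pod reports a STATUS other than `Running` or `Completed`. It assumes the default `kubectl get pods --no-headers` column layout (NAME, READY, STATUS, RESTARTS, AGE) and checks only the STATUS column, not READY. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: print "<name> <status>" for every pod whose STATUS column is
# neither Running nor Completed.
pods_not_running() {
  kubectl get pods --no-headers |
    awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Sketch: poll every 10 seconds until at least one pod exists and none is
# pending. Gives up after <attempts> checks (default 60, about 10 minutes).
wait_for_pods() {
  attempts=${1:-60}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Count pods; awk prints a bare integer even when there is no input.
    total=$(kubectl get pods --no-headers 2>/dev/null | awk 'END { print NR }')
    pending=$(pods_not_running)
    if [ "$total" -gt 0 ] && [ -z "$pending" ]; then
      echo "all pods running"
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "still waiting on:" >&2
  printf '%s\n' "$pending" >&2
  return 1
}

# Usage, right after ./datasticks-up.sh:
#   wait_for_pods
```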

Get All Service Hosts/IPs

kubectl get svc -w
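`kubectl get svc -w` watches until you interrupt it. For scripting, a sketch that reads the external address of a single LoadBalancer service with `-o jsonpath` could look like the one below. Most providers fill in `.ip`, while AWS ELBs fill in `.hostname`, so it tries the IP first and falls back to the hostname. It prints nothing while the address is still pending. The service name in the usage line is an assumption; take the real one from `kubectl get svc`.

```shell
#!/bin/sh
# Sketch: print the external IP, or failing that the hostname, of a
# LoadBalancer service. Prints an empty line while still <pending>.
service_external_address() {
  addr=$(kubectl get svc "$1" -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  if [ -z "$addr" ]; then
    addr=$(kubectl get svc "$1" -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
  fi
  printf '%s\n' "$addr"
}

# Usage ("apache" is an assumed service name):
#   service_external_address apache
```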

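(Optional) Set Up Friendly DNS Names Pointing to the Service Hosts/IPs Above

e.g., via the AWS Route 53 API, the GoDaddy API, etc. Use a CNAME record when the service exposes a hostname (such as an AWS ELB name) and an A record when it exposes an IP address.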
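For Route 53, one way to do this is to generate a change batch and pass it to the AWS CLI's `aws route53 change-resource-record-sets`. The sketch below emits an `UPSERT` for a single CNAME record, which requires a hostname target such as an ELB name. The record name, target, hosted zone ID, and the helper name `route53_cname_batch` are all placeholders.

```shell
#!/bin/sh
# Sketch: print a Route 53 change batch that UPSERTs one CNAME record.
# Arguments: <record name> <target hostname> [TTL in seconds, default 300]
route53_cname_batch() {
  name=$1
  target=$2
  ttl=${3:-300}
  cat <<EOF
{
  "Comment": "Datasticks service alias",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$name",
        "Type": "CNAME",
        "TTL": $ttl,
        "ResourceRecords": [{ "Value": "$target" }]
      }
    }
  ]
}
EOF
}

# Usage (placeholder names and zone ID):
#   route53_cname_batch jupyter.example.com a1b2.elb.amazonaws.com > cname.json
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id <ZONE-ID> --change-batch file://cname.json
```

`UPSERT` creates the record if it is missing and overwrites it otherwise, so re-running the script after a service gets a new load balancer is safe.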

Point Your Browser at the Apache Host/IP from Above

http://<apache-host-ip>

Advanced Features and Demos

Real-time Topology View of the Live Kubernetes Cluster

kubectl describe svc weavescope-app

https://<KUBERNETES-ADMIN-UI-WEAVESCOPE-HOST-IP>

Note: You can manually scale Spark Workers through WeaveScope

Manually Scale Spark Workers

kubectl scale --replicas=4 rc spark-worker-2-0-1
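`kubectl scale` returns as soon as the new replica count is recorded, while the extra Spark worker pods may still be starting. The sketch below polls the replication controller's `.status.readyReplicas` field until it reaches the requested count. That field is omitted while it is zero, hence the `:-0` default. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: wait until a replication controller reports <replicas> ready pods,
# polling every 5 seconds for at most <attempts> checks (default 60).
# Arguments: <rc name> <replicas> [attempts]
wait_for_rc_ready() {
  rc=$1
  want=$2
  attempts=${3:-60}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    ready=$(kubectl get rc "$rc" -o jsonpath='{.status.readyReplicas}')
    if [ "${ready:-0}" -ge "$want" ]; then
      echo "$rc: $ready/$want ready"
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  echo "$rc: only ${ready:-0}/$want ready" >&2
  return 1
}

# Usage:
#   kubectl scale --replicas=4 rc spark-worker-2-0-1
#   wait_for_rc_ready spark-worker-2-0-1 4
```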

`bash` into Live Docker Container

kubectl get pod

kubectl exec <pod-name> -it -- bash -il

Note: You can manually bash into live Docker containers through WeaveScope
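`kubectl exec` needs the exact pod name, and pods created by a replication controller carry a random suffix. A small helper can pick the first pod whose name starts with a given prefix instead of copying it from `kubectl get pod` by hand. It strips whatever `<resource>/` prefix `kubectl get pods -o name` prints. The helper name and the `jupyterhub` prefix are only examples.

```shell
#!/bin/sh
# Sketch: print the name of the first pod whose name starts with $1.
first_pod_with_prefix() {
  kubectl get pods -o name | sed 's#^[^/]*/##' | grep "^$1" | head -n 1
}

# Usage (prefix is an example):
#   kubectl exec "$(first_pod_with_prefix jupyterhub)" -it -- bash -il
```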

Auto-scale Spark Workers based on CPU Utilization

kubectl autoscale rc spark-worker-2-0-1 --max=4 --cpu-percent=50
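The Kubernetes documentation describes the core HorizontalPodAutoscaler rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, clamped to the min/max bounds. CPU utilization is measured relative to each container's CPU request, so the Spark worker pods need CPU requests set for `--cpu-percent` to have any effect. The sketch below reproduces only that arithmetic. It deliberately ignores the HPA's tolerance band, stabilization windows, and not-yet-ready pods, so real scaling decisions can differ.

```shell
#!/bin/sh
# Simplified sketch of the HPA calculation:
#   desired = ceil(current * currentCPU% / targetCPU%), clamped to [min, max]
# Arguments: <current replicas> <current CPU %> <target CPU %> <min> <max>
hpa_desired_replicas() {
  current=$1 cpu=$2 target=$3 min=$4 max=$5
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive b.
  desired=$(( (current * cpu + target - 1) / target ))
  [ "$desired" -lt "$min" ] && desired=$min
  [ "$desired" -gt "$max" ] && desired=$max
  echo "$desired"
}

# With the command above (--cpu-percent=50 --max=4, --min left to the
# server default of 1):
#   hpa_desired_replicas 2 90 50 1 4   # ceil(3.6) = 4
#   hpa_desired_replicas 3 20 50 1 4   # ceil(1.2) = 2
#   hpa_desired_replicas 3 80 50 1 4   # ceil(4.8) = 5, clamped to 4
```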

Rolling Update of JupyterHub to Increase `spark.cores.max` and `spark.executor.memory`

kubectl rolling-update jupyterhub-master -f jupyterhub-rc-2cores-2gb.yaml
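The Spark property that caps the total cores an application requests on a standalone cluster is `spark.cores.max`. `spark.executor.memory` sets memory per executor. How `jupyterhub-rc-2cores-2gb.yaml` actually passes these values is not shown in this README. The sketch below only renders them in `spark-defaults.conf` syntax (`<key> <value>` per line) for the "2 cores / 2 GB" variant, and the helper name is hypothetical. Also note that `kubectl rolling-update` works only on ReplicationControllers and was removed from kubectl in v1.18, so a newer client may not accept the command above.

```shell
#!/bin/sh
# Sketch: print Spark resource settings in spark-defaults.conf syntax.
# Arguments: <max total cores> <executor memory, e.g. 2g>
spark_resource_defaults() {
  printf 'spark.cores.max %s\n' "$1"
  printf 'spark.executor.memory %s\n' "$2"
}

# Usage (path is an assumption about the container layout):
#   spark_resource_defaults 2 2g >> "$SPARK_HOME/conf/spark-defaults.conf"
```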

Continuous Deploy, Monitor, and Rollback New Spark ML and TensorFlow AI Models

TODO: Link to Jupyter notebook

Continuous, Incremental Training of Spark ML and TensorFlow AI Models from Kafka

TODO: Link to Jupyter notebook

Highly Scalable, Highly Available Model Serving Using Battle-Tested NetflixOSS Components

TODO: Link to Hystrix/Turbine dashboard

Support

Email help@fluxcapacitor.com for support.
