This project is dedicated to making deployments of distributed machine learning (ML) and deep learning (DL) workflows on on-premise infrastructure simple and scalable.
Kubernetes (K8s) handles clustering and resource management, as well as scaling and management of containerized applications. Apache Spark is used for data processing, Spark ML for developing and deploying distributed machine learning jobs, and Kubeflow for distributed deep learning pipelines.
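As a rough illustration of how a Spark ML job might be handed to a Kubernetes cluster, the sketch below assembles a `spark-submit` command line for cluster mode. It is not taken from this project's scripts: the helper `build_spark_submit`, the API server address, the container image, and the application path are all hypothetical placeholders. Only the `spark-submit` options themselves (`--master k8s://...`, `--deploy-mode`, `--name`, and the two `--conf` keys) come from Spark's Kubernetes support.

```python
import shlex


def build_spark_submit(api_server, image, app_file, name="spark-ml-job", executors=2):
    """Return the spark-submit argument list for a cluster-mode run on Kubernetes.

    All arguments are illustrative; substitute the values of your own cluster.
    """
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",          # Kubernetes API server URL
        "--deploy-mode", "cluster",                 # driver runs inside the cluster
        "--name", name,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_file,                                   # path to the job inside the image
    ]


if __name__ == "__main__":
    # Hypothetical values, shown only to make the command shape concrete.
    cmd = build_spark_submit(
        api_server="https://k8s-api.example.local:6443",
        image="registry.example.local/spark-py:latest",
        app_file="local:///opt/spark/work-dir/train.py",
    )
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed directly to `subprocess.run` without shell quoting issues.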
All documentation is in the docs folder, which contains the following:
installation
: Guide for installing Kubernetes, Kubeflow, monitoring tools

howto
: Explanations for possible workflows such as ML and DL job submission, using monitoring
The results of selected benchmark tasks are in this folder
Weekly and biweekly reports and other required reports can be found here