Use MiniKF to understand Kubernetes and Kubeflow concepts by creating a small pipeline
The following guide is based on what I have learnt using MiniKF; the content below may not be fully accurate.
Tips in case of failure:
- make sure you have the latest virtualbox version
- try restarting the virtual environment (`vagrant reload`). `vagrant reload` may sometimes fail to do a fresh restart; in that case open the VirtualBox GUI and delete the VM manually
- make sure to upgrade MiniKF as mentioned in the installation page
- if MiniKF keeps pending forever, use `vagrant ssh` to check the log files (e.g. provisions.log)
Follow this guide and just do the clicks! If you are lucky you will finish the guide in 5 minutes.
Let's create our own pipeline to understand things better. Keep reading.
MiniKF ships with the following components:
- JupyterLab - in the quick run we created a Jupyter notebook server with specifications such as CPU, GPU, RAM and volumes. This server is similar to one on a real cloud service like AWS/GCS. The volumes we mounted on the server contain the training data that we want to use in the pipeline. We used the notebook's terminal, which has kubectl and the Kubeflow Pipelines SDK available.
- Rok - since MiniKF runs locally, the pipeline has no access to online resources (training data). Hence we create virtual volumes and pass them to the components at run time, from where the components can read/write data. This makes it easy to define the components in the same file where the pipeline is defined. However, this is not the ideal way to define a component; the best practices for building components cannot be used when working with MiniKF.
- Building a pipeline with three components. The first two components are identical: each takes a text file as input and writes its output to a file. The third component reads the two output files and concatenates their contents.
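To make the data flow concrete, here is a minimal pure-Python sketch of what the three component scripts do. The function and file names are hypothetical placeholders; the real scripts are in the python_scripts directory linked in the steps below.

```python
# Pure-Python sketch of the three components' logic. Function and file
# names are hypothetical; the real scripts live in the python_scripts
# directory of the example repository.

def write_component(in_path, out_path):
    # Components 1 and 2: read a text file and write its content to an
    # output file (the real scripts may transform the content on the way).
    with open(in_path) as f:
        content = f.read()
    with open(out_path, "w") as f:
        f.write(content)


def concat_component(path_a, path_b, out_path):
    # Component 3: concatenate the two output files into one.
    with open(path_a) as fa, open(path_b) as fb:
        combined = fa.read() + fb.read()
    with open(out_path, "w") as f:
        f.write(combined)
```

In the pipeline, each function would be baked into its own Docker image, and the file paths would point into the shared data volume.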
Steps 1, 2 & 3 below are optional, as the Docker images are already hosted on Docker Hub. To learn how to build and host Docker images, read Docker images.
- write python scripts implementing the given objective https://github.com/hiruna72/miniKF_example_pipeline/tree/master/python_scripts
- build docker containers to run the scripts - one container for one script
- host the docker images in a cloud
- define components and the pipeline in a python script https://github.com/hiruna72/miniKF_example_pipeline/blob/master/small_pipeline.py
- compile the pipeline using Kubeflow Pipelines SDK https://github.com/hiruna72/miniKF_example_pipeline/blob/master/small_pipeline.tar.gz
- Use Rok to get a snapshot of the data directory (copy this content to the data volume of the notebook server)
- Upload and run the pipeline on miniKF
Once you have the Dockerfile ready, the easiest way to build and host a Docker image is to use a bash script. In the following example, `dockerhubusername` should be set to your Docker Hub username:
docker login --username ${dockerhubusername}
image_name_1=${dockerhubusername}/multiplier # Specify the image name here
image_tag_1=multiplier
full_image_name_1=${image_name_1}:${image_tag_1}
base_image_tag_1=1.12.0-py3
# Build, push, then print the digest of the pushed image
docker build --build-arg BASE_IMAGE_TAG=${base_image_tag_1} -t "${full_image_name_1}" .
docker push "$full_image_name_1"
docker inspect --format="{{index .RepoDigests 0}}" "${full_image_name_1}"
The `docker inspect` command outputs a digest reference to the hosted image. This digest should be used when defining the components.
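The digest reference printed by `docker inspect` has the form `<repository>@sha256:<hash>`. A tiny sketch of pulling it apart (the digest value below is a made-up placeholder, not a real image digest):

```python
# RepoDigests entries look like <repository>@sha256:<hash>.
# The digest below is a made-up placeholder, not a real image digest.
digest_ref = "dockerhubusername/multiplier@sha256:0123456789abcdef"

repo, digest = digest_ref.split("@", 1)
print(repo)    # dockerhubusername/multiplier
print(digest)  # sha256:0123456789abcdef
```

Pinning a component to the digest rather than a mutable tag guarantees the pipeline always pulls exactly the image you pushed.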