This repository contains the steps to create a Kubernetes (k8s) cluster on GCP, create persistent storage for the cluster, and train and serve a deep learning workload on Kubernetes. We will develop the container and Kubernetes artifacts, then perform DL training and host DL inference in GKE: Google Kubernetes Engine.
Kubernetes is an open-source platform designed to automate the deployment, scaling, and operation of application containers.
Google Kubernetes Engine[7] is a managed service provided by Google Cloud Platform (GCP) that allows you to deploy, manage, and scale containerized applications using Kubernetes. GKE gives you a Kubernetes environment on Google’s infrastructure, removing the need to install, manage, and operate your own Kubernetes clusters.
We will use the mnist_rnn code from the https://github.com/pytorch/examples/tree/main/mnist_rnn repository. We will first create a cluster, attach a PersistentVolumeClaim (PVC), and then run the necessary steps for training and testing the mnist_rnn code.
To begin with, we need to create a Kubernetes cluster on Google Kubernetes Engine.
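If the cluster does not exist yet, it can be created from Cloud Shell; a minimal sketch, where the cluster name, zone, and node count are illustrative:

```
gcloud container clusters create mnist-cluster --zone us-central1-a --num-nodes 2
```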
`kubectl` is then configured in Cloud Shell with the following command:

```
gcloud container clusters get-credentials <kubernetes-cluster-name> --zone <zone> --project <project-name>
```

Now we can use `kubectl` commands against the cluster. For more GCP commands, refer to the official documentation.
- To attach a PVC, we need a `pvc.yaml` that specifies the configuration of our persistent storage. Create `pvc.yaml` in Cloud Shell with `vim pvc.yaml` (a sample manifest appears after this list).
- In Cloud Shell, run `kubectl apply -f pvc.yaml` to apply the configuration to our cluster.
- Check with `kubectl get pvc` or `kubectl get pvc mnist-model-pvc` whether our PVC has been successfully bound.
- With the PVC ready and successfully bound, we can go ahead with the training and inference steps.
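Below is a minimal sketch of what `pvc.yaml` might contain; it assumes a 1Gi `ReadWriteOnce` claim named `mnist-model-pvc` (the name used in the commands above), and the size and access mode are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mnist-model-pvc    # name checked with `kubectl get pvc mnist-model-pvc`
spec:
  accessModes:
    - ReadWriteOnce        # single-node read/write; enough for one training job
  resources:
    requests:
      storage: 1Gi         # illustrative size; adjust for your checkpoints
```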
For training we have to create a Dockerfile, push the Docker image to a registry, and then create a `train.yaml` and apply that configuration to the cluster.
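A minimal training Dockerfile might look like the sketch below; the base image, the assumption that `main.py` is the mnist_rnn training script, and the extra `torchvision` install are illustrative choices, not the repository's exact file:

```dockerfile
# Sketch of a training image; base image and paths are assumptions
FROM pytorch/pytorch:latest

WORKDIR /app
COPY main.py .                       # training script from the mnist_rnn example

RUN pip install --no-cache-dir torchvision

# --save-model makes the example write mnist_rnn.pt at the end of training
CMD ["python", "main.py", "--save-model"]
```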
- Build the Docker image with `docker build -t trainmnistrnn . --platform=linux/amd64` (this names the image `trainmnistrnn`).
- Tag the image with `docker tag trainmnistrnn srush98/trainmnistrnn`.
- Push the training image to the Docker Hub registry with `docker push srush98/trainmnistrnn`.
- The training image is now ready to be used by the Kubernetes cluster.
- Now create `train.yaml` in Cloud Shell with `vim train.yaml` (see the sketch after this list).
- Deploy the training job: `kubectl apply -f train.yaml`.
- To check whether training succeeded, inspect `kubectl get jobs` and `kubectl get pods`.
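A minimal `train.yaml` might look like the following sketch; the Job name and mount path are assumptions, while the image and PVC names come from the steps above:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: mnist-train-job                  # assumed Job name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: train
          image: srush98/trainmnistrnn   # training image pushed above
          volumeMounts:
            - name: model-storage
              mountPath: /model          # assumed path for the saved model
      volumes:
        - name: model-storage
          persistentVolumeClaim:
            claimName: mnist-model-pvc   # PVC created earlier
```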
I will be using the Gradio library for user interaction; the Gradio-related changes have been made in the inference code.
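As an illustration, the Gradio wiring might look like the sketch below; the `Net` import, checkpoint path on the PVC, and preprocessing are assumptions modeled on the mnist_rnn example, not the repository's actual inference code:

```python
# Hypothetical Gradio inference app; model and paths are assumptions
import gradio as gr
import torch
from torchvision import transforms

from model import Net  # assumed: the RNN model class from the mnist_rnn example

model = Net()
model.load_state_dict(torch.load("/model/mnist_rnn.pt", map_location="cpu"))  # assumed PVC path
model.eval()

preprocess = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # standard MNIST statistics
])

def predict(image):
    x = preprocess(image).unsqueeze(0)  # (1, 1, 28, 28)
    with torch.no_grad():
        logits = model(x)
    return str(int(logits.argmax(dim=1)))

# 0.0.0.0 so the app is reachable from outside the container; port 7860 is assumed
gr.Interface(fn=predict, inputs=gr.Image(type="pil"), outputs="text").launch(
    server_name="0.0.0.0", server_port=7860
)
```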
- Build the Docker image with `docker build -t infermnistrnnfinal . --platform=linux/amd64`.
- Tag the image with `docker tag infermnistrnnfinal srush98/infermnistrnnfinal`.
- Push the inference image to the Docker Hub registry with `docker push srush98/infermnistrnnfinal`.
- Now, similar to training, create `infer.yaml` in Cloud Shell with `vim infer.yaml` (a sample manifest appears after this list).
- Deploy the inference application: `kubectl apply -f infer.yaml`.
- To verify whether the deployment succeeded, check `kubectl get deployments` and `kubectl get pods`; a `Running` status tells us the deployment is up and can be consumed by the service.
- To describe a particular deployment, use `kubectl describe deployment mnist-inference-deployment`.
- With the inference deployment successful, we can now create a service to expose the deployed app to users.
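A minimal `infer.yaml` might look like the following sketch; the labels, container port, and mount path are assumptions, while the deployment name matches the `kubectl describe` command above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mnist-inference-deployment        # name used in `kubectl describe deployment`
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mnist-inference                # assumed label
  template:
    metadata:
      labels:
        app: mnist-inference
    spec:
      containers:
        - name: infer
          image: srush98/infermnistrnnfinal   # inference image pushed above
          ports:
            - containerPort: 7860             # assumed Gradio port
          volumeMounts:
            - name: model-storage
              mountPath: /model               # assumed: read the trained checkpoint
      volumes:
        - name: model-storage
          persistentVolumeClaim:
            claimName: mnist-model-pvc
```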
- Since our deployment is ready, we will now create a `service.yaml` (a sample manifest appears after this list).
- Expose the Gradio app: `kubectl apply -f service.yaml`.
- Verify the service is running properly with `kubectl get service`.
- Describe the service with `kubectl describe service mnist-inference-service`.
- Our service is now up and running; the external IP is the address exposed to the end user for interacting with the application.
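A minimal `service.yaml` might look like the sketch below; the selector and ports are assumptions matching the deployment sketch above, while the service name matches the `kubectl describe` command:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mnist-inference-service    # name used in `kubectl describe service`
spec:
  type: LoadBalancer               # provisions an external IP on GKE
  selector:
    app: mnist-inference           # must match the deployment's pod labels (assumed)
  ports:
    - port: 80                     # external port
      targetPort: 7860             # assumed Gradio containerPort
```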
- After this, enable port forwarding by going to the Services page in the GCP console.
- Click on `Port forwarding`; this provides a command that enables the forwarding.
- Once that command is entered in Cloud Shell, it prints a URL where the deployed app can be accessed.
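The console-generated command is equivalent to a `kubectl port-forward` against the service; a minimal sketch, assuming the port numbers from the service sketch above:

```
kubectl port-forward service/mnist-inference-service 8080:80
```

The app is then reachable at http://localhost:8080 for as long as the forwarding runs.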
- In the Gradio interface[9], upload any image from the MNIST data and you will see the prediction on the screen as output.
[1] Docker official documentation: https://docs.docker.com/desktop/
[2] Blog on deploying an ML model with Docker: https://towardsdatascience.com/build-and-run-a-docker-container-for-your-machine-learning-model-60209c2d7a7f
[3] MNIST code repo: https://github.com/pytorch/examples/tree/main/mnist_rnn
[4] Containerization (Wikipedia): https://en.wikipedia.org/wiki/Containerization_(computing)
[5] Kubernetes (Wikipedia): https://en.wikipedia.org/wiki/Kubernetes
[6] Kubernetes official documentation: https://kubernetes.io/docs/home/
[7] GKE official documentation: https://cloud.google.com/kubernetes-engine?hl=en
[8] Medium blog on deploying containers to Kubernetes: https://tsai-liming.medium.com/part-3-deploying-your-data-science-containers-to-kubernetes-aaae769144ec
[9] Gradio documentation: https://www.gradio.app/docs/interface
[10] NYU Prof. Hao and Chung's slides: https://cs.nyu.edu/courses/spring21/CSCI-GA.3033-085/