This project demonstrates how to train and deploy a neural network for handwritten digit recognition using the MNIST dataset on Kubernetes. The MNIST dataset consists of 70,000 grayscale images of handwritten digits (0-9), commonly used for benchmarking machine learning models.
We've added a fun, interactive web application that lets users draw digits and get real-time AI predictions! The web app features:
- Two Game Modes: Free drawing and challenge mode
- Interactive Canvas: Draw digits with mouse or touch
- Real-time Predictions: Instant AI feedback on your drawings
- Fun Animations: Bouncing elements, confetti, and visual effects
- Score Tracking: Challenge mode with accuracy statistics
- Mobile Friendly: Responsive design that works on all devices
👉 Check out the Web App documentation for setup instructions and features!
First, clone this repository to your local machine:
```bash
git clone https://github.com/nvibert/ML-Tutorial.git
cd ML-Tutorial
```
- Docker
- kubectl
- a Kubernetes cluster (the examples below use kind)
Example Cilium L2 Announcement Policy and LB IP Pool:
```yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  loadBalancerIPs: true
  interfaces:
    - eth0
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "pool"
spec:
  blocks:
    - cidr: "172.18.255.200/29"
```
The training step uses a PyTorch implementation of a convolutional neural network (CNN) to learn from the MNIST dataset. The model is trained to recognize digits from images, and the trained weights are saved for later inference.
1. **Build the training image and load it into the K8S cluster**

   This step packages the training code and dependencies into a Docker image, then loads it into your Kubernetes cluster using kind.

   ```bash
   cd training
   docker build -t mnist:train .
   kind load docker-image mnist:train
   ```
2. **Deploy the training pod**

   This creates a pod in your cluster that runs the training job. The pod will download the MNIST dataset, train the model, and optionally save the trained weights.

   ```bash
   kubectl apply -f train-pod.yaml
   ```
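   The repository's `train-pod.yaml` is the authoritative manifest; a minimal pod spec for this step might look roughly like the following (the pod name and pull policy are assumptions, the image tag matches the build step above):

   ```yaml
   # Hypothetical sketch; the real train-pod.yaml may differ.
   apiVersion: v1
   kind: Pod
   metadata:
     name: mnist-train            # assumed pod name
   spec:
     restartPolicy: Never         # training is a one-shot job
     containers:
       - name: train
         image: mnist:train       # image built and loaded above
         imagePullPolicy: Never   # use the image preloaded into kind
   ```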
3. **Run the pod on a specific node (optional)**

   You can label a node and use affinity rules to schedule the training pod on that node.

   ```bash
   kubectl label nodes kind-worker training=allowed
   kubectl apply -f train-pod-affinity.yaml
   ```
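   The affinity rules live in `train-pod-affinity.yaml`; the relevant fragment of such a pod spec would look roughly like this (only the label key and value are taken from the command above; the rest is an assumed-typical node-affinity stanza):

   ```yaml
   # Hypothetical affinity fragment; the repository manifest may differ.
   spec:
     affinity:
       nodeAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
           nodeSelectorTerms:
             - matchExpressions:
                 - key: training      # label applied with `kubectl label`
                   operator: In
                   values:
                     - allowed
   ```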
4. **Save the trained model to persistent storage (optional)**

   Use a pod spec with a mounted volume to persist the trained model weights for later use in inference.

   ```bash
   kubectl apply -f train-pod-affinity-mount.yaml
   ```
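   `train-pod-affinity-mount.yaml` holds the actual spec; a sketch of the volume-mount portion, with an assumed `hostPath` volume type and assumed paths, could look like:

   ```yaml
   # Hypothetical volume fragment; paths and volume type are assumptions.
   spec:
     containers:
       - name: train
         image: mnist:train
         volumeMounts:
           - name: model-store
             mountPath: /model          # assumed path where weights are written
     volumes:
       - name: model-store
         hostPath:
           path: /data/mnist-model     # assumed host directory
           type: DirectoryOrCreate
   ```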
The inference step deploys a Python Flask server in Kubernetes that loads the trained model and predicts the digit in uploaded images.
1. **Build the inference image and load it into the K8S cluster**

   This packages the inference code and dependencies into a Docker image, then loads it into your cluster.

   ```bash
   cd inference
   docker build -t mnist:inference .
   kind load docker-image mnist:inference
   ```
2. **Deploy the inference server**

   This creates a pod and service in your cluster that exposes the Flask API for digit prediction.

   ```bash
   kubectl apply -f inference.yaml
   ```
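   `inference.yaml` in the repository is the source of truth; the service half implied by the later steps would look roughly like this (the service name and port come from the commands below, the selector label is an assumption):

   ```yaml
   # Hypothetical service sketch; labels and the pod half may differ.
   apiVersion: v1
   kind: Service
   metadata:
     name: mnist-inference-service   # name queried with `kubectl get svc`
   spec:
     type: LoadBalancer              # IP comes from the Cilium LB IP pool
     selector:
       app: mnist-inference          # assumed pod label
     ports:
       - port: 5000                  # port used in the curl examples
         targetPort: 5000
   ```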
3. **Get the external IP of the service**

   Use this command to find the IP address for accessing the inference API from outside the cluster.

   ```bash
   kubectl get svc mnist-inference-service
   ```
4. **Set the LoadBalancer IP as an environment variable**

   Extract the external IP and store it for easy use in subsequent commands.

   ```bash
   export INFERENCE_IP=$(kubectl get svc mnist-inference-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
   echo "Inference service available at: $INFERENCE_IP"
   ```
5. **Use the Interactive Web App (Recommended!) 🎨**

   Deploy the fun web application for an interactive experience:

   ```bash
   cd webapp
   docker build -t mnist:webapp .
   kind load docker-image mnist:webapp
   kubectl apply -f webapp.yaml

   # Get the web app URL
   export WEBAPP_IP=$(kubectl get svc mnist-webapp-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
   echo "🎨 Web app available at: http://$WEBAPP_IP"
   ```

6. **Or send images to the server for prediction via curl**

   Use `curl` to POST image files to the API. The server will return the predicted digit as JSON.
Test with different digits:
```bash
# Test digit 0
curl -X POST -F "file=@data/testing/0/10.jpg" http://$INFERENCE_IP:5000/predict

# Test digit 1
curl -X POST -F "file=@data/testing/1/1004.jpg" http://$INFERENCE_IP:5000/predict

# Test digit 2
curl -X POST -F "file=@data/testing/2/1.jpg" http://$INFERENCE_IP:5000/predict

# Test digit 3
curl -X POST -F "file=@data/testing/3/1020.jpg" http://$INFERENCE_IP:5000/predict

# Test digit 7
curl -X POST -F "file=@data/testing/7/0.jpg" http://$INFERENCE_IP:5000/predict

# Test digit 9
curl -X POST -F "file=@data/testing/9/1000.jpg" http://$INFERENCE_IP:5000/predict
```
Example responses:
```json
{"prediction": 0}
{"prediction": 1}
{"prediction": 2}
{"prediction": 3}
{"prediction": 7}
{"prediction": 9}
```
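The response body is simply the index of the model's highest-scoring class, serialized as JSON. A minimal sketch of that mapping in plain Python (the function name and the score list are illustrative, not the server's actual code):

```python
import json

def to_response(scores):
    """Map a list of 10 class scores to the server's JSON response shape.

    `scores` stands in for the model's output logits; the real server
    runs a PyTorch model, which is omitted here.
    """
    prediction = max(range(len(scores)), key=lambda i: scores[i])
    return json.dumps({"prediction": prediction})

# Example: class 7 has the highest score, so the response names digit 7.
print(to_response([0.1, 0.0, 0.2, 0.1, 0.0, 0.3, 0.2, 4.2, 0.1, 0.0]))
# → {"prediction": 7}
```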
You can optionally deploy your kind cluster and install Cilium with the correct settings before applying the LoadBalancer IP Pool and L2 Announcement Policy.
1. **Create your kind cluster**

   Save the following config to a file (e.g., `kind-config.yaml`):

   ```yaml
   kind: Cluster
   apiVersion: kind.x-k8s.io/v1alpha4
   nodes:
     - role: control-plane
     - role: worker
     - role: worker
   networking:
     disableDefaultCNI: true
   ```

   Create the cluster:

   ```bash
   kind create cluster --config samples/kind-config.yaml
   ```
2. **Install Cilium with the required settings**

   You can install Cilium using Helm or the Cilium CLI.

   - Cilium CLI:

     ```bash
     cilium install \
       --set kubeProxyReplacement=true \
       --set l2announcements.enabled=true \
       --set ipam.mode=kubernetes \
       --set devices='{eth0}'
     ```
   - Helm: save the sample values file from `samples/cilium-helm-values.yaml` and install with:

     ```bash
     helm repo add cilium https://helm.cilium.io/
     helm install cilium cilium/cilium \
       --namespace kube-system \
       --values samples/cilium-helm-values.yaml
     ```
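   The values file is expected to mirror the CLI flags above; its contents likely resemble the following sketch (check `samples/cilium-helm-values.yaml` in the repository for the authoritative version):

   ```yaml
   # Expected shape of samples/cilium-helm-values.yaml, inferred from
   # the `cilium install` flags; the repository copy is authoritative.
   kubeProxyReplacement: true
   l2announcements:
     enabled: true
   ipam:
     mode: kubernetes
   devices:
     - eth0
   ```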
3. **Apply the Cilium LoadBalancer IP Pool and L2 Announcement Policy**

   After Cilium is installed, apply the following manifests:

   ```bash
   kubectl apply -f samples/cilium-lb-pool.yaml
   kubectl apply -f samples/cilium-l2-policy.yaml
   ```
4. **Optional: Enable Cilium Hubble (Network Observability)**

   Hubble provides deep network visibility for your Kubernetes cluster.

   Enable Hubble with the UI:

   ```bash
   cilium hubble enable --ui
   ```

   Open the Hubble UI (this will port-forward and open it in your browser):

   ```bash
   cilium hubble ui
   ```
- Simulate an attack where a user modifies the dataset folders, swapping the `6` folder with the `9` folder.
- Understand who executed the commands with Tetragon.
- Prevent data poisoning by adding a network policy?
- Let's suppose that access to the training machine is compromised.
- Add a diagram of inter-communications between client and model.
- Add Hubble UI screenshot.
- Add Gateway API to introduce a new model. Refer back to the failed introduction of GPT-5 and the need to bring back the GPT-4o model.
- Consider swapping MNIST for Fashion-MNIST.