Training & Inference in K8S

This project demonstrates how to train and deploy a neural network for handwritten digit recognition using the MNIST dataset on Kubernetes. The MNIST dataset consists of 70,000 grayscale images of handwritten digits (0-9), commonly used for benchmarking machine learning models.
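Each sample is a 28×28 grayscale image with pixel values 0–255. Before training, pipelines typically scale pixels to [0, 1] and standardize them; a small sketch using the conventional MNIST mean/std constants (these are the widely used values, not necessarily the ones this repo's training code applies):

```python
# Per-pixel MNIST normalization: scale 0-255 down to [0, 1], then
# standardize with the conventional MNIST mean/std (assumed here; check
# the training/ code for the values this repo actually uses).
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081

def normalize_pixel(p):
    """Map a raw 0-255 pixel value to a standardized float."""
    return (p / 255.0 - MNIST_MEAN) / MNIST_STD

# Black, mid-gray, and white pixels after normalization.
print([round(normalize_pixel(p), 3) for p in (0, 128, 255)])
```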

🎨 New: Interactive Web Application!

We've added a fun, interactive web application that lets users draw digits and get real-time AI predictions! The web app features:

  • Two Game Modes: Free drawing and challenge mode
  • Interactive Canvas: Draw digits with mouse or touch
  • Real-time Predictions: Instant AI feedback on your drawings
  • Fun Animations: Bouncing elements, confetti, and visual effects
  • Score Tracking: Challenge mode with accuracy statistics
  • Mobile Friendly: Responsive design that works on all devices

👉 Check out the Web App documentation for setup instructions and features!

Getting Started

First, clone this repository to your local machine:

git clone https://github.com/nvibert/ML-Tutorial.git
cd ML-Tutorial

Prerequisites

  • Docker
  • kubectl
  • a K8S cluster
    • You can simply create one using kind.
    • Use Cilium as the CNI and Cilium L2 Announcement for LoadBalancer IPs.

Example Cilium L2 Announcement Policy and LB IP Pool:

apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  loadBalancerIPs: true
  interfaces:
    - eth0
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "pool"
spec:
  blocks:
    - cidr: "172.18.255.200/29"

Training

The training step uses a PyTorch implementation of a Convolutional Neural Network (CNN) to learn from the MNIST dataset. The model learns to recognize digits from images and saves the trained weights for later inference.
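The exact architecture is defined in the training/ code; as an illustration of how a small CNN's convolution and pooling layers shrink the 28×28 input, here is the spatial-size arithmetic for a typical MNIST model (the layer sizes follow the classic PyTorch MNIST example and are assumptions, not this repo's verified model):

```python
# Spatial-size arithmetic for a typical MNIST CNN (layer sizes are
# assumptions modeled on the classic PyTorch MNIST example).
def conv_out(size, kernel, stride=1, padding=0):
    """Output edge length of a square convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                        # MNIST input: 28x28 grayscale
size = conv_out(size, kernel=3)  # conv1, 3x3 kernel -> 26x26
size = conv_out(size, kernel=3)  # conv2, 3x3 kernel -> 24x24
size = size // 2                 # 2x2 max-pool      -> 12x12
flat = size * size * 64          # assuming 64 output channels
print(flat)                      # 9216 features feed the classifier head
```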

  1. Build the training image and load it into the K8S cluster

    • This step packages the training code and dependencies into a Docker image, then loads it into your Kubernetes cluster using kind.
    cd training
    docker build -t mnist:train .
    kind load docker-image mnist:train
  2. Deploy the training pod

    • This creates a pod in your cluster that runs the training job. The pod will download the MNIST dataset, train the model, and optionally save the trained weights.
    kubectl apply -f train-pod.yaml
  3. Run the pod on a specific node (optional)

    • You can label a node and use affinity rules to schedule the training pod on that node.
    kubectl label nodes kind-worker training=allowed
    kubectl apply -f train-pod-affinity.yaml
  4. Save the trained model to persistent storage (optional)

    • Use a pod spec with a mounted volume to persist the trained model weights for later use in inference.
    kubectl apply -f train-pod-affinity-mount.yaml
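The mount variant is not reproduced here; a minimal sketch of the idea — a pod that mounts a volume where the training script can write its weights — could look like the following (the pod name, mount path, and hostPath are assumptions for illustration; see train-pod-affinity-mount.yaml for the real spec):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mnist-train
spec:
  nodeSelector:
    training: allowed            # matches the node label applied above
  containers:
    - name: train
      image: mnist:train
      volumeMounts:
        - name: model-store
          mountPath: /models     # training code would save weights here
  volumes:
    - name: model-store
      hostPath:
        path: /var/lib/mnist-models
        type: DirectoryOrCreate
```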

Inference

The inference step deploys a Python Flask server in Kubernetes that loads the trained model and predicts the digit in uploaded images.

  1. Build the inference image and load it into the K8S cluster

    • This packages the inference code and dependencies into a Docker image, then loads it into your cluster.
    cd inference
    docker build -t mnist:inference .
    kind load docker-image mnist:inference
  2. Deploy the inference server

    • This creates a pod and service in your cluster that exposes the Flask API for digit prediction.
    kubectl apply -f inference.yaml
  3. Get the external IP of the service

    • Use this command to find the IP address for accessing the inference API from outside the cluster.
    kubectl get svc mnist-inference-service
  4. Set the LoadBalancer IP as an environment variable

    • Extract the external IP and store it for easy use in subsequent commands.
    export INFERENCE_IP=$(kubectl get svc mnist-inference-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    echo "Inference service available at: $INFERENCE_IP"
  5. Use the Interactive Web App (Recommended!) 🎨

    • Deploy the fun web application for an interactive experience:
    cd webapp
    docker build -t mnist:webapp .
    kind load docker-image mnist:webapp
    kubectl apply -f webapp.yaml
    
    # Get the web app URL
    export WEBAPP_IP=$(kubectl get svc mnist-webapp-service -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    echo "🎨 Web app available at: http://$WEBAPP_IP"

    Alternatively, send images to the server for prediction via curl

    • Use curl to POST image files to the API. The server will return the predicted digit as JSON.

    Test with different digits:

    Test digit 0:

    curl -X POST -F "file=@data/testing/0/10.jpg" http://$INFERENCE_IP:5000/predict

    Test digit 1:

    curl -X POST -F "file=@data/testing/1/1004.jpg" http://$INFERENCE_IP:5000/predict

    Test digit 2:

    curl -X POST -F "file=@data/testing/2/1.jpg" http://$INFERENCE_IP:5000/predict

    Test digit 3:

    curl -X POST -F "file=@data/testing/3/1020.jpg" http://$INFERENCE_IP:5000/predict

    Test digit 7:

    curl -X POST -F "file=@data/testing/7/0.jpg" http://$INFERENCE_IP:5000/predict

    Test digit 9:

    curl -X POST -F "file=@data/testing/9/1000.jpg" http://$INFERENCE_IP:5000/predict

    Example responses:

    {"prediction": 0}
    {"prediction": 1}  
    {"prediction": 2}
    {"prediction": 3}
    {"prediction": 7}
    {"prediction": 9}
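For scripting, the same request can be made from Python with only the standard library; the host/port and the "file" field name follow the curl examples above (this is a sketch, not code from the repo):

```python
# Stdlib-only client for the /predict endpoint: build a multipart body by
# hand and POST it, avoiding a dependency on the requests package.
import io
import json
import urllib.request
import uuid

def build_multipart(field, filename, data):
    """Assemble a multipart/form-data body with a single file field."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        (
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
    )
    body.write(data)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return body.getvalue(), boundary

def predict(host, image_path):
    """POST an image file and return the predicted digit."""
    with open(image_path, "rb") as f:
        payload, boundary = build_multipart("file", image_path, f.read())
    req = urllib.request.Request(
        f"http://{host}:5000/predict",
        data=payload,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prediction"]
```

Usage would mirror the curl calls, e.g. `predict(os.environ["INFERENCE_IP"], "data/testing/0/10.jpg")`.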

Optional: Deploy Cluster and Install Cilium

You can optionally deploy your kind cluster and install Cilium with the correct settings before applying the LoadBalancer IP Pool and L2 Announcement Policy.

  1. Create your kind cluster. Save the following config to a file (e.g., samples/kind-config.yaml):

    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
      - role: control-plane
      - role: worker
      - role: worker
    networking:
      disableDefaultCNI: true

    Create the cluster:

    kind create cluster --config samples/kind-config.yaml
  2. Install Cilium with the required settings. You can install Cilium using Helm or the Cilium CLI.

    • Cilium CLI:

      cilium install --set kubeProxyReplacement=true --set l2announcements.enabled=true --set ipam.mode=kubernetes --set devices='{eth0}'
    • Helm: Save the sample values file from samples/cilium-helm-values.yaml and install with:

      helm repo add cilium https://helm.cilium.io/
      helm install cilium cilium/cilium --namespace kube-system --values samples/cilium-helm-values.yaml
  3. Apply the Cilium LoadBalancer IP Pool and L2 Announcement Policy. After Cilium is installed, apply the following manifests:

    kubectl apply -f samples/cilium-lb-pool.yaml
    kubectl apply -f samples/cilium-l2-policy.yaml
  4. Optional: Enable Cilium Hubble (Network Observability). Hubble provides deep network visibility for your Kubernetes cluster.

    Enable Hubble with the UI:

    cilium hubble enable --ui

    Open the Hubble UI (this will port-forward and open in your browser):

    cilium hubble ui

TO DO

  • Simulate an attack where a user would modify the folders - swapping the 6 folder with the 9 folder.
  • Understand who executed the commands with Tetragon.
  • Prevent data poisoning by adding a network policy?
  • Let's suppose that access to the training machine is compromised.
  • Add a diagram of inter-communications between client and model.
  • Add Hubble UI screenshot.
  • Add Gateway API to introduce a new model. Refer back to the failed introduction of GPT-5 and the need to bring back the GPT-4o model.
  • Consider swapping MNIST for Fashion-MNIST.

About

Basic ML Tutorial on kind and Cilium
