# Create Kubernetes cluster and deploy HF model

## Prerequisites
- git-lfs installed, to clone the model repository before uploading it to S3
- 100 GB of free space on the local disk
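
Both prerequisites can be checked from a notebook cell before starting the download. The snippet below is a minimal sketch using only the Python standard library; the function name `check_prerequisites` and its parameters are illustrative and not part of the original notebook.

```python
import shutil


def check_prerequisites(path=".", min_free_gb=100, tools=("git-lfs",)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Each required tool must be resolvable on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Compare the free space on the target filesystem with the requirement.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.0f} GB free at {path}, need {min_free_gb} GB")
    return problems


print(check_prerequisites() or "all prerequisites satisfied")
```

Pass the directory that will hold the model as `path` so that the free-space check looks at the right filesystem.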

## Preparing model files
We first download the model locally and then upload its files to an AWS S3 bucket, from which they are mounted by the model container.

In [None]:
## Setting required env variables

%env S3_BUCKET_NAME=k8s-model-zephyr
%env REGION=eu-central-1
%env HF_MODEL_PATH=HuggingFaceH4/zephyr-7b-beta
%env HF_MODEL_NAME=zephyr-7b-beta
%env LOCAL_DIRECTORY=/data-tst/home/voa/projects/k8s-model
%env AWS_PROFILE=voatsap-cluster-dev

In [None]:
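# Clone the model to a local folder and upload it to the S3 bucket.
# On a gigabit internet connection this took about 9 min to clone and 6 min to upload.

# -p creates missing parent directories and does not fail if the folder already exists
!mkdir -p $LOCAL_DIRECTORY/$HF_MODEL_NAME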
!git lfs clone --depth=1 https://huggingface.co/$HF_MODEL_PATH $LOCAL_DIRECTORY/$HF_MODEL_NAME
!aws s3 mb s3://$S3_BUCKET_NAME --region $REGION || true
!aws s3 sync $LOCAL_DIRECTORY/$HF_MODEL_NAME s3://$S3_BUCKET_NAME/llm/deployment/$HF_MODEL_NAME --exclude "*.git/*"

In [None]:
# Print the S3 URL of the uploaded HF model

!echo s3://$S3_BUCKET_NAME/llm/deployment/$HF_MODEL_NAME

## Preparing cluster.dev stack variables
The `cluster.dev` folder contains 4 files:
- `project.yaml` defines global variables such as the region
- `backend.yaml` sets the S3 bucket that stores the cluster.dev and TF states
- `stack-eks.yaml` describes the EKS cluster configuration, including the required node groups with GPU support and the GPU types
- `stack-model.yaml` holds the model variables required to deploy the model into the EKS cluster
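
Before bootstrapping, it can help to confirm that all four files are in place. This is a minimal sketch that assumes the folder is named `cluster.dev` and sits next to the notebook; the helper `missing_stack_files` is a hypothetical name introduced here for illustration.

```python
from pathlib import Path

# File names as listed above.
EXPECTED_FILES = ("project.yaml", "backend.yaml", "stack-eks.yaml", "stack-model.yaml")


def missing_stack_files(folder="cluster.dev"):
    """Return the expected file names that are absent from the given folder."""
    base = Path(folder)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


# An empty list means every expected file was found.
print(missing_stack_files())
```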



In [None]:
# Bootstrap the cluster.
# %cd persists across cells; `!cd` would only change directory in its own subshell.
%cd cluster.dev
!cdev apply

In [None]:
import os

# First we need to set KUBECONFIG to use kubectl. `!export` would only affect its
# own subshell, so the variable is set in the notebook process instead.
os.environ["KUBECONFIG"] = os.path.join(os.getcwd(), "kubeconfig")
# Then we can examine the workloads deployed in the `default` namespace, as defined in stack-model.yaml
!kubectl get pod
# Follow the model startup logs to check that the model loads without errors
# (this streams until interrupted)
!kubectl logs -f <output model pod name from kubectl get pod>
# List services (should be model, chat and mongo if chat is enabled)
!kubectl get svc
# Port-forward the model service to your host. The command blocks while forwarding,
# so run it in a separate terminal or interrupt it when done.
!kubectl port-forward svc/<model-output from above>  8080:8080
# Now you can chat with your model
!curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"Continue funny story: John decide to stick finger into outlet","parameters":{"max_new_tokens":1000}}' \
    -H 'Content-Type: application/json'
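
The same request can also be sent from Python without leaving the notebook. The sketch below mirrors the curl call above (same `/generate` path, same `inputs` and `max_new_tokens` fields) using only the standard library. The helper names `build_generate_request` and `generate` are illustrative, and the response is returned as parsed JSON without assuming its exact shape.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=1000, base_url="http://127.0.0.1:8080"):
    """Build the POST request for the /generate endpoint used in the curl example."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))


# Requires the port-forward to be running:
# print(generate("Continue funny story: John decide to stick finger into outlet"))
```

Building the request separately from sending it makes it possible to inspect the payload without a running model.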

## Monitoring

Ref: https://aws.amazon.com/blogs/machine-learning/enable-pod-based-gpu-metrics-in-amazon-cloudwatch/

For now, monitoring is set up manually with the shell commands below, run in a terminal that has `KUBECONFIG` set. We plan to add a monitoring stack to the stackTemplate.

In [None]:
curl https://raw.githubusercontent.com/NVIDIA/dcgm-exporter/main/etc/dcp-metrics-included.csv > /tmp/dcgm-metrics.csv

kubectl create namespace gpu-operator
kubectl create configmap metrics-config -n gpu-operator --from-file=/tmp/dcgm-metrics.csv

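# Add the NVIDIA Helm repository, then install the GPU operator
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator \
--set dcgmExporter.config.name=metrics-config \
--set toolkit.enabled=false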

# Install prometheus stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm inspect values prometheus-community/kube-prometheus-stack > /tmp/kube-prometheus-stack.values

sed -i '/serviceMonitorSelectorNilUsesHelmValues/ s/true/false/' /tmp/kube-prometheus-stack.values
yq eval '.prometheus.prometheusSpec.additionalScrapeConfigs += [{"job_name": "gpu-metrics", "scrape_interval": "1s", "metrics_path": "/metrics", "scheme": "http", "kubernetes_sd_configs": [{"role": "endpoints", "namespaces": {"names": ["gpu-operator"]}}], "relabel_configs": [{"source_labels": ["__meta_kubernetes_pod_node_name"], "action": "replace", "target_label": "kubernetes_node"}]}]' /tmp/kube-prometheus-stack.values -i

# Install the stack into the `prometheus` namespace with the modified values
helm install prometheus-community/kube-prometheus-stack \
--create-namespace --namespace prometheus \
--generate-name \
--values /tmp/kube-prometheus-stack.values

# get admin password for Grafana
kubectl -n prometheus get secret $(kubectl -n prometheus get secrets | grep grafana | cut -d ' ' -f 1) -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

# port forward Grafana
kubectl port-forward -n prometheus svc/$(kubectl -n prometheus get svc | grep grafana | cut -d ' ' -f 1) 8080:80 &
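
The `gpu-metrics` scrape job appended by the `yq` one-liner above is easier to review when written out as a structure. The sketch below reproduces the same keys and values as a Python dictionary and prints them as JSON. It is only meant for reading and checking the configuration; it does not modify the values file.

```python
import json

# Same content as the entry appended to additionalScrapeConfigs by the yq command.
gpu_metrics_job = {
    "job_name": "gpu-metrics",
    "scrape_interval": "1s",
    "metrics_path": "/metrics",
    "scheme": "http",
    # Discover endpoints only in the gpu-operator namespace.
    "kubernetes_sd_configs": [
        {"role": "endpoints", "namespaces": {"names": ["gpu-operator"]}}
    ],
    # Copy the node name of each discovered pod into a `kubernetes_node` label.
    "relabel_configs": [
        {
            "source_labels": ["__meta_kubernetes_pod_node_name"],
            "action": "replace",
            "target_label": "kubernetes_node",
        }
    ],
}

print(json.dumps([gpu_metrics_job], indent=2))
```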
