# Create Kubernetes cluster and deploy HF model

## Reference article
https://getindata.com/blog/deploy-open-source-llm-private-cluster-hugging-face-gke-autopilot/

## Prerequisites
- git-lfs installed (needed to clone the model repository before uploading it to S3)
- 100 GB of free space on local disk
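
The disk space prerequisite can be checked before starting the download. The following is a minimal sketch using only the Python standard library; the helper name `has_free_space` is hypothetical, and `"."` stands in for whichever directory will hold the model.

```python
import shutil


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9


# "." is a placeholder for the download directory; 100 GB is the figure from the prerequisites.
print(has_free_space(".", 100))
```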

## Preparing model files
We download the model locally and then upload its files to an AWS S3 bucket, which the model container mounts.

In [2]:
## Setting required env variables

%env S3_BUCKET_NAME=k8s-model-zephyr
%env REGION=eu-central-1
%env HF_MODEL_PATH=HuggingFaceH4/zephyr-7b-beta
%env HF_MODEL_NAME=zephyr-7b-beta
%env LOCAL_DIRECTORY=/data-tst/home/voa/projects/k8s-model
%env AWS_PROFILE=voatsap-cluster-dev

env: S3_BUCKET_NAME=k8s-model-zephyr
env: REGION=eu-central-1
env: HF_MODEL_PATH=HuggingFaceH4/zephyr-7b-beta
env: HF_MODEL_NAME=zephyr-7b-beta
env: LOCAL_DIRECTORY=/data-tst/home/voa/projects/k8s-model
env: AWS_PROFILE=voatsap-cluster-dev
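
Since the later cells depend on all six variables, it can help to fail early when one is unset. This is a small sketch, not part of the original notebook; `missing_vars` is a hypothetical helper, and the variable list is copied from the cell above.

```python
import os

# Variable names taken from the %env cell above.
REQUIRED_VARS = [
    "S3_BUCKET_NAME",
    "REGION",
    "HF_MODEL_PATH",
    "HF_MODEL_NAME",
    "LOCAL_DIRECTORY",
    "AWS_PROFILE",
]


def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# An empty list means every variable is set in the current process.
print(missing_vars())
```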


In [6]:
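# Clone the model into a local folder and upload it to the S3 bucket.
# On a gigabit connection this took about 9 min for the clone and 6 min for the upload.
# Note: `git lfs clone` is deprecated; with git-lfs installed and initialized (`git lfs install`),
# plain `git clone` also fetches the LFS files.

!mkdir -p $LOCAL_DIRECTORY/$HF_MODEL_NAME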
!git lfs clone --depth=1 https://huggingface.co/$HF_MODEL_PATH $LOCAL_DIRECTORY/$HF_MODEL_NAME
!aws s3 mb s3://$S3_BUCKET_NAME --region $REGION || true
!aws s3 sync $LOCAL_DIRECTORY/$HF_MODEL_NAME s3://$S3_BUCKET_NAME/llm/deployment/$HF_MODEL_NAME --exclude "*.git/*"

WARNING: 'git lfs clone' is deprecated and will not be updated
          with new flags from 'git clone'

'git clone' has been updated in upstream Git to have comparable
speeds to 'git lfs clone'.
Cloning into '/data-tst/home/voa/projects/k8s-model/zephyr-7b-beta'...


remote: Enumerating objects: 25, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 25 (delta 1), reused 21 (delta 1), pack-reused 0
Unpacking objects: 100% (25/25), 531.75 KiB | 1.50 MiB/s, done.
make_bucket failed: s3://k8s-model-zephyr An error occurred (BucketAlreadyOwnedByYou) when calling the CreateBucket operation: Your previous request to create the named bucket succeeded and you already own it.


In [7]:
# copy HF model to s3 bucket

!aws s3 sync $LOCAL_DIRECTORY/$HF_MODEL_NAME s3://$S3_BUCKET_NAME/llm/deployment/$HF_MODEL_NAME --exclude "*.git/*"

upload: ../../../../../../../data-tst/home/voa/projects/k8s-model/zephyr-7b-beta/all_results.json to s3://k8s-model-zephyr/llm/deployment/zephyr-7b-beta/all_results.json
upload: ../../../../../../../data-tst/home/voa/projects/k8s-model/zephyr-7b-beta/added_tokens.json to s3://k8s-model-zephyr/llm/deployment/zephyr-7b-beta/added_tokens.json
upload: ../../../../../../../data-tst/home/voa/projects/k8s-model/zephyr-7b-beta/.gitattributes to s3://k8s-model-zephyr/llm/deployment/zephyr-7b-beta/.gitattributes
upload: ../../../../../../../data-tst/home/voa/projects/k8s-model/zephyr-7b-beta/generation_config.json to s3://k8s-model-zephyr/llm/deployment/zephyr-7b-beta/generation_config.json
upload: ../../../../../../../data-tst/home/voa/projects/k8s-model/zephyr-7b-beta/config.json to s3://k8s-model-zephyr/llm/deployment/zephyr-7b-beta/config.json
upload: ../../../../../../../data-tst/home/voa/projects/k8s-model/zephyr-7b-beta/eval_results.json to s3://k8s-model-zephyr/llm/deployment/zephyr-7b-b
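
To double-check an upload like the one logged above, one option is to list the S3 keys the local folder should produce and compare them with `aws s3 ls --recursive`. The sketch below is an approximation: it mimics the `--exclude "*.git/*"` filter with Python's `fnmatch`, which is assumed to behave closely enough to the CLI's pattern matching for this layout. `expected_s3_keys` is a hypothetical helper.

```python
import fnmatch
import os


def expected_s3_keys(local_dir, prefix, exclude="*.git/*"):
    """List the S3 keys that files under `local_dir` should map to, skipping excluded paths."""
    keys = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            # Path relative to the sync source, with forward slashes as in S3 keys.
            rel = os.path.relpath(os.path.join(root, name), local_dir).replace(os.sep, "/")
            if fnmatch.fnmatchcase(rel, exclude):
                continue
            keys.append(f"{prefix.rstrip('/')}/{rel}")
    return sorted(keys)
```

For example, `expected_s3_keys(os.environ["LOCAL_DIRECTORY"] + "/" + os.environ["HF_MODEL_NAME"], "llm/deployment/" + os.environ["HF_MODEL_NAME"])` returns the keys to look for in the bucket.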

## Preparing cluster.dev stack variables
The `cluster.dev` folder contains four files:
- `project.yaml` defines global variables such as the region
- `backend.yaml` configures the S3 bucket that stores the cluster.dev and Terraform states
- `stack-eks.yaml` describes the EKS cluster configuration, including the node groups with GPU support and the GPU types
- `stack-model.yaml` holds the variables required to deploy the model into the EKS cluster
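
Before applying the stack, it may be worth confirming that all four files are present. A minimal sketch, assuming the folder layout described above; `missing_stack_files` is a hypothetical helper, not part of cluster.dev.

```python
from pathlib import Path

# File names taken from the list above.
STACK_FILES = ["project.yaml", "backend.yaml", "stack-eks.yaml", "stack-model.yaml"]


def missing_stack_files(folder):
    """Return the expected cluster.dev files that are absent from `folder`."""
    base = Path(folder)
    return [name for name in STACK_FILES if not (base / name).is_file()]


# An empty list means the folder is complete.
print(missing_stack_files("cluster.dev"))
```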


In [None]:
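# bootstrap the cluster
# `!cd` runs in a throwaway subshell, so use the `%cd` magic to change the notebook's working directory
%cd cluster.dev
!cdev apply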

In [None]:
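# install the NVIDIA device plugin, which exposes the nodes' GPUs to the Kubernetes scheduler
# (it does not install the GPU drivers themselves)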
!kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/main/nvidia-device-plugin.yml
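
Once the device plugin is running, GPU nodes should advertise the `nvidia.com/gpu` resource. The sketch below parses the output of `kubectl get nodes -o json` to show which nodes do; `gpu_counts` is a hypothetical helper, and it assumes the standard node JSON layout (`items[].status.allocatable`).

```python
import json


def gpu_counts(nodes_json: str) -> dict:
    """Map each node name to its allocatable `nvidia.com/gpu` count (0 if not advertised)."""
    counts = {}
    for node in json.loads(nodes_json)["items"]:
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        counts[name] = int(allocatable.get("nvidia.com/gpu", "0"))
    return counts
```

In the notebook this could be fed with `nodes = !kubectl get nodes -o json` followed by `gpu_counts("\n".join(nodes))`.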

In [None]:
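# deploy the model
!kubectl apply -f ../kubernetes/model/deployment.yaml

# port-forward the model service
# this command blocks until interrupted, so run it in a separate terminal and keep it open while sending requests
!kubectl port-forward svc/zephyr-7b-alpha-service 8081:8080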

!curl 127.0.0.1:8081/generate \
    -X POST \
    -d '{"inputs":"Continue funny story: John decide to stick finger into outlet","parameters":{"max_new_tokens":2000}}' \
    -H 'Content-Type: application/json'
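
The same request can be sent from Python, which makes the response easier to work with inside the notebook. This is a sketch using only the standard library; the function names are hypothetical, the URL and payload mirror the `curl` call above, and the response is assumed to be JSON.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=2000, base_url="http://127.0.0.1:8081"):
    """Build a POST request for the /generate endpoint with the same payload shape as the curl call."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the decoded JSON body (requires the port-forward to be active)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the port-forward running, `generate("Continue funny story: ...", max_new_tokens=200)` returns the parsed response.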