# Elastic/NVIDIA GPU Integration

## Create GKE Cluster + GPU Node Pool

In [32]:
%%bash
gcloud container clusters create gpu-demo \
    --region us-central1 \
    --node-locations us-central1-a,us-central1-b,us-central1-c \
    --num-nodes 1 \
    --machine-type e2-standard-4 \
    --disk-type pd-standard \
    --disk-size 50GB

gcloud container node-pools create gpu-pool \
    --cluster gpu-demo \
    --region us-central1 \
    --node-locations us-central1-a,us-central1-b,us-central1-c \
    --num-nodes 1 \
    --enable-autoscaling \
    --total-min-nodes 3 \
    --total-max-nodes 6 \
    --machine-type g2-standard-4 \
    --disk-type pd-ssd \
    --disk-size 100GB \
    --image-type "UBUNTU_CONTAINERD" \
    --node-labels="gke-no-default-nvidia-gpu-device-plugin=true" \
    --accelerator type=nvidia-l4,count=1 \
    --location-policy ANY \
    --spot

kubectl get nodes -o custom-columns="NODE:.metadata.name,ZONE:.metadata.labels.topology\.kubernetes\.io/zone"

Note: Your Pod address range (`--cluster-ipv4-cidr`) can accommodate at most 1008 node(s).
Creating cluster gpu-demo in us-central1...

NAME      LOCATION     MASTER_VERSION      MASTER_IP      MACHINE_TYPE   NODE_VERSION        NUM_NODES  STATUS   STACK_TYPE
gpu-demo  us-central1  1.34.3-gke.1051003  34.46.172.208  e2-standard-4  1.34.3-gke.1051003  3          RUNNING  IPV4


Note: Modifications on the boot disks of node VMs do not persist across node recreations. Nodes are recreated during manual-upgrade, auto-upgrade, auto-repair, and auto-scaling. To preserve modifications across node recreation, use a DaemonSet.
Note: Machines with GPUs have certain limitations which may affect your workflow. Learn more at https://cloud.google.com/kubernetes-engine/docs/how-to/gpus
Note: Starting in GKE 1.30.1-gke.115600, if you don't specify a driver version, GKE installs the default GPU driver for your node's GKE version.
Creating node pool gpu-pool...

NAME      MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
gpu-pool  g2-standard-4  100           1.34.3-gke.1051003
NODE                                      ZONE
gke-gpu-demo-default-pool-571f952a-sm3g   us-central1-a
gke-gpu-demo-default-pool-b9d8fce9-22ls   us-central1-c
gke-gpu-demo-default-pool-cf534126-pz85   us-central1-b
gke-gpu-demo-gpu-pool-3b088dfc-4pwp       us-central1-a
gke-gpu-demo-gpu-pool-6395e6ae-gntm       us-central1-b
gke-gpu-demo-gpu-pool-a5e9250b-rmgg       us-central1-c
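
The listing above shows one GPU node in each of the three zones. A minimal sketch of checking that automatically, assuming the two-column `NODE`/`ZONE` output format shown above. The helper name `gpu_nodes_per_zone` and the `-gpu-pool-` substring match are illustrative assumptions, not part of the notebook:

```python
from collections import Counter

def gpu_nodes_per_zone(listing: str, pool_marker: str = "-gpu-pool-") -> Counter:
    """Count GPU-pool nodes per zone from the NODE/ZONE listing."""
    counts = Counter()
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) != 2:
            continue  # ignore blank or malformed rows
        node, zone = parts
        if pool_marker in node:
            counts[zone] += 1
    return counts

# Sample copied from the cell output above.
sample = """NODE                                      ZONE
gke-gpu-demo-default-pool-571f952a-sm3g   us-central1-a
gke-gpu-demo-default-pool-b9d8fce9-22ls   us-central1-c
gke-gpu-demo-default-pool-cf534126-pz85   us-central1-b
gke-gpu-demo-gpu-pool-3b088dfc-4pwp       us-central1-a
gke-gpu-demo-gpu-pool-6395e6ae-gntm       us-central1-b
gke-gpu-demo-gpu-pool-a5e9250b-rmgg       us-central1-c
"""

per_zone = gpu_nodes_per_zone(sample)
print(dict(per_zone))
```

Because the node pool uses `--location-policy ANY` with spot capacity, an even spread across zones is not guaranteed, so a check like this may be worth running before deploying zone-aware workloads.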


## Deploy NVIDIA GPU Operator

In [33]:
%%bash
kubectl create ns gpu-operator
kubectl apply -n gpu-operator -f manifests/gpu-operator-quota.yaml

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --version=v25.10.1 \
    --set 'toolkit.env[0].name=RUNTIME_CONFIG_SOURCE' \
    --set 'toolkit.env[0].value=file'

echo "Waiting for GPU capacity to be registered on nodes..."

while true; do
    DESIRED=$(kubectl get pods -n gpu-operator -l app=nvidia-driver-daemonset --no-headers 2>/dev/null | wc -l)
    READY_COUNT=$(kubectl get nodes -o custom-columns=CAP:.status.capacity.'nvidia\.com/gpu' --no-headers 2>/dev/null | grep -v '<none>' | grep -v '^0$' | wc -l)
    DESIRED=$(echo "$DESIRED" | tr -d ' ')
    READY_COUNT=$(echo "$READY_COUNT" | tr -d ' ')

    if [ "$DESIRED" -gt 0 ] && [ "$READY_COUNT" -ge "$DESIRED" ]; then
        echo "✅ Success: $READY_COUNT nodes have active GPUs (Targeted: $DESIRED)"
        break
    fi
    sleep 5
done

namespace/gpu-operator created
resourcequota/gpu-operator-quota created
"nvidia" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nvidia" chart repository
...Successfully got an update from the "elastic" chart repository
Update Complete. ⎈Happy Helming!⎈




NAME: gpu-operator-1770497980
LAST DEPLOYED: Sat Feb  7 13:59:42 2026
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
DESCRIPTION: Install complete
TEST SUITE: None
Waiting for GPU capacity to be registered on nodes...
✅ Success: 3 nodes have active GPUs (Targeted: 3)
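
The wait loop in the cell above polls indefinitely, so it would hang if the driver never registers GPU capacity. A minimal Python sketch of the same check with a timeout follows. The function names, the 600-second default, and the injectable `sleep`/`clock` parameters are assumptions made for illustration. The `node_list` argument is assumed to be the parsed output of `kubectl get nodes -o json`:

```python
import time
from typing import Callable

def count_gpu_nodes(node_list: dict) -> int:
    """Count nodes whose status.capacity reports at least one nvidia.com/gpu."""
    total = 0
    for node in node_list.get("items", []):
        capacity = node.get("status", {}).get("capacity", {})
        # Capacity values are strings in the Kubernetes API, e.g. "1".
        if int(capacity.get("nvidia.com/gpu", "0")) > 0:
            total += 1
    return total

def wait_for(
    check: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)

# Hypothetical, trimmed node list used only to exercise the helpers.
example_nodes = {
    "items": [
        {"status": {"capacity": {"cpu": "4", "nvidia.com/gpu": "1"}}},
        {"status": {"capacity": {"cpu": "4"}}},
    ]
}
print(count_gpu_nodes(example_nodes))
```

In the notebook, `check` could wrap a `subprocess` call to `kubectl` and compare the count against the expected number of GPU nodes; returning `False` on timeout lets the cell fail visibly instead of blocking.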


## Deploy Elastic Cluster

In [34]:
%%bash
kubectl create -f https://download.elastic.co/downloads/eck/3.3.0/crds.yaml > /dev/null 2>&1
kubectl apply -f https://download.elastic.co/downloads/eck/3.3.0/operator.yaml > /dev/null 2>&1
kubectl apply -f manifests/es.yaml
kubectl apply -f manifests/kb.yaml

STS_NAME="elastic-es-data-node"
echo "⏳ Waiting for Elasticsearch ($STS_NAME) to be fully ready..."
while ! kubectl get statefulset $STS_NAME > /dev/null 2>&1; do sleep 2; done

until kubectl get statefulset $STS_NAME -o json | jq -e '
   .status.readyReplicas == .spec.replicas and 
   .status.updatedReplicas == .spec.replicas' > /dev/null; do
    sleep 5
done
echo "✅ Elasticsearch is Ready."

KB_DEPLOYMENT_NAME="kibana-kb" 
echo "⏳ Waiting for Kibana to rollout..."
while ! kubectl get deployment $KB_DEPLOYMENT_NAME > /dev/null 2>&1; do sleep 2; done
kubectl rollout status deployment/$KB_DEPLOYMENT_NAME --timeout=300s
echo "✅ Kibana is Ready."

cat > .env << EOF
ELASTIC_USERNAME=elastic
ELASTIC_PASSWORD=$(kubectl get secret elastic-es-elastic-user -o go-template='{{.data.elastic | base64decode}}')
ELASTIC_URL=https://$(kubectl get svc elastic-es-http -o jsonpath='{.status.loadBalancer.ingress[0].ip}'):9200
EOF
kubectl get secret elastic-es-http-certs-public -o jsonpath='{.data.ca\.crt}' | base64 --decode > ca.crt

echo
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeName}{"\t"}{.metadata.labels.app}{"\n"}{end}' | while read pod node app; do
    zone=$(kubectl get node $node -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}')
    echo "Pod: $pod | Zone: $zone | Node: $node"
done | column -t -s "|"
echo
echo "Kibana is available at: https://$(kubectl get svc kibana-kb-http -o jsonpath='{.status.loadBalancer.ingress[0].ip}'):5601"

secret/eck-trial-license created
elasticsearch.elasticsearch.k8s.elastic.co/elastic created
kibana.kibana.k8s.elastic.co/kibana created
⏳ Waiting for Elasticsearch (elastic-es-data-node) to be fully ready...
✅ Elasticsearch is Ready.
⏳ Waiting for Kibana to rollout...
Waiting for deployment "kibana-kb" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "kibana-kb" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "kibana-kb" rollout to finish: 1 old replicas are pending termination...
deployment "kibana-kb" successfully rolled out
✅ Kibana is Ready.

Pod: elastic-es-data-node-0       Zone: us-central1-b    Node: gke-gpu-demo-gpu-pool-6395e6ae-gntm
Pod: elastic-es-data-node-1       Zone: us-central1-a    Node: gke-gpu-demo-gpu-pool-3b088dfc-4pwp
Pod: elastic-es-data-node-2       Zone: us-central1-c    Node: gke-gpu-demo-gpu-pool-a5e9250b-rmgg
Pod: elastic-es-master-node-0     Zone: us-central1-b    Node: gke-gpu-demo-def
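
The `.env` file is written from whatever the `jsonpath` queries return at that moment. If the load balancer has not been assigned an external IP yet, `ELASTIC_URL` would come out as `https://:9200`. A minimal sketch of a sanity check, assuming the three keys written by the cell above; the helper name `env_problems` is hypothetical:

```python
from urllib.parse import urlparse

REQUIRED_KEYS = ("ELASTIC_USERNAME", "ELASTIC_PASSWORD", "ELASTIC_URL")

def env_problems(env: dict) -> list[str]:
    """Return human-readable problems found in the connection settings."""
    problems = [
        f"{key} is missing or empty" for key in REQUIRED_KEYS if not env.get(key)
    ]
    url = env.get("ELASTIC_URL") or ""
    # urlparse("https://:9200").hostname is None when the IP was empty.
    if url and urlparse(url).hostname is None:
        problems.append("ELASTIC_URL has no host (load balancer IP not assigned yet?)")
    return problems

# Hypothetical values illustrating the empty-IP case.
print(env_problems({
    "ELASTIC_USERNAME": "elastic",
    "ELASTIC_PASSWORD": "example-password",
    "ELASTIC_URL": "https://:9200",
}))
```

With python-dotenv installed, the dictionary could be obtained from `dotenv_values(".env")` before constructing the Elasticsearch client.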

## Confirm Elastic Detects GPU

In [37]:
! kubectl logs elastic-es-data-node-0 -c elasticsearch | grep "NVIDIA"


{"@timestamp":"2026-02-07T21:14:59.867Z","log.level": "INFO","message":"Found compatible GPU [NVIDIA L4] (id: [0])", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"main","log.logger":"org.elasticsearch.gpu.GPUSupport","elasticsearch.node.name":"elastic-es-data-node-0","elasticsearch.cluster.name":"${sys:es.logs.cluster_name}"}
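
Since the Elasticsearch log is JSON, the `grep` above can also be done structurally. A minimal sketch, assuming the `log.logger` and `message` fields look like the line shown above; the function name `find_gpus` is illustrative and the sample is a trimmed copy of that log line:

```python
import json
import re

GPU_LOGGER = "org.elasticsearch.gpu.GPUSupport"

def find_gpus(log_text: str) -> list[str]:
    """Extract GPU names from 'Found compatible GPU [...]' JSON log lines."""
    names = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if record.get("log.logger") != GPU_LOGGER:
            continue
        match = re.search(r"Found compatible GPU \[(.+?)\]", record.get("message", ""))
        if match:
            names.append(match.group(1))
    return names

# Trimmed copy of the log line above (some fields omitted).
sample_log = (
    '{"@timestamp":"2026-02-07T21:14:59.867Z","log.level": "INFO",'
    '"message":"Found compatible GPU [NVIDIA L4] (id: [0])",'
    '"log.logger":"org.elasticsearch.gpu.GPUSupport",'
    '"elasticsearch.node.name":"elastic-es-data-node-0"}'
)
print(find_gpus(sample_log))
```

Running the same function over the logs of each data node would confirm that every node, not only `elastic-es-data-node-0`, found its GPU.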


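## Configure EIS for Self-Managed

Connect the self-managed cluster to the Elastic Inference Service (EIS) by following the Elastic documentation:
https://www.elastic.co/docs/explore-analyze/elastic-inference/connect-self-managed-cluster-to-eis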

## Install Python Prerequisites

In [39]:
! pip install -q -U -r requirements.txt

## Create Index + Semantic Search with Jina on EIS

In [40]:
import os
from dotenv import load_dotenv
from faker import Faker
import tqdm
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk, pack_dense_vector

DATASET_SIZE = 150
BATCH_SIZE = 15
INDEX_NAME = "jina_index"

faker = Faker(['en_US', 'es_ES', 'fr_FR', 'de_DE', 'zh_CN']) 
faker.seed_instance(12345)

load_dotenv(override=True)
es = Elasticsearch(
    hosts=os.getenv("ELASTIC_URL"),
    basic_auth=(os.getenv("ELASTIC_USERNAME"), os.getenv("ELASTIC_PASSWORD")),
    ca_certs="./ca.crt"
)

settings = {
    "index": {
        "number_of_shards": 3,
        "number_of_replicas": 1
    }
}
mappings = {
    "properties": {
        "paragraph": { 
            "type": "text" 
        },
        "embedding": {
            "type": "dense_vector",
            "dims": 1024,
            "index_options": { 
                "type": "int8_hnsw" 
            }
        }
    }
}  

es.options(ignore_status=[404]).indices.delete(index=INDEX_NAME)
es.indices.create(index=INDEX_NAME, settings=settings, mappings=mappings)

def get_jina_embeddings(batch):
    response = es.inference.inference(input=batch, inference_id=".jina-embeddings-v3")
    return [item['embedding'] for item in response['text_embedding']]

def generate_actions():
    for _ in tqdm.tqdm(range(DATASET_SIZE // BATCH_SIZE)):
        paragraphs = [faker.paragraph() for _ in range(BATCH_SIZE)]
        embeddings = get_jina_embeddings(paragraphs)
        for paragraph, embedding in zip(paragraphs, embeddings):
            yield {
                "paragraph": paragraph, 
                "embedding": pack_dense_vector(embedding)
            }

ok, result = bulk(client=es, index=INDEX_NAME, actions=generate_actions())
print(f"{ok} documents indexed.")
es.indices.refresh(index=INDEX_NAME)

query_str = faker['de_DE'].paragraph()
query_embedding = get_jina_embeddings([query_str])[0]
response = es.search(
    index=INDEX_NAME,
    knn={
        "field": "embedding",
        "query_vector": query_embedding,
        "k": 5
    },
    source=["paragraph"]
)
print("\nQuery:", query_str)
print("\nTop 5 results:")
for hit in response['hits']['hits']:
    score = hit['_score']
    paragraph = hit['_source']['paragraph']
    print(f"Score: {score:.4f} | Text: {paragraph[:60]}...")

100%|██████████| 10/10 [00:05<00:00,  1.70it/s]


150 documents indexed.

Query: Mit dauern andere der. Dein Schiff Minutenmir neun Winter. Gab Abend Mutter schwer Minute.

Top 5 results:
Score: 0.6585 | Text: Darauf Mutter bauen böse Woche. Nur Apfel baden um. Brauchen...
Score: 0.6519 | Text: Bruder schicken ein her wollen nennen. Tante fehlen von Esse...
Score: 0.6419 | Text: Toi naturel respect passé petit haute. Voiture lien marier p...
Score: 0.6245 | Text: Depuis votre voix tombe surveiller midi tout. Circonstance p...
Score: 0.6238 | Text: Natürlich ließ wollen auf dort. Dich fast Meer....


## Verify Elastic GPU Usage via NVIDIA SMI

In [41]:
! kubectl exec elastic-es-data-node-0 -c elasticsearch -- nvidia-smi

Sat Feb  7 21:31:11 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.105.08             Driver Version: 580.105.08     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA L4                      On  |   00000000:00:03.0 Off |                    0 |
| N/A   50C    P0             34W /   72W |     308MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+----------------------------------------------
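
To track GPU memory over time rather than reading the table by eye, the `used / total` figures can be pulled out of the `nvidia-smi` text. A minimal sketch, assuming the `NNNMiB / NNNMiB` layout shown above; the helper name `gpu_memory_mib` is hypothetical:

```python
import re

MEMORY_PATTERN = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")

def gpu_memory_mib(smi_output: str) -> list[tuple[int, int]]:
    """Return (used_mib, total_mib) for each GPU row in nvidia-smi output."""
    return [(int(used), int(total)) for used, total in MEMORY_PATTERN.findall(smi_output)]

# Row copied from the output above.
sample_row = (
    "| N/A   50C    P0             34W /   72W |"
    "     308MiB /  23034MiB |      0%      Default |"
)
print(gpu_memory_mib(sample_row))
```

A nonzero used figure is consistent with a process holding a GPU context on the node, although the process table (cut off in the output above) would be needed to confirm which process owns that memory.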

## Destroy Environment

In [42]:
%%bash
rm -f .env
rm -f ca.crt
gcloud container clusters delete gpu-demo \
--region us-central1 \
--quiet

Deleting cluster gpu-demo...