- Command Line Tool Dependencies
- Cluster Requirements
- Installing Services
- Installing the GroundX Application
- Enabling Autoscaling (HPA)
- How Autoscaling Works
- Enabling the Custom Metrics Server
- Prometheus Integration (Optional)
- Using Simulated LLM Responses (Optional)
With this repository you can deploy GroundX RAG document ingestion and search capabilities to a Kubernetes cluster in a manner that can be isolated from any external dependencies.
GroundX delivers a unique approach to advanced RAG that consists of three interlocking systems:
- GroundX Ingest: A state-of-the-art vision model trained on over 1M pages of enterprise documents. It delivers unparalleled document understanding and can be fine-tuned for your unique document sets.
- GroundX Store: Secure, encrypted storage for source files, semantic objects, and vectors, ensuring your data is always protected.
- GroundX Search: Built on OpenSearch, it combines text and vector search with a fine-tuned re-ranker model for precise, enterprise-grade results.
In head-to-head testing, GroundX significantly outperforms many popular RAG tools (ref1, ref2, ref3), especially with respect to complex documents at scale. GroundX is trusted by organizations like Air France, Dartmouth and Samsung with over 2 billion tokens ingested on our models.
GroundX On-Prem allows you to leverage GroundX within hardened and secure environments. GroundX On-Prem requires no external dependencies when running, meaning it can be used in air-gapped environments. Deployment consists of two key steps:
- (Optional) Creation of Infrastructure on AWS via Terraform
- Deployment of GroundX onto Kubernetes via Helm
Currently, creation of infrastructure via Terraform is only supported for AWS. However, with sufficient expertise GroundX can be deployed onto any pre-existing Kubernetes cluster.
This repo is in Open Beta. Feedback is appreciated and encouraged. To use the hosted version of GroundX visit EyeLevel.ai. For white glove support in configuring this open source repo in your environment, or to access more performant and closed source versions of this repo, contact us. To learn more about what GroundX is, and what it's useful for, you may be interested in the following resources:
- A video discussing the importance of parsing, and a comparison of several approaches
- GroundX being used to power a multi-modal RAG application
- GroundX being used to power a verbal AI Agent
If you're deploying GroundX On-Prem on AWS, you might be interested in this simple video guide for deploying on AWS. To see how well GroundX understands your documents, check out our online testing tool:
Test your documents for free online
The GroundX ingest service expects visually complex documents in a variety of formats. It analyzes those documents with several fine tuned models, converts the documents into a queryable representation which is designed to be understood by LLMs, and stores that information for downstream search.
Once documents have been processed by the ingest service, they can be searched with natural language queries. We use a custom configuration of OpenSearch that was designed in tandem with the representations generated by the ingest service.
You must have the following command line tools installed:
- `bash` shell (version 4.0 or later recommended. AWS Cloud Shell has insufficient resources.)
- `kubectl` (or `oc`) configured to a namespace you can write to (e.g., `eyelevel`) ([Setup Docs](https://kubernetes.io/docs/tasks/tools/))
- `helm` v3.8+

In order to deploy GroundX On-Prem to your Kubernetes cluster, you must:
- Check that you have the required compute resources
- Configure or create appropriate node groups and nodes
- Create a Namespace or use an existing one
- Create a PV Class or use an existing one
- Install the NVIDIA GPU Operator if it's not already installed
By default, the GroundX On-Prem pods deploy to nodes using node selector labels and tolerations. Here is an example from one of the Kubernetes YAML configs:

```yaml
nodeSelector:
  eyelevel_node: "eyelevel-cpu-only"
tolerations:
  - key: "eyelevel_node"
    value: "eyelevel-cpu-only"
    effect: "NoSchedule"
```

Node labels are defined in the `values.yaml` and must be applied to appropriate nodes within your cluster. Default node label values are:

- `eyelevel-cpu-memory`
- `eyelevel-cpu-only`
- `eyelevel-gpu-layout`
- `eyelevel-gpu-ranker`
- `eyelevel-gpu-summary`
The publicly available GroundX On-Prem Kubernetes pods are all built for x86_64 architecture. Pods built for other architectures, such as arm64, are available upon customer request.
The GroundX On-Prem GPU pods are designed to run on NVIDIA GPUs with CUDA 12+. Other GPU types or older driver versions are not supported.
As part of the deployment, the NVIDIA GPU Operator must be installed. We offer terraform scripts to deploy the NVIDIA GPU Operator to your cluster, if you have not already done so.
The NVIDIA GPU operator should update your NVIDIA drivers and other software components needed to provision the GPU, so long as you have supported NVIDIA hardware on the machine.
The GroundX On-Prem recommended resource requirements are:

| Node label | GPU memory | Disk drive space | CPU cores | RAM |
|---|---|---|---|---|
| `eyelevel-cpu-only` | none | 80 GB | 8 | 16 GB |
| `eyelevel-cpu-memory` | none | 20 GB | 4 | 16 GB |
| `eyelevel-gpu-layout` | 16 GB | 75 GB | 4 | 12 GB |
| `eyelevel-gpu-ranker` | 16 GB | 75 GB | 8 | 30 GB |
| `eyelevel-gpu-summary` | 48 GB | 100 GB | 4 | 30 GB |
The GroundX On-Prem pods are grouped into 5 categories, based on resource requirements, and deploy as described in the node group section.
These pods can be deployed to 5 different dedicated node groups, a single node group, or any combination in between, so long as the minimum resource requirements are met and the appropriate node labels are applied to the nodes.
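When several pod categories share one node group, a rough way to size that group is to add up the recommended figures for each category it hosts. The sketch below does this with the numbers listed above. `RECOMMENDED` and `combined_requirements` are illustrative names, not part of GroundX, and plain addition is a simplifying assumption: it ignores scheduler overhead, and GPU memory on separate GPUs cannot actually be pooled.

```python
# Recommended resources per pod category, copied from the list above.
RECOMMENDED = {
    "eyelevel-cpu-only":    {"gpu_gb": 0,  "disk_gb": 80,  "cpu": 8, "ram_gb": 16},
    "eyelevel-cpu-memory":  {"gpu_gb": 0,  "disk_gb": 20,  "cpu": 4, "ram_gb": 16},
    "eyelevel-gpu-layout":  {"gpu_gb": 16, "disk_gb": 75,  "cpu": 4, "ram_gb": 12},
    "eyelevel-gpu-ranker":  {"gpu_gb": 16, "disk_gb": 75,  "cpu": 8, "ram_gb": 30},
    "eyelevel-gpu-summary": {"gpu_gb": 48, "disk_gb": 100, "cpu": 4, "ram_gb": 30},
}


def combined_requirements(labels):
    """Sum the recommended resources for the categories sharing one node group."""
    total = {"gpu_gb": 0, "disk_gb": 0, "cpu": 0, "ram_gb": 0}
    for label in labels:
        for key, value in RECOMMENDED[label].items():
            total[key] += value
    return total


# Example: one node group hosting both CPU categories.
print(combined_requirements(["eyelevel-cpu-only", "eyelevel-cpu-memory"]))
```

Treat the result as a starting point for capacity planning rather than a guaranteed minimum.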
The resource requirements are as follows:
Pods in the eyelevel-cpu-only node group have minimal requirements on CPU, RAM, and disk drive space. They can run on virtually any machine with the supported architecture.
The resource requirements for these pods are described in more detail in the Total Recommended Resources section above.
Pods in the eyelevel-cpu-memory node group have a range of requirements on CPU, RAM, and disk drive space but can typically run on most machines with the supported architecture.
CPU and memory intensive ingestion pipeline pods, such as layout_ocr, layout_save, and pre_process, will deploy to the eyelevel-cpu-memory nodes. The layout_ocr pod includes tesseract, which benefits from access to multiple vCPU cores.
The resource requirements for these pods are described in more detail in the Total Recommended Resources section above.
Pods in the eyelevel-gpu-layout node group have specific requirements on GPU, CPU, RAM, and disk drive space.
The resource requirements for these pods are described in more detail in the Total Recommended Resources section above.
The current configuration for this service assumes an NVIDIA GPU with 16 GB of GPU memory, 4 CPU cores, and at least 12 GB RAM. It deploys 1 pod on this node, with the thread count set by layout.inference.threads in values.yaml, and claims the GPU via the nvidia.com/gpu resource provided by the NVIDIA GPU operator.
If your machine has different resources than this, you will need to modify layout.inference in your values.yaml using the per pod requirements described above to optimize for your node resources.
Note: in many cloud Kubernetes services, such as EKS, you must use a node image that supports GPUs (e.g. AL2023_x86_64_NVIDIA).
Pods in the eyelevel-gpu-ranker node group have specific requirements on GPU, CPU, RAM, and disk drive space.
The resource requirements for these pods are described in more detail in the Total Recommended Resources section above.
The current configuration for this service assumes an NVIDIA GPU with 16 GB of GPU memory, 4 CPU cores, and at least 30 GB RAM. It deploys 1 pod with 14 workers on this node (called ranker.inference.workers in values.yaml). It does not claim the GPU via the nvidia.com/gpu resource provided by the NVIDIA GPU operator but uses 16 GB of GPU memory.
If your machine has different resources than this, you will need to modify ranker.inference in your values.yaml using the per pod requirements described above to optimize for your node resources.
Note: in many cloud Kubernetes services, such as EKS, you must use a node image that supports GPUs (e.g. AL2023_x86_64_NVIDIA).
Pods in the eyelevel-gpu-summary node group have specific requirements on GPU, CPU, RAM, and disk drive space.
The resource requirements for these pods are described in more detail in the Total Recommended Resources section above.
The current configuration for this service assumes an NVIDIA GPU with 48 GB of GPU memory, 4 CPU cores, and at least 30 GB RAM. It deploys 1 pod on this node (called summary.inference.replicas.desired in values.yaml). It does not claim the GPU via the nvidia.com/gpu resource provided by the NVIDIA GPU operator but uses 24 GB of GPU memory per worker.
If your machine has different resources than this, you will need to modify summary.inference in your values.yaml using the per pod requirements described above to optimize for your node resources.
Note: in many cloud Kubernetes services, such as EKS, you must use a node image that supports GPUs (e.g. AL2023_x86_64_NVIDIA).
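The summary service described above budgets 24 GB of GPU memory per worker on a 48 GB GPU. If your GPU is a different size, a quick estimate of how many workers it can hold is integer division of total memory by the per-worker footprint. This is a minimal sketch: `workers_that_fit` is a hypothetical helper, and reserving no headroom for the driver or other processes is an assumption you may want to relax.

```python
def workers_that_fit(gpu_memory_gb: int, per_worker_gb: int) -> int:
    """Return how many workers fit in GPU memory, rounding down."""
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    return gpu_memory_gb // per_worker_gb


# Figures quoted above for the summary service: 48 GB GPU, 24 GB per worker.
print(workers_that_fit(48, 24))  # 2
```

A result of 0 means the GPU is too small for even one worker at that footprint.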
As mentioned in the node groups section, node labels are defined in values.yaml and must be applied to appropriate nodes within your cluster. Default node label values include:
- `eyelevel-cpu-memory`
- `eyelevel-cpu-only`
- `eyelevel-gpu-layout`
- `eyelevel-gpu-ranker`
- `eyelevel-gpu-summary`
Multiple node labels can be applied to the same node group, so long as resources are available as described in the total recommended resource and node group resources sections.
However, all node labels must exist on at least 1 node group within your cluster. The label should be applied with a string key named eyelevel_node and an enumerated string value from the list above.
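Labels with the `eyelevel_node` key can be applied with the standard `kubectl label nodes` command. The sketch below builds that command string for a node and checks the value against the default list, so a typo is caught before it reaches the cluster. The node name in the example is hypothetical, and `label_command` is an illustrative helper rather than anything shipped with GroundX.

```python
LABEL_KEY = "eyelevel_node"

# Default label values listed above.
DEFAULT_LABELS = (
    "eyelevel-cpu-memory",
    "eyelevel-cpu-only",
    "eyelevel-gpu-layout",
    "eyelevel-gpu-ranker",
    "eyelevel-gpu-summary",
)


def label_command(node_name: str, label_value: str) -> str:
    """Build a `kubectl label nodes` command for one node."""
    if label_value not in DEFAULT_LABELS:
        raise ValueError(f"unknown label value: {label_value!r}")
    return f"kubectl label nodes {node_name} {LABEL_KEY}={label_value}"


# "gpu-node-1" is a placeholder; substitute a name from `kubectl get nodes`.
print(label_command("gpu-node-1", "eyelevel-gpu-layout"))
```

If you use custom label values, extend or replace `DEFAULT_LABELS` accordingly.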
If you use the default labels described in Configure Node Groups, you do not need to do anything else. The helm chart assumes these default values during the deployment.
If you use custom labels, you must update the following values during GroundX deployment:
- `cache.node`
- `cache.metrics.node`
- `groundx.node`
- `layout.api.node`
- `layout.correct.node`
- `layout.inference.node`
- `layout.map.node`
- `layout.ocr.node`
- `layout.process.node`
- `layout.save.node`
- `layoutWebhook.node`
- `preProcess.node`
- `process.node`
- `queue.node`
- `ranker.api.node`
- `ranker.inference.node`
- `summary.api.node`
- `summary.inference.node`
- `summaryClient.node`
- `upload.node`

See the values.yaml README.md for more information about these values.

You will also need to update the corresponding values for any services you deploy.
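The dotted names above read like paths into the nested structure of `values.yaml`. Assuming that convention holds (it is the usual one for Helm values, but check the values.yaml README.md), the sketch below expands a flat mapping of dotted keys into the nested form you would write in YAML. The label values `my-gpu-nodes` and `my-cpu-nodes` are hypothetical, and `expand_dotted` is an illustrative helper.

```python
def expand_dotted(overrides):
    """Turn {"a.b.c": v} into {"a": {"b": {"c": v}}}."""
    nested = {}
    for dotted_key, value in overrides.items():
        parts = dotted_key.split(".")
        cursor = nested
        # Walk (or create) each intermediate level of the path.
        for part in parts[:-1]:
            cursor = cursor.setdefault(part, {})
        cursor[parts[-1]] = value
    return nested


# Hypothetical custom labels for three of the keys listed above.
overrides = {
    "layout.api.node": "my-cpu-nodes",
    "layout.inference.node": "my-gpu-nodes",
    "ranker.inference.node": "my-gpu-nodes",
}
print(expand_dotted(overrides))
```

Serializing the resulting dictionary as YAML gives a fragment that can be merged into your own values file.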
You must have a namespace where the GroundX application can be installed. If you need to create one, you can do so by running the following command:

```bash
kubectl create namespace eyelevel
```

This will create a namespace called eyelevel where GroundX pods will be installed.
The default values.yaml namespace assumes a name of eyelevel. If you choose to use a different namespace name, you will have to update the values.yaml file accordingly.
GroundX requires a PV class for some of the pods. If you have not created one, we have included a chart that will create one. You can run it with the following command:

```bash
helm install groundx-storageclass \
  groundx/groundx-storageclass \
  -n eyelevel
```

Some of the GroundX pods require access to an NVIDIA GPU. The easiest way to ensure access is to install the NVIDIA GPU Operator, which will ensure the appropriate drivers and libraries are installed on the GPU nodes.
If you'd like to install the NVIDIA GPU Operator to your cluster, use the following commands:

```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

helm install nvidia-gpu-operator \
  nvidia/gpu-operator \
  -n nvidia-gpu-operator \
  --create-namespace \
  --atomic \
  -f helm/values/nvidia/values.yaml
```

If you're installing the NVIDIA GPU operator into Microsoft Azure, be sure to set the runtimeClass to nvidia-container-runtime. We have included an example yaml that shows how to do this at helm/values/nvidia/values.aks.yaml.
If you'd like to install the NVIDIA GPU Operator with this AKS-specific yaml, use the following command:

```bash
helm install nvidia-gpu-operator nvidia/gpu-operator \
  -n nvidia-gpu-operator \
  --create-namespace \
  --atomic \
  -f helm/values/nvidia/values.aks.yaml
```

If you wish to use an existing Redis cache, you must configure the cache.existing and cache.metrics.existing parameters in your values.yaml.
If you'd like to install Redis to your cluster, instances will be created automatically during the application installation, so long as cache.existing is an empty dictionary and cache.enabled is true.
If you wish to use an existing MySQL cluster, you must configure the db.existing parameters in your values.yaml.
If you'd like to install MySQL to your cluster, use the following commands:

```bash
helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

helm install db-operator \
  percona/pxc-operator \
  -n eyelevel \
  -f helm/values/percona/values.operator.yaml

helm install db-cluster \
  percona/pxc-db \
  -n eyelevel \
  -f helm/values/percona/values.cluster.yaml
```

If you wish to use an existing MinIO cluster, you must configure the file.existing parameters in your values.yaml.
If you wish to use an existing AWS S3 bucket, you must configure the file.existing parameters in your values.yaml and set file.serviceType to s3.
If you'd like to install MinIO to your cluster, use the following commands:

```bash
helm repo add minio-operator https://operator.min.io/
helm repo update

helm install minio-operator \
  minio-operator/operator \
  -n eyelevel \
  -f helm/values/minio/values.operator.yaml

helm install minio-cluster \
  minio-operator/tenant \
  -n eyelevel \
  -f helm/values/minio/values.tenant.yaml
```

If you wish to use an existing OpenSearch cluster, you must configure the search.existing parameters in your values.yaml.
If you'd like to install OpenSearch to your cluster, use the following commands:

```bash
helm repo add opensearch https://opensearch-project.github.io/helm-charts/
helm repo update

helm install opensearch opensearch/opensearch -n eyelevel -f helm/values/opensearch/values.yaml
```

If you wish to use an existing Kafka cluster, you must configure the stream.existing parameters in your values.yaml.
If you wish to use existing AWS SQS queues, you must configure the stream.existing parameters in your values.yaml and set stream.serviceType to sqs.
If you'd like to install Kafka to your cluster, use the following command:

```bash
helm install stream-operator \
  oci://quay.io/strimzi-helm/strimzi-kafka-operator \
  -n eyelevel \
  -f helm/values/strimzi/values.yaml
```

Once the operator is ready, run the following command:

```bash
helm install groundx-kafka-cluster \
  groundx/groundx-strimzi-kafka-cluster \
  -n eyelevel
```

You must have completed the following steps before attempting to install the GroundX application:
- Configure Node Groups
- Create or select a Namespace
- Create or select a PV Class
- Install the NVIDIA GPU Operator
- Install Services
Instructions on how to configure GroundX On-Prem can be found in the main README.md. A set of example configurations can be found at helm/values.
For a GroundX deployment with default settings:
- Copy `sample.values.yaml` to something like `values.yaml`
- We minimally suggest updating the following values:

```
groundxKey       # a valid GroundX API key, to be used to look up licensing information
admin.apiKey     # admin values are associated with
admin.username   # the admin account for your deployment
admin.email
admin.password
cluster.pvClass  # an existing storage class
cluster.type     # type of Kubernetes cluster
```

Note: `admin.apiKey` and `admin.username` must be valid UUIDs. We provide a helper script to generate random UUIDs. You can run it using the following command:

```bash
bin/uuid
```

To install GroundX, add the chart repo to helm by running the following commands:

```bash
helm repo add groundx https://registry.groundx.ai/helm
helm repo update
```

Once the repo is added, you can install the GroundX application by running the following command:

```bash
helm install groundx \
  groundx/groundx \
  -n eyelevel \
  -f values.yaml
```

Replace `values.yaml` with the path to the `values.yaml` file you created in the previous Configuration step.
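Because the admin API key and admin username must be valid UUIDs, it can be worth checking them before running the install. The sketch below uses Python's standard `uuid` module and accepts only the canonical hyphenated form. `is_valid_uuid` is an illustrative helper, not part of GroundX, and whether GroundX itself accepts other UUID spellings is not something this check assumes.

```python
import uuid


def is_valid_uuid(value) -> bool:
    """Return True if value is a UUID in canonical hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces and un-hyphenated hex; require the canonical form.
    return str(parsed) == value.lower()


# A freshly generated random UUID passes; an arbitrary string does not.
print(is_valid_uuid(str(uuid.uuid4())))  # True
print(is_valid_uuid("admin"))            # False
```

Running the same check on both values before `helm install` catches a malformed entry early.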
GroundX includes built-in Horizontal Pod Autoscaling (HPA) and an optional custom metrics server.
Autoscaling is workload-aware: pods scale on pipeline throughput plus one additional metric (latency/backlog/throughput) depending on pod type.
HPA can be globally enabled for all supported pods:

```yaml
cluster:
  hpa: true
```

If `cluster.hpa` is false, pods run with fixed replica counts.

You can enable or disable HPA per pod:

```yaml
groundx:
  replicas:
    hpa: true
```

If HPA is disabled for a pod:

- `POD.replicas.desired` controls the replica count.

If HPA is enabled for a pod:

- `POD.replicas.min` and `POD.replicas.max` control the scaling bounds.
Every autoscaled pod scales on two metrics:

- Pipeline throughput (all pods)
  - The system estimates total pipeline throughput for files moving through GroundX.
  - Each pod defines the throughput a single replica can support.
  - Replicas increase until total pod capacity meets the estimated pipeline throughput.
- Pod-specific metric (by pod type)
  - api: response latency (default target 4s)
  - queue: message backlog (default target 10)
  - task: Celery task backlog (default target 10)
  - inference: model request throughput (scales when requests exceed per-replica capacity)
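The two-metric behavior above can be approximated in a few lines. The sketch below assumes the standard Kubernetes HPA rules: each metric proposes a replica count, the largest proposal wins, and the result is clamped to the configured minimum and maximum. The function names, throughput units, and example numbers are illustrative; how the GroundX metrics server computes its values is not shown here.

```python
import math


def throughput_replicas(pipeline_throughput, per_replica_capacity):
    """Replicas needed so total capacity meets the estimated pipeline throughput."""
    return math.ceil(pipeline_throughput / per_replica_capacity)


def metric_replicas(current_replicas, current_value, target_value):
    """Standard HPA ratio: scale the current count by current/target."""
    return math.ceil(current_replicas * current_value / target_value)


def desired_replicas(current_replicas, pipeline_throughput, per_replica_capacity,
                     metric_value, metric_target, min_replicas, max_replicas):
    """Take the larger of the two proposals, then clamp to the scaling bounds."""
    wanted = max(
        throughput_replicas(pipeline_throughput, per_replica_capacity),
        metric_replicas(current_replicas, metric_value, metric_target),
    )
    return max(min_replicas, min(max_replicas, wanted))


# An api pod: 2 replicas, 6s latency against the 4s default target.
print(desired_replicas(2, 50, 20, 6, 4, 1, 5))    # 3
# A queue pod: backlog of 45 against the default target of 10, capped at max.
print(desired_replicas(2, 50, 20, 45, 10, 1, 5))  # 5
```

In the second case the backlog alone asks for 9 replicas, so the maximum is what bounds the result.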
The custom metrics server exposes:
- Pipeline throughput
- API latency
- Queue backlog
- Task backlog
- Inference request throughput
Enable it with:

```yaml
metrics:
  enabled: true
```

If using Prometheus Operator, enable the ServiceMonitor:

```yaml
metrics:
  serviceMonitor:
    enabled: true
```

This allows Prometheus to automatically scrape the GroundX metrics endpoints.
For detailed monitoring setup, see the dedicated Monitoring README.
To run the system without calling OpenAI or self-hosted LLMs (useful for load testing or autoscaling validation), configure:
engines:
default:
engineId: test-model
extract:
agent:
modelId: test-model

When set, pods use simulated LLM responses instead of external model providers.
Once the setup is complete, run:

```bash
kubectl -n eyelevel get svc groundx
```

The API endpoint will be the external IP associated with the GroundX load balancer.
For instance, the "external IP" might resemble the following:

```
EXTERNAL-IP
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxx.us-east-2.elb.amazonaws.com
```

The API endpoint, in conjunction with the `admin.apiKey` defined during deployment, can be used to configure the GroundX SDK to communicate with your On-Prem instance of GroundX.
Note: you must append /api to your API endpoint in the SDK configuration.

```python
from groundx import GroundX

# Hostname reported in the EXTERNAL-IP column for the groundx service
external_ip = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxx.us-east-2.elb.amazonaws.com"
api_key = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

client = GroundX(api_key=api_key, base_url=f"http://{external_ip}/api")
```

```typescript
import { GroundXClient } from "groundx";

// Hostname reported in the EXTERNAL-IP column for the groundx service
const external_ip = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxx.us-east-2.elb.amazonaws.com";

const groundx = new GroundXClient({
  apiKey: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  environment: `http://${external_ip}/api`,
});
```

The API endpoint, in conjunction with the `admin.apiKey` defined during deployment, can be used to interact with your On-Prem instance of GroundX.
All of the methods and operations described in the GroundX documentation are supported with your On-Prem instance of GroundX. You simply have to substitute https://api.groundx.ai with your API endpoint.
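Since the load balancer hostname has to be turned into a base URL ending in /api, a small helper can make that substitution less error-prone. The sketch below is one way to do it. `onprem_base_url` is a hypothetical helper, and defaulting to `http` follows the SDK examples above rather than any statement about TLS support.

```python
def onprem_base_url(endpoint: str, scheme: str = "http") -> str:
    """Normalize a load balancer hostname or URL into a base URL ending in /api."""
    url = endpoint.strip().rstrip("/")
    # Add a scheme only if the caller did not supply one.
    if "://" not in url:
        url = f"{scheme}://{url}"
    # Append the /api suffix unless it is already present.
    if not url.endswith("/api"):
        url += "/api"
    return url


print(onprem_base_url("example.us-east-2.elb.amazonaws.com"))
# http://example.us-east-2.elb.amazonaws.com/api
```

The returned string can be passed as `base_url` to the Python client or as `environment` to the TypeScript client.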
As of November 4, 2025, we have migrated to a pure helm release deployment. The previous hybrid terraform-helm approach is no longer supported.
If you would like to access the legacy terraform scripts, they can be pulled from legacy-terraform-deployment.


