eyelevelai/groundx-on-prem

GroundX On-Prem/On-Cloud Kubernetes Infrastructure As Code

Table of Contents

What is GroundX On-Prem?

Installing GroundX

Autoscaling & Monitoring

Using GroundX On-Prem

Legacy Terraform Deployment

What is GroundX On-Prem?

With this repository you can deploy GroundX RAG document ingestion and search capabilities to a Kubernetes cluster in a manner that can be isolated from any external dependencies.

GroundX delivers a unique approach to advanced RAG that consists of three interlocking systems:

  1. GroundX Ingest: A state-of-the-art vision model trained on over 1M pages of enterprise documents. It delivers unparalleled document understanding and can be fine-tuned for your unique document sets.
  2. GroundX Store: Secure, encrypted storage for source files, semantic objects, and vectors, ensuring your data is always protected.
  3. GroundX Search: Built on OpenSearch, it combines text and vector search with a fine-tuned re-ranker model for precise, enterprise-grade results.

In head-to-head testing, GroundX significantly outperforms many popular RAG tools (ref1, ref2, ref3), especially with respect to complex documents at scale. GroundX is trusted by organizations like Air France, Dartmouth and Samsung with over 2 billion tokens ingested on our models.

GroundX On-Prem allows you to leverage GroundX within hardened and secure environments. GroundX On-Prem requires no external dependencies when running, meaning it can be used in air-gapped environments. Deployment consists of two key steps:

  1. (Optional) Creation of Infrastructure on AWS via Terraform
  2. Deployment of GroundX onto Kubernetes via Helm

Currently, creation of infrastructure via Terraform is only supported for AWS. However, with sufficient expertise GroundX can be deployed onto any pre-existing Kubernetes cluster.

This repo is in Open Beta. Feedback is appreciated and encouraged. To use the hosted version of GroundX visit EyeLevel.ai. For white glove support in configuring this open source repo in your environment, or to access more performant and closed source versions of this repo, contact us. To learn more about what GroundX is, and what it's useful for, you may be interested in the following resources:

If you're deploying GroundX On-Prem on AWS, you might be interested in this simple video guide for deploying on AWS. To see how well GroundX understands your documents, you can test your documents for free with our online testing tool.

GroundX Ingest Service

The GroundX ingest service expects visually complex documents in a variety of formats. It analyzes those documents with several fine-tuned models, converts the documents into a queryable representation designed to be understood by LLMs, and stores that information for downstream search.


GroundX Search Service

Once documents have been processed via the ingest service they can be queried against via natural language queries. We use a custom configuration of Open Search which has been designed in tandem with the representations generated from the ingest service.


Installing GroundX

Command Line Tool Dependencies

You must have the following command line tools installed:

- `bash` shell (version 4.0 or later recommended; note that AWS Cloud Shell has insufficient resources)
- `kubectl` (or `oc`) configured to a namespace you can write to (e.g., `eyelevel`) ([Setup Docs](https://kubernetes.io/docs/tasks/tools/))
- `helm` v3.8+

Cluster Requirements

In order to deploy GroundX On-Prem to your Kubernetes cluster, you must:

  1. Check that you have the required compute resources
  2. Configure or create appropriate node groups and nodes
  3. Create a Namespace or use an existing one
  4. Create a PV Class or use an existing one
  5. Install the NVIDIA GPU Operator if it's not already installed

Background

Node Groups

By default, the GroundX On-Prem pods deploy to nodes using node selector labels and tolerations. Here is an example from one of the Kubernetes YAML configs:

nodeSelector:
  eyelevel_node: "eyelevel-cpu-only"
tolerations:
  - key: "eyelevel_node"
    value: "eyelevel-cpu-only"
    effect: "NoSchedule"

Node labels are defined in the values.yaml and must be applied to appropriate nodes within your cluster. Default node label values are:

eyelevel-cpu-memory
eyelevel-cpu-only
eyelevel-gpu-layout
eyelevel-gpu-ranker
eyelevel-gpu-summary

Required Compute Resources

Chip Architecture

The publicly available GroundX On-Prem Kubernetes pods are all built for x86_64 architecture. Pods built for other architectures, such as arm64, are available upon customer request.

Supported GPUs

The GroundX On-Prem GPU pods are designed to run on NVIDIA GPUs with CUDA 12+. Other GPU types or older driver versions are not supported.

As part of the deployment, the NVIDIA GPU Operator must be installed. We provide a Helm chart configuration for deploying the NVIDIA GPU Operator to your cluster, if you have not already done so.

The NVIDIA GPU operator should update your NVIDIA drivers and other software components needed to provision the GPU, so long as you have supported NVIDIA hardware on the machine.

Total Recommended Resources

The GroundX On-Prem recommended resource requirements are:

eyelevel-cpu-only
    80 GB     disk drive space
    8         CPU cores
    16 GB     RAM

eyelevel-cpu-memory
    20 GB     disk drive space
    4         CPU cores
    16 GB     RAM

eyelevel-gpu-layout
    16 GB     GPU memory
    75 GB     disk drive space
    4         CPU cores
    12 GB     RAM

eyelevel-gpu-ranker
    16 GB     GPU memory
    75 GB     disk drive space
    8         CPU cores
    30 GB     RAM

eyelevel-gpu-summary
    48 GB     GPU memory
    100 GB    disk drive space
    4         CPU cores
    30 GB     RAM
Node Group Resources

The GroundX On-Prem pods are grouped into 5 categories, based on resource requirements, and deploy as described in the node group section.

These pods can be deployed to 5 different dedicated node groups, a single node group, or any combination in between, so long as the minimum resource requirements are met and the appropriate node labels are applied to the nodes.

The resource requirements are as follows:

eyelevel-cpu-only

Pods in this node group have minimal requirements on CPU, RAM, and disk drive space. They can run on virtually any machine with the supported architecture.

The resource requirements for these pods are described in more detail in the Total Recommended Resources section above.

eyelevel-cpu-memory

Pods in this node group have a range of requirements on CPU, RAM, and disk drive space but can typically run on most machines with the supported architecture.

CPU and memory intensive ingestion pipeline pods, such as layout_ocr, layout_save, and pre_process, will deploy to the eyelevel-cpu-memory nodes. The layout_ocr pod includes tesseract, which benefits from access to multiple vCPU cores.

The resource requirements for these pods are described in more detail in the Total Recommended Resources section above.

eyelevel-gpu-layout

Pods in this node group have specific requirements on GPU, CPU, RAM, and disk drive space.

The resource requirements for these pods are described in more detail in the Total Recommended Resources section above.

The current configuration for this service assumes an NVIDIA GPU with 16 GB of GPU memory, 4 CPU cores, and at least 12 GB RAM. It deploys 1 pod with a configurable number of threads on this node (set via layout.inference.threads in values.yaml) and claims the GPU via the nvidia.com/gpu resource provided by the NVIDIA GPU operator.

If your machine has different resources than this, you will need to modify layout.inference in your values.yaml using the per pod requirements described above to optimize for your node resources.
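As a concrete illustration, a values.yaml fragment for a node with fewer CPU cores might look like the following. Only layout.inference.threads is named in this document; treat the surrounding structure and the value shown as illustrative, and consult the values.yaml README for the exact schema:

```yaml
# Illustrative sketch only: tune inference threads for a smaller node.
# The key path follows layout.inference.threads as referenced above.
layout:
  inference:
    threads: 1
```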

Note: in many cloud Kubernetes services, such as EKS, you must use a node image that supports GPUs (e.g. AL2023_x86_64_NVIDIA).

eyelevel-gpu-ranker

Pods in this node group have specific requirements on GPU, CPU, RAM, and disk drive space.

The resource requirements for these pods are described in more detail in the Total Recommended Resources section above.

The current configuration for this service assumes an NVIDIA GPU with 16 GB of GPU memory, 4 CPU cores, and at least 30 GB RAM. It deploys 1 pod with 14 workers on this node (called ranker.inference.workers in values.yaml). It does not claim the GPU via the nvidia.com/gpu resource provided by the NVIDIA GPU operator but uses 16 GB of GPU memory.

If your machine has different resources than this, you will need to modify ranker.inference in your values.yaml using the per pod requirements described above to optimize for your node resources.

Note: in many cloud Kubernetes services, such as EKS, you must use a node image that supports GPUs (e.g. AL2023_x86_64_NVIDIA).

eyelevel-gpu-summary

Pods in this node group have specific requirements on GPU, CPU, RAM, and disk drive space.

The resource requirements for these pods are described in more detail in the Total Recommended Resources section above.

The current configuration for this service assumes an NVIDIA GPU with 48 GB of GPU memory, 4 CPU cores, and at least 30 GB RAM. It deploys 1 pod on this node (called summary.inference.replicas.desired in values.yaml). It does not claim the GPU via the nvidia.com/gpu resource provided by the NVIDIA GPU operator but uses 24 GB of GPU memory per worker.

If your machine has different resources than this, you will need to modify summary.inference in your values.yaml using the per pod requirements described above to optimize for your node resources.

Note: in many cloud Kubernetes services, such as EKS, you must use a node image that supports GPUs (e.g. AL2023_x86_64_NVIDIA).

Configure Node Groups

As mentioned in the node groups section, node labels are defined in values.yaml and must be applied to appropriate nodes within your cluster. Default node label values include:

eyelevel-cpu-memory
eyelevel-cpu-only
eyelevel-gpu-layout
eyelevel-gpu-ranker
eyelevel-gpu-summary

Multiple node labels can be applied to the same node group, so long as resources are available as described in the total recommended resource and node group resources sections.

However, all node labels must exist on at least 1 node group within your cluster. The label should be applied with a string key named eyelevel_node and an enumerated string value from the list above.
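The labeling (and, for dedicated nodes, tainting) can be done with kubectl. The node name below is a placeholder; the key and values match the eyelevel_node convention described above:

```shell
# Replace my-gpu-node with a node name from `kubectl get nodes`
kubectl label node my-gpu-node eyelevel_node=eyelevel-gpu-layout

# Optional: taint dedicated nodes so that only pods carrying the matching
# toleration (shown in the node groups section) schedule onto them
kubectl taint node my-gpu-node eyelevel_node=eyelevel-gpu-layout:NoSchedule
```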

Applying Custom Node Groups

Default Labels

If you use the default labels described in Configure Node Groups, you do not need to do anything else. The helm chart assumes these default values during the deployment.

Custom Labels

If you use custom labels, you must update the following values during GroundX deployment:

cache.node
cache.metrics.node
groundx.node
layout.api.node
layout.correct.node
layout.inference.node
layout.map.node
layout.ocr.node
layout.process.node
layout.save.node
layoutWebhook.node
preProcess.node
process.node
queue.node
ranker.api.node
ranker.inference.node
summary.api.node
summary.inference.node
summaryClient.node
upload.node

See the values.yaml README.md for more information about these values.

You will also need to update values for any services you deploy as well.

Namespace

You must have a namespace where the GroundX application can be installed. If you need to create one, you can do so by running the following command:

kubectl create namespace eyelevel

This will create a namespace called eyelevel where GroundX pods will be installed.

The default values.yaml namespace assumes a name of eyelevel. If you choose to use a different namespace name, you will have to update the values.yaml file accordingly.

PV Class

GroundX requires a PV class for some of the pods. If you have not created one, we have included a chart that will create one. You can run it with the following command:

helm install groundx-storageclass \
  groundx/groundx-storageclass \
  -n eyelevel

NVIDIA GPU Operator

Some of the GroundX pods require access to an NVIDIA GPU. The easiest way to ensure access is to install the NVIDIA GPU Operator, which will ensure the appropriate drivers and libraries are installed on the GPU nodes.

If you'd like to install the NVIDIA GPU Operator to your cluster, use the following commands:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

helm install nvidia-gpu-operator \
  nvidia/gpu-operator \
  -n nvidia-gpu-operator \
  --create-namespace \
  --atomic \
  -f helm/values/nvidia/values.yaml

Installing in Microsoft Azure

If you're installing the NVIDIA GPU operator into Microsoft Azure, be sure to set the runtimeClass to nvidia-container-runtime. We have included an example yaml that shows how to do this at helm/values/nvidia/values.aks.yaml.

If you'd like to install the NVIDIA GPU Operator with this AKS-specific yaml, use the following commands:

helm install nvidia-gpu-operator nvidia/gpu-operator \
  -n nvidia-gpu-operator \
  --create-namespace \
  --atomic \
  -f helm/values/nvidia/values.aks.yaml

Installing Services

Redis

Using an Existing Redis Cluster

If you wish to use an existing Redis cache, you must configure the cache.existing and cache.metrics.existing parameters in your values.yaml.

Deploying a Dedicated Redis Cluster

If you'd like a dedicated Redis deployment, instances will be created automatically during the application installation, so long as cache.existing is an empty dictionary and cache.enabled is true.

MySQL

Using an Existing MySQL Cluster

If you wish to use an existing MySQL cluster, you must configure the db.existing parameters in your values.yaml.

Deploying a Dedicated MySQL Cluster

If you'd like to install MySQL to your cluster, use the following commands:

helm repo add percona https://percona.github.io/percona-helm-charts/
helm repo update

helm install db-operator \
  percona/pxc-operator \
  -n eyelevel \
  -f helm/values/percona/values.operator.yaml
helm install db-cluster \
  percona/pxc-db \
  -n eyelevel \
  -f helm/values/percona/values.cluster.yaml

MinIO

Using an Existing MinIO Cluster

If you wish to use an existing MinIO cluster, you must configure the file.existing parameters in your values.yaml.

Using AWS S3

If you wish to use an existing AWS S3 bucket, you must configure the file.existing parameters in your values.yaml and set file.serviceType to s3.

Deploying a Dedicated MinIO Cluster

If you'd like to install MinIO to your cluster, use the following commands:

helm repo add minio-operator https://operator.min.io/
helm repo update

helm install minio-operator \
  minio-operator/operator \
  -n eyelevel \
  -f helm/values/minio/values.operator.yaml
helm install minio-cluster \
  minio-operator/tenant \
  -n eyelevel \
  -f helm/values/minio/values.tenant.yaml

OpenSearch

Using an Existing OpenSearch Cluster

If you wish to use an existing OpenSearch cluster, you must configure the search.existing parameters in your values.yaml.

Deploying a Dedicated OpenSearch Cluster

If you'd like to install OpenSearch to your cluster, use the following commands:

helm repo add opensearch https://opensearch-project.github.io/helm-charts/
helm repo update

helm install opensearch opensearch/opensearch -n eyelevel -f helm/values/opensearch/values.yaml

Kafka

Using an Existing Kafka Cluster

If you wish to use an existing Kafka cluster, you must configure the stream.existing parameters in your values.yaml.

Using AWS SQS

If you wish to use existing AWS SQS queues, you must configure the stream.existing parameters in your values.yaml and set stream.serviceType to sqs.

Deploying a Dedicated Kafka Cluster

If you'd like to install Kafka to your cluster, use the following command:

helm install stream-operator \
  oci://quay.io/strimzi-helm/strimzi-kafka-operator \
  -n eyelevel \
  -f helm/values/strimzi/values.yaml

Once the operator is ready, run the following command:

helm install groundx-kafka-cluster \
  groundx/groundx-strimzi-kafka-cluster \
  -n eyelevel

Installing the GroundX Application

Pre-Requisites

You must have completed the following steps before attempting to install the GroundX application:

  1. Met the cluster requirements described above (node groups, namespace, PV class, and the NVIDIA GPU Operator)
  2. Installed or configured each required service (Redis, MySQL, MinIO or S3, OpenSearch, and Kafka or SQS)

Configuration

Instructions on how to configure GroundX On-Prem can be found in the main README.md. A set of example configurations can be found at helm/values.

For a GroundX deployment with default settings:

  1. Copy sample.values.yaml to something like values.yaml
  2. We minimally suggest updating the following values:
groundxKey      # a valid GroundX API key, to be used to look up licensing information
admin.apiKey    # admin values are associated with
admin.username  # the admin account for your deployment
admin.email
admin.password
cluster.pvClass # an existing storage class
cluster.type    # type of Kubernetes cluster
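Putting those keys together, a minimal values.yaml sketch might look like the following. All values are placeholders; valid cluster.type and pvClass values depend on your environment and are documented in the values.yaml README:

```yaml
groundxKey: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # your GroundX API key
admin:
  apiKey: "11111111-2222-3333-4444-555555555555"    # must be a valid UUID
  username: "66666666-7777-8888-9999-000000000000"  # must be a valid UUID
  email: "admin@example.com"
  password: "change-me"
cluster:
  pvClass: "gp2"   # placeholder: an existing storage class in your cluster
  type: "eks"      # placeholder: your Kubernetes cluster type
```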

Note: admin.apiKey and admin.username must be valid UUIDs. We provide a helper script to generate random UUIDs. You can run it using the following command:

bin/uuid
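If bin/uuid is not available in your environment, any UUID4 generator works. For example, a one-off Python snippet:

```python
# Generate a random UUID suitable for admin.apiKey or admin.username
import uuid

print(uuid.uuid4())
```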

Helm Installation

To install GroundX, add the chart repo to helm by running the following commands:

helm repo add groundx https://registry.groundx.ai/helm
helm repo update

Once the repo is added, you can install the GroundX application by running the following command:

helm install groundx \
  groundx/groundx \
  -n eyelevel \
  -f values.yaml

Replace values.yaml with the path to the values.yaml file you created in the previous Configuration step.

Autoscaling & Monitoring

GroundX includes built-in Horizontal Pod Autoscaling (HPA) and an optional custom metrics server.

Autoscaling is workload-aware: pods scale on pipeline throughput plus one additional metric (latency/backlog/throughput) depending on pod type.

Enabling Autoscaling (HPA)

HPA can be globally enabled for all supported pods:

cluster:
  hpa: true

If cluster.hpa is false, pods run with fixed replica counts.

Per-Pod HPA Control

You can enable or disable HPA per pod:

groundx:
  replicas:
    hpa: true

If HPA is disabled for a pod:

POD.replicas.desired

controls the replica count.

If HPA is enabled for a pod:

POD.replicas.min
POD.replicas.max

control the scaling bounds.
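As a concrete sketch (using groundx as the example pod; the key names follow the POD.replicas.* pattern described above):

```yaml
groundx:
  replicas:
    hpa: true     # per-pod HPA toggle
    min: 1        # lower scaling bound, used when hpa is true
    max: 4        # upper scaling bound, used when hpa is true
    desired: 2    # fixed replica count, used when hpa is false
```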

How Autoscaling Works

Every autoscaled pod scales on two metrics:

  1. Pipeline throughput (all pods)

    • The system estimates total pipeline throughput for files moving through GroundX.
    • Each pod defines the throughput a single replica can support.
    • Replicas increase until total pod capacity meets the estimated pipeline throughput.
  2. Pod-specific metric (by pod type)

    • api: response latency (default target 4s)
    • queue: message backlog (default target 10)
    • task: Celery task backlog (default target 10)
    • inference: model request throughput (scales when requests exceed per-replica capacity)
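The throughput-based part of this logic can be sketched as follows. This is a simplified illustration under our own assumptions, not the actual controller code; the function name and the numbers are ours:

```python
import math

def required_replicas(pipeline_throughput: float,
                      per_replica_capacity: float,
                      min_replicas: int,
                      max_replicas: int) -> int:
    """Replicas increase until total capacity meets the estimated
    pipeline throughput, clamped to the configured HPA bounds."""
    needed = math.ceil(pipeline_throughput / per_replica_capacity)
    return max(min_replicas, min(max_replicas, needed))

# e.g. 120 files/min through the pipeline, 25 files/min per replica -> 5 replicas
print(required_replicas(120.0, 25.0, min_replicas=1, max_replicas=10))
```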

Enabling the Custom Metrics Server

The custom metrics server exposes:

  • Pipeline throughput
  • API latency
  • Queue backlog
  • Task backlog
  • Inference request throughput

Enable it with:

metrics:
  enabled: true

Prometheus Integration (Optional)

If using Prometheus Operator, enable the ServiceMonitor:

metrics:
  serviceMonitor:
    enabled: true

This allows Prometheus to automatically scrape the GroundX metrics endpoints.

For detailed monitoring setup, see the dedicated Monitoring README.

Using Simulated LLM Responses (Optional)

To run the system without calling OpenAI or self-hosted LLMs (useful for load testing or autoscaling validation), configure:

engines:
  default:
    engineId: test-model

extract:
  agent:
    modelId: test-model

When set, pods use simulated LLM responses instead of external model providers.

Using GroundX On-Prem

Get the API Endpoint

Once the setup is complete, run:

kubectl -n eyelevel get svc groundx

The API endpoint will be the external IP associated with the GroundX load balancer.

For instance, the "external IP" might resemble the following:

EXTERNAL-IP
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxx.us-east-2.elb.amazonaws.com

Use the SDKs

The API endpoint, in conjunction with the admin.api_key defined during deployment, can be used to configure the GroundX SDK to communicate with your On-Prem instance of GroundX.

Note: you must append /api to your API endpoint in the SDK configuration.

Python:

from groundx import GroundX

external_ip = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxx.us-east-2.elb.amazonaws.com"
api_key = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

client = GroundX(api_key=api_key, base_url=f"http://{external_ip}/api")

TypeScript:

import { GroundXClient } from "groundx";

const external_ip = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxxxx.us-east-2.elb.amazonaws.com";

const groundx = new GroundXClient({
  apiKey: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  environment: `http://${external_ip}/api`,
});

Use the APIs

The API endpoint, in conjunction with the admin.api_key defined during deployment, can be used to interact with your On-Prem instance of GroundX.

All of the methods and operations described in the GroundX documentation are supported with your On-Prem instance of GroundX. You simply have to substitute https://api.groundx.ai with your API endpoint.

Legacy Terraform Deployment

As of November 4, 2025, we have migrated to a pure helm release deployment. The previous hybrid terraform-helm approach is no longer supported.

Accessing Legacy Scripts

If you would like to access the legacy terraform scripts, they can be pulled from legacy-terraform-deployment.

About

A Kubernetes deployable instance of GroundX for document parsing, storage, and search.
