# AnythingLLM RAG Demo - Deployment

This notebook walks you through deploying the AnythingLLM RAG demo interactively.

## Prerequisites

Before starting, ensure you have completed the main platform setup:
- ✓ OpenShift GitOps installed
- ✓ RHOAI installed and DataScienceCluster ready
- ✓ GPU nodes deployed
- ✓ At least one model downloaded and deployed

See the [platform-deployment.ipynb](../../platform-deployment.ipynb) notebook for these steps.

### Verify Prerequisites

In [None]:
%%bash
echo "=== GitOps Status ==="
oc get pods -n openshift-gitops | grep -E 'NAME|server'

echo -e "\n=== RHOAI Status ==="
oc get datasciencecluster -A

echo -e "\n=== GPU Nodes ==="
oc get nodes -l nvidia.com/gpu.present=true

echo -e "\n=== Model Storage ==="
oc get pvc -n demo

echo -e "\n=== InferenceServices ==="
oc get inferenceservice -n demo
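
The table printed by `oc get inferenceservice` has to be read by eye. As a minimal sketch, the same check can be done in Python on the JSON form of that output, assuming each InferenceService reports a `Ready` condition under `status.conditions`. The `sample` data and the function name `ready_inference_services` are hypothetical and exist only for illustration; in the notebook you would feed in the output of `oc get inferenceservice -n demo -o json`.

```python
import json


def ready_inference_services(isvc_list: dict) -> dict:
    """Map each InferenceService name to True if its Ready condition is "True"."""
    result = {}
    for item in isvc_list.get("items", []):
        name = item["metadata"]["name"]
        conditions = item.get("status", {}).get("conditions", [])
        result[name] = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
    return result


# Hypothetical sample shaped like `oc get inferenceservice -n demo -o json`
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "qwen3-vl-8b-fp8"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "granite-7b"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}}
  ]
}
""")

for name, ready in ready_inference_services(sample).items():
    print(f"{name}: {'Ready' if ready else 'NOT ready'}")
```

A model that shows as not ready here should be fixed before continuing, since AnythingLLM will be pointed at its predictor service.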

## 1. Configure Model Selection

Select which model AnythingLLM should use for LLM inference and embeddings.

**Available Models:**

| Model ID | Service Name | Model Display Name | Description |
|----------|--------------|-------------------|-------------|
| `qwen3-vl-8b` | `qwen3-vl-8b-predictor` | `Qwen3-VL-8B-Instruct` | Multimodal vision-language model (recommended) |
| `granite-7b` | `granite-7b-predictor` | `granite-7b-instruct` | IBM's open instruction model |
| `llama-3-8b` | `llama-3-8b-predictor` | `llama-3-8b-instruct` | Meta's Llama 3 |

In [3]:
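# Select your model: must match the name of an InferenceService deployed in the prerequisites.
# IDs that are not keys of MODEL_DISPLAY_NAMES (such as this one) fall back to the ID itself.
MODEL = "qwen3-vl-8b-fp8"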
NAMESPACE = "demo"

# Model display names for AnythingLLM
MODEL_DISPLAY_NAMES = {
    "qwen3-vl-8b": "Qwen3-VL-8B-Instruct",
    "granite-7b": "granite-7b-instruct",
    "llama-3-8b": "llama-3-8b-instruct"
}

# Build service URL
MODEL_DISPLAY_NAME = MODEL_DISPLAY_NAMES.get(MODEL, MODEL)
SERVICE_URL = f"http://{MODEL}-predictor.{NAMESPACE}.svc.cluster.local/v1"

print(f"Selected Model: {MODEL}")
print(f"Display Name: {MODEL_DISPLAY_NAME}")
print(f"Service URL: {SERVICE_URL}")

Selected Model: qwen3-vl-8b-fp8
Display Name: qwen3-vl-8b-fp8
Service URL: http://qwen3-vl-8b-fp8-predictor.demo.svc.cluster.local/v1
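
The output above shows the fallback in action: `qwen3-vl-8b-fp8` is not a key in the display-name mapping, so the display name is the model ID itself. The sketch below wraps the same logic in a function that also reports whether the ID was found, which makes a typo easier to spot. The names `resolve_model` and `KNOWN_DISPLAY_NAMES` are hypothetical; the mapping and URL pattern are copied from the cell above.

```python
# Same mapping as MODEL_DISPLAY_NAMES in the notebook cell
KNOWN_DISPLAY_NAMES = {
    "qwen3-vl-8b": "Qwen3-VL-8B-Instruct",
    "granite-7b": "granite-7b-instruct",
    "llama-3-8b": "llama-3-8b-instruct",
}


def resolve_model(model: str, namespace: str = "demo") -> dict:
    """Return the display name and in-cluster service URL for a model ID."""
    return {
        "model": model,
        # Unknown IDs fall back to the ID itself, as in the notebook
        "display_name": KNOWN_DISPLAY_NAMES.get(model, model),
        "service_url": f"http://{model}-predictor.{namespace}.svc.cluster.local/v1",
        "known": model in KNOWN_DISPLAY_NAMES,
    }


for model_id in ("qwen3-vl-8b", "qwen3-vl-8b-fp8"):
    info = resolve_model(model_id)
    note = "" if info["known"] else "  (not in mapping, using ID as display name)"
    print(f"{info['model']} -> {info['display_name']}{note}")
    print(f"  {info['service_url']}")
```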


## 2. Deploy AnythingLLM

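Deploy AnythingLLM through GitOps by creating an Argo CD Application that renders the Helm chart with the selected model configuration. The cell passes `SERVICE_URL` and `MODEL_DISPLAY_NAME` to bash as `$1` and `$2`; the unquoted heredoc expands them inside the manifest.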

In [4]:
%%bash -s "$SERVICE_URL" "$MODEL_DISPLAY_NAME"
# Generate and apply ArgoCD Application with model configuration
cat <<EOF | oc apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: anythingllm
  namespace: openshift-gitops
  labels:
    app.kubernetes.io/name: anythingllm
    app.kubernetes.io/part-of: rhoai-demos
    app.kubernetes.io/component: application
spec:
  project: default
  source:
    repoURL: https://github.com/redhat-ai-americas/rhoai-app-demos.git
    targetRevision: main
    path: apps/3rd-party-apps/anythingllm/helm
    helm:
      values: |
        namespace: anythingllm
        image:
          repository: mintplexlabs/anythingllm
          tag: latest
          pullPolicy: IfNotPresent
        replicas: 1
        service:
          type: ClusterIP
          port: 3001
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "2"
            memory: "4Gi"
        storage:
          size: 10Gi
        llm:
          provider: "generic-openai"
          baseUrl: "$1"
          model: "$2"
          apiKey: ""
        embedding:
          provider: "generic-openai"
          baseUrl: "$1"
          model: "$2"
          apiKey: ""
        env:
          - name: LLM_PROVIDER
            value: "generic-openai"
          - name: GENERIC_OPEN_AI_BASE_PATH
            value: "$1"
          - name: GENERIC_OPEN_AI_MODEL_PREF
            value: "$2"
          - name: GENERIC_OPEN_AI_MODEL_TOKEN_LIMIT
            value: "8192"
          - name: EMBEDDING_ENGINE
            value: "generic-openai"
          - name: EMBEDDING_BASE_PATH
            value: "$1"
          - name: EMBEDDING_MODEL_PREF
            value: "$2"
          - name: VECTOR_DB
            value: "chroma"
          - name: CHROMA_ENDPOINT
            value: "http://chromadb:8000"
          - name: DISABLE_TELEMETRY
            value: "true"
        chromadb:
          enabled: true
          image:
            repository: chromadb/chroma
            tag: latest
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "2Gi"
          storage:
            size: 20Gi
  destination:
    server: https://kubernetes.default.svc
    namespace: anythingllm
  syncPolicy:
    automated:
      selfHeal: true
      prune: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 10s
        factor: 2
        maxDuration: 5m
EOF

application.argoproj.io/anythingllm created
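
The `retry` block in the manifest sets `limit: 5`, `duration: 10s`, `factor: 2`, and `maxDuration: 5m`. As a rough worked example, assuming the delay before retry n is `duration * factor^(n-1)` capped at `maxDuration`, the schedule can be computed as below. The function name `backoff_schedule` is hypothetical and this is an illustration of the arithmetic, not Argo CD's implementation.

```python
def backoff_schedule(duration_s: int, factor: int, max_duration_s: int, limit: int) -> list:
    """Delay in seconds before each retry, capped at max_duration_s."""
    return [
        min(duration_s * factor ** attempt, max_duration_s)
        for attempt in range(limit)
    ]


# Values from the syncPolicy.retry block: 10s base, factor 2, 5m cap, 5 retries
delays = backoff_schedule(duration_s=10, factor=2, max_duration_s=300, limit=5)
print(delays)        # [10, 20, 40, 80, 160]
print(sum(delays))   # 310
```

Under that assumption, a sync that keeps failing is retried for a little over five minutes in total before Argo CD gives up, which is worth keeping in mind when choosing the timeouts in the wait step.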


### Wait for Deployment (2-3 minutes)

In [None]:
%%bash
# Wait for ArgoCD to sync the application
echo "Waiting for ArgoCD Application to sync..."
sleep 10  # Give ArgoCD a moment to detect the application

# Wait for ArgoCD sync (this may take 2-3 minutes)
oc wait --for=jsonpath='{.status.sync.status}'=Synced \
  application/anythingllm -n openshift-gitops --timeout=300s

# Wait for pods to be ready
echo -e "\nWaiting for AnythingLLM pods to be ready..."
oc wait --for=condition=Ready pod \
  -l app=anythingllm \
  -n anythingllm --timeout=180s

### Get AnythingLLM URL

In [None]:
%%bash
# Get the route URL
ANYTHINGLLM_URL=$(oc get route anythingllm -n anythingllm -o jsonpath='https://{.spec.host}')
echo "AnythingLLM URL: $ANYTHINGLLM_URL"
echo ""
echo "✓ AnythingLLM is ready! Open the URL above in your browser."
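
The cell above always prefixes the host with `https://`. A slightly more defensive sketch, assuming the route JSON from `oc get route anythingllm -n anythingllm -o json`, picks the scheme from whether the route has a `spec.tls` section. The function name `route_url` and the sample hosts are hypothetical.

```python
def route_url(route: dict) -> str:
    """Build the external URL for a Route, using https only when TLS is configured."""
    spec = route["spec"]
    scheme = "https" if spec.get("tls") else "http"
    return f"{scheme}://{spec['host']}"


# Hypothetical routes; the host names are placeholders, not real cluster values
tls_route = {"spec": {"host": "anythingllm.apps.example.com",
                      "tls": {"termination": "edge"}}}
plain_route = {"spec": {"host": "anythingllm.apps.example.com"}}

print(route_url(tls_route))
print(route_url(plain_route))
```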

## 3. Verify Deployment

In [None]:
%%bash
echo "=== AnythingLLM Pods ==="
oc get pods -n anythingllm

echo -e "\n=== AnythingLLM Route ==="
oc get route anythingllm -n anythingllm

### Test Model Connectivity

In [None]:
%%bash -s "$SERVICE_URL"
# Test that the AnythingLLM pod can reach the model service.
# No -it here: notebook cells have no TTY, so an interactive exec only adds warnings.
echo "Testing connectivity to model service: $1"
oc exec deploy/anythingllm -n anythingllm -- \
  curl -s "$1/models" | head -20
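
Reaching the endpoint is only half of the check: the model name configured in AnythingLLM also has to match an ID the server reports. The sketch below parses a `/v1/models` response, assuming the OpenAI-compatible shape with a top-level `data` list of objects that each carry an `id`. The function names and the `sample_response` payload are hypothetical.

```python
import json


def served_model_ids(models_response: str) -> list:
    """Extract model IDs from an OpenAI-compatible /v1/models JSON response."""
    payload = json.loads(models_response)
    return [entry["id"] for entry in payload.get("data", [])]


def model_is_served(models_response: str, configured_name: str) -> bool:
    """True if the name configured in AnythingLLM is one the server reports."""
    return configured_name in served_model_ids(models_response)


# Hypothetical response body; real output comes from the curl command above
sample_response = '{"object": "list", "data": [{"id": "qwen3-vl-8b-fp8", "object": "model"}]}'

print(served_model_ids(sample_response))
print(model_is_served(sample_response, "qwen3-vl-8b-fp8"))
print(model_is_served(sample_response, "Qwen3-VL-8B-Instruct"))
```

If the configured display name is not in the list, chat requests are likely to fail even though the connectivity test succeeds.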

## Next Steps

✓ AnythingLLM is deployed and configured!

Continue to [demo-user-instructions.md](./demo-user-instructions.md) to:
- Set up your first workspace
- Upload documents for RAG
- Test the chat interface

## Cleanup

To remove AnythingLLM when done:

In [None]:
%%bash
# Uncomment to delete AnythingLLM
# oc delete application anythingllm -n openshift-gitops
# oc delete project anythingllm

## Troubleshooting

### AnythingLLM pod not starting

In [None]:
%%bash
# Check pod status and logs
oc get pods -n anythingllm
echo -e "\n=== AnythingLLM Logs ==="
oc logs -l app=anythingllm -n anythingllm --tail=50
echo -e "\n=== ChromaDB Logs ==="
oc logs -l app=chromadb -n anythingllm --tail=50
echo -e "\n=== Pod Description ==="
oc describe pod -l app=anythingllm -n anythingllm
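
The `oc describe` output is long. To surface the likely cause quickly, the following sketch lists containers stuck in a waiting state, assuming pod JSON from `oc get pods -n anythingllm -o json` with the standard `status.containerStatuses[].state.waiting.reason` field. The function name `waiting_reasons`, the pod names, and the sample data are hypothetical.

```python
def waiting_reasons(pod_list: dict) -> dict:
    """Map pod name to a list of "container: reason" for containers in a waiting state."""
    reasons = {}
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            waiting = status.get("state", {}).get("waiting")
            if waiting:
                reason = waiting.get("reason", "Unknown")
                reasons.setdefault(pod_name, []).append(f"{status['name']}: {reason}")
    return reasons


# Hypothetical sample shaped like `oc get pods -n anythingllm -o json`
sample_pods = {
    "items": [
        {"metadata": {"name": "anythingllm-abc"},
         "status": {"containerStatuses": [
             {"name": "anythingllm",
              "state": {"waiting": {"reason": "ImagePullBackOff"}}}]}},
        {"metadata": {"name": "chromadb-xyz"},
         "status": {"containerStatuses": [
             {"name": "chromadb", "state": {"running": {}}}]}},
    ]
}

for pod_name, items in waiting_reasons(sample_pods).items():
    print(pod_name, items)
```

Reasons such as `ImagePullBackOff` or `CrashLoopBackOff` usually point to where to look next: image access in the first case, container logs in the second.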

### Model connection error

In [None]:
%%bash -s "$MODEL" "$NAMESPACE"
# Verify the InferenceService is Ready
echo "=== InferenceService Status ==="
oc get inferenceservice "$1" -n "$2"

echo -e "\n=== Test model endpoint directly ==="
# -i without -t: notebook cells have no TTY. --restart=Never runs a one-off pod that --rm cleans up.
oc run curl-test --image=curlimages/curl -i --rm --restart=Never -n "$2" -- \
  curl -s "http://$1-predictor.$2.svc.cluster.local/v1/models"