Merged
9 changes: 0 additions & 9 deletions docs/architecture/Component Architecture/01_deployer.md

This file was deleted.

9 changes: 9 additions & 0 deletions docs/architecture/Components/01_deployer.md
@@ -0,0 +1,9 @@
---
sidebar_position: 1
---

# Deployer

A key component in llm-d's toolbox is the **llm-d deployer**, the Helm chart for deploying llm-d on Kubernetes.

[llm-d-deployer repository](https://github.com/llm-d/llm-d-deployer)
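
To make the chart's role concrete, here is a minimal install sketch. The Helm repository URL, chart name, and release name below are illustrative assumptions, not published coordinates; the quickstart in the llm-d-deployer repository documents the supported installation path.

```bash
# Illustrative only: the repo URL and chart name are assumptions --
# consult the llm-d-deployer quickstart for the supported install flow.
helm repo add llm-d-deployer https://llm-d.github.io/llm-d-deployer
helm repo update
helm install llm-d llm-d-deployer/llm-d-deployer \
  --namespace llm-d --create-namespace
```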
91 changes: 22 additions & 69 deletions docs/guide/Installation/prerequisites.md
@@ -5,60 +5,6 @@ sidebar_label: Prerequisites

# Prerequisites for running the llm-d QuickStart

### Target Platforms

Since the llm-d-deployer is based on Helm charts, llm-d can be deployed on a variety of Kubernetes platforms. As more platforms are supported, the installer will be updated to support them.

Documentation for example cluster setups is provided in the [infra](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra) directory of the llm-d-deployer repository.

- [OpenShift on AWS](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra/openshift-aws.md)

#### Minikube

This can be run on a minimum EC2 node type of [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4x L40S 48 GB GPUs, of which only 2 are used by default) to serve the model meta-llama/Llama-3.2-3B-Instruct that the quickstart spins up.

> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.

Verify that you have properly installed the NVIDIA Container Toolkit with the runtime of your choice.

```bash
# Podman
podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
# Docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```

#### OpenShift

- OpenShift - This quickstart was tested on OpenShift 4.17. Older versions may work but have not been tested.
- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
- No Service Mesh or Istio installation, as Istio CRDs will conflict with the gateway
- Cluster administrator privileges are required to install the llm-d cluster-scoped resources


#### Kubernetes

This can be run on a minimum EC2 node type of [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4x L40S 48 GB GPUs, of which only 2 are used by default) to serve the model meta-llama/Llama-3.2-3B-Instruct that the quickstart spins up.

> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.

Verify that you have properly installed the NVIDIA Container Toolkit with the runtime of your choice.

```bash
# Podman
podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
# Docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```

#### OpenShift

- OpenShift - This quickstart was tested on OpenShift 4.18. Older versions may work but have not been tested.
- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
- No Service Mesh or Istio installation, as Istio CRDs will conflict with the gateway


## Software prerequisites -- Client Configuration

## Client Configuration

@@ -97,31 +43,38 @@ You can use the installer script that installs all the required dependencies. C
### Required credentials and configuration

- [llm-d-deployer GitHub repo – clone here](https://github.com/llm-d/llm-d-deployer.git)
- [ghcr.io Registry – credentials](https://github.com/settings/tokens): You must have a GitHub account and a "classic" personal access token with the `read:packages` scope in order to pull llm-d-deployer images.
- [Red Hat Registry – terms & access](https://access.redhat.com/registry/)
- [HuggingFace HF_TOKEN](https://huggingface.co/docs/hub/en/security-tokens) with download access for the model you want to use. By default the sample application will use [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct).

> ⚠️ Your Hugging Face account must have access to the model you want to use. You may need to visit Hugging Face [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and
> accept the usage terms if you have not already done so.
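
A minimal sketch of supplying the token, assuming the installer picks up the conventional `HF_TOKEN` environment variable (check the quickstart README for exactly how the token is consumed):

```bash
# Placeholder value -- substitute your own token. The variable name follows
# the Hugging Face convention; confirm how the installer expects to receive it.
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxx"
```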

Registry Authentication: The installer looks for an auth file in:
### Target Platforms

```bash
~/.config/containers/auth.json
# or
~/.config/containers/config.json
```
Since the llm-d-deployer is based on Helm charts, llm-d can be deployed on a variety of Kubernetes platforms. As more platforms are supported, the installer will be updated to support them.

Documentation for example cluster setups is provided in the [infra](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra) directory of the llm-d-deployer repository.

If not found, you can create one with the following commands:
- [OpenShift on AWS](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra/openshift-aws.md)

Create with Docker:

#### Minikube

This can be run on a minimum EC2 node type of [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4x L40S 48 GB GPUs, of which only 2 are used by default) to serve the model meta-llama/Llama-3.2-3B-Instruct that the quickstart spins up.

> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.

Verify that you have properly installed the NVIDIA Container Toolkit with the runtime of your choice.

```bash
docker --config ~/.config/containers/ login ghcr.io
```

```bash
# Podman
podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
# Docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```
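
If the prefill and decode pods do stay Pending, one quick check is whether the cluster advertises any GPUs at all. This sketch assumes the standard `nvidia.com/gpu` resource name registered by the NVIDIA device plugin:

```bash
# An empty GPUS column means the device plugin has not registered any
# GPUs with the kubelet, so GPU-requesting pods will stay Pending.
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```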

Create with Podman:
#### OpenShift

```bash
podman login ghcr.io --authfile ~/.config/containers/auth.json
```
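
To sanity-check that the credentials actually landed in the auth file, podman can echo back the stored login:

```bash
# Prints the user recorded for ghcr.io in the given auth file.
podman login --get-login ghcr.io --authfile ~/.config/containers/auth.json
```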
- OpenShift - This quickstart was tested on OpenShift 4.17. Older versions may work but have not been tested.
- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
- No Service Mesh or Istio installation, as Istio CRDs will conflict with the gateway
- Cluster administrator privileges are required to install the llm-d cluster-scoped resources; a quick permission check follows below.
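
A rough way to verify you have the needed rights before running the installer (these are standard `kubectl auth` probes, not llm-d-specific tooling):

```bash
# Cluster-scoped resources such as CRDs and ClusterRoles require
# cluster-admin-level permissions to create.
kubectl auth can-i create customresourcedefinitions
kubectl auth can-i create clusterroles
```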
8 changes: 4 additions & 4 deletions docs/guide/Installation/quickstart.md
@@ -59,7 +59,6 @@ The installer needs to be run from the `llm-d-deployer/quickstart` directory as

| Flag | Description | Example |
|--------------------------------------|---------------------------------------------------------------|------------------------------------------------------------------|
| `-a`, `--auth-file PATH` | Path to containers auth.json | `./llmd-installer.sh --auth-file ~/.config/containers/auth.json` |
| `-z`, `--storage-size SIZE` | Size of storage volume | `./llmd-installer.sh --storage-size 15Gi` |
| `-c`, `--storage-class CLASS` | Storage class to use (default: efs-sc) | `./llmd-installer.sh --storage-class ocs-storagecluster-cephfs` |
| `-n`, `--namespace NAME` | K8s namespace (default: llm-d) | `./llmd-installer.sh --namespace foo` |
@@ -102,6 +101,7 @@ gatewayClassName, and sits in front of your inference pods to handle path-based routing
and metrics. This example validates that the gateway itself is routing your completion requests correctly.
You can execute the [`test-request.sh`](https://github.com/llm-d/llm-d-deployer/blob/main/quickstart/test-request.sh) script in the quickstart folder to test on the cluster.
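
For reference, the request the script issues has roughly this shape. The gateway address, endpoint path, and payload here are illustrative assumptions based on the OpenAI-compatible API that vLLM exposes; `test-request.sh` resolves the real route on the cluster:

```bash
# Illustrative request only -- GATEWAY_ADDRESS is a placeholder; the
# script discovers the actual gateway endpoint for you.
curl -s http://GATEWAY_ADDRESS/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "prompt": "Say hello",
        "max_tokens": 16
      }'
```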


> If you receive an error indicating PodSecurity "restricted" violations when running the smoke-test script, you
> need to remove the restrictive PodSecurity labels from the namespace. Once these labels are removed, re-run the
> script and it should proceed without PodSecurity errors.
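
A sketch of clearing those labels, assuming the default `llm-d` namespace and the standard Pod Security Admission label keys (a trailing `-` removes a label):

```bash
# Remove the Pod Security Admission labels from the namespace; the
# trailing "-" after each key deletes that label.
kubectl label namespace llm-d \
  pod-security.kubernetes.io/enforce- \
  pod-security.kubernetes.io/warn- \
  pod-security.kubernetes.io/audit-
```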
@@ -209,8 +209,8 @@ kubectl port-forward -n llm-d-monitoring --address 0.0.0.0 svc/prometheus-grafan

Access the UIs at:

- Prometheus: [http://YOUR_IP:9090](#)
- Grafana: [http://YOUR_IP:3000](#) (default credentials: admin/admin)
- Prometheus: \<http://YOUR_IP:9090\>
- Grafana: \<http://YOUR_IP:3000\> (default credentials: admin/admin)

##### Option 2: Ingress (Optional)

@@ -320,4 +320,4 @@ make a change, simply uninstall and then run the installer again with any change

```bash
./llmd-installer.sh --uninstall
```