Merged
9 changes: 0 additions & 9 deletions docs/architecture/Component Architecture/01_deployer.md

This file was deleted.

9 changes: 9 additions & 0 deletions docs/architecture/Components/01_deployer.md
@@ -0,0 +1,9 @@
---
sidebar_position: 1
---

# Deployer

A key component in llm-d's toolbox is the **llm-d deployer**, the Helm chart for deploying llm-d on Kubernetes.

[llm-d-deployer repository](https://github.com/llm-d/llm-d-deployer)
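
To make the chart's role concrete, here is a minimal install sketch. The Helm repository URL, chart name, and release name below are illustrative assumptions, not published coordinates; the quickstart in the llm-d-deployer repository documents the supported installation path.

```bash
# Illustrative only: the repo URL and chart name are assumptions --
# consult the llm-d-deployer quickstart for the supported install flow.
helm repo add llm-d-deployer https://llm-d.github.io/llm-d-deployer
helm repo update
helm install llm-d llm-d-deployer/llm-d-deployer \
  --namespace llm-d --create-namespace
```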
91 changes: 22 additions & 69 deletions docs/guide/Installation/prerequisites.md
@@ -5,60 +5,6 @@ sidebar_label: Prerequisites

# Prerequisites for running the llm-d QuickStart

### Target Platforms

Since the llm-d-deployer is based on Helm charts, llm-d can be deployed on a variety of Kubernetes platforms. As more platforms are supported, the installer will be updated to support them.

Documentation for example cluster setups is provided in the [infra](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra) directory of the llm-d-deployer repository.

- [OpenShift on AWS](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra/openshift-aws.md)

#### Minikube

This can be run on a minimum EC2 node type of [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4x L40S 48 GB GPUs, of which only 2 are used by default) to serve the model meta-llama/Llama-3.2-3B-Instruct that the quickstart spins up.

> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.

Verify that you have properly installed the NVIDIA Container Toolkit with the runtime of your choice.

```bash
# Podman
podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
# Docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```

#### OpenShift

- OpenShift - This quickstart was tested on OpenShift 4.17. Older versions may work but have not been tested.
- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
- No Service Mesh or Istio installation, as Istio CRDs will conflict with the gateway
- Cluster administrator privileges are required to install the llm-d cluster-scoped resources


#### Kubernetes

This can be run on a minimum EC2 node type of [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4x L40S 48 GB GPUs, of which only 2 are used by default) to serve the model meta-llama/Llama-3.2-3B-Instruct that the quickstart spins up.

> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.

Verify that you have properly installed the NVIDIA Container Toolkit with the runtime of your choice.

```bash
# Podman
podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
# Docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```

#### OpenShift

- OpenShift - This quickstart was tested on OpenShift 4.18. Older versions may work but have not been tested.
- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
- No Service Mesh or Istio installation, as Istio CRDs will conflict with the gateway


## Software prerequisites -- Client Configuration

## Client Configuration

@@ -97,31 +43,38 @@ You can use the installer script that installs all the required dependencies. C
### Required credentials and configuration

- [llm-d-deployer GitHub repo – clone here](https://github.com/llm-d/llm-d-deployer.git)
- [ghcr.io Registry – credentials](https://github.com/settings/tokens): You must have a GitHub account and a "classic" personal access token with the `read:packages` scope in order to pull llm-d-deployer images.
- [Red Hat Registry – terms & access](https://access.redhat.com/registry/)
- [HuggingFace HF_TOKEN](https://huggingface.co/docs/hub/en/security-tokens) with download access for the model you want to use. By default the sample application will use [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct).

> ⚠️ Your Hugging Face account must have access to the model you want to use. You may need to visit Hugging Face [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and
> accept the usage terms if you have not already done so.
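
A minimal sketch of supplying the token, assuming the installer picks up the conventional `HF_TOKEN` environment variable (check the quickstart README for exactly how the token is consumed):

```bash
# Placeholder value -- substitute your own token. The variable name follows
# the Hugging Face convention; confirm how the installer expects to receive it.
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxxxxx"
```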

Registry Authentication: The installer looks for an auth file in:
### Target Platforms

```bash
~/.config/containers/auth.json
# or
~/.config/containers/config.json
```
Since the llm-d-deployer is based on Helm charts, llm-d can be deployed on a variety of Kubernetes platforms. As more platforms are supported, the installer will be updated to support them.

Documentation for example cluster setups is provided in the [infra](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra) directory of the llm-d-deployer repository.

If not found, you can create one with the following commands:
- [OpenShift on AWS](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra/openshift-aws.md)

Create with Docker:

#### Minikube

This can be run on a minimum EC2 node type of [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4x L40S 48 GB GPUs, of which only 2 are used by default) to serve the model meta-llama/Llama-3.2-3B-Instruct that the quickstart spins up.

> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.

Verify that you have properly installed the NVIDIA Container Toolkit with the runtime of your choice.

```bash
docker --config ~/.config/containers/ login ghcr.io
```

```bash
# Podman
podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
# Docker
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```
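
If the prefill and decode pods do stay Pending, one quick check is whether the cluster advertises any GPUs at all. This sketch assumes the standard `nvidia.com/gpu` resource name registered by the NVIDIA device plugin:

```bash
# An empty GPUS column means the device plugin has not registered any
# GPUs with the kubelet, so GPU-requesting pods will stay Pending.
kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```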

Create with Podman:
#### OpenShift

```bash
podman login ghcr.io --authfile ~/.config/containers/auth.json
```
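
To sanity-check that the credentials actually landed in the auth file, podman can echo back the stored login:

```bash
# Prints the user recorded for ghcr.io in the given auth file.
podman login --get-login ghcr.io --authfile ~/.config/containers/auth.json
```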
- OpenShift - This quickstart was tested on OpenShift 4.17. Older versions may work but have not been tested.
- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
- No Service Mesh or Istio installation, as Istio CRDs will conflict with the gateway
- Cluster administrator privileges are required to install the llm-d cluster-scoped resources; a quick permission check follows below.
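
A rough way to verify you have the needed rights before running the installer (these are standard `kubectl auth` probes, not llm-d-specific tooling):

```bash
# Cluster-scoped resources such as CRDs and ClusterRoles require
# cluster-admin-level permissions to create.
kubectl auth can-i create customresourcedefinitions
kubectl auth can-i create clusterroles
```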
8 changes: 4 additions & 4 deletions docs/guide/Installation/quickstart.md
@@ -59,7 +59,6 @@ The installer needs to be run from the `llm-d-deployer/quickstart` directory as

| Flag | Description | Example |
|--------------------------------------|---------------------------------------------------------------|------------------------------------------------------------------|
| `-a`, `--auth-file PATH` | Path to containers auth.json | `./llmd-installer.sh --auth-file ~/.config/containers/auth.json` |
| `-z`, `--storage-size SIZE` | Size of storage volume | `./llmd-installer.sh --storage-size 15Gi` |
| `-c`, `--storage-class CLASS` | Storage class to use (default: efs-sc) | `./llmd-installer.sh --storage-class ocs-storagecluster-cephfs` |
| `-n`, `--namespace NAME` | K8s namespace (default: llm-d) | `./llmd-installer.sh --namespace foo` |
@@ -102,6 +101,7 @@ gatewayClassName, and sits in front of your inference pods to handle path-based routing
and metrics. This example validates that the gateway itself is routing your completion requests correctly.
You can execute the [`test-request.sh`](https://github.com/llm-d/llm-d-deployer/blob/main/quickstart/test-request.sh) script in the quickstart folder to test on the cluster.
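
For reference, the request the script issues has roughly this shape. The gateway address, endpoint path, and payload here are illustrative assumptions based on the OpenAI-compatible API that vLLM exposes; `test-request.sh` resolves the real route on the cluster:

```bash
# Illustrative request only -- GATEWAY_ADDRESS is a placeholder; the
# script discovers the actual gateway endpoint for you.
curl -s http://GATEWAY_ADDRESS/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "meta-llama/Llama-3.2-3B-Instruct",
        "prompt": "Say hello",
        "max_tokens": 16
      }'
```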


> If you receive an error indicating PodSecurity "restricted" violations when running the smoke-test script, you
> need to remove the restrictive PodSecurity labels from the namespace. Once these labels are removed, re-run the
> script and it should proceed without PodSecurity errors.
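
A sketch of clearing those labels, assuming the default `llm-d` namespace and the standard Pod Security Admission label keys (a trailing `-` removes a label):

```bash
# Remove the Pod Security Admission labels from the namespace; the
# trailing "-" after each key deletes that label.
kubectl label namespace llm-d \
  pod-security.kubernetes.io/enforce- \
  pod-security.kubernetes.io/warn- \
  pod-security.kubernetes.io/audit-
```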
@@ -209,8 +209,8 @@ kubectl port-forward -n llm-d-monitoring --address 0.0.0.0 svc/prometheus-grafan

Access the UIs at:

- Prometheus: [http://YOUR_IP:9090](#)
- Grafana: [http://YOUR_IP:3000](#) (default credentials: admin/admin)
- Prometheus: \<http://YOUR_IP:9090\>
- Grafana: \<http://YOUR_IP:3000\> (default credentials: admin/admin)

##### Option 2: Ingress (Optional)

@@ -320,4 +320,4 @@ make a change, simply uninstall and then run the installer again with any change

```bash
./llmd-installer.sh --uninstall
```