diff --git a/docs/architecture/Component Architecture/01_deployer.md b/docs/architecture/Component Architecture/01_deployer.md
deleted file mode 100644
index ba99eca..0000000
--- a/docs/architecture/Component Architecture/01_deployer.md
+++ /dev/null
@@ -1,9 +0,0 @@
----
-sidebar_position: 1
----
-
-# Deployer Architecture
-
-The first component ofin llm-d's toolbox is the **llm-d deployer**, a Helm chart for deploying llm-d on Kubernetes.
-
-[llm-d-deployer repository](https://github.com/llm-d/llm-d-deployer)
\ No newline at end of file
diff --git a/docs/architecture/Components/01_deployer.md b/docs/architecture/Components/01_deployer.md
new file mode 100644
index 0000000..b42df5a
--- /dev/null
+++ b/docs/architecture/Components/01_deployer.md
@@ -0,0 +1,9 @@
+---
+sidebar_position: 1
+---
+
+# Deployer
+
+A key component in llm-d's toolbox is the **llm-d deployer**, the Helm chart for deploying llm-d on Kubernetes.
+
+[llm-d-deployer repository](https://github.com/llm-d/llm-d-deployer)
\ No newline at end of file
diff --git a/docs/architecture/Component Architecture/02_inf-simulator.md b/docs/architecture/Components/02_inf-simulator.md
similarity index 100%
rename from docs/architecture/Component Architecture/02_inf-simulator.md
rename to docs/architecture/Components/02_inf-simulator.md
diff --git a/docs/architecture/Component Architecture/03_inf-scheduler.md b/docs/architecture/Components/03_inf-scheduler.md
similarity index 100%
rename from docs/architecture/Component Architecture/03_inf-scheduler.md
rename to docs/architecture/Components/03_inf-scheduler.md
diff --git a/docs/architecture/Component Architecture/04_disagg_prefill-decode.md b/docs/architecture/Components/04_disagg_prefill-decode.md
similarity index 100%
rename from docs/architecture/Component Architecture/04_disagg_prefill-decode.md
rename to docs/architecture/Components/04_disagg_prefill-decode.md
diff --git a/docs/architecture/Component Architecture/05_routing-sidecar.md b/docs/architecture/Components/05_routing-sidecar.md
similarity index 100%
rename from docs/architecture/Component Architecture/05_routing-sidecar.md
rename to docs/architecture/Components/05_routing-sidecar.md
diff --git a/docs/guide/Installation/prerequisites.md b/docs/guide/Installation/prerequisites.md
index 051991a..87be122 100644
--- a/docs/guide/Installation/prerequisites.md
+++ b/docs/guide/Installation/prerequisites.md
@@ -5,60 +5,6 @@ sidebar_label: Prerequisites
 
 # Prerequisites for running the llm-d QuickStart
 
-### Target Platforms
-
-Since the llm-d-deployer is based on helm charts, llm-d can be deployed on a variety of Kubernetes platforms. As more platforms are supported, the installer will be updated to support them.
-
-Documentation for example cluster setups are provided in the [infra](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra) directory of the llm-d-deployer repository.
-
-- [OpenShift on AWS](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra/openshift-aws.md)
-
-#### Minikube
-
-This can be run on a minimum ec2 node type [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4xL40S 48GB but only 2 are used by default) to infer the model meta-llama/Llama-3.2-3B-Instruct that will get spun up.
-
-> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.
-
-Verify you have properly installed the container toolkit with the runtime of your choice.
-
-```bash
-# Podman
-podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
-# Docker
-sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
-```
-
-#### OpenShift
-
-- OpenShift - This quickstart was tested on OpenShift 4.17. Older versions may work but have not been tested.
-- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
-- NO Service Mesh or Istio installation as Istio CRDs will conflict with the gateway
-- Cluster administrator privileges are required to install the llm-d cluster scoped resources
-
-
-#### Kubernetes
-
-This can be run on a minimum ec2 node type [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4xL40S 48GB but only 2 are used by default) to infer the model meta-llama/Llama-3.2-3B-Instruct that will get spun up.
-
-> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.
-
-Verify you have properly installed the container toolkit with the runtime of your choice.
-
-```bash
-# Podman
-podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
-# Docker
-sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
-```
-
-#### OpenShift
-
-- OpenShift - This quickstart was tested on OpenShift 4.18. Older versions may work but have not been tested.
-- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
-- NO Service Mesh or Istio installation as Istio CRDs will conflict with the gateway
-
-
-## Software prerequisites
-- Client Configuration
 
 ## Client Configuration
 
@@ -97,31 +43,38 @@ You can use the installer script that installs all the required dependencies. C
 
 ### Required credentials and configuration
 
 - [llm-d-deployer GitHub repo – clone here](https://github.com/llm-d/llm-d-deployer.git)
-- [ghcr.io Registry – credentials](https://github.com/settings/tokens) You must have a GitHub account and a "classic" personal access token with `read:packages` access to the llm-d-deployer repository.
-- [Red Hat Registry – terms & access](https://access.redhat.com/registry/)
 - [HuggingFace HF_TOKEN](https://huggingface.co/docs/hub/en/security-tokens) with download access for the model you want to use. By default the sample application will use [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct).
 
 > ⚠️ Your Hugging Face account must have access to the model you want to use. You may need to visit Hugging Face [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and
 > accept the usage terms if you have not already done so.
 
-Registry Authentication: The installer looks for an auth file in:
+### Target Platforms
-```bash
-~/.config/containers/auth.json
-# or
-~/.config/containers/config.json
-```
+Since the llm-d-deployer is based on helm charts, llm-d can be deployed on a variety of Kubernetes platforms. As more platforms are supported, the installer will be updated to support them.
+
+Documentation for example cluster setups are provided in the [infra](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra) directory of the llm-d-deployer repository.
-If not found, you can create one with the following commands:
+- [OpenShift on AWS](https://github.com/llm-d/llm-d-deployer/tree/main/quickstart/infra/openshift-aws.md)
-Create with Docker:
+
+#### Minikube
+
+This can be run on a minimum ec2 node type [g6e.12xlarge](https://aws.amazon.com/ec2/instance-types/g6e/) (4xL40S 48GB but only 2 are used by default) to infer the model meta-llama/Llama-3.2-3B-Instruct that will get spun up.
+
+> ⚠️ If your cluster has no available GPUs, the **prefill** and **decode** pods will remain in **Pending** state.
+
+Verify you have properly installed the container toolkit with the runtime of your choice.
 ```bash
-docker --config ~/.config/containers/ login ghcr.io
+# Podman
+podman run --rm --security-opt=label=disable --device=nvidia.com/gpu=all ubuntu nvidia-smi
+# Docker
+sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
 ```
-Create with Podman:
 
 #### OpenShift
 
-```bash
-podman login ghcr.io --authfile ~/.config/containers/auth.json
-```
\ No newline at end of file
+- OpenShift - This quickstart was tested on OpenShift 4.17. Older versions may work but have not been tested.
+- NVIDIA GPU Operator and NFD Operator - The installation instructions can be found [here](https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/steps-overview.html).
+- NO Service Mesh or Istio installation as Istio CRDs will conflict with the gateway
+- Cluster administrator privileges are required to install the llm-d cluster scoped resources
diff --git a/docs/guide/Installation/quickstart.md b/docs/guide/Installation/quickstart.md
index 333b0b4..7acd489 100644
--- a/docs/guide/Installation/quickstart.md
+++ b/docs/guide/Installation/quickstart.md
@@ -59,7 +59,6 @@ The installer needs to be run from the `llm-d-deployer/quickstart` directory as
 
 | Flag | Description | Example |
 |--------------------------------------|---------------------------------------------------------------|------------------------------------------------------------------|
-| `-a`, `--auth-file PATH` | Path to containers auth.json | `./llmd-installer.sh --auth-file ~/.config/containers/auth.json` |
 | `-z`, `--storage-size SIZE` | Size of storage volume | `./llmd-installer.sh --storage-size 15Gi` |
 | `-c`, `--storage-class CLASS` | Storage class to use (default: efs-sc) | `./llmd-installer.sh --storage-class ocs-storagecluster-cephfs` |
 | `-n`, `--namespace NAME` | K8s namespace (default: llm-d) | `./llmd-installer.sh --namespace foo` |
@@ -102,6 +101,7 @@ gatewayClassName, and sits in front of your inference pods to handle path-based
 and metrics. This example validates that the gateway itself is routing your completion requests correctly. You can
 execute the [`test-request.sh`](https://github.com/llm-d/llm-d-deployer/blob/main/quickstart/test-request.sh) script in the quickstart folder to test on the cluster.
 
+
 > If you receive an error indicating PodSecurity "restricted" violations when running the smoke-test script, you
 > need to remove the restrictive PodSecurity labels from the namespace. Once these labels are removed, re-run the
 > script and it should proceed without PodSecurity errors.
@@ -209,8 +209,8 @@ kubectl port-forward -n llm-d-monitoring --address 0.0.0.0 svc/prometheus-grafan
 
 Access the UIs at:
 
-- Prometheus: [http://YOUR_IP:9090](#)
-- Grafana: [http://YOUR_IP:3000](#) (default credentials: admin/admin)
+- Prometheus: \<http://YOUR_IP:9090>
+- Grafana: \<http://YOUR_IP:3000> (default credentials: admin/admin)
 
 ##### Option 2: Ingress (Optional)
 
@@ -320,4 +320,4 @@ make a change, simply uninstall and then run the installer again with any change
 
 ```bash
 ./llmd-installer.sh --uninstall
-```
+```
\ No newline at end of file
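
For reviewers who want to exercise the updated flow end to end, here is a minimal sketch assembled from the commands and flags referenced in the docs above. It is only a sketch, not an official script: it assumes the installer reads the Hugging Face token from an `HF_TOKEN` environment variable and that `test-request.sh` runs without arguments; check the installer flag table and the quickstart README for the exact invocation on your version.

```bash
# Minimal quickstart sketch based on the flags documented in the diff above;
# verify against the llm-d-deployer quickstart README before relying on it.

# Clone the deployer and enter the quickstart directory (the installer must run from here).
git clone https://github.com/llm-d/llm-d-deployer.git
cd llm-d-deployer/quickstart

# Hugging Face token with access to the default model (meta-llama/Llama-3.2-3B-Instruct).
# Assumption: the installer picks the token up from this environment variable.
export HF_TOKEN="<your-hugging-face-token>"

# Install into the default namespace with an explicit storage class and size,
# using the flags from the installer flag table.
./llmd-installer.sh --namespace llm-d --storage-class efs-sc --storage-size 15Gi

# Smoke-test gateway routing with the bundled script (assumed to need no arguments).
./test-request.sh

# Tear everything down before re-running the installer with different options.
./llmd-installer.sh --uninstall
```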