diff --git a/articles/run-nvidia.ipynb b/articles/run-nvidia.ipynb
index 47b983b70f..7de45f035f 100644
--- a/articles/run-nvidia.ipynb
+++ b/articles/run-nvidia.ipynb
@@ -18,7 +18,21 @@
     "- `gpt-oss-20b`\n",
     "- `gpt-oss-120b`\n",
     "\n",
-    "In this guide, we will run `gpt-oss-20b`, if you want to try the larger model or want more customization refer to [this](https://github.com/NVIDIA/TensorRT-LLM/tree/main/docs/source/blogs/tech_blog) deployment guide."
+    "In this guide, we will run `gpt-oss-20b`. If you want to try the larger model or want more customization, refer to [this](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/tech_blog/blog9_Deploying_GPT_OSS_on_TRTLLM.md) deployment guide.\n",
+    "\n",
+    "Note: Your input prompts should use the [harmony response](http://cookbook.openai.com/articles/openai-harmony) format for the model to work properly, though this guide does not require it."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Launch on NVIDIA Brev\n",
+    "You can simplify the environment setup by using [NVIDIA Brev](https://developer.nvidia.com/brev). Click the button below to launch this project on a Brev instance with the necessary dependencies pre-configured.\n",
+    "\n",
+    "Once deployed, click the \"Open Notebook\" button to get started with this guide.\n",
+    "\n",
+    "[![Launch on Brev](https://brev-assets.s3.us-west-1.amazonaws.com/nv-lb-dark.svg)](https://brev.nvidia.com/launchable/deploy?launchableID=env-30i1YjHsRWT109HL6eYxLUeHIwF)"
    ]
   },
   {
@@ -33,69 +47,45 @@
    "metadata": {},
    "source": [
     "### Hardware\n",
-    "To run the 20B model and the TensorRT-LLM build process, you will need an NVIDIA GPU with at least 20 GB of VRAM.\n",
+    "To run the gpt-oss-20b model, you will need an NVIDIA GPU with at least 20 GB of VRAM.\n",
     "\n",
-    "> Recommended GPUs: NVIDIA RTX 50 Series (e.g.RTX 5090), NVIDIA H100, or L40S.\n",
+    "Recommended GPUs: NVIDIA Hopper (e.g., H100, H200), NVIDIA Blackwell (e.g., B100, B200), NVIDIA RTX PRO, NVIDIA RTX 50 Series (e.g., RTX 5090).\n",
     "\n",
     "### Software\n",
     "- CUDA Toolkit 12.8 or later\n",
-    "- Python 3.12 or later\n",
-    "- Access to the Orangina model checkpoint from Hugging Face"
+    "- Python 3.12 or later"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Installling TensorRT-LLM"
+    "## Installing TensorRT-LLM\n",
+    "\n",
+    "There are multiple ways to install TensorRT-LLM. In this guide, we'll cover using a pre-built Docker container from NVIDIA NGC as well as building from source.\n",
+    "\n",
+    "If you're using NVIDIA Brev, you can skip this section."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Using NGC\n",
+    "## Using NVIDIA NGC\n",
     "\n",
-    "Pull the pre-built TensorRT-LLM container for GPT-OSS from NVIDIA NGC.\n",
+    "Pull the pre-built [TensorRT-LLM container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags) for GPT-OSS from [NVIDIA NGC](https://www.nvidia.com/en-us/gpu-cloud/).\n",
     "This is the easiest way to get started and ensures all dependencies are included.\n",
     "\n",
-    "`docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n",
-    "`docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n",
+    "```bash\n",
+    "docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev\n",
+    "docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev\n",
+    "```\n",
     "\n",
-    "## Using Docker (build from source)\n",
+    "## Using Docker (Build from Source)\n",
     "\n",
     "Alternatively, you can build the TensorRT-LLM container from source.\n",
-    "This is useful if you want to modify the source code or use a custom branch.\n",
-    "See the official instructions here: https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker\n",
-    "\n",
-    "The following commands will install required dependencies, clone the repository,\n",
-    "check out the GPT-OSS feature branch, and build the Docker container:\n",
-    " ```\n",
-    "#Update package lists and install required system packages\n",
-    "sudo apt-get update && sudo apt-get -y install git git-lfs build-essential cmake\n",
-    "\n",
-    "# Initialize Git LFS (Large File Storage) for handling large model files\n",
-    "git lfs install\n",
-    "\n",
-    "# Clone the TensorRT-LLM repository\n",
-    "git clone https://github.com/NVIDIA/TensorRT-LLM.git\n",
-    "cd TensorRT-LLM\n",
-    "\n",
-    "# Check out the branch with GPT-OSS support\n",
-    "git checkout feat/gpt-oss\n",
-    "\n",
-    "# Initialize and update submodules (required for build)\n",
-    "git submodule update --init --recursive\n",
-    "\n",
-    "# Pull large files (e.g., model weights) managed by Git LFS\n",
-    "git lfs pull\n",
-    "\n",
-    "# Build the release Docker image\n",
-    "make -C docker release_build\n",
-    "\n",
-    "# Run the built Docker container\n",
-    "make -C docker release_run \n",
-    "```"
+    "This approach is useful if you want to modify the source code or use a custom branch.\n",
+    "For detailed instructions, see the [official documentation](https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker)."
    ]
   },
   {
diff --git a/registry.yaml b/registry.yaml
index 2bce250b8f..c7db22961b 100644
--- a/registry.yaml
+++ b/registry.yaml
@@ -4,8 +4,7 @@
 # should build pages for, and indicates metadata such as tags, creation date and
 # authors for each page.
 
-
-- title: Using NVIDIA TensorRT-LLM to run the 20B model
+- title: Using NVIDIA TensorRT-LLM to run gpt-oss-20b
   path: articles/run-nvidia.ipynb
   date: 2025-08-05
   authors: