From baa747d3cdd2a0e9f8957c27daeda73c6e8612fe Mon Sep 17 00:00:00 2001
From: Jay Rodge
Date: Tue, 5 Aug 2025 09:53:24 -0700
Subject: [PATCH 1/4] Add NVIDIA TensorRT-LLM optimization guide for GPT-OSS models

---
 articles/run-nvidia.md | 123 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)
 create mode 100644 articles/run-nvidia.md

diff --git a/articles/run-nvidia.md b/articles/run-nvidia.md
new file mode 100644
index 0000000000..fd6f278521
--- /dev/null
+++ b/articles/run-nvidia.md
@@ -0,0 +1,123 @@
+# Optimizing OpenAI GPT-OSS Models with NVIDIA TensorRT-LLM
+
+This notebook provides a step-by-step guide on how to optimize `gpt-oss` models using NVIDIA's TensorRT-LLM for high-performance inference. TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
+
+
+TensorRT-LLM supports both models:
+- `gpt-oss-20b`
+- `gpt-oss-120b`
+
+In this guide, we will run `gpt-oss-20b`. If you want to try the larger model or need more customization, refer to [this](https://github.com/NVIDIA/TensorRT-LLM/tree/main/docs/source/blogs/tech_blog) deployment guide.
+
+Note: In general, your input prompts must follow the [harmony response](http://cookbook.openai.com/articles/openai-harmony) format or the model will not function correctly; this guide, however, does not require you to format prompts manually.
+
+## Prerequisites
+
+### Hardware
+To run the 20B model and the TensorRT-LLM build process, you will need an NVIDIA GPU with at least 16 GB of VRAM.
+
+> Recommended GPUs: NVIDIA RTX 50 Series (e.g., RTX 5090), NVIDIA H100, or L40S.
+
+### Software
+- CUDA Toolkit 12.8 or later
+- Python 3.12 or later
+
+## Installing TensorRT-LLM
+
+There are various ways to install TensorRT-LLM. In this guide, we will use the pre-built Docker container from NVIDIA NGC and also show how to build it from source.
+
+## Using NGC
+
+Pull the pre-built TensorRT-LLM container for GPT-OSS from NVIDIA NGC.
+This is the easiest way to get started and ensures all dependencies are included.
+
+```bash
+docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev
+docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev
+```
+
+## Using Docker (build from source)
+
+Alternatively, you can build the TensorRT-LLM container from source.
+This is useful if you want to modify the source code or use a custom branch.
+
+See the official instructions here: https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker
+
+The following commands will install required dependencies, clone the repository,
+check out the GPT-OSS feature branch, and build the Docker container:
+
+```bash
+# Update package lists and install required system packages
+sudo apt-get update && sudo apt-get -y install git git-lfs build-essential cmake
+
+# Initialize Git LFS (Large File Storage) for handling large model files
+git lfs install
+
+# Clone the TensorRT-LLM repository
+git clone https://github.com/NVIDIA/TensorRT-LLM.git
+cd TensorRT-LLM
+
+# Check out the branch with GPT-OSS support
+git checkout feat/gpt-oss
+
+# Initialize and update submodules (required for build)
+git submodule update --init --recursive
+
+# Pull large files (e.g., model weights) managed by Git LFS
+git lfs pull
+
+# Build the release Docker image
+make -C docker release_build
+
+# Run the built Docker container
+make -C docker release_run
+```
+
+TensorRT-LLM will also be available through pip soon.
+
+> Note on GPU Architecture: The first time you run the model, TensorRT-LLM will build an optimized engine for your specific GPU architecture (e.g., Hopper, Ada, or Blackwell). If you see warnings about your GPU's CUDA capability (e.g., sm_90, sm_120) not being compatible with the PyTorch installation, ensure you have the latest NVIDIA drivers and a matching CUDA Toolkit version for your version of PyTorch.
+
+# Verifying TensorRT-LLM Installation
+
+```python
+from tensorrt_llm import LLM, SamplingParams
+```
+
+# Utilizing the TensorRT-LLM Python API
+
+In the next code cell, we will demonstrate how to use the TensorRT-LLM Python API to:
+1. Download the specified model weights from Hugging Face.
+2. Automatically build the TensorRT engine for your GPU architecture if it does not already exist.
+3. Load the model and prepare it for inference.
+4. Run a simple text generation example to verify everything is working.
+
+**Note**: The first run may take several minutes as it downloads the model and builds the engine.
+Subsequent runs will be much faster, as the engine will be cached.
+
+```python
+llm = LLM(model="openai/gpt-oss-20b")
+```
+
+```python
+prompts = ["Hello, my name is", "The capital of France is"]
+sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
+for output in llm.generate(prompts, sampling_params):
+    print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}")
+```
+
+# Conclusion and Next Steps
+Congratulations! You have successfully optimized and run a large language model using the TensorRT-LLM Python API.
+
+In this notebook, you have learned how to:
+- Set up your environment with the necessary dependencies.
+- Use the `tensorrt_llm.LLM` API to download a model from the Hugging Face Hub.
+- Automatically build a high-performance TensorRT engine tailored to your GPU.
+- Run inference with the optimized model.
+
+
+You can explore more advanced features to further improve performance and efficiency:
+
+- Benchmarking: Try running a [benchmark](https://nvidia.github.io/TensorRT-LLM/performance/performance-tuning-guide/benchmarking-default-performance.html#benchmarking-with-trtllm-bench) to compare the latency and throughput of the TensorRT-LLM engine against the original Hugging Face model. You can do this by iterating over a larger number of prompts and measuring the execution time, as in the quick sketch below.
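+
+  As a rough, informal illustration (not a substitute for `trtllm-bench`), you can time a batch of generations directly with the API used above. The prompt list and batch size below are placeholders; the sketch reuses the `llm` and `SamplingParams` objects created earlier in this guide:
+
+  ```python
+  import time
+
+  # Placeholder batch of prompts; substitute your own workload here.
+  prompts = ["Hello, my name is"] * 32
+  sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
+
+  # Wall-clock timing of a single batched generate() call.
+  start = time.perf_counter()
+  outputs = llm.generate(prompts, sampling_params)
+  elapsed = time.perf_counter() - start
+
+  print(f"Processed {len(prompts)} prompts in {elapsed:.2f} s ({len(prompts) / elapsed:.2f} prompts/s)")
+  ```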
+
+- Quantization: TensorRT-LLM [supports](https://github.com/NVIDIA/TensorRT-Model-Optimizer) various quantization techniques (like INT8 or FP8) to reduce model size and accelerate inference with minimal impact on accuracy. This is a powerful feature for deploying models on resource-constrained hardware.
+
+- Deploy with NVIDIA Dynamo: For production environments, you can deploy your TensorRT-LLM engine using [NVIDIA Dynamo](https://docs.nvidia.com/dynamo/latest/) for robust, scalable, and multi-model serving.
\ No newline at end of file

From 1f7a931beebe00e3843edb924629957bb9a91e46 Mon Sep 17 00:00:00 2001
From: Jay Rodge
Date: Tue, 5 Aug 2025 10:06:52 -0700
Subject: [PATCH 2/4] Convert NVIDIA TensorRT guide to Jupyter notebook format

---
 articles/run-nvidia.ipynb | 219 ++++++++++++++++++++++++++++++++++++++
 articles/run-nvidia.md    | 123 ---------------------
 authors.yaml              |   5 +
 registry.yaml             |   9 ++
 4 files changed, 233 insertions(+), 123 deletions(-)
 create mode 100644 articles/run-nvidia.ipynb
 delete mode 100644 articles/run-nvidia.md

diff --git a/articles/run-nvidia.ipynb b/articles/run-nvidia.ipynb
new file mode 100644
index 0000000000..47b983b70f
--- /dev/null
+++ b/articles/run-nvidia.ipynb
@@ -0,0 +1,219 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Optimizing OpenAI GPT-OSS Models with NVIDIA TensorRT-LLM"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This notebook provides a step-by-step guide on how to optimize `gpt-oss` models using NVIDIA's TensorRT-LLM for high-performance inference. TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.\n",
+    "\n",
+    "\n",
+    "TensorRT-LLM supports both models:\n",
+    "- `gpt-oss-20b`\n",
+    "- `gpt-oss-120b`\n",
+    "\n",
+    "In this guide, we will run `gpt-oss-20b`, if you want to try the larger model or want more customization refer to [this](https://github.com/NVIDIA/TensorRT-LLM/tree/main/docs/source/blogs/tech_blog) deployment guide."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Prerequisites" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Hardware\n", + "To run the 20B model and the TensorRT-LLM build process, you will need an NVIDIA GPU with at least 20 GB of VRAM.\n", + "\n", + "> Recommended GPUs: NVIDIA RTX 50 Series (e.g.RTX 5090), NVIDIA H100, or L40S.\n", + "\n", + "### Software\n", + "- CUDA Toolkit 12.8 or later\n", + "- Python 3.12 or later\n", + "- Access to the Orangina model checkpoint from Hugging Face" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Installling TensorRT-LLM" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Using NGC\n", + "\n", + "Pull the pre-built TensorRT-LLM container for GPT-OSS from NVIDIA NGC.\n", + "This is the easiest way to get started and ensures all dependencies are included.\n", + "\n", + "`docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n", + "`docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n", + "\n", + "## Using Docker (build from source)\n", + "\n", + "Alternatively, you can build the TensorRT-LLM container from source.\n", + "This is useful if you want to modify the source code or use a custom branch.\n", + "See the official instructions here: https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker\n", + "\n", + "The following commands will install required dependencies, clone the repository,\n", + "check out the GPT-OSS feature branch, and build the Docker container:\n", + " ```\n", + "#Update package lists and install required system packages\n", + "sudo apt-get update && sudo apt-get -y install git git-lfs build-essential cmake\n", + "\n", + "# Initialize Git LFS (Large File Storage) for handling large model files\n", + "git lfs install\n", + "\n", + "# Clone the TensorRT-LLM repository\n", + "git clone https://github.com/NVIDIA/TensorRT-LLM.git\n", + "cd TensorRT-LLM\n", + "\n", + "# Check out the branch with GPT-OSS support\n", + "git checkout feat/gpt-oss\n", + "\n", + "# Initialize and update submodules (required for build)\n", + "git submodule update --init --recursive\n", + "\n", + "# Pull large files (e.g., model weights) managed by Git LFS\n", + "git lfs pull\n", + "\n", + "# Build the release Docker image\n", + "make -C docker release_build\n", + "\n", + "# Run the built Docker container\n", + "make -C docker release_run \n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "TensorRT-LLM will be available through pip soon" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> Note on GPU Architecture: The first time you run the model, TensorRT-LLM will build an optimized engine for your specific GPU architecture (e.g., Hopper, Ada, or Blackwell). If you see warnings about your GPU's CUDA capability (e.g., sm_90, sm_120) not being compatible with the PyTorch installation, ensure you have the latest NVIDIA drivers and a matching CUDA Toolkit version for your version of PyTorch." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Verifying TensorRT-LLM Installation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from tensorrt_llm import LLM, SamplingParams" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Utilizing TensorRT-LLM Python API" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the next code cell, we will demonstrate how to use the TensorRT-LLM Python API to:\n", + "1. Download the specified model weights from Hugging Face (using your HF_TOKEN for authentication).\n", + "2. Automatically build the TensorRT engine for your GPU architecture if it does not already exist.\n", + "3. Load the model and prepare it for inference.\n", + "4. Run a simple text generation example to verify everything is working.\n", + "\n", + "**Note**: The first run may take several minutes as it downloads the model and builds the engine.\n", + "Subsequent runs will be much faster, as the engine will be cached." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "llm = LLM(model=\"openai/gpt-oss-20b\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "prompts = [\"Hello, my name is\", \"The capital of France is\"]\n", + "sampling_params = SamplingParams(temperature=0.8, top_p=0.95)\n", + "for output in llm.generate(prompts, sampling_params):\n", + " print(f\"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Conclusion and Next Steps\n", + "Congratulations! You have successfully optimized and run a large language model using the TensorRT-LLM Python API.\n", + "\n", + "In this notebook, you have learned how to:\n", + "- Set up your environment with the necessary dependencies.\n", + "- Use the `tensorrt_llm.LLM` API to download a model from the Hugging Face Hub.\n", + "- Automatically build a high-performance TensorRT engine tailored to your GPU.\n", + "- Run inference with the optimized model.\n", + "\n", + "\n", + "You can explore more advanced features to further improve performance and efficiency:\n", + "\n", + "- Benchmarking: Try running a [benchmark](https://nvidia.github.io/TensorRT-LLM/performance/performance-tuning-guide/benchmarking-default-performance.html#benchmarking-with-trtllm-bench) to compare the latency and throughput of the TensorRT-LLM engine against the original Hugging Face model. You can do this by iterating over a larger number of prompts and measuring the execution time.\n", + "\n", + "- Quantization: TensorRT-LLM [supports](https://github.com/NVIDIA/TensorRT-Model-Optimizer) various quantization techniques (like INT8 or FP8) to reduce model size and accelerate inference with minimal impact on accuracy. 
This is a powerful feature for deploying models on resource-constrained hardware.\n", + "\n", + "- Deploy with NVIDIA Dynamo: For production environments, you can deploy your TensorRT-LLM engine using the [NVIDIA Dynamo](https://docs.nvidia.com/dynamo/latest/) for robust, scalable, and multi-model serving.\n", + "\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/articles/run-nvidia.md b/articles/run-nvidia.md deleted file mode 100644 index fd6f278521..0000000000 --- a/articles/run-nvidia.md +++ /dev/null @@ -1,123 +0,0 @@ -# Optimizing OpenAI GPT-OSS Models with NVIDIA TensorRT-LLM - -This notebook provides a step-by-step guide on how to optimizing `gpt-oss` models using NVIDIA's TensorRT-LLM for high-performance inference. TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in performant way. - - -TensorRT-LLM supports both models: -- `gpt-oss-20b` -- `gpt-oss-120b` - -In this guide, we will run `gpt-oss-20b`, if you want to try the larger model or want more customization refer to [this](https://github.com/NVIDIA/TensorRT-LLM/tree/main/docs/source/blogs/tech_blog) deployment guide. - -Note: It’s important to ensure that your input prompts follow the [harmony response](http://cookbook.openai.com/articles/openai-harmony) format as the model will not function correctly otherwise, not needed in this guide. - -## Prerequisites - -### Hardware -To run the 20B model and the TensorRT-LLM build process, you will need an NVIDIA GPU with at least 16GB+ of VRAM. - -> Recommended GPUs: NVIDIA RTX 50 Series (e.g. RTX 5090), NVIDIA H100, or L40S. - -### Software -- CUDA Toolkit 12.8 or later -- Python 3.12 or later - -## Installling TensorRT-LLM - -There are various ways to install TensorRT-LLM, in this guide, we will using pre-built docker container from NVIDIA NGC and build it from source. - -## Using NGC - -Pull the pre-built TensorRT-LLM container for GPT-OSS from NVIDIA NGC. -This is the easiest way to get started and ensures all dependencies are included. - -```bash -docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev -docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev -``` - -## Using Docker (build from source) - -Alternatively, you can build the TensorRT-LLM container from source. -This is useful if you want to modify the source code or use a custom branch. 
-See the official instructions here: https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker - -The following commands will install required dependencies, clone the repository, -check out the GPT-OSS feature branch, and build the Docker container: - -```bash -#Update package lists and install required system packages -sudo apt-get update && sudo apt-get -y install git git-lfs build-essential cmake - -# Initialize Git LFS (Large File Storage) for handling large model files -git lfs install - -# Clone the TensorRT-LLM repository -git clone https://github.com/NVIDIA/TensorRT-LLM.git -cd TensorRT-LLM - -# Check out the branch with GPT-OSS support -git checkout feat/gpt-oss - -# Initialize and update submodules (required for build) -git submodule update --init --recursive - -# Pull large files (e.g., model weights) managed by Git LFS -git lfs pull - -# Build the release Docker image -make -C docker release_build - -# Run the built Docker container -make -C docker release_run -``` - -TensorRT-LLM will be available through pip soon - -> Note on GPU Architecture: The first time you run the model, TensorRT-LLM will build an optimized engine for your specific GPU architecture (e.g., Hopper, Ada, or Blackwell). If you see warnings about your GPU's CUDA capability (e.g., sm_90, sm_120) not being compatible with the PyTorch installation, ensure you have the latest NVIDIA drivers and a matching CUDA Toolkit version for your version of PyTorch. - -# Verifying TensorRT-LLM Installation - -```python -from tensorrt_llm import LLM, SamplingParams -``` - -# Utilizing TensorRT-LLM Python API - -In the next code cell, we will demonstrate how to use the TensorRT-LLM Python API to: -1. Downloads the specified model weights from Hugging Face -2. Automatically build the TensorRT engine for your GPU architecture if it does not already exist. -3. Load the model and prepare it for inference. -4. Run a simple text generation example to verify everything is working. - -**Note**: The first run may take several minutes as it downloads the model and builds the engine. -Subsequent runs will be much faster, as the engine will be cached. - -```python -llm = LLM(model="openai/gpt-oss-20b") -``` - -```python -prompts = ["Hello, my name is", "The capital of France is"] -sampling_params = SamplingParams(temperature=0.8, top_p=0.95) -for output in llm.generate(prompts, sampling_params): - print(f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}") -``` - -# Conclusion and Next Steps -Congratulations! You have successfully optimized and run a large language model using the TensorRT-LLM Python API. - -In this notebook, you have learned how to: -- Set up your environment with the necessary dependencies. -- Use the `tensorrt_llm.LLM` API to download a model from the Hugging Face Hub. -- Automatically build a high-performance TensorRT engine tailored to your GPU. -- Run inference with the optimized model. - - -You can explore more advanced features to further improve performance and efficiency: - -- Benchmarking: Try running a [benchmark](https://nvidia.github.io/TensorRT-LLM/performance/performance-tuning-guide/benchmarking-default-performance.html#benchmarking-with-trtllm-bench) to compare the latency and throughput of the TensorRT-LLM engine against the original Hugging Face model. You can do this by iterating over a larger number of prompts and measuring the execution time. 
- -- Quantization: TensorRT-LLM [supports](https://github.com/NVIDIA/TensorRT-Model-Optimizer) various quantization techniques (like INT8 or FP8) to reduce model size and accelerate inference with minimal impact on accuracy. This is a powerful feature for deploying models on resource-constrained hardware. - -- Deploy with NVIDIA Dynamo: For production environments, you can deploy your TensorRT-LLM engine using the [NVIDIA Dynamo](https://docs.nvidia.com/dynamo/latest/) for robust, scalable, and multi-model serving. \ No newline at end of file diff --git a/authors.yaml b/authors.yaml index d84aa18f87..901d2987dd 100644 --- a/authors.yaml +++ b/authors.yaml @@ -2,6 +2,11 @@ # You can optionally customize how your information shows up cookbook.openai.com over here. # If your information is not present here, it will be pulled from your GitHub profile. +jayrodge: + name: "Jay Rodge" + website: "https://www.linkedin.com/in/jayrodge/" + avatar: "https://developer-blogs.nvidia.com/wp-content/uploads/2024/05/Jay-Rodge.png" + rajpathak-openai: name: "Raj Pathak" website: "https://www.linkedin.com/in/rajpathakopenai/" diff --git a/registry.yaml b/registry.yaml index 7e9cf0b1b9..041d9ab6dc 100644 --- a/registry.yaml +++ b/registry.yaml @@ -4,6 +4,15 @@ # should build pages for, and indicates metadata such as tags, creation date and # authors for each page. +- title: Using NVIDIA TensorRT-LLM to run the 20B model + path: examples/articles/run-nvidia.ipynb + date: 2025-08-05 + authors: + - jayrodge + tags: + - nvidia + - tensorrt-llm + - title: Temporal Agents with Knowledge Graphs path: examples/partners/temporal_agents_with_knowledge_graphs/temporal_agents_with_knowledge_graphs.ipynb date: 2025-07-22 From c4f665d824dd482ddfafb5becf2df12df0890740 Mon Sep 17 00:00:00 2001 From: Jay Rodge Date: Tue, 5 Aug 2025 10:09:36 -0700 Subject: [PATCH 3/4] Update registry.yaml for NVIDIA notebook --- registry.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/registry.yaml b/registry.yaml index 041d9ab6dc..66f0b84eec 100644 --- a/registry.yaml +++ b/registry.yaml @@ -10,8 +10,8 @@ authors: - jayrodge tags: - - nvidia - - tensorrt-llm + - gpt-oss + - open-models - title: Temporal Agents with Knowledge Graphs path: examples/partners/temporal_agents_with_knowledge_graphs/temporal_agents_with_knowledge_graphs.ipynb From b5b50b8b77228aca79bf61d850a548942d391f62 Mon Sep 17 00:00:00 2001 From: Jay Rodge Date: Tue, 5 Aug 2025 13:48:22 -0700 Subject: [PATCH 4/4] Improve guide formatting and add NVIDIA Brev integration --- articles/run-nvidia.ipynb | 74 +++++++++++++++++---------------------- registry.yaml | 3 +- 2 files changed, 33 insertions(+), 44 deletions(-) diff --git a/articles/run-nvidia.ipynb b/articles/run-nvidia.ipynb index 47b983b70f..7de45f035f 100644 --- a/articles/run-nvidia.ipynb +++ b/articles/run-nvidia.ipynb @@ -18,7 +18,21 @@ "- `gpt-oss-20b`\n", "- `gpt-oss-120b`\n", "\n", - "In this guide, we will run `gpt-oss-20b`, if you want to try the larger model or want more customization refer to [this](https://github.com/NVIDIA/TensorRT-LLM/tree/main/docs/source/blogs/tech_blog) deployment guide." 
+ "In this guide, we will run `gpt-oss-20b`, if you want to try the larger model or want more customization refer to [this](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/tech_blog/blog9_Deploying_GPT_OSS_on_TRTLLM.md) deployment guide.\n", + "\n", + "Note: Your input prompts should use the [harmony response](http://cookbook.openai.com/articles/openai-harmony) format for the model to work properly, though this guide does not require it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Launch on NVIDIA Brev\n", + "You can simplify the environment setup by using [NVIDIA Brev](https://developer.nvidia.com/brev). Click the button below to launch this project on a Brev instance with the necessary dependencies pre-configured.\n", + "\n", + "Once deployed, click on the \"Open Notebook\" button to get start with this guide\n", + "\n", + "[![Launch on Brev](https://brev-assets.s3.us-west-1.amazonaws.com/nv-lb-dark.svg)](https://brev.nvidia.com/launchable/deploy?launchableID=env-30i1YjHsRWT109HL6eYxLUeHIwF)" ] }, { @@ -33,69 +47,45 @@ "metadata": {}, "source": [ "### Hardware\n", - "To run the 20B model and the TensorRT-LLM build process, you will need an NVIDIA GPU with at least 20 GB of VRAM.\n", + "To run the gpt-oss-20b model, you will need an NVIDIA GPU with at least 20 GB of VRAM.\n", "\n", - "> Recommended GPUs: NVIDIA RTX 50 Series (e.g.RTX 5090), NVIDIA H100, or L40S.\n", + "Recommended GPUs: NVIDIA Hopper (e.g., H100, H200), NVIDIA Blackwell (e.g., B100, B200), NVIDIA RTX PRO, NVIDIA RTX 50 Series (e.g., RTX 5090).\n", "\n", "### Software\n", "- CUDA Toolkit 12.8 or later\n", - "- Python 3.12 or later\n", - "- Access to the Orangina model checkpoint from Hugging Face" + "- Python 3.12 or later" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Installling TensorRT-LLM" + "## Installing TensorRT-LLM\n", + "\n", + "There are multiple ways to install TensorRT-LLM. In this guide, we'll cover using a pre-built Docker container from NVIDIA NGC as well as building from source.\n", + "\n", + "If you're using NVIDIA Brev, you can skip this section." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Using NGC\n", + "## Using NVIDIA NGC\n", "\n", - "Pull the pre-built TensorRT-LLM container for GPT-OSS from NVIDIA NGC.\n", + "Pull the pre-built [TensorRT-LLM container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags) for GPT-OSS from [NVIDIA NGC](https://www.nvidia.com/en-us/gpu-cloud/).\n", "This is the easiest way to get started and ensures all dependencies are included.\n", "\n", - "`docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n", - "`docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n", + "```bash\n", + "docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev\n", + "docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev\n", + "```\n", "\n", - "## Using Docker (build from source)\n", + "## Using Docker (Build from Source)\n", "\n", "Alternatively, you can build the TensorRT-LLM container from source.\n", - "This is useful if you want to modify the source code or use a custom branch.\n", - "See the official instructions here: https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker\n", - "\n", - "The following commands will install required dependencies, clone the repository,\n", - "check out the GPT-OSS feature branch, and build the Docker container:\n", - " ```\n", - "#Update package lists and install required system packages\n", - "sudo apt-get update && sudo apt-get -y install git git-lfs build-essential cmake\n", - "\n", - "# Initialize Git LFS (Large File Storage) for handling large model files\n", - "git lfs install\n", - "\n", - "# Clone the TensorRT-LLM repository\n", - "git clone https://github.com/NVIDIA/TensorRT-LLM.git\n", - "cd TensorRT-LLM\n", - "\n", - "# Check out the branch with GPT-OSS support\n", - "git checkout feat/gpt-oss\n", - "\n", - "# Initialize and update submodules (required for build)\n", - "git submodule update --init --recursive\n", - "\n", - "# Pull large files (e.g., model weights) managed by Git LFS\n", - "git lfs pull\n", - "\n", - "# Build the release Docker image\n", - "make -C docker release_build\n", - "\n", - "# Run the built Docker container\n", - "make -C docker release_run \n", - "```" + "This approach is useful if you want to modify the source code or use a custom branch.\n", + "For detailed instructions, see the [official documentation](https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker)." ] }, { diff --git a/registry.yaml b/registry.yaml index d6eb0bc184..e2cbd25578 100644 --- a/registry.yaml +++ b/registry.yaml @@ -4,8 +4,7 @@ # should build pages for, and indicates metadata such as tags, creation date and # authors for each page. - -- title: Using NVIDIA TensorRT-LLM to run the 20B model +- title: Using NVIDIA TensorRT-LLM to run gpt-oss-20b path: examples/articles/run-nvidia.ipynb date: 2025-08-05 authors: