From 77697c97a79cefe6b63f34362035481d26f4df5b Mon Sep 17 00:00:00 2001
From: leo-gan
Date: Fri, 23 Feb 2024 15:02:45 -0800
Subject: [PATCH 1/6] updated provider page

---
 docs/docs/integrations/llms/nvidia_trt.ipynb | 98 ++++++++++++++++++++
 docs/docs/integrations/providers/nvidia.mdx  | 77 ++++++++++++---
 2 files changed, 160 insertions(+), 15 deletions(-)
 create mode 100644 docs/docs/integrations/llms/nvidia_trt.ipynb

diff --git a/docs/docs/integrations/llms/nvidia_trt.ipynb b/docs/docs/integrations/llms/nvidia_trt.ipynb
new file mode 100644
index 0000000000000..63815354ba971
--- /dev/null
+++ b/docs/docs/integrations/llms/nvidia_trt.ipynb
@@ -0,0 +1,98 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "b56b221d",
+   "metadata": {},
+   "source": [
+    "# NVIDIA Triton Inference Server\n",
+    "\n",
+    ">[NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server) is an inference server that provides API-style access to hosted LLMs. Likewise, `NVIDIA TensorRT-LLM`, often abbreviated as `TRT-LLM`, is a GPU-accelerated SDK for running optimizations and inference on LLMs. This connector allows LangChain to interact remotely with a Triton Inference Server over gRPC or HTTP to perform accelerated inference operations.\n",
+    "\n",
+    "[Triton Inference Server GitHub](https://github.com/triton-inference-server/server)\n",
+    "\n",
+    "\n",
+    "## TritonTensorRTLLM\n",
+    "\n",
+    "This example goes over how to use LangChain to interact with `TritonTensorRT` LLMs. To install the integration package, run the following command:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "59c710c4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# install package\n",
+    "%pip install -U langchain-nvidia-trt"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0ee90032",
+   "metadata": {},
+   "source": [
+    "## Create the Triton+TRT-LLM instance\n",
+    "\n",
+    "Remember that a Triton instance represents a running server, so ensure you have a valid server configuration running and change `localhost:8001` to the correct IP/hostname:port combination for your server.\n",
+    "\n",
+    "An example of setting up this environment can be found in NVIDIA's [GenerativeAIExamples GitHub repo](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/RetrievalAugmentedGeneration)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "035dea0f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain_core.prompts import PromptTemplate\n",
+    "from langchain_nvidia_trt.llms import TritonTensorRTLLM\n",
+    "\n",
+    "template = \"\"\"Question: {question}\n",
+    "\n",
+    "Answer: Let's think step by step.\"\"\"\n",
+    "\n",
+    "prompt = PromptTemplate.from_template(template)\n",
+    "\n",
+    "# Connect to the TRT-LLM Llama-2 model running on the Triton server at the url below\n",
+    "triton_llm = TritonTensorRTLLM(\n",
+    "    server_url=\"localhost:8001\", model_name=\"ensemble\", tokens=500\n",
+    ")\n",
+    "\n",
+    "chain = prompt | triton_llm\n",
+    "\n",
+    "chain.invoke({\"question\": \"What is LangChain?\"})"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  },
+  "vscode": {
+   "interpreter": {
"hash": "e971737741ff4ec9aff7dc6155a1060a59a8a6d52c757dbbe66bf8ee389494b1" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/docs/integrations/providers/nvidia.mdx b/docs/docs/integrations/providers/nvidia.mdx index c00eea6416024..5397e9488e303 100644 --- a/docs/docs/integrations/providers/nvidia.mdx +++ b/docs/docs/integrations/providers/nvidia.mdx @@ -1,18 +1,28 @@ # NVIDIA -> [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for NVIDIA AI Foundation Models like Mixtral 8x7B, Llama 2, Stable Diffusion, etc. These models, hosted on the [NVIDIA NGC catalog](https://catalog.ngc.nvidia.com/ai-foundation-models), are optimized, tested, and hosted on the NVIDIA AI platform, making them fast and easy to evaluate, further customize, and seamlessly run at peak performance on any accelerated stack. -> -> With [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), you can get quick results from a fully accelerated stack running on [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud/). Once customized, these models can be deployed anywhere with enterprise-grade security, stability, and support using [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/). +>NVIDIA provides two integration packages for LangChain: `langchain-nvidia-ai-endpoints` and `langchain-nvidia-trt`. + +## NVIDIA AI Foundation Endpoints + +> [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for +> NVIDIA AI Foundation Models like `Mixtral 8x7B`, `Llama 2`, `Stable Diffusion`, etc. These models, +> hosted on the [NVIDIA NGC catalog](https://catalog.ngc.nvidia.com/ai-foundation-models), are optimized, tested, and hosted on +> the NVIDIA AI platform, making them fast and easy to evaluate, further customize, +> and seamlessly run at peak performance on any accelerated stack. > -> These models can be easily accessed via the [`langchain-nvidia-ai-endpoints`](https://pypi.org/project/langchain-nvidia-ai-endpoints/) package, as shown below. +> With [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), you can get quick results from a fully +> accelerated stack running on [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud/). Once customized, these +> models can be deployed anywhere with enterprise-grade security, stability, +> and support using [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/). -## Installation +A selection of NVIDIA AI Foundation models is supported directly in LangChain with familiar APIs. -```bash -pip install -U langchain-nvidia-ai-endpoints -``` +The supported models can be found [in NGC](https://catalog.ngc.nvidia.com/ai-foundation-models). -## Setup and Authentication +These models can be accessed via the [`langchain-nvidia-ai-endpoints`](https://pypi.org/project/langchain-nvidia-ai-endpoints/) +package, as shown below. + +### Setting up - Create a free [NVIDIA NGC](https://catalog.ngc.nvidia.com/) account. - Navigate to `Catalog > AI Foundation Models > (Model with API endpoint)`. 
@@ -22,6 +32,16 @@ pip install -U langchain-nvidia-ai-endpoints export NVIDIA_API_KEY=nvapi-XXXXXXXXXXXXXXXXXXXXXXXXXX ``` +- Install a package: + +```bash +pip install -U langchain-nvidia-ai-endpoints +``` + +### Chat models + +See a [usage example](/docs/integrations/chat/nvidia_ai_endpoints). + ```python from langchain_nvidia_ai_endpoints import ChatNVIDIA @@ -30,12 +50,39 @@ result = llm.invoke("Write a ballad about LangChain.") print(result.content) ``` -## Using NVIDIA AI Foundation Endpoints +### Embedding models + +See a [usage example](/docs/integrations/text_embedding/nvidia_ai_endpoints). + +```python +from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings +``` + -A selection of NVIDIA AI Foundation models are supported directly in LangChain with familiar APIs. +## NVIDIA Triton Inference Server -The active models which are supported can be found [in NGC](https://catalog.ngc.nvidia.com/ai-foundation-models). +>[NVIDIA Triton™ Inference Server](https://developer.nvidia.com/triton-inference-server), +> part of the `NVIDIA AI` platform and available with `NVIDIA AI Enterprise`, is +> open-source software that standardizes AI model deployment and execution across every workload. + +### Setting up -**The following may be useful examples to help you get started:** -- **[`ChatNVIDIA` Model](/docs/integrations/chat/nvidia_ai_endpoints).** -- **[`NVIDIAEmbeddings` Model for RAG Workflows](/docs/integrations/text_embedding/nvidia_ai_endpoints).** +See the installation guide for [NVIDIA Triton Inference Server](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver). + +See the [client package documentation](https://github.com/triton-inference-server/client). + +- Install a package: + +```bash +pip install tritonclient +pip install -U langchain-nvidia-trt +``` + +### LLMs + +See a [usage example](/docs/integrations/llms/nvidia_trt). + + +```python +from langchain_nvidia_trt import TritonTensorRTLLM +``` From 16cf4130e76d53bf04572e95da4cca4cbf092c2d Mon Sep 17 00:00:00 2001 From: leo-gan Date: Fri, 23 Feb 2024 19:57:31 -0800 Subject: [PATCH 2/6] rolled back nvidia_trt artifacts --- docs/docs/integrations/llms/nvidia_trt.ipynb | 98 -------------------- docs/docs/integrations/providers/nvidia.mdx | 31 +------ 2 files changed, 1 insertion(+), 128 deletions(-) delete mode 100644 docs/docs/integrations/llms/nvidia_trt.ipynb diff --git a/docs/docs/integrations/llms/nvidia_trt.ipynb b/docs/docs/integrations/llms/nvidia_trt.ipynb deleted file mode 100644 index 63815354ba971..0000000000000 --- a/docs/docs/integrations/llms/nvidia_trt.ipynb +++ /dev/null @@ -1,98 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "b56b221d", - "metadata": {}, - "source": [ - "# NVIDIA Triton Inference Server\n", - "\n", - ">[NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server) is an inference server that provides an API style access to hosted LLM models. Likewise, `Nvidia TensorRT-LLM`, often abbreviated as `TRT-LLM`, is a GPU-accelerated SDK for running optimizations and inference on LLM models. This connector allows for Langchain to remotely interact with a Triton inference server over GRPC or HTTP to performance accelerated inference operations.\n", - "\n", - "[Triton Inference Server Github](https://github.com/triton-inference-server/server)\n", - "\n", - "\n", - "## TritonTensorRTLLM\n", - "\n", - "This example goes over how to use LangChain to interact with `TritonTensorRT` LLMs. 
To install, run the following command:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "59c710c4", - "metadata": {}, - "outputs": [], - "source": [ - "# install package\n", - "%pip install -U langchain-nvidia-trt" - ] - }, - { - "cell_type": "markdown", - "id": "0ee90032", - "metadata": {}, - "source": [ - "## Create the Triton+TRT-LLM instance\n", - "\n", - "Remember that a Triton instance represents a running server instance therefore you should ensure you have a valid server configuration running and change the `localhost:8001` to the correct IP/hostname:port combination for your server.\n", - "\n", - "An example of setting up this environment can be found at Nvidia's (GenerativeAIExamples Github Repo)[https://github.com/NVIDIA/GenerativeAIExamples/tree/main/RetrievalAugmentedGeneration]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "035dea0f", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "from langchain_core.prompts import PromptTemplate\n", - "from langchain_nvidia_trt.llms import TritonTensorRTLLM\n", - "\n", - "template = \"\"\"Question: {question}\n", - "\n", - "Answer: Let's think step by step.\"\"\"\n", - "\n", - "prompt = PromptTemplate.from_template(template)\n", - "\n", - "# Connect to the TRT-LLM Llama-2 model running on the Triton server at the url below\n", - "triton_llm = TritonTensorRTLLM(\n", - " server_url=\"localhost:8001\", model_name=\"ensemble\", tokens=500\n", - ")\n", - "\n", - "chain = prompt | triton_llm\n", - "\n", - "chain.invoke({\"question\": \"What is LangChain?\"})" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.12" - }, - "vscode": { - "interpreter": { - "hash": "e971737741ff4ec9aff7dc6155a1060a59a8a6d52c757dbbe66bf8ee389494b1" - } - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/docs/docs/integrations/providers/nvidia.mdx b/docs/docs/integrations/providers/nvidia.mdx index 5397e9488e303..0be21e38f7178 100644 --- a/docs/docs/integrations/providers/nvidia.mdx +++ b/docs/docs/integrations/providers/nvidia.mdx @@ -1,6 +1,6 @@ # NVIDIA ->NVIDIA provides two integration packages for LangChain: `langchain-nvidia-ai-endpoints` and `langchain-nvidia-trt`. +>NVIDIA provides an integration package for LangChain: `langchain-nvidia-ai-endpoints`. ## NVIDIA AI Foundation Endpoints @@ -57,32 +57,3 @@ See a [usage example](/docs/integrations/text_embedding/nvidia_ai_endpoints). ```python from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings ``` - - -## NVIDIA Triton Inference Server - ->[NVIDIA Triton™ Inference Server](https://developer.nvidia.com/triton-inference-server), -> part of the `NVIDIA AI` platform and available with `NVIDIA AI Enterprise`, is -> open-source software that standardizes AI model deployment and execution across every workload. - -### Setting up - -See the installation guide for [NVIDIA Triton Inference Server](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver). - -See the [client package documentation](https://github.com/triton-inference-server/client). 
-
-- Install a package:
-
-```bash
-pip install tritonclient
-pip install -U langchain-nvidia-trt
-```
-
-### LLMs
-
-See a [usage example](/docs/integrations/llms/nvidia_trt).
-
-
-```python
-from langchain_nvidia_trt import TritonTensorRTLLM
-```

From 996aec827f6cc3925e223b719a8ef3c83dfec475 Mon Sep 17 00:00:00 2001
From: leo-gan
Date: Fri, 23 Feb 2024 15:18:01 -0800
Subject: [PATCH 3/6] added the example notebook and link to it

---
 docs/docs/integrations/llms/nvidia_trt.ipynb | 96 ++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100644 docs/docs/integrations/llms/nvidia_trt.ipynb

diff --git a/docs/docs/integrations/llms/nvidia_trt.ipynb b/docs/docs/integrations/llms/nvidia_trt.ipynb
new file mode 100644
index 0000000000000..961f393372b2b
--- /dev/null
+++ b/docs/docs/integrations/llms/nvidia_trt.ipynb
@@ -0,0 +1,96 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "b56b221d",
+   "metadata": {},
+   "source": [
+    "# NVIDIA Triton Inference Server\n",
+    "\n",
+    ">[NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server) is an inference server that provides API-style access to hosted LLMs. Likewise, `NVIDIA TensorRT-LLM`, often abbreviated as `TRT-LLM`, is a GPU-accelerated SDK for running optimizations and inference on LLMs. This connector allows LangChain to interact remotely with a Triton Inference Server over gRPC or HTTP to perform accelerated inference operations.\n",
+    "\n",
+    "[Triton Inference Server GitHub](https://github.com/triton-inference-server/server)\n",
+    "\n",
+    "\n",
+    "## TritonTensorRTLLM\n",
+    "\n",
+    "This example goes over how to use LangChain to interact with `TritonTensorRT` LLMs. To install the integration package, run the following command:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "59c710c4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# install package\n",
+    "%pip install -U langchain-nvidia-trt"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0ee90032",
+   "metadata": {},
+   "source": [
+    "## Create the Triton+TRT-LLM instance\n",
+    "\n",
+    "Remember that a Triton instance represents a running server, so ensure you have a valid server configuration running and change `localhost:8001` to the correct IP/hostname:port combination for your server.\n",
+    "\n",
+    "An example of setting up this environment can be found in NVIDIA's [GenerativeAIExamples GitHub repo](https://github.com/NVIDIA/GenerativeAIExamples/tree/main/RetrievalAugmentedGeneration)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "035dea0f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from langchain_core.prompts import PromptTemplate\n",
+    "from langchain_nvidia_trt.llms import TritonTensorRTLLM\n",
+    "\n",
+    "template = \"\"\"Question: {question}\n",
+    "\n",
+    "Answer: Let's think step by step.\"\"\"\n",
+    "\n",
+    "prompt = PromptTemplate.from_template(template)\n",
+    "\n",
+    "# Connect to the TRT-LLM Llama-2 model running on the Triton server at the url below\n",
+    "triton_llm = TritonTensorRTLLM(server_url =\"localhost:8001\", model_name=\"ensemble\", tokens=500)\n",
+    "\n",
+    "chain = prompt | triton_llm \n",
+    "\n",
+    "chain.invoke({\"question\": \"What is LangChain?\"})"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype":
"text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + }, + "vscode": { + "interpreter": { + "hash": "e971737741ff4ec9aff7dc6155a1060a59a8a6d52c757dbbe66bf8ee389494b1" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From d6e411a7670b83dba0aa653f33c2429e8689f1d3 Mon Sep 17 00:00:00 2001 From: leo-gan Date: Fri, 23 Feb 2024 15:26:57 -0800 Subject: [PATCH 4/6] fixed format --- docs/docs/integrations/llms/nvidia_trt.ipynb | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/docs/integrations/llms/nvidia_trt.ipynb b/docs/docs/integrations/llms/nvidia_trt.ipynb index 961f393372b2b..d0943c6a33768 100644 --- a/docs/docs/integrations/llms/nvidia_trt.ipynb +++ b/docs/docs/integrations/llms/nvidia_trt.ipynb @@ -59,7 +59,9 @@ "prompt = PromptTemplate.from_template(template)\n", "\n", "# Connect to the TRT-LLM Llama-2 model running on the Triton server at the url below\n", - "triton_llm = TritonTensorRTLLM(server_url =\"localhost:8001\", model_name=\"ensemble\", tokens=500)\n", + "triton_llm = TritonTensorRTLLM(\n", + " server_url=\"localhost:8001\", model_name=\"ensemble\", tokens=500\n", + ")\n", "\n", "chain = prompt | triton_llm \n", "\n", From cecf8ffb5e87aec144694dc413fcd025cbe74eac Mon Sep 17 00:00:00 2001 From: leo-gan Date: Fri, 23 Feb 2024 15:28:15 -0800 Subject: [PATCH 5/6] fixed format --- docs/docs/integrations/llms/nvidia_trt.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/docs/integrations/llms/nvidia_trt.ipynb b/docs/docs/integrations/llms/nvidia_trt.ipynb index d0943c6a33768..63815354ba971 100644 --- a/docs/docs/integrations/llms/nvidia_trt.ipynb +++ b/docs/docs/integrations/llms/nvidia_trt.ipynb @@ -63,7 +63,7 @@ " server_url=\"localhost:8001\", model_name=\"ensemble\", tokens=500\n", ")\n", "\n", - "chain = prompt | triton_llm \n", + "chain = prompt | triton_llm\n", "\n", "chain.invoke({\"question\": \"What is LangChain?\"})" ] From 25e44a33b62495e322c0d0e9500a2fe4e6256071 Mon Sep 17 00:00:00 2001 From: leo-gan Date: Fri, 23 Feb 2024 20:04:21 -0800 Subject: [PATCH 6/6] rolled back nvidia_trt artifacts --- docs/docs/integrations/llms/nvidia_trt.ipynb | 98 -------------------- 1 file changed, 98 deletions(-) delete mode 100644 docs/docs/integrations/llms/nvidia_trt.ipynb diff --git a/docs/docs/integrations/llms/nvidia_trt.ipynb b/docs/docs/integrations/llms/nvidia_trt.ipynb deleted file mode 100644 index 63815354ba971..0000000000000 --- a/docs/docs/integrations/llms/nvidia_trt.ipynb +++ /dev/null @@ -1,98 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "b56b221d", - "metadata": {}, - "source": [ - "# NVIDIA Triton Inference Server\n", - "\n", - ">[NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server) is an inference server that provides an API style access to hosted LLM models. Likewise, `Nvidia TensorRT-LLM`, often abbreviated as `TRT-LLM`, is a GPU-accelerated SDK for running optimizations and inference on LLM models. This connector allows for Langchain to remotely interact with a Triton inference server over GRPC or HTTP to performance accelerated inference operations.\n", - "\n", - "[Triton Inference Server Github](https://github.com/triton-inference-server/server)\n", - "\n", - "\n", - "## TritonTensorRTLLM\n", - "\n", - "This example goes over how to use LangChain to interact with `TritonTensorRT` LLMs. 
To install, run the following command:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "59c710c4", - "metadata": {}, - "outputs": [], - "source": [ - "# install package\n", - "%pip install -U langchain-nvidia-trt" - ] - }, - { - "cell_type": "markdown", - "id": "0ee90032", - "metadata": {}, - "source": [ - "## Create the Triton+TRT-LLM instance\n", - "\n", - "Remember that a Triton instance represents a running server instance therefore you should ensure you have a valid server configuration running and change the `localhost:8001` to the correct IP/hostname:port combination for your server.\n", - "\n", - "An example of setting up this environment can be found at Nvidia's (GenerativeAIExamples Github Repo)[https://github.com/NVIDIA/GenerativeAIExamples/tree/main/RetrievalAugmentedGeneration]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "035dea0f", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "from langchain_core.prompts import PromptTemplate\n", - "from langchain_nvidia_trt.llms import TritonTensorRTLLM\n", - "\n", - "template = \"\"\"Question: {question}\n", - "\n", - "Answer: Let's think step by step.\"\"\"\n", - "\n", - "prompt = PromptTemplate.from_template(template)\n", - "\n", - "# Connect to the TRT-LLM Llama-2 model running on the Triton server at the url below\n", - "triton_llm = TritonTensorRTLLM(\n", - " server_url=\"localhost:8001\", model_name=\"ensemble\", tokens=500\n", - ")\n", - "\n", - "chain = prompt | triton_llm\n", - "\n", - "chain.invoke({\"question\": \"What is LangChain?\"})" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.12" - }, - "vscode": { - "interpreter": { - "hash": "e971737741ff4ec9aff7dc6155a1060a59a8a6d52c757dbbe66bf8ee389494b1" - } - } - }, - "nbformat": 4, - "nbformat_minor": 5 -}
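After the rollbacks in this series, the surviving provider page documents the `langchain-nvidia-ai-endpoints` package. Below is a minimal sketch of that workflow. It assumes the package is installed and an `NVIDIA_API_KEY` is exported as described in the provider page; the model names used here (`mixtral_8x7b`, `nvolveqa_40k`) are illustrative placeholders, so check the NGC catalog for the models currently available to your account.

```python
# Sketch only: assumes `pip install -U langchain-nvidia-ai-endpoints` and an
# exported NVIDIA_API_KEY, as described in the provider page above. Model names
# are illustrative; check the NGC catalog for currently available models.
from langchain_core.prompts import PromptTemplate
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# Chat model hosted on NVIDIA AI Foundation Endpoints.
llm = ChatNVIDIA(model="mixtral_8x7b")

prompt = PromptTemplate.from_template("Write a ballad about {topic}.")
chain = prompt | llm
print(chain.invoke({"topic": "LangChain"}).content)

# Embedding model for RAG workflows (standard LangChain Embeddings interface).
embedder = NVIDIAEmbeddings(model="nvolveqa_40k")
doc_vectors = embedder.embed_documents(["LangChain integrates with NVIDIA endpoints."])
query_vector = embedder.embed_query("What does LangChain integrate with?")
print(len(doc_vectors[0]), len(query_vector))
```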