diff --git a/nvidia-nemo-oke/images/access-cluster.png b/nvidia-nemo-oke/images/access-cluster.png
new file mode 100644
index 0000000..5906979
Binary files /dev/null and b/nvidia-nemo-oke/images/access-cluster.png differ
diff --git a/nvidia-nemo-oke/images/create-oke-cluster.png b/nvidia-nemo-oke/images/create-oke-cluster.png
new file mode 100644
index 0000000..17fe20d
Binary files /dev/null and b/nvidia-nemo-oke/images/create-oke-cluster.png differ
diff --git a/nvidia-nemo-oke/images/generate-ngc-api-key.png b/nvidia-nemo-oke/images/generate-ngc-api-key.png
new file mode 100644
index 0000000..9577c36
Binary files /dev/null and b/nvidia-nemo-oke/images/generate-ngc-api-key.png differ
diff --git a/nvidia-nemo-oke/images/jupyter-db-connection-setup.png b/nvidia-nemo-oke/images/jupyter-db-connection-setup.png
new file mode 100644
index 0000000..6241f95
Binary files /dev/null and b/nvidia-nemo-oke/images/jupyter-db-connection-setup.png differ
diff --git a/nvidia-nemo-oke/images/jupyter-db-connection-success.png b/nvidia-nemo-oke/images/jupyter-db-connection-success.png
new file mode 100644
index 0000000..69443c7
Binary files /dev/null and b/nvidia-nemo-oke/images/jupyter-db-connection-success.png differ
diff --git a/nvidia-nemo-oke/images/jupyter-input-api-key.png b/nvidia-nemo-oke/images/jupyter-input-api-key.png
new file mode 100644
index 0000000..18f7908
Binary files /dev/null and b/nvidia-nemo-oke/images/jupyter-input-api-key.png differ
diff --git a/nvidia-nemo-oke/images/jupyter-install-oracledb.png b/nvidia-nemo-oke/images/jupyter-install-oracledb.png
new file mode 100644
index 0000000..6c0fff9
Binary files /dev/null and b/nvidia-nemo-oke/images/jupyter-install-oracledb.png differ
diff --git a/nvidia-nemo-oke/images/jupyter-run-all-cells.png b/nvidia-nemo-oke/images/jupyter-run-all-cells.png
new file mode 100644
index 0000000..97ba200
Binary files /dev/null and b/nvidia-nemo-oke/images/jupyter-run-all-cells.png differ
diff --git a/nvidia-nemo-oke/images/jupyter-upload-files.png b/nvidia-nemo-oke/images/jupyter-upload-files.png
new file mode 100644
index 0000000..d4152d9
Binary files /dev/null and b/nvidia-nemo-oke/images/jupyter-upload-files.png differ
diff --git a/nvidia-nemo-oke/readme.md b/nvidia-nemo-oke/readme.md
new file mode 100644
index 0000000..74bde7d
--- /dev/null
+++ b/nvidia-nemo-oke/readme.md
@@ -0,0 +1,539 @@
+# Deploy NVIDIA NeMo microservices on Oracle Kubernetes Engine (OKE)
+
+**Summary:** This tutorial walks you through the steps required to deploy and configure [NVIDIA NeMo Microservices](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) on OCI. The deployment uses OKE (managed Kubernetes) and Oracle Database 23ai as both the structured data store and the vector store.
+
+## Requirements
+
+* An [NVIDIA NGC account](https://org.ngc.nvidia.com/setup/personal-keys) where you can provision an API key.
+* An Oracle Cloud Infrastructure (OCI) paid account with access to GPU shapes. NVIDIA A10 will be sufficient.
+* A general understanding of Python and Jupyter notebooks.
+
+## Task 1: Collect and configure prerequisites
+
+1. Generate an NGC API Key via the NVIDIA portal.
+
+    ![Generate an NGC API key](images/generate-ngc-api-key.png)
+
+2. Log into your [Oracle Cloud](https://cloud.oracle.com) account.
+
+3. Using the menu in the top left corner, navigate to **`Developer Services`** -> **`Kubernetes Clusters (OKE)`**
+
+4. Click **`[Create cluster]`** and choose the **Quick create** option. Click **`[Submit]`**
+
+    ![Create an OKE cluster](images/create-oke-cluster.png)
+
+5. Provide the following configuration details for your cluster:
+
+ * Name
+ * Kubernetes Endpoint: Public endpoint
+ * Node type: Managed
+ * Kubernetes worker nodes: Private workers
+    * Shape: VM.Standard.E3.Flex (or E4/E5, depending on your available capacity)
+ * Select the number of OCPUs: 2 or more
+ * Node count: 1
+
+ >Note: After the cluster is online, we'll provision a second node pool with GPU shapes. The *E#* flex shapes will be used for cluster operations and the Oracle Database 23ai deployment.
+
+6. Click **`[Next]`**, validate the settings, then click **`[Create cluster]`**.
+
+ >Note: The cluster creation process will take around 15 minutes.
+
+7. Once the cluster is **Active**, click the cluster name to view its details. Use the navigation menu in the left pane to locate and click **Node pools**.
+
+8. You should see **pool1** that was automatically provisioned with the cluster. Click **`[Add node pool]`**.
+
+9. Provide the following configuration parameters:
+
+ * Name
+ * Node Placement Configuration:
+ * Availability domain: select at least 1
+ * Worker node subnet: select the *node* subnet
+ * Node shape: An NVIDIA GPU shape. VM.GPU.A10.1 will work.
+ * Node count: 3
+    * Click **Specify a custom boot volume size** and change the value to 250.
+    * Click the very last **Show advanced options**, found just above the **`[Add]`** button. Under **Initialization script**, choose **Paste Cloud-Init Script** and enter the following:
+
+ ```bash
+ #!/bin/bash
+ curl --fail -H "Authorization: Bearer Oracle" -L0 http://169.254.169.254/opc/v2/instance/metadata/oke_init_script | base64 --decode >/var/run/oke-init.sh
+ bash /var/run/oke-init.sh
+ bash /usr/libexec/oci-growfs -y
+ systemctl restart kubelet.service
+ ```
+
+ >Note: This deployment requires 3 GPUs to function properly. You can either deploy 3 separate single-GPU nodes, or a single node with 4+ GPUs.
+
+10. Click **`[Add]`** to create the new node pool.
+
+11. While that is creating, return to the **Cluster details** page and click **`[Access Cluster]`** at the top of the page.
+
+12. In the dialog that opens, click the button to **`[Launch Cloud Shell]`**, then copy the command found in step 2. When Cloud Shell becomes available, paste and run the command.
+
+    ![Access the cluster via Cloud Shell](images/access-cluster.png)
+
+13. The command you just executed will create your Kube config file. To test it, run the following:
+
+ ```bash
+ kubectl cluster-info
+ kubectl get nodes -o wide
+ ```
+
+    >Note: The GPU nodes may still be provisioning and might not show up just yet. Each node's name is its private IP address. A quick GPU visibility check is shown below.
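+
+    Once the GPU node pool is active, you can confirm the nodes are advertising their GPUs. This is a quick sketch; it assumes the NVIDIA device plugin that OKE installs on GPU shapes is running.
+
+    ```bash
+    # Show each node's reported NVIDIA GPU capacity and allocatable count
+    kubectl describe nodes | grep -i "nvidia.com/gpu"
+    ```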
+
+14. Finally, on the Cluster details page, locate the **Add-ons** link and click it. Click **`[Manage add-ons]`** and enable the following:
+
+ * Certificate Manager
+    * Database Operator
+ * Metrics Server
+
+    >Note: Enable them one at a time by clicking each add-on, checking the **Enable** option, and saving the changes. A quick check for the Database Operator is shown below.
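+
+    Because Task 3 depends on the Oracle Database Operator, you may want to verify its pods are up before continuing. This is a sketch; it assumes the add-on installs into the **oracle-database-operator-system** namespace referenced later in this tutorial.
+
+    ```bash
+    # The operator pods should reach the Running state within a few minutes
+    kubectl get pods -n oracle-database-operator-system
+    ```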
+
+
+## Task 2: Install JupyterHub
+
+1. Return to Cloud Shell. Create a new file called **jh-values.yaml** and paste the following:
+
+ ```
+ # default configuration
+ singleuser:
+ cloudMetadata:
+ blockWithIptables: false
+ # optional – if you want to spawn GPU-based user notebooks, remove the comment character from the following lines.
+ #profileList:
+ # - display_name: "GPU Server"
+ # description: "Spawns a notebook server with access to a GPU"
+ # kubespawner_override:
+ # extra_resource_limits:
+ # nvidia.com/gpu: "1"
+ ```
+
+ >Note: In this tutorial we use Jupyter notebooks to interact with the GPU-driven NVIDIA microservices. You will not need to enable GPU-based user notebooks to complete the tasks herein.
+
+2. Add the Helm repo.
+
+ ```bash
+ helm repo add jupyterhub https://hub.jupyter.org/helm-chart/ && helm repo update
+ ```
+
+3. Perform the install using Helm, and reference the values file created in step 1.
+
+ ```bash
+    helm upgrade --cleanup-on-fail --install jupyter-hub jupyterhub/jupyterhub --namespace k8s-jupyter --create-namespace --values jh-values.yaml
+ ```
+
+4. Once the deployment is complete, the Kubernetes service that gets created will provision an OCI Load Balancer for public access. Locate the public IP address of the load balancer and store it for later.
+
+ ```bash
+ kubectl get svc -n k8s-jupyter
+ ```
+
+ Output:
+ ```bash
+ NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
+ k8s-jupyter proxy-public LoadBalancer 10.96.177.9 129.213.1.77 80:30141/TCP
+ ```
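+
+    If you only need the external IP, a jsonpath query against the **proxy-public** service (created by the JupyterHub chart) returns it directly:
+
+    ```bash
+    # Prints just the load balancer's public IP address
+    kubectl get svc proxy-public -n k8s-jupyter -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
+    ```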
+
+5. Access JupyterHub by browsing to the load balancer's public IP address. When you access the UI for the first time, you will be prompted for a username and password. Specify values of your choosing, but make sure you save them for future use. After logging in, you'll need to click the button to start the server. The startup process will take 5-7 minutes; you can watch it with the command below.
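+
+    While the server is starting, you can watch the pods in the JupyterHub namespace. The single-user pod name shown here (`jupyter-<username>`) is an assumption based on the chart's default naming.
+
+    ```bash
+    # Watch pods until the single-user notebook server (jupyter-<username>) reaches Running
+    kubectl get pods -n k8s-jupyter -w
+    ```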
+
+## Task 3: Deploy the Oracle Database 23ai pod
+
+1. Before creating the database, you'll need to create role-based access control (RBAC) rules that allow the Oracle Database Operator to list and watch cluster nodes. Create a file called **node-rbac.yaml** and paste the following:
+
+ ```
+ ---
+ apiVersion: rbac.authorization.k8s.io/v1
+ kind: ClusterRole
+ metadata:
+ name: oracle-database-operator-manager-role-node
+ rules:
+ - apiGroups:
+ - ""
+ resources:
+ - nodes
+ verbs:
+ - list
+ - watch
+ ---
+ apiVersion: rbac.authorization.k8s.io/v1
+ kind: ClusterRoleBinding
+ metadata:
+ name: oracle-database-operator-manager-role-node-cluster-role-binding
+ roleRef:
+ apiGroup: rbac.authorization.k8s.io
+ kind: ClusterRole
+ name: oracle-database-operator-manager-role-node
+ subjects:
+ - kind: ServiceAccount
+ name: default
+ namespace: oracle-database-operator-system
+ ---
+ ```
+
+2. Create a file called **db-admin-secret.yaml** that will be used to set the DB password upon deployment. Paste the following:
+
+ ```
+ apiVersion: v1
+ kind: Secret
+ metadata:
+ name: freedb-admin-secret
+ namespace: oracle23ai
+ type: Opaque
+ stringData:
+ oracle_pwd: YOURPASSWORDHERE
+ ```
+
+    >Note: Be sure to replace **YOURPASSWORDHERE** above with a value of your own choosing: at least 15 characters, including 2 uppercase letters, 2 lowercase letters, 2 numbers, and 2 special characters. One way to generate such a password is sketched below.
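+
+    A quick way to generate a password that satisfies the policy (a sketch: four random characters from each required class, 16 characters total, using only `_` and `#` as special characters):
+
+    ```bash
+    # Concatenate four random characters from each required character class
+    { tr -dc 'A-Z' </dev/urandom | head -c4; tr -dc 'a-z' </dev/urandom | head -c4; \
+      tr -dc '0-9' </dev/urandom | head -c4; tr -dc '_#' </dev/urandom | head -c4; echo; }
+    ```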
+
+3. Create a file called **db23ai-instance.yaml** and paste the following:
+
+ ```
+ apiVersion: database.oracle.com/v1alpha1
+ kind: SingleInstanceDatabase
+ metadata:
+ name: nemo-23ai
+ namespace: oracle23ai
+ spec:
+ sid: FREE
+ edition: free
+ adminPassword:
+ secretName: freedb-admin-secret
+
+ image:
+ pullFrom: container-registry.oracle.com/database/free:latest
+ prebuiltDB: true
+
+ persistence:
+ size: 50Gi
+ storageClass: "oci-bv"
+ accessMode: "ReadWriteOnce"
+
+ replicas: 1
+ ---
+ ```
+
+4. Create the **oracle23ai** namespace (if it does not already exist), then apply the manifests; this creates the RBAC rules, the admin password secret, and the DB pod.
+
+    ```bash
+    kubectl create ns oracle23ai
+    kubectl apply -n oracle23ai -f node-rbac.yaml,db-admin-secret.yaml,db23ai-instance.yaml
+    ```
+
+5. After the command completes, it may take 3-5 minutes for the DB instance to come online. You can check the status with the following command. Do not proceed until the status is **Healthy**.
+
+ ```bash
+ kubectl get singleinstancedatabase -n oracle23ai
+ ```
+
+ Output:
+ ```bash
+ kubectl get singleinstancedatabase -n oracle23ai
+ NAME EDITION STATUS ROLE VERSION CONNECT STR TCPS CONNECT STR OEM EXPRESS URL
+ nemo-23ai Free Healthy PRIMARY 23.4.0.24.05 10.0.10.246:31452/FREE Unavailable Unavailable
+ ```
+
+ >Note: Be sure to write down the connection string for later. You'll need the IP address and port number.
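+
+    If you'd rather block until the database is ready, `kubectl wait` can poll the status field. This is a sketch; it assumes the resource reports its state in `.status.status`, which you can confirm with `kubectl get singleinstancedatabase nemo-23ai -n oracle23ai -o yaml`.
+
+    ```bash
+    # Wait until the database reports Healthy (or time out after 20 minutes)
+    kubectl wait singleinstancedatabase/nemo-23ai -n oracle23ai \
+      --for=jsonpath='{.status.status}'=Healthy --timeout=20m
+    ```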
+
+6. Run the following command to gather details about the DB instance and set them to environment variables.
+
+ ```bash
+ export ORA_PASS=$(kubectl get secret/freedb-admin-secret -n oracle23ai -o jsonpath='{.data.oracle_pwd}' | base64 -d)
+ export ORACLE_SID=$(kubectl get singleinstancedatabase -n oracle23ai -o 'jsonpath={.items[0].metadata.name}')
+ export ORA_POD=$(kubectl get pods -n oracle23ai -o jsonpath='{.items[0].metadata.name}')
+ export ORA_CONN=$(kubectl get singleinstancedatabase ${ORACLE_SID} -n oracle23ai -o "jsonpath={.status.connectString}")
+ ```
+
+ >Note: If you leave Cloud Shell and return later, you'll need to run the above commands again if you wish to connect to the DB instance directly. That said, after this section, all DB access should be done via Jupyter Notebooks.
+
+7. Connect to the DB instance.
+
+ ```bash
+ kubectl exec -it pods/${ORA_POD} -n oracle23ai -- sqlplus sys/${ORA_PASS}@${ORACLE_SID} as sysdba
+ ```
+
+8. Create a vector DB user that will enable your Python code to access the vector data store.
+
+    ```sql
+    create user c##vector identified by <password>;
+    grant create session, db_developer_role, unlimited tablespace to c##vector container=ALL;
+    ```
+
+    >Note: You will need to run these commands one at a time. **Don't forget** to specify your own password in the first command, making sure to remove the `<>` brackets.
+
+9. Type *exit* to leave the container.
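+
+    Optionally, you can verify the new account by reconnecting as **c##vector**, reusing the environment variables from step 6 and the password you just chose (replace `<password>`, removing the brackets):
+
+    ```bash
+    kubectl exec -it pods/${ORA_POD} -n oracle23ai -- sqlplus c##vector/<password>@${ORACLE_SID}
+    ```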
+
+## Task 4: Prepare the NeMo deployment
+
+1. Now, prepare for the NeMo deployment. Create a new Kubernetes namespace.
+
+ ```bash
+ kubectl create ns embedding-nim
+ ```
+
+2. Add your NGC API Key to an environment variable.
+
+    ```bash
+    export NGC_API_KEY="<your NGC API key>"
+    ```
+
+    >Note: Paste your own API key in place of `<your NGC API key>`; remove the `<>` brackets but keep the double quotes.
+
+3. Confirm that your key gets you access to the NVCR container registry:
+
+ ```
+ echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
+ ```
+
+ You should get Login Succeeded:
+
+ ```bash
+ echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
+ WARNING! Your password will be stored unencrypted in /home/username/.docker/config.json.
+ Configure a credential helper to remove this warning. See
+ https://docs.docker.com/engine/reference/commandline/login/#credentials-store
+
+ Login Succeeded
+ ```
+
+ >Note: If you do not see the Login Succeeded message, you'll need to troubleshoot your API key on the NVIDIA website.
+
+4. Create a docker-registry secret in Kubernetes. The kubelet will use this secret to download the container images needed to run pods.
+
+ ```
+ kubectl -n embedding-nim create secret docker-registry registry-secret --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NGC_API_KEY
+ ```
+
+5. Create a secret for your NGC API KEY that will be passed to your pod via environment variable later.
+
+    ```bash
+    kubectl -n embedding-nim create secret generic ngc-api-key --from-literal=ngc-api-key="$NGC_API_KEY"
+    ```
+
+6. You can check the value with the following command.
+
+ ```
+ kubectl -n embedding-nim get secret/ngc-api-key -o jsonpath='{.data.ngc-api-key}' | base64 -d
+ ```
+
+7. Next, you’ll create three separate files to deploy the NeMo retriever microservices.
+
+ a. **llama3-8b-instruct.yaml**
+
+ ```
+
+ apiVersion: v1
+ kind: Pod
+ metadata:
+ name: nim-llama3-8b-instruct
+ labels:
+ name: nim-llama3-8b-instruct
+
+ spec:
+ containers:
+ - name: nim-llama3-8b-instruct
+ image: nvcr.io/nim/meta/llama3-8b-instruct:latest
+ securityContext:
+ privileged: true
+ env:
+ - name: NGC_API_KEY
+ valueFrom:
+ secretKeyRef:
+ name: ngc-api-key
+ key: ngc-api-key
+ resources:
+ limits:
+ nvidia.com/gpu: 1
+ imagePullPolicy: Always
+
+ hostNetwork: true
+
+ imagePullSecrets:
+ - name: registry-secret
+
+ ```
+
+ b. **nv-embedqa-e5-v5.yaml**
+
+ ```
+
+ apiVersion: v1
+ kind: Pod
+ metadata:
+ name: nim-nv-embedqa-e5-v5
+ labels:
+ name: nim-nv-embedqa-e5-v5
+
+ spec:
+ containers:
+ - name: nim-nv-embedqa-e5-v5
+ image: nvcr.io/nim/nvidia/nv-embedqa-e5-v5:1.0.1
+ securityContext:
+ privileged: true
+ env:
+ - name: NGC_API_KEY
+ valueFrom:
+ secretKeyRef:
+ name: ngc-api-key
+ key: ngc-api-key
+ resources:
+ limits:
+ nvidia.com/gpu: 1
+ imagePullPolicy: Always
+
+ hostNetwork: true
+
+ imagePullSecrets:
+ - name: registry-secret
+
+ ```
+
+ c. **nv-rerankqa-mistral-4b-v3.yaml**
+
+    ```yaml
+    apiVersion: v1
+    kind: Pod
+    metadata:
+      name: nim-nv-rerankqa-mistral-4b-v3
+      labels:
+        name: nim-nv-rerankqa-mistral-4b-v3
+    spec:
+      containers:
+        - name: nim-nv-rerankqa-mistral-4b-v3
+          image: nvcr.io/nim/nvidia/nv-rerankqa-mistral-4b-v3:1.0.1
+          securityContext:
+            privileged: true
+          env:
+            - name: NGC_API_KEY
+              valueFrom:
+                secretKeyRef:
+                  name: ngc-api-key
+                  key: ngc-api-key
+          resources:
+            limits:
+              nvidia.com/gpu: 1
+          imagePullPolicy: Always
+      hostNetwork: true
+      imagePullSecrets:
+        - name: registry-secret
+    ```
+
+8. Apply the 3 manifest files to your Kubernetes cluster.
+
+ ```bash
+ kubectl -n embedding-nim apply -f llama3-8b-instruct.yaml,nv-embedqa-e5-v5.yaml,nv-rerankqa-mistral-4b-v3.yaml
+ ```
+
+9. View the pods to ensure they are all running.
+
+ ```bash
+ kubectl -n embedding-nim get pods -o wide
+ ```
+
+ Output:
+
+ ```bash
+ NAME READY STATUS RESTARTS AGE IP NODE
+ nim-llama3-8b-instruct 1/1 Running 0 3m 10.0.10.7 10.0.10.7
+ nim-nv-embedqa-e5-v5 1/1 Running 0 3m 10.0.10.11 10.0.10.11
+ nim-nv-rerankqa-mistral-4b-v3 1/1 Running 0 3m 10.0.10.18 10.0.10.18
+ ```
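+
+    Optionally, confirm each NIM is ready by calling its health endpoint from inside the pod. This is a sketch; it assumes the NIM containers expose `/v1/health/ready` on port 8000 and include `curl` in the image.
+
+    ```bash
+    for p in nim-llama3-8b-instruct nim-nv-embedqa-e5-v5 nim-nv-rerankqa-mistral-4b-v3; do
+      echo -n "$p: "
+      kubectl -n embedding-nim exec "$p" -- curl -s http://localhost:8000/v1/health/ready
+      echo
+    done
+    ```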
+
+
+10. Now that everything is up and running, you can return to your JupyterHub web page and launch a new notebook. If you need to double-check the IP address of your JupyterHub instance, run the following command:
+
+ ```
+ kubectl get svc -n k8s-jupyter
+ ```
+
+    >Note: Make sure to access the UI via HTTP and not HTTPS, as we did not configure TLS in this exercise.
+
+11. Within the notebook, install the oracledb library:
+
+ ```
+ pip install oracledb
+ ```
+
+    ![Install the oracledb library in Jupyter](images/jupyter-install-oracledb.png)
+
+12. Test connectivity to the Oracle Database. Add a second entry into the notebook and paste the following:
+
+    ```python
+    import oracledb
+
+    # Database connection details for the vector user created in Task 3
+    username = "c##vector"
+    password = "<password>"      # the password you chose for c##vector
+    host = "<pod IP>"            # from the connect string in Task 3, step 5
+    port = "<port>"              # from the connect string in Task 3, step 5
+    service_name = "FREE"
+    dsn = host + ":" + port + "/" + service_name
+
+    print("The database user name is:", username)
+    print("Database connection information is:", dsn)
+
+    # Connect to the database
+    try:
+        conn23c = oracledb.connect(user=username, password=password, dsn=dsn)
+        print("Connection successful!")
+    except oracledb.DatabaseError as e:
+        error, = e.args
+        print(f"Connection failed. Error code: {error.code}")
+        print(f"Error message: {error.message}")
+    ```
+
+ >Note: Be sure to enter the password you created, along with the pod IP and port number from task 3 step 5.
+
+13. Run the cell. A successful test produces output like the following:
+
+    ```
+    The database user name is: c##vector
+    Database connection information is: 10.0.10.246:31452/FREE
+    Connection successful!
+    ```
+
+## Task 5: Working with sample reranking and embedding notebooks
+
+1. Locate the two sample notebooks, [reranking](sample-notebooks/reranking_23ai_clean.ipynb) and [text embedding](sample-notebooks/text_embedding_23ai_clean.ipynb), in the **`sample-notebooks`** directory and download them to your computer.
+
+2. Upload the notebooks to JupyterHub.
+
+    ![Upload files to JupyterHub](images/jupyter-upload-files.png)
+
+3. Start with the **text embedding** notebook. In the second cell you'll need to paste your NGC API Key. Run cells 1 and 2 to import libraries and validate your NGC API key.
+
+    ![Enter the NGC API key in the notebook](images/jupyter-input-api-key.png)
+
+4. Next, locate the 10th cell, where you'll need to input the database connection information you gathered and tested earlier.
+
+    ![Database connection setup in the notebook](images/jupyter-db-connection-setup.png)
+
+5. After updating the fields, run this cell to confirm DB connectivity. A successful connection should look like this:
+
+    ![Successful database connection in the notebook](images/jupyter-db-connection-success.png)
+
+6. Now head back up to the top of the notebook and run all cells.
+
+    ![Run all cells in the notebook](images/jupyter-run-all-cells.png)
+
+7. Scrolling through, you should see several different questions. Note cell 9, which asks about the NVIDIA H200; at this point the LLM has no data on that product. Cell 15 performs text embedding on the H200 product page, and once the entire notebook completes, cell 27 shows that retrieval-augmented generation (RAG) can answer the question about the H200.
+
+8. Moving to the **reranking** notebook, be sure to repeat steps 3-5 above, then run the entire notebook. Reranking re-orders the retrieved chunks by relevance to the query so the most useful context reaches the LLM first, yielding more accurate answers with fewer hallucinations. The core pattern is sketched below.
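+
+    A minimal sketch of that reranking pattern, mirroring the sample notebook (the `base_url` host and port are placeholders; point them at wherever your reranking NIM is reachable, such as the pod IP shown by `kubectl get pods -o wide`):
+
+    ```python
+    from langchain_core.documents import Document
+    from langchain_nvidia_ai_endpoints import NVIDIARerank
+
+    # Point base_url at the reranking NIM deployed in Task 4
+    reranker = NVIDIARerank(
+        model="nvidia/nv-rerankqa-mistral-4b-v3",
+        base_url="http://<rerank-pod-ip>:8000/v1",
+    )
+
+    docs = [
+        Document(page_content="The NVIDIA H200 is based on the NVIDIA Hopper architecture."),
+        Document(page_content="A CPU executes general-purpose instructions."),
+    ]
+
+    # compress_documents scores each chunk against the query and returns them ordered by relevance
+    for d in reranker.compress_documents(query="What architecture is the NVIDIA H200 based on?", documents=docs):
+        print(d.metadata.get("relevance_score"), d.page_content)
+    ```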
+
+9. That's it! You've completed the tutorial on deploying NVIDIA NeMo microservices to Oracle Kubernetes Engine (OKE). If you'd like to experiment further, upload different PDF files to see how embedding and reranking perform with additional data; the snippet below shows how to load a locally uploaded file.
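+
+    For example, once you've uploaded a PDF to JupyterHub, the notebook's loader can read it from a local path instead of a URL (`my_document.pdf` is a placeholder file name):
+
+    ```python
+    from langchain_community.document_loaders import PyPDFLoader
+    from langchain.text_splitter import RecursiveCharacterTextSplitter
+
+    # Load a locally uploaded PDF and split it into chunks for embedding
+    loader = PyPDFLoader("my_document.pdf")
+    document = loader.load()
+
+    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
+    document_chunks = text_splitter.split_documents(document)
+    print("Number of chunks from the document:", len(document_chunks))
+    ```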
+
+
+## Acknowledgements
+
+* **Author(s)** - Eli Schilling - Technical Architect, Sadra Fardhosseini - Data Scientist
+* **Contributors** -
+* **Last Updated By/Date** - October, 2024
+
+
+
+
+
+
diff --git a/nvidia-nemo-oke/sample-notebooks/example_pdf.pdf b/nvidia-nemo-oke/sample-notebooks/example_pdf.pdf
new file mode 100644
index 0000000..d2ec040
Binary files /dev/null and b/nvidia-nemo-oke/sample-notebooks/example_pdf.pdf differ
diff --git a/nvidia-nemo-oke/sample-notebooks/reranking_23ai_clean.ipynb b/nvidia-nemo-oke/sample-notebooks/reranking_23ai_clean.ipynb
new file mode 100644
index 0000000..fa15027
--- /dev/null
+++ b/nvidia-nemo-oke/sample-notebooks/reranking_23ai_clean.ipynb
@@ -0,0 +1,411 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "a6b2ea73-7421-48ea-9903-0f8807f7668b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#importing the needed packages\n",
+ "import os\n",
+ "from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
+ "from langchain_core.prompts import ChatPromptTemplate\n",
+ "from langchain_core.output_parsers import StrOutputParser\n",
+ "import oracledb\n",
+ "from langchain_community.document_loaders import PyPDFLoader\n",
+ "from langchain.chains import RetrievalQA\n",
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
+ "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n",
+ "from langchain_community.vectorstores.oraclevs import OracleVS\n",
+ "from langchain_community.vectorstores.utils import DistanceStrategy\n",
+ "from langchain.retrievers import ContextualCompressionRetriever\n",
+    "from langchain_nvidia_ai_endpoints import NVIDIARerank"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "cb0df8da-1f3a-4b7b-92f5-bd3ba0345996",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Set the NVIDIA API key as an environment variable\n",
+ "os.environ[\"NVIDIA_API_KEY\"] = \"\" \n",
+ "# Initialize the LLM (Large Language Model) with the specified model\n",
+ "llm = ChatNVIDIA(model=\"meta/llama3-8b-instruct\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "69b7510e-793a-45fa-ba4d-51ec90ea7178",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Create a chat prompt template with a system message and a user message\n",
+ "prompt = ChatPromptTemplate.from_messages([\n",
+ " (\"system\", (\n",
+ " \"You are a helpful and friendly AI!\"\n",
+ " \"Your responses should be concise and no longer than two sentences.\"\n",
+ " \"Say you don't know if you don't have this information.\"\n",
+ " )),\n",
+ " (\"user\", \"{question}\")\n",
+ "])\n",
+ "# Chain the prompt, LLM, and output parser together\n",
+ "chain = prompt | llm | StrOutputParser()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "b480e282-bab0-4b44-b0aa-b4791590bf80",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "A CPU (Central Processing Unit) is the brain of your computer, handling general computing tasks, executing instructions, and performing calculations. A GPU (Graphics Processing Unit) is designed specifically for handling graphics and computationally intensive tasks, like gaming, video editing, and scientific simulations, with many cores performing parallel processing.\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Example questions to invoke the LLM chain\n",
+ "print(chain.invoke({\"question\": \"What's the difference between a GPU and a CPU?\"}))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "0082e754-cc38-4120-b19c-e8100ecbffba",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "I'm not familiar with the NVIDIA H200, as it doesn't seem to be a publicly recognized product.\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Example questions to invoke the LLM chain\n",
+ "print(chain.invoke({\"question\": \"What does the H in the NVIDIA H200 stand for?\"}))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "865decf9-1767-4f5f-b006-0386d237c0e7",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The database user name is: vector\n",
+ "Database connection information is: localhost:1521/freepdb1\n",
+ "Connection successful!\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Database connection setup\n",
+ "username = \"\"\n",
+ "password = \"\"\n",
+ "host=\"\"\n",
+ "port=\"\"\n",
+ "service_name=\"\"\n",
+ "dsn=host+\":\"+port+\"/\"+service_name\n",
+ "\n",
+ "print(\"The database user name is:\", username)\n",
+ "print(\"Database connection information is:\", dsn)\n",
+ "\n",
+ "# Connect to the database\n",
+ "try:\n",
+ " conn23c = oracledb.connect(user=username, password=password, dsn=dsn)\n",
+ " print(\"Connection successful!\")\n",
+ "except oracledb.DatabaseError as e:\n",
+ " error, = e.args\n",
+ " print(f\"Connection failed. Error code: {error.code}\")\n",
+ " print(f\"Error message: {error.message}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "231ddf31-d58f-43b0-9d5e-fbaba97edc28",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Document(metadata={'source': 'https://nvdam.widen.net/content/udc6mzrk7a/original/hpc-datasheet-sc23-h200-datasheet-3002446.pdf', 'page': 0}, page_content='NVIDIA H200 Tensor Core GPU\\u2002|\\u2002Datasheet\\u2002|\\u2002 1NVIDIA H200 Tensor Core GPU\\nSupercharging AI and HPC workloads.\\nHigher Performance With Larger, Faster Memory\\nThe NVIDIA H200 Tensor Core GPU supercharges generative AI and high-\\nperformance computing (HPC) workloads with game-changing performance \\nand memory capabilities. \\nBased on the NVIDIA Hopper™ architecture , the NVIDIA H200 is the first GPU to \\noffer 141 gigabytes (GB) of HBM3e memory at 4.8 terabytes per second (TB/s)—\\nthat’s nearly double the capacity of the NVIDIA H100 Tensor Core GPU with \\n1.4X more memory bandwidth. The H200’s larger and faster memory accelerates \\ngenerative AI and large language models, while advancing scientific computing for \\nHPC workloads with better energy efficiency and lower total cost of ownership. \\nUnlock Insights With High-Performance LLM Inference\\nIn the ever-evolving landscape of AI, businesses rely on large language models to \\naddress a diverse range of inference needs. An AI inference accelerator must deliver the \\nhighest throughput at the lowest TCO when deployed at scale for a massive user base. \\nThe H200 doubles inference performance compared to H100 GPUs when handling \\nlarge language models such as Llama2 70B.\\n.\\nPreliminary specifications. May be subject to change.\\nLlama2 13B: ISL 128, OSL 2K | Throughput | H100 SXM 1x GPU BS 64 | H200 SXM 1x GPU BS 128\\nGPT-3 175B: ISL 80, OSL 200 | x8 H100 SXM GPUs BS 64 | x8 H200 SXM GPUs BS 128\\nLlama2 70B: ISL 2K, OSL 128 | Throughput | H100 SXM 1x GPU BS 8 | H200 SXM 1x GPU BS 32.Key Features\\n >141GB of HBM3e GPU memory\\n >4.8TB/s of memory bandwidth\\n >4 petaFLOPS of FP8 performance\\n >2X LLM inference performance\\n >110X HPC performance\\nDatasheet')"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Load a PDF document from a URL\n",
+ "loader = PyPDFLoader(\"https://nvdam.widen.net/content/udc6mzrk7a/original/hpc-datasheet-sc23-h200-datasheet-3002446.pdf\")\n",
+ "# Load the document into memory\n",
+ "document = loader.load()\n",
+ "document[0] # Print the first page of the document"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "2bc98e83-c610-46ad-af2e-8d47661a395d",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Number of chunks from the document: 16\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Initialize a text splitter to chunk the document into smaller pieces\n",
+ "text_splitter = RecursiveCharacterTextSplitter(\n",
+ " chunk_size=500,\n",
+ " chunk_overlap=100,\n",
+ " separators=[\"\\n\\n\", \"\\n\", \".\", \";\", \",\", \" \", \"\"],\n",
+ ")\n",
+ "# Split the document into chunks\n",
+ "document_chunks = text_splitter.split_documents(document)\n",
+ "print(\"Number of chunks from the document:\", len(document_chunks))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "d0d5abf6-a1f8-4425-8c3d-5ec7c859154a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Define the query to be used with the reranker\n",
+ "query = \"What does the H in the NVIDIA H200 stand for?\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "3eee336e-52a8-4575-a915-615dec68434c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Initialize the NVIDIA reranker with the specified model\n",
+ "reranker = NVIDIARerank(model=\"nvidia/nv-rerankqa-mistral-4b-v3\", base_url=\"http://localhost:8001/v1\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "ecc76695-ac0a-4729-a166-0c8a598d7161",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Rerank the document chunks based on the query\n",
+ "reranked_chunks = reranker.compress_documents(query=query,documents=document_chunks)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "67b01618-89b6-41f7-8903-c3b09ccc8616",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Relevance Score:16.3125, Page Content:NVIDIA H200 Tensor Core GPU | Datasheet | 1NVIDIA H200 Tensor Core GPU\n",
+ "Supercharging AI and HPC workloads.\n",
+ "Higher Performance With Larger, Faster Memory\n",
+ "The NVIDIA H200 Tensor Core GPU supercharges generative AI and high-\n",
+ "performance computing (HPC) workloads with game-changing performance \n",
+ "and memory capabilities. \n",
+ "Based on the NVIDIA Hopper™ architecture , the NVIDIA H200 is the first GPU to \n",
+ "offer 141 gigabytes (GB) of HBM3e memory at 4.8 terabytes per second (TB/s)—...\n",
+ "----------------------------------------------------------------------------------------------------\n",
+ "Relevance Score:10.875, Page Content:NVIDIA H200 Tensor Core GPU | Datasheet | 3Unleashing AI Acceleration for Mainstream Enterprise Servers \n",
+ "With H200 NVL\n",
+ "The NVIDIA H200 NVL is the ideal choice for customers with space constraints within \n",
+ "the data center, delivering acceleration for every AI and HPC workload regardless of size. \n",
+ "With a 1.5X memory increase and a 1.2X bandwidth increase over the previous generation, \n",
+ "customers can fine-tune LLMs within a few hours and experience LLM inference 1.8X faster....\n",
+ "----------------------------------------------------------------------------------------------------\n",
+ "Relevance Score:6.79296875, Page Content:NVIDIA H200 Tensor Core GPU | Datasheet | 2Supercharge High-Performance Computing\n",
+ "Memory bandwidth is crucial for HPC applications, as it enables faster data \n",
+ "transfer and reduces complex processing bottlenecks. For memory-intensive \n",
+ "HPC applications like simulations, scientific research, and artificial intelligence, \n",
+ "the H200’s higher memory bandwidth ensures that data can be accessed and \n",
+ "manipulated efficiently, leading to 110X faster time to results....\n",
+ "----------------------------------------------------------------------------------------------------\n",
+ "Relevance Score:4.41796875, Page Content:offer 141 gigabytes (GB) of HBM3e memory at 4.8 terabytes per second (TB/s)—\n",
+ "that’s nearly double the capacity of the NVIDIA H100 Tensor Core GPU with \n",
+ "1.4X more memory bandwidth. The H200’s larger and faster memory accelerates \n",
+ "generative AI and large language models, while advancing scientific computing for \n",
+ "HPC workloads with better energy efficiency and lower total cost of ownership. \n",
+ "Unlock Insights With High-Performance LLM Inference...\n",
+ "----------------------------------------------------------------------------------------------------\n",
+ "Relevance Score:4.078125, Page Content:Certified Systems™ with 4 or 8 GPUsNVIDIA MGX™ H200 NVL partner and \n",
+ "NVIDIA-Certified Systems with up to 8 GPUs\n",
+ "NVIDIA AI Enterprise Add-on Included\n",
+ "1. Preliminary specifications. May be subject to change. \n",
+ "2. With sparsity.\n",
+ "Ready to Get Started?\n",
+ "To learn more about the NVIDIA H200 Tensor Core GPU, \n",
+ "visit nvidia.com/h200\n",
+ "© 2024 NVIDIA Corporation and affiliates. All rights reserved. NVIDIA, the NVIDIA logo, HGX, Hopper, MGX, NVIDIA-...\n",
+ "----------------------------------------------------------------------------------------------------\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Print out the relevance score and page content for each chunk\n",
+ "for chunks in reranked_chunks:\n",
+ "\n",
+ " # Access the metadata of the document\n",
+ " metadata = chunks.metadata\n",
+ "\n",
+ " # Get the page content\n",
+ " page_content = chunks.page_content\n",
+ " \n",
+ " # Print the relevance score if it exists in the metadata, followed by page content\n",
+ " if 'relevance_score' in metadata:\n",
+ " print(f\"Relevance Score:{metadata['relevance_score']}, Page Content:{page_content}...\")\n",
+ " print(f\"{'-' * 100}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "77c1a332-dc05-44e7-ad69-bc18bb831eab",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/ubuntu/myenv_nemo/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:486: UserWarning: Found nvidia/nv-embedqa-e5-v5 in available_models, but type is unknown and inference may fail.\n",
+ " warnings.warn(\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Initialize the NVIDIA embeddings model\n",
+ "embedding_model = NVIDIAEmbeddings(model=\"nvidia/nv-embedqa-e5-v5\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "e76a6f9c-02f4-4156-87de-8485ea9bf633",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Store the document chunks in an Oracle vector store with the embeddings model\n",
+ "vector_store = OracleVS.from_documents(\n",
+ " document_chunks,\n",
+ " embedding_model,\n",
+ " client=conn23c,\n",
+ " table_name=\"MY_DEM04\",\n",
+ " distance_strategy=DistanceStrategy.DOT_PRODUCT,\n",
+ " #tablespace=\"my_tablespace\"\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "042581c7-46b4-46d3-862e-c2db278acbb8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Convert the vector store into a retriever with the specified search parameters\n",
+ "retriever =vector_store.as_retriever(search_kwargs={\"k\": 10})"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "0f6de851-ffe0-4352-9ecb-6e112928ddd8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Re-initialize the compressor with the reranker model\n",
+ "compressor = NVIDIARerank(model=\"nvidia/nv-rerankqa-mistral-4b-v3\",\n",
+ " base_url=\"http://localhost:8001/v1\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "bccfcd50-1d7d-490e-8309-05dfc1d6de19",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "compression_retriever = ContextualCompressionRetriever(\n",
+ " base_compressor=compressor,\n",
+ " base_retriever=retriever\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "1f0e8160-6678-4f9c-8eab-8a9d54275660",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "{'query': 'What does the H in the NVIDIA H200 stand for?',\n",
+ " 'result': 'The \"H\" in the NVIDIA H200 stands for \"Hopper\". The NVIDIA H200 is based on the NVIDIA Hopper architecture, which is a specific design and technical architecture used by NVIDIA for their GPUs.'}"
+ ]
+ },
+ "execution_count": 22,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Define the query to be used with the retrieval QA chain\n",
+ "query = \"What does the H in the NVIDIA H200 stand for?\"\n",
+ "# Create a retrieval QA chain using the LLM and retriever\n",
+ "chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)\n",
+ "chain.invoke(query)"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/nvidia-nemo-oke/sample-notebooks/text_embedding_23ai_clean.ipynb b/nvidia-nemo-oke/sample-notebooks/text_embedding_23ai_clean.ipynb
new file mode 100644
index 0000000..33756b9
--- /dev/null
+++ b/nvidia-nemo-oke/sample-notebooks/text_embedding_23ai_clean.ipynb
@@ -0,0 +1,395 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "32c6cd3c-a339-4b67-b041-8bf8517a7dbb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#importing all the needed packages\n",
+ "import os\n",
+ "from langchain_nvidia_ai_endpoints import ChatNVIDIA\n",
+ "from langchain_core.prompts import ChatPromptTemplate\n",
+ "from langchain_core.output_parsers import StrOutputParser\n",
+ "import oracledb\n",
+ "from langchain_community.document_loaders import PyPDFLoader\n",
+ "from langchain_community.vectorstores.oraclevs import OracleVS\n",
+ "from langchain_community.vectorstores.utils import DistanceStrategy\n",
+    "from langchain_core.runnables import RunnablePassthrough\n",
+ "from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings\n",
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "c1308beb-9a6a-40fb-bc29-0a7675532b44",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Set the NVIDIA API key as an environment variable\n",
+ "os.environ[\"NVIDIA_API_KEY\"] = \"\" \n",
+ "# Initialize the LLM (Large Language Model) with the specified model\n",
+ "llm = ChatNVIDIA(model=\"meta/llama3-8b-instruct\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "613c3945-2e60-4e63-b7c6-9448ec98c577",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Create a chat prompt template with a system message and a user message\n",
+ "prompt = ChatPromptTemplate.from_messages([\n",
+ " (\"system\", (\n",
+ " \"You are a helpful and friendly AI!\"\n",
+ " \"Your responses should be concise and no longer than two sentences.\"\n",
+ " \"Say you don't know if you don't have this information.\"\n",
+ " )),\n",
+ " (\"user\", \"{question}\")\n",
+ "])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "1773cd2f-1e28-4002-bd3d-6db55fb03dfa",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Chain the prompt, LLM, and output parser together\n",
+ "chain = prompt | llm | StrOutputParser()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "12143a42-8e20-40d4-b8bd-2dc5f29103aa",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "A CPU (Central Processing Unit) is the brain of your computer, handling general computing tasks, executing instructions, and performing calculations. A GPU (Graphics Processing Unit) is designed specifically for handling graphics and computationally intensive tasks, like gaming, video editing, and scientific simulations, with many cores performing parallel processing.\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Example questions to invoke the LLM chain\n",
+ "print(chain.invoke({\"question\": \"What's the difference between a GPU and a CPU?\"}))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "fe31329f-9f95-4d8f-8737-d61a7ea3e251",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "I'm happy to help! The \"A\" in NVIDIA A100 likely stands for \"Accelerated\", which refers to the card's enhanced computing capabilities.\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Example questions to invoke the LLM chain\n",
+ "print(chain.invoke({\"question\": \"What does the A in the NVIDIA A100 stand for?\"}))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "b77aa0fc-6d7e-4516-bf2c-2ec046263dc9",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "I'm not familiar with the NVIDIA H200, could you provide more context or information about it?\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Example questions to invoke the LLM chain\n",
+ "print(chain.invoke({\"question\": \"How much memory does the NVIDIA H200 have?\"}))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "d7a94eb5-4ef9-426f-bc36-4418c7cf2518",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The database user name is: vector\n",
+ "Database connection information is: localhost:1521/freepdb1\n",
+ "Connection successful!\n"
+ ]
+ }
+ ],
+ "source": [
+ "## # Database connection setup\n",
+ "username = \"\"\n",
+ "password = \"\"\n",
+ "host=\"\"\n",
+ "port=\"\"\n",
+ "service_name=\"\"\n",
+ "dsn=host+\":\"+port+\"/\"+service_name\n",
+ "\n",
+ "print(\"The database user name is:\", username)\n",
+ "print(\"Database connection information is:\", dsn)\n",
+ "\n",
+ "## Connect to the database\n",
+ "try:\n",
+ " conn23c = oracledb.connect(user=username, password=password, dsn=dsn)\n",
+ " print(\"Connection successful!\")\n",
+ "except oracledb.DatabaseError as e:\n",
+ " error, = e.args\n",
+ " print(f\"Connection failed. Error code: {error.code}\")\n",
+ " print(f\"Error message: {error.message}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "cee3c7c3-15e7-40ce-9c91-b98dbb44e8a2",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/ubuntu/myenv_nemo/lib/python3.12/site-packages/langchain_nvidia_ai_endpoints/_common.py:486: UserWarning: Found nvidia/nv-embedqa-e5-v5 in available_models, but type is unknown and inference may fail.\n",
+ " warnings.warn(\n"
+ ]
+ }
+ ],
+ "source": [
+ "## Initialize an embedding model for query embedding\n",
+ "embedding_model = NVIDIAEmbeddings(model=\"nvidia/nv-embedqa-e5-v5\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "b74d98e5-f993-42a7-841e-77271cef322b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[-0.0251007080078125,\n",
+ " -0.038055419921875,\n",
+ " 0.035980224609375,\n",
+ " -0.061309814453125,\n",
+ " 0.056396484375,\n",
+ " -0.001224517822265625,\n",
+ " 0.01220703125,\n",
+ " -0.04010009765625,\n",
+ " -0.0258941650390625,\n",
+ " -0.029815673828125]"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "## Create an embedding vector for a specific query\n",
+ "embedding_model.embed_query(\"How much memory does the NVIDIA H200 have?\")[:10]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "128844b0-d715-4fb1-9664-0b31f949da82",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Document(metadata={'source': 'https://nvdam.widen.net/content/udc6mzrk7a/original/hpc-datasheet-sc23-h200-datasheet-3002446.pdf', 'page': 0}, page_content='NVIDIA H200 Tensor Core GPU\\u2002|\\u2002Datasheet\\u2002|\\u2002 1NVIDIA H200 Tensor Core GPU\\nSupercharging AI and HPC workloads.\\nHigher Performance With Larger, Faster Memory\\nThe NVIDIA H200 Tensor Core GPU supercharges generative AI and high-\\nperformance computing (HPC) workloads with game-changing performance \\nand memory capabilities. \\nBased on the NVIDIA Hopper™ architecture , the NVIDIA H200 is the first GPU to \\noffer 141 gigabytes (GB) of HBM3e memory at 4.8 terabytes per second (TB/s)—\\nthat’s nearly double the capacity of the NVIDIA H100 Tensor Core GPU with \\n1.4X more memory bandwidth. The H200’s larger and faster memory accelerates \\ngenerative AI and large language models, while advancing scientific computing for \\nHPC workloads with better energy efficiency and lower total cost of ownership. \\nUnlock Insights With High-Performance LLM Inference\\nIn the ever-evolving landscape of AI, businesses rely on large language models to \\naddress a diverse range of inference needs. An AI inference accelerator must deliver the \\nhighest throughput at the lowest TCO when deployed at scale for a massive user base. \\nThe H200 doubles inference performance compared to H100 GPUs when handling \\nlarge language models such as Llama2 70B.\\n.\\nPreliminary specifications. May be subject to change.\\nLlama2 13B: ISL 128, OSL 2K | Throughput | H100 SXM 1x GPU BS 64 | H200 SXM 1x GPU BS 128\\nGPT-3 175B: ISL 80, OSL 200 | x8 H100 SXM GPUs BS 64 | x8 H200 SXM GPUs BS 128\\nLlama2 70B: ISL 2K, OSL 128 | Throughput | H100 SXM 1x GPU BS 8 | H200 SXM 1x GPU BS 32.Key Features\\n >141GB of HBM3e GPU memory\\n >4.8TB/s of memory bandwidth\\n >4 petaFLOPS of FP8 performance\\n >2X LLM inference performance\\n >110X HPC performance\\nDatasheet')"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Load a PDF document from a URL\n",
+ "loader = PyPDFLoader(\"https://nvdam.widen.net/content/udc6mzrk7a/original/hpc-datasheet-sc23-h200-datasheet-3002446.pdf\")\n",
+ "document = loader.load()\n",
+ "document[0] # Print the first page of the document"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "505690c9-eba3-4aa8-89d3-c24048c2db50",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Initialize a text splitter to chunk the document into smaller pieces\n",
+ "text_splitter = RecursiveCharacterTextSplitter(\n",
+ " chunk_size=500,\n",
+ " chunk_overlap=100,\n",
+ " separators=[\"\\n\\n\", \"\\n\", \".\", \";\", \",\", \" \", \"\"],\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "dcd1e24e-e171-47cf-b291-c1df5d96f6fc",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Number of chunks from the document: 16\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Split the document into chunks\n",
+ "document_chunks = text_splitter.split_documents(document)\n",
+ "print(\"Number of chunks from the document:\", len(document_chunks))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "a08329bc-7289-4557-8ffc-3e181d7b9dbe",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Extract text (page content) from the document chunks\n",
+ "page_contents = [doc.page_content for doc in document_chunks]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "d4cccdda-383b-480b-9cf8-abd8c8f314a6",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[-0.0394287109375,\n",
+ " -0.03741455078125,\n",
+ " 0.06634521484375,\n",
+ " -0.0518798828125,\n",
+ " 0.08477783203125,\n",
+ " -0.0224456787109375,\n",
+ " 0.02484130859375,\n",
+ " -0.0247802734375,\n",
+ " -0.01496124267578125,\n",
+ " -0.005344390869140625]"
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Create vector embeddings from the document\n",
+ "embedding_model.embed_documents(page_contents)[0][:10]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "b47aa9ee-9056-4813-ab07-a96e6d355098",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Initialize a OracleVS vector store to store the document embeddings in oracle 23ai\n",
+ "vector_store = OracleVS.from_documents(\n",
+ " document_chunks,\n",
+ " embedding_model,\n",
+ " client=conn23c,\n",
+ " table_name=\"MY_DEM04\",\n",
+ " distance_strategy=DistanceStrategy.DOT_PRODUCT,\n",
+ " #tablespace=\"my_tablespace\"\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "a6ef58af-ea62-4e28-b468-552798fb6584",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The NVIDIA H200 has 141 gigabytes (GB) of HBM3e memory.\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Create a new chat prompt template for the AI with context awareness\n",
+ "prompt = ChatPromptTemplate.from_messages([\n",
+ " (\"system\", \n",
+ " \"You are a helpful and friendly AI!\"\n",
+ " \"Your responses should be concise and no longer than two sentences.\"\n",
+ " \"Do not hallucinate. Say you don't know if you don't have this information.\"\n",
+ " # \"Answer the question using only the context\"\n",
+ " \"\\n\\nQuestion:{question}\\n\\nContext:{context}\"\n",
+ " ),\n",
+ " (\"user\", \"{question}\")\n",
+ "])\n",
+ "# Create a chain that retrieves context from the vector store and answers questions\n",
+ "chain = (\n",
+ " {\n",
+ " \"context\": vector_store.as_retriever(),\n",
+ " \"question\": RunnablePassthrough()\n",
+ " }\n",
+ " | prompt\n",
+ " | llm\n",
+ " | StrOutputParser()\n",
+ ")\n",
+ "# Invoke the chain with specific questions, using the retrieved context\n",
+ "print(chain.invoke(\"How much memory does the NVIDIA H200 have?\"))"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.12.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}