25 changes: 23 additions & 2 deletions content/en/ninja-workshops/14-cisco-ai-pods/10-cleanup.md
@@ -1,10 +1,27 @@
---
title: Cleanup
linkTitle: 10. Cleanup
title: Wrap-Up
linkTitle: 10. Wrap-Up
weight: 10
time: 5 minutes
---

## Wrap-Up

We hope you enjoyed this workshop, which provided hands-on experience deploying and working
with several of the technologies that are used to monitor Cisco AI PODs with
Splunk Observability Cloud. Specifically, you had the opportunity to:

* Deploy a Red Hat OpenShift cluster with GPU-based worker nodes.
* Deploy the NVIDIA NIM Operator and NVIDIA GPU Operator.
* Deploy Large Language Models (LLMs) using NVIDIA NIM to the cluster.
* Deploy the OpenTelemetry Collector in the Red Hat OpenShift cluster.
* Add Prometheus receivers to the collector to ingest infrastructure metrics.
* Deploy the Weaviate vector database to the cluster.
* Instrument Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
* Understand which details OpenTelemetry captures in traces from applications that interact with LLMs.

## Clean Up Steps

Follow the steps in this section to uninstall the OpenShift cluster.

Get the cluster ID, the Amazon Resource Names (ARNs) for the cluster-specific Operator roles,
@@ -22,12 +39,16 @@ rosa delete cluster --cluster=$CLUSTER_NAME --watch

Delete the cluster-specific Operator IAM roles:

> Note: just accept the default values when prompted.

``` bash
rosa delete operator-roles --prefix $OPERATOR_ROLES_PREFIX
```

Delete the OIDC provider:

> Note: just accept the default values when prompted.

``` bash
rosa delete oidc-provider --oidc-config-id $OIDC_ID
```
@@ -24,6 +24,8 @@ export OPERATOR_ROLES_PREFIX=rosa-test-a6x9

Create operator roles for the OIDC configuration using the following command:

> Note: just accept the default values when prompted.

``` bash
rosa create operator-roles --hosted-cp --prefix $OPERATOR_ROLES_PREFIX --oidc-config-id $OIDC_ID
```
@@ -8,7 +8,7 @@ time: 10 minutes
Now that our OpenShift cluster is up and running, let's deploy the
OpenTelemetry Collector, which gathers metrics, logs, and traces
from the infrastructure and applications running in the cluster, and
sends the resulting data to Splunk.
sends the resulting data to Splunk Observability Cloud.

## Deploy the OpenTelemetry Collector

@@ -5,11 +5,14 @@ weight: 5
time: 20 minutes
---

The NVIDIA NIM Operator is used to deploy LLMs in Kubernetes environments, such
The **NVIDIA GPU Operator** is a Kubernetes Operator that automates the deployment, configuration,
and management of all necessary NVIDIA software components to provision GPUs within a Kubernetes cluster.

The **NVIDIA NIM Operator** is used to deploy LLMs in Kubernetes environments, such
as the OpenShift cluster we created earlier in this workshop.

This section of the workshop walks through the steps necessary to deploy the
NVIDIA NIM operator in our OpenShift cluster.
This section of the workshop walks through the steps necessary to deploy both the
NVIDIA GPU and NIM operators in our OpenShift cluster.

## Create an NVIDIA NGC Account

@@ -5,7 +5,7 @@ weight: 6
time: 20 minutes
---

In this section, we'll use the NVIDIA NIM Operator to deploy a Large Language Model
In this section, we'll use the NVIDIA NIM Operator to deploy two Large Language Models
to our OpenShift Cluster.

## Create a Namespace
29 changes: 17 additions & 12 deletions content/en/ninja-workshops/14-cisco-ai-pods/8-deploy-vector-db.md
@@ -5,20 +5,20 @@ weight: 8
time: 10 minutes
---

In this step, we'll deploy a vector database to the AI POD and populate it with
In this step, we'll deploy a vector database to the OpenShift cluster and populate it with
test data.

## What is a Vector Database?

A vector database stores and indexes data as numerical "vector embeddings," which capture
the semantic meaning of information like text or images. Unlike traditional databases,
they excel at "similarity searches," finding conceptually related data points rather
A **vector database** stores and indexes data as numerical "vector embeddings," which capture
the **semantic meaning** of information like text or images. Unlike traditional databases,
they excel at **similarity searches**, finding conceptually related data points rather
than exact matches.
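
To make the idea of a similarity search concrete, here's a minimal sketch (plain Python with NumPy, not the workshop's actual database) showing how two embeddings can be compared using cosine similarity:

``` python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Return the cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional embeddings; real embedding models produce hundreds or
# thousands of dimensions
query      = np.array([0.9, 0.1, 0.0, 0.3])
doc_nearby = np.array([0.8, 0.2, 0.1, 0.4])   # semantically similar document
doc_far    = np.array([0.0, 0.9, 0.8, 0.0])   # unrelated document

print(cosine_similarity(query, doc_nearby))  # close to 1.0 -> related content
print(cosine_similarity(query, doc_far))     # much lower   -> unrelated content
```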

## How is a Vector Database Used?

Vector databases play a key role in a pattern called
Retrieval Augmented Generation (RAG), which is widely used by
**Retrieval Augmented Generation (RAG)**, which is widely used by
applications that leverage Large Language Models (LLMs).

The pattern is as follows:
@@ -63,7 +63,7 @@ oc create namespace weaviate

Run the following command to allow Weaviate to run a privileged container:

> Note: this approach is not recommended for production
> Note: this approach is not recommended for production environments

``` bash
oc adm policy add-scc-to-user privileged -z default -n weaviate
@@ -85,9 +85,14 @@ Now that Weaviate is installed in our OpenShift cluster, let's modify the
OpenTelemetry collector configuration to scrape Weaviate's Prometheus
metrics.

To do so, let's add an additional Prometheus receiver to the `otel-collector-values.yaml` file:
To do so, let's add an additional Prometheus receiver creator section
to the `otel-collector-values.yaml` file:

``` yaml
receiver_creator/weaviate:
# Name of the extensions to watch for endpoints to start and stop.
watch_observers: [ k8s_observer ]
receivers:
prometheus/weaviate:
config:
config:
@@ -142,12 +147,12 @@ that we can more easily distinguish Weaviate metrics from other metrics that use
`service.instance.id`, which is a standard OpenTelemetry property used in
Splunk Observability Cloud.

We'll need to add this Resource processor to the metrics pipeline as well:
We'll need to add a new metrics pipeline for Weaviate metrics as well (a
separate pipeline is needed because we don't want the `weaviate.instance.id`
attribute to be added to non-Weaviate metrics):

``` yaml
service:
pipelines:
metrics/nvidia-metrics:
metrics/weaviate:
exporters:
- signalfx
processors:
@@ -158,7 +163,7 @@ We'll need to add this Resource processor to the metrics pipeline as well:
- resourcedetection
- resource
receivers:
- receiver_creator/nvidia
- receiver_creator/weaviate
```

Before applying the configuration changes to the collector, take a moment to compare the
143 changes: 125 additions & 18 deletions content/en/ninja-workshops/14-cisco-ai-pods/9-deploy-llm-app.md
@@ -5,14 +5,136 @@ weight: 9
time: 10 minutes
---

In the final step of the workshop, we'll deploy an application to our Cisco AI POD
In the final step of the workshop, we'll deploy an application to our OpenShift cluster
that uses the instruct and embeddings models that we deployed earlier using the
NVIDIA NIM operator.

## Application Overview

Like most applications that interact with LLMs, our application is written in Python.
It also uses [LangChain](https://www.langchain.com/), which is an open-source orchestration
framework that simplifies the development of applications powered by LLMs.

Our application starts by connecting to two LLMs that we'll be using:

* `meta/llama-3.2-1b-instruct`: used for responding to user prompts
* `nvidia/llama-3.2-nv-embedqa-1b-v2`: used to calculate embeddings

``` python
# Connect to an LLM NIM at the specified endpoint, specifying the model to use
llm = ChatNVIDIA(base_url=INSTRUCT_MODEL_URL, model="meta/llama-3.2-1b-instruct")

# Initialize and connect to a NeMo Retriever Text Embedding NIM (nvidia/llama-3.2-nv-embedqa-1b-v2)
embeddings_model = NVIDIAEmbeddings(model="nvidia/llama-3.2-nv-embedqa-1b-v2",
base_url=EMBEDDINGS_MODEL_URL)
```

The URLs used for both LLMs are defined in the `k8s-manifest.yaml` file:

``` yaml
- name: INSTRUCT_MODEL_URL
value: "http://meta-llama-3-2-1b-instruct.nim-service:8000/v1"
- name: EMBEDDINGS_MODEL_URL
value: "http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1"
```
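
Inside the container, these values reach the Python code as environment variables. Here's a minimal sketch of how the application might read them (the use of `os.environ` is an assumption for illustration; only the variable names come from the manifest):

``` python
import os

# Read the NIM service URLs injected via the Kubernetes manifest
INSTRUCT_MODEL_URL = os.environ["INSTRUCT_MODEL_URL"]
EMBEDDINGS_MODEL_URL = os.environ["EMBEDDINGS_MODEL_URL"]
```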

The application then defines a prompt template that will be used in interactions
with the LLM:

``` python
prompt = ChatPromptTemplate.from_messages([
("system",
"You are a helpful and friendly AI!"
"Your responses should be concise and no longer than two sentences."
"Do not hallucinate. Say you don't know if you don't have this information."
"Answer the question using only the context"
"\n\nQuestion: {question}\n\nContext: {context}"
),
("user", "{question}")
])
```

> Note how we're explicitly instructing the LLM to say it doesn't know the answer when it
> lacks the relevant information, which helps minimize hallucinations. There's also a placeholder
> for us to provide context that the LLM can use to answer the question.

The application uses Flask, and defines a single endpoint named `/askquestion` to
respond to questions from end users. To implement this endpoint, the application
connects to the Weaviate vector database, and then invokes a chain (using LangChain)
that takes the user's question, converts it to an embedding, and then looks up similar
documents in the vector database. It then sends the user's question to the LLM, along
with the related documents, and returns the LLM's response.

``` python
# connect with the vector store that was populated earlier
vector_store = WeaviateVectorStore(
client=weaviate_client,
embedding=embeddings_model,
index_name="CustomDocs",
text_key="page_content"
)

chain = (
{
"context": vector_store.as_retriever(),
"question": RunnablePassthrough()
}
| prompt
| llm
| StrOutputParser()
)

response = chain.invoke(question)
```
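
The snippet below is a minimal sketch of how the `/askquestion` endpoint might wrap the chain defined above; the route path comes from the workshop text, while the handler name and the `question` field in the request body are assumptions for illustration:

``` python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/askquestion", methods=["POST"])
def ask_question():
    # Extract the user's question from the request body (field name assumed)
    question = request.get_json().get("question", "")

    # Invoke the RAG chain defined above: retrieve related documents from
    # Weaviate, fill in the prompt template, and call the instruct model
    response = chain.invoke(question)

    return jsonify({"answer": response})
```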

## Instrument the Application with OpenTelemetry

To capture metrics, traces, and logs from our application, we've instrumented it with OpenTelemetry.
This required adding the following package to the `requirements.txt` file (which ultimately gets
installed with `pip install`):

````
splunk-opentelemetry==2.7.0
````

We also added the following to the `Dockerfile` used to build the
container image for this application, to install additional OpenTelemetry
instrumentation packages:

``` dockerfile
# Add additional OpenTelemetry instrumentation packages
RUN opentelemetry-bootstrap --action=install
```

Then we modified the `ENTRYPOINT` in the `Dockerfile` to call `opentelemetry-instrument`
when running the application:

``` dockerfile
ENTRYPOINT ["opentelemetry-instrument", "flask", "run", "-p", "8080", "--host", "0.0.0.0"]
```

Finally, to enhance the traces and metrics collected with OpenTelemetry, we added a
package named [OpenLIT](https://openlit.io/) to the `requirements.txt` file:

````
openlit==1.35.4
````

OpenLIT supports LangChain, and adds additional context to traces at instrumentation time,
such as the number of tokens used to process the request, and what the prompt and
response were.

To initialize OpenLIT, we added the following to the application code:

``` python
import openlit
...
openlit.init(environment="llm-app")
```

## Deploy the LLM Application

Let's deploy an application to our OpenShift cluster that answers questions
using the context that we loaded into the Weaviate vector database earlier.
Use the following command to deploy this application to the OpenShift cluster:

``` bash
oc apply -f ./llm-app/k8s-manifest.yaml
@@ -90,18 +212,3 @@ Finally, we can see the response from the LLM, the time it took, and the number
input and output tokens utilized:

![LLM Response](../images/LLMResponse.png)

## Wrap-Up

We hope you enjoyed this workshop, which provided hands-on experience deploying and working
with several of the technologies that are used to monitor Cisco AI PODs with
Splunk Observability Cloud. Specifically, you had the opportunity to:

* Deploy a RedHat OpenShift cluster with GPU-based worker nodes.
* Deploy the NVIDIA NIM Operator and NVIDIA GPU Operator.
* Deploy Large Language Models (LLMs) using NVIDIA NIM to the cluster.
* Deploy the OpenTelemetry Collector in the Red Hat OpenShift cluster.
* Add Prometheus receivers to the collector to ingest infrastructure metrics.
* Deploy the Weaviate vector database to the cluster.
* Instrument Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
* Understand which details which OpenTelemetry captures in the trace from applications that interact with LLMs.
27 changes: 20 additions & 7 deletions content/en/ninja-workshops/14-cisco-ai-pods/_index.md
@@ -16,16 +16,29 @@ scalable, and efficient AI-ready infrastructure tailored to diverse needs.
**Splunk Observability Cloud** provides comprehensive visibility into all of this infrastructure
along with all the application components that are running on this stack.

The steps to configure Splunk Observability Cloud for a Cisco AI POD environment are fully
documented (see [here](https://github.com/signalfx/splunk-opentelemetry-examples/tree/main/collector/cisco-ai-ready-pods)
for details).

However, it's not always possible to get access to a Cisco AI POD environment to practice
the installation steps.

This workshop provides hands-on experience deploying and working with several of the technologies
that are used to monitor Cisco AI PODs with Splunk Observability Cloud, including:
that are used to monitor Cisco AI PODs with Splunk Observability Cloud, without requiring
access to an actual Cisco AI POD. This includes:

* Practice deploying an OpenTelemetry Collector in a Red Hat OpenShift cluster.
* Practice configuring Prometheus receivers with the collector to ingest infrastructure metrics.
* Practice instrumenting Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
* Practice deploying a **Red Hat OpenShift** cluster with GPU-based worker nodes.
* Practice deploying the **NVIDIA NIM Operator** and **NVIDIA GPU Operator**.
* Practice deploying **Large Language Models (LLMs)** using NVIDIA NIM to the cluster.
* Practice deploying the **OpenTelemetry Collector** in the Red Hat OpenShift cluster.
* Practice adding **Prometheus** receivers to the collector to ingest infrastructure metrics.
* Practice deploying the **Weaviate** vector database to the cluster.
* Practice instrumenting Python services that interact with Large Language Models (LLMs) with **OpenTelemetry**.
* Understand which details OpenTelemetry captures in traces from applications that interact with LLMs.

While access to an actual Cisco AI POD isn't required, the workshop **does** require access
to an AWS account. We'll walk you through the steps of creating a Red Hat OpenShift
cluster in AWS that we'll use for the rest of the workshop.
> Please note: Red Hat OpenShift and NVIDIA AI Enterprise components
> are typically pre-installed with an actual AI POD. However, because we’re using AWS for this workshop,
> it’s necessary to perform these setup steps manually.

{{% notice title="Tip" style="primary" icon="lightbulb" %}}
The easiest way to navigate through this workshop is by using: