diff --git a/content/en/ninja-workshops/14-cisco-ai-pods/10-cleanup.md b/content/en/ninja-workshops/14-cisco-ai-pods/10-cleanup.md
index 33606572db..9976eb34ec 100644
--- a/content/en/ninja-workshops/14-cisco-ai-pods/10-cleanup.md
+++ b/content/en/ninja-workshops/14-cisco-ai-pods/10-cleanup.md
@@ -1,10 +1,27 @@
 ---
-title: Cleanup
-linkTitle: 10. Cleanup
+title: Wrap-Up
+linkTitle: 10. Wrap-Up
 weight: 10
 time: 5 minutes
 ---
 
+## Wrap-Up
+
+We hope you enjoyed this workshop, which provided hands-on experience deploying and working
+with several of the technologies that are used to monitor Cisco AI PODs with
+Splunk Observability Cloud. Specifically, you had the opportunity to:
+
+* Deploy a Red Hat OpenShift cluster with GPU-based worker nodes.
+* Deploy the NVIDIA NIM Operator and NVIDIA GPU Operator.
+* Deploy Large Language Models (LLMs) using NVIDIA NIM to the cluster.
+* Deploy the OpenTelemetry Collector in the Red Hat OpenShift cluster.
+* Add Prometheus receivers to the collector to ingest infrastructure metrics.
+* Deploy the Weaviate vector database to the cluster.
+* Instrument Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
+* Understand which details OpenTelemetry captures in traces from applications that interact with LLMs.
+
+## Clean Up Steps
+
 Follow the steps in this section to uninstall the OpenShift cluster.
 
 Get the cluster ID, the Amazon Resource Names (ARNs) for the cluster-specific Operator roles,
@@ -22,12 +39,16 @@ rosa delete cluster --cluster=$CLUSTER_NAME --watch
 ```
 
 Delete the cluster-specific Operator IAM roles:
 
+> Note: just accept the default values when prompted.
+
 ``` bash
 rosa delete operator-roles --prefix $OPERATOR_ROLES_PREFIX
 ```
 
 Delete the OIDC provider:
 
+> Note: just accept the default values when prompted.
+
 ``` bash
 rosa delete oidc-provider --oidc-config-id $OIDC_ID
 ```
diff --git a/content/en/ninja-workshops/14-cisco-ai-pods/3-deploy-openshift-cluster.md b/content/en/ninja-workshops/14-cisco-ai-pods/3-deploy-openshift-cluster.md
index d11a68c228..ae452677c7 100644
--- a/content/en/ninja-workshops/14-cisco-ai-pods/3-deploy-openshift-cluster.md
+++ b/content/en/ninja-workshops/14-cisco-ai-pods/3-deploy-openshift-cluster.md
@@ -24,6 +24,8 @@ export OPERATOR_ROLES_PREFIX=rosa-test-a6x9
 
 Create operator roles for the OIDC configuration using the following command:
 
+> Note: just accept the default values when prompted.
+
 ``` bash
 rosa create operator-roles --hosted-cp --prefix $OPERATOR_ROLES_PREFIX --oidc-config-id $OIDC_ID
 ```
diff --git a/content/en/ninja-workshops/14-cisco-ai-pods/4-deploy-otel-collector.md b/content/en/ninja-workshops/14-cisco-ai-pods/4-deploy-otel-collector.md
index 525a1700ef..a7eaf4bf08 100644
--- a/content/en/ninja-workshops/14-cisco-ai-pods/4-deploy-otel-collector.md
+++ b/content/en/ninja-workshops/14-cisco-ai-pods/4-deploy-otel-collector.md
@@ -8,7 +8,7 @@
 Now that our OpenShift cluster is up and running, let's deploy the
 OpenTelemetry Collector, which gathers metrics, logs, and traces
 from the infrastructure and applications running in the cluster, and
-sends the resulting data to Splunk.
+sends the resulting data to Splunk Observability Cloud.
 
 ## Deploy the OpenTelemetry Collector
diff --git a/content/en/ninja-workshops/14-cisco-ai-pods/5-deploy-nvidia-nim.md b/content/en/ninja-workshops/14-cisco-ai-pods/5-deploy-nvidia-nim.md
index b653e7f574..4e6ba061a2 100644
--- a/content/en/ninja-workshops/14-cisco-ai-pods/5-deploy-nvidia-nim.md
+++ b/content/en/ninja-workshops/14-cisco-ai-pods/5-deploy-nvidia-nim.md
@@ -5,11 +5,14 @@ weight: 5
 time: 20 minutes
 ---
 
-The NVIDIA NIM Operator is used to deploy LLMs in Kubernetes environments, such
+The **NVIDIA GPU Operator** is a Kubernetes Operator that automates the deployment, configuration,
+and management of all necessary NVIDIA software components to provision GPUs within a Kubernetes cluster.
+
+The **NVIDIA NIM Operator** is used to deploy LLMs in Kubernetes environments, such
 as the OpenShift cluster we created earlier in this workshop.
 
-This section of the workshop walks through the steps necessary to deploy the
-NVIDIA NIM operator in our OpenShift cluster.
+This section of the workshop walks through the steps necessary to deploy both the
+NVIDIA GPU and NIM operators in our OpenShift cluster.
 
 ## Create a NVIDIA NGC Account
 
diff --git a/content/en/ninja-workshops/14-cisco-ai-pods/6-deploy-llm.md b/content/en/ninja-workshops/14-cisco-ai-pods/6-deploy-llm.md
index 75919521a8..a855912b71 100644
--- a/content/en/ninja-workshops/14-cisco-ai-pods/6-deploy-llm.md
+++ b/content/en/ninja-workshops/14-cisco-ai-pods/6-deploy-llm.md
@@ -5,7 +5,7 @@ weight: 6
 time: 20 minutes
 ---
 
-In this section, we'll use the NVIDIA NIM Operator to deploy a Large Language Model
+In this section, we'll use the NVIDIA NIM Operator to deploy two Large Language Models
 to our OpenShift Cluster.
 
 ## Create a Namespace
diff --git a/content/en/ninja-workshops/14-cisco-ai-pods/8-deploy-vector-db.md b/content/en/ninja-workshops/14-cisco-ai-pods/8-deploy-vector-db.md
index 42102d2e76..f351f37b73 100644
--- a/content/en/ninja-workshops/14-cisco-ai-pods/8-deploy-vector-db.md
+++ b/content/en/ninja-workshops/14-cisco-ai-pods/8-deploy-vector-db.md
@@ -5,20 +5,20 @@ weight: 8
 time: 10 minutes
 ---
 
-In this step, we'll deploy a vector database to the AI POD and populate it with
+In this step, we'll deploy a vector database to the OpenShift cluster and populate it with
 test data.
 
 ## What is a Vector Database?
 
-A vector database stores and indexes data as numerical "vector embeddings," which capture
-the semantic meaning of information like text or images. Unlike traditional databases,
-they excel at "similarity searches," finding conceptually related data points rather
+A **vector database** stores and indexes data as numerical "vector embeddings," which capture
+the **semantic meaning** of information like text or images. Unlike traditional databases,
+they excel at **similarity searches**, finding conceptually related data points rather
 than exact matches.
 
 ## How is a Vector Database Used?
 
 Vector databases play a key role in a pattern called
-Retrieval Augmented Generation (RAG), which is widely used by
+**Retrieval Augmented Generation (RAG)**, which is widely used by
 applications that leverage Large Language Models (LLMs).
 
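+Before looking at the pattern, it helps to see what a "similarity search" actually
+computes: typically a distance measure, such as cosine similarity, over the embedding
+vectors. Here's a toy sketch (not part of the workshop code, and using made-up
+three-dimensional vectors rather than real embeddings):
+
+``` python
+import numpy as np
+
+# Pretend embeddings for two documents and a user question; real embeddings
+# typically have hundreds or thousands of dimensions.
+docs = {
+    "returns policy": np.array([0.9, 0.1, 0.0]),
+    "gpu troubleshooting": np.array([0.1, 0.8, 0.3]),
+}
+question = np.array([0.2, 0.7, 0.4])
+
+def cosine(a, b):
+    # Cosine similarity: values close to 1.0 mean the vectors point the same way.
+    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
+
+# The conceptually closest document wins; this prints "gpu troubleshooting".
+print(max(docs, key=lambda name: cosine(docs[name], question)))
+```
+
+A vector database performs this kind of comparison at scale, with indexes built for
+fast approximate nearest-neighbor search.
+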
 The pattern is as follows:
@@ -63,7 +63,7 @@ oc create namespace weaviate
 Run the following command to allow Weaviate to run a privileged container:
 
-> Note: this approach is not recommended for production
+> Note: this approach is not recommended for production environments
 
 ``` bash
 oc adm policy add-scc-to-user privileged -z default -n weaviate
 ```
@@ -85,9 +85,14 @@
 Now that Weaviate is installed in our OpenShift cluster, let's modify the
 OpenTelemetry collector configuration to scrape Weaviate's Prometheus metrics.
 
-To do so, let's add an additional Prometheus receiver to the `otel-collector-values.yaml` file:
+To do so, let's add an additional Prometheus receiver creator section
+to the `otel-collector-values.yaml` file:
 
 ``` yaml
+  receiver_creator/weaviate:
+    # Name of the extensions to watch for endpoints to start and stop.
+    watch_observers: [ k8s_observer ]
+    receivers:
       prometheus/weaviate:
         config:
           config:
@@ -142,12 +147,12 @@ that we can more easily distinguish Weaviate metrics from other metrics that
 use `service.instance.id`, which is a standard OpenTelemetry property used in
 Splunk Observability Cloud.
 
-We'll need to add this Resource processor to the metrics pipeline as well:
+We'll need to add a new metrics pipeline for Weaviate metrics as well (we
+need to use a separate pipeline since we don't want the `weaviate.instance.id`
+attribute to be added to non-Weaviate metrics):
 
 ``` yaml
-  service:
-    pipelines:
-      metrics/nvidia-metrics:
+      metrics/weaviate:
         exporters:
         - signalfx
         processors:
@@ -158,7 +163,7 @@ We'll need to add this Resource processor to the metrics pipeline as well:
         - resourcedetection
         - resource
         receivers:
-        - receiver_creator/nvidia
+        - receiver_creator/weaviate
 ```
 
 Before applying the configuration changes to the collector, take a moment to compare the
diff --git a/content/en/ninja-workshops/14-cisco-ai-pods/9-deploy-llm-app.md b/content/en/ninja-workshops/14-cisco-ai-pods/9-deploy-llm-app.md
index fc060c8ad6..8ef7746290 100644
--- a/content/en/ninja-workshops/14-cisco-ai-pods/9-deploy-llm-app.md
+++ b/content/en/ninja-workshops/14-cisco-ai-pods/9-deploy-llm-app.md
@@ -5,14 +5,136 @@ weight: 9
 time: 10 minutes
 ---
 
-In the final step of the workshop, we'll deploy an application to our Cisco AI POD
+In the final step of the workshop, we'll deploy an application to our OpenShift cluster
 that uses the instruct and embeddings models that we deployed earlier using the
 NVIDIA NIM operator.
 
+## Application Overview
+
+Like most applications that interact with LLMs, our application is written in Python.
+It also uses [LangChain](https://www.langchain.com/), which is an open-source orchestration
+framework that simplifies the development of applications powered by LLMs.
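+
+The snippets below are excerpts from the application and omit the imports. For reference,
+the classes they use (`ChatNVIDIA`, `NVIDIAEmbeddings`, `WeaviateVectorStore`, and the
+LangChain core helpers) would typically be imported as shown here; the exact packages
+listed in the app's `requirements.txt` may differ:
+
+``` python
+# LangChain integrations for NVIDIA NIM endpoints and Weaviate (assumed package names)
+from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings
+from langchain_weaviate.vectorstores import WeaviateVectorStore
+
+# LangChain core building blocks used by the prompt and chain snippets below
+from langchain_core.prompts import ChatPromptTemplate
+from langchain_core.runnables import RunnablePassthrough
+from langchain_core.output_parsers import StrOutputParser
+```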
+
+Our application starts by connecting to two LLMs that we'll be using:
+
+* `meta/llama-3.2-1b-instruct`: used for responding to user prompts
+* `nvidia/llama-3.2-nv-embedqa-1b-v2`: used to calculate embeddings
+
+``` python
+# connect to a LLM NIM at the specified endpoint, specifying a specific model
+llm = ChatNVIDIA(base_url=INSTRUCT_MODEL_URL, model="meta/llama-3.2-1b-instruct")
+
+# Initialize and connect to a NeMo Retriever Text Embedding NIM (nvidia/llama-3.2-nv-embedqa-1b-v2)
+embeddings_model = NVIDIAEmbeddings(model="nvidia/llama-3.2-nv-embedqa-1b-v2",
+                                    base_url=EMBEDDINGS_MODEL_URL)
+```
+
+The URLs used for both LLMs are defined in the `k8s-manifest.yaml` file:
+
+``` yaml
+        - name: INSTRUCT_MODEL_URL
+          value: "http://meta-llama-3-2-1b-instruct.nim-service:8000/v1"
+        - name: EMBEDDINGS_MODEL_URL
+          value: "http://llama-32-nv-embedqa-1b-v2.nim-service:8000/v1"
+```
+
+The application then defines a prompt template that will be used in interactions
+with the LLM:
+
+``` python
+prompt = ChatPromptTemplate.from_messages([
+    ("system",
+     "You are a helpful and friendly AI!"
+     "Your responses should be concise and no longer than two sentences."
+     "Do not hallucinate. Say you don't know if you don't have this information."
+     "Answer the question using only the context"
+     "\n\nQuestion: {question}\n\nContext: {context}"
+     ),
+    ("user", "{question}")
+])
+```
+
+> Note how we're explicitly instructing the LLM to say it doesn't know the answer
+> rather than guessing, which helps minimize hallucinations. There's also a placeholder
+> for us to provide context that the LLM can use to answer the question.
+
+The application uses Flask and defines a single endpoint named `/askquestion` to
+respond to questions from end users. To implement this endpoint, the application
+connects to the Weaviate vector database, and then invokes a chain (using LangChain)
+that takes the user's question, converts it to an embedding, and then looks up similar
+documents in the vector database. It then sends the user's question to the LLM, along
+with the related documents, and returns the LLM's response.
+
+``` python
+    # connect with the vector store that was populated earlier
+    vector_store = WeaviateVectorStore(
+        client=weaviate_client,
+        embedding=embeddings_model,
+        index_name="CustomDocs",
+        text_key="page_content"
+    )
+
+    chain = (
+        {
+            "context": vector_store.as_retriever(),
+            "question": RunnablePassthrough()
+        }
+        | prompt
+        | llm
+        | StrOutputParser()
+    )
+
+    response = chain.invoke(question)
+```
+
+## Instrument the Application with OpenTelemetry
+
+To capture metrics, traces, and logs from our application, we've instrumented it with OpenTelemetry.
+This required adding the following package to the `requirements.txt` file (which ultimately gets
+installed with `pip install`):
+
+```
+splunk-opentelemetry==2.7.0
+```
+
+We also added the following to the `Dockerfile` used to build the
+container image for this application, to install additional OpenTelemetry
+instrumentation packages:
+
+``` dockerfile
+# Add additional OpenTelemetry instrumentation packages
+RUN opentelemetry-bootstrap --action=install
+```
+
+Then we modified the `ENTRYPOINT` in the `Dockerfile` to call `opentelemetry-instrument`
+when running the application:
+
+``` dockerfile
+ENTRYPOINT ["opentelemetry-instrument", "flask", "run", "-p", "8080", "--host", "0.0.0.0"]
+```
+
+Finally, to enhance the traces and metrics collected with OpenTelemetry, we added a
+package named [OpenLIT](https://openlit.io/) to the `requirements.txt` file:
+
+```
+openlit==1.35.4
+```
+
+OpenLIT supports LangChain, and adds additional context to traces at instrumentation time,
+such as the number of tokens used to process the request, and what the prompt and
+response were.
+
+To initialize OpenLIT, we added the following to the application code:
+
+``` python
+import openlit
+...
+openlit.init(environment="llm-app")
+```
+
 ## Deploy the LLM Application
 
-Let's deploy an application to our OpenShift cluster that answers questions
-using the context that we loaded into the Weaviate vector database earlier.
+Use the following command to deploy this application to the OpenShift cluster:
 
 ``` bash
 oc apply -f ./llm-app/k8s-manifest.yaml
 ```
@@ -90,18 +212,3 @@ Finally, we can see the response from the LLM, the time it took, and the number
 input and output tokens utilized:
 
 ![LLM Response](../images/LLMResponse.png)
-
-## Wrap-Up
-
-We hope you enjoyed this workshop, which provided hands-on experience deploying and working
-with several of the technologies that are used to monitor Cisco AI PODs with
-Splunk Observability Cloud. Specifically, you had the opportunity to:
-
-* Deploy a RedHat OpenShift cluster with GPU-based worker nodes.
-* Deploy the NVIDIA NIM Operator and NVIDIA GPU Operator.
-* Deploy Large Language Models (LLMs) using NVIDIA NIM to the cluster.
-* Deploy the OpenTelemetry Collector in the Red Hat OpenShift cluster.
-* Add Prometheus receivers to the collector to ingest infrastructure metrics.
-* Deploy the Weaviate vector database to the cluster.
-* Instrument Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
-* Understand which details which OpenTelemetry captures in the trace from applications that interact with LLMs.
diff --git a/content/en/ninja-workshops/14-cisco-ai-pods/_index.md b/content/en/ninja-workshops/14-cisco-ai-pods/_index.md
index ddaac0904e..d2fc4f17bf 100644
--- a/content/en/ninja-workshops/14-cisco-ai-pods/_index.md
+++ b/content/en/ninja-workshops/14-cisco-ai-pods/_index.md
@@ -16,16 +16,29 @@ scalable, and efficient AI-ready infrastructure tailored to diverse needs.
 **Splunk Observability Cloud** provides comprehensive visibility into all of this infrastructure
 along with all the application components that are running on this stack.
 
+The steps to configure Splunk Observability Cloud for a Cisco AI POD environment are fully
+documented (see [here](https://github.com/signalfx/splunk-opentelemetry-examples/tree/main/collector/cisco-ai-ready-pods)
+for details).
+
+However, it's not always possible to get access to a Cisco AI POD environment to practice
+the installation steps.
+
 This workshop provides hands-on experience deploying and working with several of the technologies
-that are used to monitor Cisco AI PODs with Splunk Observability Cloud, including:
+that are used to monitor Cisco AI PODs with Splunk Observability Cloud, without requiring
+access to an actual Cisco AI POD. This includes:
 
-* Practice deploying an OpenTelemetry Collector in a Red Hat OpenShift cluster.
-* Practice configuring Prometheus receivers with the collector to ingest infrastructure metrics.
-* Practice instrumenting Python services that interact with Large Language Models (LLMs) with OpenTelemetry.
+* Practice deploying a **Red Hat OpenShift** cluster with GPU-based worker nodes.
+* Practice deploying the **NVIDIA NIM Operator** and **NVIDIA GPU Operator**.
+* Practice deploying **Large Language Models (LLMs)** using NVIDIA NIM to the cluster.
+* Practice deploying the **OpenTelemetry Collector** in the Red Hat OpenShift cluster.
+* Practice adding **Prometheus** receivers to the collector to ingest infrastructure metrics.
+* Practice deploying the **Weaviate** vector database to the cluster.
+* Practice instrumenting Python services that interact with Large Language Models (LLMs) with **OpenTelemetry**.
+* Understand which details OpenTelemetry captures in traces from applications that interact with LLMs.
 
-While access to an actual Cisco AI POD isn't required, the workshop **does** require access
-to an AWS account. We'll walk you through the steps of creating a Red Hat OpenShift
-cluster in AWS that we'll use for the rest of the workshop.
+> Please note: Red Hat OpenShift and NVIDIA AI Enterprise components
+> are typically pre-installed with an actual AI POD. However, because we're using AWS for
+> this workshop, it's necessary to perform these setup steps manually.
 
 {{% notice title="Tip" style="primary" icon="lightbulb" %}}
 The easiest way to navigate through this workshop is by using:
diff --git a/workshop/cisco-ai-pods/otel-collector/otel-collector-values-with-weaviate.yaml b/workshop/cisco-ai-pods/otel-collector/otel-collector-values-with-weaviate.yaml
index ba58fb508e..82f594496d 100644
--- a/workshop/cisco-ai-pods/otel-collector/otel-collector-values-with-weaviate.yaml
+++ b/workshop/cisco-ai-pods/otel-collector/otel-collector-values-with-weaviate.yaml
@@ -147,6 +147,10 @@ agent:
                   - targets:
                     - '`endpoint`:8000'
             rule: type == "pod" && labels["app"] == "meta-llama-3-2-1b-instruct"
+      receiver_creator/weaviate:
+        # Name of the extensions to watch for endpoints to start and stop.
+        watch_observers: [ k8s_observer ]
+        receivers:
          prometheus/weaviate:
            config:
              config:
@@ -160,6 +164,17 @@ agent:
     service:
       pipelines:
         metrics/nvidia-metrics:
+          exporters:
+          - signalfx
+          processors:
+          - memory_limiter
+          - filter/metrics_to_be_included
+          - batch
+          - resourcedetection
+          - resource
+          receivers:
+          - receiver_creator/nvidia
+        metrics/weaviate:
           exporters:
           - signalfx
           processors:
@@ -170,4 +185,4 @@ agent:
           - resourcedetection
           - resource
           receivers:
-          - receiver_creator/nvidia
\ No newline at end of file
+          - receiver_creator/weaviate
\ No newline at end of file