<img src="https://opea.dev/wp-content/uploads/sites/9/2024/04/opea-horizontal-color.svg" alt="OPEA Logo">

# Deploy and Learn ChatQnA using OPEA on Intel Tiber AI Cloud 
## Kubernetes Deployment

This notebook walks you through deploying a RAG chatbot based on the OPEA ChatQnA blueprint on Kubernetes.

## I. Introduction
### The Open Platform for Enterprise AI (OPEA) Project 
OPEA uses microservices to create high-quality GenAI applications for enterprises, simplifying the scaling and deployment process for production. A service composer assembles these microservices into a megaservice, which forms a real-world enterprise AI application.

It’s important to familiarize yourself with the key elements of OPEA:

1. [**GenAIComps**](https://github.com/opea-project/GenAIComps) 
A collection of microservice components that form a service-based toolkit. Each microservice is designed to perform a specific function or task within the GenAI application architecture. By breaking down the system into these smaller, self-contained services, microservices promote modularity, flexibility, and scalability. This modular approach allows developers to independently develop, deploy, and scale individual components of the application, making it easier to maintain and evolve over time. All of the microservices are containerized, allowing cloud native deployment. Here, you will also find contributions from multiple partners and communities.

2. [**GenAIExamples**](https://github.com/opea-project/GenAIExamples)
While *GenAIComps* offers a range of microservices, *GenAIExamples* provides practical, deployable solutions to help users implement these services effectively. This repo provides use-case-based applications that demonstrate how the OPEA architecture can be leveraged to build and deploy real-world GenAI applications. In the repo, developers can find practical resources such as Docker Compose files and Kubernetes Helm charts, which help streamline the deployment and scaling of these applications. These resources allow users to quickly set up and run the examples in local or cloud environments, ensuring a seamless experience.

## II. Prepare the Environment

This notebook uses Kubernetes as the deployment engine. You can deploy this example on any of the available cloud providers by following [this guide](https://opea-project.github.io/latest/getting-started/README.html).

Once your platform is provisioned, follow the steps below to run the example:

### Clone the GenAIExamples Repo

As mentioned, end-to-end blueprints are provided in the OPEA [GenAIExamples repo](https://github.com/opea-project/GenAIExamples). Other available examples include AgentQnA, AudioQnA, and MultimodalQnA, among others.

Clone the repo:

In [1]:
!git clone https://github.com/opea-project/GenAIExamples.git

Cloning into 'GenAIExamples'...
remote: Enumerating objects: 33071, done.
remote: Counting objects: 100% (905/905), done.
remote: Compressing objects: 100% (723/723), done.
remote: Total 33071 (delta 545), reused 182 (delta 182), pack-reused 32166 (from 5)
Receiving objects: 100% (33071/33071), 85.89 MiB | 36.62 MiB/s, done.
Resolving deltas: 100% (19021/19021), done.


## III. Deployment

Let's now explore the OPEA ChatQnA RAG deployment. As mentioned, it's a microservices blueprint designed for scalability, resilience, and flexibility. In this task you will explore each microservice to understand how each component contributes to the overall application. This learning path guides you through the system, illustrating the role of each service and how they work together.

Each service can scale individually based on demand, optimizing resources and performance. Additionally, microservices improve fault isolation—if one service fails, it doesn’t disrupt the entire system. This architecture supports efficient maintenance, rapid updates, and adaptability, making it ideal for responding to changing business needs and user demands.

Every OPEA configuration is built on three main parts:

![microservices-arch](./Images/microservices-arch.png)

- **Megaservice**: The microservice "orchestrator". When you deploy an end-to-end application made of multiple parts, you need to specify how requests flow between the microservices. You can learn more in the [OPEA documentation](https://github.com/opea-project/GenAIComps?tab=readme-ov-file#megaservice).

- **Gateway**: The interface through which users access the `megaservice`. It acts as the entry point for incoming requests, routing them to the appropriate microservices within the megaservice architecture.

- **Microservice**: Each individual component of the end-to-end application, such as **embeddings**, **retrievers**, **LLM**, and **vector databases**, among others.
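To make the three roles concrete, here is a toy sketch in plain Python. It does not use the OPEA orchestration API: `embed`, `retrieve`, `generate`, and `run_megaservice` are hypothetical stand-ins that only illustrate how a megaservice chains microservices, with the caller of `run_megaservice` playing the part of the gateway.

```python
from typing import Callable, Dict, List

# In this sketch a "microservice" is just a function from a state dict to a state dict.
Step = Callable[[Dict], Dict]


def embed(state: Dict) -> Dict:
    # Stand-in for the embedding service: a fake one-number "vector".
    return {**state, "embedding": [float(len(state["question"]))]}


def retrieve(state: Dict) -> Dict:
    # Stand-in for the retriever: returns a canned passage.
    return {**state, "docs": ["(retrieved passage)"]}


def generate(state: Dict) -> Dict:
    # Stand-in for the LLM: reports the question and how many passages it received.
    answer = f"Answer to '{state['question']}' using {len(state['docs'])} passage(s)"
    return {**state, "answer": answer}


def run_megaservice(steps: List[Step], question: str) -> Dict:
    """Pass the request through each step in order, as an orchestrator would."""
    state: Dict = {"question": question}
    for step in steps:
        state = step(state)
    return state


result = run_megaservice([embed, retrieve, generate], "What was Nike revenue in 2023?")
print(result["answer"])
```

Because each step only depends on the state it receives, any one of them could be swapped or scaled independently, which is the property the microservice layout is meant to provide.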

### 1. Deploy ChatQnA on Kubernetes Cluster with Xeon Node

To deploy ChatQnA on a Xeon node in a Kubernetes cluster, we will use [this Helm Chart](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/helm/README.md). 

Run `helm install` to deploy the Blueprint with your Hugging Face API Token:

In [5]:
%cd /home/devcloud/GenAI-Workshops/GenAIExamples/ChatQnA/kubernetes/helm
!helm install chatqna oci://ghcr.io/opea-project/charts/chatqna --set global.HUGGINGFACEHUB_API_TOKEN="your_huggingface_key" -f cpu-values.yaml

/home/devcloud/GenAIExamples/ChatQnA/kubernetes/helm
Pulled: ghcr.io/opea-project/charts/chatqna:1.3.0
Digest: sha256:47ec07bc392d2c602c719d9132eff50ca0fef44de4fce28e31d7bca12a9bb5a2
NAME: chatqna
LAST DEPLOYED: Thu May  1 18:25:24 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1


The application should take ~5 minutes to deploy.
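If you prefer not to paste your token into a notebook cell, the same `helm install` command can be assembled in Python. This is a minimal sketch: it only uses the chart URL and flags shown above, while the `HF_TOKEN` environment variable name and the helper names are assumptions made for illustration.

```python
import os
import subprocess
from typing import List

CHART = "oci://ghcr.io/opea-project/charts/chatqna"


def helm_install_args(release: str, token: str, values_file: str = "cpu-values.yaml") -> List[str]:
    """Build the argument list for the `helm install` command used in this notebook."""
    if not token or token == "your_huggingface_key":
        raise ValueError("Set a real Hugging Face token before deploying")
    return [
        "helm", "install", release, CHART,
        "--set", f"global.HUGGINGFACEHUB_API_TOKEN={token}",
        "-f", values_file,
    ]


def deploy(release: str = "chatqna") -> None:
    # HF_TOKEN is an assumed variable name; use whatever your environment defines.
    token = os.environ.get("HF_TOKEN", "")
    subprocess.run(helm_install_args(release, token), check=True)


print(helm_install_args("chatqna", "hf_example_token"))
```

Call `deploy()` from the Helm chart directory so that `cpu-values.yaml` resolves; passing the arguments as a list avoids shell quoting problems with the token.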

### 2. Examine the Cluster Resources

Once the application has been deployed, launch a new terminal session and use the following `kubectl` commands to verify the deployment was successful. 

#### 2.1 List all pods in the default namespace

In [9]:
!kubectl get pods

NAME                                       READY   STATUS    RESTARTS   AGE
chatqna-868d98c5bf-vb2bk                   1/1     Running   0          8m6s
chatqna-chatqna-ui-ffd74c8d8-b7tm7         1/1     Running   0          8m6s
chatqna-data-prep-59849c8885-xn8pq         1/1     Running   0          8m6s
chatqna-nginx-6c855d856c-4nds9             1/1     Running   0          8m6s
chatqna-redis-vector-db-8566ffdb78-f8d62   1/1     Running   0          8m6s
chatqna-retriever-usvc-57c8c4c7d5-9rfrd    1/1     Running   0          8m6s
chatqna-tei-9c46456c7-z89lc                1/1     Running   0          8m6s
chatqna-teirerank-5d4c49cd8d-sq7jc         1/1     Running   0          8m6s
chatqna-vllm-59dc97d46-x9wmg               1/1     Running   0          8m6s


The output should show every pod as `Running` with `1/1` containers ready:
```
NAME                                       READY   STATUS    RESTARTS   AGE
chatqna-77cfbfc775-8wx4d                   1/1     Running   0          6m18s
chatqna-chatqna-ui-bf9dd98cc-p4jnf         1/1     Running   0          6m17s
chatqna-data-prep-7c59568774-24pnq         1/1     Running   0          6m17s
chatqna-nginx-96fc84d58-nkrjn              1/1     Running   0          6m17s
chatqna-redis-vector-db-8566ffdb78-wnpsr   1/1     Running   0          6m18s
chatqna-retriever-usvc-f55d8c7f9-p7scr     1/1     Running   0          6m18s
chatqna-tei-7698c7bb79-mk5gk               1/1     Running   0          6m17s
chatqna-teirerank-759cc946-zb7l6           1/1     Running   0          6m17s
chatqna-vllm-9dc4c5ff4-d9cbh               1/1     Running   0          6m17s
```
This confirms that your application was successfully deployed, and you are now ready to explore your application deployment and manage resources within the cluster.
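Rather than reading the table by eye, you can check readiness programmatically. The sketch below parses text in the `kubectl get pods` format shown above; `not_ready_pods` is a hypothetical helper, and the second pod in the sample is deliberately edited to `0/1` to show what a failure looks like.

```python
from typing import List, Tuple


def not_ready_pods(kubectl_output: str) -> List[Tuple[str, str, str]]:
    """Return (name, ready, status) for every pod that is not fully ready."""
    problems = []
    for line in kubectl_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue  # skip the header and blank lines
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            problems.append((name, ready, status))
    return problems


sample = """\
NAME                                       READY   STATUS    RESTARTS   AGE
chatqna-868d98c5bf-vb2bk                   1/1     Running   0          8m6s
chatqna-vllm-59dc97d46-x9wmg               0/1     Running   0          8m6s
"""
print(not_ready_pods(sample))
```

In a notebook cell you could capture the real output with `pods = !kubectl get pods` and pass `"\n".join(pods)` to the helper; an empty list means every pod is ready.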

Before you start exploring, note that only the gateway and UI services are exposed externally. In this task, you'll access each internal microservice directly to run tests, using the gateway (Nginx) pod to reach these internal services.

<div class="alert alert-block alert-info">
<b>Important:</b> You'll need to take note of all pods deployed.
</div>

`kubectl get svc` lists all services in a Kubernetes cluster, showing their names, types, cluster IPs, and exposed ports. It provides an overview of how applications are exposed for internal or external access.

Run the following command. You will see output similar to this:
```
NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
chatqna                   ClusterIP   10.96.75.63     <none>        8888/TCP            4m52s
chatqna-chatqna-ui        ClusterIP   10.96.23.22     <none>        5174/TCP            4m52s
chatqna-data-prep         ClusterIP   10.96.47.193    <none>        6007/TCP            4m52s
chatqna-nginx             NodePort    10.96.194.75    <none>        80:31205/TCP        4m52s
chatqna-redis-vector-db   ClusterIP   10.96.78.94     <none>        6379/TCP,8001/TCP   4m52s
chatqna-retriever-usvc    ClusterIP   10.96.177.32    <none>        7000/TCP            4m52s
chatqna-tei               ClusterIP   10.96.102.249   <none>        80/TCP              4m52s
chatqna-teirerank         ClusterIP   10.96.47.217    <none>        80/TCP              4m52s
chatqna-vllm              ClusterIP   10.96.161.243   <none>        80/TCP              4m52s
kubernetes                ClusterIP   10.96.0.1       <none>        443/TCP             8d
```

In [10]:
!kubectl get svc

NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
chatqna                   ClusterIP   10.96.246.79    <none>        8888/TCP            8m16s
chatqna-chatqna-ui        ClusterIP   10.96.239.113   <none>        5174/TCP            8m16s
chatqna-data-prep         ClusterIP   10.96.80.218    <none>        6007/TCP            8m16s
chatqna-nginx             NodePort    10.96.180.42    <none>        80:31883/TCP        8m16s
chatqna-redis-vector-db   ClusterIP   10.96.231.223   <none>        6379/TCP,8001/TCP   8m16s
chatqna-retriever-usvc    ClusterIP   10.96.215.190   <none>        7000/TCP            8m16s
chatqna-tei               ClusterIP   10.96.141.158   <none>        80/TCP              8m16s
chatqna-teirerank         ClusterIP   10.96.59.62     <none>        80/TCP              8m16s
chatqna-vllm              ClusterIP   10.96.160.181   <none>        80/TCP              8m16s
kubernetes                ClusterIP   10.96.0.1       <none>  

The `kubectl get svc` command lists the services in a Kubernetes cluster, which act as entry points for reaching your applications. Each service has a name (such as `chatqna` or `chatqna-chatqna-ui`) that identifies it. Services can be exposed in different ways: a `ClusterIP` service is only reachable from inside the cluster, while a `NodePort` service is also reachable from outside through a specific port on each node. The `CLUSTER-IP` column is the internal address other workloads use to reach the service. The `EXTERNAL-IP` column would show an address if the service were published outside the cluster (for example, through a load balancer); here it is `<none>` for every service. The `PORT(S)` column shows which ports the service listens on, such as `8888/TCP` for `chatqna` or `80:31883/TCP` for `chatqna-nginx`, where `31883` is the node port mapped to service port 80. Finally, `AGE` tells you how long ago the service was created, which is about 8 minutes for the ChatQnA services in the output above.
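The column layout described above is regular enough to parse with a few lines of Python. The following sketch uses two rows copied from the output above; `parse_services` and `node_ports` are hypothetical helper names, and the parser assumes no column value contains spaces.

```python
from typing import Dict, List


def parse_services(kubectl_output: str) -> List[Dict[str, str]]:
    """Turn `kubectl get svc` text into one dict per service, keyed by column name."""
    lines = kubectl_output.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def node_ports(services: List[Dict[str, str]]) -> Dict[str, List[int]]:
    """Map each NodePort service to the node port(s) it exposes."""
    exposed: Dict[str, List[int]] = {}
    for svc in services:
        if svc["TYPE"] != "NodePort":
            continue
        ports = []
        for mapping in svc["PORT(S)"].split(","):
            # "80:31883/TCP" means service port 80 is mapped to node port 31883.
            port_part = mapping.split("/")[0]
            if ":" in port_part:
                ports.append(int(port_part.split(":")[1]))
        exposed[svc["NAME"]] = ports
    return exposed


sample = """\
NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
chatqna                   ClusterIP   10.96.246.79    <none>        8888/TCP            8m16s
chatqna-nginx             NodePort    10.96.180.42    <none>        80:31883/TCP        8m16s
"""
print(node_ports(parse_services(sample)))
```

For this sample the only externally reachable entry point is `chatqna-nginx`; everything else has to be reached from inside the cluster.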

Now, let’s explore the architecture.

### 3. Explore Microservices


Each microservice performs one task within the RAG flow:

!["opea-rag"](./Images/opea_rag.png)

The flow shows the microservices involved. The RAG flow can be divided into two steps:

1. **Preprompting**: This step involves preparing the knowledge base (KB) by uploading relevant documents and ensuring that the information is organized for effective retrieval.

2. **Prompting**: This step focuses on retrieving the relevant data from the knowledge base and using it to generate an accurate answer to the user's question.

**Preprompting**
In this step, the logic is to start from a document (Nike's revenue PDF), and do the preprocessing needed to make it ready to be stored in a database. As shown, this process primarily involves 3 microservices: `data preparation`, `embeddings` and `vector store`. Let's explore each microservice.

!["opea-embedding"](./Images/pre_flow.png)

#### 3.1 Embedding Microservice (POD:chatqna-tei:80)
An **embedding** is a numerical representation of an object (such as a word, phrase, or document) in a continuous vector space. In the context of natural language processing (NLP), embeddings represent words, sentences, or other pieces of text as a set of numbers (or a "vector") that capture their meaning, relationships, and context. By transforming text into this format, embeddings make it easier for machine learning models to understand and work with text data.

For example, the following image shows how word embeddings represent words as points in a vector space based on their relationships. Words with similar meanings, like "king" and "queen" are closer together, and the embedding model captures these connections through vector arithmetic.

During training, if the model sees "king" often used with "man" and "queen" with "woman," it learns that "king" and "queen" relate similarly to "man" and "woman." So, it positions these words in ways that reflect gender relationships in language.

!["opea-embedding"](./Images/king_vs_queen.png)
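The king/queen relationship can be reproduced with a toy example. The three-dimensional vectors below are hand-made for illustration (real embedding models learn vectors with hundreds or thousands of dimensions), and `cosine` is a small helper written for this sketch.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Invented dimensions, roughly: [royalty, male, female]
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.8],
}

# king - man + woman should land near queen
analogy = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]
nearest = max(vectors, key=lambda word: cosine(analogy, vectors[word]))
print(nearest)
```

Cosine similarity compares direction rather than magnitude, which is why it is a common choice for comparing text embeddings.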

Embeddings are a key component for RAG:

• **Capturing Meaning**: Embeddings represent the semantic relationships between words, allowing RAG models to understand context and nuances in language, enhancing their ability to generate relevant responses.

• **Dimensionality Reduction**: By converting complex information into fixed-size vectors, embeddings streamline data processing, making RAG systems more efficient and faster.

• **Improving Model Performance**: Embeddings enable RAG models to generalize better by leveraging semantic similarities, facilitating more accurate information retrieval, and improving the quality of generated content.

OPEA provides multiple options to run your embeddings microservice, as detailed in the [OPEA embedding documentation](https://github.com/opea-project/GenAIComps/tree/main/comps/embeddings).

In this case, ChatQnA uses the [Hugging Face TEI](https://huggingface.co/docs/text-embeddings-inference/en/index) microservice, running the embedding model `BAAI/bge-large-en-v1.5` locally.

To explore the microservices that are not exposed externally, you will use the nginx pod to contact them with `curl`, addressing each one by its internal DNS name (the service name). Above, you used `kubectl get svc` to list all of the services. These are the services you will interact with.
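As a small illustration of how those internal addresses are formed, the sketch below builds a URL from a service name and port. `service_url` is a hypothetical helper; the fully qualified form assumes the default `cluster.local` cluster domain, which your cluster may override.

```python
def service_url(service: str, port: int, path: str = "/", namespace: str = "") -> str:
    """Build the in-cluster URL for a Kubernetes service.

    Inside the same namespace the bare service name resolves; from another
    namespace the fully qualified name is needed.
    """
    host = f"{service}.{namespace}.svc.cluster.local" if namespace else service
    return f"http://{host}:{port}/{path.lstrip('/')}"


print(service_url("chatqna-tei", 80, "/embed"))
print(service_url("chatqna-tei", 80, "/embed", namespace="default"))
```

These names only resolve from inside the cluster, which is why the requests in this notebook are sent from the nginx pod rather than from your workstation.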



1. Save your nginx pod name dynamically:

In [49]:
# Fetch the pod name dynamically
nginx_pod = !kubectl get pods --no-headers | grep chatqna-nginx | awk '{print $1}'

# Assign the pod name to a variable
NGINX_POD_NAME = nginx_pod[0].strip()

# Print the pod name
print(f"chatqna-nginx pod: {NGINX_POD_NAME}")

chatqna-nginx pod: chatqna-nginx-6c855d856c-4nds9


2. This cell runs `kubectl exec {NGINX_POD_NAME}` to execute `curl` inside the NGINX pod, sending a POST request to the `/embed` endpoint of the `chatqna-tei` service on port 80. The request carries the JSON payload `{"inputs": "What was Deep Learning?"}` with `Content-Type` set to `application/json`, and the service responds with the embedding of that text.

In [12]:
!kubectl exec {NGINX_POD_NAME} -- \
  curl chatqna-tei:80/embed \
    -X POST \
    -d '{{"inputs":"What was Deep Learning?"}}' \
    -H 'Content-Type: application/json'




The answer will be the vector representation of the phrase *"What was Deep Learning?"*. This service returns the vector embedding for the `inputs` from the REST API. Your output should be similar to:


[[-0.023710195,-0.08148822,-0.011833995,-0.013103603,0.03225608, ... ,-0.030199083,-0.02533686,-0.004371663]]

(The vector is truncated here for readability; the full response is one long list of floating-point values.)
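If you capture the response in Python, a quick sanity check is to parse it and look at its shape. The sketch below assumes the response format shown above (a JSON list containing one vector per input); `parse_embedding` is a hypothetical helper and the sample is a shortened stand-in for a real response.

```python
import json
from typing import List


def parse_embedding(response_text: str) -> List[float]:
    """Parse an /embed response for a single input and return its vector."""
    data = json.loads(response_text)
    if not (isinstance(data, list) and len(data) == 1 and isinstance(data[0], list)):
        raise ValueError("Expected a JSON list containing exactly one embedding vector")
    vector = data[0]
    if not all(isinstance(x, (int, float)) for x in vector):
        raise ValueError("Embedding contains non-numeric values")
    return [float(x) for x in vector]


# Shortened sample; a real response is far longer.
sample = "[[-0.023710195,-0.08148822,-0.011833995]]"
vector = parse_embedding(sample)
print(len(vector), vector[0])
```

With `BAAI/bge-large-en-v1.5`, the length of the real vector should be 1,024; an error message or an empty list at this point usually means the request never reached the service.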

#### 3.2 Vector Database Microservice (POD:chatqna-redis-vector-db:6379)
The Vector Database microservice is a crucial component in the RAG application as it stores and retrieves embeddings. This is especially useful in applications like ChatQnA (RAG), where relevant information must be retrieved quickly based on the user's query.

**Using Redis as a Vector Database**
In this task, you use Redis as the vector database. You can find all of the supported alternatives in the OPEA vector store repository.

A Vector Database (VDB) is a specialized database designed to store and manage high-dimensional vectors—numeric representations of data points like words, sentences, or images. In AI and machine learning, these vectors are typically embeddings, which capture the meaning and relationships of data in a format that algorithms can process efficiently, as we have shown before.
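Conceptually, a vector database stores (text, vector) pairs and answers "which stored vectors are closest to this one?". The in-memory sketch below illustrates that idea with a brute-force scan; it is not how Redis is implemented, and `TinyVectorStore` and the two-dimensional vectors are invented for the example.

```python
import math
from typing import List, Tuple


class TinyVectorStore:
    """Brute-force, in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: List[float]) -> None:
        self._items.append((text, vector))

    def search(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k stored texts most similar to the query, best first."""
        scored = [(self._cosine(query, vec), text) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(text, score) for score, text in scored[:k]]

    @staticmethod
    def _cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0


store = TinyVectorStore()
store.add("revenue passage", [0.9, 0.1])
store.add("footwear passage", [0.1, 0.9])
top = store.search([1.0, 0.0], k=1)
print(top[0][0])
```

A production vector database avoids scanning every vector by building an index, but the interface (add vectors, search by similarity) is the same in spirit.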

#### 3.3 Data Preparation Microservice (POD:chatqna-data-prep:6007)
The Dataprep Microservice is responsible for preparing data in a digestible format for the application, converting it to embeddings, using the embedding microservice, and loading it to the database. This service preprocesses/transforms the data, making sure it is clean, organized, and suitable for further processing.

Specifically, this microservice receives data (such as documents), processes it by breaking it into chunks, sends the chunks to the embedding microservice, and stores the resulting vectors in the vector database. The microservice's functionality may depend on the specific vector database being used, as each database has its own requirements for data formatting.
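The chunking step can be sketched as follows. This is an illustrative character-based splitter with overlap, not the splitter the Dataprep microservice actually uses; `chunk_text` and its default sizes are assumptions chosen for the example.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Each chunk would then be sent to the embedding microservice, and the chunk text is stored next to its vector so the retriever can return readable passages.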

To test it and help the model answer the initial question *What was Nike revenue in 2023?*, you will need to upload a context file ([revenue report](https://raw.githubusercontent.com/opea-project/GenAIComps/main/comps/third_parties/pathway/src/data/nke-10k-2023.pdf)) to be processed.

Run the following cells to download a sample Nike revenue report to the nginx pod and ingest it (if `NGINX_POD_NAME` is no longer defined, re-run the cell above that fetches the pod name):

1. Download the document to the nginx pod:

In [None]:
!kubectl exec {NGINX_POD_NAME} -- \
  curl -C - -O https://raw.githubusercontent.com/opea-project/GenAIComps/main/comps/third_parties/pathway/src/data/nke-10k-2023.pdf

2. Feed the knowledge base (vector database) with the document (this takes ~30 seconds):

In [None]:
!kubectl exec {NGINX_POD_NAME} -- \
  curl -X POST "chatqna-data-prep:6007/v1/dataprep/ingest" \
     -H "Content-Type: multipart/form-data" \
     -F "files=@./nke-10k-2023.pdf"

After running the previous command, you should receive a confirmation message like the one below. This command updated the knowledge base by uploading a local file for processing.
```
    {
        "status": 200,
        "message": "Data preparation succeeded"
    }
```

The data preparation microservice API can retrieve information about the list of files stored in the vector database.

3. Verify the document was uploaded:

In [None]:
!kubectl exec {NGINX_POD_NAME} -- \
  curl -X POST chatqna-data-prep:6007/v1/dataprep/get \
     -H "Content-Type: application/json"

After running the previous command, you should see the uploaded file listed:
```
    {
        "name": "nke-10k-2023.pdf",
        "id": "nke-10k-2023.pdf",
        "type": "File",
        "parent": ""
    }
```
Congratulations! You've successfully prepared your knowledge base. Now you'll explore the microservices involved in prompt handling.
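If you want to verify the upload in code rather than by eye, you can parse the response and look for the file name. This sketch assumes the fields shown in the sample response above (`name`, `type`) and accepts either a single object or a list of objects, since the exact envelope may differ between releases; `uploaded_file_names` is a hypothetical helper.

```python
import json
from typing import List


def uploaded_file_names(response_text: str) -> List[str]:
    """Extract the names of entries of type "File" from a dataprep listing."""
    data = json.loads(response_text)
    entries = data if isinstance(data, list) else [data]
    return [entry["name"] for entry in entries if entry.get("type") == "File"]


sample = """
{
    "name": "nke-10k-2023.pdf",
    "id": "nke-10k-2023.pdf",
    "type": "File",
    "parent": ""
}
"""
names = uploaded_file_names(sample)
print("nke-10k-2023.pdf" in names)
```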

### 4. Prompting
Once the knowledge base is set up, you can begin interacting with the application by asking it context-specific questions. RAG plays a crucial role in ensuring the responses are accurate and grounded in relevant data.

The process starts with the application retrieving the most relevant information from the knowledge base in response to the user's query. This step ensures the LLM has up-to-date and precise context to answer the user's query.

Next, the retrieved information is combined with the input prompt that is sent to the Large Language Model (LLM). This enriched prompt allows the model to generate answers that are informed by both the external data and its pre-trained knowledge.

Finally, you will see how the LLM utilizes the enriched prompt to generate a coherent and contextually accurate response. By leveraging RAG, the application effectively delivers answers that are tailored to the user's query, grounded in the most relevant and up-to-date information from the knowledge base.

The microservices involved in this stage are `embeddings`, `vector db`, `retriever`, `reranking` and finally the `LLM`.
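The "enriched prompt" idea can be sketched in a few lines. The template below is hypothetical and only illustrates how retrieved passages and the user's question are combined; the template ChatQnA actually sends to the LLM may differ.

```python
from typing import List


def build_prompt(question: str, passages: List[str]) -> str:
    """Combine retrieved passages and the user's question into a single prompt."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    "What was Nike revenue in 2023?",
    ["Passage A from the report", "Passage B from the report"],
)
print(prompt)
```

Numbering the passages makes it easier to trace which retrieved text an answer relied on, and keeping the context ahead of the question mirrors the retrieve-then-generate order of the flow.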

#### 4.1 Retriever Microservice (POD:chatqna-retriever-usvc:7000)
The Retriever Microservice locates the most relevant information within the knowledge base and returns the documents most similar to the user's question. It is designed to work with a number of back-end systems that store knowledge and provide APIs to retrieve data that (hopefully!) best matches the intent the user had when asking their question. Different knowledge bases provide different APIs for retrieving relevant information. Vector databases compare the embeddings of the source documents with the vector embedding of the user's question. Graph databases use graph locality to find matches. Relational databases use string and regular expression matching to find matches.

In this task, you use the Redis vector database and access it through the Redis retriever.

Of course, you need to have a vector embedding for the retrieval query. You can generate an embedding for the user's question, *"What was Nike revenue in 2023?"*, to test the retriever against the Nike revenue information you loaded in the previous step.
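The retrieval request in this section sends a JSON body with a `text` field and an `embedding` field. As a hedged alternative to assembling that body with shell string interpolation, the sketch below builds it with `json.dumps`, which also validates that the embedding is well-formed JSON. `retrieval_payload` is a hypothetical helper, and the embedding is passed through exactly as the embedding service returned it.

```python
import json


def retrieval_payload(embedding_response: str, text: str = "test") -> str:
    """Build the request body for the retriever from a raw /embed response."""
    embedding = json.loads(embedding_response)  # raises ValueError on malformed input
    return json.dumps({"text": text, "embedding": embedding})


# Shortened stand-in for the contents of a saved embedding response.
payload = retrieval_payload("[[0.1, -0.2, 0.3]]")
print(payload)
```

Building the body this way fails early with a clear error if the saved embedding file is empty or contains an error message instead of a vector.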

To create the embedding, use the `chatqna-tei` microservice (again, make sure `NGINX_POD_NAME` is still defined).

1. Create the embedding and save it on the nginx pod at `/tmp/embed.txt`. The command also prints the saved file so you can confirm the embedding was stored; you should see the vector that the embeddings microservice generated. You can then use the retriever microservice to get the most similar information from your knowledge base.

In [None]:
cmd = f"""
kubectl exec {NGINX_POD_NAME} -- sh -c '
  embed_question=$(curl -X POST chatqna-tei:80/embed -d "{{\\"inputs\\": \\"What is Nike revenue in 2023?\\"}}" -H "Content-Type: application/json");
  echo "$embed_question" > /tmp/embed.txt;
  cat /tmp/embed.txt
'
"""
!{cmd}

The output begins with curl's transfer statistics, followed by the embedding vector (omitted here):
```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  9571  100  9524  100    47   115k    583 --:--:-- --:--:-- --:--:--  118k
```
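
The saved response can also be sanity-checked in Python. The sketch below assumes that the TEI `/embed` endpoint returns a JSON list containing one vector per input; the `embedding_shape` helper and the short sample vector are illustrative and not real model output.

```python
import json

def embedding_shape(raw):
    """Parse an embed response and return (number of vectors, dimension)."""
    vectors = json.loads(raw)
    if not vectors or not isinstance(vectors[0], list):
        raise ValueError("expected a JSON list of embedding vectors")
    return len(vectors), len(vectors[0])

# Hypothetical, shortened response for a single input string
sample = "[[0.012, -0.034, 0.056, 0.078]]"
count, dim = embedding_shape(sample)
print(f"{count} vector(s) of dimension {dim}")
```

In the notebook, you could pass the contents of `/tmp/embed.txt` instead of `sample`.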

2. Retrieve the documents most similar to `embed_question` and save them on the pod as `/tmp/similar_docs.json`:

In [None]:
# 1. Read the embedding saved in the previous step
embed_question = !kubectl exec {NGINX_POD_NAME} -- sh -c 'cat /tmp/embed.txt'
embed_question = embed_question[0].strip()

# 2. Safely create JSON inside the pod using printf
cmd_save_similar_docs = f"""
kubectl exec {NGINX_POD_NAME} -- sh -c '
  json=$(printf "{{\\\"text\\\": \\\"test\\\", \\\"embedding\\\": {embed_question}}}");
  curl -X POST chatqna-retriever-usvc:7000/v1/retrieval -H "Content-Type: application/json" -d "$json" > /tmp/similar_docs.json
'
"""

# 3. Run the command
!{cmd_save_similar_docs}

# 4. Now read the result
!kubectl exec {NGINX_POD_NAME} -- sh -c 'cat /tmp/similar_docs.json'


```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16519  100  6967  100  9552   203k   278k --:--:-- --:--:-- --:--:--  504k
```
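
Hand-escaping JSON inside nested shell quotes is easy to get wrong. As an alternative sketch, the request body can be built with `json.dumps` and passed to `kubectl exec` as an argument list, so no shell quoting is involved. The helper names and the pod name `chatqna-nginx-example` are hypothetical; the endpoint and the `text`/`embedding` fields are the ones used in the cell above.

```python
import json

RETRIEVER_URL = "chatqna-retriever-usvc:7000/v1/retrieval"

def build_retrieval_payload(embedding, text="test"):
    """Serialize the request body; json.dumps handles all quoting."""
    return json.dumps({"text": text, "embedding": embedding})

def retrieval_command(pod_name, payload):
    """Return an argument list suitable for subprocess.run (no shell involved)."""
    return [
        "kubectl", "exec", pod_name, "--",
        "curl", "-s", "-X", "POST", RETRIEVER_URL,
        "-H", "Content-Type: application/json",
        "-d", payload,
    ]

# Shortened, made-up embedding for illustration only
payload = build_retrieval_payload([0.012, -0.034, 0.056])
cmd = retrieval_command("chatqna-nginx-example", payload)
print(cmd[-1])
```

Passing `cmd` to `subprocess.run(cmd, capture_output=True, text=True)` would return the retriever response on standard output.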

The previous output shows the passages from the Nike revenue report that are most similar (TOP_3) to the question *"What was the Nike revenue in 2023?"*.

<div class="alert alert-block alert-info">
⚠️ The following output has been formatted for better readability. Your results will be presented in plain text and may vary slightly due to the similarity search algorithm. However, you can double check that the retrieved documents will be relevant to your initial query.
</div>

```
{
    "id":"d224b5d9c935f7ced04180c7f6210518",
    "retrieved_docs":[{
        "downstream_black_list": [],
        "id": "ce9eac2f55a9f6e492c32b51f66e88ee",
        "text": "discounts, largely due to strategic pricing actions and product mix.\n• Selling and administrative expense increased 15% due to higher operating overhead and demand creation expense. The increase in operating overhead expense\nwas primarily due to higher wage-related costs and higher NIKE Direct variable costs, in part due to new store additions. Demand creation expense increased\nprimarily due to higher sports marketing expense and an increase in digital marketing.\n2023 FORM 10-K 37.Table of Contents\nEUROPE, MIDDLE EAST & AFRICA\n(Dollars in millions)\nFISCAL 2023 FISCAL 2022\n% CHANGE\n% CHANGE\nEXCLUDING\nCURRENCY\nCHANGES FISCAL 2021\n% CHANGE\n% CHANGE\nEXCLUDING\nCURRENCY\nCHANGES\nRevenues by:\nFootwear\n$\n8,260  $\n7,388 \n12 %\n25 % $\n6,970 \n6 %\n9 %\nApparel\n4,566 \n4,527 \n1 %\n14 %\n3,996 \n13 %\n16 %\nEquipment\n592 \n564 \n5 %\n18 %\n490 \n15 %\n17 %\nTOTAL REVENUES\n$\n13,418  $\n12,479 \n8 %\n21 % $\n11,456 \n9 %\n12 %\nRevenues by:\n \n \n \nSales to Wholesale Customers\n$\n8,522  $\n8,377 \n2 %\n15 % $\n7,812 \n7 %\n10 %\nSales through NIKE Direct\n4,896 \n4,102 \n19 %\n33 %\n3,644 \n13 %\n15 %\nTOTAL REVENUES\n$\n13,418  $\n12,479 \n8 %\n21 % $\n11,456 \n9 %\n12 %\nEARNINGS BEFORE INTEREST AND TAXES\n$\n3,531  $\n3,293 \n7 %\n$\n2,435 \n35 %  \nFISCAL 2023 COMPARED TO FISCAL 2022\n• EMEA revenues increased 21% on a currency-neutral basis, due to higher revenues in Men's, the Jordan Brand, Women's and Kids'. NIKE Direct revenues\nincreased 33%, driven primarily by strong digital sales growth of 43% and comparable store sales growth of 22%."
},
{
    "downstream_black_list": [],
    "id": "c26e3223cbefd0ec8b21e38708e10740",
    "text": "For the fiscal years ended May 31, 2023, 2022 and 2021, Global Brand Divisions revenues include NIKE Brand licensing and other miscellaneous revenues that are not\npart of a geographic operating segment. Converse Other revenues were primarily attributable to licensing businesses. Corporate revenues primarily consisted of foreign\ncurrency hedge gains and losses related to revenues generated by entities within the NIKE Brand geographic operating segments and Converse but managed through\nthe Company's central foreign exchange risk management program.\nAs of May 31, 2023 and 2022, the Company did not have any contract assets and had an immaterial amount of contract liabilities recorded in Accrued liabilities on the\nConsolidated Balance Sheets.\nSALES-RELATED RESERVES\nAs of May 31, 2023 and 2022, the Company's sales-related reserve balance, which includes returns, post-invoice sales discounts and miscellaneous claims, was\n$994 million and $1,015 million, respectively, recorded in Accrued liabilities on the Consolidated Balance Sheets. The estimated cost of inventory for expected product\nreturns was $226 million and $194 million as of May 31, 2023 and 2022, respectively, and was recorded in Prepaid expenses and other current assets on the\nConsolidated Balance Sheets.\nNOTE 15 — OPERATING SEGMENTS AND RELATED INFORMATION\nThe Company's operating segments are evidence of the structure of the Company's internal organization. The NIKE Brand segments are defined by geographic regions"
},
{
    "downstream_black_list": [],
    "id": "51e810222faae0fe7120bc5137e1b8bd",
    "text": "of Income. Total NIKE, Inc. EBIT for fiscal 2023 and fiscal 2022 is as follows:\nYEAR ENDED MAY 31,\n(Dollars in millions)\n2023\n2022\nNet income\n$\n5,070\n$\n6,046\nAdd: Interest expense (income), net\n(6)\n205\nAdd: Income tax expense\n1,131\n605\nEarnings before interest and taxes\n$\n6,195\n$\n6,856\nEBIT Margin: Calculated as total NIKE, Inc. EBIT divided by total NIKE, Inc. Revenues. Our EBIT Margin calculation for fiscal 2023 and fiscal 2022 is as follows:\nYEAR ENDED MAY 31,\n(Dollars in millions)\n2023\n2022\nNumerator\nEarnings before interest and taxes\n$\n6,195\n$\n6,856\nDenominator\nTotal NIKE, Inc. Revenues\n$\n51,217\n$\n46,710\nEBIT Margin\n12.1%\n14.7%\n2023 FORM 10-K 29.Table of Contents\nReturn on Invested Capital (\"ROIC\"): Represents a performance measure that management believes is useful information in understanding the Company's ability to\neffectively manage invested capital. Our ROIC calculation as of May 31, 2023 and 2022 is as follows:\nFOR THE TRAILING FOUR\nQUARTERS ENDED\n(Dollars in millions)\nMAY 31, 2023\nMAY 31, 2022\nNumerator\nNet income\n$\n5,070\n$\n6,046\nAdd: Interest expense (income), net\n(6)\n205\nAdd: Income tax expense\n1,131\n605\nEarnings before interest and taxes\n6,195\n6,856\nIncome tax adjustment\n(1,130)\n(624)\nEarnings before interest and after taxes\n$\n5,065\n$\n6,232\nAVERAGE FOR THE TRAILING FIVE\nQUARTERS ENDED\nMAY 31, 2023\nMAY 31, 2022\nDenominator\nTotal debt\n$\n12,491\n$\n12,722\nAdd: Shareholders' equity\n14,982\n14,425\nLess: Cash and equivalents and Short-term investments\n11,394\n13,748"
}
```

The application will use that information as context for prompting the LLM, but there is still one more step that you need to do to refine and check the quality of those retrieved documents: the `reranker`.
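
Before reranking, the text of each retrieved document has to be pulled out of the retriever response. A minimal Python sketch of that extraction follows, assuming the response has the `retrieved_docs[].text` layout shown above; the `extract_texts` helper and the shortened sample texts are illustrative.

```python
import json

def extract_texts(response_json):
    """Return the text of every retrieved document, in retrieval order."""
    data = json.loads(response_json)
    return [doc["text"] for doc in data.get("retrieved_docs", [])]

# Shortened placeholder response with the same layout as the output above
sample = json.dumps({
    "id": "example-request",
    "retrieved_docs": [
        {"id": "a", "text": "TOTAL REVENUES $ 13,418", "downstream_black_list": []},
        {"id": "b", "text": "Total NIKE, Inc. Revenues $ 51,217", "downstream_black_list": []},
    ],
})
texts = extract_texts(sample)
print(len(texts), "snippets")
```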



#### 4.3 Reranker Microservice (POD:chatqna-teirerank:80)
The Reranking Microservice uses a reranking model to improve semantic search. Given a query and a collection of documents, it reorders the documents by their semantic relevance to the query, from most to least relevant. A text retrieval system typically uses either a dense embedding model or a sparse lexical search index to retrieve candidate documents based on the user's input. The reranking model then rearranges those candidates into a final order, which can significantly improve overall accuracy.

OPEA has [multiple options](https://github.com/opea-project/GenAIComps/tree/main/comps/rerankings) for re-rankers. For this lab, you'll use the Hugging Face TEI for re-ranking. It is the `chatqna-teirerank` microservice in your cluster.

The `reranker` takes `similar_docs` from the previous stage and compares each document with the question *What was Nike Revenue in 2023?* to check the quality of the retrieved documents.

Extract the retrieved text snippets and send them to the reranker:

In [None]:
# Install jq on pod
!kubectl exec {NGINX_POD_NAME} -- apk add jq

# Run the reranker

cmd_rerank = f"""
kubectl exec {NGINX_POD_NAME} -- sh -c '
# Read the retrieved documents saved in the previous step
similar_docs=$(cat /tmp/similar_docs.json)

# Extract the texts using jq (assuming the file is formatted correctly)
texts=$(echo "$similar_docs" | jq -r "[.retrieved_docs[].text | @json]")

# Call the rerank service with the correct query and texts
curl -X POST chatqna-teirerank:80/rerank -d "{{\\"query\\": \\"What was Nike Revenue in 2023?\\", \\"texts\\": $texts}}" -H "Content-Type: application/json" | jq
'
"""

!{cmd_rerank}


<div class="alert alert-block alert-info">
⚠️ The following response has been reformatted for better readability. Your results are displayed in plain text and <b>may vary slightly due to the similarity search algorithm</b>. The retrieved documents are ranked by similarity to your query, with the highest-ranked index representing the most relevant match. You can confirm that the top-ranked document corresponds to the one most closely aligned with your query.
</div>

```
[
  {
    "index": 1,
    "score": 0.9699065
  },
  {
    "index": 0,
    "score": 0.7286148
  },
  {
    "index": 2,
    "score": 0.59450954
  },
  {
    "index": 3,
    "score": 0.18186143
  }
]
```

The server responds with a JSON array of objects with two fields, `index` and `score`. Each `index` is the zero-based position of a snippet in the `texts` array you sent, and `score` is its relevance to the query. Here, `{"index": 1, "score": 0.9699065}` means that the second snippet (index 1) is the most relevant, with a score of approximately 0.97. The remaining entries, `{"index": 0, "score": 0.7286148}`, `{"index": 2, "score": 0.59450954}`, and `{"index": 3, "score": 0.18186143}`, show that the other snippets (indexes 0, 2, and 3) score lower.

Only the top-ranked snippet will be used to prompt the LLM.
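
Mapping the reranker output back to the snippets can be sketched as follows. The `best_snippet` helper and the placeholder snippet strings are hypothetical; the scores are the ones from the sample response above.

```python
def best_snippet(texts, rerank_results):
    """Return (index, text) of the snippet with the highest rerank score."""
    ranked = sorted(rerank_results, key=lambda r: r["score"], reverse=True)
    top = ranked[0]
    # "index" is the zero-based position in the texts list sent to the reranker
    return top["index"], texts[top["index"]]

texts = ["snippet 0", "snippet 1", "snippet 2", "snippet 3"]  # placeholders
results = [
    {"index": 1, "score": 0.9699065},
    {"index": 0, "score": 0.7286148},
    {"index": 2, "score": 0.59450954},
    {"index": 3, "score": 0.18186143},
]
index, text = best_snippet(texts, results)
print(index, text)
```

Sorting explicitly, rather than taking the first entry, keeps the sketch correct even if a service returns results in input order.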

#### 4.4 LLM Microservice (POD:chatqna-vllm:80)
The heart of the RAG application is the Large Language Model (LLM). As mentioned in the introduction, you use RAG to enhance LLM performance.

LLMs generally fall into two main categories: closed models and open source models that you can run locally. Each type has its own advantages and disadvantages, and this microservice architecture gives you the flexibility to choose either.

- **Closed Models:** These are proprietary models developed by major companies with significant infrastructure and resources, such as Amazon Web Services, OpenAI, and Google. Closed models typically offer high-quality, reliable outputs optimized through large-scale training on diverse datasets. However, they can have limitations in customization and may incur higher costs. Additionally, they require API access, which can restrict data sovereignty and limit usage for applications with strict data governance requirements.

- **Open Source Models:** These models are freely available. You can customize them to a high degree, enabling you to control the model's performance and adaptability to specific tasks. Running open source models locally or on private cloud infrastructure enhances data privacy and cost-effectiveness. However, you may need more technical expertise to run and tune open source models, and larger computational resources for them to perform comparably to their closed counterparts.

OPEA allows the integration of either option. This example uses vLLM to serve a model from Hugging Face.

For test purposes, you can directly prompt vLLM to see if the model can answer the initial question *What was Nike revenue in 2023?*

1. Directly prompt the vLLM Microservice:

In [50]:
cmd = f"""
kubectl exec {NGINX_POD_NAME} -- sh -c 'curl -X POST chatqna-vllm:80/v1/chat/completions -H "Content-Type: application/json" -d "{{\\"model\\": \\"meta-llama/Meta-Llama-3-8B-Instruct\\", \\"messages\\": [{{\\"role\\": \\"user\\", \\"content\\": \\"What was Nike revenue in 2023?\\"}}], \\"max_new_tokens\\": 200, \\"do_sample\\": true}}"'
"""
!{cmd}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1187  100  1020  100   167     29      4  0:00:41  0:00:34  0:00:07   268{"id":"chatcmpl-499544fddd0e4d44aa5b54687411b5f5","object":"chat.completion","created":1746551623,"model":"meta-llama/Meta-Llama-3-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"I'm happy to help! However, I need to clarify that 2023 has not yet occurred, and therefore, Nike's revenue for that year is not available.\n\nBut, if you're interested in knowing Nike's revenue for previous years, I can provide you with that information. According to Nike's annual reports, here are the company's revenues for the past few years:\n\n* 2022: $53.3 billion\n* 2021: $32.3 billion\n* 2020: $32.3 billion\n* 2019: $39.1 billion\n* 2018: $39.1 billion\n\nPlease note that these figures are subject to change and may not

The model returns an answer similar to the following (your answer may vary):
```
"message": {
        "role": "assistant",
        "content": "I'm happy to help! However, I need to correct you that Nike's revenue for 2023 is not publicly available yet, as the company has not released its financial reports for the year 2023.\n\nNike's fiscal year ends on May 31st of each year, and the company typically releases its annual financial reports around August or September. Therefore, we will have to wait until then to get the actual revenue figures for Nike in 2023.\n\nIf you're looking for historical revenue data or other financial information for Nike, I can provide you with that. Just let me know!",
        "tool_calls": []
      }
```

This prompts the LLM directly, without any retrieved context. As the outputs show, the model cannot answer correctly: it either claims that the 2023 figures are not available or lists figures that do not come from your knowledge base. To check the overall RAG performance, test with the megaservice as you did at the beginning of this task, which exercises the entire flow.
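
To illustrate what "providing the context" means, the sketch below builds a chat request whose prompt contains a retrieved snippet alongside the question. The prompt wording and the `build_chat_request` helper are assumptions made for illustration; they are not the template that the ChatQnA megaservice uses internally. The context string is shortened from the retrieved document shown earlier.

```python
import json

def build_chat_request(question, context, model="meta-llama/Meta-Llama-3-8B-Instruct"):
    """Build a chat completion request body that includes retrieved context."""
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

request = build_chat_request(
    "What was Nike revenue in 2023?",
    "Total NIKE, Inc. Revenues $ 51,217 (fiscal 2023, dollars in millions)",
)
print(json.dumps(request, indent=2))
```

With the relevant figure present in the prompt, the model no longer has to rely on what it saw during training.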


In this section, you’ve gained insights into each component involved in a RAG application, including the megaservice, gateway, and individual microservices. By testing each microservice in the console, you’ve explored their functionalities and interactions within the overall architecture.

### 5. Test the deployment and verify RAG workflow 
NOTE: This step applies only to this workshop. In a regular deployment, you would access the UI directly through your public IP address.



#### 5.1 Understand RAG and use the UI

Now that you've verified all services are running, use the following commands to access the UI provided by the implementation. 

1. Modify the `chatqna-chatqna-ui-config` ConfigMap with the VM number and public URL of your machine:
NOTE: This step is only needed when running a workshop on ITAC.

In [38]:
!kubectl patch configmap chatqna-chatqna-ui-config -n default --type merge -p "{\"data\":{\"CHAT_BASE_URL\":\"/opeak8s/$VM_NUMBER/$PUBLICURL/v1/chatqna\",\"UPLOAD_FILE_BASE_URL\":\"/opeak8s/$VM_NUMBER/$PUBLICURL/v1/dataprep/ingest\",\"GET_FILE\":\"/opeak8s/$VM_NUMBER/$PUBLICURL/v1/dataprep/get\",\"DELETE_FILE\":\"/opeak8s/$VM_NUMBER/$PUBLICURL/v1/dataprep/delete\"}}"

configmap/chatqna-chatqna-ui-config patched (no change)
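
The escaped one-line patch above is hard to read. As a sketch of the same merge patch, the JSON can be generated in Python so that its structure is visible; the `build_ui_patch` helper is hypothetical, and the example values are the ones printed later in this notebook.

```python
import json

def build_ui_patch(vm_number, public_url):
    """Return the ConfigMap merge patch as a JSON string."""
    base = f"/opeak8s/{vm_number}/{public_url}/v1"
    return json.dumps({
        "data": {
            "CHAT_BASE_URL": f"{base}/chatqna",
            "UPLOAD_FILE_BASE_URL": f"{base}/dataprep/ingest",
            "GET_FILE": f"{base}/dataprep/get",
            "DELETE_FILE": f"{base}/dataprep/delete",
        }
    })

patch = build_ui_patch("3", "1372ea4e2247e1bfa2cd7927203b9940")
print(patch)
```

The resulting string can be passed as the `-p` argument of `kubectl patch configmap chatqna-chatqna-ui-config -n default --type merge`.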


2. Delete the `chatqna-chatqna-ui-xxxxx` pod (replace `chatqna-chatqna-ui-xxxxx` with your pod name) so that it is recreated with the updated configuration:

In [39]:
!kubectl get pods

NAME                                       READY   STATUS    RESTARTS   AGE
chatqna-868d98c5bf-vb2bk                   1/1     Running   0          4d21h
chatqna-chatqna-ui-ffd74c8d8-pvngd         1/1     Running   0          4d19h
chatqna-data-prep-59849c8885-xn8pq         1/1     Running   0          4d21h
chatqna-nginx-6c855d856c-4nds9             1/1     Running   0          4d21h
chatqna-redis-vector-db-8566ffdb78-f8d62   1/1     Running   0          4d21h
chatqna-retriever-usvc-57c8c4c7d5-9rfrd    1/1     Running   0          4d21h
chatqna-tei-9c46456c7-z89lc                1/1     Running   0          4d21h
chatqna-teirerank-5d4c49cd8d-sq7jc         1/1     Running   0          4d21h
chatqna-vllm-59dc97d46-x9wmg               1/1     Running   0          4d21h


In [40]:
pod_name = !kubectl get pods --no-headers | grep chatqna-chatqna-ui | awk '{print $1}'
pod_name = pod_name[0]
print(f"Pod name: {pod_name}")

Pod name: chatqna-chatqna-ui-ffd74c8d8-pvngd


In [41]:
!kubectl delete pod {pod_name}

pod "chatqna-chatqna-ui-ffd74c8d8-pvngd" deleted


3. Set up port forwarding to chatqna-nginx

In [42]:
import os

os.system("kubectl port-forward --address 0.0.0.0 deployment/chatqna-nginx 8887:80 &")

0

4. Get your machine's `VM_NUMBER` and `PUBLICURL`

In [43]:
# Get environment variables from the shell
vm_number = !echo $VM_NUMBER
public_url = !echo $PUBLICURL

# Extract the actual string values (they're returned as lists)
vm_number = vm_number[0]
public_url = public_url[0]
print(f"VM Number: {vm_number}")
print(f"Public URL: {public_url}")

VM Number: 3
Public URL: 1372ea4e2247e1bfa2cd7927203b9940


Unable to listen on port 8887: Listeners failed to create with the following errors: [unable to create listener: Error listen tcp4 0.0.0.0:8887: bind: address already in use]
error: unable to listen on any of the requested ports: [{8887 80}]


5. Run the cell below to get the URL to access the ChatQnA UI

You’ll be accessing the chatbot UI through a browser using a URL formatted like this:

`https://tiber-opea-workshop-<SERVER_NUMBER>.eglb.intel.com/opeak8s/<VM_NUMBER>/<PUBLICURL>/`

Here’s how to determine each part of the URL:

- **Server Number (`<SERVER_NUMBER>`):** This is the number in your current Jupyter notebook URL.
> Example: If you're on `https://tiber-opea-workshop-8.eglb.intel.com/...`, then your server number is `8`.
  
- **VM Number (`<VM_NUMBER>`):** This value is printed in the previous cell using the environment variable `VM_NUMBER`.

- **Public URL (`<PUBLICURL>`):** This value is also printed in the previous cell using the environment variable `PUBLICURL`.

Once the correct values are filled in, you can open the final URL in a browser to interact with the chatbot.

!["opea-ui"](./Images/opea_ui.png)

In [44]:
# Get the server number
server_number = input("Enter the server number: ")

# Build the full URL
url = f"https://tiber-opea-workshop-{server_number}.eglb.intel.com/opeak8s/{vm_number}/{public_url}/"

# Print the URL
print("To access the UI, open any browser and enter the following URL:")
print(url)

Enter the server number:  8


To access the UI, open any browser and enter the following URL:
https://tiber-opea-workshop-8.eglb.intel.com/opeak8s/3/1372ea4e2247e1bfa2cd7927203b9940/
Handling connection for 8887
Handling connection for 8887


To verify the UI, go ahead and ask:

```
What was Nike's revenue in 2023?
```

!["chatqna-ui-nke-revenue"](./Images/NIKE_REVENUE.png)

The answer is correct again because you already indexed the knowledge base in a previous step.

Let's try something different. Can the app answer a question about OPEA?

```
What is OPEA?
```

!["chatqna-ui-opea"](./Images/opea_wrong.png)

Notice that the initial answer provided by the chatbot is outdated or lacks specific information about OPEA. This is because OPEA is a relatively new project and wasn’t part of the dataset used to train the language model. Since most language models are static—they rely on data available at the time of training—they can’t automatically incorporate recent developments or information about new projects like OPEA.

However, RAG offers a solution by enabling real-time context. Through the UI, you’ll see an icon that allows you to upload relevant context information. This action initiates a process where the document is sent to the **DataPrep** microservice to generate **embeddings**, and the data is then ingested into the **Vector Database**.

By uploading a new document or link, you're effectively expanding the chatbot's knowledge base to include the latest information, which helps improve the relevance and accuracy of the responses.

!["chatqna-ui-upload"](./Images/Attach_document.png)

The deployment allows you to upload either a file or a site. For this case, use the OPEA site:
- Click on the **upload icon** to open the right panel
- Click on **Paste Link**
- Copy/paste the text `https://opea-project.github.io/latest/introduction/index.html` to the entry box
- Click **Confirm** to start the indexing process

When the indexing completes, you'll see an icon added below the text box, labeled `https://opea-project.github.io/latest/introduction/index.html`.

!["chatqna-ui-upload-status"](./Images/Document_uploaded.png)

Ask *"What is OPEA?"* again to see the updated answer.

!["chatqna-ui-opea-rag"](./Images/what_is_opea.png)

This time, the chatbot responds correctly based on the data it added to the prompt from the new source, the OPEA website.

When you are finished using the UI, end the port-forwarding by using the following commands in your terminal.
1. Get the job number of the port-forwarding process:

In [None]:
!jobs

In the output, you should see all of the jobs that are running:
```
[1]+  Running                 kubectl port-forward --address 0.0.0.0 deployment/chatqna-nginx 8887:80 &
```
2. End the job using the `kill` command followed by `%` and the job number from the output above:

In [None]:
!kill %1
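
Shell job control (`jobs`, `kill %1`) only tracks jobs that were started from the same shell session, so it may not list a port-forward that was launched from a notebook cell. As an alternative, here is a minimal sketch that starts the port-forward with `subprocess.Popen` and keeps the process handle so that it can be stopped later from Python. The helper names `start_port_forward` and `stop_process` are hypothetical.

```python
import subprocess

def start_port_forward(local_port=8887, remote_port=80, target="deployment/chatqna-nginx"):
    """Start kubectl port-forward in the background and return its process handle."""
    cmd = [
        "kubectl", "port-forward", "--address", "0.0.0.0",
        target, f"{local_port}:{remote_port}",
    ]
    return subprocess.Popen(cmd)

def stop_process(proc, timeout=5.0):
    """Terminate a background process, escalating to kill if it does not exit."""
    if proc.poll() is None:  # still running
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode

# Usage in the notebook:
#   forward = start_port_forward()
#   ... use the UI ...
#   stop_process(forward)
```

Keeping the handle in a notebook variable avoids having to look up the process later.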

## Conclusion
In this task, you explored the foundational structure of a RAG application, covering how each component operates and interacts within the system. From question inference to answer generation, each part plays a critical role in OPEA's RAG workflow, enhancing response relevance through retrieval and accurate language modeling. This hands-on session gives a glimpse into how OPEA utilizes RAG to streamline complex queries and elevate model accuracy through seamless integration across components.