# Build RAG using Zilliz Cloud Pipelines

> (Note) Zilliz Cloud Pipelines is about to be deprecated. Please stay tuned for detailed instructions on alternative solutions.


[Zilliz Cloud Pipelines](https://docs.zilliz.com/docs/pipelines) is an AI-powered retrieval service. It simplifies the maintenance of an information retrieval system by providing ingestion and search pipelines as an easy-to-use API service. With quality optimization and DevOps taken care of, you as an AI application developer can focus on building AI applications tailored to your specific use case.

In this notebook, we show how to use [Zilliz Cloud Pipelines](https://zilliz.com/zilliz-cloud-pipelines) to build a simple yet scalable [Retrieval Augmented Generation (RAG)](https://zilliz.com/use-cases/llm-retrieval-augmented-generation) application. Retrieval is at the heart of a RAG solution, which typically involves maintaining a knowledge base with document parsing and chunking, hosting an embedding model, and using a vector database as the retrieval engine. With Zilliz Cloud Pipelines, you don't need to deal with such a complex tech stack. Everything can be done with an API call.

We first create an Ingestion pipeline for document indexing and a Search pipeline for knowledge retrieval. Then we run the Ingestion pipeline through an API call to import documents and establish the knowledge base. Finally, we build a RAG application that runs the Search pipeline to conduct Retrieval Augmented Generation.

![](../../pics/rag_and_pipeline.png)

## Setup
### Prerequisites
Please make sure you have a Serverless cluster in Zilliz Cloud. If you don't have one yet, you can [sign up for free](https://cloud.zilliz.com/signup?utm_source=referral&utm_medium=partner&utm_campaign=2023-12-21_github-docs_zilliz-pipeline-rag_github).

To learn how to create a Serverless cluster and get your CLOUD_REGION, CLUSTER_ID, API_KEY and PROJECT_ID, see this [page](https://docs.zilliz.com/docs/create-cluster).

With the Serverless cluster created, get the cluster ID, API key and project ID as shown below and fill them into the following code:

![](../../pics/zilliz_api_key_cluster_id.jpeg)

In [9]:
import os

CLOUD_REGION = 'gcp-us-west1'
CLUSTER_ID = 'your CLUSTER_ID'
API_KEY = 'your API_KEY'
PROJECT_ID = 'your PROJECT_ID'
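
If you prefer not to hard-code credentials in the notebook, one option is to read them from environment variables. The following is a minimal sketch of that idea. The helper `load_zilliz_config` and the variable names (`ZILLIZ_CLOUD_REGION`, `ZILLIZ_CLUSTER_ID`, `ZILLIZ_API_KEY`, `ZILLIZ_PROJECT_ID`) are hypothetical names chosen for this example, not something Zilliz Cloud defines.

```python
import os


def load_zilliz_config(env=None):
    """Collect the four settings used in this notebook from environment variables.

    The environment variable names are hypothetical and chosen for this sketch.
    """
    env = os.environ if env is None else env
    names = {
        "CLOUD_REGION": "ZILLIZ_CLOUD_REGION",
        "CLUSTER_ID": "ZILLIZ_CLUSTER_ID",
        "API_KEY": "ZILLIZ_API_KEY",
        "PROJECT_ID": "ZILLIZ_PROJECT_ID",
    }
    # Fail early with a readable message if anything is unset or empty.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}
```

With the variables exported in your shell, `cfg = load_zilliz_config()` returns a dict whose values can be assigned to `CLOUD_REGION`, `CLUSTER_ID`, `API_KEY` and `PROJECT_ID`.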

### Create an ingestion pipeline
[Ingestion pipelines](https://docs.zilliz.com/docs/understanding-pipelines#ingestion-pipelines) can transform unstructured data into searchable vector embeddings and store them in Zilliz Cloud Vector Database.

In an Ingestion pipeline, you can specify functions to customize its behavior. The input data that the Ingestion pipeline expects also depends on the specified functions. Currently, an Ingestion pipeline allows two types of functions:

- The `INDEX_DOC` function expects a document as input. It splits the input text document into chunks and generates a vector embedding for each chunk. This function maps an input field (`doc_url`) to four output fields (`doc_name`, `chunk_id`, `chunk_text`, and `embedding`) in the auto-generated collection.
- The `PRESERVE` function stores a user-defined input as an additional [scalar](https://milvus.io/docs/scalar_index.md) field in the auto-generated collection. This is typically used to store meta information about the document, such as publisher info or tags that describe properties of the file.

In the following example, we create an Ingestion pipeline with an `INDEX_DOC` function and a `PRESERVE` function. As part of creating the Ingestion pipeline, a vector database collection named `my_rag_collection` will be created in the cluster. It contains five fields:
- `doc_name`, `chunk_id`, `chunk_text`, `embedding` as defined by the `INDEX_DOC` function
- `version` as defined by the `PRESERVE` function
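
The mapping from pipeline functions to collection fields described above can be sketched in a few lines. This is only an illustration of the rule stated in this section, not an inspection of the real collection schema; the helper name `expected_collection_fields` is hypothetical.

```python
# Output fields that INDEX_DOC contributes, as listed in this section.
INDEX_DOC_FIELDS = ["doc_name", "chunk_id", "chunk_text", "embedding"]


def expected_collection_fields(functions):
    """Derive the collection fields implied by a list of function definitions."""
    fields = []
    for fn in functions:
        if fn["action"] == "INDEX_DOC":
            fields.extend(INDEX_DOC_FIELDS)
        elif fn["action"] == "PRESERVE":
            # PRESERVE stores the input under the user-chosen output field name.
            fields.append(fn["outputField"])
        else:
            raise ValueError(f"Unexpected action: {fn['action']}")
    return fields


functions = [
    {"name": "index_my_doc", "action": "INDEX_DOC", "inputField": "doc_url"},
    {"name": "keep_doc_info", "action": "PRESERVE", "inputField": "version",
     "outputField": "version", "fieldType": "VarChar"},
]
print(expected_collection_fields(functions))
# ['doc_name', 'chunk_id', 'chunk_text', 'embedding', 'version']
```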

In [2]:
import requests

headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}

create_pipeline_url = f"https://controller.api.{CLOUD_REGION}.zillizcloud.com/v1/pipelines"

collection_name = 'my_rag_collection'

data = {
    "projectId": PROJECT_ID,
    "name": "my_ingestion_pipeline",
    "description": "A pipeline that splits a text file into chunks and generates embeddings. It also stores the doc version with each chunk.",
    "type": "INGESTION",
    "functions": [
        {
            "name": "index_my_doc",
            "action": "INDEX_DOC",
            "inputField": "doc_url",
            "language": "ENGLISH",
            "chunkSize": 500,
        },
        {
            "name": "keep_doc_info",
            "action": "PRESERVE",
            "inputField": "version",
            "outputField": "version",
            "fieldType": "VarChar"
        }
    ],
    "clusterId": f"{CLUSTER_ID}",
    "newCollectionName": f"{collection_name}"
}

response = requests.post(create_pipeline_url, headers=headers, json=data)
print(response.json())
ingestion_pipe_id = response.json()["data"]["pipelineId"]

{'code': 200, 'data': {'pipelineId': 'pipe-d5a147fbdbdb7a13cae98a', 'name': 'my_ingestion_pipeline', 'type': 'INGESTION', 'description': 'A pipeline that splits a text file into chunks and generates embeddings. It also stores the doc version with each chunk.', 'status': 'SERVING', 'functions': [{'action': 'INDEX_DOC', 'name': 'index_my_doc', 'inputField': 'doc_url', 'language': 'ENGLISH', 'chunkSize': 500}, {'action': 'PRESERVE', 'name': 'keep_doc_info', 'inputField': 'version', 'outputField': 'version', 'fieldType': 'VarChar'}], 'clusterId': 'in03-423dca989cc7410', 'newCollectionName': 'my_rag_collection', 'totalTokenUsage': 0}}


After successful creation, the API returns a pipeline ID. We will use this ID later to run the pipeline and ingest a document.
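
Indexing `response.json()["data"]["pipelineId"]` directly raises an unhelpful `KeyError` if creation fails. A small guard like the sketch below can make failures easier to read. It assumes a failed request returns a JSON body whose `code` is not 200 or which lacks `data.pipelineId`; the exact error format is not shown in this notebook, and the helper name `extract_pipeline_id` is hypothetical.

```python
def extract_pipeline_id(body):
    """Return the pipeline ID from a create-pipeline response body, or raise.

    Assumption: a failed call yields a non-200 `code` or no `data.pipelineId`.
    """
    data = body.get("data") or {}
    if body.get("code") != 200 or "pipelineId" not in data:
        raise RuntimeError(f"Pipeline creation failed: {body}")
    return data["pipelineId"]
```

In the cell above, this would be used as `ingestion_pipe_id = extract_pipeline_id(response.json())`.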

### Create a search pipeline
[Search pipelines](https://docs.zilliz.com/docs/understanding-pipelines#search-pipelines) enable semantic search by converting a query string into a vector embedding and then retrieving the top-K nearest neighbor vectors. Each vector represents a chunk of an ingested document and carries associated information such as the file name and preserved properties.

A Search pipeline contains one type of function, `SEARCH_DOC_CHUNK`, in which you need to set the cluster and collection to search.



In [3]:
data = {
    "projectId": PROJECT_ID,
    "name": "my_search_pipeline",
    "description": "A pipeline that receives text and search for semantically similar doc chunks",
    "type": "SEARCH",
    "functions": [
        {
            "name": "search_chunk_text_and_title",
            "action": "SEARCH_DOC_CHUNK",
            "inputField": "query_text",
            "clusterId": f"{CLUSTER_ID}",
            "collectionName": f"{collection_name}"
        }
    ]
}

response = requests.post(create_pipeline_url, headers=headers, json=data)

print(response.json())
search_pipe_id = response.json()["data"]["pipelineId"]

{'code': 200, 'data': {'pipelineId': 'pipe-9d84aa1ed59d6b641c51f6', 'name': 'my_search_pipeline', 'type': 'SEARCH', 'description': 'A pipeline that receives text and search for semantically similar doc chunks', 'status': 'SERVING', 'functions': [{'action': 'SEARCH_DOC_CHUNK', 'name': 'search_chunk_text_and_title', 'inputField': 'query_text', 'clusterId': 'in03-423dca989cc7410', 'collectionName': 'my_rag_collection'}], 'totalTokenUsage': 0}}


Similarly, a successful creation returns a pipeline ID, which we will use later to run this pipeline.

In addition to creating pipelines through the [RESTful API](https://docs.zilliz.com/docs/create-piplines-rest) as introduced in this notebook, you can also create pipelines through the [Web UI](https://docs.zilliz.com/docs/create-piplines-gui) with a few clicks.

### Run ingestion pipeline

The Ingestion pipeline accepts files from an object storage service such as [AWS S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html) or [Google Cloud Storage (GCS)](https://cloud.google.com/storage/docs/uploads-downloads). Supported file types include `.txt`, `.pdf`, `.md`, `.html`, `.epub`, `.csv`, `.doc`, `.docx`, `.xls`, `.xlsx`, `.ppt`, `.pptx`.
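
Before sending a URL to the pipeline, it can be handy to check its file extension locally against the list above. The following is a minimal sketch; the helper name `is_supported_doc` is hypothetical, and matching the extension case-insensitively is a choice made here, not documented service behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File types listed as supported in this section.
SUPPORTED_EXTENSIONS = {
    ".txt", ".pdf", ".md", ".html", ".epub", ".csv",
    ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx",
}


def is_supported_doc(url):
    """Check whether the URL path ends with a supported file extension."""
    # urlparse separates the path from any query string,
    # so the signature parameters of a pre-signed URL are ignored.
    path = urlparse(url).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_doc('https://publicdataset.zillizcloud.com/milvus_doc.md')` returns `True`, while a URL ending in `.png` returns `False`.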


In the following demo, we run the Ingestion pipeline on the file [milvus_doc.md](https://publicdataset.zillizcloud.com/milvus_doc.md) stored on Google Cloud Storage and attach its version info to each indexed doc chunk. The file explains how to use Milvus (an open-source vector database).

In [4]:
gcs_url = 'https://publicdataset.zillizcloud.com/milvus_doc.md'  # public or pre-signed url of a file stored on AWS S3 or Google Cloud Storage


run_pipeline_url = f"https://controller.api.{CLOUD_REGION}.zillizcloud.com/v1/pipelines/{ingestion_pipe_id}/run"

data = {
    "data":
        {
            "doc_url": f"{gcs_url}",
            "version": '2.3'
        }
}

response = requests.post(run_pipeline_url, headers=headers, json=data)

print(response.json())

{'code': 200, 'data': {'token_usage': 1247, 'doc_name': 'milvus_doc.md', 'num_chunks': 10}}


Now we have successfully ingested the document by splitting it into doc chunks and uploading the generated embeddings into the vector database collection. If you want to inspect the data in the collection, you can use the Data Preview tool in the [Zilliz Cloud web UI](https://cloud.zilliz.com).
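
A knowledge base usually holds more than one document. The sketch below repeats the same run call for a list of `(doc_url, version)` pairs and collects the chunk count per document. The helper `ingest_documents` and its `post` parameter are hypothetical: `post` is any callable that takes a URL and a payload and returns the parsed JSON body, which keeps the loop independent of the HTTP library. The response fields (`doc_name`, `num_chunks`) follow the output shown above; treating any non-200 `code` as a failure is an assumption.

```python
def ingest_documents(docs, run_url, post):
    """Run the Ingestion pipeline once per (doc_url, version) pair.

    Returns a dict mapping each ingested doc_name to its number of chunks.
    """
    summary = {}
    for doc_url, version in docs:
        payload = {"data": {"doc_url": doc_url, "version": version}}
        body = post(run_url, payload)
        # Assumption: a non-200 code in the body signals a failed run.
        if body.get("code") != 200:
            raise RuntimeError(f"Ingestion failed for {doc_url}: {body}")
        summary[body["data"]["doc_name"]] = body["data"]["num_chunks"]
    return summary


# With the `requests` setup from this notebook, `post` could be:
# post = lambda url, payload: requests.post(url, headers=headers, json=payload).json()
```

Calling `ingest_documents([(gcs_url, '2.3')], run_pipeline_url, post)` would then repeat the single-document run shown above.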

## Build RAG application with Search pipeline

### Run search pipeline
The first step in building a RAG app is to retrieve the information pieces (doc chunks) most relevant to the question from a knowledge base (typically a vector database collection).

This is as simple as running the Search pipeline we just created. The following code runs the Search pipeline with a query text and search parameters. We wrap the call in a function so that the RAG app shown shortly can reuse it.

In [5]:
import pprint


def retrieval_with_pipeline(question, search_pipe_id, top_k=2, verbose=False):
    run_pipeline_url = f"https://controller.api.{CLOUD_REGION}.zillizcloud.com/v1/pipelines/{search_pipe_id}/run"

    data = {
        "data": {
            "query_text": question
        },
        "params": {
            "limit": top_k,
            "offset": 0,
            "outputFields": [
                "chunk_text",
                "chunk_id",
                "doc_name",
                "version"
            ],
        }
    }
    response = requests.post(run_pipeline_url, headers=headers, json=data)
    if verbose:
        pprint.pprint(response.json())
    results = response.json()["data"]["result"]
    retrieved_texts = [{'chunk_text': result['chunk_text'], 'version': result['version']} for result in results]
    return retrieved_texts


question = 'Can users delete entities by complex boolean expressions?'
retrieval_with_pipeline(question, search_pipe_id, top_k=2, verbose=True)

{'code': 200,
 'data': {'result': [{'chunk_id': 3,
                      'chunk_text': '# Delete Entities\n'
                                    '## Prepare boolean expression\n'
                                    '### Complex boolean expression\n'
                                    'To filter entities that meet specific '
                                    'conditions, define complex boolean '
                                    'expressions.  \n'
                                    'Filter entities whose word_count is '
                                    'greater than or equal to 11000:  \n'
                                    '```python\n'
                                    'expr = "word_count >= 11000"\n'
                                    '```  \n'
                                    'Filter entities whose book_name is not '
                                    'Unknown:  \n'
                                    '```python\n'
                                    'expr = "book_n

[{'chunk_text': '# Delete Entities\n## Prepare boolean expression\n### Complex boolean expression\nTo filter entities that meet specific conditions, define complex boolean expressions.  \nFilter entities whose word_count is greater than or equal to 11000:  \n```python\nexpr = "word_count >= 11000"\n```  \nFilter entities whose book_name is not Unknown:  \n```python\nexpr = "book_name != Unknown"\n```  \nFilter entities whose primary key values are greater than 5 and word_count is smaller than or equal to 9999:  \n```python\nexpr = "book_id > 5 && word_count <= 9999"\n```',
  'version': '2.3'},
 {'chunk_text': '# Delete Entities\nThis topic describes how to delete entities in Milvus.  \nMilvus supports deleting entities by primary key or complex boolean expressions. Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions. This is because Milvus executes queries first when deleting data by complex boolean expressions.  \nDeleted entit

We can see that, given a question, this search run returns the top-k knowledge fragments most relevant to it. This retrieval step is the basis of RAG.
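
The function returns a list of dicts, and interpolating that list directly into a prompt would embed its Python representation, including brackets and escaped newlines. One possible way to turn it into readable prompt context is sketched below. The helper name `format_context` and the chunk label format are hypothetical choices, and the sample chunks are shortened stand-ins for real results.

```python
def format_context(retrieved_texts):
    """Join retrieved chunks into one plain-text context block."""
    parts = []
    for i, item in enumerate(retrieved_texts, start=1):
        # Label each chunk with its position and preserved version field.
        parts.append(f"[Chunk {i}, version {item['version']}]\n{item['chunk_text']}")
    return "\n\n".join(parts)


# Shortened sample in the shape returned by retrieval_with_pipeline.
sample_chunks = [
    {"chunk_text": "# Delete Entities\n## Prepare boolean expression", "version": "2.3"},
    {"chunk_text": "Milvus supports deleting entities by primary key or complex boolean expressions.",
     "version": "2.3"},
]
print(format_context(sample_chunks))
```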

### Build a chatbot powered by RAG
With the helper function `retrieval_with_pipeline` above, we can retrieve the knowledge ingested into the vector database.
Below, we show a simple RAG app that answers questions based on the knowledge we ingested previously. It uses OpenAI `gpt-3.5-turbo` as the LLM and a simple prompt. To test it, set the `OPENAI_API_KEY` environment variable to your own OpenAI API key.

In [6]:
from openai import OpenAI

# Pass the key at construction time; OpenAI() raises if no key is available.
client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))  # your OpenAI API key


class Chatbot:
    def __init__(self, search_pipe_id):
        self._search_pipe_id = search_pipe_id

    def retrieve(self, query: str) -> list:
        """
        Retrieve relevant text with Zilliz Cloud Pipelines.
        """
        results = retrieval_with_pipeline(query, self._search_pipe_id, top_k=2)
        return results

    def generate_answer(self, query: str, context_str: list) -> str:
        """
        Generate answer based on context, which is from the result of Search pipeline run.
        """
        completion = client.chat.completions.create(
            model="gpt-3.5-turbo",
            temperature=0,
            messages=
            [
                {"role": "user",
                 "content":
                     f"We have provided context information below. \n"
                     f"---------------------\n"
                     f"{context_str}"
                     f"\n---------------------\n"
                     f"Given this information, please answer the question: {query}"
                 }
            ]
        ).choices[0].message.content
        return completion

    def chat_with_rag(self, query: str) -> str:
        context_str = self.retrieve(query)
        completion = self.generate_answer(query, context_str)
        return completion

    def chat_without_rag(self, query: str) -> str:
        return client.chat.completions.create(
            model="gpt-3.5-turbo",
            temperature=0,
            messages=
            [
                {"role": "user",
                 "content": query
                 }
            ]
        ).choices[0].message.content

chatbot = Chatbot(search_pipe_id)

This implements a RAG chatbot. It uses the Search pipeline to retrieve the most relevant chunks from the ingested documents and uses them to improve the quality of the answer. Let's see how it works in action!

### Chat with RAG

In [7]:
question = 'In Milvus 2.3, can users delete entities by complex boolean expressions?'
chatbot.chat_with_rag(question)

'Yes, in Milvus 2.3, users can delete entities by complex boolean expressions.'

The ground truth content in the original knowledge text is:
> **Milvus supports deleting entities by primary key or complex boolean expressions**. Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions. This is because Milvus executes queries first when deleting data by complex boolean expressions.



We can see that the RAG app we built correctly answers this question, which requires domain knowledge.

In [8]:
chatbot.chat_without_rag(question)

'No, in Milvus 2.3, users cannot delete entities by complex boolean expressions. The delete operation in Milvus 2.3 is performed based on the entity IDs. Users need to provide the specific entity IDs that they want to delete.'

In contrast, the LLM without RAG lacks the domain knowledge required for this question. Even worse, it outputs an incorrect answer. This is a typical example of the so-called [hallucination](https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence)) problem of LLMs.

That's how to use Zilliz Cloud Pipelines to build RAG applications. To learn more, refer to https://docs.zilliz.com/docs/pipelines for detailed information.

If you have any questions, feel free to contact us at support@zilliz.com