<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/managed/vectaraDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Managed Index with Zilliz Cloud Pipelines

[Zilliz Cloud Pipelines](https://docs.zilliz.com/docs/pipelines) is a scalable API service for retrieval. You can use Zilliz Cloud Pipelines as a managed index in `llama-index`. The service transforms documents into vector embeddings and stores them in Zilliz Cloud for semantic search.

## Setup

1. Install llama-index

In [None]:
# ! pip install llama-index

2. Configure the credentials for your [OpenAI](https://platform.openai.com) and [Zilliz Cloud](https://cloud.zilliz.com/signup?utm_source=twitter&utm_medium=social%20&utm_campaign=2023-12-22_social_pipeline-llamaindex_twitter) accounts.

In [None]:
from getpass import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key:")

ZILLIZ_CLUSTER_ID = getpass("Enter your Zilliz Cluster ID:")
ZILLIZ_TOKEN = getpass("Enter your Zilliz API Key:")
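When the notebook runs non-interactively, prompting with `getpass` is not always practical. The following is a minimal sketch that reads a secret from an environment variable and only falls back to a prompt when the variable is unset. The helper name `read_secret` and the environment variable names are assumptions made for illustration; they are not part of `llama-index` or Zilliz Cloud.

```python
import os
from getpass import getpass


def read_secret(env_var: str, prompt: str) -> str:
    """Return the value of env_var if it is set and non-empty, else prompt for it."""
    value = os.environ.get(env_var)
    if value:
        return value
    return getpass(prompt)


# Hypothetical usage (the variable names are assumptions, not a Zilliz convention):
# ZILLIZ_CLUSTER_ID = read_secret("ZILLIZ_CLUSTER_ID", "Enter your Zilliz Cluster ID:")
# ZILLIZ_TOKEN = read_secret("ZILLIZ_TOKEN", "Enter your Zilliz API Key:")
```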

## Indexing documents

### From Signed URL

Zilliz Cloud Pipelines accepts files from AWS S3 and Google Cloud Storage. You can generate a presigned URL from the object storage and pass it to `from_document_url()` or `insert_doc_url()` to ingest the file. The pipeline automatically indexes the document and stores its chunks as vectors on Zilliz Cloud.
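Presigned URLs expire, so it can help to check one before handing it to the pipeline. The sketch below assumes a V4-style presigned URL, where AWS S3 carries the signing time and lifetime in the `X-Amz-Date` and `X-Amz-Expires` query parameters and Google Cloud Storage uses `X-Goog-Date` and `X-Goog-Expires`. The helper name `presigned_url_expiry` is hypothetical, and the example URL is made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> Optional[datetime]:
    """Return the UTC expiry time of a V4 presigned URL.

    Returns None when the URL has no signing parameters, e.g. a public URL.
    """
    params = parse_qs(urlparse(url).query)
    for prefix in ("X-Amz", "X-Goog"):
        date = params.get(f"{prefix}-Date")
        expires = params.get(f"{prefix}-Expires")
        if date and expires:
            # The signing time is formatted as YYYYMMDDTHHMMSSZ (UTC).
            signed_at = datetime.strptime(date[0], "%Y%m%dT%H%M%SZ").replace(
                tzinfo=timezone.utc
            )
            return signed_at + timedelta(seconds=int(expires[0]))
    return None


# Made-up example URL, for illustration only.
example_url = (
    "https://example-bucket.s3.amazonaws.com/doc.md"
    "?X-Amz-Date=20240101T000000Z&X-Amz-Expires=3600&X-Amz-Signature=abc"
)
expiry = presigned_url_expiry(example_url)
still_valid = expiry is None or expiry > datetime.now(timezone.utc)
```

A URL whose expiry has passed would need to be regenerated before calling `from_document_url()` or `insert_doc_url()`.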

In [None]:
from llama_index.indices import ZillizCloudPipelineIndex

zcp_index = ZillizCloudPipelineIndex.from_document_url(
    # a public or pre-signed url of a file stored on AWS S3 or Google Cloud Storage
    url="https://publicdataset.zillizcloud.com/milvus_doc.md",
    cluster_id=ZILLIZ_CLUSTER_ID,
    token=ZILLIZ_TOKEN,
    # optional
    metadata={"version": "2.3"},  # used for filtering
    collection_name="zcp_llamalection",  # change this value to use a custom collection name
)

# Insert more docs, e.g. a Milvus v2.2 document
zcp_index.insert_doc_url(
    url="https://publicdataset.zillizcloud.com/milvus_doc_22.md",
    metadata={"version": "2.2"},
)

- Adding metadata to a document is optional. The metadata can be used to filter doc chunks during retrieval.
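To make the idea of metadata filtering concrete, here is a small pure-Python sketch of exact-match filtering over chunks. It only illustrates the concept; the in-memory `chunks` list and the `filter_chunks` helper are assumptions for illustration and do not reflect how Zilliz Cloud Pipelines stores or filters data internally.

```python
from typing import Any, Dict, List


def filter_chunks(
    chunks: List[Dict[str, Any]], required: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Keep only the chunks whose metadata equals every key/value in `required`."""
    return [
        chunk
        for chunk in chunks
        if all(chunk["metadata"].get(key) == value for key, value in required.items())
    ]


# Toy data standing in for ingested doc chunks.
chunks = [
    {"text": "Delete entities in Milvus 2.3", "metadata": {"version": "2.3"}},
    {"text": "Delete entities in Milvus 2.2", "metadata": {"version": "2.2"}},
    {"text": "Chunk ingested without metadata", "metadata": {}},
]

only_v23 = filter_chunks(chunks, {"version": "2.3"})
```

Chunks ingested without the relevant metadata key never match an exact-match filter on that key, which is one reason to tag documents consistently at ingestion time.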

### From Local File

Coming soon.

### From Raw Text

Coming soon.

## Working as Query Engine

To conduct semantic search with `ZillizCloudPipelineIndex`, call `as_query_engine()` with the following parameters:
- `search_top_k`: How many text nodes/chunks to retrieve. Optional, defaults to `DEFAULT_SIMILARITY_TOP_K` (2).
- `filters`: Metadata filters. Optional, defaults to `None`.
- `output_metadata`: Which metadata fields to return with each retrieved text node. Optional, defaults to `[]`.
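The effect of `search_top_k` and `output_metadata` can be pictured with a toy in-memory search. This is a conceptual sketch only: the two-dimensional vectors, the cosine scoring, and the `toy_search` helper are assumptions for illustration, not the actual retrieval logic of Zilliz Cloud Pipelines.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def toy_search(
    query_vec: Sequence[float],
    chunks: List[Dict[str, Any]],
    search_top_k: int = 2,
    output_metadata: Sequence[str] = (),
) -> List[Dict[str, Any]]:
    """Return the top-k most similar chunks, keeping only the requested metadata."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return [
        {
            "text": c["text"],
            "metadata": {k: c["metadata"][k] for k in output_metadata if k in c["metadata"]},
        }
        for c in ranked[:search_top_k]
    ]


toy_chunks = [
    {"text": "a", "vector": [1.0, 0.0], "metadata": {"version": "2.3", "lang": "en"}},
    {"text": "b", "vector": [0.0, 1.0], "metadata": {"version": "2.3", "lang": "en"}},
    {"text": "c", "vector": [1.0, 1.0], "metadata": {"version": "2.2", "lang": "en"}},
]

results = toy_search([1.0, 0.0], toy_chunks, search_top_k=2, output_metadata=["version"])
```

With the default empty `output_metadata`, the returned entries would carry no metadata at all, which mirrors the default of `[]` listed above.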

In [None]:
# # If you don't have a zcp_index object but have an existing collection, you can construct one with:
#
# from llama_index.indices import ZillizCloudPipelineIndex
# zcp_index = ZillizCloudPipelineIndex(
#         cluster_id=ZILLIZ_CLUSTER_ID,
#         token=ZILLIZ_TOKEN,
#         collection_name="zcp_llamalection"
#     )

from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

query_engine_milvus23 = zcp_index.as_query_engine(
    search_top_k=3,
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(key="version", value="2.3")
        ]  # version == "2.3"
    ),
    output_metadata=["version"],
)

The query engine is now ready for semantic search or retrieval-augmented generation (RAG) over the Milvus 2.3 documents:

- **Retrieve** (Semantic search powered by Zilliz Cloud Pipelines):

In [None]:
> The query engine with filters retrieves only text nodes tagged with version "2.3".

[NodeWithScore(node=TextNode(id_='446268394525283746', embedding=None, metadata={'version': '2.3'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='c3254bc65319b52914d6e68fbce69161fcf0e2998e4619287a8560258a2fe53d', text='Delete Entities\nThis topic describes how to delete entities in Milvus.\nMilvus supports deleting entities by primary key or complex boolean expressions. Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions. This is because Milvus executes queries first when deleting data by complex boolean expressions.\nDeleted entities can still be retrieved immediately after the deletion if the consistency level is set lower than Strong.\nEntities deleted beyond the pre-specified span of time for Time Travel cannot be retrieved again.\nFrequent deletion operations will impact the system performance.\nBefore deleting entities by comlpex boolean expressions, make sure the collection has be
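The cell that produced the output above is not shown here. As a sketch of what such a retrieval step could look like, the snippet below defines a query string and a small helper that condenses the retrieved nodes into readable tuples. The `question` text is an assumption inferred from the generated answer later in this notebook, and `summarize_nodes` is a hypothetical helper; it relies only on the `score`, `node.metadata`, and `node.text` attributes visible in the output above.

```python
from typing import Any, List, Tuple

# Assumption: the original query string is not shown in this notebook;
# this one is inferred from the generated answer in the Query step.
question = "Can users delete entities by filtering non-primary fields?"


def summarize_nodes(nodes: List[Any], max_chars: int = 80) -> List[Tuple[Any, dict, str]]:
    """Return (score, metadata, text preview) for each retrieved node."""
    return [(n.score, n.node.metadata, n.node.text[:max_chars]) for n in nodes]


# With the query engine built above (requires valid credentials):
# retrieved_nodes = query_engine_milvus23.retrieve(question)
# for score, metadata, preview in summarize_nodes(retrieved_nodes):
#     print(score, metadata, preview)
```

Printing a short preview per node tends to be easier to scan than the full `NodeWithScore` representation.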

- **Query** (RAG powered by Zilliz Cloud Pipelines as retriever and OpenAI's LLM):

In [None]:
# `question` must hold the query string (the same one used for retrieval above)
response = query_engine_milvus23.query(question)
print(response.response)

Yes, users can delete entities by filtering non-primary fields. Milvus supports deleting entities by complex boolean expressions, which can include conditions based on non-primary fields. Users can define complex boolean expressions to filter entities based on specific conditions and then delete those entities using the expression.
