<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/examples/managed/vectaraDemo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Managed Index with Zilliz Cloud Pipeline

[Zilliz Cloud Pipelines](https://docs.zilliz.com/docs/pipelines) transforms unstructured data, such as documents, into a searchable vector collection that supports semantic search.

## Setup

1. Install llama-index

In [None]:
# ! pip install llama-index

2. Set up your [OpenAI](https://platform.openai.com) and [Zilliz Cloud](https://cloud.zilliz.com/) credentials

In [1]:
from getpass import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API Key:")

ZILLIZ_CLUSTER_ID = getpass("Enter your Zilliz Cluster ID:")
ZILLIZ_TOKEN = getpass("Enter your Zilliz Token:")
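When the notebook runs non-interactively (for example in CI), prompting with `getpass` blocks execution. A minimal sketch of one workaround is to check an environment variable first and prompt only when it is missing. The helper name `get_secret` and the environment variable names in the usage comment are assumptions made for illustration; they are not part of LlamaIndex or Zilliz Cloud.

```python
import os
from getpass import getpass


def get_secret(env_var: str, prompt: str) -> str:
    """Return the value of `env_var` if it is set and non-empty; otherwise prompt for it.

    Hypothetical helper for this notebook, not a library function.
    """
    value = os.environ.get(env_var)
    if value:
        return value
    # Fall back to an interactive prompt that does not echo the input.
    return getpass(prompt)


# Example usage (environment variable names are assumptions):
# ZILLIZ_CLUSTER_ID = get_secret("ZILLIZ_CLUSTER_ID", "Enter your Zilliz Cluster ID:")
# ZILLIZ_TOKEN = get_secret("ZILLIZ_TOKEN", "Enter your Zilliz Token:")
```

The prompt is reached only when the variable is unset or empty, so the same cell works both interactively and in automated runs.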

## Indexing documents

### From Signed URL

Zilliz Cloud Pipelines can ingest and automatically index a document from a public or pre-signed URL.

In [2]:
from llama_index.indices import ZillizCloudPipelineIndex

zcp_index = ZillizCloudPipelineIndex.from_document_url(
    url="https://publicdataset.zillizcloud.com/milvus_doc.md",  # a public or pre-signed url of a file stored on s3 or gcs
    cluster_id=ZILLIZ_CLUSTER_ID,
    token=ZILLIZ_TOKEN,
    metadata={"version": "milvus 2.3", "year": "2023"},  # optional
)

- The `metadata` argument is optional. Fields set here can be returned with each retrieved node via `output_metadata`.

### From Local File

Coming soon.

### From Raw Text

Coming soon.

## Working as Query Engine

A Zilliz Cloud Pipeline index can serve as a query engine in LlamaIndex.
The query engine accepts the following optional parameters:
- `search_top_k`: Number of text nodes (chunks) to retrieve. Defaults to `DEFAULT_SIMILARITY_TOP_K` (2).
- `output_metadata`: Metadata fields to include in each retrieved text node. Defaults to `[]`.

In [3]:
# To connect to an existing index without ingesting a document, uncomment:
# zcp_index = ZillizCloudPipelineIndex(
#     cluster_id=ZILLIZ_CLUSTER_ID,
#     token=ZILLIZ_TOKEN,
#     # collection_name='zcp_llamalection'
# )

query_engine = zcp_index.as_query_engine(
    search_top_k=3, output_metadata=["version", "year"]  # optional
)

You can then use the query engine for semantic search or retrieval-augmented generation (RAG):

- **Retrieve** (Semantic search powered by Zilliz Cloud Pipeline's Index):

In [5]:
question = (
    "In Milvus 2.3, can users delete entities by complex boolean expressions?"
)
query_engine.retrieve(question)

[NodeWithScore(node=TextNode(id_='446268394525203238', embedding=None, metadata={'year': '2023', 'version': 'milvus 2.3'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='0038e98e24c36a9ab33c792fd48e50e7fea1d3a1b0ea793036ec932ab9b6cf6b', text='Delete Entities\nThis topic describes how to delete entities in Milvus.\nMilvus supports deleting entities by primary key or complex boolean expressions. Deleting entities by primary key is much faster and lighter than deleting them by complex boolean expressions. This is because Milvus executes queries first when deleting data by complex boolean expressions.\nDeleted entities can still be retrieved immediately after the deletion if the consistency level is set lower than Strong.\nEntities deleted beyond the pre-specified span of time for Time Travel cannot be retrieved again.\nFrequent deletion operations will impact the system performance.\nBefore deleting entities by comlpex boolean expressions, make sur
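The raw list returned by `retrieve` is hard to scan. The sketch below condenses each result into its score, metadata, and a short text preview. It assumes each item exposes `score`, `node.text`, and `node.metadata`, as the `NodeWithScore(node=TextNode(...))` output above suggests. The helper name `summarize_nodes` is hypothetical, and the demo uses stand-in objects with a made-up score so that it runs without a Zilliz cluster.

```python
from types import SimpleNamespace


def summarize_nodes(nodes, max_chars=80):
    """Return one dict per retrieved item with its score, metadata, and a text preview."""
    rows = []
    for item in nodes:
        # Collapse newlines so each preview fits on one line.
        text = item.node.text.replace("\n", " ")
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        rows.append(
            {
                "score": item.score,
                "metadata": dict(item.node.metadata),
                "preview": text,
            }
        )
    return rows


# Stand-in for the result of query_engine.retrieve(question); the score is illustrative only.
demo_nodes = [
    SimpleNamespace(
        score=0.9,
        node=SimpleNamespace(
            text="Delete Entities\nThis topic describes how to delete entities in Milvus.",
            metadata={"year": "2023", "version": "milvus 2.3"},
        ),
    )
]

for row in summarize_nodes(demo_nodes, max_chars=40):
    print(row)
```

With a live index, `summarize_nodes(query_engine.retrieve(question))` should work the same way, provided the returned objects expose those attributes.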

- **Query** (RAG powered by Zilliz Cloud Pipeline's Index & OpenAI's LLM):

In [6]:
response = query_engine.query(question)
print(response.response)

Yes, users can delete entities by complex boolean expressions in Milvus 2.3.
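To see which documents an answer drew on, the response text can be paired with the metadata of its source nodes. This is a minimal sketch that assumes the response object exposes `response` (used above) and a `source_nodes` list whose items carry `node.metadata`. The helper name `format_answer` is hypothetical, and the demo uses a stand-in object rather than a live query.

```python
from types import SimpleNamespace


def format_answer(response, max_sources=3):
    """Return the answer text followed by one line of metadata per source node."""
    lines = [f"Answer: {response.response}"]
    for i, src in enumerate(response.source_nodes[:max_sources], start=1):
        # Sort keys so the output order is stable across runs.
        meta = ", ".join(f"{k}={v}" for k, v in sorted(src.node.metadata.items()))
        lines.append(f"Source {i}: {meta}")
    return "\n".join(lines)


# Stand-in for the object returned by query_engine.query(question).
demo_response = SimpleNamespace(
    response="Yes, users can delete entities by complex boolean expressions in Milvus 2.3.",
    source_nodes=[
        SimpleNamespace(
            node=SimpleNamespace(metadata={"year": "2023", "version": "milvus 2.3"})
        )
    ],
)

print(format_answer(demo_response))
```

Source metadata is populated only for the fields requested through `output_metadata` when the query engine was created.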
