<a href="https://colab.research.google.com/github/psaikiran2890/Google-Colab-projects/blob/main/unstructured_data_workshop.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Unstructured Data Workshop: Let's Build DeepSearcher!

- [DeepSearcher GitHub](https://github.com/zilliztech/deep-searcher/tree/master)
- [Zilliz.com dataset](https://drive.google.com/file/d/1nj8ilZVEZ6X98n1QKHZ1c0HIPtcp-RUj/view?usp=sharing)
- [Prebuilt Milvus Lite db](https://drive.google.com/file/d/1uE0q9MkpYkAlg4hjUSaxLQIdGSRadNcM/view?usp=sharing)


In [None]:
!pip install -U pymilvus openai sentence-transformers more-itertools



In [None]:
!gdown 1nj8ilZVEZ6X98n1QKHZ1c0HIPtcp-RUj

Downloading...
From (original): https://drive.google.com/uc?id=1nj8ilZVEZ6X98n1QKHZ1c0HIPtcp-RUj
From (redirected): https://drive.google.com/uc?id=1nj8ilZVEZ6X98n1QKHZ1c0HIPtcp-RUj&confirm=t&uuid=1b231886-89c2-4a89-9a16-815342bfd0ba
To: /content/zilliz_dot_com.parquet
100% 673M/673M [00:06<00:00, 99.9MB/s]


In [None]:
from more_itertools import chunked
from openai import OpenAI
import pandas as pd
from pymilvus import MilvusClient
from pymilvus import FieldSchema, DataType, CollectionSchema
from sentence_transformers import SentenceTransformer

## Create Vector DB
### Read prewrangled data

In [None]:
ds = pd.read_parquet("/content/zilliz_dot_com.parquet")
ds.head(), ds.count()

(                                                text  ...                                             vector
 0  Zilliz: Vector Database built for enterprise-g...  ...  [-0.045393813, -0.018639432, -0.054784577, 0.0...
 1  [Definitive Guide to Choosing a Vector Databas...  ...  [-0.05608401, -0.029924845, -0.05607348, -0.00...
 2  [Get Started Free](https://cloud.zilliz.com/si...  ...  [-0.085992694, -0.0033674983, -0.07611589, 0.0...
 3  ### Optimized Milvus\n\n  A fully managed serv...  ...  [-0.031480514, -0.0017416211, -0.035602074, 0....
 4  ### Start building better vector search applic...  ...  [-0.012724874, -0.03382215, 0.004587016, 0.027...
 
 [5 rows x 3 columns],
 text      312427
 source    312427
 vector    312427
 dtype: int64)

### Open DB and insert data

In [None]:
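# Optional: download the prebuilt Milvus Lite db linked above instead of building it.
# Note that the next cells drop and re-create the "default" collection in
# zilliz_dot_com.db, which would discard any prebuilt data stored there.
#!gdown 1uE0q9MkpYkAlg4hjUSaxLQIdGSRadNcM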

In [None]:
# Connect to Milvus client given URI
milvus_client = MilvusClient(uri="zilliz_dot_com.db")

In [None]:
dim = 384  # all-MiniLM-L6-v2 (used below to encode queries) produces 384-dim embeddings
collection_name = "default"

schema = CollectionSchema(
    [
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=4048),
        FieldSchema(name="source", dtype=DataType.VARCHAR, max_length=4048),
        FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=dim),
    ]
)

index_params = milvus_client.prepare_index_params()
index_params.add_index(
    field_name="vector", index_type="AUTOINDEX", metric_type="COSINE"
)

if milvus_client.has_collection(collection_name=collection_name):
    milvus_client.drop_collection(collection_name=collection_name)

# With an explicit schema, MilvusClient builds the collection from it, so the
# quick-setup arguments (dimension, field names, auto_id) are redundant here
milvus_client.create_collection(
    collection_name=collection_name,
    schema=schema,
    index_params=index_params,
)

print(milvus_client.list_collections())

['default']
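Before inserting roughly 312k rows, it can help to check that each record matches the schema above: a 384-value `vector` and `text`/`source` strings within the VARCHAR `max_length` of 4048. The helper below is a hypothetical sketch (`validate_record` is not part of pymilvus). It assumes `max_length` is counted in UTF-8 bytes; if your Milvus version counts characters instead, the check is simply stricter than needed.

```python
from typing import Any, Dict, List


def validate_record(record: Dict[str, Any], dim: int = 384, max_length: int = 4048) -> List[str]:
    """Return a list of problems with one row destined for the collection above.

    Hypothetical helper: field names and limits mirror the schema cell. Lengths are
    measured in UTF-8 bytes (an assumption), which is never less than the character
    count, so the check errs on the strict side.
    """
    problems = []
    for field in ("text", "source"):
        value = record.get(field)
        if not isinstance(value, str):
            problems.append(f"{field}: expected str, got {type(value).__name__}")
        elif len(value.encode("utf-8")) > max_length:
            problems.append(f"{field}: longer than {max_length} bytes")
    vector = record.get("vector")
    if vector is None or len(vector) != dim:
        problems.append(f"vector: expected {dim} values")
    return problems
```

For example, `[r for r in ds.to_dict("records") if validate_record(r)]` would list the rows that need attention before calling `insert`.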


In [None]:
# NOTE: Don't run this cell more than once or duplicate entries will be inserted!
import math

# TODO: tqdm progress bar
batch_size = 1024
n_batches = math.ceil(len(ds) / batch_size)  # 306 for the 312,427 rows above

for batch_id, batch in enumerate(chunked(ds.to_dict("records"), batch_size), start=1):
    milvus_client.insert(
        collection_name=collection_name,
        data=batch,
    )
    if batch_id % 10 == 0:
        print(f"batch {batch_id} of {n_batches}")

batch 10 of 306
batch 20 of 306
batch 30 of 306
batch 40 of 306
batch 50 of 306
batch 60 of 306
batch 70 of 306
batch 80 of 306
batch 90 of 306
batch 100 of 306
batch 110 of 306
batch 120 of 306
batch 130 of 306
batch 140 of 306
batch 150 of 306
batch 160 of 306
batch 170 of 306
batch 180 of 306
batch 190 of 306
batch 200 of 306
batch 210 of 306
batch 220 of 306
batch 230 of 306
batch 240 of 306
batch 250 of 306
batch 260 of 306
batch 270 of 306
batch 280 of 306
batch 290 of 306
batch 300 of 306
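The insertion loop relies on `more_itertools.chunked` to split the records into lists of 1,024. For readers who want to see what that does, here is a minimal standard-library equivalent built on `itertools.islice` (the names `chunked_sketch` and `num_batches` are hypothetical), together with the arithmetic behind the "of 306" in the log: 312,427 rows / 1,024 per batch = 305.1, rounded up to 306, with 107 rows in the final batch.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked_sketch(iterable: Iterable[T], n: int) -> Iterator[List[T]]:
    """Yield successive lists of length n; the last list may be shorter."""
    it = iter(iterable)
    while batch := list(islice(it, n)):
        yield batch


def num_batches(total: int, batch_size: int) -> int:
    """Ceiling division: how many batches are needed to cover `total` items."""
    return -(-total // batch_size)
```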


In [None]:
milvus_client.get_collection_stats(collection_name)

{'row_count': 312427}

### Test out search

In [None]:
embedding_model = "all-MiniLM-L6-v2"
document_encoder = SentenceTransformer(embedding_model)


In [None]:
query_vec = document_encoder.encode(["comparison between Milvus and weaviate"])

search_results = milvus_client.search(
    collection_name=collection_name,
    data=query_vec,
    output_fields=["source"],
    limit=10,  # Max number of search results to return
    search_params={"metric_type": "COSINE", "params": {}},  # Search parameters
)[0]

In [None]:
search_results

[{'id': 456788018628485391,
  'distance': 0.6580127477645874,
  'entity': {'source': 'https://zilliz.com/ai-faq/how-do-vector-databases-like-milvus-or-weaviate-handle-storage-of-vectors-and-indexes-under-the-hood-eg-do-they-use-memorymapped-files-proprietary-storage-engines-etc'}},
 {'id': 456787974348347961,
  'distance': 0.5470083355903625,
  'entity': {'source': 'https://zilliz.com/blog/milvus-2-3-beta-new-features-and-updates'}},
 {'id': 456787974900695509,
  'distance': 0.5143593549728394,
  'entity': {'source': 'https://zilliz.com/blog/milvus-2-2-5-new-features-and-updates'}},
 {'id': 456787975979700422,
  'distance': 0.4884350597858429,
  'entity': {'source': 'https://zilliz.com/learn/milvus-vector-database-quickstart'}},
 {'id': 456787974458974518,
  'distance': 0.4879022240638733,
  'entity': {'source': 'https://zilliz.com/blog/a-different-angle-retrieval-optimized-embedding-models'}},
 {'id': 456787974788233535,
  'distance': 0.4729863703250885,
  'entity': {'source': 'https:
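With `metric_type="COSINE"`, the `distance` that Milvus returns is a cosine similarity, so larger means closer, which is why the hits above come back in descending order (0.658, 0.547, ...). A pure-Python sketch of the score and an exhaustive top-k, handy for sanity-checking a few results by hand; the function names are illustrative, not pymilvus API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = a·b / (|a| |b|), from -1 (opposite) to 1 (same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], docs: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Exhaustive search: score every doc and return (index, similarity), best first."""
    scored = [(i, cosine_similarity(query, d)) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```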

## Agent Workflow
### Prompts

In [None]:
SUB_QUERY_PROMPT = """To answer this question more comprehensively, please break down the original question into up to four sub-questions. Return as list of str.
If this is a very simple question and no decomposition is necessary, then keep the only one original question in the python code list.

Original Question: {original_query}


<EXAMPLE>
Example input:
"Explain deep learning"

Example output:
[
    "What is deep learning?",
    "What is the difference between deep learning and machine learning?",
    "What is the history of deep learning?"
]
</EXAMPLE>

Provide your response in a python code list of str format:
"""

# DEBUG
print(
    SUB_QUERY_PROMPT.format(
        original_query="What is the relationship between Zilliz and Milvus?"
    )
)

To answer this question more comprehensively, please break down the original question into up to four sub-questions. Return as list of str.
If this is a very simple question and no decomposition is necessary, then keep the only one original question in the python code list.

Original Question: What is the relationship between Zilliz and Milvus?


<EXAMPLE>
Example input:
"Explain deep learning"

Example output:
[
    "What is deep learning?",
    "What is the difference between deep learning and machine learning?",
    "What is the history of deep learning?"
]
</EXAMPLE>

Provide your response in a python code list of str format:



In [None]:
RERANK_PROMPT = """Based on the query questions and the retrieved chunk, determine whether the chunk is helpful in answering any of the query questions. You can only return "YES" or "NO", without any other information.

Query Questions: {query}
Retrieved Chunk: {retrieved_chunk}

Is the chunk helpful in answering any of the questions?
"""

In [None]:
REFLECT_PROMPT = """Determine whether additional search queries are needed based on the original query, previous sub queries, and all retrieved document chunks. If further research is required, provide a Python list of up to 3 search queries. If no further research is required, return an empty list.

If the original query asks for a report, prefer generating further queries over returning an empty list.

Original Query: {question}

Previous Sub Queries: {mini_questions}

Related Chunks:
{mini_chunk_str}

Respond exclusively in valid List of str format without any other text."""
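The RERANK prompt above asks the model for a bare "YES" or "NO". Reading that verdict with a substring test such as `"y" in response` is fragile: it also fires on a "NO" reply that happens to contain a "y", e.g. "NO, only tangentially related". A stricter reading looks at the first word only. A minimal sketch; `is_yes` and `naive_is_yes` are hypothetical helper names used for comparison.

```python
import re


def is_yes(response: str) -> bool:
    """Read a YES/NO verdict from the first word of an LLM reply (case-insensitive)."""
    match = re.match(r"\W*(\w+)", response)
    return bool(match) and match.group(1).upper() == "YES"


def naive_is_yes(response: str) -> bool:
    """Substring check: true whenever a 'y' or 'Y' appears anywhere in the reply."""
    return "Y" in response or "y" in response
```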

In [None]:
SUMMARY_PROMPT = """You are an AI content analysis expert, good at summarizing content. Please write a specific and detailed answer or report based on the previous queries and the retrieved document chunks.

Original Query: {question}

Previous Sub Queries: {mini_questions}

Related Chunks:
{mini_chunk_str}

"""

### Test reasoning model

In [None]:
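import os

# Read the key from the environment if it is set; otherwise paste it here
api_key = os.environ.get("OPENAI_API_KEY", "INSERT YOUR API KEY HERE!")
client = OpenAI(api_key=api_key)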

completion = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {
            "role": "user",
            "content": "Write a one-sentence bedtime story about a unicorn.",
        }
    ],
)

print(completion.choices[0].message.content)

In the moonlit meadow, a gentle unicorn whispered dreams of sparkling stars to every child as they drifted peacefully to sleep.
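Every LLM step in this notebook makes the same single-message call to `client.chat.completions.create` and reads `choices[0].message.content`. A small wrapper keeps the model name in one place. `ask_llm` is a hypothetical helper, not part of the `openai` package; it only assumes the client exposes the same `chat.completions.create` interface used above.

```python
from typing import Any


def ask_llm(client: Any, prompt: str, model: str = "o1-mini") -> str:
    """Send one user message and return the text of the first choice.

    `client` is expected to be an openai.OpenAI instance, or any object with the
    same chat.completions.create interface (e.g. a stub in tests).
    """
    completion = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return completion.choices[0].message.content
```

With the real client, `ask_llm(client, "Write a one-sentence bedtime story about a unicorn.")` reproduces the cell above.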


### Initial subqueries

In [None]:
# Overwrite this as you please
query_str = "Write a report comparing Milvus with other vector databases."

In [None]:
prompt = SUB_QUERY_PROMPT.format(original_query=query_str)

completion = client.chat.completions.create(
    model="o1-mini", messages=[{"role": "user", "content": prompt}]
)

response = completion.choices[0].message.content

In [None]:
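import ast
import re

# Parse the first [...] list literal in the reply instead of slicing characters off
# each line; the slice [1:-2] assumes a trailing '",' and so clips the last item
match = re.search(r"\[.*\]", response, re.DOTALL)
parsed = ast.literal_eval(match.group(0)) if match else [query_str]
subqueries = list(dict.fromkeys(str(s).strip() for s in parsed))  # dedupe, keep order
initial_subqueries = subqueries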

for s in subqueries:
    print(s)

What are the other commonly used vector databases?
What is Milvus and what are its key features?
How does Milvus compare to other vector databases in terms of performance and scalability?
What are the use cases and advantages of Milvus compared to its competitors
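The last subquery printed above has lost its question mark. Slicing each line with `s.strip()[1:-2]` assumes every item line ends in `",`, but the final list item has no trailing comma, so the slice removes its `?` instead. Parsing the list literal with `ast.literal_eval` does not depend on line layout. The sketch below contrasts the two on a hypothetical reply shaped the way the `[2:-2]` slicing expects (a fenced Python list, one item per line).

```python
import ast
import re
from typing import List

# Hypothetical reply, laid out the way the [2:-2] / [1:-2] slicing expects
SAMPLE_REPLY = '''```python
[
    "What is Milvus?",
    "Which vector databases compete with Milvus?"
]
```'''


def naive_parse(reply: str) -> List[str]:
    """Line-slicing approach (order-preserving here for easy comparison)."""
    return [s.strip()[1:-2].strip() for s in reply.splitlines()[2:-2]]


def parse_str_list(reply: str) -> List[str]:
    """Find the first [...] literal, evaluate it safely, and dedupe in order."""
    match = re.search(r"\[.*\]", reply, re.DOTALL)
    if match is None:
        return []
    try:
        items = ast.literal_eval(match.group(0))
    except (ValueError, SyntaxError):
        return []
    return list(dict.fromkeys(str(item).strip() for item in items))
```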


### Main Loop

In [None]:
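import ast
import re

previous_chunks = {}
previous_subqueries = set()
collection_name = "default"
max_iterations = 3  # Safety cap in case the reflect step never returns an empty list


def filter_chunk(chunk):
    """Ask the LLM whether a retrieved chunk helps answer the original query."""
    prompt = RERANK_PROMPT.format(query=query_str, retrieved_chunk=chunk)
    completion = client.chat.completions.create(
        model="o1-mini", messages=[{"role": "user", "content": prompt}]
    )
    response = completion.choices[0].message.content
    print(f"filter query: {response}")
    # Read the verdict from the first word; a bare `"y" in response` would also
    # match a "NO" reply that happens to contain the letter y
    return re.match(r"\W*yes\b", response, re.IGNORECASE) is not None


def parse_str_list(text):
    """Parse the first [...] list literal in an LLM reply; return [] if there is none."""
    match = re.search(r"\[.*\]", text, re.DOTALL)
    if match is None:
        return []
    try:
        items = ast.literal_eval(match.group(0))
    except (ValueError, SyntaxError):
        return []
    return list(dict.fromkeys(str(s).strip() for s in items))


# Iterations
iteration = 0
while subqueries and iteration < max_iterations:
    iteration += 1

    # Fetch chunks: one batched search, top-3 hits per subquery
    print("Encoding subqueries...")
    query_vec = document_encoder.encode(subqueries)

    print("Performing vector search...")
    search_results = milvus_client.search(
        collection_name=collection_name,
        data=query_vec,
        output_fields=["text"],
        limit=3,  # Max number of search results per subquery
        search_params={"metric_type": "COSINE", "params": {}},  # Search parameters
    )

    # Merge hits keyed by primary key, which deduplicates chunks returned for
    # several subqueries; skip chunks already accepted in an earlier iteration
    new_chunks = {}
    for res in search_results:
        new_chunks |= {x["id"]: x["entity"]["text"] for x in res}
    new_chunks = {k: v for k, v in new_chunks.items() if k not in previous_chunks}

    # Filter chunks with the LLM relevance check
    new_chunks = {k: v for k, v in new_chunks.items() if filter_chunk(v)}
    previous_chunks |= new_chunks
    previous_subqueries.update(subqueries)

    # Reflect
    print("Reflecting on knowledge gaps...")
    prompt = REFLECT_PROMPT.format(
        question=query_str,
        mini_questions=" ".join(previous_subqueries),
        mini_chunk_str=" ".join(previous_chunks.values()),
    )
    completion = client.chat.completions.create(
        model="o1-mini", messages=[{"role": "user", "content": prompt}]
    )
    response = completion.choices[0].message.content
    # Keep only follow-up queries that have not been searched yet
    subqueries = [s for s in parse_str_list(response) if s not in previous_subqueries]

    if subqueries:
        print("")
        for s in subqueries:
            print(s)
    else:
        print("No new subqueries, terminating reasoning loop...")

    print("\n\n")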

Encoding subqueries...
Performing vector search...
filter query: YES
filter query: YES
filter query: YES
filter query: NO
filter query: NO
filter query: NO
filter query: YES
filter query: YES
filter query: YES
filter query: YES
filter query: YES
Reflecting on knowledge gaps...

How does Milvus integrate with popular machine learning and AI frameworks?
What are the pricing and licensing models of Milvus compared to other vector databases?
What are the latest performance benchmarks comparing Milvus with its main competitors



Encoding subqueries...
Performing vector search...
filter query: NO
filter query: NO
filter query: NO
filter query: NO
filter query: YES
filter query: YES
filter query: NO
filter query: NO
filter query: YES
Reflecting on knowledge gaps...
No new subqueries, terminating reasoning loop...
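Stripped of the Milvus and OpenAI calls, the main loop above is a retrieve, filter, reflect cycle that accumulates accepted chunks until the reflection step proposes no new queries. The sketch below keeps only that control flow. `deep_search`, `search`, `is_relevant`, and `reflect` are hypothetical names; you would back the callables with `milvus_client.search` and the RERANK/REFLECT prompts. The `max_iterations` cap is a safeguard worth having because the REFLECT prompt tells the model to prefer further queries for reports, so it may never return an empty list.

```python
from typing import Callable, Dict, List, Set, Tuple


def deep_search(
    initial_queries: List[str],
    search: Callable[[List[str]], Dict[int, str]],
    is_relevant: Callable[[str], bool],
    reflect: Callable[[Set[str], Dict[int, str]], List[str]],
    max_iterations: int = 3,
) -> Tuple[Set[str], Dict[int, str]]:
    """Iteratively retrieve, filter, and reflect; return (queries used, kept chunks).

    search:      maps a batch of queries to {chunk_id: text}; keying by id dedupes hits
    is_relevant: yes/no judgement on one chunk (an LLM call in the notebook)
    reflect:     proposes follow-up queries given everything seen so far
    """
    queries = list(initial_queries)
    used_queries: Set[str] = set()
    kept: Dict[int, str] = {}
    for _ in range(max_iterations):
        if not queries:
            break
        found = search(queries)
        # Only judge chunks that have not already been accepted
        for chunk_id, text in found.items():
            if chunk_id not in kept and is_relevant(text):
                kept[chunk_id] = text
        used_queries.update(queries)
        # Ignore proposals that repeat an earlier query
        queries = [q for q in reflect(used_queries, kept) if q not in used_queries]
    return used_queries, kept
```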





### Generate report

In [None]:
print('Generating report...')
prompt = SUMMARY_PROMPT.format(
    question=query_str,
    mini_questions=" ".join(previous_subqueries),
    mini_chunk_str=" ".join(previous_chunks.values()),
)
completion = client.chat.completions.create(
        model="o1-mini", messages=[{"role": "user", "content": prompt}]
)
response = completion.choices[0].message.content

Generating report...
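The summary prompt receives every accepted chunk joined with single spaces, which erases chunk boundaries and can grow past the model's context window as iterations add chunks. One option is to number the chunks, separate them with blank lines, and stop at a size budget. A sketch with a hypothetical `build_context` helper; the character budget is a rough stand-in for a real token count, and a suitable value depends on the model.

```python
from typing import Iterable


def build_context(chunks: Iterable[str], max_chars: int) -> str:
    """Join chunks as numbered blocks, stopping before the total exceeds max_chars."""
    parts = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] {chunk.strip()}"
        # Account for the "\n\n" separator placed between blocks
        extra = len(block) + (2 if parts else 0)
        if used + extra > max_chars:
            break
        parts.append(block)
        used += extra
    return "\n\n".join(parts)
```

It would slot in as `mini_chunk_str=build_context(previous_chunks.values(), max_chars=...)` when formatting `SUMMARY_PROMPT`.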


In [None]:
print(response)

## Comparative Report: Milvus vs. Other Vector Databases

### Introduction

Vector databases have become essential in managing and querying high-dimensional data, particularly for applications involving machine learning, artificial intelligence, and similarity searches. This report provides a comprehensive comparison of Milvus, a leading open-source vector database, against other prominent vector databases in the market. The comparison spans various dimensions, including performance benchmarks, integration capabilities, pricing and licensing models, use cases, and scalability.

### Overview of Vector Databases

Vector databases are specialized systems designed to store, index, and query vector embeddings efficiently. They are crucial for applications such as image and text similarity search, recommendation systems, and natural language processing. The market offers a diverse range of vector databases, each with unique features and optimizations:

- **Purpose-Built Vector Databases**:
  - **Milvus**: An open-source, highly scalable vector database built for billion-scale vector similarity search.
  - **Zilliz Cloud**: A fully managed service offering Milvus capabilities in the cloud.
  
- **Vector Search Libraries**:
  - **Faiss**: Developed by Facebook, optimized for fast similarity search on large datasets.
  - **Annoy**: A C++ library with Python bindings, suitable for approximate nearest neighbors search.
  
- **Lightweight Vector Databases**:
  - **Chroma**: Offers simple vector search functionalities with ease of integration.
  - **Milvus Lite**: A lightweight version of Milvus tailored for local implementations.
  
- **Traditional Databases with Vector Add-Ons**:
  - **Apache Cassandra**: A distributed NoSQL database that now supports vector embeddings and similarity searches.
  - **OpenSearch**: An open-source search and analytics suite that includes vector search capabilities.
  - **Rockset**: A real-time search and analytics database enhanced with vector search add-ons.

### Milvus: An Overview

**Milvus** is an open-source vector database developed by [Zilliz](https://zilliz.com/what-is-milvus), renowned for its flexibility, reliability, and high performance. Key features include:

- **Distributed Architecture**: Separates storage and compute, leveraging cloud-native object storage (e.g., S3, MinIO) for scalability.
- **Flexible Indexing**: Supports multiple indexing algorithms like IVF, HNSW, and ANNOY, allowing customization based on specific use cases.
- **Memory Optimization**: Utilizes memory-mapped files for efficient management of large datasets, reducing manual memory overhead.
- **Integration Capabilities**: Seamlessly integrates with popular machine learning and AI frameworks, enhancing its utility in AI-driven applications.
- **Active Community**: Maintained by a vibrant open-source community with significant GitHub activity and industry recognition.

### Performance and Scalability

Performance benchmarks are critical in evaluating vector databases. According to available data:

| **Engine**                | **Performance (ms)** | **Dataset Size (million)** |
|---------------------------|----------------------|-----------------------------|
| Elasticsearch (ES)        | 600                  | 1                           |
| ES + Alibaba Cloud        | 900                  | 20                          |
| **Milvus**                | **27**               | **1000+**                   |
| SPTAG                     | Not good             | -                           |
| ES + nmslib, Faiss        | 90                   | 150                         |

**Milvus** outperforms its competitors significantly, handling over a billion 128-dimensional vectors with a retrieval time of just 27 milliseconds. In contrast, traditional solutions like Elasticsearch (ES) struggle with scalability and speed, particularly as dataset sizes increase. Milvus's separation of storage and compute enhances its scalability, making it adept at managing massive datasets without performance degradation.

### Integration with Machine Learning and AI Frameworks

Milvus seamlessly integrates with various machine learning and AI frameworks, facilitating the development of AI-driven applications. Its comprehensive set of intuitive APIs allows developers to:

- Choose appropriate indexing algorithms based on application requirements.
- Utilize distributed solutions for scalable deployment.
- Monitor system performance through integrated monitoring services.

**Milvus Lite** offers a powerful library alternative to basic vector search libraries like Faiss, providing superior performance and query capabilities while natively integrating mainstream vector search algorithms.

### Pricing and Licensing Models

Understanding the pricing and licensing is crucial for organizations when selecting a vector database:

- **Milvus**:
  - **Open-Source**: Available under an open-source license, allowing free use and modification.
  - **Zilliz Cloud**: Offers a fully managed service with flexible pricing options to accommodate different team sizes and budgets. A free tier is available for users to start with minimal cost.
  
- **Faiss and Annoy**:
  - **Open-Source Libraries**: Both are available under permissive licenses (FAISS under MIT), allowing free use in various projects.
  
- **Weaviate**:
  - **Open-Source Database**: Offers both community editions and enterprise solutions with additional features and support.
  
- **Pinecone**:
  - **Closed-Source Service**: Operates on a subscription-based model with pricing tiers based on usage and feature requirements.

**Milvus** provides a competitive advantage with its open-source model and the availability of Zilliz Cloud, which ensures scalability without significant upfront costs.

### Use Cases and Advantages

Milvus is tailored for a variety of applications, leveraging its high performance and scalability:

- **Real-Time Vector Similarity Search**: Ideal for applications requiring instant retrieval from large datasets, such as mobile security services.
- **Recommendation Systems**: Enhances personalization by efficiently matching user preferences with vast item pools.
- **Natural Language Processing (NLP)**: Facilitates tasks like semantic search and information retrieval by managing high-dimensional text embeddings.
- **Image and Video Retrieval**: Supports efficient indexing and searching of media content based on visual features.

**Advantages of Milvus** over competitors include:

- **Superior Performance**: Demonstrated faster retrieval times on massive datasets.
- **Scalability**: Handles billion-scale vectors effortlessly due to its distributed architecture.
- **Flexibility**: Supports multiple indexing algorithms, catering to diverse application needs.
- **Ease of Integration**: Compatible with various AI and ML frameworks, streamlining development workflows.
- **Active Community and Support**: Backed by a robust open-source community and comprehensive support through Zilliz.

### Comparison with Other Vector Databases

#### Apache Cassandra

**Apache Cassandra**:
- **Strengths**: High scalability, fault tolerance, and distributed architecture.
- **Vector Capabilities**: Introduced vector embeddings and similarity search in Cassandra 5.0.
- **Advantages Over Milvus**: Excels in handling structured and semi-structured data with robust distributed features.
- **Limitations**: While it now supports vector search, it remains primarily a NoSQL database, potentially lacking the specialized optimizations found in Milvus.

#### OpenSearch

**OpenSearch**:
- **Strengths**: Comprehensive search and analytics capabilities with support for various data types.
- **Vector Capabilities**: Includes vector search as an add-on, leveraging memory-mapped files for performance optimization.
- **Advantages Over Milvus**: Combines vector search with traditional search functionalities, suitable for hybrid search scenarios.
- **Limitations**: Milvus offers more specialized and scalable vector search capabilities, particularly for AI-driven applications.

#### Weaviate

**Weaviate**:
- **Strengths**: Custom storage engine for vector-first workloads, simplicity with an all-in-one design.
- **Vector Capabilities**: Uses proprietary storage formats and integrates vector and inverted indexes for hybrid search.
- **Advantages Over Milvus**: Easier deployment with self-contained architecture, no need for external metadata systems.
- **Limitations**: Less flexible in indexing options compared to Milvus and may not scale as effectively for billion-scale datasets.

#### Pinecone

**Pinecone**:
- **Strengths**: Managed vector database service with a focus on ease of use and scalability.
- **Vector Capabilities**: Handles vector indexing and similarity search with minimal configuration.
- **Advantages Over Milvus**: Provides a fully managed solution, reducing operational overhead.
- **Limitations**: Being a closed-source service, it may involve higher costs and less flexibility compared to Milvus's open-source model.

### Pricing and Licensing Models

- **Milvus**:
  - **Open-Source**: Free to use under an open-source license.
  - **Zilliz Cloud**: Offers a managed service with flexible pricing, including a free tier to accommodate various budgets.
  
- **Faiss and Annoy**: Both are free, open-source libraries under permissive licenses.
  
- **Weaviate**: Provides both free community editions and paid enterprise options with additional features.
  
- **Pinecone**: Operates on a subscription-based model with tiered pricing based on usage.

Milvus stands out by providing cost-effective solutions through its open-source availability and the scalable, pay-as-you-go model of Zilliz Cloud.

### Performance and Scalability

Milvus is engineered for high performance and scalability:

- **Performance**: Achieves significantly lower retrieval times (e.g., 27 ms for 1 billion vectors) compared to competitors like Elasticsearch and ES combined with Faiss or nmslib.
- **Scalability**: Supports billion-scale datasets through its distributed architecture, decoupling storage from computation to allow seamless scaling.
- **Indexing Flexibility**: Offers multiple indexing algorithms (IVF, HNSW, ANNOY) to optimize for different use cases and performance requirements.
- **Resource Optimization**: Utilizes memory-mapped files and external metadata databases (etcd, MySQL) to efficiently manage resources and maintain performance.

### Use Cases and Advantages

**Milvus** excels in scenarios requiring real-time vector similarity search and handling vast datasets:

- **Real-Time Virus Detection**: Enables mobile security services to perform instant vector similarity searches on large-scale datasets.
- **Recommendation Systems**: Enhances personalization by efficiently querying and matching user preferences with extensive item databases.
- **AI and Machine Learning**: Integrates seamlessly with ML frameworks, supporting advanced AI-driven applications like natural language processing and image retrieval.
- **Enterprise Applications**: Trusted by thousands of corporations for production systems at scale, showcasing reliability and performance in mission-critical environments.

### Future Roadmap and Community Support

Milvus aims to evolve into a complete database solution for unstructured data processing:

- **Upcoming Enhancements**:
  - **Integration with Unstructured Data ETL**: Streamlining data ingestion and preprocessing workflows.
  - **Extended Cloud Support**: Broadening compatibility with major cloud platforms like Microsoft Azure and Google Cloud.
  - **Metadata Support**: Enhancing support for traditional metadata types such as lists and JSON objects.
  
- **Community and Development**: Maintained by a passionate community of technologists, Milvus continues to receive updates and optimizations. Future plans include heterogeneous hardware acceleration to reduce CPU overhead and introducing AI-enabled system parameter tuning for cost-effective vector retrieval.

### Conclusion

Milvus distinguishes itself as a premier vector database through its exceptional performance, scalability, and flexibility. Its open-source nature, combined with the managed Zilliz Cloud service, offers both cost-effective and scalable solutions suitable for a wide range of applications. Compared to other vector databases like Apache Cassandra, OpenSearch, Weaviate, and Pinecone, Milvus provides superior speed, specialized features for AI-driven tasks, and robust community support. As the demand for efficient vector search capabilities continues to grow, Milvus stands well-positioned to meet the evolving needs of enterprises and developers alike.

### Contact and Further Information

For more information on Milvus and to explore its capabilities:

- **Official Website**: [Milvus](https://www.milvus.io/)
- **GitHub Repository**: [Milvus on GitHub](https://github.com/milvus-io/milvus)
- **Zilliz Cloud**: [Start Free Trial](https://cloud.zilliz.com/signup)
- **Contact**: Reach out to Jingyu Zhang via email at [pr@zilliz.com](mailto:pr@zilliz.com)

### References

- Milvus Performance Benchmark: [Milvus vs. Competitors](https://github.com/milvus-io/milvus)
- Apache Cassandra Documentation: [Vector Search Concepts](https://cassandra.apache.org/doc/latest/cassandra/vector-search/concepts.html)
- Weaviate Overview: [Weaviate Documentation](https://weaviate.io/developers/weaviate)
- OpenSearch Information: [OpenSearch Official Site](https://opensearch.org/)
- Faiss Library: [Faiss by Facebook AI](https://zilliz.com/learn/faiss)
- Annoy Library: [Annoy Documentation](https://zilliz.com/learn/approximate-nearest-neighbor-oh-yeah-ANNOY)

# Other Resources
* https://milvus.io/docs/openai_agents_milvus.md
