How to increase throughput for standalone instance? #25550

shauryagoel · 2023-07-12T14:16:27Z

I started Milvus v2.2.9 in standalone mode using docker-compose, without any CPU limits. I created an IVF_SQ8 index on a collection of 1 million embeddings and wrapped Milvus's search function (collection.search(...) with _async=True) inside an async function. When I send hundreds of asynchronous calls to that function, CPU usage is around 250% at a given latency. If I increase the number of async calls, CPU usage stays at around 250%, but latency goes up. My machine has 16 vCPUs.
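
Below is a minimal sketch of the setup described above, for anyone who wants to reproduce it. It makes two assumptions:

- The object returned by `collection.search(..., _async=True)` exposes a blocking `.result()` method.
- `asyncio.to_thread` (Python 3.9+) is used so that waiting on that result does not block the event loop.

`FakeFuture` and `fake_search` are hypothetical stand-ins that simulate a slow search, so the sketch runs without a Milvus server.

```python
import asyncio
import time


class FakeFuture:
    """Hypothetical stand-in for the future returned by collection.search(..., _async=True)."""

    def __init__(self, value, delay):
        self._value = value
        self._delay = delay

    def result(self):
        time.sleep(self._delay)  # simulate waiting for the server's reply
        return self._value


def fake_search(query_id):
    # In the real setup this would be collection.search(..., _async=True)
    return FakeFuture(value=f"hits for query {query_id}", delay=0.01)


async def async_search(query_id):
    future = fake_search(query_id)
    # .result() blocks, so the wait runs in the loop's default thread pool,
    # which has a limited number of worker threads
    return await asyncio.to_thread(future.result)


async def main(num_queries):
    # gather() returns results in the same order as the coroutines passed in
    return await asyncio.gather(*(async_search(i) for i in range(num_queries)))


if __name__ == "__main__":
    results = asyncio.run(main(100))
    print(len(results), results[0])
```

However many coroutines are gathered, every request in this pattern is issued from one client process.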

I tried increasing maxReadConcurrentRatio to 100.0 in milvus.yaml, but CPU usage did not improve.

Is there a way to increase CPU utilisation without impacting latency? Is there any configuration option that I am missing?

Answered by yhmo

yhmo · 2023-07-13T03:13:00Z
Collaborator

Reasons:

1 million embeddings is a fairly small dataset. If you increase it to 10 million, you will see higher CPU usage.
Standalone has only one query node, and it searches one segment at a time. In a cluster with several query nodes, several segments can be searched in parallel. A larger dataset has more segments, which makes it better suited to parallel search.
The client is not generating enough load. Use multiple processes to call collection.search(), each opening its own connection. The more concurrent clients there are, the more queries reach the server.
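
The first two reasons can be illustrated with a back-of-the-envelope estimate. Raw FLOAT_VECTOR data takes 4 bytes per dimension, so the number of sealed segments grows with the dataset. The sketch makes these assumptions:

- Vectors are 128-dimensional, as in the demo script in this thread.
- The segment size limit is 512 MB, assumed here to be the `dataCoord.segment.maxSize` default in milvus.yaml for 2.2.x. Check your own config.
- Shards, scalar fields, index size and seal policies are ignored.

Because of those omissions, treat the result as a lower bound rather than the real segment count.

```python
import math

BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors, dim):
    """Raw size of FLOAT_VECTOR data: one float32 (4 bytes) per dimension."""
    return num_vectors * dim * BYTES_PER_FLOAT32


def min_segment_count(num_vectors, dim, segment_max_mb=512):
    """Rough lower bound on the number of segments needed to hold the raw vectors.

    segment_max_mb is assumed to mirror dataCoord.segment.maxSize (512 MB assumed).
    Shards, scalar fields and seal policies are ignored; in practice they add segments.
    """
    return math.ceil(raw_vector_bytes(num_vectors, dim) / (segment_max_mb * 1024 * 1024))


for n in (1_000_000, 10_000_000):
    mib = raw_vector_bytes(n, 128) / 2**20
    print(f"{n:>10,} vectors: {mib:,.0f} MiB raw, at least {min_segment_count(n, 128)} segment(s)")
```

Under these assumptions, 1 million vectors fit in a single segment's worth of space, while 10 million need at least 10. That gives a query node, or several in a cluster, more segments to search in parallel.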

shauryagoel Jul 13, 2023
Author

Is there documentation on how to create and manage multiple connections? I can find how to create a single connection, but not how to handle several.

Also, why can't a single connection put enough load on the Milvus server? What is the bottleneck in that case?

yhmo Jul 13, 2023
Collaborator

Here is a script that starts 10 processes, which send search requests to the server in parallel. Each process calls the API through its own connection.

```python
import random
import multiprocessing
import time
import os

from pymilvus import (
    connections,
    FieldSchema, CollectionSchema, DataType,
    Collection,
    utility
)


_HOST = '127.0.0.1'
_PORT = '19530'

# Const names
_COLLECTION_NAME = 'demo'
_VECTOR_FIELD_NAME = 'vectors'
_DIM = 128


def search(repeat):
    print(f"process {os.getpid()} begin")

    connection_name = f"proc_{os.getpid()}"
    connections.connect(host=_HOST, port=_PORT, alias=connection_name) # use process id as connection's name

    collection = Collection(_COLLECTION_NAME, using=connection_name) # use the connection to call api for this collection
    search_vectors = [[random.random() for _ in range(_DIM)] for _ in range(1)]
    search_param = {
        "data": search_vectors,
        "anns_field": _VECTOR_FIELD_NAME,
        "param": {"metric_type": 'L2', "params": {"nprobe": 64}},
        "limit": 10}

    # repeat search
    for i in range(repeat):
        results = collection.search(**search_param)
        # for i, result in enumerate(results):
        #     print("\nSearch result for {}th vector: ".format(i))
        #     for j, res in enumerate(result):
        #         print("Top {}: {}".format(j, res))

    print(f"process {os.getpid()} end")

def create_collection():
    connections.connect(host=_HOST, port=_PORT, alias="default")  # create a connection with name='default'

    if utility.has_collection(_COLLECTION_NAME):
        utility.drop_collection(_COLLECTION_NAME)

    field1 = FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True)
    field2 = FieldSchema(name=_VECTOR_FIELD_NAME, dtype=DataType.FLOAT_VECTOR, dim=_DIM)
    schema = CollectionSchema(fields=[field1, field2])
    collection = Collection(name=_COLLECTION_NAME, schema=schema)
    print("collection created")

    # insert batch by batch
    for i in range(100):
        row_count = 10000
        data = [
            [[random.random() for _ in range(_DIM)] for _ in range(row_count)],  # vector
        ]
        collection.insert(data)
        print(f"{row_count} entities inserted")

    collection.flush()
    print("collection has {} entities".format(collection.num_entities))

    index_param = {
        "index_type": "IVF_FLAT",
        "params": {"nlist":512},
        "metric_type": "L2"}
    collection.create_index(field_name=_VECTOR_FIELD_NAME, index_params=index_param)
    print("index created")
    collection.load()

if __name__ == '__main__':
    create_collection()

    # multiple processes to perform search requests
    start = time.time()
    proc_num = 10
    requests_per_process = 1000
    print("process number", int(proc_num))
    process_list = []
    for i in range(int(proc_num)):
        p = multiprocessing.Process(target=search, args=(requests_per_process,))
        p.start()
        process_list.append(p)
        print("create process", i)

    print("wait processes...")
    for p in process_list:
        p.join()

    end = time.time()
    print(f"finish {proc_num*requests_per_process} search requests, time cost: {end - start} seconds")
    qps = proc_num*requests_per_process/(end - start)
    print(f"qps: {qps}/second")
```

yhmo Jul 13, 2023
Collaborator

Set proc_num to 1 and observe the CPU usage.
Increase the inserted data to 10 million entities and observe the CPU usage.

shivanshu20 May 15, 2024

@yhmo Hi, I am working on parallelizing similarity search, but it takes a long time in my app. Can you suggest the best ways to make it faster?

yhmo May 17, 2024
Collaborator

Set consistency_level to "Eventually".
Search performance is affected by many factors:

number and dimension of vectors
maximum segment size
shards_num
number of partitions (are there many?)
index parameters
replica_number
search parameters
whether there is a filtering expression
whether other fields are requested as output
CPU core count and frequency; DISKANN is also affected by disk performance
the number of proxies, which can sometimes be the bottleneck
etcd performance, which should be good

Different situations produce different performance, so it is hard to suggest a single "best way". Most of the time we test and observe, case by case.
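
The per-request factors in this list can be collected in one place. This sketch makes the following assumptions:

- It follows pymilvus 2.2's `Collection.search(data, anns_field, param, limit, expr=None, partition_names=None, output_fields=None, ..., **kwargs)`.
- `consistency_level` is passed as a keyword argument.
- The `nprobe` parameter assumes an IVF index, as in the demo script.

`build_search_kwargs` is a hypothetical helper. The returned dict is meant to be unpacked as `collection.search(**kwargs)`.

```python
def build_search_kwargs(vectors, anns_field, nprobe=64, limit=10, metric_type="L2",
                        expr=None, output_fields=None, consistency_level="Eventually"):
    """Hypothetical helper: gather per-request search knobs into kwargs for Collection.search()."""
    kwargs = {
        "data": vectors,
        "anns_field": anns_field,
        # IVF: number of clusters probed per query; lower is faster but may reduce recall
        "param": {"metric_type": metric_type, "params": {"nprobe": nprobe}},
        "limit": limit,
        # "Eventually" does not wait for the most recent writes to become searchable
        "consistency_level": consistency_level,
    }
    if expr is not None:
        kwargs["expr"] = expr  # optional filtering expression
    if output_fields:
        kwargs["output_fields"] = list(output_fields)  # extra fields to return with each hit
    return kwargs


if __name__ == "__main__":
    params = build_search_kwargs([[0.1] * 128], "vectors", nprobe=16)
    print(params["param"], params["consistency_level"])
    # With a loaded collection: results = collection.search(**params)
```

Because `expr` and `output_fields` are only added when given, it is easy to benchmark with and without them. That is the kind of case-by-case testing suggested above.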

yhmo · 2023-07-14T03:43:34Z
Collaborator

@shauryagoel
A similar question: #25571

xiaofan-luan · 2024-05-18T14:15:56Z
Maintainer

Upgrading to the latest 2.3 release should solve your problem.