# Partition Key

A Milvus collection can have one or more partitions, each of which contains one or more data segments.

In Milvus, there can be up to 4096 partitions per collection.

We can create, list, and delete partitions manually.

Milvus can also create partitions automatically, based on the hashed values of a specific field in the collection. This lets Milvus manage the partitions for us and helps us search quickly within the collection. A partition key is a field in the collection whose values are used to create and define the partitions of a Milvus collection.
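To make the idea of hash-based partition assignment concrete, here is a minimal sketch in plain Python. It is not Milvus's internal hash function: assign_partition and NUM_PARTITIONS are hypothetical names chosen for illustration, and the bucket count of 16 is an arbitrary assumption.

```python
import hashlib

# Hypothetical bucket count, chosen only for illustration.
NUM_PARTITIONS = 16


def assign_partition(key_value: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key value to a bucket index in [0, num_partitions)."""
    # A stable hash guarantees that the same value always lands in the same bucket.
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


for language in ["english", "turkish", "french"]:
    print(language, "->", assign_partition(language))
```

Because rows that share a key value always map to the same bucket, a search filtered on that value only needs to look at one bucket instead of the whole collection.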

https://milvus.io/docs/use-partition-key.md#Use-Partition-Key

In [109]:
from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
import random
import string

In [110]:
connections.connect(
  alias="default",
  host='localhost',
  port='19530'
)
utility.drop_collection("partition_key_collection")

In [111]:
## Field Schema
song_name = FieldSchema(
  name="song_name",
  dtype=DataType.VARCHAR,
  max_length=200,
)
song_id = FieldSchema(
  name="song_id",
  dtype=DataType.INT64,
  is_primary=True,
)
listen_count = FieldSchema(
  name="listen_count",
  dtype=DataType.INT64,
)
song_vec = FieldSchema(
  name="song_vec",
  dtype=DataType.FLOAT_VECTOR,
  dim=2
)
language = FieldSchema(
  name="language",
  dtype=DataType.VARCHAR,
  max_length=64,
  is_partition_key=True
)

# Collection schema
collection_schema = CollectionSchema(
  fields=[song_name, song_id, listen_count, song_vec, language],
  description="Album Songs"
)

# Create collection
collection = Collection(
    name="partition_key_collection",
    schema=collection_schema,
    partition_key_field="language",
    using='default')


utility.list_collections()

['dynamic_schema_example', 'Album1', 'partition_key_collection']

Notice the additional setting on the language field schema: is_partition_key is set to True, so this field is the partition key for the schema.

When creating the collection, we also pass an additional parameter, partition_key_field, whose value is language, the partition key field defined earlier.

In [112]:
# Prepare data to be inserted
num_entities=100
data = [
    ["song"+str(i+1) for i in range(num_entities)],  # song_name
    [i+1 for i in range(num_entities)],  # song_id
    [random.randint(50,500) for _ in range(num_entities)],  # listen_count
    [[random.random(), random.random()] for _ in range(num_entities)],  # song_vec
    [random.choice(["english", "turkish", "french"]) for _ in range(num_entities)]  # language
]
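Before inserting, it can help to check how the rows are spread across the partition key values. The sketch below regenerates a language column the same way as the cell above and tallies it with collections.Counter. The fixed seed is an assumption added here only so that the run is repeatable.

```python
import random
from collections import Counter

random.seed(0)  # assumption: fixed seed, only to make the sketch repeatable

num_entities = 100
languages = [random.choice(["english", "turkish", "french"]) for _ in range(num_entities)]

# Count how many rows carry each partition key value.
counts = Counter(languages)
for value, count in sorted(counts.items()):
    print(f"{value}: {count}")
```

A heavily skewed tally would mean that most rows share one key value, which reduces the benefit of filtering by partition key.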

In [113]:
# Insert sample data
collection.insert(data)

(insert count: 100, delete count: 0, upsert count: 0, timestamp: 450514377014247429, success count: 100, err count: 0)

In [114]:
# Create an index for the collection
index_params = {
    "index_type": "IVF_FLAT",
    "params": {"nlist": 128},
    "metric_type": "L2"
}

collection.create_index(field_name="song_vec", index_params=index_params)

Status(code=0, message=)

In [115]:
# Load the collection into memory
collection.load()

To filter the data by partition key, we use the search method of the collection object and pass a filter expression that names the partition key field and the value to match.

In this example, the partition key is the language field, and we keep only the results whose partition key value is english.

In [116]:
## Vector Similarity Search
results = collection.search(
    data=[[0.1, 0.2]],
    anns_field="song_vec",
    param={"metric_type": "L2", "params": {"nprobe": 10}},  # IVF_FLAT is tuned with nprobe, not search_k
    limit=5,
    expr='language=="english"',  # Limit the search to one partition key value
    output_fields=['song_name']
)
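The filter expression is an ordinary string, so it can be built programmatically. The helper below is a hypothetical convenience function, not part of pymilvus. It produces an equality test for one value and an in test for several, and it assumes the values contain no double quotes or backslashes that would need escaping.

```python
def partition_key_expr(field: str, values: list[str]) -> str:
    """Build a filter expression matching one or more partition key values."""
    if not values:
        raise ValueError("at least one value is required")
    if len(values) == 1:
        return f'{field} == "{values[0]}"'
    quoted = ", ".join(f'"{v}"' for v in values)
    return f"{field} in [{quoted}]"


print(partition_key_expr("language", ["english"]))
# prints: language == "english"
print(partition_key_expr("language", ["english", "french"]))
# prints: language in ["english", "french"]
```

The resulting string can be passed as the expr argument of collection.search.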

Note that care must be taken to ensure that the partition key does not result in more partitions than the limit allows.

In [118]:
for result in results[0]:
    print(result)

id: 20, distance: 0.017796462401747704, entity: {'song_name': 'song20'}
id: 31, distance: 0.028050808236002922, entity: {'song_name': 'song31'}
id: 2, distance: 0.04357391595840454, entity: {'song_name': 'song2'}
id: 14, distance: 0.04771279916167259, entity: {'song_name': 'song14'}
id: 98, distance: 0.04890989139676094, entity: {'song_name': 'song98'}
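Conceptually, a filtered search keeps only the rows that match the expression and then ranks them by distance to the query vector. The brute-force sketch below illustrates this in plain Python on a handful of made-up rows. It is not how Milvus executes the query (Milvus uses the IVF_FLAT index built earlier), and it assumes that the reported L2 distance is the squared Euclidean distance.

```python
# Made-up rows for illustration: (song_id, language, song_vec)
rows = [
    (1, "english", [0.10, 0.25]),
    (2, "french", [0.10, 0.20]),
    (3, "english", [0.90, 0.80]),
    (4, "turkish", [0.15, 0.20]),
    (5, "english", [0.20, 0.20]),
]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_search(rows, query, language, limit):
    """Keep rows whose language matches, then return the closest ones."""
    candidates = [
        (song_id, squared_l2(vec, query))
        for song_id, lang, vec in rows
        if lang == language
    ]
    return sorted(candidates, key=lambda pair: pair[1])[:limit]


hits = filtered_search(rows, query=[0.1, 0.2], language="english", limit=2)
for song_id, distance in hits:
    print(f"id: {song_id}, distance: {distance:.4f}")
```

Rows 2 and 4 are close to the query vector but are never considered, because their language does not match the filter.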
