diff --git a/README.md b/README.md
index 9493022..e7f3af2 100644
--- a/README.md
+++ b/README.md
@@ -123,11 +123,15 @@ SELECT * FROM articles WHERE id = '3f2e1a4b-...'
 -- Collections
 CREATE COLLECTION articles
 CREATE COLLECTION articles HYBRID
+CREATE COLLECTION articles HNSW { payload_m: 16 }
 CREATE COLLECTION articles QUANTIZE SCALAR
 CREATE COLLECTION articles QUANTIZE TURBO
 CREATE COLLECTION articles QUANTIZE TURBO BITS 2
 CREATE COLLECTION articles QUANTIZE TURBO BITS 1.5 ALWAYS RAM
 CREATE INDEX ON COLLECTION articles FOR year TYPE integer
+CREATE INDEX ON COLLECTION articles FOR tenant_id TYPE keyword WITH { is_tenant: true, on_disk: true }
+CREATE INDEX ON COLLECTION articles FOR doc_id TYPE uuid
+CREATE INDEX ON COLLECTION articles FOR title TYPE text WITH { tokenizer: 'word', min_token_len: 2, lowercase: true }
 SHOW COLLECTIONS
 SHOW COLLECTION articles
 DROP COLLECTION articles
diff --git a/docs/collections.md b/docs/collections.md
index fbc4d14..ebb0862 100644
--- a/docs/collections.md
+++ b/docs/collections.md
@@ -93,9 +93,10 @@ CREATE COLLECTION HYBRID
 CREATE COLLECTION USING MODEL ''
 CREATE COLLECTION USING HYBRID
 CREATE COLLECTION USING HYBRID DENSE MODEL ''
+CREATE COLLECTION HNSW { payload_m: }
 ```
 
-Any of the above forms can be followed by an optional `QUANTIZE` clause — see [Quantization](#quantization--quantize-clause) below.
+Any of the above forms can be followed by an optional `QUANTIZE` clause and/or `HNSW { payload_m: }`.
 
 **Examples:**
 
@@ -119,8 +120,25 @@ Hybrid collection with a custom dense model:
 CREATE COLLECTION research_papers USING HYBRID DENSE MODEL 'BAAI/bge-base-en-v1.5'
 ```
 
+Dense collection with payload-aware HNSW links:
+```sql
+CREATE COLLECTION research_papers HNSW {payload_m: 16}
+```
+
 When `USING MODEL` is omitted, the collection uses the **default embedding model's dimensions** (384 for `all-MiniLM-L6-v2`). If the collection already exists, the command succeeds with a message and does nothing.
 
+### HNSW clause
+
+QQL currently supports one explicit HNSW knob during collection creation:
+
+- `payload_m` — enables payload-aware HNSW connectivity used by Qdrant for filtered / tenant-aware workloads
+
+Example:
+
+```sql
+CREATE COLLECTION tenant_docs USING HYBRID HNSW {payload_m: 16}
+```
+
 ---
 
 ## Quantization — QUANTIZE clause
@@ -239,6 +257,7 @@ Creates a payload index on a collection field. Payload indexes speed up `WHERE`
 **Syntax:**
 ```
 CREATE INDEX ON COLLECTION FOR TYPE
+CREATE INDEX ON COLLECTION FOR TYPE WITH { ... }
 ```
 
 **Supported schema types:**
@@ -252,19 +271,41 @@ CREATE INDEX ON COLLECTION FOR TYPE
 | `text` | Full-text search (enables `MATCH` operators) |
 | `geo` | Geospatial coordinates |
 | `datetime` | Date/time values |
+| `uuid` | UUID payload values |
 
 **Examples:**
 
 ```sql
 CREATE INDEX ON COLLECTION articles FOR category TYPE keyword
+CREATE INDEX ON COLLECTION articles FOR tenant_id TYPE keyword WITH {is_tenant: true, on_disk: true, enable_hnsw: true}
 CREATE INDEX ON COLLECTION articles FOR year TYPE integer
+CREATE INDEX ON COLLECTION articles FOR doc_id TYPE uuid
 CREATE INDEX ON COLLECTION articles FOR title TYPE text
+CREATE INDEX ON COLLECTION articles FOR title TYPE text WITH {tokenizer: 'word', min_token_len: 2, max_token_len: 20, lowercase: true, phrase_matching: true}
 CREATE INDEX ON COLLECTION articles FOR meta.author TYPE keyword
 ```
 
+**Advanced options currently supported:**
+
+- `keyword` / `uuid`
+  - `is_tenant: true|false`
+  - `on_disk: true|false`
+  - `enable_hnsw: true|false`
+- `text`
+  - `tokenizer: 'prefix'|'whitespace'|'word'|'multilingual'`
+  - `min_token_len: `
+  - `max_token_len: `
+  - `lowercase: true|false`
+  - `ascii_folding: true|false`
+  - `phrase_matching: true|false`
+  - `stopwords: 'english'` or `stopwords: ['a', 'the']`
+  - `on_disk: true|false`
+  - `enable_hnsw: true|false`
+
 **Rules:**
 - The collection must already exist. Raises an error otherwise.
 - Indexes are idempotent — creating the same index twice succeeds silently.
+- Advanced `WITH { ... }` options are currently supported only for `keyword`, `uuid`, and `text`.
 
 ---
 
diff --git a/docs/programmatic.md b/docs/programmatic.md
index e44b0f7..ad85fa4 100644
--- a/docs/programmatic.md
+++ b/docs/programmatic.md
@@ -88,7 +88,7 @@ result = run_query(
 )
 print(result.data["topology"])  # "dense" or "hybrid"
 print(result.data["vectors"])  # {"": {...}} or {"dense": {...}, ...}
-print(result.data["payload_schema"])  # {"field": "keyword", ...} or None
+print(result.data["payload_schema"])  # {"field": {"type": "keyword", ...}, ...} or None
 ```
 
 ---
diff --git a/docs/reference.md b/docs/reference.md
index 135f7fa..7944386 100644
--- a/docs/reference.md
+++ b/docs/reference.md
@@ -192,5 +192,6 @@ Expected output: **500 tests passing**.
 | `Vector elements must be numeric; got invalid value: ...` | A non-numeric value (string or null) was present in the vector array for `UPDATE SET VECTOR` | Ensure all vector elements are floats: `UPDATE … [0.1, 0.2, …, 0.N]` |
 | `GROUP_SIZE must be a positive integer, got N` | `GROUP_SIZE 0` or a negative value was specified | Use a positive integer: `GROUP_SIZE 3` |
 | `Qdrant error during SCROLL: ...` | Qdrant rejected scroll request | Verify collection state, filter, and cursor (`AFTER`) value |
-| `Unknown index type '...'` | Invalid schema type in CREATE INDEX | Use one of: `keyword`, `integer`, `float`, `bool`, `text`, `geo`, `datetime` |
+| `Unknown index type '...'` | Invalid schema type in CREATE INDEX | Use one of: `keyword`, `integer`, `float`, `bool`, `text`, `geo`, `datetime`, `uuid` |
+| `Unknown CREATE INDEX option '...'` | Unsupported advanced option for the chosen payload index type | Check which `WITH { ... }` keys are supported for `keyword`, `uuid`, or `text` |
 | `Qdrant error during CREATE INDEX: ...` | Qdrant rejected the index creation | Check field name and collection state |
diff --git a/src/qql/ast_nodes.py b/src/qql/ast_nodes.py
index 9c92adf..3333230 100644
--- a/src/qql/ast_nodes.py
+++ b/src/qql/ast_nodes.py
@@ -172,6 +172,7 @@ class CreateCollectionStmt:
     hybrid: bool = False  # if True, create with dense + sparse named vectors
     model: str | None = None  # dense model; None → use config default
     quantization: QuantizationConfig | None = None  # optional QUANTIZE clause
+    payload_m: int | None = None  # optional HNSW { payload_m: N } clause
 
 
 @dataclass(frozen=True)
@@ -179,6 +180,7 @@ class CreateIndexStmt:
     collection: str
     field_name: str
     schema: str
+    options: dict[str, Any] | None = None
 
 
 @dataclass(frozen=True)
diff --git a/src/qql/cli.py b/src/qql/cli.py
index 457a497..78b5200 100644
--- a/src/qql/cli.py
+++ b/src/qql/cli.py
@@ -38,6 +38,7 @@
     Create a new collection. Add HYBRID for dense+sparse BM25 vectors.
     Optional: [yellow]USING MODEL[/yellow] ''
     Optional: [yellow]USING HYBRID[/yellow] [DENSE MODEL '']
+    Optional: [yellow]HNSW[/yellow] { payload_m: }
     Optional: [yellow]QUANTIZE SCALAR[/yellow] [QUANTILE <0.0–1.0>] [ALWAYS RAM]
     Optional: [yellow]QUANTIZE BINARY[/yellow] [ALWAYS RAM]
     Optional: [yellow]QUANTIZE PRODUCT[/yellow] [ALWAYS RAM] (4× compression)
@@ -46,6 +47,11 @@
   [yellow]DROP COLLECTION[/yellow]
     Delete a collection and all its points.
 
+  [yellow]CREATE INDEX ON COLLECTION[/yellow] [yellow]FOR[/yellow] [yellow]TYPE[/yellow]
+    Create a payload index for filtering or text search.
+    Optional: [yellow]WITH[/yellow] { is_tenant, on_disk, enable_hnsw } for keyword/uuid
+    Optional: [yellow]WITH[/yellow] { tokenizer, min_token_len, max_token_len, lowercase, ascii_folding, phrase_matching, stopwords, on_disk, enable_hnsw } for text
+
   [yellow]SHOW COLLECTIONS[/yellow]
     List all collections in the connected Qdrant instance.
 
@@ -420,8 +426,16 @@ def _format_collection_diagnostics(data: dict) -> str:
     schema = data["payload_schema"]
     if schema:
         lines.append(" Payload indexes:")
-        for field, dtype in schema.items():
-            lines.append(f" {field}: {dtype}")
+        for field, index_info in schema.items():
+            if isinstance(index_info, dict):
+                line = f" {field}: {index_info.get('type')}"
+                params = index_info.get("params")
+                if params:
+                    rendered = ", ".join(f"{k}={v}" for k, v in params.items())
+                    line += f" ({rendered})"
+                lines.append(line)
+            else:
+                lines.append(f" {field}: {index_info}")
     else:
         lines.append(" Payload indexes : none")
 
diff --git a/src/qql/executor.py b/src/qql/executor.py
index 65fdd71..450099b 100644
--- a/src/qql/executor.py
+++ b/src/qql/executor.py
@@ -18,8 +18,12 @@
     Fusion,
     FusionQuery,
     HasIdCondition,
+    HnswConfigDiff,
     IsEmptyCondition,
     IsNullCondition,
+    KeywordIndexParams,
+    KeywordIndexType,
+    Language,
     LookupLocation,
     MatchAny,
     MatchExcept,
@@ -51,6 +55,12 @@
     SearchParams,
     SparseVector,
     SparseVectorParams,
+    StopwordsSet,
+    TextIndexParams,
+    TextIndexType,
+    TokenizerType,
+    UuidIndexParams,
+    UuidIndexType,
     VectorParams,
 )
 
@@ -336,6 +346,12 @@ def _execute_create(self, node: CreateCollectionStmt) -> ExecutionResult:
             if node.quantization is not None
             else ""
         )
+        hnsw_config = (
+            HnswConfigDiff(payload_m=node.payload_m)
+            if node.payload_m is not None
+            else None
+        )
+        hnsw_label = f", payload_m={node.payload_m}" if node.payload_m is not None else ""
 
         # ── Hybrid collection: named dense + sparse vectors ────────────────
         if node.hybrid:
@@ -352,12 +368,14 @@ def _execute_create(self, node: CreateCollectionStmt) -> ExecutionResult:
             }
             if quant_config is not None:
                 create_kwargs["quantization_config"] = quant_config
+            if hnsw_config is not None:
+                create_kwargs["hnsw_config"] = hnsw_config
             self._create_collection_and_wait(**create_kwargs)
             return ExecutionResult(
                 success=True,
                 message=(
                     f"Collection '{node.collection}' created "
-                    f"(hybrid: {dims}-dim dense + BM25 sparse, cosine distance{quant_label})"
+                    f"(hybrid: {dims}-dim dense + BM25 sparse, cosine distance{quant_label}{hnsw_label})"
                 ),
             )
 
@@ -370,10 +388,12 @@ def _execute_create(self, node: CreateCollectionStmt) -> ExecutionResult:
         }
         if quant_config is not None:
             create_kwargs["quantization_config"] = quant_config
+        if hnsw_config is not None:
+            create_kwargs["hnsw_config"] = hnsw_config
         self._create_collection_and_wait(**create_kwargs)
         return ExecutionResult(
             success=True,
-            message=f"Collection '{node.collection}' created ({dims}-dimensional vectors, cosine distance{quant_label})",
+            message=f"Collection '{node.collection}' created ({dims}-dimensional vectors, cosine distance{quant_label}{hnsw_label})",
         )
 
     def _execute_create_index(self, node: CreateIndexStmt) -> ExecutionResult:
@@ -388,14 +408,16 @@ def _execute_create_index(self, node: CreateIndexStmt) -> ExecutionResult:
             "text": PayloadSchemaType.TEXT,
             "geo": PayloadSchemaType.GEO,
             "datetime": PayloadSchemaType.DATETIME,
+            "uuid": PayloadSchemaType.UUID,
         }
         try:
-            field_schema = schema_map[node.schema]
+            schema_map[node.schema]
         except KeyError as e:
             raise QQLRuntimeError(
                 "Unknown index type '"
-                f"{node.schema}'. Expected one of: keyword, integer, float, bool, text, geo, datetime"
+                f"{node.schema}'. Expected one of: keyword, integer, float, bool, text, geo, datetime, uuid"
             ) from e
+        field_schema = self._build_payload_index_schema(node)
 
         try:
             self._client.create_payload_index(
@@ -406,10 +428,11 @@ def _execute_create_index(self, node: CreateIndexStmt) -> ExecutionResult:
         except UnexpectedResponse as e:
             raise QQLRuntimeError(f"Qdrant error during CREATE INDEX: {e}") from e
 
+        option_label = f" with options {node.options}" if node.options else ""
         return ExecutionResult(
             success=True,
             message=(
-                f"Created index on '{node.collection}.{node.field_name}' as '{node.schema}'"
+                f"Created index on '{node.collection}.{node.field_name}' as '{node.schema}'{option_label}"
             ),
         )
 
@@ -503,7 +526,7 @@ def _execute_show_collection(self, node: ShowCollectionStmt) -> ExecutionResult:
         # ── Payload schema / indexes ───────────────────────────────────────
         payload_indexes = {}
         for field_name, idx_info in (info.payload_schema or {}).items():
-            payload_indexes[field_name] = str(idx_info.data_type)
+            payload_indexes[field_name] = self._serialize_payload_index_info(idx_info)
 
         # ── Sharding / replication ─────────────────────────────────────────
         sharding = {
@@ -829,6 +852,193 @@ def _execute_recommend(self, node: RecommendStmt) -> ExecutionResult:
             data=results,
         )
 
+    def _build_payload_index_schema(self, node: CreateIndexStmt) -> Any:
+        options = node.options or {}
+        if node.schema == "keyword":
+            self._validate_index_option_keys(
+                node.schema,
+                options,
+                {"is_tenant", "on_disk", "enable_hnsw"},
+            )
+            if not options:
+                return PayloadSchemaType.KEYWORD
+            return KeywordIndexParams(
+                type=KeywordIndexType.KEYWORD,
+                is_tenant=self._index_bool_option(options, "is_tenant"),
+                on_disk=self._index_bool_option(options, "on_disk"),
+                enable_hnsw=self._index_bool_option(options, "enable_hnsw"),
+            )
+
+        if node.schema == "uuid":
+            self._validate_index_option_keys(
+                node.schema,
+                options,
+                {"is_tenant", "on_disk", "enable_hnsw"},
+            )
+            if not options:
+                return PayloadSchemaType.UUID
+            return UuidIndexParams(
+                type=UuidIndexType.UUID,
+                is_tenant=self._index_bool_option(options, "is_tenant"),
+                on_disk=self._index_bool_option(options, "on_disk"),
+                enable_hnsw=self._index_bool_option(options, "enable_hnsw"),
+            )
+
+        if node.schema == "text":
+            self._validate_index_option_keys(
+                node.schema,
+                options,
+                {
+                    "tokenizer",
+                    "min_token_len",
+                    "max_token_len",
+                    "lowercase",
+                    "ascii_folding",
+                    "phrase_matching",
+                    "stopwords",
+                    "on_disk",
+                    "enable_hnsw",
+                },
+            )
+            if not options:
+                return PayloadSchemaType.TEXT
+            min_token_len = self._index_int_option(options, "min_token_len")
+            max_token_len = self._index_int_option(options, "max_token_len")
+            if (
+                min_token_len is not None
+                and max_token_len is not None
+                and min_token_len > max_token_len
+            ):
+                raise QQLRuntimeError(
+                    "CREATE INDEX text option min_token_len cannot be greater than max_token_len"
+                )
+            return TextIndexParams(
+                type=TextIndexType.TEXT,
+                tokenizer=self._text_tokenizer_option(options),
+                min_token_len=min_token_len,
+                max_token_len=max_token_len,
+                lowercase=self._index_bool_option(options, "lowercase"),
+                ascii_folding=self._index_bool_option(options, "ascii_folding"),
+                phrase_matching=self._index_bool_option(options, "phrase_matching"),
+                stopwords=self._text_stopwords_option(options),
+                on_disk=self._index_bool_option(options, "on_disk"),
+                enable_hnsw=self._index_bool_option(options, "enable_hnsw"),
+            )
+
+        if options:
+            raise QQLRuntimeError(
+                f"CREATE INDEX type '{node.schema}' does not support advanced options yet"
+            )
+
+        schema_map = {
+            "keyword": PayloadSchemaType.KEYWORD,
+            "integer": PayloadSchemaType.INTEGER,
+            "float": PayloadSchemaType.FLOAT,
+            "bool": PayloadSchemaType.BOOL,
+            "text": PayloadSchemaType.TEXT,
+            "geo": PayloadSchemaType.GEO,
+            "datetime": PayloadSchemaType.DATETIME,
+            "uuid": PayloadSchemaType.UUID,
+        }
+        return schema_map[node.schema]
+
+    def _validate_index_option_keys(
+        self,
+        schema: str,
+        options: dict[str, Any],
+        allowed: set[str],
+    ) -> None:
+        unknown_keys = set(options) - allowed
+        if unknown_keys:
+            allowed_list = ", ".join(sorted(allowed))
+            raise QQLRuntimeError(
+                f"Unknown CREATE INDEX option '{sorted(unknown_keys)[0]}' for type '{schema}'. "
+                f"Expected one of: {allowed_list}"
+            )
+
+    def _index_bool_option(self, options: dict[str, Any], key: str) -> bool | None:
+        value = options.get(key)
+        if value is None:
+            return None
+        if not isinstance(value, bool):
+            raise QQLRuntimeError(f"CREATE INDEX option '{key}' must be a boolean")
+        return value
+
+    def _index_int_option(self, options: dict[str, Any], key: str) -> int | None:
+        value = options.get(key)
+        if value is None:
+            return None
+        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
+            raise QQLRuntimeError(
+                f"CREATE INDEX option '{key}' must be a positive integer"
+            )
+        return value
+
+    def _text_tokenizer_option(self, options: dict[str, Any]) -> TokenizerType | None:
+        value = options.get("tokenizer")
+        if value is None:
+            return None
+        if not isinstance(value, str):
+            raise QQLRuntimeError("CREATE INDEX option 'tokenizer' must be a string")
+        tokenizer_map = {
+            "prefix": TokenizerType.PREFIX,
+            "whitespace": TokenizerType.WHITESPACE,
+            "word": TokenizerType.WORD,
+            "multilingual": TokenizerType.MULTILINGUAL,
+        }
+        try:
+            return tokenizer_map[value.lower()]
+        except KeyError as e:
+            raise QQLRuntimeError(
+                "CREATE INDEX option 'tokenizer' must be one of: "
+                "prefix, whitespace, word, multilingual"
+            ) from e
+
+    def _text_stopwords_option(
+        self, options: dict[str, Any]
+    ) -> Language | StopwordsSet | None:
+        value = options.get("stopwords")
+        if value is None:
+            return None
+        if isinstance(value, str):
+            try:
+                return Language(value.lower())
+            except ValueError as e:
+                raise QQLRuntimeError(
+                    "CREATE INDEX option 'stopwords' must be a known language name or a list of strings"
+                ) from e
+        if isinstance(value, list) and all(isinstance(item, str) for item in value):
+            return StopwordsSet(custom=value)
+        raise QQLRuntimeError(
+            "CREATE INDEX option 'stopwords' must be a string language name or a list of strings"
+        )
+
+    def _serialize_payload_index_info(self, idx_info: Any) -> dict[str, Any]:
+        params = idx_info.params
+        data = {"type": str(idx_info.data_type)}
+        if params is None or not hasattr(params, "model_dump"):
+            return data
+        details: dict[str, Any] = {}
+        for key, value in params.model_dump(exclude_none=True).items():
+            if key == "type":
+                continue
+            details[key] = self._serialize_payload_index_value(value)
+        if details:
+            data["params"] = details
+        return data
+
+    def _serialize_payload_index_value(self, value: Any) -> Any:
+        if hasattr(value, "value"):
+            return value.value
+        if isinstance(value, dict):
+            return {
+                key: self._serialize_payload_index_value(item)
+                for key, item in value.items()
+            }
+        if isinstance(value, list):
+            return [self._serialize_payload_index_value(item) for item in value]
+        return value
+
     def _build_search_params(self, with_clause: SearchWith | None) -> SearchParams | None:
         if with_clause is None:
             return None
diff --git a/src/qql/lexer.py b/src/qql/lexer.py
index 3e1b277..72b09c0 100644
--- a/src/qql/lexer.py
+++ b/src/qql/lexer.py
@@ -30,6 +30,7 @@ class TokenKind(Enum):
     RAM = auto()
     TURBO = auto()
     BITS = auto()
+    HNSW = auto()
     CREATE = auto()
     INDEX = auto()
     ON = auto()
@@ -129,6 +130,7 @@ class TokenKind(Enum):
     "RAM": TokenKind.RAM,
     "TURBO": TokenKind.TURBO,
     "BITS": TokenKind.BITS,
+    "HNSW": TokenKind.HNSW,
     "CREATE": TokenKind.CREATE,
     "INDEX": TokenKind.INDEX,
     "ON": TokenKind.ON,
diff --git a/src/qql/parser.py b/src/qql/parser.py
index 2beed4b..0f8851b 100644
--- a/src/qql/parser.py
+++ b/src/qql/parser.py
@@ -23,7 +23,6 @@
     NotExpr,
     NotInExpr,
     OrExpr,
-    QuantizationSearchWith,
     QuantizationConfig,
     QuantizationType,
     QuantizationSearchWith,
@@ -194,15 +193,36 @@ def _parse_create(self) -> CreateCollectionStmt | CreateIndexStmt:
 
             # ── Optional QUANTIZE clause ──────────────────────────────────
             quantization: QuantizationConfig | None = None
-            if self._peek().kind == TokenKind.QUANTIZE:
-                self._advance()  # consume QUANTIZE
-                quantization = self._parse_quantize_clause()
+            payload_m: int | None = None
+            seen_quantize = False
+            seen_hnsw = False
+            while self._peek().kind in (TokenKind.QUANTIZE, TokenKind.HNSW):
+                if self._peek().kind == TokenKind.QUANTIZE:
+                    if seen_quantize:
+                        raise QQLSyntaxError(
+                            "QUANTIZE clause may only appear once",
+                            self._peek().pos,
+                        )
+                    self._advance()  # consume QUANTIZE
+                    quantization = self._parse_quantize_clause()
+                    seen_quantize = True
+                    continue
+
+                if seen_hnsw:
+                    raise QQLSyntaxError(
+                        "HNSW clause may only appear once",
+                        self._peek().pos,
+                    )
+                self._advance()  # consume HNSW
+                payload_m = self._parse_collection_hnsw_clause()
+                seen_hnsw = True
 
             return CreateCollectionStmt(
                 collection=collection,
                 hybrid=hybrid,
                 model=model,
                 quantization=quantization,
+                payload_m=payload_m,
             )
 
         self._expect(TokenKind.INDEX)
@@ -213,7 +233,32 @@ def _parse_create(self) -> CreateCollectionStmt | CreateIndexStmt:
         field_name = self._parse_field_path()
         self._expect(TokenKind.TYPE)
         schema = self._expect(TokenKind.IDENTIFIER).value.lower()
-        return CreateIndexStmt(collection=collection, field_name=field_name, schema=schema)
+        options: dict[str, Any] | None = None
+        if self._peek().kind == TokenKind.WITH:
+            self._advance()
+            options = self._parse_dict()
+        return CreateIndexStmt(
+            collection=collection,
+            field_name=field_name,
+            schema=schema,
+            options=options,
+        )
+
+    def _parse_collection_hnsw_clause(self) -> int:
+        config = self._parse_dict()
+        unknown_keys = set(config) - {"payload_m"}
+        if unknown_keys:
+            raise QQLSyntaxError(
+                "Unknown HNSW parameter "
+                f"'{sorted(unknown_keys)[0]}'. Expected: payload_m",
+                0,
+            )
+        if "payload_m" not in config:
+            raise QQLSyntaxError("HNSW clause requires payload_m", 0)
+        payload_m = config["payload_m"]
+        if not isinstance(payload_m, int) or isinstance(payload_m, bool) or payload_m <= 0:
+            raise QQLSyntaxError("payload_m must be a positive integer", 0)
+        return payload_m
 
     def _parse_quantize_clause(self) -> QuantizationConfig:
         """Parse: (SCALAR | BINARY | PRODUCT) [QUANTILE ] [ALWAYS RAM]
diff --git a/tests/test_executor.py b/tests/test_executor.py
index c35016d..543c45f 100644
--- a/tests/test_executor.py
+++ b/tests/test_executor.py
@@ -244,6 +244,15 @@ def test_create_new_collection(self, executor, mock_client):
         mock_client.create_collection.assert_called_once()
         assert result.success is True
 
+    def test_create_collection_passes_payload_m(self, executor, mock_client):
+        from qdrant_client.models import HnswConfigDiff
+
+        node = CreateCollectionStmt(collection="new_col", payload_m=24)
+        executor.execute(node)
+        kw = mock_client.create_collection.call_args.kwargs
+        assert isinstance(kw["hnsw_config"], HnswConfigDiff)
+        assert kw["hnsw_config"].payload_m == 24
+
     def test_create_existing_collection_is_noop(self, executor, mock_client):
         mock_client.collection_exists.return_value = True
         node = CreateCollectionStmt(collection="existing")
@@ -261,6 +270,74 @@ def test_create_index_calls_qdrant(self, executor, mock_client):
         mock_client.create_payload_index.assert_called_once()
         assert result.success is True
 
+    def test_create_index_supports_keyword_options(self, executor, mock_client):
+        from qdrant_client.models import KeywordIndexParams
+
+        mock_client.collection_exists.return_value = True
+        node = CreateIndexStmt(
+            collection="articles",
+            field_name="tenant_id",
+            schema="keyword",
+            options={"is_tenant": True, "on_disk": True, "enable_hnsw": False},
+        )
+        executor.execute(node)
+        field_schema = mock_client.create_payload_index.call_args.kwargs["field_schema"]
+        assert isinstance(field_schema, KeywordIndexParams)
+        assert field_schema.is_tenant is True
+        assert field_schema.on_disk is True
+        assert field_schema.enable_hnsw is False
+
+    def test_create_index_supports_text_options(self, executor, mock_client):
+        from qdrant_client.models import TextIndexParams, TokenizerType
+
+        mock_client.collection_exists.return_value = True
+        node = CreateIndexStmt(
+            collection="articles",
+            field_name="title",
+            schema="text",
+            options={
+                "tokenizer": "word",
+                "min_token_len": 2,
+                "max_token_len": 20,
+                "lowercase": True,
+                "phrase_matching": True,
+            },
+        )
+        executor.execute(node)
+        field_schema = mock_client.create_payload_index.call_args.kwargs["field_schema"]
+        assert isinstance(field_schema, TextIndexParams)
+        assert field_schema.tokenizer == TokenizerType.WORD
+        assert field_schema.min_token_len == 2
+        assert field_schema.max_token_len == 20
+        assert field_schema.lowercase is True
+        assert field_schema.phrase_matching is True
+
+    def test_create_index_supports_uuid_options(self, executor, mock_client):
+        from qdrant_client.models import UuidIndexParams
+
+        mock_client.collection_exists.return_value = True
+        node = CreateIndexStmt(
+            collection="articles",
+            field_name="doc_id",
+            schema="uuid",
+            options={"on_disk": True},
+        )
+        executor.execute(node)
+        field_schema = mock_client.create_payload_index.call_args.kwargs["field_schema"]
+        assert isinstance(field_schema, UuidIndexParams)
+        assert field_schema.on_disk is True
+
+    def test_create_index_rejects_unknown_option(self, executor, mock_client):
+        mock_client.collection_exists.return_value = True
+        node = CreateIndexStmt(
+            collection="articles",
+            field_name="tenant_id",
+            schema="keyword",
+            options={"tokenizer": "word"},
+        )
+        with pytest.raises(QQLRuntimeError, match="Unknown CREATE INDEX option"):
+            executor.execute(node)
+
     def test_create_index_nonexistent_collection_raises(self, executor, mock_client):
         mock_client.collection_exists.return_value = False
         node = CreateIndexStmt(collection="ghost", field_name="category", schema="keyword")
@@ -498,6 +575,8 @@ def test_show_collection_with_payload_schema(self, executor, mock_client, mocker
         from qdrant_client.models import (
             CollectionStatus,
             Distance,
+            KeywordIndexParams,
+            KeywordIndexType,
             PayloadSchemaType,
             VectorParams,
         )
@@ -506,6 +585,11 @@ def test_show_collection_with_payload_schema(self, executor, mock_client, mocker
 
         idx_info = mocker.MagicMock()
         idx_info.data_type = PayloadSchemaType.KEYWORD
+        idx_info.params = KeywordIndexParams(
+            type=KeywordIndexType.KEYWORD,
+            is_tenant=True,
+            on_disk=True,
+        )
 
         mock_info = mocker.MagicMock()
         mock_info.status = CollectionStatus.GREEN
@@ -532,7 +616,12 @@ def test_show_collection_with_payload_schema(self, executor, mock_client, mocker
         result = executor.execute(node)
 
         assert result.success is True
-        assert result.data["payload_schema"] == {"category": "keyword"}
+        assert result.data["payload_schema"] == {
+            "category": {
+                "type": "keyword",
+                "params": {"is_tenant": True, "on_disk": True},
+            }
+        }
 
     def test_show_collection_handles_missing_payload_schema(self, executor, mock_client, mocker):
         from qdrant_client.models import (