From ca963550ece7693f648bab8e7ccc6eafe4716617 Mon Sep 17 00:00:00 2001 From: "David W. Dougherty" Date: Wed, 20 Aug 2025 14:18:27 -0700 Subject: [PATCH 1/2] DEV: enhance TAG docs per GH issues --- .../advanced-concepts/tags.md | 190 +++++++++++++++--- .../ai/search-and-query/indexing/_index.md | 100 ++++++++- 2 files changed, 259 insertions(+), 31 deletions(-) diff --git a/content/develop/ai/search-and-query/advanced-concepts/tags.md b/content/develop/ai/search-and-query/advanced-concepts/tags.md index 8e36a2626..6915a4274 100644 --- a/content/develop/ai/search-and-query/advanced-concepts/tags.md +++ b/content/develop/ai/search-and-query/advanced-concepts/tags.md @@ -20,7 +20,16 @@ weight: 6 Tag fields provide exact match search capabilities with high performance and memory efficiency. Use tag fields when you need to filter documents by specific values without the complexity of full-text search tokenization. -Tag fields interpret text as a simple list of *tags* delimited by a [separator](#separator-options) character (comma "`,`" by default). This approach enables simpler [tokenization]({{< relref "/develop/ai/search-and-query/advanced-concepts/escaping/#tokenization-rules-for-tag-fields" >}}) and encoding, making tag indexes much more efficient than full-text indexes. Note: even though tag and text fields both use text, they are two separate field types and so you don't query them the same way. +Tag fields interpret text as a simple list of *tags* delimited by a [separator](#separator-options) character. This approach enables simpler [tokenization]({{< relref "/develop/ai/search-and-query/advanced-concepts/escaping/#tokenization-rules-for-tag-fields" >}}) and encoding, making tag indexes much more efficient than full-text indexes. Note: even though tag and text fields both use text, they are two separate field types and so you don't query them the same way. 
+ +{{% alert title="Important: Different defaults for HASH vs JSON" color="warning" %}} +- The default separator for hash documents is a comma (`,`). +- There is no default separator for JSON documents. You must explicitly specify one if needed. + +The text `"foo,bar"` is therefore indexed differently: +- For hash documents, two tags are created: `"foo"` and `"bar"`. +- For JSON documents, one tag is created: `"foo,bar"` (unless you add `SEPARATOR ","`). +{{% /alert %}} ## Tag fields vs text fields @@ -69,9 +78,35 @@ FT.CREATE ... SCHEMA ... {field_name} TAG [SEPARATOR {sep}] [CASESENSITIVE] ### Separator options -- **Hash documents**: Default separator is comma (`,`). You can use any printable ASCII character -- **JSON documents**: No default separator - you must specify one explicitly if needed -- **Custom separators**: Use semicolon (`;`), pipe (`|`), or other characters as needed +The separator behavior differs significantly between hash and JSON documents: + +**Hash documents** + +- The default separator is the comma (`,`). +- Strings are automatically split at commas. For example, + the string `"red,blue,green"` becomes three tags: `"red"`, `"blue"`, and `"green"`. +- You can use any printable ASCII character as a custom separator. + +**JSON documents** + +- There is no default separator; it's effectively `null`. +- The entire string is treated as a single tag unless you specify a separator with the `SEPARATOR` option. For example, + the string `"red,blue,green"` becomes one tag: `"red,blue,green"`. +- Add `SEPARATOR ","` to your schema to allow splitting. +- Prefer JSON arrays to comma-separated strings. + +**Why the difference?** + +JSON has native array support, so the preferred approach is an array, indexed with `$.colors[*] AS colors TAG`: + +```json +{"colors": ["red", "blue", "green"]} +``` +Rather than a delimited string, which requires `SEPARATOR ","`: + +```json +{"colors": "red,blue,green"} +``` ### Case sensitivity @@ -80,33 +115,76 @@ FT.CREATE ... SCHEMA ... 
{field_name} TAG [SEPARATOR {sep}] [CASESENSITIVE] ### Examples -**Basic tag field with JSON:** -```sql -JSON.SET key:1 $ '{"colors": "red, orange, yellow"}' -FT.CREATE idx ON JSON PREFIX 1 key: SCHEMA $.colors AS colors TAG SEPARATOR "," - -> FT.SEARCH idx '@colors:{orange}' -1) "1" -2) "key:1" -3) 1) "$" - 2) "{\"colors\":\"red, orange, yellow\"}" -``` +**Hash examples** -**Case-sensitive tags with Hash:** -```sql -HSET product:1 categories "Electronics,Gaming,PC" -FT.CREATE products ON HASH PREFIX 1 product: SCHEMA categories TAG CASESENSITIVE +1. Basic hash tag field (automatic comma splitting): -> FT.SEARCH products '@categories:{PC}' -1) "1" -2) "product:1" -``` + ```sql + HSET product:1 categories "Electronics,Gaming,PC" + FT.CREATE products ON HASH PREFIX 1 product: SCHEMA categories TAG -**Custom separator:** -```sql -HSET book:1 genres "Fiction;Mystery;Thriller" -FT.CREATE books ON HASH PREFIX 1 book: SCHEMA genres TAG SEPARATOR ";" -``` + > FT.SEARCH products '@categories:{Gaming}' + 1) "1" + 2) "product:1" + ``` + +1. Hash with custom separator: + + ```sql + HSET book:1 genres "Fiction;Mystery;Thriller" + FT.CREATE books ON HASH PREFIX 1 book: SCHEMA genres TAG SEPARATOR ";" + ``` + +1. Case-sensitive hash tags: + + ```sql + HSET product:1 categories "Electronics,Gaming,PC" + FT.CREATE products ON HASH PREFIX 1 product: SCHEMA categories TAG CASESENSITIVE + + > FT.SEARCH products '@categories:{PC}' # Case matters + 1) "1" + 2) "product:1" + ``` + +**JSON examples** + +1. JSON with string and explicit separator (not recommended): + + ```sql + JSON.SET key:1 $ '{"colors": "red, orange, yellow"}' + FT.CREATE idx ON JSON PREFIX 1 key: SCHEMA $.colors AS colors TAG SEPARATOR "," + + > FT.SEARCH idx '@colors:{orange}' + 1) "1" + 2) "key:1" + 3) 1) "$" + 2) "{\"colors\":\"red, orange, yellow\"}" + ``` + +1. 
JSON with array of strings (recommended approach): + + ```sql + JSON.SET key:1 $ '{"colors": ["red", "orange", "yellow"]}' + FT.CREATE idx ON JSON PREFIX 1 key: SCHEMA $.colors[*] AS colors TAG + + > FT.SEARCH idx '@colors:{orange}' + 1) "1" + 2) "key:1" + 3) 1) "$" + 2) "{\"colors\":[\"red\",\"orange\",\"yellow\"]}" + ``` + +1. JSON without separator (single tag): + + ```sql + JSON.SET key:1 $ '{"category": "Electronics,Gaming"}' + FT.CREATE idx ON JSON PREFIX 1 key: SCHEMA $.category AS category TAG + # No SEPARATOR specified - entire string becomes one tag + + > FT.SEARCH idx '@category:{Electronics,Gaming}' # Must match exactly + 1) "1" + 2) "key:1" + ``` ## Query tag fields @@ -271,6 +349,62 @@ FT.SEARCH products "@tags:{ Top\\ Rated\\ Product }" See [Query syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/query_syntax#tag-filters" >}}) for complete escaping rules. +## Performance and architecture considerations + +### Multiple TAG fields versus a single TAG field + +You can structure your data in two ways: + +1. Multiple single-value TAG fields + + ```sql + FT.CREATE products ON JSON PREFIX 1 product: SCHEMA + $.color AS color TAG + $.brand AS brand TAG + $.type AS type TAG + + JSON.SET product:1 $ '{"color": "blue", "brand": "ASUS", "type": "laptop"}' + + # Query specific fields + FT.SEARCH products '@color:{blue} @brand:{ASUS}' + ``` + +1. Single multi-value TAG field + + ```sql + FT.CREATE products ON JSON PREFIX 1 product: SCHEMA + $.tags[*] AS tags TAG + + JSON.SET product:1 $ '{"tags": ["color:blue", "brand:ASUS", "type:laptop"]}' + + # Query with prefixed values + FT.SEARCH products '@tags:{color:blue} @tags:{brand:ASUS}' + ``` + +### Performance comparison + +Both approaches have similar performance characteristics: + +- Memory usage is comparable: TAG indexes are highly compressed regardless of structure. +- Query speed is similar: both use the same underlying inverted index structure. 
+- Index efficiency; TAG fields store only document IDs (1-2 bytes per entry). + +### Choose TAG fields based on your use case + +Use multiple TAG fields when: + +- You need field-specific queries (`@color:{blue}` vs `@brand:{ASUS}`). +- Your schema is stable and well-defined. +- You want cleaner, more readable queries. +- You need different configurations per field (for example, case-sensitive versus case-insensitive). + +Use a single TAG field when: + +- You have dynamic or unknown tag categories. +- You want maximum flexibility for adding new tag types. +- Your application manages tag prefixing/namespacing. +- You have many sparse categorical fields. + ## An e-commerce use case ```sql diff --git a/content/develop/ai/search-and-query/indexing/_index.md b/content/develop/ai/search-and-query/indexing/_index.md index 1162e9a79..59d72cce7 100644 --- a/content/develop/ai/search-and-query/indexing/_index.md +++ b/content/develop/ai/search-and-query/indexing/_index.md @@ -167,14 +167,71 @@ For more information about search queries, see [Search query syntax]({{< relref [`FT.SEARCH`]({{< relref "commands/ft.search/" >}}) queries require `attribute` modifiers. Don't use JSONPath expressions in queries because the query parser doesn't fully support them. {{% /alert %}} +## Understanding TAG field behavior: hash versus JSON + +TAG fields behave differently depending on whether you're indexing hash or JSON documents. This difference is a common source of confusion. 
+ +### Hash documents + +```sql +# HASH: Comma is the default separator +HSET product:1 category "Electronics,Gaming,PC" +FT.CREATE products ON HASH PREFIX 1 product: SCHEMA category TAG + +# Result: Creates 3 separate tags: "Electronics", "Gaming", "PC" +FT.SEARCH products '@category:{Gaming}' # ✅ Finds the document +``` + +### JSON documents + +```sql +# JSON: No default separator - the entire string becomes one tag +JSON.SET product:1 $ '{"category": "Electronics,Gaming,PC"}' +FT.CREATE products ON JSON PREFIX 1 product: SCHEMA $.category AS category TAG + +# Result: Creates 1 tag: "Electronics,Gaming,PC" +FT.SEARCH products '@category:{Gaming}' # ❌ Does NOT find the document +FT.SEARCH products '@category:{Electronics,Gaming,PC}' # ✅ Finds the document +``` + +### Making JSON documents behave like hash documents + +To get hash-like behavior in JSON, explicitly add `SEPARATOR ","`: + +```sql +JSON.SET product:1 $ '{"category": "Electronics,Gaming,PC"}' +FT.CREATE products ON JSON PREFIX 1 product: SCHEMA $.category AS category TAG SEPARATOR "," + +# Result: Creates 3 separate tags: "Electronics", "Gaming", "PC" +FT.SEARCH products '@category:{Gaming}' # ✅ Now finds the document +``` + +### Recommended approach for JSON + +Instead of comma-separated strings, use JSON arrays: + +```sql +JSON.SET product:1 $ '{"category": ["Electronics", "Gaming", "PC"]}' +FT.CREATE products ON JSON PREFIX 1 product: SCHEMA $.category[*] AS category TAG + +# Result: Creates 3 separate tags: "Electronics", "Gaming", "PC" +FT.SEARCH products '@category:{Gaming}' # ✅ Finds the document +``` + ## Index JSON arrays as TAG -The preferred method for indexing a JSON field with multivalued terms is using JSON arrays. Each value of the array is indexed, and those values must be scalars. If you want to index string or boolean values as TAGs within a JSON array, use the [JSONPath]({{< relref "/develop/data-types/json/path" >}}) wildcard operator. 
+For JSON documents, you have two approaches to create TAG fields with multiple values: -To index an item's list of available colors, specify the JSONPath `$.colors.*` in the `SCHEMA` definition during index creation: +### Approach 1: JSON arrays (recommended) + +The preferred method for indexing multiple tag values is using JSON arrays. Each array element becomes a separate tag value. Use the [JSONPath]({{< relref "/develop/data-types/json/path" >}}) wildcard operator `[*]` to index array elements. ```sql -127.0.0.1:6379> FT.CREATE itemIdx2 ON JSON PREFIX 1 item: SCHEMA $.colors.* AS colors TAG $.name AS name TEXT $.description as description TEXT +# Create index with array indexing +127.0.0.1:6379> FT.CREATE itemIdx2 ON JSON PREFIX 1 item: SCHEMA $.colors[*] AS colors TAG $.name AS name TEXT $.description as description TEXT + +# The JSON data uses arrays +# Each array element ("black", "silver") becomes a separate tag ``` Now you can search for silver headphones: @@ -187,6 +244,43 @@ Now you can search for silver headphones: 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"]}" ``` +### Approach 2: strings with explicit separators + +You can also use comma-separated strings, but you must explicitly specify the `SEPARATOR`: + +```sql +# JSON with comma-separated string +JSON.SET item:1 $ '{"colors": "black,silver,gold"}' + +# Index with explicit separator +FT.CREATE itemIdx3 ON JSON PREFIX 1 item: SCHEMA $.colors AS colors TAG SEPARATOR "," + +# Now you can search individual colors +FT.SEARCH itemIdx3 "@colors:{silver}" +``` + +{{% alert title="Important: JSON vs HASH behavior" color="warning" %}} +- **JSON without SEPARATOR**: `"black,silver"` becomes one tag: `"black,silver"`. 
+- **JSON with SEPARATOR ","**: `"black,silver"` becomes two tags: `"black"` and `"silver"`. +- **Hash (default)**: `"black,silver"` becomes two tags: `"black"` and `"silver"`. + +For JSON, always specify `SEPARATOR ","` if you want to split comma-separated strings, or use arrays instead. +{{% /alert %}} + +### Which approach to choose? + +Use JSON arrays when: + +- You control the data structure. +- You want clean, structured data. +- You need to store complex values (strings with spaces, punctuation). + +Use strings with separators when: + +- You're migrating from hashes to JSON. +- You receive data as delimited strings. +- You need compatibility with existing systems. + ## Index JSON arrays as TEXT Starting with RediSearch v2.6.0, full text search can be done on an array of strings or on a JSONPath leading to multiple strings. From ea8829595394a5d4601f042852d1754b2d8693e1 Mon Sep 17 00:00:00 2001 From: "David W. Dougherty" Date: Tue, 26 Aug 2025 06:35:43 -0700 Subject: [PATCH 2/2] Apply suggestion from doc review --- content/develop/ai/search-and-query/advanced-concepts/tags.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/develop/ai/search-and-query/advanced-concepts/tags.md b/content/develop/ai/search-and-query/advanced-concepts/tags.md index 6915a4274..ff1132684 100644 --- a/content/develop/ai/search-and-query/advanced-concepts/tags.md +++ b/content/develop/ai/search-and-query/advanced-concepts/tags.md @@ -386,7 +386,7 @@ You can structure your data in two ways: Both approaches have similar performance characteristics: - Memory usage is comparable: TAG indexes are highly compressed regardless of structure. -- Query speed is similar: both use the same underlying inverted index structure. +- Query speed is also comparable because both use the same underlying inverted index structure, though single-value tags may offer a slight edge. - Index efficiency; TAG fields store only document IDs (1-2 bytes per entry). 
### Choose TAG fields based on your use case
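
Reviewer sketch (not part of either patch): the hash-versus-JSON separator behavior that this series documents can be modelled in a few lines of Python. This is an illustrative model only, not RediSearch's implementation. The names `split_tags` and `DEFAULT_SEPARATOR` are hypothetical, and the whitespace trimming and case folding are assumptions inferred from the patch's own examples (`"red, orange, yellow"` matching `orange`) and from the `CASESENSITIVE` option.

```python
from typing import List, Optional

# Hypothetical defaults mirroring the patch text:
# hash documents split on ",", JSON documents do not split at all.
DEFAULT_SEPARATOR = {"hash": ",", "json": None}


def split_tags(value: str, doc_type: str, separator: Optional[str] = None,
               case_sensitive: bool = False) -> List[str]:
    """Model how one string field value turns into TAG values."""
    # An explicit SEPARATOR wins; otherwise fall back to the per-type default.
    sep = separator if separator is not None else DEFAULT_SEPARATOR[doc_type]
    # With no separator, the whole string is kept as a single tag.
    parts = value.split(sep) if sep else [value]
    # Assumption: whitespace around each tag is trimmed.
    tags = [part.strip() for part in parts]
    # Assumption: tags are lowercased unless CASESENSITIVE is set.
    if not case_sensitive:
        tags = [tag.lower() for tag in tags]
    # Drop empty entries such as those produced by a trailing separator.
    return [tag for tag in tags if tag]


print(split_tags("Electronics,Gaming,PC", "hash"))
print(split_tags("Electronics,Gaming,PC", "json"))
print(split_tags("Electronics,Gaming,PC", "json", separator=","))
```

Under these assumptions, the first and third calls yield three tags each, while the second keeps `electronics,gaming,pc` as one tag, which is the difference the new alert boxes warn about.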