redis-developer · rbs333 · Apr 6, 2026 · Mar 31, 2026
diff --git a/README.md b/README.md
@@ -154,9 +154,11 @@ The layered approach emerged from TDD — writing tests first revealed natural b
 - [x] Computed fields: `price * 0.9 AS discounted`
 - [x] Vector KNN search: `vector_distance(field, :param)`
 - [x] Hybrid search (filters + vector)
-- [x] Full-text search: `LIKE 'prefix%'` (prefix), `fulltext(field, 'terms')` function
+- [x] Full-text search: exact phrase, fuzzy, proximity, OR/union, LIKE patterns, BM25 scoring (see below)
 - [x] GEO field queries with full operator support (see below)
 - [x] Date functions: `YEAR()`, `MONTH()`, `DAY()`, `DATE_FORMAT()`, etc. (see below)
+- [x] `IS NULL` / `IS NOT NULL` via `ismissing()` (requires Redis 7.4+, see below)
+- [x] `exists()` function for field presence checks (see below)
 
 ## What's Not Implemented (Yet...)
 
@@ -166,6 +168,112 @@ The layered approach emerged from TDD — writing tests first revealed natural b
 - [ ] DISTINCT
 - [ ] Index creation from SQL (CREATE INDEX)
 
+### TEXT Search
+
+Full-text search on TEXT fields with multiple search modes:
+
+| Feature | SQL Syntax | RediSearch Output | Notes |
+|---------|-----------|-------------------|-------|
+| Exact phrase | `title = 'gaming laptop'` | `@title:"gaming laptop"` | Stopwords stripped |
+| Tokenized search | `fulltext(title, 'gaming laptop')` | `@title:(gaming laptop)` | Stopwords stripped |
+| Fuzzy LD=1 | `fuzzy(title, 'laptap')` | `@title:%laptap%` | |
+| Fuzzy LD=2 | `fuzzy(title, 'laptap', 2)` | `@title:%%laptap%%` | |
+| Fuzzy LD=3 | `fuzzy(title, 'laptap', 3)` | `@title:%%%laptap%%%` | |
+| OR / union | `fulltext(title, 'laptop OR tablet')` | `@title:(laptop\|tablet)` | |
+| Prefix | `title LIKE 'lap%'` | `@title:lap*` | |
+| Suffix | `title LIKE '%top'` | `@title:*top` | |
+| Contains | `title LIKE '%apt%'` | `@title:*apt*` | |
+| Proximity (slop) | `fulltext(title, 'gaming laptop', 2)` | `@title:(gaming laptop) => { $slop: 2; }` | |
+| Proximity + order | `fulltext(title, 'gaming laptop', 2, true)` | `@title:(gaming laptop) => { $slop: 2; $inorder: true; }` | |
+| Optional term | `fulltext(title, 'laptop ~gaming')` | `@title:(laptop ~gaming)` | |
+| BM25 score | `SELECT score() AS relevance FROM idx` | `FT.SEARCH ... WITHSCORES` | |
+| Negation | `NOT fulltext(title, 'refurbished')` | `-@title:refurbished` | |
+
+**Examples:**
+
+```sql
+-- Exact phrase match (stopwords like "of" are stripped automatically)
+SELECT * FROM products WHERE title = 'bank of america'
+-- Produces: @title:"bank america"
+
+-- Fuzzy search for typos (Levenshtein distance 2)
+SELECT * FROM products WHERE fuzzy(title, 'laptap', 2)
+
+-- OR search across terms
+SELECT * FROM products WHERE fulltext(title, 'laptop OR tablet OR phone')
+
+-- Proximity: terms within 3 words of each other, in order
+SELECT * FROM products WHERE fulltext(title, 'gaming laptop', 3, true)
+
+-- Suffix/contains pattern matching
+SELECT * FROM products WHERE title LIKE '%phone%'
+
+-- BM25 relevance scoring
+SELECT title, score() AS relevance FROM products WHERE fulltext(title, 'laptop')
+
+-- Multi-field search
+SELECT * FROM products WHERE fulltext(title, 'laptop') OR fulltext(description, 'laptop')
+```
+
+**Stopword handling:**
+
+Both `=` (exact phrase) and `fulltext()` (tokenized search) automatically strip [Redis default stopwords](https://redis.io/docs/latest/develop/ai/search-and-query/advanced-concepts/stopwords/) before sending queries to RediSearch. This is necessary because RediSearch does not index stopwords, so including them in queries causes syntax errors or failed matches. A `UserWarning` is emitted when stopwords are removed.
+
+For example, `WHERE title = 'bank of america'` produces `@title:"bank america"` because "of" is a default stopword and is never stored in the inverted index. The stripped phrase still matches correctly because the indexer assigns consecutive token positions after dropping stopwords.
+
+To include stopwords in your queries, create your index with `STOPWORDS 0`:
+
+```
+FT.CREATE myindex ON HASH PREFIX 1 doc: STOPWORDS 0 SCHEMA title TEXT
+```
+
+**Notes:**
+- `=` on TEXT fields performs **exact phrase** matching (double-quoted)
+- `fulltext()` performs **tokenized** AND search (parenthesized)
+- Both forms strip stopwords and emit a `UserWarning` when they do
+- `fuzzy()` and `fulltext()` only work on TEXT fields; using them on TAG or NUMERIC raises `ValueError`
+- OR must be **uppercase**: `'laptop OR tablet'` triggers union; lowercase `'laptop or tablet'` is treated as a regular three-word AND search
+- Special characters (`@`, `|`, `-`, `*`, `+`, etc.) in search terms are automatically escaped
+
+### IS NULL / IS NOT NULL (ismissing)
+
+Check for missing (absent) fields using standard SQL `IS NULL` / `IS NOT NULL` syntax. Requires **Redis 7.4+** (RediSearch 2.10+) with `INDEXMISSING` declared on the field.
+
+| SQL | RediSearch Output |
+|-----|-------------------|
+| `WHERE email IS NULL` | `ismissing(@email)` |
+| `WHERE email IS NOT NULL` | `-ismissing(@email)` |
+
+```sql
+-- Find users without an email
+SELECT * FROM users WHERE email IS NULL
+
+-- Find users with an email
+SELECT * FROM users WHERE email IS NOT NULL
+
+-- Combine with other filters
+SELECT * FROM users WHERE category = 'eng' AND email IS NULL
+```
+
+**Note:** The field must be declared with `INDEXMISSING` in the index schema. A warning is emitted at translation time as a reminder.
+
+### exists() — Field Presence Check
+
+Check whether a field has a value using `exists()` in SELECT or HAVING. This uses `FT.AGGREGATE` with `APPLY exists(@field)`.
+
+```sql
+-- Check if fields exist (returns 1 or 0)
+SELECT name, exists(email) AS has_email FROM users
+
+-- Filter to only rows where a field exists
+SELECT name FROM users HAVING exists(email) = 1
+
+-- Combine with other computed fields
+SELECT name, exists(email) AS has_email, exists(phone) AS has_phone FROM users
+```
+
+**Note:** `exists()` differs from `IS NOT NULL`: it runs through `FT.AGGREGATE` with `APPLY exists(@field)` and does not require `INDEXMISSING` on the field, but it returns `1`/`0` per row rather than filtering rows on its own (add `HAVING exists(field) = 1` to filter).
+
 ### DATE/DATETIME Handling
 
 Redis does not have a native DATE field type. Dates are stored as **NUMERIC fields** with Unix timestamps.

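The README hunk above describes two query-building rules: stopwords are stripped (with a `UserWarning`) before an exact-phrase clause is emitted, and `fuzzy()` wraps the term in one `%` pair per unit of Levenshtein distance. A minimal sketch of those two rules follows. The helper names (`strip_stopwords`, `exact_phrase`, `fuzzy`), the small stopword subset, and the 1 to 3 distance check are assumptions made for illustration; they are not the translator's actual code.

```python
import warnings

# Small illustrative subset of the Redis default stopword list.
# Assumption: the real translator uses the full default list.
STOPWORDS = frozenset({"a", "an", "and", "of", "the", "to"})


def strip_stopwords(text: str) -> str:
    """Drop stopwords from a whitespace-separated phrase; warn if any were removed."""
    tokens = text.split()
    kept = [t for t in tokens if t.lower() not in STOPWORDS]
    if len(kept) != len(tokens):
        warnings.warn(f"stopwords removed from {text!r}", UserWarning)
    return " ".join(kept)


def exact_phrase(field: str, text: str) -> str:
    """Build a double-quoted exact-phrase clause for a TEXT field."""
    return f'@{field}:"{strip_stopwords(text)}"'


def fuzzy(field: str, term: str, distance: int = 1) -> str:
    """Wrap the term in one '%' pair per unit of Levenshtein distance."""
    # Hypothetical guard: the README table only lists distances 1 through 3.
    if not 1 <= distance <= 3:
        raise ValueError("distance must be 1, 2, or 3")
    marks = "%" * distance
    return f"@{field}:{marks}{term}{marks}"
```

With these helpers, `exact_phrase("title", "bank of america")` yields the `@title:"bank america"` clause shown in the README, and `fuzzy("title", "laptap", 2)` yields `@title:%%laptap%%`.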
diff --git a/sql_redis/executor.py b/sql_redis/executor.py
@@ -103,7 +103,50 @@ class QueryResult:
     count: int
 
 
-class Executor:
+class _ScoreParseMixin:
+    """Shared helpers for score-related response parsing."""
+
+    @staticmethod
+    def _has_return_0(args: list[str]) -> bool:
+        """Return True when the args contain 'RETURN 0' (no document fields)."""
+        try:
+            idx = args.index("RETURN")
+            return args[idx + 1] == "0"
+        except (ValueError, IndexError):
+            return False
+
+    @staticmethod
+    def _resolve_score_alias(
+        score_alias: str | None,
+        args: list[str],
+        first_row_fields: set[str] | None = None,
+    ) -> str:
+        """Determine a stable score column name that won't collide with
+        document fields.  The alias is resolved once and reused for every
+        row so all rows share the same column name.
+
+        When a RETURN clause is present, the returned field names are used
+        for collision detection.  When RETURN is absent (SELECT *), the
+        caller should pass ``first_row_fields``.  Despite its name, this is
+        the union of field names across all result rows, so collisions are
+        detected even when documents have different field sets."""
+        alias = score_alias or "__score"
+        # Extract RETURN field names from args to detect collision
+        try:
+            idx = args.index("RETURN")
+            count = int(args[idx + 1])
+            return_fields = set(args[idx + 2 : idx + 2 + count])
+        except (ValueError, IndexError):
+            # Normalize bytes keys to str so collision detection works
+            # regardless of decode_responses setting.
+            raw = first_row_fields or set()
+            return_fields = {k.decode() if isinstance(k, bytes) else k for k in raw}
+        while alias in return_fields:
+            alias = f"__score_{alias}"
+        return alias
+
+
+class Executor(_ScoreParseMixin):
     """Executes SQL queries against Redis."""
 
     def __init__(self, client: redis.Redis, schema_registry: SchemaRegistry) -> None:
@@ -166,12 +209,55 @@ def execute(self, sql: str, *, params: dict | None = None) -> QueryResult:
         rows = []
 
         if translated.command == "FT.SEARCH":
-            # FT.SEARCH format: [count, key1, [fields1], key2, [fields2], ...]
-            # Skip document keys (odd indices), take field lists (even indices after count)
-            for i in range(2, len(raw_result), 2):
-                row_data = raw_result[i]
-                row = dict(zip(row_data[::2], row_data[1::2]))
-                rows.append(row)
+            # Use the explicit score_alias signal rather than scanning args
+            # for the literal token "WITHSCORES", which could false-positive
+            # if a returned field happened to be named "WITHSCORES".
+            with_scores = translated.score_alias is not None
+            # RETURN 0 suppresses document fields (like NOCONTENT);
+            # with WITHSCORES the reply is [count, id, score, id, score, ...]
+            no_content = self._has_return_0(translated.args)
+
+            # Pre-resolve score alias; may be deferred for SELECT *
+            score_alias: str | None = None
+
+            if with_scores and no_content:
+                # WITHSCORES + RETURN 0: [count, id1, score1, id2, score2, ...]
+                # Stride of 2: key, score (no field array)
+                score_alias = self._resolve_score_alias(
+                    translated.score_alias, translated.args
+                )
+                for i in range(1, len(raw_result) - 1, 2):
+                    score = raw_result[i + 1]
+                    row = {score_alias: score}
+                    rows.append(row)
+            elif with_scores:
+                # WITHSCORES format: [count, key1, score1, [fields1], key2, score2, [fields2], ...]
+                # Stride of 3: key, score, field_list
+                # First pass: collect all field names across all rows so the
+                # alias avoids collisions with any document field, not just
+                # the first row's fields.
+                all_field_names: set[str] = set()
+                parsed_rows: list[tuple[dict, Any]] = []
+                for i in range(1, len(raw_result) - 2, 3):
+                    score = raw_result[i + 1]
+                    row_data = raw_result[i + 2]
+                    row = dict(zip(row_data[::2], row_data[1::2]))
+                    all_field_names.update(row.keys())
+                    parsed_rows.append((row, score))
+                resolved_alias = self._resolve_score_alias(
+                    translated.score_alias,
+                    translated.args,
+                    first_row_fields=all_field_names,
+                )
+                for row, score in parsed_rows:
+                    row[resolved_alias] = score
+                    rows.append(row)
+            else:
+                # Standard format: [count, key1, [fields1], key2, [fields2], ...]
+                for i in range(2, len(raw_result), 2):
+                    row_data = raw_result[i]
+                    row = dict(zip(row_data[::2], row_data[1::2]))
+                    rows.append(row)
         else:
             # FT.AGGREGATE format: [count, [fields1], [fields2], ...]
             for row_data in raw_result[1:]:
@@ -181,7 +267,7 @@ def execute(self, sql: str, *, params: dict | None = None) -> QueryResult:
         return QueryResult(rows=rows, count=count)
 
 
-class AsyncExecutor:
+class AsyncExecutor(_ScoreParseMixin):
     """Async version of Executor for use with redis.asyncio clients."""
 
     def __init__(
@@ -258,11 +344,46 @@ async def execute(self, sql: str, *, params: dict | None = None) -> QueryResult:
         rows = []
 
         if translated.command == "FT.SEARCH":
-            # FT.SEARCH format: [count, key1, [fields1], key2, [fields2], ...]
-            for i in range(2, len(raw_result), 2):
-                row_data = raw_result[i]
-                row = dict(zip(row_data[::2], row_data[1::2]))
-                rows.append(row)
+            with_scores = translated.score_alias is not None
+            no_content = self._has_return_0(translated.args)
+
+            score_alias: str | None = None
+
+            if with_scores and no_content:
+                # WITHSCORES + RETURN 0: [count, id1, score1, id2, score2, ...]
+                score_alias = self._resolve_score_alias(
+                    translated.score_alias, translated.args
+                )
+                for i in range(1, len(raw_result) - 1, 2):
+                    score = raw_result[i + 1]
+                    row = {score_alias: score}
+                    rows.append(row)
+            elif with_scores:
+                # WITHSCORES format: [count, key1, score1, [fields1], ...]
+                # First pass: collect all field names across all rows so the
+                # alias avoids collisions with any document field.
+                all_field_names: set[str] = set()
+                parsed_rows: list[tuple[dict, Any]] = []
+                for i in range(1, len(raw_result) - 2, 3):
+                    score = raw_result[i + 1]
+                    row_data = raw_result[i + 2]
+                    row = dict(zip(row_data[::2], row_data[1::2]))
+                    all_field_names.update(row.keys())
+                    parsed_rows.append((row, score))
+                resolved_alias = self._resolve_score_alias(
+                    translated.score_alias,
+                    translated.args,
+                    first_row_fields=all_field_names,
+                )
+                for row, score in parsed_rows:
+                    row[resolved_alias] = score
+                    rows.append(row)
+            else:
+                # Standard format: [count, key1, [fields1], key2, [fields2], ...]
+                for i in range(2, len(raw_result), 2):
+                    row_data = raw_result[i]
+                    row = dict(zip(row_data[::2], row_data[1::2]))
+                    rows.append(row)
         else:
             # FT.AGGREGATE format: [count, [fields1], [fields2], ...]
             for row_data in raw_result[1:]:
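The three `FT.SEARCH` reply layouts handled in the hunks above (stride 2 with scores and `RETURN 0`, stride 3 with scores and fields, stride 2 without scores) can be exercised in isolation. The sketch below is a simplified standalone version, assuming flat list replies shaped as the diff comments describe. The function names and the sample replies are made up for illustration, and the alias helper takes an already-collected set of field names instead of parsing `RETURN` args.

```python
from typing import Any, Optional


def resolve_alias(alias: Optional[str], taken: set) -> str:
    """Pick a score column name that does not collide with any name in `taken`."""
    name = alias or "__score"
    while name in taken:
        name = f"__score_{name}"
    return name


def parse_search_reply(
    raw: list,
    *,
    with_scores: bool,
    no_content: bool,
    score_key: str = "__score",
) -> list:
    """Turn a flat FT.SEARCH reply into a list of row dicts."""
    rows: list = []
    if with_scores and no_content:
        # [count, id1, score1, id2, score2, ...]: stride 2, no field arrays
        for i in range(1, len(raw) - 1, 2):
            rows.append({score_key: raw[i + 1]})
    elif with_scores:
        # [count, id1, score1, [fields1], ...]: stride 3
        for i in range(1, len(raw) - 2, 3):
            fields = raw[i + 2]
            row: dict = dict(zip(fields[::2], fields[1::2]))
            row[score_key] = raw[i + 1]
            rows.append(row)
    else:
        # [count, id1, [fields1], id2, [fields2], ...]: stride 2, fields at even indices
        for i in range(2, len(raw), 2):
            fields = raw[i]
            rows.append(dict(zip(fields[::2], fields[1::2])))
    return rows


# Hand-written sample replies (not captured from a real server).
scored = [2, "doc:1", "1.5", ["title", "laptop"], "doc:2", "0.7", ["title", "tablet"]]
scored_no_content = [2, "doc:1", "1.5", "doc:2", "0.7"]
plain = [1, "doc:1", ["title", "laptop"]]
```

Unlike the executor in the diff, this sketch does not make a first pass to collect field names; a caller that needs collision-free aliases would call `resolve_alias` with the collected names and pass the result as `score_key`.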