From cf2a5e54a636e4702a743c1c1a82b639ab084730 Mon Sep 17 00:00:00 2001 From: "David W. Dougherty" Date: Thu, 2 Oct 2025 07:23:22 -0700 Subject: [PATCH 1/4] DOC-5777: search: document new SCORERs --- .../advanced-concepts/scoring.md | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/content/develop/ai/search-and-query/advanced-concepts/scoring.md b/content/develop/ai/search-and-query/advanced-concepts/scoring.md index c09bb64e80..5f0b820358 100644 --- a/content/develop/ai/search-and-query/advanced-concepts/scoring.md +++ b/content/develop/ai/search-and-query/advanced-concepts/scoring.md @@ -19,7 +19,7 @@ weight: 8 When searching, documents are scored based on their relevance to the query. The score is a floating point number between 0.0 and 1.0, where 1.0 is the highest score. The score is returned as part of the search results and can be used to sort the results. -Redis Open Source comes with a few very basic scoring functions to evaluate document relevance. They are all based on document scores and term frequency. This is regardless of the ability to use [sortable fields]({{< relref "/develop/ai/search-and-query/advanced-concepts/sorting" >}}). Scoring functions are specified by adding the `SCORER {scorer_name}` argument to a search query. +Redis Open Source comes with a few scoring functions to evaluate document relevance. They are all based on document scores and term frequency. This is regardless of the ability to use [sortable fields]({{< relref "/develop/ai/search-and-query/advanced-concepts/sorting" >}}). Scoring functions are specified by adding the `SCORER {scorer_name}` argument to a search query. If you prefer a custom scoring function, it is possible to add more functions using the [extension API]({{< relref "/develop/ai/search-and-query/administration/extensions" >}}). 
@@ -78,16 +78,28 @@ Term frequencies are normalized by the length of the document, expressed as the FT.SEARCH myIndex "foo" SCORER TFIDF.DOCNORM ``` -## BM25 (default) +## BM25STD (default) A variation on the basic `TFIDF` scorer, see [this Wikipedia article for more info](https://en.wikipedia.org/wiki/Okapi_BM25). The relevance score for each document is multiplied by the presumptive document score and a penalty is applied based on slop as in `TFIDF`. +{{ note }} +The `BM25` scorer was renamed `BM25STD` in Redis Open Source 8.4. `BM25` is deprecated. +{{ /note }} + ``` -FT.SEARCH myIndex "foo" SCORER BM25 +FT.SEARCH myIndex "foo" SCORER BM25STD ``` +## BM25STD.NORM + +A variation of `BM25STD`, where the scores are normalized by the minimum and maximum score. + +## BM25STD.TANH + +A variation of `BM25STD.NORM`, where the scores are normalized using the hyperbolic tangent function `tanh(x)`. `BM25STD.TANH` can take an optional argument, `BM25STD_TANH_FACTOR Y`, which is used to smooth the function and the score values. The default value for `Y` is 4. + ## DISMAX A simple scorer that sums up the frequencies of matched terms. In the case of union clauses, it will give the maximum value of those matches. No other penalties or factors are applied. From ecb2d1d9bf6ae81940b51ad8888067149f6948ab Mon Sep 17 00:00:00 2001 From: "David W. Dougherty" Date: Thu, 2 Oct 2025 07:32:13 -0700 Subject: [PATCH 2/4] Update the admin. 
overview page --- .../administration/overview.md | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/content/develop/ai/search-and-query/administration/overview.md b/content/develop/ai/search-and-query/administration/overview.md index d59e5ad0d6..c0db4b9cd7 100644 --- a/content/develop/ai/search-and-query/administration/overview.md +++ b/content/develop/ai/search-and-query/administration/overview.md @@ -239,9 +239,23 @@ These are the pre-bundled scoring functions available in Redis: * Identical to the default TFIDF scorer, with one important distinction: -* **BM25** +* **BM25STD (default)** - A variation on the basic TF-IDF scorer. See [this Wikipedia article for more information](https://en.wikipedia.org/wiki/Okapi_BM25). + A variation on the basic `TFIDF` scorer. See [this Wikipedia article for more information](https://en.wikipedia.org/wiki/Okapi_BM25). + + The relevance score for each document is multiplied by the presumptive document score and a penalty is applied based on slop as in `TFIDF`. + + {{ note }} + The `BM25` scorer was renamed `BM25STD` in Redis Open Source 8.4. `BM25` is deprecated. + {{ /note }} + +* **BM25STD.NORM** + + A variation of `BM25STD`, where the scores are normalized by the minimum and maximum score. + +* **BM25STD.TANH** + + A variation of `BM25STD.NORM`, where the scores are normalized using the hyperbolic tangent function `tanh(x)`. `BM25STD.TANH` can take an optional argument, `BM25STD_TANH_FACTOR Y`, which is used to smooth the function and the score values. The default value for `Y` is 4. 
* **DISMAX** From ca6770b78609b7f3fd810e51e63dbe37f98539d6 Mon Sep 17 00:00:00 2001 From: David Dougherty Date: Thu, 2 Oct 2025 08:19:00 -0700 Subject: [PATCH 3/4] Apply suggestions from code review Co-authored-by: andy-stark-redis <164213578+andy-stark-redis@users.noreply.github.com> --- .../develop/ai/search-and-query/administration/overview.md | 4 ++-- .../develop/ai/search-and-query/advanced-concepts/scoring.md | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/content/develop/ai/search-and-query/administration/overview.md b/content/develop/ai/search-and-query/administration/overview.md index c0db4b9cd7..c9ace1efb2 100644 --- a/content/develop/ai/search-and-query/administration/overview.md +++ b/content/develop/ai/search-and-query/administration/overview.md @@ -245,9 +245,9 @@ These are the pre-bundled scoring functions available in Redis: The relevance score for each document is multiplied by the presumptive document score and a penalty is applied based on slop as in `TFIDF`. - {{ note }} + {{< note >}} The `BM25` scorer was renamed `BM25STD` in Redis Open Source 8.4. `BM25` is deprecated. - {{ /note }} + {{< /note >}} * **BM25STD.NORM** diff --git a/content/develop/ai/search-and-query/advanced-concepts/scoring.md b/content/develop/ai/search-and-query/advanced-concepts/scoring.md index 5f0b820358..1e9edc33fb 100644 --- a/content/develop/ai/search-and-query/advanced-concepts/scoring.md +++ b/content/develop/ai/search-and-query/advanced-concepts/scoring.md @@ -84,9 +84,9 @@ A variation on the basic `TFIDF` scorer, see [this Wikipedia article for more in The relevance score for each document is multiplied by the presumptive document score and a penalty is applied based on slop as in `TFIDF`. -{{ note }} +{{< note >}} The `BM25` scorer was renamed `BM25STD` in Redis Open Source 8.4. `BM25` is deprecated. 
-{{ /note }} +{{< /note >}} ``` FT.SEARCH myIndex "foo" SCORER BM25STD From 80810bf3dfbe5e31538592da25a616e954b87f04 Mon Sep 17 00:00:00 2001 From: "David W. Dougherty" Date: Fri, 3 Oct 2025 07:38:01 -0700 Subject: [PATCH 4/4] Apply suggestions from code review --- .../ai/search-and-query/advanced-concepts/scoring.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/content/develop/ai/search-and-query/advanced-concepts/scoring.md b/content/develop/ai/search-and-query/advanced-concepts/scoring.md index 5f0b820358..c7dc124c8a 100644 --- a/content/develop/ai/search-and-query/advanced-concepts/scoring.md +++ b/content/develop/ai/search-and-query/advanced-concepts/scoring.md @@ -94,12 +94,22 @@ FT.SEARCH myIndex "foo" SCORER BM25STD ## BM25STD.NORM -A variation of `BM25STD`, where the scores are normalized by the minimum and maximum score. +A variation of `BM25STD`, where the scores are normalized by the minimum and maximum scores. + +`BM25STD.NORM` uses min–max normalization across the collection, making it more accurate in distinguishing documents when term frequency distributions vary significantly. Because it depends on global statistics, results adapt better to collection-specific characteristics, but this comes at a performance cost: min and max values must be computed and updated whenever the collection changes. This method is recommended when ranking precision is critical and the dataset is relatively stable. ## BM25STD.TANH A variation of `BM25STD.NORM`, where the scores are normalized using the hyperbolic tangent function `tanh(x)`. `BM25STD.TANH` can take an optional argument, `BM25STD_TANH_FACTOR Y`, which is used to smooth the function and the score values. The default value for `Y` is 4. +`BM25STD.TANH` applies a smooth transformation using the `tanh(x/factor)` function, which avoids collection-dependent statistics and yields faster, more efficient scoring. 
While this makes it more scalable and consistent across different datasets, the trade-off is reduced accuracy in cases where min–max normalization provides sharper separation. This method is recommended when performance and throughput are prioritized over fine-grained ranking sensitivity. + +The following example shows how to use `BM25STD_TANH_FACTOR Y` in a query. + +``` +FT.SEARCH idx "term" SCORER BM25STD.TANH BM25STD_TANH_FACTOR 12 WITHSCORES +``` + ## DISMAX A simple scorer that sums up the frequencies of matched terms. In the case of union clauses, it will give the maximum value of those matches. No other penalties or factors are applied.
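
Reviewer note: the trade-off between the two normalized scorers described in patch 4/4 may be easier to see with numbers. The Python sketch below contrasts min–max scaling with `tanh(x/factor)` squashing on a made-up list of raw scores. It is not the Redis implementation: the function names, the sample scores, and the choice to return 0.0 when all scores are equal are assumptions for illustration only. Just the `tanh(x/factor)` form and the default factor of 4 are taken from the patch text.

```python
import math


def min_max_normalize(scores):
    """Rescale scores to [0, 1] using the lowest and highest score in the set.

    Needs the whole result set (or collection statistics) before any
    single score can be normalized.
    """
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: with no spread, map every score to 0.0.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def tanh_normalize(scores, factor=4.0):
    """Squash each non-negative score into [0, 1) independently of the others.

    A larger factor flattens the curve, so high scores saturate later.
    """
    return [math.tanh(s / factor) for s in scores]


if __name__ == "__main__":
    raw = [0.5, 2.0, 8.0]  # hypothetical raw BM25-style scores
    print("min-max:      ", min_max_normalize(raw))
    print("tanh (Y=4):   ", tanh_normalize(raw))
    print("tanh (Y=12):  ", tanh_normalize(raw, factor=12.0))
```

In this sketch, min–max output changes whenever the lowest or highest score in the set changes, while each `tanh` value depends only on its own raw score and the factor, which mirrors the stability and performance argument made in the patch.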