This repository has been archived by the owner on Oct 13, 2021. It is now read-only.
Definition

BM25 is a classic ranking function used by search engines to measure how relevant a document is to a set of keywords. It is defined as:

$$
\mathrm{BM25} = \sum_{\text{keywords}} \frac{\mathrm{IDF} \cdot \mathrm{TF} \cdot (k_1 + 1)}{\mathrm{TF} + k_1 \cdot \left(1 - b + b \cdot \frac{D}{L}\right)}
$$

The sum runs over all keywords. TF (term frequency) is the number of times a keyword appears in the document, D is the number of words in the document, and L is the average number of words per document across all documents. k1 and b are constants; riot defaults them to 2.0 and 0.75, but they can be changed at engine initialization through EngineOpts.IndexerOpts.BM25Parameters. IDF (inverse document frequency) measures how common a keyword is across all documents, so rarer keywords receive a higher weight. The riot engine uses a smoothed IDF formula:

$$
\mathrm{IDF} = \log_2\left(\frac{\text{total number of documents}}{\text{number of documents containing the keyword}} + 1\right)
$$

Use

The indexer is responsible for computing BM25. To compute a document's BM25 value, the index must store the frequency of every keyword in the document, which requires setting [EngineOpts.IndexerOpts.IndexType](/types/indexer_init_options.go) to at least FrequenciesIndex at engine initialization. (LocsIndex also supports BM25, but it additionally stores the positions where each word appears and therefore consumes more memory.)
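To show why BM25 needs at least frequency information, the sketch below contrasts two hypothetical posting layouts, loosely modelled on what the text says each index type stores. `freqPosting`, `locsPosting`, `tfFromFreq` and `tfFromLocs` are illustrative names, not riot's internal types. Either layout yields the TF that the BM25 formula needs; the location-based one simply stores more data per entry.

```go
package main

import "fmt"

// freqPosting (hypothetical) keeps only how often a keyword occurs in a
// document: the information a frequency-level index needs for BM25.
type freqPosting struct {
	docID     string
	frequency float64
}

// locsPosting (hypothetical) additionally keeps every position of the
// keyword in the document, which costs more memory.
type locsPosting struct {
	docID     string
	locations []int
}

// tfFromFreq reads TF directly from the stored frequency.
func tfFromFreq(p freqPosting) float64 { return p.frequency }

// tfFromLocs derives TF by counting the stored positions.
func tfFromLocs(p locsPosting) float64 { return float64(len(p.locations)) }

func main() {
	f := freqPosting{docID: "doc1", frequency: 3}
	l := locsPosting{docID: "doc1", locations: []int{0, 17, 42}}
	// Both layouts give the same TF for the BM25 formula.
	fmt.Println(tfFromFreq(f), tfFromLocs(l)) // 3 3
}
```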

You can then read IndexedDoc.BM25 in your [custom scoring criteria](/docs/en/custom_scoring_criteria.md) and use it as scoring data. If you want to rank purely by BM25 score, use the default scoring criterion, RankByBM25.