A closer look at relevancy scoring in search engines, based on a statistical measure called TF-IDF.
Sorting query results is always a hard problem to approach. Should you sort them by name, creation date, last-updated date, or some other factor? If you sort the results of a product search by name, the first product to appear is likely not the one the customer was looking to buy.
When building a search engine, such as the product search in the example above, sorting the resulting documents is not always straightforward.
Sorting usually works by calculating a relevancy (or similarity) score between each document in the corpus and the user query. The relevancy score is the backbone of a search engine.
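To make the idea of sorting by a score concrete, here is a minimal Python sketch. The scoring function, `overlap_score`, is a hypothetical stand-in that simply counts how many distinct query terms appear in a document; it is not TF-IDF, only a placeholder to show how a score, rather than a field like the name, drives the order. The helper `rank` and the sample products are also illustrative assumptions.

```python
def overlap_score(query: str, document: str) -> int:
    """Hypothetical relevancy score: number of distinct query terms found in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms)


def rank(query: str, documents: list[str]) -> list[str]:
    """Sort documents by descending score; sorted() is stable, so ties keep their input order."""
    return sorted(documents, key=lambda doc: overlap_score(query, doc), reverse=True)


products = ["Apple iPhone case", "Wireless charger", "Leather phone case"]

print(sorted(products))              # ['Apple iPhone case', 'Leather phone case', 'Wireless charger']
print(rank("phone case", products))  # ['Leather phone case', 'Apple iPhone case', 'Wireless charger']
```

Sorting by name puts "Apple iPhone case" first purely because of the alphabet, while sorting by score puts the product that matches both query terms first.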
Understanding how to calculate a relevancy score is the first step you must take to build a good search engine.
In information retrieval, tf–idf (also TF*IDF, TFIDF, TF–IDF, or Tf–idf), short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.[1] It is often used as a weighting factor in searches of information retrieval, text mining, and user modeling.
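The definition above can be sketched in a few lines of Python. TF-IDF has several variants; this sketch assumes one common form, tfidf(t, d) = (count of t in d / number of terms in d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t (natural log). The function names `tokenize`, `tf_idf` and `score`, the whitespace tokenizer, and the sample corpus are all illustrative assumptions. Scoring a query as the sum of the document's weights for the query terms is one simple choice; comparing TF-IDF vectors with cosine similarity is another common one.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase + whitespace split; real engines use proper text analyzers.
    return text.lower().split()


def tf_idf(corpus: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document in the corpus."""
    tokenized = [tokenize(doc) for doc in corpus]
    n_docs = len(tokenized)
    # df(t): number of documents that contain term t at least once
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # tf = relative frequency in this document, idf = log(N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def score(query: str, doc_weights: dict[str, float]) -> float:
    # Simple relevancy score: sum of the document's weights for each query term
    return sum(doc_weights.get(term, 0.0) for term in tokenize(query))


corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
weights = tf_idf(corpus)

print(weights[0]["the"])  # 0.0, because "the" occurs in every document: idf = log(3 / 3) = 0
ranked = sorted(range(len(corpus)), key=lambda i: score("cat mat", weights[i]), reverse=True)
print(ranked)             # [0, 2, 1]
```

A word that appears everywhere, such as "the", gets zero weight, while a rarer word such as "mat" dominates the score of the only document that contains it.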