Simple Search Engine: Understand relevancy scoring in search engines based on statistical measurement called TF-IDF

omgshalihin/searchEngineTFIDF

My Focus

To better understand relevancy scoring in search engines, based on a statistical measure called TF-IDF.

Purpose

Sorting query results is a hard problem. Should you sort by name, creation date, last-updated date, or some other factor? If you sort the results of a product search by name, the first product to appear is unlikely to be the one the customer wanted to buy.

When creating a search engine, such as the product search in the example above, sorting the resulting documents is not always straightforward.

Sorting is usually done by calculating a relevancy (or similarity) score between each document in the corpus and the user query, then ordering the documents by that score. The relevancy score is the backbone of a search engine.
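To make the contrast with sorting by name concrete, here is a minimal sketch in Python. The scoring function `overlap_score`, the helper `rank`, and the sample product names are all hypothetical and not part of this repository; the score simply counts how often query words occur in a document, as a stand-in for a real relevancy measure.

```python
def overlap_score(query: str, document: str) -> int:
    """Placeholder relevancy score: count document words that are query terms."""
    query_terms = set(query.lower().split())
    return sum(1 for word in document.lower().split() if word in query_terms)


def rank(query: str, documents: list[str]) -> list[str]:
    """Return documents ordered from highest to lowest score for the query."""
    return sorted(documents, key=lambda doc: overlap_score(query, doc), reverse=True)


# Hypothetical product catalogue, for illustration only.
products = [
    "Apple phone case",
    "Blue running shoes",
    "Red running shoes for trail running",
]

by_name = sorted(products)                  # alphabetical order ignores the query
by_score = rank("running shoes", products)  # ordered by the placeholder score

print(by_name[0])   # Apple phone case
print(by_score[0])  # Red running shoes for trail running
```

Sorting by name puts an unrelated product first, while sorting by score puts the product that mentions the query terms most often first.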

Understanding how to calculate a relevancy score is the first step toward building a good search engine.

What is TF-IDF?

In information retrieval, tf–idf (also TF*IDF, TFIDF, TF–IDF, or Tf–idf), short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.[1] It is often used as a weighting factor in searches of information retrieval, text mining, and user modeling.
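The definition above can be sketched in a few lines of Python. TF-IDF has many variants; this sketch assumes one common choice, where the term frequency is the term's count divided by the document length and the inverse document frequency is the natural log of (number of documents / number of documents containing the term). The function names, the whitespace tokenizer, and the sample corpus are illustrative assumptions, not the code in this repository.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Very simple tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


def tf_idf_scores(query: str, documents: list[str]) -> list[float]:
    """Score each document by summing tf * idf over the query terms."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in counts:
                continue  # a term absent from the document contributes nothing
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            score += tf * idf
        scores.append(score)
    return scores


# Hypothetical corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
scores = tf_idf_scores("the cat", corpus)
ranked = sorted(zip(scores, corpus), reverse=True)
```

In this corpus "the" occurs in every document, so its idf is log(3/3) = 0 and it adds nothing to any score. Only "cat" separates the documents, and the document without it scores zero.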
