My Focus

To further understand relevancy scoring in search engines using a statistical measure called TF-IDF.

Purpose

Sorting query results is rarely straightforward. Should you sort them by name, creation date, last-updated date, or some other field? In a product search, for example, sorting results by name makes it likely that the first product shown is not what the customer was looking for.

When building a search engine, such as the product search in the example above, a fixed sort order on one field does not reflect what the user actually asked for.

Instead, results are usually sorted by a relevancy (or similarity) score computed between each document in the corpus and the user's query. The relevancy score is the backbone of a search engine.
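A minimal sketch of ranking by score in Java, assuming documents are plain strings. The scoring function here is a hypothetical placeholder (the number of distinct query terms a document contains), not TF-IDF; the point is only that results are ordered by a score against the query rather than by a fixed field such as the name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class RelevanceSort {

    // Lowercase the text and split it on anything that is not a letter or digit,
    // dropping empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Hypothetical placeholder score: how many distinct query terms appear in the document.
    static int overlapScore(String document, String query) {
        Set<String> docTerms = new HashSet<>(tokenize(document));
        int score = 0;
        for (String term : new HashSet<>(tokenize(query))) {
            if (docTerms.contains(term)) {
                score++;
            }
        }
        return score;
    }

    // Return a copy of the documents sorted by descending score.
    // List.sort is stable, so documents with equal scores keep their original order.
    public static List<String> rank(List<String> documents, String query) {
        List<String> ranked = new ArrayList<>(documents);
        ranked.sort(Comparator.comparingInt((String doc) -> overlapScore(doc, query)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<String> products = List.of(
                "Apple iPhone case",
                "Red running shoes",
                "Blue running shorts");
        System.out.println(rank(products, "red running shoes"));
    }
}
```

Sorting these products by name would put "Apple iPhone case" first. Ranking by score puts "Red running shoes" first because it matches all three query terms, followed by "Blue running shorts", which matches only "running".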

Understanding how to calculate a relevancy score is the first step toward building a good search engine.

What is TF-IDF?

In information retrieval, tf–idf (also TF*IDF, TFIDF, TF–IDF, or Tf–idf), short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.[1] It is often used as a weighting factor in searches of information retrieval, text mining, and user modeling.
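As a sketch of how the statistic can be computed, the Java class below uses one common variant (several exist, and this is not necessarily the variant this project uses): tf(t, d) = (occurrences of t in d) / (number of terms in d), idf(t) = ln(N / df(t)), where N is the number of documents in the corpus and df(t) is the number of documents containing t; tf-idf is the product of the two. Scoring a document against a query as the sum of the tf-idf weights of the query terms is an assumption made for illustration, as is the choice to give terms absent from the corpus an idf of 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TfIdf {

    private final List<List<String>> docs = new ArrayList<>();
    private final Map<String, Integer> docFreq = new HashMap<>();

    public TfIdf(List<String> corpus) {
        for (String text : corpus) {
            List<String> tokens = tokenize(text);
            docs.add(tokens);
            // Document frequency: count each distinct term once per document.
            for (String term : new HashSet<>(tokens)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
    }

    // Lowercase and split on non-alphanumeric characters, dropping empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // tf(t, d) = occurrences of t in d / number of terms in d
    static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    // idf(t) = ln(N / df(t)); a term that appears in no document gets 0 (assumption, avoids dividing by zero)
    public double idf(String term) {
        int df = docFreq.getOrDefault(term, 0);
        if (df == 0) {
            return 0.0;
        }
        return Math.log((double) docs.size() / df);
    }

    // Relevancy of document docIndex to the query: sum of tf * idf over the query terms.
    public double score(int docIndex, String query) {
        double total = 0.0;
        for (String term : tokenize(query)) {
            total += tf(term, docs.get(docIndex)) * idf(term);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> corpus = List.of(
                "the red shoes",
                "the blue shoes",
                "the red hat");
        TfIdf index = new TfIdf(corpus);
        for (int i = 0; i < corpus.size(); i++) {
            System.out.printf(Locale.ROOT, "%.4f  %s%n", index.score(i, "red shoes"), corpus.get(i));
        }
    }
}
```

In this small corpus "the" appears in every document, so its idf is ln(3/3) = 0 and it adds nothing to any score, which is the behavior the idf factor is meant to produce for very common words. "the red shoes" scores highest for the query "red shoes" because it contains both terms.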
