Skip to content

emiljoswin/search-engine

Repository files navigation

**********
README
**********

The project aimed at building an internet search engine. 
The intial step was to implement a crawler. The reason why I wrote a crawler myself was that

The crawler crawled on the depth limited DFS basis. The depth was provided by the user. Due to bandwidth limitations.
the crawling was limited to < 50 pages. The crawler produced the 'big matrix' which represented the structure of the
crawled pages and also stored the contents of the pages for indexing.

A library called Whoosh was used to index and perform search on the text. It used Okapi BM - 25 search function.
It is a simple text search library which accounted on the occurence of words and not their relative proximities.

The page rank of the pages were calculated from the 'big matrix' and the result of the search in the index was combined
with the page rank to determine the total rank of pages. The results were displayed in the non-increasing order to the user.

Emil

About

search engine comprising of simple custom made crawler.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published