parthsatra/web.crawler

WebCrawler

A basic prototype of a web search engine that lets you search for all Wikipedia articles related to a particular topic. It is an attempt to build an end-to-end system.

## What lies beneath?

To build this system, we used the following technologies:

  1. Node.js

  2. Kafka

  3. Storm / Trident

  4. Elasticsearch

  5. Distributed Remote Procedure Calls (DRPC)

  6. Probabilistic data structures: Count-Min Sketch and Bloom filters
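
For readers unfamiliar with the two probabilistic data structures in the last item, here is a minimal, self-contained Python sketch of each. It is illustrative only and is not taken from this repository. This README does not say how the project uses them; a crawler might plausibly use a Bloom filter to skip URLs it has already seen and a Count-Min Sketch to approximate how often a page is referenced, but that is an assumption. The class names, the `_hashes` helper, and the default sizes are all hypothetical.

```python
import hashlib
from typing import List


def _hashes(item: str, count: int, modulus: int) -> List[int]:
    """Derive `count` bucket indexes in [0, modulus) by salting SHA-256 with the hash number."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()[:8], "big") % modulus
        for i in range(count)
    ]


class BloomFilter:
    """Approximate set membership: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def add(self, item: str) -> None:
        # Set one bit per hash function.
        for idx in _hashes(item, self.num_hashes, self.size_bits):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: str) -> bool:
        # The item may be present only if every one of its bits is set.
        return all(
            self.bits[idx // 8] & (1 << (idx % 8))
            for idx in _hashes(item, self.num_hashes, self.size_bits)
        )


class CountMinSketch:
    """Approximate counter: estimates never undercount but may overcount on collisions."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in each row.
        for row, idx in zip(self.rows, _hashes(item, self.depth, self.width)):
            row[idx] += count

    def estimate(self, item: str) -> int:
        # The smallest counter across rows is the one least inflated by collisions.
        return min(
            row[idx] for row, idx in zip(self.rows, _hashes(item, self.depth, self.width))
        )
```

After `seen = BloomFilter()` and `seen.add(url)`, the check `url in seen` is always true, while a URL that was never added is reported as present only with small probability. Likewise, `CountMinSketch.estimate` returns a value at least as large as the true count. Both structures trade exactness for a fixed, small memory footprint.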

About

Web crawler for Wikipedia with basic page-ranking capabilities
