A simple-to-use search engine for searching crawled web pages.
The project consists of the following modules:
- crawler: crawls pages and saves them in the database
- search-engine: provides a search API over the stored data
- forward-extractor: runs a MapReduce job that calculates the incoming links for each website to improve search results
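The incoming-link calculation done by forward-extractor can be illustrated with a minimal, single-machine sketch. This is not the project's code: the class `InLinkCounter`, the `Link` type, and the choice to ignore self-links are assumptions made for illustration, and plain Java streams stand in for the distributed job. The "map" step emits each link's target, and the "reduce" step sums the occurrences per target.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration; names are not taken from the project.
class InLinkCounter {

    /** A directed link from one site to another (assumed shape). */
    static final class Link {
        final String source;
        final String target;

        Link(String source, String target) {
            this.source = source;
            this.target = target;
        }
    }

    /** Counts incoming links per target site, sorted by site name. */
    static Map<String, Long> countIncoming(List<Link> links) {
        return links.stream()
                // Assumption: a site linking to itself does not count.
                .filter(l -> !l.source.equals(l.target))
                // Map: key each link by its target. Reduce: count per key.
                .collect(Collectors.groupingBy(
                        l -> l.target, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Link> links = List.of(
                new Link("a.com", "b.com"),
                new Link("c.com", "b.com"),
                new Link("b.com", "a.com"),
                new Link("b.com", "b.com"));
        System.out.println(countIncoming(links)); // prints {a.com=1, b.com=2}
    }
}
```

A site with more incoming links can then be ranked higher in search results, which is presumably how the count is used.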
To run this project on your local machine or on a server, install ZooKeeper, HBase, Hadoop, Elasticsearch, and Kafka.
To install these dependencies, read their wikis and configure each one to suit your servers.
To run the application, use the .sh files inside the bin folder.
Run the tests for all modules with the following command:
mvn test
- Spark - Used to run MapReduce jobs
- Kafka - Used to handle the link queue
- Elasticsearch - Used to run search queries
- Redis - Used to detect duplicate pages
- HBase - Used to store data
- Dropwizard - Used for monitoring
- JSoup - Used to parse pages
- Caffeine - Used to track requested URLs so that requests are sent politely
- Jackson - Used to serialize objects
- Maven - Used for dependency management
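As a rough illustration of the polite-request idea mentioned for Caffeine, the sketch below allows a request to a host only if enough time has passed since the previous one. It is an assumption-laden stand-in, not the project's implementation: the class `PolitenessGate`, the per-host key, and the fixed delay are hypothetical, and a plain `ConcurrentHashMap` replaces the cache. With Caffeine, a cache built with `expireAfterWrite` could drop old entries automatically instead of comparing timestamps.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration; names are not taken from the project.
class PolitenessGate {
    private final long minDelayMillis;
    // host -> time of the last allowed request (stand-in for an expiring cache)
    private final Map<String, Long> lastRequest = new ConcurrentHashMap<>();

    PolitenessGate(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /**
     * Returns true and records the request if the host has not been
     * contacted within the delay window; otherwise returns false.
     */
    boolean tryAcquire(String host, long nowMillis) {
        boolean[] allowed = {false};
        // compute() runs atomically per key in ConcurrentHashMap.
        lastRequest.compute(host, (h, last) -> {
            if (last == null || nowMillis - last >= minDelayMillis) {
                allowed[0] = true;
                return nowMillis;
            }
            return last;
        });
        return allowed[0];
    }

    public static void main(String[] args) {
        PolitenessGate gate = new PolitenessGate(1000);
        System.out.println(gate.tryAcquire("example.com", 0));    // true
        System.out.println(gate.tryAcquire("example.com", 500));  // false, too soon
        System.out.println(gate.tryAcquire("example.com", 1000)); // true
    }
}
```

Passing the current time as a parameter keeps the sketch easy to test; a crawler would typically pass `System.currentTimeMillis()` and requeue links whose host is not yet available.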
- Amin Borjian - github
- Danial Erfanian - github
- Ehsan Karimi - github
- MohammadReza Pakzadian - github
See also the list of contributors who participated in this project.