Scalable Search and Web Crawling

SHAN

## Scalable Search and Web Crawling

The objective of this work was to apply information retrieval concepts to build a scalable framework for the general task of crawling unstructured documents from the web, indexing them, and retrieving them. As a case study, we used Wikipedia as the crawl and index target. After crawling and indexing, a GUI deployed in the cloud displays the results and lets the user run personalised queries. Shan (山) is the Chinese character for mountain. The name can also be read as an acronym formed from the first letters of the components: Solr, Hadoop, Apache Nutch.
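
As a rough illustration of the query step, the sketch below builds a request URL for Solr's standard `select` handler, which a GUI could call to retrieve indexed pages. It is not taken from this repository: the base URL, the core name `wikipedia`, and the helper `build_select_url` are assumptions made for the example.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, text, rows=10):
    """Build a URL for Solr's /select handler (hypothetical helper).

    base_url and core are assumptions; adjust them to the actual deployment.
    """
    params = {
        "q": text,      # user query, matched against the core's default field
        "rows": rows,   # maximum number of documents to return
        "wt": "json",   # ask Solr for a JSON response
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local Solr instance and core name, for illustration only.
    print(build_select_url("http://localhost:8983/solr", "wikipedia", "mountain ranges", rows=5))
```

The resulting URL can be fetched with any HTTP client; the JSON response lists the matching documents under `response.docs`.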