Cloud Computing Search Engine - MiniGoogle

A Google-style web search engine that runs Hadoop MapReduce on Amazon EC2 and consists of a crawler, an indexer, a PageRank engine, and a UI. Click Here to View Demo. Spring 2013

Skills

Language: Java
Web: HTML, CSS, Servlet, JSP, jQuery, AJAX
Cloud: Hadoop, MapReduce, Amazon EC2, Amazon EMR, FreePastry
Database: Amazon S3, Berkeley DB

Contribution

Developed a scalable, Google-style crawler that distributed requests across multiple crawling peers over Pastry nodes.
Developed a TF-IDF indexer for inverted index computation and a PageRank engine for link analysis based on MapReduce.
Improved search relevancy by weighting ten ranking parameters, using AJAX feedback and an SVM classifier for tuning.
Implemented fault tolerance with Berkeley DB revert, and integrated RESTful web services from the Yahoo, Amazon, YouTube, Yelp, Wiki, MaxMind, and eBay APIs.
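
To illustrate the two scoring ideas named above, here is a minimal in-memory sketch of a TF-IDF weight and an iterative PageRank. It is not the project's MapReduce code: the class name `RankingSketch`, the method names, the natural-log IDF variant, the 0.85 damping factor, and the handling of pages with no outgoing links are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from the repository.
class RankingSketch {

    // TF-IDF weight of one term in one document:
    // (termCount / docLength) * ln(totalDocs / docsWithTerm).
    static double tfIdf(int termCount, int docLength, int totalDocs, int docsWithTerm) {
        if (termCount == 0 || docLength == 0 || docsWithTerm == 0) {
            return 0.0;
        }
        double tf = (double) termCount / docLength;
        double idf = Math.log((double) totalDocs / docsWithTerm);
        return tf * idf;
    }

    // Iterative PageRank. outLinks.get(i) lists the pages that page i links to.
    // Rank held by pages without outgoing links is spread evenly over all pages.
    static double[] pageRank(List<List<Integer>> outLinks, double damping, int iterations) {
        int n = outLinks.size();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0;
            for (int page = 0; page < n; page++) {
                List<Integer> targets = outLinks.get(page);
                if (targets.isEmpty()) {
                    dangling += rank[page];
                    continue;
                }
                // Each page splits its current rank equally among its targets.
                double share = rank[page] / targets.size();
                for (int target : targets) {
                    next[target] += share;
                }
            }
            for (int page = 0; page < n; page++) {
                next[page] = (1.0 - damping) / n + damping * (next[page] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny three-page graph: 0 -> 1, 2; 1 -> 2; 2 -> 0.
        List<List<Integer>> graph = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(2), Arrays.asList(0));
        System.out.println(Arrays.toString(pageRank(graph, 0.85, 50)));
        System.out.println(tfIdf(3, 100, 1000, 10));
    }
}
```

In a MapReduce setting, each PageRank iteration would typically map to one job: the map phase emits each page's rank share to its link targets, and the reduce phase sums the shares per page.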

About

Course: CIS 555, Internet & Web Systems, Spring 2013, University of Pennsylvania
Teamwork: Yayang Tian, Michael Collis, Angela Wu, Krishna Choksi
