A Google-style web search engine built on Hadoop MapReduce and run on Amazon EC2, consisting of a crawler, an indexer, a PageRank engine, and a search UI.
Spring 2013
Language: Java
Web: HTML, CSS, Servlet, JSP, jQuery, AJAX
Cloud: Hadoop, MapReduce, Amazon EC2, Amazon EMR, FreePastry
Database: Amazon S3, Berkeley DB
- Developed a scalable, Google-style crawler that distributed requests across multiple crawling peers over Pastry nodes.
- Developed a TF-IDF indexer for inverted index computation and a PageRank engine for link analysis based on MapReduce.
- Improved search relevance by weighting ten ranking parameters, using AJAX feedback and an SVM classifier for tuning.
- Implemented fault tolerance with Berkeley DB revert, and integrated RESTful web services from the Yahoo, Amazon, YouTube, Yelp, Wiki, MaxMind, and eBay APIs.
- Course:
CIS 555, Internet & Web Systems, Spring 2013, University of Pennsylvania
- Team:
Yayang Tian, Michael Collis, Angela Wu, Krishna Choksi
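
The MapReduce-based PageRank step listed above can be illustrated with a minimal sketch. This is not the project's code: it runs in plain Java with no Hadoop dependency, and the class name `PageRankSketch`, its method names, the damping factor, and the sample graph are all hypothetical, chosen only to show how one iteration splits into a map phase (each page emits rank shares to its outlinks) and a reduce phase (each page sums what it received).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: all names here are assumptions, not the project's source.
class PageRankSketch {

    // "Map" phase: a page emits an equal share of its rank to each outlink.
    static List<Map.Entry<String, Double>> map(String page, double rank, List<String> outlinks) {
        List<Map.Entry<String, Double>> emitted = new ArrayList<>();
        for (String target : outlinks) {
            emitted.add(Map.entry(target, rank / outlinks.size()));
        }
        return emitted;
    }

    // "Reduce" phase: sum the shares a page received and apply damping.
    static double reduce(List<Double> contributions, double damping, int pageCount) {
        double sum = 0.0;
        for (double c : contributions) {
            sum += c;
        }
        return (1.0 - damping) / pageCount + damping * sum;
    }

    // Runs a fixed number of map/shuffle/reduce rounds over an in-memory graph.
    // Simplification: pages with no outlinks simply lose their rank mass.
    static Map<String, Double> rank(Map<String, List<String>> graph, double damping, int iterations) {
        int n = graph.size();
        Map<String, Double> ranks = new HashMap<>();
        for (String page : graph.keySet()) {
            ranks.put(page, 1.0 / n);
        }
        for (int i = 0; i < iterations; i++) {
            // "Shuffle": group emitted shares by target page.
            Map<String, List<Double>> grouped = new HashMap<>();
            for (String page : graph.keySet()) {
                grouped.put(page, new ArrayList<>());
            }
            for (Map.Entry<String, List<String>> e : graph.entrySet()) {
                for (Map.Entry<String, Double> kv : map(e.getKey(), ranks.get(e.getKey()), e.getValue())) {
                    List<Double> bucket = grouped.get(kv.getKey());
                    if (bucket != null) { // skip links that leave the known graph
                        bucket.add(kv.getValue());
                    }
                }
            }
            Map<String, Double> next = new HashMap<>();
            for (Map.Entry<String, List<Double>> e : grouped.entrySet()) {
                next.put(e.getKey(), reduce(e.getValue(), damping, n));
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", List.of("b", "c"));
        graph.put("b", List.of("c"));
        graph.put("c", List.of("a"));
        System.out.println(rank(graph, 0.85, 30));
    }
}
```

In a real Hadoop job the grouping would be done by the framework's shuffle, and each round would typically be a separate job reading the previous round's output; the in-memory loop here only stands in for that.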