
  • A search engine developed by Kelly Sadwin and Jen Westling for Ithaca College COMP 490: Search Engines and Recommender Systems
  • Uses Python 3.4
  • Contains a Boolean search engine and a ranked retrieval search engine, plus a program that evaluates the accuracy of the ranked retrieval engine
  • Uses code from CS490SearchAndRecommend for web scraping and indexing, with minor edits
  • Builds on a base SQLite3 file provided by Professor Doug Turnbull, with our own edits and additions
  • Uses a module by Mario Vilas that returns Google search results, which we used to build a corpus of pages

How to use

  • This program requires a folder of three .txt files provided in class, which should be placed in the directory data/item/. From these files, the web crawler and its supporting scripts create a corpus.
  • Once the corpus is built, any of the search engines can be used.
  • The Boolean Search Engine supports five query types: single token, AND, OR, NEAR, and phrasal. The last four query types support only two-word queries.
  • The Ranked Retrieval Engine takes queries of any length and returns the top 5 documents and their item types, ranked with a weighting scheme of the user's choice (nnn or ltc), chosen separately for documents and for queries.
  • The Evaluation Engine runs a query for each item listed in the initial text files and averages the performance across all items for each weighting combination (nnn.nnn, nnn.ltc, ltc.nnn, ltc.ltc, plus a baseline of randomly returned results) to evaluate the Ranked Retrieval Engine. Spoiler alert: it's pretty good!
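
The five Boolean query types can be illustrated with a positional inverted index. The sketch below is not the project's code: the tokenizer, the function names (`build_positional_index`, `and_query`, `near_query`, and so on), and the default NEAR window of `k=3` are all hypothetical. As in the engine described above, every operator except the single-token lookup takes exactly two words.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """Map token -> {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(tokenize(text)):
            index[token][doc_id].append(pos)
    # Convert to plain dicts so the index can be pickled.
    return {token: dict(postings) for token, postings in index.items()}


def single(index, a):
    # Set of doc IDs containing token a.
    return set(index.get(a, {}))


def and_query(index, a, b):
    return single(index, a) & single(index, b)


def or_query(index, a, b):
    return single(index, a) | single(index, b)


def near_query(index, a, b, k=3):
    # Docs where a and b occur within k positions of each other, in either order.
    hits = set()
    for doc in and_query(index, a, b):
        if any(abs(i - j) <= k for i in index[a][doc] for j in index[b][doc]):
            hits.add(doc)
    return hits


def phrase_query(index, a, b):
    # Docs where b immediately follows a.
    hits = set()
    for doc in and_query(index, a, b):
        positions_b = set(index[b][doc])
        if any(i + 1 in positions_b for i in index[a][doc]):
            hits.add(doc)
    return hits
```

Storing positions rather than only document IDs is what makes NEAR and phrasal queries possible: AND and OR need only the document sets, while NEAR and phrasal queries compare the positions of the two tokens within each document that contains both.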

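The nnn and ltc labels follow SMART notation: the three letters name the term-frequency, document-frequency, and normalization components, and a pair such as nnn.ltc gives the document weighting first and the query weighting second. nnn uses raw term counts with no idf and no normalization; ltc uses 1 + log10(tf), multiplies by idf = log10(N/df), and cosine-normalizes the vector. The hypothetical `weight` and `rank` functions below are a rough sketch of that scoring. They assume whitespace tokenization and dot-product scoring, which this README does not confirm.

```python
import math
from collections import Counter


def weight(tf_counts, df, n_docs, scheme):
    """Weight a term-frequency Counter using SMART 'nnn' or 'ltc'."""
    if scheme == "nnn":
        # Natural tf, no idf, no normalization.
        return {t: float(tf) for t, tf in tf_counts.items()}
    if scheme == "ltc":
        # Log tf times idf; terms unseen in the collection are dropped.
        w = {t: (1 + math.log10(tf)) * math.log10(n_docs / df[t])
             for t, tf in tf_counts.items() if df.get(t)}
        # Cosine (unit-length) normalization.
        norm = math.sqrt(sum(v * v for v in w.values()))
        return {t: v / norm for t, v in w.items()} if norm else w
    raise ValueError("unknown scheme: " + scheme)


def rank(docs, query, scheme="ltc.ltc", k=5):
    """Return the top-k (doc_id, score) pairs; scheme is 'doc.query', e.g. 'nnn.ltc'."""
    doc_scheme, query_scheme = scheme.split(".")
    tfs = {d: Counter(text.lower().split()) for d, text in docs.items()}
    df = Counter(t for counts in tfs.values() for t in counts)
    n = len(docs)
    q = weight(Counter(query.lower().split()), df, n, query_scheme)
    scores = {}
    for d, counts in tfs.items():
        dw = weight(counts, df, n, doc_scheme)
        # Dot product of the query and document weight vectors.
        scores[d] = sum(qw * dw.get(t, 0.0) for t, qw in q.items())
    return sorted(scores.items(), key=lambda item: -item[1])[:k]
```

Under ltc.ltc both vectors have unit length, so each score is a cosine similarity between 0 and 1. Under nnn.nnn the score is a raw overlap count that tends to favor longer documents, which is one reason to compare all four combinations.
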
Some quirks

  • The web crawler will make Google suspicious of your browsing habits. We spoofed a regular browser's request header and added delays between requests, but many people on our network had to solve a CAPTCHA the next time they used Google. As far as we know, no IP addresses were banned and nothing drastic happened, but you have been warned.
  • We have included pickles of our own inverted indices for faster running times. If you build your own database, it will not match our pickles. We have shared our database but not the text files (there are over 2000 of them), which renders our database useless to you. It was really useful for sharing between the two of us, though! We recommend deleting the pickles before you run any of the search engines; the constructor automatically builds your own pickle.
  • The query "Pirates of the Caribbean: Dead Man's Chest" crashed several groups' search engines, even after punctuation was stripped. We skipped that query; we figured it wasn't a big deal.
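
The pickle behavior described above amounts to a load-or-build cache. This is a minimal sketch assuming a hypothetical `load_or_build_index` helper and a caller-supplied `build_fn`; the project's actual constructor may work differently. Because the loader trusts any pickle it finds, a pickle built from someone else's database is loaded as-is, which is why deleting the shipped pickles forces a fresh build from your own data.

```python
import os
import pickle


def load_or_build_index(pickle_path, build_fn):
    """Load a pickled index if one exists; otherwise build it with build_fn and cache it."""
    if os.path.exists(pickle_path):
        # Trusts whatever pickle is on disk, even one built from a different database.
        with open(pickle_path, "rb") as f:
            return pickle.load(f)
    index = build_fn()
    with open(pickle_path, "wb") as f:
        pickle.dump(index, f)
    return index
```

Only unpickle files you trust: `pickle.load` can execute arbitrary code embedded in a malicious file.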


COMP 490





