<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Ivory: A Hadoop toolkit for web-scale information retrieval research</title>
<style type="text/css" media="screen">@import url( docs/style.css );</style>
</head>
<body>
<div id="wrap">
<div id="container" class="one-column" >
<!-- header START -->
<div id="header">
<div id="caption">
<h1 id="title" style="color: white;">Ivory</h1>
<div id="tagline">A Hadoop toolkit for web-scale information retrieval research</div>
</div>
<div class="fixed"></div>
</div>
<!-- header END -->
<!-- navigation START -->
<div id="navigation">
<ul id="menus">
<li class="page_item"><a class="home" title="Home" href="./index.html">Home</a></li>
<li class="page_item"><a href="docs/api/index.html" title="API">API</a></li>
<li class="page_item"><a href="docs/publications.html" title="Publications">Publications</a></li>
<li class="page_item"><a href="docs/regression.html" title="Experiments">Experiments</a></li>
<li class="page_item"><a href="docs/team.html" title="Team">Team</a></li>
</ul>
<div class="fixed"></div>
</div>
<!-- navigation END -->
<!-- content START -->
<div id="content">
<!-- main START -->
<div id="main">
<!--- START MAIN CONTENT HERE -->
<div class="post">
<div class="content">
<p>Ivory is a Hadoop toolkit for web-scale information retrieval
research.</p>
<p>To temper expectations: Ivory is a research system, not a
full-featured search engine. It is aimed at information retrieval
researchers who already know their way around retrieval algorithms,
postings lists, and the like. If you want to experiment with the latest
research coming out of SIGIR and related venues, then Ivory is for
you. On the other hand, if you just want search capabilities as a
"black
box", <a href="http://lucene.apache.org/java/docs/index.html">Lucene</a>
is likely a better
choice. <a href="http://katta.sourceforge.net/">Katta</a> is a
framework for serving distributed Lucene indexes that plays well with
Hadoop clusters.</p>
<p>Ivory was specifically designed to work with Hadoop "out of the
box" on
the <a href="http://boston.lti.cs.cmu.edu/clueweb09/wiki/tiki-index.php?page=ClueWeb09%20Wiki">ClueWeb09
collection</a>, a 1-billion-page (25 TB) Web crawl distributed by
Carnegie Mellon University. The initial release of Ivory is meant to
serve as a reference implementation of indexing and retrieval
algorithms that can operate at the multi-terabyte scale. Another
experimental aspect of Ivory is its retrieval architecture: we have
been experimenting with retrieval engines that read postings directly
from HDFS.</p>
</div></div>
<div class="post">
<h2>Download</h2>
<div class="content">
<p>Ivory is available on <a href="http://github.com/lintool/Ivory">GitHub</a>.</p>
</div></div>
<div class="post">
<h2>Documentation</h2>
<div class="content">
<ul>
<li><a href="docs/api/index.html">Ivory API javadoc</a></li>
<li>Getting started with <a href="docs/trec.html">TREC disks 4-5</a></li>
<li>Getting started with <a href="docs/wt10g.html">the WT10g collection</a></li>
<li>Getting started with <a href="docs/gov2.html">the Gov2 collection</a></li>
<li>Getting started with <a href="docs/clue.html">the ClueWeb09 collection</a></li>
<li>Ivory <a href="docs/pipeline.html">preprocessing and indexing pipeline</a></li>
<li>Ivory <a href="docs/pwsim.html">pairwise document similarity computation</a></li>
<li><a href="docs/regression.html">Experimental results</a> with Ivory</li>
<li><a href="docs/team.html">Project team</a></li>
</ul>
</div></div>
<div class="post">
<div class="content">
<p style="line-height:90%"><small>This work is or has been supported
by the following sources: NSF under awards IIS-0836560 and
IIS-0705832; Google and IBM under the Academic Cloud Computing
Initiative (ACCI); the Intramural Research Program of the NIH,
National Library of Medicine; DARPA/IPTO Contract No. HR0011-06-2-0001
under the GALE program; and Amazon Web Services. Any opinions,
findings, conclusions, or recommendations expressed here do not
necessarily reflect those of the sponsors. </small></p>
</div></div>
<!--- END MAIN CONTENT HERE -->
</div>
<!-- main END -->
<div class="fixed"></div>
</div>
<!-- content END -->
<!-- footer START -->
<div id="footer">
<div id="copyright">
Last updated:
<script type="text/javascript">
<!--//
document.write(document.lastModified);
//-->
</script>
</div>
<div id="themeinfo">
Adapted from a WordPress Theme by <a href="http://www.neoease.com/">NeoEase</a>. Valid <a href="http://validator.w3.org/check?uri=referer">XHTML 1.1</a> and <a href="http://jigsaw.w3.org/css-validator/check/referer?profile=css3">CSS 3</a>. </div>
</div>
<!-- footer END -->
</div>
<!-- container END -->
</div>
<!-- wrap END -->
</body>
</html>