
Commit

homepage update4
Hao Yan committed Nov 5, 2011
1 parent e4ffb57 commit 840c0bc
Showing 1 changed file with 7 additions and 10 deletions.
17 changes: 7 additions & 10 deletions kamikaze/index.php
@@ -17,28 +17,25 @@
Search indexes, graph algorithms, and certain sparse matrix representations make heavy use of compressed integer arrays. Search indexes, graph algorithms, and certain sparse matrix representations make heavy use of compressed integer arrays.


<p> <p>
<b>Use in search engines</b>: The inverted index is used in search engines for efficient query processing. The index is a mapping from terms to lists of documents matching those terms. <b>Use in search engines</b>: The inverted index is used in search engines for efficient query processing. The index is a mapping from terms to lists of documents matching those terms. The basic steps of both indexing and query processing discussed above are shown in the following figure.
</p> </p>


<p> <p align="center">
During the indexing process, search engines convert the documents into inverted lists. An inverted list is for a particular term a sequence of document IDs (and other information which can also be considered as sequences of integers). Search engines often compress the inverted lists before they write them to the persistent storage - disks at a cluster of machines. <img src = "images/search.png" width="600px" />
</p> </p>


<p>
During query processing, given a query of K terms, the search engine often needs to do at least the following things: First, the engine loads inverted lists (related to those terms) from disks to memory. In a distributed environment, it might also involve a large amount of data transmission over network. Kamikaze can reduce the data size and thus the cost of disk and network traffic significantly. Second, the engine finds all documents on the compressed lists that contain most of the terms. This process often requires extremely fast decompression and look-up operations on compressed data, which can be done by Kamikaze in a very efficient way. Finally, the engine calculates the rankings for the matched documents and returns the documents with the highest rankings. Kamikaze has nothing to do with this last step.
</p> </p>


<p> <p>
The basic steps of both indexing and query processing discussed above are shown in the following figure. From the above figure, you can see that Kamikaze is mainly used for compressing inverted lists ( step2) and performing various operations on compressed indices to find matched documents (step6).

<p align="center">
<img src = "images/search.png" width="600px" />
</p> </p>


<p>
During the indexing process, search engines convert the documents into inverted lists. An inverted list is for a particular term a sequence of document IDs (and other information which can also be considered as sequences of integers). Search engines often compress the inverted lists before they write them to the persistent storage - disks at a cluster of machines.
</p> </p>


<p> <p>
From the above figure, you can see that Kamikaze is mainly used for compressing inverted lists ( step2) and performing various operations on compressed indices to find matched documents (step6). During query processing, given a query of K terms, the search engine often needs to do at least the following things: First, the engine loads inverted lists (related to those terms) from disks to memory. In a distributed environment, it might also involve a large amount of data transmission over network. Kamikaze can reduce the data size and thus the cost of disk and network traffic significantly. Second, the engine finds all documents on the compressed lists that contain most of the terms. This process often requires extremely fast decompression and look-up operations on compressed data, which can be done by Kamikaze in a very efficient way. Finally, the engine calculates the rankings for the matched documents and returns the documents with the highest rankings. Kamikaze has nothing to do with this last step.
</p> </p>
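The two Kamikaze-related steps described above (compressing an inverted list, then finding matching documents on the compressed lists) can be illustrated with a minimal sketch. It assumes a generic delta plus variable-byte encoding, which is not necessarily the codec Kamikaze uses, and the names `encode_postings`, `decode_postings`, and `intersect` are hypothetical and not part of Kamikaze's API. It also handles only the simple case of two terms that must both match.

```python
def encode_postings(doc_ids):
    """Compress a strictly increasing list of non-negative document IDs.

    Illustrative scheme only (an assumption, not Kamikaze's codec):
    store the gap between consecutive IDs, 7 bits per byte, with the
    high bit set on every byte except the last byte of each gap.
    """
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev
        if gap < 0:
            raise ValueError("doc_ids must be sorted in increasing order")
        prev = doc_id
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits, more bytes follow
            gap >>= 7
        out.append(gap)  # final byte of this gap, high bit clear
    return bytes(out)


def decode_postings(data):
    """Lazily yield document IDs from a byte string made by encode_postings."""
    doc_id = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # gap continues in the next byte
        else:
            doc_id += gap  # gap complete: recover the absolute ID
            yield doc_id
            gap = 0
            shift = 0


def intersect(compressed_a, compressed_b):
    """Return the document IDs present in both compressed lists.

    Both lists are decoded on the fly and merged, so neither list is
    ever fully materialized in memory.
    """
    it_a = decode_postings(compressed_a)
    it_b = decode_postings(compressed_b)
    a = next(it_a, None)
    b = next(it_b, None)
    matches = []
    while a is not None and b is not None:
        if a == b:
            matches.append(a)
            a = next(it_a, None)
            b = next(it_b, None)
        elif a < b:
            a = next(it_a, None)
        else:
            b = next(it_b, None)
    return matches
```

In this sketch, small gaps between neighboring document IDs take a single byte, which is why dense inverted lists shrink well under gap-based encodings. A production library would typically add block-wise codecs and skip structures to speed up look-ups on long lists.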


<p> <p>
