ClueWeb12 B13 Comparisons

What follows is an initial comparison of selected information retrieval systems on the ClueWeb12 B13 collection using scripts provided by authors/leading contributors of those systems. The systems are listed in alphabetical order.


Two metrics for indexing are reported below: the size of the generated index, and the time taken to generate that index.

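| System | Type | Size | Time | Terms | Postings | Tokens |
|--------|------|------|------|-------|----------|--------|
| ATIRE | Count | 42.4 GB | 3h 03m | | | |
| ATIRE | Count + Quantized | 53.4 GB | 4h 25m | | | |
| JASS | | 83.2 GB | ATIRE Quantized + 32m | | | |
| MG4J | Count | 17 GB | 2h 38m | 133M | 12.7G | |
| MG4J | Position | 58 GB | 3h 20m | 133M | 12.7G | 33.8G |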
  • The quantized index pre-calculates the BM25 scores at indexing time and stores them in place of term frequencies. More about the quantization in ATIRE can be found in Crane et al. (2013).
    • The quantization is performed single-threaded, although it could easily be parallelized.
  • Neither index was stemmed.
  • Both indexes were pruned of SGML tags, which are used for search-time features that are typically unused.
  • In both indexes, postings lists are stored impact-ordered, with docids delta-encoded and then compressed using variable-byte compression.
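
The storage layout in the last bullet can be illustrated with a minimal Python sketch. It is not ATIRE's code: the function names, the byte convention (high bit marks the final byte of a value) and the per-impact segment layout are assumptions made for illustration only.

```python
from itertools import groupby


def vbyte_encode(n):
    """Encode a non-negative integer in 7-bit groups, low-order group first.

    Assumed convention: the high bit is set on the final byte of a value.
    """
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by concatenating vbyte_encode outputs."""
    values, current, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:  # final byte of this value
            values.append(current | ((byte & 0x7F) << shift))
            current, shift = 0, 0
        else:
            current |= byte << shift
            shift += 7
    return values


def encode_impact_ordered(postings):
    """Turn (docid, impact) pairs into (impact, bytes) segments, highest impact first.

    Within a segment, docids are sorted, delta-encoded (gaps), then vbyte-compressed.
    """
    ordered = sorted(postings, key=lambda p: (-p[1], p[0]))
    segments = []
    for impact, group in groupby(ordered, key=lambda p: p[1]):
        previous, encoded = 0, bytearray()
        for docid, _ in group:
            encoded += vbyte_encode(docid - previous)
            previous = docid
        segments.append((impact, bytes(encoded)))
    return segments


def decode_segment(data):
    """Undo the delta encoding by keeping a running sum of the decoded gaps."""
    docids, running = [], 0
    for gap in vbyte_decode(data):
        running += gap
        docids.append(running)
    return docids
```

Delta encoding pays off because gaps between sorted docids are usually much smaller than the docids themselves, so most gaps fit in one or two bytes.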


Both retrieval efficiency (query latency) and effectiveness (MAP@1000) were measured on two query sets: topics 201-250 and topics 251-300.

Retrieval Models

  • ATIRE uses a modified version of BM25, described here.
  • Searching was done using top-k search, also described in the above paper.
    • This is not early termination: all documents for all terms in the query are still scored.
  • BM25 parameters were set to the ATIRE defaults: k1=0.9, b=0.4.
  • Only stopping of tags was performed; this has no effect on search.
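
To make the distinction between top-k search and early termination concrete, here is a minimal Python sketch. It uses a common textbook form of BM25 rather than ATIRE's modified variant, and the in-memory index layout (`term -> {docid: tf}`) and all function names are assumptions for illustration. Every posting of every query term is scored; only the final selection is limited to k results.

```python
import heapq
import math
from collections import defaultdict

K1, B = 0.9, 0.4  # the ATIRE defaults quoted above


def bm25_term_score(tf, df, doc_len, avg_doc_len, num_docs):
    """Textbook-style BM25 contribution of one term to one document.

    Assumption: ATIRE's modified BM25 may differ in its IDF and normalization.
    """
    idf = math.log(num_docs / df)
    norm = K1 * ((1 - B) + B * doc_len / avg_doc_len)
    return idf * (tf * (K1 + 1)) / (tf + norm)


def search(query_terms, index, doc_lens, k):
    """Score all postings for all query terms, then keep the k best documents.

    index: term -> {docid: term frequency}; doc_lens: docid -> document length.
    """
    num_docs = len(doc_lens)
    avg_len = sum(doc_lens.values()) / num_docs
    accumulators = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, {})
        for docid, tf in postings.items():
            accumulators[docid] += bm25_term_score(
                tf, len(postings), doc_lens[docid], avg_len, num_docs
            )
    # No posting was skipped above; the heap only limits what is returned.
    return heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])


if __name__ == "__main__":
    toy_index = {"web": {1: 2, 2: 1}, "crawl": {2: 3}}  # hypothetical data
    toy_lens = {1: 10, 2: 20, 3: 30}
    print(search(["web", "crawl"], toy_index, toy_lens, k=1))
```

An early-terminating system would instead stop processing postings once some budget or bound is reached, which is why it can trade effectiveness for latency.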

See the description for the Gov2 runs.

Retrieval Latency

The table below shows the average search time across queries by query set. The search times were taken from each system's internal reporting.

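| System | Model | Index | Topics 201-250 | Topics 251-300 |
|--------|-------|-------|----------------|----------------|
| ATIRE | BM25 | Count | 809ms | 788ms |
| ATIRE | Quantized BM25 | Count + Quantized | 290ms | 296ms |
| JASS | | | 222ms | 261ms |
| JASS | 5M Postings | | 103ms | 88ms |
| MG4J | BM25 | Count | 706ms | 570ms |
| MG4J | Model B | Count | 60ms | 73ms |
| MG4J | Model B+ | Position | 122ms | 258ms |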

Retrieval Effectiveness

The systems generated run files to be consumed by the trec_eval tool. Each system was evaluated on the top 1000 results for each query, and the table below shows the MAP scores for the systems.

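| System | Model | Index | Topics 201-250 | Topics 251-300 |
|--------|-------|-------|----------------|----------------|
| ATIRE | BM25 | Count | 0.0439 | 0.0196 |
| ATIRE | Quantized BM25 | Count + Quantized | 0.0429 | 0.0201 |
| JASS | | | 0.0429 | 0.0201 |
| JASS | 5M Postings | | 0.0393 | 0.0193 |
| MG4J | BM25 | Count | 0.0410 | 0.0207 |
| MG4J | Model B | Count | 0.0418 | 0.0206 |
| MG4J | Model B+ | Position | 0.0402 | 0.0166 |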
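
For readers unfamiliar with the metric, the following Python sketch shows how MAP over the top 1000 results can be computed. It is an illustration only: trec_eval remains the authoritative implementation, and the data structures, function names and the choice to average over every judged topic with at least one relevant document are assumptions of this sketch.

```python
def average_precision(ranked_docids, relevant, cutoff=1000):
    """Average precision of one ranked list, truncated at `cutoff` results.

    The sum of precision values at each relevant hit is divided by the total
    number of relevant documents, so relevant documents that were never
    retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, docid in enumerate(ranked_docids[:cutoff], start=1):
        if docid in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels, cutoff=1000):
    """run: topic -> ranked docids; qrels: topic -> set of relevant docids.

    Assumption: topics missing from the run score 0 rather than being skipped.
    """
    topics = [t for t in qrels if qrels[t]]
    total = sum(average_precision(run.get(t, []), qrels[t], cutoff) for t in topics)
    return total / len(topics)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the ClueWeb12 runs.
    toy_run = {"201": ["a", "b", "c", "d"], "202": ["x"]}
    toy_qrels = {"201": {"a", "c", "z"}, "202": {"y"}}
    print(mean_average_precision(toy_run, toy_qrels))
```
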
Statistical Analysis

TODO: Need to run statistical analyses.