Skip to content

Commit

Permalink
Speed up decompression by not needing a lookup table for literal items.
Browse files Browse the repository at this point in the history
Looking up into and decoding the values from char_table has long shown up as a
hotspot in the decompressor. While it turns out that it's hard to make a more
efficient decoder for the copy ops, the literals are simple enough that we can
decode them without needing a table lookup. (This means that 1/4 of the table
is now unused, although that in itself doesn't buy us anything.)

The gains are small, but definitely present; some tests win as much as 10%,
but 1-4% is more typical. These results are from Core i7, in 64-bit mode;
Core 2 and Opteron show similar results. (I've run with more iterations
than unusual to make sure the smaller gains don't drown entirely in noise.)

  Benchmark            Time(ns)    CPU(ns) Iterations
  ---------------------------------------------------
  BM_UFlat/0              74665      74428     182055 1.3GB/s  html      [ +3.1%]
  BM_UFlat/1             714106     711997      19663 940.4MB/s  urls    [ +4.4%]
  BM_UFlat/2               9820       9789    1427115 12.1GB/s  jpg      [ -1.2%]
  BM_UFlat/3              30461      30380     465116 2.9GB/s  pdf       [ +0.8%]
  BM_UFlat/4             301445     300568      46512 1.3GB/s  html4     [ +2.2%]
  BM_UFlat/5              29338      29263     479452 801.8MB/s  cp      [ +1.6%]
  BM_UFlat/6              13004      12970    1000000 819.9MB/s  c       [ +2.1%]
  BM_UFlat/7               4180       4168    3349282 851.4MB/s  lsp     [ +1.3%]
  BM_UFlat/8            1026149    1024000      10000 959.0MB/s  xls     [+10.7%]
  BM_UFlat/9             237441     236830      59072 612.4MB/s  txt1    [ +0.3%]
  BM_UFlat/10            203966     203298      69307 587.2MB/s  txt2    [ +0.8%]
  BM_UFlat/11            627230     625000      22400 651.2MB/s  txt3    [ +0.7%]
  BM_UFlat/12            836188     833979      16787 551.0MB/s  txt4    [ +1.3%]
  BM_UFlat/13            351904     350750      39886 1.4GB/s  bin       [ +3.8%]
  BM_UFlat/14             45685      45562     308370 800.4MB/s  sum     [ +5.9%]
  BM_UFlat/15              5286       5270    2656546 764.9MB/s  man     [ +1.5%]
  BM_UFlat/16             78774      78544     178117 1.4GB/s  pb        [ +4.3%]
  BM_UFlat/17            242270     241345      58091 728.3MB/s  gaviota [ +1.2%]
  BM_UValidate/0          42149      42000     333333 2.3GB/s  html      [ -3.0%]
  BM_UValidate/1         432741     431303      32483 1.5GB/s  urls      [ +7.8%]
  BM_UValidate/2            198        197   71428571 600.7GB/s  jpg     [+16.8%]
  BM_UValidate/3          14560      14521     965517 6.1GB/s  pdf       [ -4.1%]
  BM_UValidate/4         169065     168671      83832 2.3GB/s  html4     [ -2.9%]

R=jeff

Revision created by MOE tool push_codebase.


git-svn-id: https://snappy.googlecode.com/svn/trunk@41 03e5f5b5-db94-4691-08a0-1a8bf15f6143
  • Loading branch information
snappy.mirrorbot@gmail.com committed Jun 3, 2011
1 parent 8efa263 commit 197f3ee
Showing 1 changed file with 15 additions and 5 deletions.
20 changes: 15 additions & 5 deletions snappy.cc
Expand Up @@ -663,13 +663,18 @@ class SnappyDecompressor {
}

const unsigned char c = *(reinterpret_cast<const unsigned char*>(ip++));
const uint32 entry = char_table[c];
const uint32 trailer = LittleEndian::Load32(ip) & wordmask[entry >> 11];
ip += entry >> 11;
const uint32 length = entry & 0xff;

if ((c & 0x3) == LITERAL) {
uint32 literal_length = length + trailer;
uint32 literal_length = c >> 2;
if (PREDICT_FALSE(literal_length >= 60)) {
// Long literal.
const uint32 literal_length_length = literal_length - 59;
literal_length =
LittleEndian::Load32(ip) & wordmask[literal_length_length];
ip += literal_length_length;
}
++literal_length;

uint32 avail = ip_limit_ - ip;
while (avail < literal_length) {
bool allow_fast_path = (avail >= 16);
Expand All @@ -689,6 +694,11 @@ class SnappyDecompressor {
}
ip += literal_length;
} else {
const uint32 entry = char_table[c];
const uint32 trailer = LittleEndian::Load32(ip) & wordmask[entry >> 11];
const uint32 length = entry & 0xff;
ip += entry >> 11;

// copy_offset/256 is encoded in bits 8..10. By just fetching
// those bits, we get copy_offset (since the bit-field starts at
// bit 8).
Expand Down

0 comments on commit 197f3ee

Please sign in to comment.