C++ associative containers
C++ C Makefile Shell M4
Latest commit 4cb9240 Jul 26, 2016 @greg7mdp greg7mdp committed with donovanhide Correct the memory usage claims to take into account allocator overhe…
…ad (#132)

The default memory allocator used (libc_allocator_with_realloc)
necessarily has some overhead, as the size of the block is not passed to
free(). The memory usage claims are updated to take into account an
overhead of up to 16 bytes per malloc'ed block.
Permalink
Failed to load latest commit information.
doc Correct the memory usage claims to take into account allocator overhe… Jul 26, 2016
experimental Tue Jan 31 11:33:04 2012 Google Inc. <google-sparsehash@googlegroups.… Jan 31, 2012
m4 Some more obsolete files: the rest of hashtable_unittest.cc (no longer Jun 24, 2011
packages Thu Feb 23 23:47:18 2012 Google Inc. <google-sparsehash@googlegroups.… Feb 23, 2012
src Update test for large objects with a more reasonable hash function. Jul 23, 2016
vsprojects Tue Jan 31 11:33:04 2012 Google Inc. <google-sparsehash@googlegroups.… Jan 31, 2012
.gitignore Add gitignore Aug 17, 2015
AUTHORS Tue Jan 31 11:33:04 2012 Google Inc. <google-sparsehash@googlegroups.… Jan 31, 2012
COPYING Fri Jan 14 16:53:32 2005 El Goog <opensource@google.com> Mar 22, 2007
ChangeLog Update ChangeLog Oct 12, 2015
INSTALL Updates for current autotools Feb 22, 2012
Makefile.am Fix for issue #81 Feb 22, 2012
Makefile.in Updates for current autotools Feb 22, 2012
NEWS Update NEWS Oct 12, 2015
README Correct the memory usage claims to take into account allocator overhe… Jul 26, 2016
README_windows.txt Tue Jan 31 11:33:04 2012 Google Inc. <google-sparsehash@googlegroups.… Jan 31, 2012
TODO Tue Mar 20 17:29:34 2007 Google Inc. <opensource@google.com> Mar 22, 2007
aclocal.m4 Updates for current autotools Feb 22, 2012
autogen.sh Update the versions of automake and autoconf to Jun 29, 2011
config.guess Updates for current autotools Feb 22, 2012
config.sub Updates for current autotools Feb 22, 2012
configure Updates for current autotools Feb 22, 2012
configure.ac Updates for current autotools Feb 22, 2012
depcomp Updates for current autotools Feb 22, 2012
google-sparsehash.sln Add template_util_unittest to the msvc solution. Dec 15, 2011
install-sh Updates for current autotools Feb 22, 2012
missing Updates for current autotools Feb 22, 2012
sparsehash.sln Tue Jan 31 11:33:04 2012 Google Inc. <google-sparsehash@googlegroups.… Jan 31, 2012

README

This directory contains several hash-map implementations, similar in
API to SGI's hash_map class, but with different performance
characteristics.  sparse_hash_map uses very little space overhead, 1-2
bits per entry.  dense_hash_map is very fast, particulary on lookup.
(sparse_hash_set and dense_hash_set are the set versions of these
routines.)  On the other hand, these classes have requirements that
may not make them appropriate for all applications.

All these implementation use a hashtable with internal quadratic
probing.  This method is space-efficient -- there is no pointer
overhead -- and time-efficient for good hash functions.

COMPILING
---------
To compile test applications with these classes, run ./configure
followed by make.  To install these header files on your system, run
'make install'.  (On Windows, the instructions are different; see
README_windows.txt.)  See INSTALL for more details.

This code should work on any modern C++ system.  It has been tested on
Linux (Ubuntu, Fedora, RedHat, Debian), Solaris 10 x86, FreeBSD 6.0,
OS X 10.3 and 10.4, and Windows under both VC++7 and VC++8.

USING
-----
See the html files in the doc directory for small example programs
that use these classes.  It's enough to just include the header file:

   #include <sparsehash/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ...
   google::sparse_hash_set<int, int> number_mapper;

and use the class the way you would other hash-map implementations.
(Though see "API" below for caveats.)

By default (you can change it via a flag to ./configure), these hash
implementations are defined in the google namespace.

API
---
The API for sparse_hash_map, dense_hash_map, sparse_hash_set, and
dense_hash_set, are a superset of the API of SGI's hash_map class.
See doc/sparse_hash_map.html, et al., for more information about the
API.

The usage of these classes differ from SGI's hash_map, and other
hashtable implementations, in the following major ways:

1) dense_hash_map requires you to set aside one key value as the
   'empty bucket' value, set via the set_empty_key() method.  This
   *MUST* be called before you can use the dense_hash_map.  It is
   illegal to insert any elements into a dense_hash_map whose key is
   equal to the empty-key.

2) For both dense_hash_map and sparse_hash_map, if you wish to delete
   elements from the hashtable, you must set aside a key value as the
   'deleted bucket' value, set via the set_deleted_key() method.  If
   your hash-map is insert-only, there is no need to call this
   method.  If you call set_deleted_key(), it is illegal to insert any
   elements into a dense_hash_map or sparse_hash_map whose key is
   equal to the deleted-key.

3) These hash-map implementation support I/O.  See below.

There are also some smaller differences:

1) The constructor takes an optional argument that specifies the
   number of elements you expect to insert into the hashtable.  This
   differs from SGI's hash_map implementation, which takes an optional
   number of buckets.

2) erase() does not immediately reclaim memory.  As a consequence,
   erase() does not invalidate any iterators, making loops like this
   correct:
      for (it = ht.begin(); it != ht.end(); ++it)
        if (...) ht.erase(it);
   As another consequence, a series of erase() calls can leave your
   hashtable using more memory than it needs to.  The hashtable will
   automatically compact at the next call to insert(), but to
   manually compact a hashtable, you can call
      ht.resize(0)

I/O
---
In addition to the normal hash-map operations, sparse_hash_map can
read and write hashtables to disk.  (dense_hash_map also has the API,
but it has not yet been implemented, and writes will always fail.)

In the simplest case, writing a hashtable is as easy as calling two
methods on the hashtable:
   ht.write_metadata(fp);
   ht.write_nopointer_data(fp);

Reading in this data is equally simple:
   google::sparse_hash_map<...> ht;
   ht.read_metadata(fp);
   ht.read_nopointer_data(fp);

The above is sufficient if the key and value do not contain any
pointers: they are basic C types or agglomorations of basic C types.
If the key and/or value do contain pointers, you can still store the
hashtable by replacing write_nopointer_data() with a custom writing
routine.  See sparse_hash_map.html et al. for more information.

SPARSETABLE
-----------
In addition to the hash-map and hash-set classes, this package also
provides sparsetable.h, an array implementation that uses space
proportional to the number of elements in the array, rather than the
maximum element index.  It uses very little space overhead: 2 to 5
bits per entry.  See doc/sparsetable.html for the API.

RESOURCE USAGE
--------------
* sparse_hash_map has memory overhead of about 4 to 10 bits per 
  hash-map entry, assuming a typical average occupancy of 50%.
* dense_hash_map has a factor of 2-3 memory overhead: if your
  hashtable data takes X bytes, dense_hash_map will use 3X-4X memory
  total.

Hashtables tend to double in size when resizing, creating an
additional 50% space overhead.  dense_hash_map does in fact have a
significant "high water mark" memory use requirement, which is 6 times
the size of hash entries in the table when resizing (when reaching 
50% occupancy, the table resizes to double the previous size, and the 
old table (2x) is copied to the new table (4x)).

sparse_hash_map, however, is written to need very little space
overhead when resizing: only a few bits per hashtable entry.

PERFORMANCE
-----------
You can compile and run the included file time_hash_map.cc to examine
the performance of sparse_hash_map, dense_hash_map, and your native
hash_map implementation on your system.  One test against the
SGI hash_map implementation gave the following timing information for
a simple find() call:
   SGI hash_map:     22 ns
   dense_hash_map:   13 ns
   sparse_hash_map: 117 ns
   SGI map:         113 ns

See doc/performance.html for more detailed charts on resource usage
and performance data.

---
16 March 2005
(Last updated: 12 September 2010)