Initial manual entries for the array cache and how to configure it
tinwelint committed Mar 22, 2012
1 parent 7c7422d commit c831577
Showing 1 changed file with 57 additions and 6 deletions.
kernel/src/docs/ops/cache.txt (63 changes: 57 additions & 6 deletions)
@@ -110,11 +110,22 @@ Object cache
***********

The object cache caches individual nodes and relationships and their properties in a form that is optimized for fast traversal of the graph.
There are two different categories of object caches in Neo4j.

First, there are the reference caches.
Here Neo4j will use as much of the allocated JVM heap as it can for object caching, relying on garbage collection to evict objects from the cache in an LRU manner.
Note, however, that Neo4j is "competing" for heap space with other objects in the same JVM, such as a custom application when deployed in embedded mode, and Neo4j will let the application "win" by using less memory if the application needs more.

The other category is the _atomic array cache_, which gets assigned a certain amount of space in the JVM heap and will purge objects whenever it grows bigger than that.
It uses an +AtomicReferenceArray+ to cache the objects and doesn't rely on garbage collection for evicting objects from the cache.
Here both the competition with other objects in the heap and GC pauses can be better controlled, since the cache is assigned a maximum heap space usage.
The overhead of the array cache is also much smaller, and insert/lookup times are faster, than for the reference caches.

[TIP]
The use of heap memory is subject to the Java garbage collector; depending on the cache type, some tuning might be needed to play well with the GC at large heap sizes.
Therefore, assigning a large heap for Neo4j's sake isn't always the best strategy, as it may lead to long GC pauses.
Instead, leave some RAM for Neo4j's filesystem caches.
These are outside of the heap and under the kernel's direct control, and thus more efficiently managed.

The contents of this cache are objects with a representation geared towards supporting the Neo4j object API and graph traversals.
Reading from this cache is 5 to 10 times faster than reading from the file buffer cache.
@@ -125,14 +136,14 @@ The cached objects are, however, populated lazily.
The properties for a node or relationship are not loaded until properties are accessed for that node or relationship.
String (and array) properties are not loaded until that particular property is accessed.
The relationships for a particular node are also not loaded until the relationships are accessed for that node.


Configuration
~~~~~~~~~~~~~

The main configuration parameter for the object cache is the `cache_type` parameter.
This specifies which cache implementation to use for the object cache.
Note that there will be two cache instances, one for nodes and one for relationships.
The available cache types are:

[options="header",frame="none",cols="<15m,<85"]
@@ -146,9 +157,49 @@
This is the default cache implementation.
| weak | Provides a short life span for cached objects.
Suitable for high throughput applications where a larger portion of the graph than what can fit into memory is frequently accessed.
| strong | This cache will hold on to *all data* that gets loaded and never release it.
Provides good performance if your graph is small enough to fit in memory.
| array | Provides a means of assigning a specific amount of memory to dedicate to caching loaded nodes and relationships.
Small footprint and fast insert/lookup. Should be the best option for most scenarios. See below for how to configure it.
|==========================================
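As a minimal sketch, selecting the array cache could look like this, assuming your deployment reads its database tuning settings from a properties file (for example +conf/neo4j.properties+ in a server installation; in embedded mode the same key can typically be supplied programmatically as configuration):

[source,properties]
----
# Use the array cache for the object cache; this applies to both the
# node cache and the relationship cache instances.
cache_type=array
----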

Array cache configuration
~~~~~~~~~~~~~~~~~~~~~~~~~

Since the array cache operates within a dedicated space in the JVM heap, it can be configured per use case for optimal performance.
There are two aspects to the cache size.

One is the size of the actual array that holds the references to the objects. It is specified as a fraction of the heap; for example, specifying +1+ will let the array itself take up 1% of the entire heap.
Increasing the array size will reduce the chance of collisions (positions in the array are calculated from a hash based on the object id), at the expense of more heap being used for it.
More collisions mean more redundant loading of objects from the low-level cache into the high-level cache.

[options="header",frame="none",cols="<15m,<85"]
|==========================================
| `configuration option` | Description (what it controls) | Example value
| node_cache_array_fraction | Fraction of the heap to dedicate to the array holding the nodes in the cache. | 1
| relationship_cache_array_fraction | Fraction of the heap to dedicate to the array holding the relationships in the cache. | 2
|==========================================
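Continuing the assumed properties-file format from above, the example values in the table would be written as shown below; the heap size in the comments is only an illustration of the percentages described earlier:

[source,properties]
----
# Reference array sizes, as a percentage of the JVM heap.
# With a 4G heap, 1 would give the node array roughly 40M
# and 2 would give the relationship array roughly 80M.
node_cache_array_fraction=1
relationship_cache_array_fraction=2
----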

The other aspect is the maximum size of all the objects in the cache. It is specified in bytes, for example +500M+ for 500 megabytes.
Right before the maximum size is reached, a +purge+ is performed in which (currently) random objects are evicted from the cache until the cache size drops below 90% of the maximum size.
The optimal setting for the maximum size depends on the size of your graph.
If your graph is small, set it just large enough to fit the entire graph, so that the rest of the heap can be used for other things.
If your graph is large, set it to a comfortable size where most, or at least a decent portion, of your graph fits in it, while still leaving room for other objects in the heap.
The amount of space to "give" to those other objects depends on how high the load is, so that garbage collection is able to keep up.

[options="header",frame="none",cols="<15m,<85"]
|==========================================
| `configuration option` | Description (what it controls) | Example value
| node_cache_size | Maximum size of the heap memory to dedicate to the cached nodes. | 2G
| relationship_cache_size | Maximum size of the heap memory to dedicate to the cached relationships. | 800M
|==========================================
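In the same assumed format, the example sizes from the table would look as follows; the values are illustrative and should be chosen according to the sizing advice above, so that the caches and the rest of your application both fit in the heap:

[source,properties]
----
# Maximum heap memory to spend on cached nodes and relationships.
# When a cache approaches its maximum, a purge evicts objects until
# its size is below 90% of the configured value.
node_cache_size=2G
relationship_cache_size=800M
----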

There is also an option for controlling whether, and how often, statistics about the array cache are logged, as an entry point for diagnosing the behavior of the cache with the current configuration.

[options="header",frame="none",cols="<15m,<85"]
|==========================================
| `configuration option` | Description | Example value
| array_cache_min_log_interval | Minimum time between statistical log messages about the array cache. | 30s (for 30 seconds) or 4m (for 4 minutes)
|==========================================
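And a corresponding sketch for the logging option, again in the assumed properties-file format:

[source,properties]
----
# Log statistics about the array cache at most once every 30 seconds.
array_cache_min_log_interval=30s
----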

You can read about references and relevant JVM settings for Sun HotSpot here:
