Optimize memory allocation

A number of memory allocation optimizations have been implemented. Most optimizations reduce contention caused by synchronization between threads during allocation and deallocation of memory. Most notably:

* Synchronization of memory management in scheduler specific allocator instances has been rewritten to use lock-free synchronization.
* Synchronization of memory management in scheduler specific pre-allocators has been rewritten to use lock-free synchronization.
* The 'mseg_alloc' memory segment allocator now uses scheduler specific instances instead of one instance. Apart from reducing contention, this also ensures that memory allocators always create memory segments on the local NUMA node on a NUMA system.
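
The documentation change in this commit describes the resulting default layout as NoSchedulers+1 allocator instances: one lock-free instance per scheduler plus a common instance for all other threads. The mapping from thread to instance can be sketched as below. This is only an illustration under assumptions: the names `instance_count`, `pick_instance`, and `COMMON_INSTANCE`, as well as the convention that id 0 denotes a non-scheduler thread, are hypothetical and not taken from the ERTS sources.

```c
#include <assert.h>

/* Hypothetical index of the instance shared by all non-scheduler threads. */
enum { COMMON_INSTANCE = 0 };

/* One instance per scheduler plus one common instance. */
static int instance_count(int no_schedulers)
{
    assert(no_schedulers >= 1);
    return no_schedulers + 1;
}

/*
 * Assumed convention: scheduler threads carry ids 1..no_schedulers,
 * every other thread carries an id outside that range (for example 0).
 * A scheduler gets an instance of its own; all other threads share
 * COMMON_INSTANCE.
 */
static int pick_instance(int scheduler_id, int no_schedulers)
{
    assert(no_schedulers >= 1);
    if (scheduler_id >= 1 && scheduler_id <= no_schedulers)
        return scheduler_id;
    return COMMON_INSTANCE;
}
```

With four schedulers this gives five instances; scheduler 3 maps to instance 3, and a non-scheduler thread maps to instance 0. Because each scheduler-owned instance has a single owning thread, the owner does not contend with other schedulers on the allocation path, which is the contention reduction the commit message refers to.
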
erlang · Nov 13, 2011 · a67e91e
1 parent 55358c5
commit a67e91e
Showing 61 changed files with 6,302 additions and 2,153 deletions.
diff --git a/erts/configure.in b/erts/configure.in
@@ -259,13 +259,6 @@ AS_HELP_STRING([--enable-m32-build],
   esac
 ],enable_m32_build=no)
 
-AC_ARG_ENABLE(fixalloc,
-AS_HELP_STRING([--disable-fixalloc], [disable the use of fix_alloc]))
-if test x${enable_fixalloc} = xno ; then
-  AC_DEFINE(NO_FIX_ALLOC,[],
-	    [Define if you don't want the fix allocator in Erlang])
-fi
-
 AC_SUBST(PERFCTR_PATH)
 AC_ARG_WITH(perfctr,
 AS_HELP_STRING([--with-perfctr=PATH],

diff --git a/erts/doc/src/erts_alloc.xml b/erts/doc/src/erts_alloc.xml
@@ -58,11 +58,8 @@
       <item>Allocator used for memory blocks that are expected to be
        long-lived, for example Erlang code.</item>
       <tag><c>fix_alloc</c></tag>
-      <item>A very fast allocator used for some fix-sized
-       data. <c>fix_alloc</c> manages a set of memory pools from
-       which memory blocks are handed out. <c>fix_alloc</c>
-       allocates memory pools from <c>ll_alloc</c>. Memory pools
-       that have been allocated are never deallocated.</item>
+      <item>A fast allocator used for some frequently used
+       fixed size data types.</item>
       <tag><c>std_alloc</c></tag>
       <item>Allocator used for most memory blocks not allocated via any of
        the other allocators described above.</item>
@@ -83,7 +80,7 @@
       where only small blocks are placed. Currently this allocator is
       disabled by default.</item>
     </taglist>
-    <p><c>sys_alloc</c> and <c>fix_alloc</c> are always enabled and
+    <p><c>sys_alloc</c> is always enabled and
       cannot be disabled. <c>mseg_alloc</c> is always enabled if it is
       available and an allocator that uses it is enabled. All other
       allocators can be <seealso marker="#M_e">enabled or disabled</seealso>.
@@ -104,7 +101,7 @@
     <marker id="alloc_util"></marker>
     <title>The alloc_util framework</title>
     <p>Internally a framework called <c>alloc_util</c> is used for
-      implementing allocators. <c>sys_alloc</c>, <c>fix_alloc</c>, and
+      implementing allocators. <c>sys_alloc</c>, and
       <c>mseg_alloc</c> do not use this framework; hence, the
       following does <em>not</em> apply to them.</p>
     <p>An allocator manages multiple areas, called carriers, in which
@@ -212,6 +209,14 @@
 	  This since it will only cause problems for other allocators.</p>
       </item>
     </taglist>
+    <p>Apart from the ordinary allocators described above a number of
+       pre-allocators are used for some specific data types. These
+       pre-allocators pre-allocate a fixed amount of memory for certain data
+       types when the run-time system starts. As long as there are available
+       pre-allocated memory, it will be used. When no pre-allocated memory is
+       available, memory will be allocated in ordinary allocators. These
+       pre-allocators are typically much faster than the ordinary allocators,
+       but can only satisfy a limited amount of requests.</p>
   </section>
 
   <note><p>
@@ -272,18 +277,6 @@
        Max cached segments. The maximum number of memory segments
        stored in the memory segment cache. Valid range is
        0-30. Default value is 5.</item>
-      <tag><marker id="MMcci"><c><![CDATA[+MMcci <time>]]></c></marker></tag>
-      <item>
-       Cache check interval (in milliseconds). The memory segment
-       cache is checked for segments to destroy at an interval
-       determined by this parameter. Default value is 1000.</item>
-    </taglist>
-    <p>The following flags are available for configuration of
-      <c>fix_alloc</c>:</p>
-    <taglist>
-      <tag><marker id="MFe"><c>+MFe true</c></marker></tag>
-      <item>
-       Enable <c>fix_alloc</c>. Note: <c>fix_alloc</c> cannot be disabled.</item>
     </taglist>
     <p>The following flags are available for configuration of
       <c>sys_alloc</c>:</p>
@@ -322,7 +315,7 @@
        based on <c>alloc_util</c>. If <c>u</c> is used as subsystem
        identifier (i.e., <c><![CDATA[<S> = u]]></c>) all allocators based on
        <c>alloc_util</c> will be effected. If <c>B</c>, <c>D</c>, <c>E</c>,
-       <c>H</c>, <c>L</c>, <c>R</c>, <c>S</c>, or <c>T</c> is used as
+        <c>F</c>, <c>H</c>, <c>L</c>, <c>R</c>, <c>S</c>, or <c>T</c> is used as
        subsystem identifier, only the specific allocator identified will be
        effected:</p>
     <taglist>
@@ -441,26 +434,23 @@
        kilobytes). See <seealso marker="#mseg_mbc_sizes">the description
        on how sizes for mseg_alloc multiblock carriers are decided</seealso>
        in "the <c>alloc_util</c> framework" section.</item>
-      <tag><marker id="M_t"><c><![CDATA[+M<S>t true|false|<amount>]]></c></marker></tag>
+      <tag><marker id="M_t"><c><![CDATA[+M<S>t true|false]]></c></marker></tag>
       <item>
-        <p>Multiple, thread specific instances of the allocator.
-           This option will only have any effect on the runtime system
-           with SMP support. Default behaviour on the runtime system with
-           SMP support (<c>N</c> equals the number of scheduler threads):</p>
+       Multiple, thread specific instances of the allocator.
+       This option will only have any effect on the runtime system
+       with SMP support. Default behaviour on the runtime system with
+       SMP support:
        <taglist>
-         <tag><c>temp_alloc</c></tag>
-	 <item><c>N + 1</c> instances.</item>
          <tag><c>ll_alloc</c></tag>
 	 <item><c>1</c> instance.</item>
          <tag>Other allocators</tag>
-	 <item><c>N</c> instances when <c>N</c> is less than or equal to
-	 <c>16</c>. <c>16</c> instances when <c>N</c> is greater than
-	 <c>16</c>.</item>
+	 <item><c>NoSchedulers+1</c> instances. Each scheduler will use
+	 a lock-free instance of its own and other threads will use
+	 a common instance.</item>
        </taglist>
-       <p><c>temp_alloc</c> will always use <c>N + 1</c> instances when
-          this option has been enabled regardless of the amount passed.
-          Other allocators will use the same amount of instances as the
-          amount passed as long as it isn't greater than <c>N</c>.</p>
+       It was previously (before ERTS version 5.9) possible to configure
+       a smaller amount of thread specific instances than schedulers.
+       This is, however, not possible any more.
       </item>
     </taglist>
     <p>Currently the following flags are available for configuration of
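
The paragraph added to erts_alloc.xml above describes pre-allocators: a fixed amount of memory is reserved at start-up, requests are served from it while blocks remain, and further requests fall back to an ordinary allocator. A minimal single-threaded sketch of that behaviour follows. It is an assumption-laden illustration, not the code in erl_sched_spec_pre_alloc.c: all names, the pool size, and the block size are hypothetical, `malloc` stands in for the ordinary allocator, and the lock-free, scheduler specific aspects are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PRE_ALLOC_BLOCKS 4   /* hypothetical number of pre-allocated blocks */
#define BLOCK_SIZE 64        /* hypothetical fixed block size in bytes */

/* A block is either on the free list (next) or in use (data). */
typedef union block {
    union block *next;
    unsigned char data[BLOCK_SIZE];
} block_t;

typedef struct {
    block_t pool[PRE_ALLOC_BLOCKS]; /* memory reserved up front */
    block_t *free_list;             /* unused pre-allocated blocks */
} pre_alloc_t;

/* Put every pre-allocated block on the free list. */
static void pre_alloc_init(pre_alloc_t *pa)
{
    assert(pa != NULL);
    pa->free_list = NULL;
    for (int i = PRE_ALLOC_BLOCKS - 1; i >= 0; i--) {
        pa->pool[i].next = pa->free_list;
        pa->free_list = &pa->pool[i];
    }
}

/* Nonzero if p points into the pre-allocated pool. */
static int from_pool(const pre_alloc_t *pa, const void *p)
{
    uintptr_t addr = (uintptr_t) p;
    uintptr_t lo = (uintptr_t) &pa->pool[0];
    uintptr_t hi = lo + sizeof(pa->pool);
    return addr >= lo && addr < hi;
}

/* Serve from the pool while blocks remain, otherwise fall back. */
static void *pre_alloc_get(pre_alloc_t *pa)
{
    block_t *b = pa->free_list;
    if (b != NULL) {
        pa->free_list = b->next;
        return b;
    }
    return malloc(sizeof(block_t)); /* stand-in for an ordinary allocator */
}

/* Return a block to wherever it came from. */
static void pre_alloc_put(pre_alloc_t *pa, void *p)
{
    if (p == NULL)
        return;
    if (from_pool(pa, p)) {
        block_t *b = p;
        b->next = pa->free_list;
        pa->free_list = b;
    } else {
        free(p);
    }
}
```

The fast path is a single pointer pop from the free list, which is why such a pool can be much faster than a general-purpose allocator, while the fixed pool size is what limits the number of requests it can satisfy.
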

diff --git a/erts/emulator/Makefile.in b/erts/emulator/Makefile.in
@@ -725,7 +725,7 @@ RUN_OBJS = \
 	$(OBJDIR)/external.o		$(OBJDIR)/dist.o \
 	$(OBJDIR)/binary.o		$(OBJDIR)/erl_db.o \
 	$(OBJDIR)/erl_db_util.o		$(OBJDIR)/erl_db_hash.o \
-	$(OBJDIR)/erl_db_tree.o		$(OBJDIR)/fix_alloc.o \
+	$(OBJDIR)/erl_db_tree.o		$(OBJDIR)/erl_thr_progress.o \
 	$(OBJDIR)/big.o			$(OBJDIR)/hash.o \
 	$(OBJDIR)/index.o		$(OBJDIR)/atom.o \
 	$(OBJDIR)/module.o		$(OBJDIR)/export.o \
@@ -742,7 +742,8 @@ RUN_OBJS = \
 	$(OBJDIR)/erl_bif_re.o		$(OBJDIR)/erl_unicode.o \
 	$(OBJDIR)/packet_parser.o	$(OBJDIR)/safe_hash.o \
 	$(OBJDIR)/erl_zlib.o		$(OBJDIR)/erl_nif.o \
-	$(OBJDIR)/erl_bif_binary.o      $(OBJDIR)/erl_ao_firstfit_alloc.o
+	$(OBJDIR)/erl_bif_binary.o      $(OBJDIR)/erl_ao_firstfit_alloc.o \
+	$(OBJDIR)/erl_sched_spec_pre_alloc.o
 
 ifeq ($(TARGET),win32)
 DRV_OBJS = \

diff --git a/erts/emulator/beam/atom.names b/erts/emulator/beam/atom.names
@@ -69,6 +69,8 @@ atom ac
 atom active
 atom all
 atom all_but_first
+atom alloc_info
+atom alloc_sizes
 atom allocated
 atom allocated_areas
 atom allocator
@@ -553,5 +555,6 @@ atom warning_msg
 atom wordsize
 atom write_concurrency
 atom xor
+atom x86
 atom yes
 atom yield
diff --git a/erts/emulator/beam/bif.tab b/erts/emulator/beam/bif.tab
@@ -160,10 +160,6 @@ bif erlang:md5_update/2
 bif 'erl.util.crypt.md5':update/2	ebif_md5_update_2
 bif erlang:md5_final/1
 bif 'erl.util.crypt.md5':final/1	ebif_md5_final_1
-bif erlang:memory/0
-bif 'erl.lang':memory/0			ebif_memory_0
-bif erlang:memory/1
-bif 'erl.lang':memory/1			ebif_memory_1
 bif erlang:module_loaded/1
 bif 'erl.system.code':is_loaded/1	ebif_is_loaded_1 module_loaded_1
 bif erlang:function_exported/3

diff --git a/erts/emulator/beam/erl_afit_alloc.c b/erts/emulator/beam/erl_afit_alloc.c
@@ -65,16 +65,20 @@ erts_afalc_start(AFAllctr_t *afallctr,
 		 AFAllctrInit_t *afinit,
 		 AllctrInit_t *init)
 {
-    AFAllctr_t nulled_state = {{0}};
-    /* {{0}} is used instead of {0}, in order to avoid (an incorrect) gcc
-       warning. gcc warns if {0} is used as initializer of a struct when
-       the first member is a struct (not if, for example, the third member
-       is a struct). */
+    struct {
+	int dummy;
+	AFAllctr_t allctr;
+    } zero = {0};
+    /* The struct with a dummy element first is used in order to avoid (an
+       incorrect) gcc warning. gcc warns if {0} is used as initializer of
+       a struct when the first member is a struct (not if, for example,
+       the third member is a struct). */
+
     Allctr_t *allctr = (Allctr_t *) afallctr;
 
-    init->sbmbct = 0; /* Small mbc not supported by afit */
+    sys_memcpy((void *) afallctr, (void *) &zero.allctr, sizeof(AFAllctr_t));
 
-    sys_memcpy((void *) afallctr, (void *) &nulled_state, sizeof(AFAllctr_t));
+    init->sbmbct = 0; /* Small mbc not supported by afit */
 
     allctr->mbc_header_size		= sizeof(Carrier_t);
     allctr->min_mbc_size		= MIN_MBC_SZ;