Optimize memory allocation
A number of memory allocation optimizations have been implemented. Most
optimizations reduce contention caused by synchronization between
threads during allocation and deallocation of memory. Most notably:
* Synchronization of memory management in scheduler specific allocator
  instances has been rewritten to use lock-free synchronization.
* Synchronization of memory management in scheduler specific
  pre-allocators has been rewritten to use lock-free synchronization.
* The 'mseg_alloc' memory segment allocator now uses scheduler specific
  instances instead of one instance. Apart from reducing contention,
  this also ensures that memory allocators always create memory
  segments on the local NUMA node on a NUMA system.
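The commit message says the scheduler specific pre-allocators now use lock-free synchronization, but does not show how. Below is a minimal C11 sketch of one common way to do this. It assumes that each pool has a single owning scheduler thread, which is the only thread that allocates from it, while any thread may free a block back into it. All names (`pre_alloc_t`, `pre_alloc_init`, `pre_alloc_alloc`, `pre_alloc_free`, `in_pool`) are hypothetical. This is an illustration of the technique, not the code added in this commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: one pre-allocated pool per scheduler. Only the
 * owning scheduler allocates from it; any thread may return a block. */
typedef struct block {
    struct block *next;
} block_t;

typedef struct {
    char *pool;                      /* the pre-allocated memory */
    size_t block_size;
    size_t nblocks;
    block_t *local_free;             /* used by the owner thread only */
    _Atomic(block_t *) remote_free;  /* blocks freed by other threads */
} pre_alloc_t;

bool pre_alloc_init(pre_alloc_t *pa, size_t block_size, size_t nblocks)
{
    size_t align = _Alignof(max_align_t);
    assert(nblocks > 0);
    if (block_size < sizeof(block_t))
        block_size = sizeof(block_t);
    block_size = (block_size + align - 1) / align * align; /* keep blocks aligned */
    pa->pool = malloc(block_size * nblocks);
    if (!pa->pool)
        return false;
    pa->block_size = block_size;
    pa->nblocks = nblocks;
    pa->local_free = NULL;
    for (size_t i = nblocks; i-- > 0;) {  /* thread every block onto the free list */
        block_t *b = (block_t *)(pa->pool + i * block_size);
        b->next = pa->local_free;
        pa->local_free = b;
    }
    atomic_init(&pa->remote_free, NULL);
    return true;
}

bool in_pool(const pre_alloc_t *pa, const void *p)
{
    uintptr_t start = (uintptr_t)pa->pool;
    uintptr_t addr = (uintptr_t)p;
    return addr >= start && addr < start + pa->block_size * pa->nblocks;
}

/* Called by the owner thread only. */
void *pre_alloc_alloc(pre_alloc_t *pa)
{
    if (!pa->local_free)  /* take all remotely freed blocks in one atomic step */
        pa->local_free = atomic_exchange(&pa->remote_free, NULL);
    if (pa->local_free) {
        block_t *b = pa->local_free;
        pa->local_free = b->next;
        return b;
    }
    return malloc(pa->block_size);  /* pool exhausted: fall back */
}

/* Callable from any thread; is_owner tells whether the caller owns pa. */
void pre_alloc_free(pre_alloc_t *pa, void *p, bool is_owner)
{
    if (!in_pool(pa, p)) {
        free(p);  /* block came from the fallback path */
        return;
    }
    block_t *b = p;
    if (is_owner) {
        b->next = pa->local_free;
        pa->local_free = b;
        return;
    }
    /* Lock-free push. This is ABA-safe because the owner never pops single
     * entries from remote_free; it only swaps out the whole list. */
    block_t *head = atomic_load(&pa->remote_free);
    do {
        b->next = head;
    } while (!atomic_compare_exchange_weak(&pa->remote_free, &head, b));
}

void pre_alloc_destroy(pre_alloc_t *pa)
{
    free(pa->pool);
}
```

The owner allocates and frees without any atomic operations on its fast path; other threads only ever push onto `remote_free`. When the pool runs dry, requests fall back to `malloc()`, which mirrors the documented behaviour of falling back to the ordinary allocators once pre-allocated memory is exhausted.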
rickard-green committed Nov 13, 2011
1 parent 55358c5 commit a67e91e
Showing 61 changed files with 6,302 additions and 2,153 deletions.
7 changes: 0 additions & 7 deletions erts/configure.in
@@ -259,13 +259,6 @@ AS_HELP_STRING([--enable-m32-build],
esac
],enable_m32_build=no)

AC_ARG_ENABLE(fixalloc,
AS_HELP_STRING([--disable-fixalloc], [disable the use of fix_alloc]))
if test x${enable_fixalloc} = xno ; then
AC_DEFINE(NO_FIX_ALLOC,[],
[Define if you don't want the fix allocator in Erlang])
fi

AC_SUBST(PERFCTR_PATH)
AC_ARG_WITH(perfctr,
AS_HELP_STRING([--with-perfctr=PATH],
@@ -58,11 +58,8 @@
<item>Allocator used for memory blocks that are expected to be
long-lived, for example Erlang code.</item>
<tag><c>fix_alloc</c></tag>
<item>A very fast allocator used for some fix-sized
data. <c>fix_alloc</c> manages a set of memory pools from
which memory blocks are handed out. <c>fix_alloc</c>
allocates memory pools from <c>ll_alloc</c>. Memory pools
that have been allocated are never deallocated.</item>
<item>A fast allocator used for some frequently used
fixed size data types.</item>
<tag><c>std_alloc</c></tag>
<item>Allocator used for most memory blocks not allocated via any of
the other allocators described above.</item>
@@ -83,7 +80,7 @@
where only small blocks are placed. Currently this allocator is
disabled by default.</item>
</taglist>
<p><c>sys_alloc</c> and <c>fix_alloc</c> are always enabled and
<p><c>sys_alloc</c> is always enabled and
cannot be disabled. <c>mseg_alloc</c> is always enabled if it is
available and an allocator that uses it is enabled. All other
allocators can be <seealso marker="#M_e">enabled or disabled</seealso>.
@@ -104,7 +101,7 @@
<marker id="alloc_util"></marker>
<title>The alloc_util framework</title>
<p>Internally a framework called <c>alloc_util</c> is used for
implementing allocators. <c>sys_alloc</c>, <c>fix_alloc</c>, and
implementing allocators. <c>sys_alloc</c>, and
<c>mseg_alloc</c> do not use this framework; hence, the
following does <em>not</em> apply to them.</p>
<p>An allocator manages multiple areas, called carriers, in which
@@ -212,6 +209,14 @@
This since it will only cause problems for other allocators.</p>
</item>
</taglist>
<p>Apart from the ordinary allocators described above a number of
pre-allocators are used for some specific data types. These
pre-allocators pre-allocate a fixed amount of memory for certain data
types when the run-time system starts. As long as there are available
pre-allocated memory, it will be used. When no pre-allocated memory is
available, memory will be allocated in ordinary allocators. These
pre-allocators are typically much faster than the ordinary allocators,
but can only satisfy a limited amount of requests.</p>
</section>

<note><p>
@@ -272,18 +277,6 @@
Max cached segments. The maximum number of memory segments
stored in the memory segment cache. Valid range is
0-30. Default value is 5.</item>
<tag><marker id="MMcci"><c><![CDATA[+MMcci <time>]]></c></marker></tag>
<item>
Cache check interval (in milliseconds). The memory segment
cache is checked for segments to destroy at an interval
determined by this parameter. Default value is 1000.</item>
</taglist>
<p>The following flags are available for configuration of
<c>fix_alloc</c>:</p>
<taglist>
<tag><marker id="MFe"><c>+MFe true</c></marker></tag>
<item>
Enable <c>fix_alloc</c>. Note: <c>fix_alloc</c> cannot be disabled.</item>
</taglist>
<p>The following flags are available for configuration of
<c>sys_alloc</c>:</p>
@@ -322,7 +315,7 @@
based on <c>alloc_util</c>. If <c>u</c> is used as subsystem
identifier (i.e., <c><![CDATA[<S> = u]]></c>) all allocators based on
<c>alloc_util</c> will be effected. If <c>B</c>, <c>D</c>, <c>E</c>,
<c>H</c>, <c>L</c>, <c>R</c>, <c>S</c>, or <c>T</c> is used as
<c>F</c>, <c>H</c>, <c>L</c>, <c>R</c>, <c>S</c>, or <c>T</c> is used as
subsystem identifier, only the specific allocator identified will be
effected:</p>
<taglist>
@@ -441,26 +434,23 @@
kilobytes). See <seealso marker="#mseg_mbc_sizes">the description
on how sizes for mseg_alloc multiblock carriers are decided</seealso>
in "the <c>alloc_util</c> framework" section.</item>
<tag><marker id="M_t"><c><![CDATA[+M<S>t true|false|<amount>]]></c></marker></tag>
<tag><marker id="M_t"><c><![CDATA[+M<S>t true|false]]></c></marker></tag>
<item>
<p>Multiple, thread specific instances of the allocator.
This option will only have any effect on the runtime system
with SMP support. Default behaviour on the runtime system with
SMP support (<c>N</c> equals the number of scheduler threads):</p>
Multiple, thread specific instances of the allocator.
This option will only have any effect on the runtime system
with SMP support. Default behaviour on the runtime system with
SMP support:
<taglist>
<tag><c>temp_alloc</c></tag>
<item><c>N + 1</c> instances.</item>
<tag><c>ll_alloc</c></tag>
<item><c>1</c> instance.</item>
<tag>Other allocators</tag>
<item><c>N</c> instances when <c>N</c> is less than or equal to
<c>16</c>. <c>16</c> instances when <c>N</c> is greater than
<c>16</c>.</item>
<item><c>NoSchedulers+1</c> instances. Each scheduler will use
a lock-free instance of its own and other threads will use
a common instance.</item>
</taglist>
<p><c>temp_alloc</c> will always use <c>N + 1</c> instances when
this option has been enabled regardless of the amount passed.
Other allocators will use the same amount of instances as the
amount passed as long as it isn't greater than <c>N</c>.</p>
It was previously (before ERTS version 5.9) possible to configure
a smaller amount of thread specific instances than schedulers.
This is, however, not possible any more.
</item>
</taglist>
<p>Currently the following flags are available for configuration of
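The updated `+M<S>t` documentation says that, by default, allocators use `NoSchedulers+1` instances: each scheduler uses a lock-free instance of its own and all other threads share a common instance. A minimal C11 sketch of how a thread could pick its instance follows. It assumes a hypothetical thread-local scheduler number (`current_scheduler_no`, 0 on non-scheduler threads), and all other names are hypothetical too. A scheduler instance needs no synchronization here because only its owner touches it; only the shared instance is guarded, by a simple spin lock. Freeing memory that belongs to another thread's instance, which a real allocator must handle, is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-instance state; a real allocator would manage carriers. */
typedef struct {
    bool shared;          /* true only for the common instance */
    atomic_bool locked;   /* spin lock, used only when shared */
    size_t allocated;     /* bytes handed out through this instance */
} instance_t;

typedef struct {
    int no_schedulers;
    instance_t *inst;     /* no_schedulers + 1 instances */
} multi_alloc_t;

/* Hypothetical: 1..no_schedulers on scheduler threads, 0 on other threads. */
_Thread_local int current_scheduler_no = 0;

bool multi_alloc_init(multi_alloc_t *ma, int no_schedulers)
{
    ma->no_schedulers = no_schedulers;
    ma->inst = malloc(((size_t)no_schedulers + 1) * sizeof(instance_t));
    if (!ma->inst)
        return false;
    for (int i = 0; i <= no_schedulers; i++) {
        ma->inst[i].shared = (i == 0);  /* instance 0 is the common one */
        atomic_init(&ma->inst[i].locked, false);
        ma->inst[i].allocated = 0;
    }
    return true;
}

static instance_t *my_instance(multi_alloc_t *ma)
{
    int ix = current_scheduler_no;
    assert(ix >= 0 && ix <= ma->no_schedulers);
    return &ma->inst[ix];
}

void *multi_alloc(multi_alloc_t *ma, size_t size)
{
    instance_t *in = my_instance(ma);
    if (in->shared)  /* several non-scheduler threads may race here */
        while (atomic_exchange(&in->locked, true))
            ;        /* spin until the lock is free */
    void *p = malloc(size);  /* stand-in for carving a block from a carrier */
    if (p)
        in->allocated += size;
    if (in->shared)
        atomic_store(&in->locked, false);
    return p;
}

void multi_alloc_destroy(multi_alloc_t *ma)
{
    free(ma->inst);
}
```

Giving every scheduler its own instance removes contention between schedulers entirely, while the single shared instance keeps the number of instances bounded no matter how many non-scheduler threads exist.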
5 changes: 3 additions & 2 deletions erts/emulator/Makefile.in
@@ -725,7 +725,7 @@ RUN_OBJS = \
$(OBJDIR)/external.o $(OBJDIR)/dist.o \
$(OBJDIR)/binary.o $(OBJDIR)/erl_db.o \
$(OBJDIR)/erl_db_util.o $(OBJDIR)/erl_db_hash.o \
$(OBJDIR)/erl_db_tree.o $(OBJDIR)/fix_alloc.o \
$(OBJDIR)/erl_db_tree.o $(OBJDIR)/erl_thr_progress.o \
$(OBJDIR)/big.o $(OBJDIR)/hash.o \
$(OBJDIR)/index.o $(OBJDIR)/atom.o \
$(OBJDIR)/module.o $(OBJDIR)/export.o \
@@ -742,7 +742,8 @@ RUN_OBJS = \
$(OBJDIR)/erl_bif_re.o $(OBJDIR)/erl_unicode.o \
$(OBJDIR)/packet_parser.o $(OBJDIR)/safe_hash.o \
$(OBJDIR)/erl_zlib.o $(OBJDIR)/erl_nif.o \
$(OBJDIR)/erl_bif_binary.o $(OBJDIR)/erl_ao_firstfit_alloc.o
$(OBJDIR)/erl_bif_binary.o $(OBJDIR)/erl_ao_firstfit_alloc.o \
$(OBJDIR)/erl_sched_spec_pre_alloc.o

ifeq ($(TARGET),win32)
DRV_OBJS = \
3 changes: 3 additions & 0 deletions erts/emulator/beam/atom.names
@@ -69,6 +69,8 @@ atom ac
atom active
atom all
atom all_but_first
atom alloc_info
atom alloc_sizes
atom allocated
atom allocated_areas
atom allocator
@@ -553,5 +555,6 @@ atom warning_msg
atom wordsize
atom write_concurrency
atom xor
atom x86
atom yes
atom yield
4 changes: 0 additions & 4 deletions erts/emulator/beam/bif.tab
@@ -160,10 +160,6 @@ bif erlang:md5_update/2
bif 'erl.util.crypt.md5':update/2 ebif_md5_update_2
bif erlang:md5_final/1
bif 'erl.util.crypt.md5':final/1 ebif_md5_final_1
bif erlang:memory/0
bif 'erl.lang':memory/0 ebif_memory_0
bif erlang:memory/1
bif 'erl.lang':memory/1 ebif_memory_1
bif erlang:module_loaded/1
bif 'erl.system.code':is_loaded/1 ebif_is_loaded_1 module_loaded_1
bif erlang:function_exported/3
18 changes: 11 additions & 7 deletions erts/emulator/beam/erl_afit_alloc.c
@@ -65,16 +65,20 @@ erts_afalc_start(AFAllctr_t *afallctr,
AFAllctrInit_t *afinit,
AllctrInit_t *init)
{
AFAllctr_t nulled_state = {{0}};
/* {{0}} is used instead of {0}, in order to avoid (an incorrect) gcc
warning. gcc warns if {0} is used as initializer of a struct when
the first member is a struct (not if, for example, the third member
is a struct). */
struct {
int dummy;
AFAllctr_t allctr;
} zero = {0};
/* The struct with a dummy element first is used in order to avoid (an
incorrect) gcc warning. gcc warns if {0} is used as initializer of
a struct when the first member is a struct (not if, for example,
the third member is a struct). */

Allctr_t *allctr = (Allctr_t *) afallctr;

init->sbmbct = 0; /* Small mbc not supported by afit */
sys_memcpy((void *) afallctr, (void *) &zero.allctr, sizeof(AFAllctr_t));

sys_memcpy((void *) afallctr, (void *) &nulled_state, sizeof(AFAllctr_t));
init->sbmbct = 0; /* Small mbc not supported by afit */

allctr->mbc_header_size = sizeof(Carrier_t);
allctr->min_mbc_size = MIN_MBC_SZ;
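The new `erts_afalc_start()` code zeroes the allocator state by copying from a `{0}`-initialized wrapper struct whose first member is a scalar, to sidestep the gcc warning described in its comment (presumably `-Wmissing-braces`). A standalone sketch of the same idiom follows, with hypothetical `base_t`/`derived_t` types standing in for `Allctr_t`/`AFAllctr_t` and a hypothetical `zero_derived()` helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Allctr_t and AFAllctr_t: the "derived"
 * struct embeds the "base" struct as its first member. */
typedef struct {
    int a;
    void *b;
} base_t;

typedef struct {
    base_t base;
    long extra;
} derived_t;

void zero_derived(derived_t *d)
{
    /* A plain `derived_t zero = {0};` starts with a nested struct, which is
     * the case the gcc warning complains about. With a scalar first member,
     * {0} initializes `dummy` and every remaining member is still
     * zero-initialized (pointer members become NULL). */
    struct {
        int dummy;
        derived_t value;
    } zero = {0};

    assert(d != NULL);
    memcpy(d, &zero.value, sizeof(derived_t));
}
```

A side benefit of copying from an initialized object rather than calling `memset(d, 0, sizeof *d)` is that pointer members end up as proper null pointers even on platforms where a null pointer is not all-bits-zero.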