
Optimize memory allocation

A number of memory allocation optimizations have been implemented. Most
optimizations reduce contention caused by synchronization between
threads during allocation and deallocation of memory. Most notably:
* Synchronization of memory management in scheduler-specific allocator
  instances has been rewritten to use lock-free synchronization.
* Synchronization of memory management in scheduler-specific
  pre-allocators has been rewritten to use lock-free synchronization.
* The 'mseg_alloc' memory segment allocator now uses scheduler-specific
  instances instead of one instance. Apart from reducing contention,
  this also ensures that memory allocators always create memory
  segments on the local NUMA node on a NUMA system.
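
The scheduler-specific layout described above can be pictured with a small
sketch. It is not the ERTS implementation: `NO_SCHEDULERS`, `demo_instance_t`,
`demo_instance_for` and `demo_alloc` are hypothetical names, the scheduler
count is fixed at compile time, and `malloc` stands in for the real carrier
management. It only illustrates how each scheduler can own one instance while
all other threads share a common one, so an allocation on a scheduler thread
touches state that no other thread writes to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NO_SCHEDULERS 4   /* hypothetical, fixed for the sketch */

/* Hypothetical per-instance bookkeeping; not the ERTS Allctr_t. */
typedef struct {
    size_t allocs;        /* number of blocks handed out by this instance */
} demo_instance_t;

/* Slot 0 is the common instance for non-scheduler threads (which would
   still need synchronization in real code); slots 1..NO_SCHEDULERS
   belong to exactly one scheduler each. */
static demo_instance_t demo_instances[NO_SCHEDULERS + 1];

/* Map a scheduler id (1..NO_SCHEDULERS) to its own instance;
   id 0 selects the common instance. */
static demo_instance_t *demo_instance_for(int sched_id)
{
    assert(sched_id >= 0 && sched_id <= NO_SCHEDULERS);
    return &demo_instances[sched_id];
}

/* Allocate through the caller's instance. On a scheduler thread no lock
   is taken, because only the owning scheduler updates that instance. */
static void *demo_alloc(int sched_id, size_t size)
{
    demo_instance_t *inst = demo_instance_for(sched_id);
    inst->allocs++;
    return malloc(size);
}
```

A call such as `demo_alloc(2, 16)` updates only the counters of instance 2.
The sketch leaves out deallocation from a foreign thread, which is the harder
part that the lock-free machinery in this commit addresses.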
rickard-green committed Sep 15, 2010
1 parent 55358c5 commit a67e91e658bdbba24fcc3c79b06fdf10ff830bc9
Showing with 6,302 additions and 2,153 deletions.
  1. +0 −7 erts/configure.in
  2. +24 −34 erts/doc/src/erts_alloc.xml
  3. +3 −2 erts/emulator/Makefile.in
  4. +3 −0 erts/emulator/beam/atom.names
  5. +0 −4 erts/emulator/beam/bif.tab
  6. +11 −7 erts/emulator/beam/erl_afit_alloc.c
  7. +692 −590 erts/emulator/beam/erl_alloc.c
  8. +47 −139 erts/emulator/beam/erl_alloc.h
  9. +13 −13 erts/emulator/beam/erl_alloc.types
  10. +844 −173 erts/emulator/beam/erl_alloc_util.c
  11. +105 −4 erts/emulator/beam/erl_alloc_util.h
  12. +10 −6 erts/emulator/beam/erl_ao_firstfit_alloc.c
  13. +10 −6 erts/emulator/beam/erl_bestfit_alloc.c
  14. +56 −25 erts/emulator/beam/erl_bif_info.c
  15. +11 −7 erts/emulator/beam/erl_goodfit_alloc.c
  16. +11 −4 erts/emulator/beam/erl_init.c
  17. +1 −4 erts/emulator/beam/erl_lock_check.c
  18. +1 −7 erts/emulator/beam/erl_mtrace.c
  19. +881 −276 erts/emulator/beam/erl_process.c
  20. +47 −13 erts/emulator/beam/erl_process.h
  21. +16 −1 erts/emulator/beam/erl_process_lock.h
  22. +305 −0 erts/emulator/beam/erl_sched_spec_pre_alloc.c
  23. +239 −0 erts/emulator/beam/erl_sched_spec_pre_alloc.h
  24. +1,010 −0 erts/emulator/beam/erl_thr_progress.c
  25. +210 −0 erts/emulator/beam/erl_thr_progress.h
  26. +2 −0 erts/emulator/beam/erl_threads.h
  27. +0 −287 erts/emulator/beam/fix_alloc.c
  28. +1 −1 erts/emulator/beam/global.h
  29. +5 −8 erts/emulator/beam/sys.h
  30. +1 −1 erts/emulator/beam/time.c
  31. +9 −7 erts/emulator/beam/utils.c
  32. +0 −1 erts/emulator/hipe/hipe_bif_list.m4
  33. +11 −0 erts/emulator/sys/common/erl_check_io.c
  34. +8 −1 erts/emulator/sys/common/erl_check_io.h
  35. +369 −300 erts/emulator/sys/common/erl_mseg.c
  36. +6 −5 erts/emulator/sys/common/erl_mseg.h
  37. +115 −78 erts/emulator/sys/common/erl_poll.c
  38. +4 −1 erts/emulator/sys/common/erl_poll.h
  39. +4 −2 erts/emulator/sys/unix/erl_unix_sys.h
  40. +20 −4 erts/emulator/sys/unix/sys.c
  41. +8 −3 erts/emulator/sys/vxworks/sys.c
  42. +6 −0 erts/emulator/sys/win32/erl_poll.c
  43. +1 −2 erts/emulator/sys/win32/sys.c
  44. +78 −7 erts/emulator/test/driver_SUITE.erl
  45. +2 −1 erts/emulator/test/driver_SUITE_data/Makefile.src
  46. +241 −0 erts/emulator/test/driver_SUITE_data/thr_free_drv.c
  47. +13 −0 erts/emulator/test/mtx_SUITE.erl
  48. +311 −2 erts/emulator/test/system_info_SUITE.erl
  49. +1 −2 erts/etc/common/erlexec.c
  50. BIN erts/preloaded/ebin/erl_prim_loader.beam
  51. BIN erts/preloaded/ebin/erlang.beam
  52. BIN erts/preloaded/ebin/init.beam
  53. BIN erts/preloaded/ebin/otp_ring0.beam
  54. BIN erts/preloaded/ebin/prim_file.beam
  55. BIN erts/preloaded/ebin/prim_inet.beam
  56. BIN erts/preloaded/ebin/prim_zip.beam
  57. BIN erts/preloaded/ebin/zlib.beam
  58. +404 −0 erts/preloaded/src/erlang.erl
  59. +6 −4 lib/runtime_tools/src/erts_alloc_config.erl
  60. +109 −106 lib/stdlib/test/ets_SUITE.erl
  61. +17 −8 lib/stdlib/test/supervisor_SUITE.erl
@@ -259,13 +259,6 @@ AS_HELP_STRING([--enable-m32-build],
esac
],enable_m32_build=no)
AC_ARG_ENABLE(fixalloc,
AS_HELP_STRING([--disable-fixalloc], [disable the use of fix_alloc]))
if test x${enable_fixalloc} = xno ; then
AC_DEFINE(NO_FIX_ALLOC,[],
[Define if you don't want the fix allocator in Erlang])
fi
AC_SUBST(PERFCTR_PATH)
AC_ARG_WITH(perfctr,
AS_HELP_STRING([--with-perfctr=PATH],
@@ -58,11 +58,8 @@
<item>Allocator used for memory blocks that are expected to be
long-lived, for example Erlang code.</item>
<tag><c>fix_alloc</c></tag>
<item>A very fast allocator used for some fix-sized
data. <c>fix_alloc</c> manages a set of memory pools from
which memory blocks are handed out. <c>fix_alloc</c>
allocates memory pools from <c>ll_alloc</c>. Memory pools
that have been allocated are never deallocated.</item>
<item>A fast allocator used for some frequently used
fixed size data types.</item>
<tag><c>std_alloc</c></tag>
<item>Allocator used for most memory blocks not allocated via any of
the other allocators described above.</item>
@@ -83,7 +80,7 @@
where only small blocks are placed. Currently this allocator is
disabled by default.</item>
</taglist>
<p><c>sys_alloc</c> and <c>fix_alloc</c> are always enabled and
<p><c>sys_alloc</c> is always enabled and
cannot be disabled. <c>mseg_alloc</c> is always enabled if it is
available and an allocator that uses it is enabled. All other
allocators can be <seealso marker="#M_e">enabled or disabled</seealso>.
@@ -104,7 +101,7 @@
<marker id="alloc_util"></marker>
<title>The alloc_util framework</title>
<p>Internally a framework called <c>alloc_util</c> is used for
implementing allocators. <c>sys_alloc</c>, <c>fix_alloc</c>, and
implementing allocators. <c>sys_alloc</c>, and
<c>mseg_alloc</c> do not use this framework; hence, the
following does <em>not</em> apply to them.</p>
<p>An allocator manages multiple areas, called carriers, in which
@@ -212,6 +209,14 @@
This since it will only cause problems for other allocators.</p>
</item>
</taglist>
<p>Apart from the ordinary allocators described above a number of
pre-allocators are used for some specific data types. These
pre-allocators pre-allocate a fixed amount of memory for certain data
types when the run-time system starts. As long as there are available
pre-allocated memory, it will be used. When no pre-allocated memory is
available, memory will be allocated in ordinary allocators. These
pre-allocators are typically much faster than the ordinary allocators,
but can only satisfy a limited amount of requests.</p>
</section>
<note><p>
@@ -272,18 +277,6 @@
Max cached segments. The maximum number of memory segments
stored in the memory segment cache. Valid range is
0-30. Default value is 5.</item>
<tag><marker id="MMcci"><c><![CDATA[+MMcci <time>]]></c></marker></tag>
<item>
Cache check interval (in milliseconds). The memory segment
cache is checked for segments to destroy at an interval
determined by this parameter. Default value is 1000.</item>
</taglist>
<p>The following flags are available for configuration of
<c>fix_alloc</c>:</p>
<taglist>
<tag><marker id="MFe"><c>+MFe true</c></marker></tag>
<item>
Enable <c>fix_alloc</c>. Note: <c>fix_alloc</c> cannot be disabled.</item>
</taglist>
<p>The following flags are available for configuration of
<c>sys_alloc</c>:</p>
@@ -322,7 +315,7 @@
based on <c>alloc_util</c>. If <c>u</c> is used as subsystem
identifier (i.e., <c><![CDATA[<S> = u]]></c>) all allocators based on
<c>alloc_util</c> will be effected. If <c>B</c>, <c>D</c>, <c>E</c>,
<c>H</c>, <c>L</c>, <c>R</c>, <c>S</c>, or <c>T</c> is used as
<c>F</c>, <c>H</c>, <c>L</c>, <c>R</c>, <c>S</c>, or <c>T</c> is used as
subsystem identifier, only the specific allocator identified will be
effected:</p>
<taglist>
@@ -441,26 +434,23 @@
kilobytes). See <seealso marker="#mseg_mbc_sizes">the description
on how sizes for mseg_alloc multiblock carriers are decided</seealso>
in "the <c>alloc_util</c> framework" section.</item>
<tag><marker id="M_t"><c><![CDATA[+M<S>t true|false|<amount>]]></c></marker></tag>
<tag><marker id="M_t"><c><![CDATA[+M<S>t true|false]]></c></marker></tag>
<item>
<p>Multiple, thread specific instances of the allocator.
This option will only have any effect on the runtime system
with SMP support. Default behaviour on the runtime system with
SMP support (<c>N</c> equals the number of scheduler threads):</p>
Multiple, thread specific instances of the allocator.
This option will only have any effect on the runtime system
with SMP support. Default behaviour on the runtime system with
SMP support:
<taglist>
<tag><c>temp_alloc</c></tag>
<item><c>N + 1</c> instances.</item>
<tag><c>ll_alloc</c></tag>
<item><c>1</c> instance.</item>
<tag>Other allocators</tag>
<item><c>N</c> instances when <c>N</c> is less than or equal to
<c>16</c>. <c>16</c> instances when <c>N</c> is greater than
<c>16</c>.</item>
<item><c>NoSchedulers+1</c> instances. Each scheduler will use
a lock-free instance of its own and other threads will use
a common instance.</item>
</taglist>
<p><c>temp_alloc</c> will always use <c>N + 1</c> instances when
this option has been enabled regardless of the amount passed.
Other allocators will use the same amount of instances as the
amount passed as long as it isn't greater than <c>N</c>.</p>
It was previously (before ERTS version 5.9) possible to configure
a smaller amount of thread specific instances than schedulers.
This is, however, not possible any more.
</item>
</taglist>
<p>Currently the following flags are available for configuration of
@@ -725,7 +725,7 @@ RUN_OBJS = \
$(OBJDIR)/external.o $(OBJDIR)/dist.o \
$(OBJDIR)/binary.o $(OBJDIR)/erl_db.o \
$(OBJDIR)/erl_db_util.o $(OBJDIR)/erl_db_hash.o \
$(OBJDIR)/erl_db_tree.o $(OBJDIR)/fix_alloc.o \
$(OBJDIR)/erl_db_tree.o $(OBJDIR)/erl_thr_progress.o \
$(OBJDIR)/big.o $(OBJDIR)/hash.o \
$(OBJDIR)/index.o $(OBJDIR)/atom.o \
$(OBJDIR)/module.o $(OBJDIR)/export.o \
@@ -742,7 +742,8 @@ RUN_OBJS = \
$(OBJDIR)/erl_bif_re.o $(OBJDIR)/erl_unicode.o \
$(OBJDIR)/packet_parser.o $(OBJDIR)/safe_hash.o \
$(OBJDIR)/erl_zlib.o $(OBJDIR)/erl_nif.o \
$(OBJDIR)/erl_bif_binary.o $(OBJDIR)/erl_ao_firstfit_alloc.o
$(OBJDIR)/erl_bif_binary.o $(OBJDIR)/erl_ao_firstfit_alloc.o \
$(OBJDIR)/erl_sched_spec_pre_alloc.o
ifeq ($(TARGET),win32)
DRV_OBJS = \
@@ -69,6 +69,8 @@ atom ac
atom active
atom all
atom all_but_first
atom alloc_info
atom alloc_sizes
atom allocated
atom allocated_areas
atom allocator
@@ -553,5 +555,6 @@ atom warning_msg
atom wordsize
atom write_concurrency
atom xor
atom x86
atom yes
atom yield
@@ -160,10 +160,6 @@ bif erlang:md5_update/2
bif 'erl.util.crypt.md5':update/2 ebif_md5_update_2
bif erlang:md5_final/1
bif 'erl.util.crypt.md5':final/1 ebif_md5_final_1
bif erlang:memory/0
bif 'erl.lang':memory/0 ebif_memory_0
bif erlang:memory/1
bif 'erl.lang':memory/1 ebif_memory_1
bif erlang:module_loaded/1
bif 'erl.system.code':is_loaded/1 ebif_is_loaded_1 module_loaded_1
bif erlang:function_exported/3
@@ -65,16 +65,20 @@ erts_afalc_start(AFAllctr_t *afallctr,
AFAllctrInit_t *afinit,
AllctrInit_t *init)
{
AFAllctr_t nulled_state = {{0}};
/* {{0}} is used instead of {0}, in order to avoid (an incorrect) gcc
warning. gcc warns if {0} is used as initializer of a struct when
the first member is a struct (not if, for example, the third member
is a struct). */
struct {
int dummy;
AFAllctr_t allctr;
} zero = {0};
/* The struct with a dummy element first is used in order to avoid (an
incorrect) gcc warning. gcc warns if {0} is used as initializer of
a struct when the first member is a struct (not if, for example,
the third member is a struct). */
Allctr_t *allctr = (Allctr_t *) afallctr;
init->sbmbct = 0; /* Small mbc not supported by afit */
sys_memcpy((void *) afallctr, (void *) &zero.allctr, sizeof(AFAllctr_t));
sys_memcpy((void *) afallctr, (void *) &nulled_state, sizeof(AFAllctr_t));
init->sbmbct = 0; /* Small mbc not supported by afit */
allctr->mbc_header_size = sizeof(Carrier_t);
allctr->min_mbc_size = MIN_MBC_SZ;
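
The zero-initialization idiom in the hunk above can be tried in isolation.
The following is a minimal sketch under assumed types: `inner_t`,
`demo_state_t` and `demo_reset` are hypothetical stand-ins for `AFAllctr_t`
and `erts_afalc_start`, and plain `memcpy` replaces `sys_memcpy`. Wrapping
the target type in a struct whose first member is a scalar lets `{0}` be used
as the initializer even though the target type itself starts with a nested
struct.

```c
#include <string.h>

/* Hypothetical types: the outer struct starts with a nested struct,
   which is the case the comment in the diff says triggers the warning. */
typedef struct { int a; int b; } inner_t;
typedef struct { inner_t first; int count; } demo_state_t;

/* Reset *state to all-zero members by copying from a zeroed template. */
static void demo_reset(demo_state_t *state)
{
    /* The dummy int comes first, so {0} initializes a scalar member and
       the remaining members are zero-initialized implicitly. */
    struct {
        int dummy;
        demo_state_t state;
    } zero = {0};

    memcpy(state, &zero.state, sizeof(demo_state_t));
}
```

After `demo_reset(&s)` every member of `s` is zero, whatever it held before.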