Try compiling with a large memory model if the initial compile fails.
Documentation and comments are still not completely coherent on
this subject, but the compilation issue itself is improved.
Authored and committed by Greg Smith, Jul 17, 2012
1 parent 0dbaeda commit ebfc6aa
Showing 2 changed files with 53 additions and 33 deletions.
54 changes: 31 additions & 23 deletions README.rst
@@ -312,38 +312,46 @@ Bugs
====

On some systems, the amount of memory selected for the stream array
-ends up exceeding how large of a block of RAM the system is willing
-to allocate at once. This seems a particular issue on 32-bit operating
-systems, but even 64-bit ones are not immune. The program currently
-enforces an upper limit on the stream array size of 130M, which
-allocates approximately 3GB of memory just for that part (with 4GB being
-the normal limit for 32-bit structures). If your system fails to
-compile stream with an error such as this::
+ends up exceeding how large of a block of RAM the operating system (or
+in some cases the compiler) is willing to allocate at once. This
+seems a particular issue on 32-bit operating systems, but even 64-bit
+ones are not immune.
+
+If your system fails to compile stream with an error such as this::

    stream.c:(.text+0x34): relocation truncated to fit: R_X86_64_32S against `.bss'

-You will need to manually decrease the size of the array until the
-program will compile and link. Manual compile can be done like this::
+stream-scaling will try to compile stream using the gcc "-mcmodel=large"
+option after hitting this error. That will let the program use larger data
+structures. If you are using a new enough version of the gcc compiler,
+believed to be at least version 4.4, the program will run normally after
+that; you can ignore these "relocation truncated" warnings.
+
+If you have both a large amount of cache--so a matching large block of memory
+is needed--and an older version of gcc, the second compile attempt will also
+fail, with the following error::
+
+    stream.c:1: sorry, unimplemented: code model ‘large’ not supported yet
+
+In that case, it is unlikely you will get accurate results from
+stream-scaling. You can try it anyway by manually decreasing the size of the
+array until the program will compile and link. Manual compile can be done like
+this::

    gcc -O3 -DN=130000000 -fopenmp stream.c -o stream

And then reducing the ``-DN`` value until compilation is successful.
After that upper limit is determined, adjust the setting for
MAX_ARRAY_SIZE at the beginning of the stream-scaling program to reflect
-it.
-
-The current version of stream-scaling tries to work around this by
-using a customized version of the stream code that dynamically allocates
-these arrays. It is still possible a problem here exists, and a
-warning suggesting a workaround (an easier one than doing a manual
-compile as described above) appears if your system appears to have
-so much cache it could run into this issue.
-
-If you encounter this situation, where stream-scaling still doesn't
-work properly for you, a problem report to the author would
-be appreciated. It's not clear yet why the exact cut-off value varies
-on some systems, or if there are systems where the improved dynamic
-allocation logic may not be sufficient.
+it. An upper limit on the stream array size of 130M as shown here
+allocates approximately 3GB of memory for the test array, with 4GB being
+the normal limit for 32-bit structures.
+
+The fixes for this issue are new, and it is still possible a problem
+here exists. If you have a gcc version >=4.4 but stream-scaling still won't
+compile correctly, a problem report to the author would be appreciated. It's
+not clear yet why the exact cut-off value varies on some systems, or if there
+are systems where the improved dynamic allocation logic may not be sufficient.

Documentation
=============
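The manual ``-DN`` reduction the README describes is easy to script. Below is a
minimal sketch, an editorial illustration rather than part of this commit; it
assumes gcc and this repository's stream.c are in the current directory, and
simply halves the array size until the program links::

    #!/bin/sh
    # Hypothetical helper: shrink the stream array until gcc can link it.
    N=130000000
    while [ "$N" -gt 0 ] ; do
        if gcc -O3 -DN=$N -fopenmp stream.c -o stream 2>/dev/null ; then
            echo "stream compiled with N=$N; set MAX_ARRAY_SIZE to match"
            break
        fi
        N=$(( $N / 2 ))
    done

Halving only finds a workable size, not the largest one; searching back upward
between the last failing and first working value would tighten the
MAX_ARRAY_SIZE estimate.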
32 changes: 22 additions & 10 deletions stream-scaling
@@ -195,7 +195,7 @@ function stream_array_elements {
fi

# The array sizing code will overflow 32 bits on systems with many
-# processors having lots of cache. The crash looks like this:
+# processors having lots of cache. The compiler error looks like this:
#
# $ gcc -O3 -DN=133823657 -fopenmp stream.c -o stream
# /tmp/ccecdC49.o: In function `checkSTREAMresults':
@@ -214,9 +214,11 @@ function stream_array_elements {
# stream.c:(.text+0x660): relocation truncated to fit: R_X86_64_32S against `.bss'
# stream.c:(.text+0x6ab): additional relocation overflows omitted from the output
# collect2: ld returned 1 exit status
-
-# Clamp the upper value to a smaller maximum size to try and avoid this
-# error. 130,000,000 makes for approximately a 3GB array.
+#
+# Warn about this issue, and provide a way to clamp the upper value to a smaller
+# maximum size to try and avoid this error. 130,000,000 makes for approximately
+# a 3GB array. The large memory model compiler option will avoid this issue
+# if a gcc version that supports it is available.
if [ $NEEDED_SIZE -gt $MAX_ARRAY_SIZE ] ; then
#
# Size clamp code
@@ -236,18 +238,16 @@
fi

# Given the sizing above uses a factor of 10X cache size, this reduced size
-# is still large enough for current generation procesors up to the 48 core
+# might still be large enough for current generation processors up to the 48 core
# range. For example, a system containing 8 Intel Xeon L7555 processors with
# 4 cores having 24576 KB cache each will suggest:
#
# Total CPU system cache: 814743552 bytes
# Computed minimum array elements needed: 370337978
#
-# So using 130,000,000 instead of 370,337,978 still an array >3X the
-# size of cache sum. Really large systems with >48 processors might overflow
-# this still, but hopefully this limitation will be addressed by the
-# underlying stream code being called here eventually, rather than
-# trying to work around it here.
+# So using 130,000,000 instead of 370,337,978 is still an array >3X the
+# size of the cache sum in this case. Really large systems with >48 processors
+# might overflow this still.

echo Array elements used: $NEEDED_SIZE
eval $__resultvar="'$NEEDED_SIZE'"
@@ -307,6 +307,18 @@ if [ -f stream ] ; then
fi

gcc -O3 $ARRAY_FLAG -fopenmp stream.c -o stream
+if [ $? -ne 0 ] ; then
+  # The most likely way the program will fail to compile is if it's
+  # trying to use more memory than will fit on the standard gcc memory
+  # model. Try the large one instead. This will only work on newer
+  # gcc versions (it works on >=4.4), so there's no single
+  # compile option set here that will support older gcc versions
+  # and the large memory model. Just trying both ways seems both
+  # simpler and more definitive than something like checking the
+  # gcc version.
+  echo === Trying large memory model ===
+  gcc -O3 $ARRAY_FLAG -fopenmp stream.c -o stream -mcmodel=large
+fi

if [ ! -x stream ] ; then
echo Error: did not find valid stream program compiled here, aborting
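To make the sizes in these comments concrete: the stock stream.c stores three
double-precision arrays (a, b, and c) of N elements each, at 8 bytes per
element, so the clamp and cache figures above can be checked from the shell.
A worked example under that three-array assumption, not code from this
commit::

    # 130,000,000 elements x 3 arrays x 8 bytes per double:
    echo $(( 130000000 * 3 * 8 ))   # 3120000000 bytes, roughly 3GB, under the 4GB 32-bit ceiling
    # 3X the 814743552-byte cache total quoted in the comments above:
    echo $(( 814743552 * 3 ))       # 2444230656 bytes, so the clamped array stays >3X total cache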
