Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Posix write of uninitialised bytes #184

Open
ax3l opened this issue Jun 17, 2018 · 14 comments
Open

Posix write of uninitialised bytes #184

ax3l opened this issue Jun 17, 2018 · 14 comments

Comments

@ax3l
Copy link
Contributor

ax3l commented Jun 17, 2018

ADIOS 1.13.1 is adding noise to valgrind debugging with uninitialised byte(s) in the syscall to write(buf) in adios_posix_write_pg1 (adios_posix.c:733).

The issue is triggered when previously a adios_databuffer_resize (buffer.c:68) is called - I guess a few unimportant but still uninitialized bytes slip into the final write call from it.

Detailed valgrind warning for ADIOS 1.13.1 with _nompi library write API:

==22326== Syscall param write(buf) points to uninitialised byte(s)
==22326==    at 0x5B5C190: __write_nocancel (syscall-template.S:84)
==22326==    by 0x2F0F6A: adios_posix_write_pg (adios_posix.c:733)
==22326==    by 0x2F166B: adios_posix_close (adios_posix.c:945)
==22326==    by 0x2993F6: common_adios_close (common_adios.c:1264)
==22326==    by 0x2951BA: adios_close (adios.c:210)

==22326==  Address 0x72323be is 894 bytes inside a block of size 16,777,223 alloc'd
==22326==    at 0x4C2BADF: malloc (vg_replace_malloc.c:298)
==22326==    by 0x4C2DE5F: realloc (vg_replace_malloc.c:785)
==22326==    by 0x2FC133: adios_databuffer_resize (buffer.c:68)
==22326==    by 0x297060: common_adios_open (common_adios.c:417)
==22326==    by 0x294E94: adios_open (adios.c:90)

@pnorbert do you have an idea what's uninitialised here? Maybe you can limit the write() to the initialized bytes or simply zero-init the reallocated buffer's hanging bytes? Even if it might not be a bug, it makes downstream analysis of memory leaks very noisy as soon as we use ADIOS.

@ax3l
Copy link
Contributor Author

ax3l commented Aug 8, 2018

ping @pnorbert do you have an idea what's uninitialised here?

@pnorbert
Copy link
Contributor

pnorbert commented Aug 8, 2018 via email

@ax3l
Copy link
Contributor Author

ax3l commented Aug 8, 2018

If you have CMake 3.10.0+ and ADIOS 1.13.1 in your CMAKE_PREFIX_PATH and adios_config in your PATH, you can copy paste these lines:

# sources
git clone https://github.com/openPMD/openPMD-api.git

mkdir -p openPMD-api-build
cd openPMD-api-build

# configure & build
cmake ../openPMD-api -DopenPMD_USE_HDF5=OFF -DopenPMD_USE_PYTHON=OFF
cmake --build .

# find uninitialised bytes
valgrind bin/SerialIOTests

Corresponding unit test file:
https://github.com/openPMD/openPMD-api/blob/c0ccefb25503b49a10938e2096098c6e3ea5c13b/test/SerialIOTest.cpp

Full output:

$ valgrind bin/SerialIOTests                                                                                                          
==30204== Memcheck, a memory error detector
==30204== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==30204== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==30204== Command: bin/SerialIOTests
==30204== 
==30204== Syscall param write(buf) points to uninitialised byte(s)
==30204==    at 0x61DD190: __write_nocancel (syscall-template.S:84)
==30204==    by 0x4EB602F: adios_posix_write_pg (adios_posix.c:733)
==30204==    by 0x4EB6730: adios_posix_close (adios_posix.c:945)
==30204==    by 0x4E5E337: common_adios_close (common_adios.c:1264)
==30204==    by 0x4E5A0FB: adios_close (adios.c:210)
==30204==    by 0x4E4BC53: openPMD::ADIOS1IOHandlerImpl::~ADIOS1IOHandlerImpl() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204==    by 0x4E4C29E: openPMD::ADIOS1IOHandler::~ADIOS1IOHandler() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204==    by 0x1DEB58: openPMD::Writable::~Writable() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x176F05: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x142F21: ____C_A_T_C_H____T_E_S_T____2() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x130DF5: Catch::RunContext::invokeActiveTestCase() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x160867: Catch::(anonymous namespace)::runTests(std::shared_ptr<Catch::Config> const&) (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==  Address 0x83c13be is 894 bytes inside a block of size 16,777,223 alloc'd
==30204==    at 0x4C2BADF: malloc (vg_replace_malloc.c:298)
==30204==    by 0x4C2DE5F: realloc (vg_replace_malloc.c:785)
==30204==    by 0x4EC11F8: adios_databuffer_resize (buffer.c:68)
==30204==    by 0x4E5BFA1: common_adios_open (common_adios.c:417)
==30204==    by 0x4E59DD5: adios_open (adios.c:90)
==30204==    by 0x4E4BBE7: openPMD::ADIOS1IOHandlerImpl::~ADIOS1IOHandlerImpl() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204==    by 0x4E4C29E: openPMD::ADIOS1IOHandler::~ADIOS1IOHandler() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204==    by 0x1DEB58: openPMD::Writable::~Writable() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x176F05: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x142F21: ____C_A_T_C_H____T_E_S_T____2() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x130DF5: Catch::RunContext::invokeActiveTestCase() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x160867: Catch::(anonymous namespace)::runTests(std::shared_ptr<Catch::Config> const&) (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== 
==30204== Syscall param write(buf) points to uninitialised byte(s)
==30204==    at 0x61DD190: __write_nocancel (syscall-template.S:84)
==30204==    by 0x4EB6177: adios_posix_write_index (adios_posix.c:776)
==30204==    by 0x4EB6EDB: adios_posix_close (adios_posix.c:1087)
==30204==    by 0x4E5E337: common_adios_close (common_adios.c:1264)
==30204==    by 0x4E5A0FB: adios_close (adios.c:210)
==30204==    by 0x4E4BC53: openPMD::ADIOS1IOHandlerImpl::~ADIOS1IOHandlerImpl() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204==    by 0x4E4C29E: openPMD::ADIOS1IOHandler::~ADIOS1IOHandler() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204==    by 0x1DEB58: openPMD::Writable::~Writable() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x176F05: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x142F21: ____C_A_T_C_H____T_E_S_T____2() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x130DF5: Catch::RunContext::invokeActiveTestCase() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x160867: Catch::(anonymous namespace)::runTests(std::shared_ptr<Catch::Config> const&) (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==  Address 0x807e4f1 is 1,185 bytes inside a block of size 1,000,020 alloc'd
==30204==    at 0x4C2BADF: malloc (vg_replace_malloc.c:298)
==30204==    by 0x4C2DE5F: realloc (vg_replace_malloc.c:785)
==30204==    by 0x4E62A75: buffer_write (adios_internals.c:2124)
==30204==    by 0x4E671B3: adios_write_index_v1 (adios_internals.c:4086)
==30204==    by 0x4EB67D7: adios_posix_close (adios_posix.c:960)
==30204==    by 0x4E5E337: common_adios_close (common_adios.c:1264)
==30204==    by 0x4E5A0FB: adios_close (adios.c:210)
==30204==    by 0x4E4BC53: openPMD::ADIOS1IOHandlerImpl::~ADIOS1IOHandlerImpl() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204==    by 0x4E4C29E: openPMD::ADIOS1IOHandler::~ADIOS1IOHandler() (in /home/axel/build/openPMD-api-build/lib/libopenPMD.ADIOS1.Serial.so)
==30204==    by 0x1DEB58: openPMD::Writable::~Writable() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x176F05: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204==    by 0x142F21: ____C_A_T_C_H____T_E_S_T____2() (in /home/axel/build/openPMD-api-build/bin/SerialIOTests)
==30204== 
HZDR sample not accessible. (Supplied directory is not valid: ../samples/hzdr-sample/bp/)
===============================================================================
All tests passed (26 assertions in 5 test cases)

==30204== 
==30204== HEAP SUMMARY:
==30204==     in use at exit: 2,641,333 bytes in 26,714 blocks
==30204==   total heap usage: 77,532 allocs, 50,818 frees, 93,048,116 bytes allocated
==30204== 
==30204== LEAK SUMMARY:
==30204==    definitely lost: 3,480 bytes in 29 blocks
==30204==    indirectly lost: 2,636,176 bytes in 26,682 blocks
==30204==      possibly lost: 0 bytes in 0 blocks
==30204==    still reachable: 1,677 bytes in 3 blocks
==30204==         suppressed: 0 bytes in 0 blocks
==30204== Rerun with --leak-check=full to see details of leaked memory
==30204== 
==30204== For counts of detected and suppressed errors, rerun with: -v
==30204== Use --track-origins=yes to see where uninitialised values come from
==30204== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

@ax3l
Copy link
Contributor Author

ax3l commented Aug 8, 2018

@pnorbert I guess the variables for your test need to be large enough to trigger the ADIOS realloc. So > 16MB if I remember the defaults right ;-)

@pnorbert
Copy link
Contributor

pnorbert commented Aug 8, 2018 via email

@pnorbert
Copy link
Contributor

pnorbert commented Aug 8, 2018 via email

@ax3l
Copy link
Contributor Author

ax3l commented Aug 9, 2018

Thanks, great news! Just ping me if I shall check again with updates in master.

@pnorbert
Copy link
Contributor

pnorbert commented Aug 9, 2018

This simple example exhibits the valgrind problem with long doubles.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main (int argc, char ** argv) 
{
    char        filename[] = "test_longdouble_valgrind.data";
    long double ld1 = 1.2345e+80;

    long double *bufm = (long double *) malloc (sizeof(long double));
    long double *bufc = (long double *) calloc (1, sizeof(long double));

    memcpy (bufm, &ld1, sizeof(long double));
    memcpy (bufc, &ld1, sizeof(long double));

    int fd = creat(filename, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    write(fd, bufm, sizeof(long double));
    write(fd, bufc, sizeof(long double));
    close(fd);
    free(bufm);
    free(bufc);
    return 0;
}

Valgrind complains about both the malloc'd and calloc'd buffers.

$ valgrind ./test_longdouble_valgrind 
==15574== Memcheck, a memory error detector
==15574== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==15574== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==15574== Command: ./test_longdouble_valgrind
==15574== 
==15574== Syscall param write(buf) points to uninitialised byte(s)
==15574==    at 0x4F312C0: __write_nocancel (syscall-template.S:84)
==15574==    by 0x40083F: main (in /home/adios/work/test/other_tests/test_longdouble_valgrind)
==15574==  Address 0x520404a is 10 bytes inside a block of size 16 alloc'd
==15574==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15574==    by 0x4007D8: main (in /home/adios/work/test/other_tests/test_longdouble_valgrind)
==15574== 
==15574== Syscall param write(buf) points to uninitialised byte(s)
==15574==    at 0x4F312C0: __write_nocancel (syscall-template.S:84)
==15574==    by 0x400855: main (in /home/adios/work/test/other_tests/test_longdouble_valgrind)
==15574==  Address 0x520409a is 10 bytes inside a block of size 16 alloc'd
==15574==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15574==    by 0x4007EB: main (in /home/adios/work/test/other_tests/test_longdouble_valgrind)

@ax3l
Copy link
Contributor Author

ax3l commented Aug 9, 2018

Do you know how to fix it?

@ax3l
Copy link
Contributor Author

ax3l commented Aug 9, 2018

Oh dang, it's a stdlib or valgrind issue!

@ax3l
Copy link
Contributor Author

ax3l commented Aug 9, 2018

Hm, maybe that's relevant: http://valgrind.org/docs/manual/manual-core.html

Precision: There is no support for 80 bit arithmetic. Internally, Valgrind represents all such "long double" numbers in 64 bits, and so there may be some differences in results. Whether or not this is critical remains to be seen. Note, the x86/amd64 fldt/fstpt instructions (read/write 80-bit numbers) are correctly simulated, using conversions to/from 64 bits, so that in-memory images of 80-bit numbers look correct if anyone wants to see.

10 bytes inside a block of size 16 alloc'd

@pnorbert
Copy link
Contributor

pnorbert commented Aug 9, 2018

This is a valgrind issue.
Filling the buffers with some value (255) before the memcpy does not help.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main (int argc, char ** argv)
{
    char        filename[] = "test_longdouble_valgrind.data";
    long double ld1 = 1.2345e+80;

    long double *bufm = (long double *) malloc (sizeof(long double));
    long double *bufc = (long double *) calloc (1, sizeof(long double));

    memset (bufm, 255, sizeof(long double));
    memset (bufc, 255, sizeof(long double));

    printf("bufm[15]=%3.3hhu\n", ((char*)bufm)[15]);

    memcpy (bufm, &ld1, sizeof(long double));
    memcpy (bufc, &ld1, sizeof(long double));

    printf("bufm[15]=%3.3hhu\n", ((char*)bufm)[15]);

    int fd = creat(filename, S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    write(fd, bufm, sizeof(long double));
    write(fd, bufc, sizeof(long double));
    close(fd);

    free(bufm);
    free(bufc);

    return 0;
}

The file content shows that all 16 bytes of the buffer has been filled. The last 6 bytes are zeros.

$ ./test_longdouble_valgrind 
bufm[15]=255
bufm[15]=000
$ od -bv test_longdouble_valgrind.data 
0000000 000 350 363 217 374 121 104 205 011 101 100 000 000 000 000 000
0000020 000 350 363 217 374 121 104 205 011 101 100 000 000 000 000 000

@ax3l
Copy link
Contributor Author

ax3l commented Aug 9, 2018

Ok, I'll report that upstream with your example.

@ax3l
Copy link
Contributor Author

ax3l commented Aug 9, 2018

I reported your example upstream in https://bugs.kde.org/show_bug.cgi?id=397313

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants