Issue with running nanomq 0.20.5 on 32bit armv7a architecture #1506
Comments
More valgrind output with more flags:
Could you verify whether this issue only happens on 0.20.5? Does 0.19.5 work fine?
Yes, correct: 0.19.5 and 0.20.0 work fine, at least for a while, until the memory leak crashes the program.
That's strange. No memleak was detected with ASAN in my dev environment. Note that session keeping and message caching might be wrongly reported as a memleak. Could you submit your memleak info? As for this issue, it is a mistake of mine, introduced by this PR: nanomq/NanoNNG@ded86c0. I will re-release 0.20.5 today!
I have pushed a fix. Could you try the master branch?
Awesome, thanks!
I suppose the fix works? I will proceed to modify the release.
Testing the fix right now. As for the memleak info, it's hard to reproduce, as it only occurs after 2+ days of load testing the broker with a high message volume. In 0.19.5 it crashed more often, but in 0.20.0 it only occurred after 2 days of runtime at a really high message rate. When running normally with a standard message volume we don't notice the issue, and it's hard to get debug logs because the rest of the system starts crashing.
OK, perhaps AWS IoT bridging is the one to blame. Heavy testing has been conducted on nanomq, but the AWS bridge was not included due to a lack of resources. Happy to fix it if you can provide more logs (setting the nanomq log level to "info" is enough). And what do you mean by "rest of the system starts crashing"? That seems like an OS issue.
The rest of the system runs out of RAM and the OOM reaper starts killing all the running processes.
Understood. Limiting the queue length (max_mqueue_len) and the number of contexts (parallel) could help reduce the memory ceiling of nanomq.
Will try that, thanks! I will open a new issue once I get some debug logs from the AWS IoT bridge. Also confirming that it no longer segfaults when running the latest master on our 32-bit armv7 platform. Thanks for the quick help.
I tried tuning the memory, but no matter what I do I can't get it below 150 megabytes; after a certain amount of runtime it just starts increasing, albeit very slowly.
cat /nanomq_config.conf
Hi, I was saying that the queue length (max_mqueue_len) and the number of contexts (parallel) help limit memory usage. However, I cannot calculate a precise figure for memory consumption, since it is also affected by the message size and the number of connected clients. As for your parameters, you could try reducing msq_len to 256, but that only helps if you see logs like "Warning: msg lost due to reach the limit of lmq" (cached messages occupy memory). Otherwise, the memory is perhaps being used by the sockets, aios, and pipes that sustain the MQTT connections.
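As a rough illustration of why the queue length matters, the sketch below computes an upper bound on cached-message memory. It assumes, purely for illustration, that each connected client can cache up to max_mqueue_len messages of some average size. The function name and the sample figures are hypothetical, and this is not NanoMQ's actual memory accounting.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only (an assumption, not NanoMQ's real accounting):
 * every client may hold up to max_mqueue_len cached messages, each of
 * avg_msg_bytes bytes. Returns the worst-case cache size in bytes. */
uint64_t cached_msg_upper_bound(uint64_t clients, uint64_t max_mqueue_len,
                                uint64_t avg_msg_bytes)
{
    /* Refuse inputs whose product would overflow 64 bits. */
    assert(avg_msg_bytes == 0 || max_mqueue_len <= UINT64_MAX / avg_msg_bytes);
    uint64_t per_client = max_mqueue_len * avg_msg_bytes;

    assert(per_client == 0 || clients <= UINT64_MAX / per_client);
    return clients * per_client;
}
```

With made-up figures of 100 clients and 512-byte messages, lowering the queue length from 2048 to 256 drops this bound from 104,857,600 bytes (100 MiB) to 13,107,200 bytes (12.5 MiB). The bound only matters when messages are actually being cached; socket, aio, and pipe overhead is not modelled.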
Describe the bug
Seeing issues with nanomq starting as a zombie process when built for this architecture. It seems to work on 64-bit ARM environments but not on 32-bit ARM environments.
Expected behavior
nanomq should not start up as a zombie process, and it should be killable.
Actual Behavior
nanomq never starts up successfully.
To Reproduce
If possible, include actual reproduction test code here.
Minimal C test cases are preferred.
Environment Details
nanomq old_config file has:
websocket.enable = false
enable_ipc_internal = false
compiler:
armv7a_gcc_6_5_0
binary:
/usr/bin/nanomq: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 2.6.32, BuildID[sha1]=8563969b188aac30e2eed2df45bbf3bb2cfdbbfc, stripped
testing scenario
Running on an armv7a embedded device. We also have an aarch64 device on which the new release compiles successfully and runs properly.
Client SDK
We are using AWS IoT Core on the other side.
Additional context
Valgrind output when running:
root@imx6ull:~# valgrind --leak-check=full nanomq start --url nmq-tcp://127.0.0.1:1234 --old_conf ./nanomq_config.conf
==16633== Memcheck, a memory error detector
==16633== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==16633== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==16633== Command: nanomq start --url nmq-tcp://127.0.0.1:1234 --old_conf ./nanomq_config.conf
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x31D0A: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481A570: calloc (vg_replace_malloc.c:762)
==16633== by 0x31D31: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BE94: strncpy (vg_replace_strmem.c:552)
==16633== by 0x31D3B: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BEC0: strncpy (vg_replace_strmem.c:552)
==16633== by 0x31D3B: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BF10: strncpy (vg_replace_strmem.c:552)
==16633== by 0x31D3B: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BF34: strncpy (vg_replace_strmem.c:552)
==16633== by 0x31D3B: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x328CC: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481A570: calloc (vg_replace_malloc.c:762)
==16633== by 0x328F3: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BE94: strncpy (vg_replace_strmem.c:552)
==16633== by 0x328FD: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BEC0: strncpy (vg_replace_strmem.c:552)
==16633== by 0x328FD: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BF10: strncpy (vg_replace_strmem.c:552)
==16633== by 0x328FD: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BF34: strncpy (vg_replace_strmem.c:552)
==16633== by 0x328FD: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x31A4C: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x34F60: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481A570: calloc (vg_replace_malloc.c:762)
==16633== by 0x34F8F: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BE94: strncpy (vg_replace_strmem.c:552)
==16633== by 0x34F99: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BEC0: strncpy (vg_replace_strmem.c:552)
==16633== by 0x34F99: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BF10: strncpy (vg_replace_strmem.c:552)
==16633== by 0x34F99: ??? (in /usr/bin/nanomq)
==16633==
==16633== Conditional jump or move depends on uninitialised value(s)
==16633== at 0x481BF34: strncpy (vg_replace_strmem.c:552)
==16633== by 0x34F99: ??? (in /usr/bin/nanomq)
==16633==
NanoMQ Broker is started successfully!
==16633== Thread 8 nng:task:
==16633== Invalid read of size 1
==16633== at 0x481BAD8: strlen (vg_replace_strmem.c:461)
==16633== by 0x460D599F: __vfprintf_internal (vfprintf-internal.c:1688)
==16633== by 0x460E0FAD: __vsnprintf_internal (vsnprintf.c:114)
==16633== by 0x460C9D4D: snprintf (snprintf.c:31)
==16633== by 0x21097: ??? (in /usr/bin/nanomq)
==16633== Address 0x18b is not stack'd, malloc'd or (recently) free'd
==16633==
==16633==
==16633== Process terminating with default action of signal 11 (SIGSEGV)
==16633== Access not within mapped region at address 0x18B
==16633== at 0x481BAD8: strlen (vg_replace_strmem.c:461)
==16633== by 0x460D599F: __vfprintf_internal (vfprintf-internal.c:1688)
==16633== by 0x460E0FAD: __vsnprintf_internal (vsnprintf.c:114)
==16633== by 0x460C9D4D: snprintf (snprintf.c:31)
==16633== by 0x21097: ??? (in /usr/bin/nanomq)
==16633== If you believe this happened as a result of a stack
==16633== overflow in your program's main thread (unlikely but
==16633== possible), you can try to increase the size of the
==16633== main thread stack using the --main-stacksize= flag.
==16633== The main thread stack size used in this run was 8388608.
==16633==
==16633== HEAP SUMMARY:
==16633== in use at exit: 21,868 bytes in 99 blocks
==16633== total heap usage: 314 allocs, 215 frees, 146,042 bytes allocated
==16633==
==16633== Thread 1:
==16633== 1,440 bytes in 10 blocks are possibly lost in loss record 44 of 46
==16633== at 0x481A5C0: calloc (vg_replace_malloc.c:762)
==16633== by 0x4606CE11: allocate_dtv (dl-tls.c:286)
==16633== by 0x4606D423: _dl_allocate_tls (dl-tls.c:532)
==16633== by 0x461976BB: allocate_stack (allocatestack.c:622)
==16633== by 0x461976BB: pthread_create@@GLIBC_2.4 (pthread_create.c:662)
==16633== by 0x1E23F: ??? (in /usr/bin/nanomq)
==16633==
==16633== LEAK SUMMARY:
==16633== definitely lost: 0 bytes in 0 blocks
==16633== indirectly lost: 0 bytes in 0 blocks
==16633== possibly lost: 1,440 bytes in 10 blocks
==16633== still reachable: 20,428 bytes in 89 blocks
==16633== suppressed: 0 bytes in 0 blocks
==16633== Reachable blocks (those to which a pointer was found) are not shown.
==16633== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==16633==
==16633== Use --track-origins=yes to see where uninitialised values come from
==16633== For lists of detected and suppressed errors, rerun with: -s
==16633== ERROR SUMMARY: 1002 errors from 21 contexts (suppressed: 0 from 0)
Segmentation fault
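The crash above is strlen, called from snprintf, reading address 0x18b (395 in decimal), which suggests that a %s conversion was handed a small integer rather than a string pointer. One common way this shows up only on 32-bit targets is a 64-bit integer printed with a 32-bit specifier, which shifts the variadic arguments that follow it. Whether that is what happened in this release is an assumption; the sketch below only shows the portable pattern, and the helper name and its arguments are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not taken from NanoMQ or NanoNNG: formats a 64-bit
 * counter followed by a string into buf and returns snprintf's result. */
int format_client_line(char *buf, size_t len, uint64_t msg_count,
                       const char *client_id)
{
    /* snprintf accepts a NULL buffer only when the size is zero. */
    assert(buf != NULL || len == 0);

    /* Never let %s see a NULL pointer. */
    if (client_id == NULL) {
        client_id = "(null)";
    }

    /* PRIu64 expands to the correct length modifier on both 32-bit and
     * 64-bit targets, so the %s after it reads the intended argument. */
    return snprintf(buf, len, "count=%" PRIu64 " client=%s",
                    msg_count, client_id);
}
```

Calling format_client_line(buf, sizeof buf, 395, "sensor-1") with a 64-byte buffer writes "count=395 client=sensor-1". Building with -Wall (which enables -Wformat on GCC) flags mismatched specifiers at compile time.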
root@imx6ull:~# valgrind --leak-check=full --track-origins=yes nanomq start --url nmq-tcp://127.0.0.1:1234 --old_conf ./nanomq_config.conf
==1318== Memcheck, a memory error detector
==1318== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1318== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1318== Command: nanomq start --url nmq-tcp://127.0.0.1:1234 --old_conf /nanomq_config.conf
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x31D0A: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481A570: calloc (vg_replace_malloc.c:762)
==1318== by 0x31D31: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BE94: strncpy (vg_replace_strmem.c:552)
==1318== by 0x31D3B: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BEC0: strncpy (vg_replace_strmem.c:552)
==1318== by 0x31D3B: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BF10: strncpy (vg_replace_strmem.c:552)
==1318== by 0x31D3B: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BF34: strncpy (vg_replace_strmem.c:552)
==1318== by 0x31D3B: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x328CC: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481A570: calloc (vg_replace_malloc.c:762)
==1318== by 0x328F3: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BE94: strncpy (vg_replace_strmem.c:552)
==1318== by 0x328FD: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BEC0: strncpy (vg_replace_strmem.c:552)
==1318== by 0x328FD: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BF10: strncpy (vg_replace_strmem.c:552)
==1318== by 0x328FD: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BF34: strncpy (vg_replace_strmem.c:552)
==1318== by 0x328FD: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x31A4C: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x34F60: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481A570: calloc (vg_replace_malloc.c:762)
==1318== by 0x34F8F: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BE94: strncpy (vg_replace_strmem.c:552)
==1318== by 0x34F99: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BEC0: strncpy (vg_replace_strmem.c:552)
==1318== by 0x34F99: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BF10: strncpy (vg_replace_strmem.c:552)
==1318== by 0x34F99: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
==1318== Conditional jump or move depends on uninitialised value(s)
==1318== at 0x481BF34: strncpy (vg_replace_strmem.c:552)
==1318== by 0x34F99: ??? (in /usr/bin/nanomq)
==1318== Uninitialised value was created by a heap allocation
==1318== at 0x48179A0: malloc (vg_replace_malloc.c:309)
==1318== by 0x49E5BD1: getdelim (iogetdelim.c:62)
==1318== by 0x3AE19: ??? (in /usr/bin/nanomq)
==1318==
NanoMQ Broker is started successfully!
==1318== Thread 8 nng:task:
==1318== Invalid read of size 1
==1318== at 0x481BAD8: strlen (vg_replace_strmem.c:461)
==1318== by 0x49DF99F: __vfprintf_internal (vfprintf-internal.c:1688)
==1318== by 0x49EAFAD: __vsnprintf_internal (vsnprintf.c:114)
==1318== by 0x49D3D4D: snprintf (snprintf.c:31)
==1318== by 0x21097: ??? (in /usr/bin/nanomq)
==1318== Address 0x18b is not stack'd, malloc'd or (recently) free'd
==1318==
==1318==
==1318== Process terminating with default action of signal 11 (SIGSEGV)
==1318== Access not within mapped region at address 0x18B
==1318== at 0x481BAD8: strlen (vg_replace_strmem.c:461)
==1318== by 0x49DF99F: __vfprintf_internal (vfprintf-internal.c:1688)
==1318== by 0x49EAFAD: __vsnprintf_internal (vsnprintf.c:114)
==1318== by 0x49D3D4D: snprintf (snprintf.c:31)
==1318== by 0x21097: ??? (in /usr/bin/nanomq)
==1318== If you believe this happened as a result of a stack
==1318== overflow in your program's main thread (unlikely but
==1318== possible), you can try to increase the size of the
==1318== main thread stack using the --main-stacksize= flag.
==1318== The main thread stack size used in this run was 8388608.
==1318==
==1318== HEAP SUMMARY:
==1318== in use at exit: 21,868 bytes in 99 blocks
==1318== total heap usage: 314 allocs, 215 frees, 146,042 bytes allocated
==1318==
==1318== Thread 1:
==1318== 1,440 bytes in 10 blocks are possibly lost in loss record 44 of 46
==1318== at 0x481A5C0: calloc (vg_replace_malloc.c:762)
==1318== by 0x4606CE11: allocate_dtv (dl-tls.c:286)
==1318== by 0x4606D423: _dl_allocate_tls (dl-tls.c:532)
==1318== by 0x497C6BB: allocate_stack (allocatestack.c:622)
==1318== by 0x497C6BB: pthread_create@@GLIBC_2.4 (pthread_create.c:662)
==1318== by 0x1E23F: ??? (in /usr/bin/nanomq)
==1318==
==1318== LEAK SUMMARY:
==1318== definitely lost: 0 bytes in 0 blocks
==1318== indirectly lost: 0 bytes in 0 blocks
==1318== possibly lost: 1,440 bytes in 10 blocks
==1318== still reachable: 20,428 bytes in 89 blocks
==1318== suppressed: 0 bytes in 0 blocks
==1318== Reachable blocks (those to which a pointer was found) are not shown.
==1318== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1318==
==1318== For lists of detected and suppressed errors, rerun with: -s
==1318== ERROR SUMMARY: 1002 errors from 21 contexts (suppressed: 0 from 0)
Segmentation fault
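The --track-origins=yes run above attributes every uninitialised-value report to the heap buffer allocated by getdelim, which is what getline calls internally. That buffer is usually larger than the line it holds, and only the bytes up to the returned length (plus the terminating NUL) are initialised, so a parser that scans by capacity rather than by length can trigger exactly these reports. The sketch below is a hypothetical "key = value" lookup, not NanoMQ's parser, that stays within the returned length.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: returns a malloc'd copy of the value stored under
 * `key` in a "key = value" stream, or NULL if the key is absent. */
char *conf_lookup(FILE *fp, const char *key)
{
    assert(fp != NULL && key != NULL);

    char   *line   = NULL; /* getline allocates and grows this buffer */
    size_t  cap    = 0;    /* buffer capacity, NOT the line length */
    ssize_t len;
    char   *result = NULL;
    size_t  klen   = strlen(key);

    while ((len = getline(&line, &cap, fp)) != -1) {
        /* Only line[0..len] is initialised; trim trailing newline/spaces. */
        size_t n = (size_t) len;
        while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r' ||
                         line[n - 1] == ' ')) {
            n--;
        }
        line[n] = '\0';

        /* Search for '=' within the n valid bytes only. */
        char *eq = memchr(line, '=', n);
        if (eq == NULL) {
            continue;
        }
        size_t kend = (size_t) (eq - line);
        while (kend > 0 && line[kend - 1] == ' ') {
            kend--;
        }
        if (kend != klen || memcmp(line, key, klen) != 0) {
            continue;
        }

        /* Skip spaces after '=' and copy the rest with an explicit length. */
        const char *val = eq + 1;
        while (*val == ' ') {
            val++;
        }
        size_t vlen = n - (size_t) (val - line);
        result = malloc(vlen + 1);
        if (result != NULL) {
            memcpy(result, val, vlen);
            result[vlen] = '\0';
        }
        break;
    }
    free(line);
    return result;
}
```

For a stream containing the line "websocket.enable = false", conf_lookup(fp, "websocket.enable") returns a heap copy of "false" that the caller must free.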