New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ovs-vswitchd: possible memory leak in 2.12.0 (issue is happening since 2.10.1 at least) #168
Comments
|
Possibly related to this issue. Here is the steps:
My environment The suspicious ovs-vswitchd log |
|
Hello, I am seeing the same in 2.10.1 and 2.10.1-r2. |
|
I believe I have addressed this issue. |
While checking valgrind reports after running "make check-valgrind" I have noticed
reports for several tests similar to the following:
....
==5345== Memcheck, a memory error detector
==5345== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5345== Command: ovsdb-client --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem transact ssl:127.0.0.1:40111 \ \ \ ["ordinals",
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["number",\ "==",\ 1]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 2,\ "name":\ "old\ two"}},
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["name",\ "==",\ "two"]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 1,\ "name":\ "old\ one"}}]
==5345== Parent PID: 5344
==5345==
==5345==
==5345== HEAP SUMMARY:
==5345== in use at exit: 116,551 bytes in 3,341 blocks
==5345== total heap usage: 5,134 allocs, 1,793 frees, 412,290 bytes allocated
==5345==
==5345== 6,221 (184 direct, 6,037 indirect) bytes in 1 blocks are definitely lost in loss record 498 of 500
==5345== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5345== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x4E56B27: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E5BA11: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E65145: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4522DF: ssl_connect (stream-ssl.c:530)
==5345== by 0x443D38: scs_connecting (stream.c:315)
==5345== by 0x443D38: stream_connect (stream.c:338)
==5345== by 0x443FA1: stream_open_block (stream.c:266)
==5345== by 0x40AB79: open_jsonrpc (ovsdb-client.c:507)
==5345== by 0x40AB79: open_rpc (ovsdb-client.c:143)
==5345== by 0x40B06B: do_transact__ (ovsdb-client.c:871)
==5345== by 0x40B245: do_transact (ovsdb-client.c:893)
==5345== by 0x405F76: main (ovsdb-client.c:282)
==5345==
==5345== LEAK SUMMARY:
==5345== definitely lost: 184 bytes in 1 blocks
==5345== indirectly lost: 6,037 bytes in 117 blocks
==5345== possibly lost: 0 bytes in 0 blocks
==5345== still reachable: 110,330 bytes in 3,223 blocks
==5345== suppressed: 0 bytes in 0 blocks
==5345== Reachable blocks (those to which a pointer was found) are not shown.
==5345== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5345==
==5345== For counts of detected and suppressed errors, rerun with: -v
==5345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
....
This report was extracted from "index uniqueness checking" test and complains about
leaking memory in ovsdb-client application. The problem is not huge, since ovsdb-client
is CLI tool which is constantly reinvoked/restarted, thus leaked memory is not accumulated.
More problematic issue is that for the same test valgrind reports the similar problem also for
ovsdb-server:
....
==5290== Memcheck, a memory error detector
==5290== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5290== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5290== Command: ovsdb-server --log-file --detach --no-chdir --pidfile --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey2.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert2.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem --remote=pssl:0:127.0.0.1 db
==5290== Parent PID: 5289
==5290==
==5292== Warning: noted but unhandled ioctl 0x2403 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292== Warning: noted but unhandled ioctl 0x2400 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5290==
==5290== HEAP SUMMARY:
==5290== in use at exit: 2,066 bytes in 48 blocks
==5290== total heap usage: 87 allocs, 39 frees, 14,152 bytes allocated
==5290==
==5290== LEAK SUMMARY:
==5290== definitely lost: 0 bytes in 0 blocks
==5290== indirectly lost: 0 bytes in 0 blocks
==5290== possibly lost: 0 bytes in 0 blocks
==5290== still reachable: 2,066 bytes in 48 blocks
==5290== suppressed: 0 bytes in 0 blocks
==5290== Reachable blocks (those to which a pointer was found) are not shown.
==5290== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5290==
==5290== For counts of detected and suppressed errors, rerun with: -v
==5290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5292== Warning: noted but unhandled ioctl 0x2401 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292==
==5292== HEAP SUMMARY:
==5292== in use at exit: 164,018 bytes in 4,252 blocks
==5292== total heap usage: 17,910 allocs, 13,658 frees, 1,907,468 bytes allocated
==5292==
==5292== 49,720 (1,472 direct, 48,248 indirect) bytes in 8 blocks are definitely lost in loss record 580 of 580
==5292== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5292== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x4E53E00: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x4E55727: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x452C4B: ssl_connect (stream-ssl.c:530)
==5292== by 0x445B18: scs_connecting (stream.c:315)
==5292== by 0x445B18: stream_connect (stream.c:338)
==5292== by 0x445B91: stream_recv (stream.c:369)
==5292== by 0x432A9C: jsonrpc_recv.part.7 (jsonrpc.c:310)
==5292== by 0x433977: jsonrpc_recv (jsonrpc.c:1139)
==5292== by 0x433977: jsonrpc_session_recv (jsonrpc.c:1112)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run (jsonrpc-server.c:553)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==5292== by 0x40CCE3: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==5292== by 0x40682E: main_loop (ovsdb-server.c:209)
==5292== by 0x40682E: main (ovsdb-server.c:460)
==5292==
==5292== LEAK SUMMARY:
==5292== definitely lost: 1,472 bytes in 8 blocks
==5292== indirectly lost: 48,248 bytes in 936 blocks
==5292== possibly lost: 0 bytes in 0 blocks
==5292== still reachable: 114,298 bytes in 3,308 blocks
==5292== suppressed: 0 bytes in 0 blocks
==5292== Reachable blocks (those to which a pointer was found) are not shown.
==5292== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5292==
==5292== For counts of detected and suppressed errors, rerun with: -v
==5292== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
....
In this case ovsdb-server is running as daemon process (--detach option) and leaking memory is
accumulated whenever ovsdb-client is reconnected. Within observed test ovsdb-client CLI tool
connects 8 times to ovsdb-server. Leaked memory in ovsdb-client (for each invocation) is approx.
6K bytes, while leaked memory in ovsdb-server is aprox. 48Kbytes what is actually 8*6K. Thus per
each connection both ovsdb-client and ovsdb-server leak approx. 6K bytes.
I have done a small manual test to check if ovsdb-server is indeed accumulating leaked memory
by dumping ovsdb-server in a loop:
console1:
ovsdb-server \
--log-file \
--detach --no-chdir --pidfile \
--private-key=testpki-privkey2.pem \
--certificate=testpki-cert2.pem \
--ca-cert=testpki-cacert.pem \
--remote=pssl:0:127.0.0.1 \
db
while (true); do \
ovsdb-client \
--private-key=testpki-privkey.pem \
--certificate=testpki-cert.pem \
--ca-cert=testpki-cacert.pem \
dump ssl:127.0.0.1:42067; \
done
console2:
watch -n 0.5 'cat /proc/$(pidof ovsdb-server)/status | grep VmSize'
In console2 it was evidently seen ovsdb-server is constantly leaking memory. After a while
(i.e. after a certain number of reconnections) the OOM killer jumps out and kills ovsdb-server.
Very similar situation was already noticed and described in
openvswitch/ovs-issues#168. There, the problem pops up while connecting
controller to ovs-vswitchd daemon.
Valgrind reports point to a problem in openssl library, however after studying openssl code for
a while I have found out the problem is actually in ovs. When connection through SSL channel is
taken place openssl library allocates memory for keeping track of certificate. Reference to this
memory works very similar as std::shared_ptr pointer in recent C++ dialects. i.e. when allocated
memory is referenced its reference counter is incremented and decremented after the memory is
derefered. When reference counter becomes zero allocated memory is automatically deallocated.
In openssl library environment certificate is retrieved by calling SSL_get_peer_certificate()
where its reference counter is incremented. After retrieved certificate is not used any more its
reference counter must be decremented by calling X509_free(). If not, allocated memory is never
freed despite the ssl connection is properly closed.
The problem was caused in stream-ssl.c in function ssl_connect(), which retrieves common peer name
by calling SSL_get_peer_certificate() function and without calling X509_free() function afterwards.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
While checking valgrind reports after running "make check-valgrind" I have noticed
reports for several tests similar to the following:
....
==5345== Memcheck, a memory error detector
==5345== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5345== Command: ovsdb-client --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem transact ssl:127.0.0.1:40111 \ \ \ ["ordinals",
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["number",\ "==",\ 1]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 2,\ "name":\ "old\ two"}},
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["name",\ "==",\ "two"]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 1,\ "name":\ "old\ one"}}]
==5345== Parent PID: 5344
==5345==
==5345==
==5345== HEAP SUMMARY:
==5345== in use at exit: 116,551 bytes in 3,341 blocks
==5345== total heap usage: 5,134 allocs, 1,793 frees, 412,290 bytes allocated
==5345==
==5345== 6,221 (184 direct, 6,037 indirect) bytes in 1 blocks are definitely lost in loss record 498 of 500
==5345== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5345== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x4E56B27: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E5BA11: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E65145: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4522DF: ssl_connect (stream-ssl.c:530)
==5345== by 0x443D38: scs_connecting (stream.c:315)
==5345== by 0x443D38: stream_connect (stream.c:338)
==5345== by 0x443FA1: stream_open_block (stream.c:266)
==5345== by 0x40AB79: open_jsonrpc (ovsdb-client.c:507)
==5345== by 0x40AB79: open_rpc (ovsdb-client.c:143)
==5345== by 0x40B06B: do_transact__ (ovsdb-client.c:871)
==5345== by 0x40B245: do_transact (ovsdb-client.c:893)
==5345== by 0x405F76: main (ovsdb-client.c:282)
==5345==
==5345== LEAK SUMMARY:
==5345== definitely lost: 184 bytes in 1 blocks
==5345== indirectly lost: 6,037 bytes in 117 blocks
==5345== possibly lost: 0 bytes in 0 blocks
==5345== still reachable: 110,330 bytes in 3,223 blocks
==5345== suppressed: 0 bytes in 0 blocks
==5345== Reachable blocks (those to which a pointer was found) are not shown.
==5345== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5345==
==5345== For counts of detected and suppressed errors, rerun with: -v
==5345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
....
This report was extracted from "index uniqueness checking" test and complains about
leaking memory in ovsdb-client application. The problem is not huge, since ovsdb-client
is CLI tool which is constantly reinvoked/restarted, thus leaked memory is not accumulated.
More problematic issue is that for the same test valgrind reports the similar problem also for
ovsdb-server:
....
==5290== Memcheck, a memory error detector
==5290== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5290== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5290== Command: ovsdb-server --log-file --detach --no-chdir --pidfile --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey2.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert2.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem --remote=pssl:0:127.0.0.1 db
==5290== Parent PID: 5289
==5290==
==5292== Warning: noted but unhandled ioctl 0x2403 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292== Warning: noted but unhandled ioctl 0x2400 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5290==
==5290== HEAP SUMMARY:
==5290== in use at exit: 2,066 bytes in 48 blocks
==5290== total heap usage: 87 allocs, 39 frees, 14,152 bytes allocated
==5290==
==5290== LEAK SUMMARY:
==5290== definitely lost: 0 bytes in 0 blocks
==5290== indirectly lost: 0 bytes in 0 blocks
==5290== possibly lost: 0 bytes in 0 blocks
==5290== still reachable: 2,066 bytes in 48 blocks
==5290== suppressed: 0 bytes in 0 blocks
==5290== Reachable blocks (those to which a pointer was found) are not shown.
==5290== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5290==
==5290== For counts of detected and suppressed errors, rerun with: -v
==5290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5292== Warning: noted but unhandled ioctl 0x2401 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292==
==5292== HEAP SUMMARY:
==5292== in use at exit: 164,018 bytes in 4,252 blocks
==5292== total heap usage: 17,910 allocs, 13,658 frees, 1,907,468 bytes allocated
==5292==
==5292== 49,720 (1,472 direct, 48,248 indirect) bytes in 8 blocks are definitely lost in loss record 580 of 580
==5292== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5292== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x4E53E00: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x4E55727: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x452C4B: ssl_connect (stream-ssl.c:530)
==5292== by 0x445B18: scs_connecting (stream.c:315)
==5292== by 0x445B18: stream_connect (stream.c:338)
==5292== by 0x445B91: stream_recv (stream.c:369)
==5292== by 0x432A9C: jsonrpc_recv.part.7 (jsonrpc.c:310)
==5292== by 0x433977: jsonrpc_recv (jsonrpc.c:1139)
==5292== by 0x433977: jsonrpc_session_recv (jsonrpc.c:1112)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run (jsonrpc-server.c:553)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==5292== by 0x40CCE3: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==5292== by 0x40682E: main_loop (ovsdb-server.c:209)
==5292== by 0x40682E: main (ovsdb-server.c:460)
==5292==
==5292== LEAK SUMMARY:
==5292== definitely lost: 1,472 bytes in 8 blocks
==5292== indirectly lost: 48,248 bytes in 936 blocks
==5292== possibly lost: 0 bytes in 0 blocks
==5292== still reachable: 114,298 bytes in 3,308 blocks
==5292== suppressed: 0 bytes in 0 blocks
==5292== Reachable blocks (those to which a pointer was found) are not shown.
==5292== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5292==
==5292== For counts of detected and suppressed errors, rerun with: -v
==5292== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
....
In this case ovsdb-server is running as daemon process (--detach option) and leaking memory is
accumulated whenever ovsdb-client is reconnected. Within observed test ovsdb-client CLI tool
connects 8 times to ovsdb-server. Leaked memory in ovsdb-client (for each invocation) is approx.
6K bytes, while leaked memory in ovsdb-server is aprox. 48Kbytes what is actually 8*6K. Thus per
each connection both ovsdb-client and ovsdb-server leak approx. 6K bytes.
I have done a small manual test to check if ovsdb-server is indeed accumulating leaked memory
by dumping ovsdb-server in a loop:
console1:
ovsdb-server \
--log-file \
--detach --no-chdir --pidfile \
--private-key=testpki-privkey2.pem \
--certificate=testpki-cert2.pem \
--ca-cert=testpki-cacert.pem \
--remote=pssl:0:127.0.0.1 \
db
while (true); do \
ovsdb-client \
--private-key=testpki-privkey.pem \
--certificate=testpki-cert.pem \
--ca-cert=testpki-cacert.pem \
dump ssl:127.0.0.1:42067; \
done
console2:
watch -n 0.5 'cat /proc/$(pidof ovsdb-server)/status | grep VmSize'
In console2 it was evidently seen ovsdb-server is constantly leaking memory. After a while
(i.e. after a certain number of reconnections) the OOM killer jumps out and kills ovsdb-server.
Very similar situation was already noticed and described in
openvswitch/ovs-issues#168. There, the problem pops up while connecting
controller to ovs-vswitchd daemon.
Valgrind reports point to a problem in openssl library, however after studying openssl code for
a while I have found out the problem is actually in ovs. When connection through SSL channel is
taken place openssl library allocates memory for keeping track of certificate. Reference to this
memory works very similar as std::shared_ptr pointer in recent C++ dialects. i.e. when allocated
memory is referenced its reference counter is incremented and decremented after the memory is
derefered. When reference counter becomes zero allocated memory is automatically deallocated.
In openssl library environment certificate is retrieved by calling SSL_get_peer_certificate()
where its reference counter is incremented. After retrieved certificate is not used any more its
reference counter must be decremented by calling X509_free(). If not, allocated memory is never
freed despite the ssl connection is properly closed.
The problem was caused in stream-ssl.c in function ssl_connect(), which retrieves common peer name
by calling SSL_get_peer_certificate() function and without calling X509_free() function afterwards.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
While checking valgrind reports after running "make check-valgrind" I have noticed
reports for several tests similar to the following:
....
==5345== Memcheck, a memory error detector
==5345== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5345== Command: ovsdb-client --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem transact ssl:127.0.0.1:40111 \ \ \ ["ordinals",
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["number",\ "==",\ 1]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 2,\ "name":\ "old\ two"}},
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["name",\ "==",\ "two"]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 1,\ "name":\ "old\ one"}}]
==5345== Parent PID: 5344
==5345==
==5345==
==5345== HEAP SUMMARY:
==5345== in use at exit: 116,551 bytes in 3,341 blocks
==5345== total heap usage: 5,134 allocs, 1,793 frees, 412,290 bytes allocated
==5345==
==5345== 6,221 (184 direct, 6,037 indirect) bytes in 1 blocks are definitely lost in loss record 498 of 500
==5345== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5345== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x4E56B27: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E5BA11: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E65145: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4522DF: ssl_connect (stream-ssl.c:530)
==5345== by 0x443D38: scs_connecting (stream.c:315)
==5345== by 0x443D38: stream_connect (stream.c:338)
==5345== by 0x443FA1: stream_open_block (stream.c:266)
==5345== by 0x40AB79: open_jsonrpc (ovsdb-client.c:507)
==5345== by 0x40AB79: open_rpc (ovsdb-client.c:143)
==5345== by 0x40B06B: do_transact__ (ovsdb-client.c:871)
==5345== by 0x40B245: do_transact (ovsdb-client.c:893)
==5345== by 0x405F76: main (ovsdb-client.c:282)
==5345==
==5345== LEAK SUMMARY:
==5345== definitely lost: 184 bytes in 1 blocks
==5345== indirectly lost: 6,037 bytes in 117 blocks
==5345== possibly lost: 0 bytes in 0 blocks
==5345== still reachable: 110,330 bytes in 3,223 blocks
==5345== suppressed: 0 bytes in 0 blocks
==5345== Reachable blocks (those to which a pointer was found) are not shown.
==5345== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5345==
==5345== For counts of detected and suppressed errors, rerun with: -v
==5345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
....
This report was extracted from "index uniqueness checking" test and complains about
leaking memory in ovsdb-client application. The problem is not huge, since ovsdb-client
is CLI tool which is constantly reinvoked/restarted, thus leaked memory is not accumulated.
More problematic issue is that for the same test valgrind reports the similar problem also for
ovsdb-server:
....
==5290== Memcheck, a memory error detector
==5290== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5290== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5290== Command: ovsdb-server --log-file --detach --no-chdir --pidfile --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey2.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert2.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem --remote=pssl:0:127.0.0.1 db
==5290== Parent PID: 5289
==5290==
==5292== Warning: noted but unhandled ioctl 0x2403 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292== Warning: noted but unhandled ioctl 0x2400 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5290==
==5290== HEAP SUMMARY:
==5290== in use at exit: 2,066 bytes in 48 blocks
==5290== total heap usage: 87 allocs, 39 frees, 14,152 bytes allocated
==5290==
==5290== LEAK SUMMARY:
==5290== definitely lost: 0 bytes in 0 blocks
==5290== indirectly lost: 0 bytes in 0 blocks
==5290== possibly lost: 0 bytes in 0 blocks
==5290== still reachable: 2,066 bytes in 48 blocks
==5290== suppressed: 0 bytes in 0 blocks
==5290== Reachable blocks (those to which a pointer was found) are not shown.
==5290== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5290==
==5290== For counts of detected and suppressed errors, rerun with: -v
==5290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5292== Warning: noted but unhandled ioctl 0x2401 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292==
==5292== HEAP SUMMARY:
==5292== in use at exit: 164,018 bytes in 4,252 blocks
==5292== total heap usage: 17,910 allocs, 13,658 frees, 1,907,468 bytes allocated
==5292==
==5292== 49,720 (1,472 direct, 48,248 indirect) bytes in 8 blocks are definitely lost in loss record 580 of 580
==5292== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5292== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x4E53E00: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x4E55727: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x452C4B: ssl_connect (stream-ssl.c:530)
==5292== by 0x445B18: scs_connecting (stream.c:315)
==5292== by 0x445B18: stream_connect (stream.c:338)
==5292== by 0x445B91: stream_recv (stream.c:369)
==5292== by 0x432A9C: jsonrpc_recv.part.7 (jsonrpc.c:310)
==5292== by 0x433977: jsonrpc_recv (jsonrpc.c:1139)
==5292== by 0x433977: jsonrpc_session_recv (jsonrpc.c:1112)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run (jsonrpc-server.c:553)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==5292== by 0x40CCE3: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==5292== by 0x40682E: main_loop (ovsdb-server.c:209)
==5292== by 0x40682E: main (ovsdb-server.c:460)
==5292==
==5292== LEAK SUMMARY:
==5292== definitely lost: 1,472 bytes in 8 blocks
==5292== indirectly lost: 48,248 bytes in 936 blocks
==5292== possibly lost: 0 bytes in 0 blocks
==5292== still reachable: 114,298 bytes in 3,308 blocks
==5292== suppressed: 0 bytes in 0 blocks
==5292== Reachable blocks (those to which a pointer was found) are not shown.
==5292== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5292==
==5292== For counts of detected and suppressed errors, rerun with: -v
==5292== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
....
In this case ovsdb-server is running as daemon process (--detach option) and leaking memory is
accumulated whenever ovsdb-client is reconnected. Within observed test ovsdb-client CLI tool
connects 8 times to ovsdb-server. Leaked memory in ovsdb-client (for each invocation) is approx.
6K bytes, while leaked memory in ovsdb-server is aprox. 48Kbytes what is actually 8*6K. Thus per
each connection both ovsdb-client and ovsdb-server leak approx. 6K bytes.
I have done a small manual test to check if ovsdb-server is indeed accumulating leaked memory
by dumping ovsdb-server in a loop:
console1:
ovsdb-server \
--log-file \
--detach --no-chdir --pidfile \
--private-key=testpki-privkey2.pem \
--certificate=testpki-cert2.pem \
--ca-cert=testpki-cacert.pem \
--remote=pssl:0:127.0.0.1 \
db
while (true); do \
ovsdb-client \
--private-key=testpki-privkey.pem \
--certificate=testpki-cert.pem \
--ca-cert=testpki-cacert.pem \
dump ssl:127.0.0.1:42067; \
done
console2:
watch -n 0.5 'cat /proc/$(pidof ovsdb-server)/status | grep VmSize'
In console2 it was evidently seen ovsdb-server is constantly leaking memory. After a while
(i.e. after a certain number of reconnections) the OOM killer jumps out and kills ovsdb-server.
Very similar situation was already noticed and described in
openvswitch/ovs-issues#168. There, the problem pops up while connecting
controller to ovs-vswitchd daemon.
Valgrind reports point to a problem in openssl library, however after studying openssl code for
a while I have found out the problem is actually in ovs. When connection through SSL channel is
taken place openssl library allocates memory for keeping track of certificate. Reference to this
memory works very similar as std::shared_ptr pointer in recent C++ dialects. i.e. when allocated
memory is referenced its reference counter is incremented and decremented after the memory is
derefered. When reference counter becomes zero allocated memory is automatically deallocated.
In openssl library environment certificate is retrieved by calling SSL_get_peer_certificate()
where its reference counter is incremented. After retrieved certificate is not used any more its
reference counter must be decremented by calling X509_free(). If not, allocated memory is never
freed despite the ssl connection is properly closed.
The problem was caused in stream-ssl.c in function ssl_connect(), which retrieves common peer name
by calling SSL_get_peer_certificate() function and without calling X509_free() function afterwards.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
While checking valgrind reports after running "make check-valgrind" I have noticed
reports for several tests similar to the following:
....
==5345== Memcheck, a memory error detector
==5345== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5345== Command: ovsdb-client --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem transact ssl:127.0.0.1:40111 \ \ \ ["ordinals",
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["number",\ "==",\ 1]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 2,\ "name":\ "old\ two"}},
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["name",\ "==",\ "two"]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 1,\ "name":\ "old\ one"}}]
==5345== Parent PID: 5344
==5345==
==5345==
==5345== HEAP SUMMARY:
==5345== in use at exit: 116,551 bytes in 3,341 blocks
==5345== total heap usage: 5,134 allocs, 1,793 frees, 412,290 bytes allocated
==5345==
==5345== 6,221 (184 direct, 6,037 indirect) bytes in 1 blocks are definitely lost in loss record 498 of 500
==5345== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5345== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x4E56B27: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E5BA11: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E65145: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4522DF: ssl_connect (stream-ssl.c:530)
==5345== by 0x443D38: scs_connecting (stream.c:315)
==5345== by 0x443D38: stream_connect (stream.c:338)
==5345== by 0x443FA1: stream_open_block (stream.c:266)
==5345== by 0x40AB79: open_jsonrpc (ovsdb-client.c:507)
==5345== by 0x40AB79: open_rpc (ovsdb-client.c:143)
==5345== by 0x40B06B: do_transact__ (ovsdb-client.c:871)
==5345== by 0x40B245: do_transact (ovsdb-client.c:893)
==5345== by 0x405F76: main (ovsdb-client.c:282)
==5345==
==5345== LEAK SUMMARY:
==5345== definitely lost: 184 bytes in 1 blocks
==5345== indirectly lost: 6,037 bytes in 117 blocks
==5345== possibly lost: 0 bytes in 0 blocks
==5345== still reachable: 110,330 bytes in 3,223 blocks
==5345== suppressed: 0 bytes in 0 blocks
==5345== Reachable blocks (those to which a pointer was found) are not shown.
==5345== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5345==
==5345== For counts of detected and suppressed errors, rerun with: -v
==5345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
....
This report was extracted from "index uniqueness checking" test and complains about
leaking memory in ovsdb-client application. The problem is not huge, since ovsdb-client
is CLI tool which is constantly reinvoked/restarted, thus leaked memory is not accumulated.
More problematic issue is that for the same test valgrind reports the similar problem also for
ovsdb-server:
....
==5290== Memcheck, a memory error detector
==5290== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5290== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5290== Command: ovsdb-server --log-file --detach --no-chdir --pidfile --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey2.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert2.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem --remote=pssl:0:127.0.0.1 db
==5290== Parent PID: 5289
==5290==
==5292== Warning: noted but unhandled ioctl 0x2403 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292== Warning: noted but unhandled ioctl 0x2400 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5290==
==5290== HEAP SUMMARY:
==5290== in use at exit: 2,066 bytes in 48 blocks
==5290== total heap usage: 87 allocs, 39 frees, 14,152 bytes allocated
==5290==
==5290== LEAK SUMMARY:
==5290== definitely lost: 0 bytes in 0 blocks
==5290== indirectly lost: 0 bytes in 0 blocks
==5290== possibly lost: 0 bytes in 0 blocks
==5290== still reachable: 2,066 bytes in 48 blocks
==5290== suppressed: 0 bytes in 0 blocks
==5290== Reachable blocks (those to which a pointer was found) are not shown.
==5290== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5290==
==5290== For counts of detected and suppressed errors, rerun with: -v
==5290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5292== Warning: noted but unhandled ioctl 0x2401 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292==
==5292== HEAP SUMMARY:
==5292== in use at exit: 164,018 bytes in 4,252 blocks
==5292== total heap usage: 17,910 allocs, 13,658 frees, 1,907,468 bytes allocated
==5292==
==5292== 49,720 (1,472 direct, 48,248 indirect) bytes in 8 blocks are definitely lost in loss record 580 of 580
==5292== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5292== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x4E53E00: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x4E55727: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x452C4B: ssl_connect (stream-ssl.c:530)
==5292== by 0x445B18: scs_connecting (stream.c:315)
==5292== by 0x445B18: stream_connect (stream.c:338)
==5292== by 0x445B91: stream_recv (stream.c:369)
==5292== by 0x432A9C: jsonrpc_recv.part.7 (jsonrpc.c:310)
==5292== by 0x433977: jsonrpc_recv (jsonrpc.c:1139)
==5292== by 0x433977: jsonrpc_session_recv (jsonrpc.c:1112)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run (jsonrpc-server.c:553)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==5292== by 0x40CCE3: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==5292== by 0x40682E: main_loop (ovsdb-server.c:209)
==5292== by 0x40682E: main (ovsdb-server.c:460)
==5292==
==5292== LEAK SUMMARY:
==5292== definitely lost: 1,472 bytes in 8 blocks
==5292== indirectly lost: 48,248 bytes in 936 blocks
==5292== possibly lost: 0 bytes in 0 blocks
==5292== still reachable: 114,298 bytes in 3,308 blocks
==5292== suppressed: 0 bytes in 0 blocks
==5292== Reachable blocks (those to which a pointer was found) are not shown.
==5292== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5292==
==5292== For counts of detected and suppressed errors, rerun with: -v
==5292== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
....
In this case ovsdb-server is running as daemon process (--detach option) and leaking memory is
accumulated whenever ovsdb-client is reconnected. Within observed test ovsdb-client CLI tool
connects 8 times to ovsdb-server. Leaked memory in ovsdb-client (for each invocation) is approx.
6K bytes, while leaked memory in ovsdb-server is aprox. 48Kbytes what is actually 8*6K. Thus per
each connection both ovsdb-client and ovsdb-server leak approx. 6K bytes.
I have done a small manual test to check if ovsdb-server is indeed accumulating leaked memory
by dumping ovsdb-server in a loop:
console1:
ovsdb-server \
--log-file \
--detach --no-chdir --pidfile \
--private-key=testpki-privkey2.pem \
--certificate=testpki-cert2.pem \
--ca-cert=testpki-cacert.pem \
--remote=pssl:0:127.0.0.1 \
db
while (true); do \
ovsdb-client \
--private-key=testpki-privkey.pem \
--certificate=testpki-cert.pem \
--ca-cert=testpki-cacert.pem \
dump ssl:127.0.0.1:42067; \
done
console2:
watch -n 0.5 'cat /proc/$(pidof ovsdb-server)/status | grep VmSize'
In console2 it was evidently seen ovsdb-server is constantly leaking memory. After a while
(i.e. after a certain number of reconnections) the OOM killer jumps out and kills ovsdb-server.
Very similar situation was already noticed and described in
openvswitch/ovs-issues#168. There, the problem pops up while connecting
controller to ovs-vswitchd daemon.
Valgrind reports point to a problem in openssl library, however after studying openssl code for
a while I have found out the problem is actually in ovs. When connection through SSL channel is
taken place openssl library allocates memory for keeping track of certificate. Reference to this
memory works very similar as std::shared_ptr pointer in recent C++ dialects. i.e. when allocated
memory is referenced its reference counter is incremented and decremented after the memory is
derefered. When reference counter becomes zero allocated memory is automatically deallocated.
In openssl library environment certificate is retrieved by calling SSL_get_peer_certificate()
where its reference counter is incremented. After retrieved certificate is not used any more its
reference counter must be decremented by calling X509_free(). If not, allocated memory is never
freed despite the ssl connection is properly closed.
The problem was caused in stream-ssl.c in function ssl_connect(), which retrieves common peer name
by calling SSL_get_peer_certificate() function and without calling X509_free() function afterwards.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
While checking valgrind reports after running "make check-valgrind" I have noticed
reports for several tests similar to the following:
....
==5345== Memcheck, a memory error detector
==5345== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5345== Command: ovsdb-client --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem transact ssl:127.0.0.1:40111 \ \ \ ["ordinals",
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["number",\ "==",\ 1]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 2,\ "name":\ "old\ two"}},
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["name",\ "==",\ "two"]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 1,\ "name":\ "old\ one"}}]
==5345== Parent PID: 5344
==5345==
==5345==
==5345== HEAP SUMMARY:
==5345== in use at exit: 116,551 bytes in 3,341 blocks
==5345== total heap usage: 5,134 allocs, 1,793 frees, 412,290 bytes allocated
==5345==
==5345== 6,221 (184 direct, 6,037 indirect) bytes in 1 blocks are definitely lost in loss record 498 of 500
==5345== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5345== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x4E56B27: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E5BA11: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E65145: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4522DF: ssl_connect (stream-ssl.c:530)
==5345== by 0x443D38: scs_connecting (stream.c:315)
==5345== by 0x443D38: stream_connect (stream.c:338)
==5345== by 0x443FA1: stream_open_block (stream.c:266)
==5345== by 0x40AB79: open_jsonrpc (ovsdb-client.c:507)
==5345== by 0x40AB79: open_rpc (ovsdb-client.c:143)
==5345== by 0x40B06B: do_transact__ (ovsdb-client.c:871)
==5345== by 0x40B245: do_transact (ovsdb-client.c:893)
==5345== by 0x405F76: main (ovsdb-client.c:282)
==5345==
==5345== LEAK SUMMARY:
==5345== definitely lost: 184 bytes in 1 blocks
==5345== indirectly lost: 6,037 bytes in 117 blocks
==5345== possibly lost: 0 bytes in 0 blocks
==5345== still reachable: 110,330 bytes in 3,223 blocks
==5345== suppressed: 0 bytes in 0 blocks
==5345== Reachable blocks (those to which a pointer was found) are not shown.
==5345== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5345==
==5345== For counts of detected and suppressed errors, rerun with: -v
==5345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
....
This report was extracted from "index uniqueness checking" test and complains about
leaking memory in ovsdb-client application. The problem is not huge, since ovsdb-client
is CLI tool which is constantly reinvoked/restarted, thus leaked memory is not accumulated.
More problematic issue is that for the same test valgrind reports the similar problem also for
ovsdb-server:
....
==5290== Memcheck, a memory error detector
==5290== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5290== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5290== Command: ovsdb-server --log-file --detach --no-chdir --pidfile --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey2.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert2.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem --remote=pssl:0:127.0.0.1 db
==5290== Parent PID: 5289
==5290==
==5292== Warning: noted but unhandled ioctl 0x2403 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292== Warning: noted but unhandled ioctl 0x2400 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5290==
==5290== HEAP SUMMARY:
==5290== in use at exit: 2,066 bytes in 48 blocks
==5290== total heap usage: 87 allocs, 39 frees, 14,152 bytes allocated
==5290==
==5290== LEAK SUMMARY:
==5290== definitely lost: 0 bytes in 0 blocks
==5290== indirectly lost: 0 bytes in 0 blocks
==5290== possibly lost: 0 bytes in 0 blocks
==5290== still reachable: 2,066 bytes in 48 blocks
==5290== suppressed: 0 bytes in 0 blocks
==5290== Reachable blocks (those to which a pointer was found) are not shown.
==5290== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5290==
==5290== For counts of detected and suppressed errors, rerun with: -v
==5290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5292== Warning: noted but unhandled ioctl 0x2401 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292==
==5292== HEAP SUMMARY:
==5292== in use at exit: 164,018 bytes in 4,252 blocks
==5292== total heap usage: 17,910 allocs, 13,658 frees, 1,907,468 bytes allocated
==5292==
==5292== 49,720 (1,472 direct, 48,248 indirect) bytes in 8 blocks are definitely lost in loss record 580 of 580
==5292== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5292== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x4E53E00: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x4E55727: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x452C4B: ssl_connect (stream-ssl.c:530)
==5292== by 0x445B18: scs_connecting (stream.c:315)
==5292== by 0x445B18: stream_connect (stream.c:338)
==5292== by 0x445B91: stream_recv (stream.c:369)
==5292== by 0x432A9C: jsonrpc_recv.part.7 (jsonrpc.c:310)
==5292== by 0x433977: jsonrpc_recv (jsonrpc.c:1139)
==5292== by 0x433977: jsonrpc_session_recv (jsonrpc.c:1112)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run (jsonrpc-server.c:553)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==5292== by 0x40CCE3: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==5292== by 0x40682E: main_loop (ovsdb-server.c:209)
==5292== by 0x40682E: main (ovsdb-server.c:460)
==5292==
==5292== LEAK SUMMARY:
==5292== definitely lost: 1,472 bytes in 8 blocks
==5292== indirectly lost: 48,248 bytes in 936 blocks
==5292== possibly lost: 0 bytes in 0 blocks
==5292== still reachable: 114,298 bytes in 3,308 blocks
==5292== suppressed: 0 bytes in 0 blocks
==5292== Reachable blocks (those to which a pointer was found) are not shown.
==5292== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5292==
==5292== For counts of detected and suppressed errors, rerun with: -v
==5292== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
....
In this case ovsdb-server is running as daemon process (--detach option) and leaking memory is
accumulated whenever ovsdb-client is reconnected. Within observed test ovsdb-client CLI tool
connects 8 times to ovsdb-server. Leaked memory in ovsdb-client (for each invocation) is approx.
6K bytes, while leaked memory in ovsdb-server is aprox. 48Kbytes what is actually 8*6K. Thus per
each connection both ovsdb-client and ovsdb-server leak approx. 6K bytes.
I have done a small manual test to check if ovsdb-server is indeed accumulating leaked memory
by dumping ovsdb-server in a loop:
console1:
ovsdb-server \
--log-file \
--detach --no-chdir --pidfile \
--private-key=testpki-privkey2.pem \
--certificate=testpki-cert2.pem \
--ca-cert=testpki-cacert.pem \
--remote=pssl:0:127.0.0.1 \
db
while (true); do \
ovsdb-client \
--private-key=testpki-privkey.pem \
--certificate=testpki-cert.pem \
--ca-cert=testpki-cacert.pem \
dump ssl:127.0.0.1:42067; \
done
console2:
watch -n 0.5 'cat /proc/$(pidof ovsdb-server)/status | grep VmSize'
In console2 it was evidently seen ovsdb-server is constantly leaking memory. After a while
(i.e. after a certain number of reconnections) the OOM killer jumps out and kills ovsdb-server.
Very similar situation was already noticed and described in
openvswitch/ovs-issues#168. There, the problem pops up while connecting
controller to ovs-vswitchd daemon.
Valgrind reports point to a problem in openssl library, however after studying openssl code for
a while I have found out the problem is actually in ovs. When connection through SSL channel is
taken place openssl library allocates memory for keeping track of certificate. Reference to this
memory works very similar as std::shared_ptr pointer in recent C++ dialects. i.e. when allocated
memory is referenced its reference counter is incremented and decremented after the memory is
derefered. When reference counter becomes zero allocated memory is automatically deallocated.
In openssl library environment certificate is retrieved by calling SSL_get_peer_certificate()
where its reference counter is incremented. After retrieved certificate is not used any more its
reference counter must be decremented by calling X509_free(). If not, allocated memory is never
freed despite the ssl connection is properly closed.
The problem was caused in stream-ssl.c in function ssl_connect(), which retrieves common peer name
by calling SSL_get_peer_certificate() function and without calling X509_free() function afterwards.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
While checking valgrind reports after running "make check-valgrind" I have noticed
reports for several tests similar to the following:
....
==5345== Memcheck, a memory error detector
==5345== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5345== Command: ovsdb-client --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem transact ssl:127.0.0.1:40111 \ \ \ ["ordinals",
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["number",\ "==",\ 1]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 2,\ "name":\ "old\ two"}},
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["name",\ "==",\ "two"]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 1,\ "name":\ "old\ one"}}]
==5345== Parent PID: 5344
==5345==
==5345==
==5345== HEAP SUMMARY:
==5345== in use at exit: 116,551 bytes in 3,341 blocks
==5345== total heap usage: 5,134 allocs, 1,793 frees, 412,290 bytes allocated
==5345==
==5345== 6,221 (184 direct, 6,037 indirect) bytes in 1 blocks are definitely lost in loss record 498 of 500
==5345== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5345== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x4E56B27: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E5BA11: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E65145: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4522DF: ssl_connect (stream-ssl.c:530)
==5345== by 0x443D38: scs_connecting (stream.c:315)
==5345== by 0x443D38: stream_connect (stream.c:338)
==5345== by 0x443FA1: stream_open_block (stream.c:266)
==5345== by 0x40AB79: open_jsonrpc (ovsdb-client.c:507)
==5345== by 0x40AB79: open_rpc (ovsdb-client.c:143)
==5345== by 0x40B06B: do_transact__ (ovsdb-client.c:871)
==5345== by 0x40B245: do_transact (ovsdb-client.c:893)
==5345== by 0x405F76: main (ovsdb-client.c:282)
==5345==
==5345== LEAK SUMMARY:
==5345== definitely lost: 184 bytes in 1 blocks
==5345== indirectly lost: 6,037 bytes in 117 blocks
==5345== possibly lost: 0 bytes in 0 blocks
==5345== still reachable: 110,330 bytes in 3,223 blocks
==5345== suppressed: 0 bytes in 0 blocks
==5345== Reachable blocks (those to which a pointer was found) are not shown.
==5345== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5345==
==5345== For counts of detected and suppressed errors, rerun with: -v
==5345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
....
This report was extracted from "index uniqueness checking" test and complains about
leaking memory in ovsdb-client application. The problem is not huge, since ovsdb-client
is CLI tool which is constantly reinvoked/restarted, thus leaked memory is not accumulated.
More problematic issue is that for the same test valgrind reports the similar problem also for
ovsdb-server:
....
==5290== Memcheck, a memory error detector
==5290== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5290== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5290== Command: ovsdb-server --log-file --detach --no-chdir --pidfile --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey2.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert2.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem --remote=pssl:0:127.0.0.1 db
==5290== Parent PID: 5289
==5290==
==5292== Warning: noted but unhandled ioctl 0x2403 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292== Warning: noted but unhandled ioctl 0x2400 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5290==
==5290== HEAP SUMMARY:
==5290== in use at exit: 2,066 bytes in 48 blocks
==5290== total heap usage: 87 allocs, 39 frees, 14,152 bytes allocated
==5290==
==5290== LEAK SUMMARY:
==5290== definitely lost: 0 bytes in 0 blocks
==5290== indirectly lost: 0 bytes in 0 blocks
==5290== possibly lost: 0 bytes in 0 blocks
==5290== still reachable: 2,066 bytes in 48 blocks
==5290== suppressed: 0 bytes in 0 blocks
==5290== Reachable blocks (those to which a pointer was found) are not shown.
==5290== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5290==
==5290== For counts of detected and suppressed errors, rerun with: -v
==5290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5292== Warning: noted but unhandled ioctl 0x2401 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292==
==5292== HEAP SUMMARY:
==5292== in use at exit: 164,018 bytes in 4,252 blocks
==5292== total heap usage: 17,910 allocs, 13,658 frees, 1,907,468 bytes allocated
==5292==
==5292== 49,720 (1,472 direct, 48,248 indirect) bytes in 8 blocks are definitely lost in loss record 580 of 580
==5292== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5292== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x4E53E00: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x4E55727: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x452C4B: ssl_connect (stream-ssl.c:530)
==5292== by 0x445B18: scs_connecting (stream.c:315)
==5292== by 0x445B18: stream_connect (stream.c:338)
==5292== by 0x445B91: stream_recv (stream.c:369)
==5292== by 0x432A9C: jsonrpc_recv.part.7 (jsonrpc.c:310)
==5292== by 0x433977: jsonrpc_recv (jsonrpc.c:1139)
==5292== by 0x433977: jsonrpc_session_recv (jsonrpc.c:1112)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run (jsonrpc-server.c:553)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==5292== by 0x40CCE3: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==5292== by 0x40682E: main_loop (ovsdb-server.c:209)
==5292== by 0x40682E: main (ovsdb-server.c:460)
==5292==
==5292== LEAK SUMMARY:
==5292== definitely lost: 1,472 bytes in 8 blocks
==5292== indirectly lost: 48,248 bytes in 936 blocks
==5292== possibly lost: 0 bytes in 0 blocks
==5292== still reachable: 114,298 bytes in 3,308 blocks
==5292== suppressed: 0 bytes in 0 blocks
==5292== Reachable blocks (those to which a pointer was found) are not shown.
==5292== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5292==
==5292== For counts of detected and suppressed errors, rerun with: -v
==5292== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
....
In this case ovsdb-server is running as daemon process (--detach option) and leaking memory is
accumulated whenever ovsdb-client is reconnected. Within observed test ovsdb-client CLI tool
connects 8 times to ovsdb-server. Leaked memory in ovsdb-client (for each invocation) is approx.
6K bytes, while leaked memory in ovsdb-server is aprox. 48Kbytes what is actually 8*6K. Thus per
each connection both ovsdb-client and ovsdb-server leak approx. 6K bytes.
I have done a small manual test to check if ovsdb-server is indeed accumulating leaked memory
by dumping ovsdb-server in a loop:
console1:
ovsdb-server \
--log-file \
--detach --no-chdir --pidfile \
--private-key=testpki-privkey2.pem \
--certificate=testpki-cert2.pem \
--ca-cert=testpki-cacert.pem \
--remote=pssl:0:127.0.0.1 \
db
while (true); do \
ovsdb-client \
--private-key=testpki-privkey.pem \
--certificate=testpki-cert.pem \
--ca-cert=testpki-cacert.pem \
dump ssl:127.0.0.1:42067; \
done
console2:
watch -n 0.5 'cat /proc/$(pidof ovsdb-server)/status | grep VmSize'
In console2 it was evidently seen ovsdb-server is constantly leaking memory. After a while
(i.e. after a certain number of reconnections) the OOM killer jumps out and kills ovsdb-server.
Very similar situation was already noticed and described in
openvswitch/ovs-issues#168. There, the problem pops up while connecting
controller to ovs-vswitchd daemon.
Valgrind reports point to a problem in openssl library, however after studying openssl code for
a while I have found out the problem is actually in ovs. When connection through SSL channel is
taken place openssl library allocates memory for keeping track of certificate. Reference to this
memory works very similar as std::shared_ptr pointer in recent C++ dialects. i.e. when allocated
memory is referenced its reference counter is incremented and decremented after the memory is
derefered. When reference counter becomes zero allocated memory is automatically deallocated.
In openssl library environment certificate is retrieved by calling SSL_get_peer_certificate()
where its reference counter is incremented. After retrieved certificate is not used any more its
reference counter must be decremented by calling X509_free(). If not, allocated memory is never
freed despite the ssl connection is properly closed.
The problem was caused in stream-ssl.c in function ssl_connect(), which retrieves common peer name
by calling SSL_get_peer_certificate() function and without calling X509_free() function afterwards.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
While checking valgrind reports after running "make check-valgrind" I have noticed
reports for several tests similar to the following:
....
==5345== Memcheck, a memory error detector
==5345== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5345== Command: ovsdb-client --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem transact ssl:127.0.0.1:40111 \ \ \ ["ordinals",
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["number",\ "==",\ 1]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 2,\ "name":\ "old\ two"}},
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["name",\ "==",\ "two"]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 1,\ "name":\ "old\ one"}}]
==5345== Parent PID: 5344
==5345==
==5345==
==5345== HEAP SUMMARY:
==5345== in use at exit: 116,551 bytes in 3,341 blocks
==5345== total heap usage: 5,134 allocs, 1,793 frees, 412,290 bytes allocated
==5345==
==5345== 6,221 (184 direct, 6,037 indirect) bytes in 1 blocks are definitely lost in loss record 498 of 500
==5345== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5345== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x4E56B27: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E5BA11: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E65145: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4522DF: ssl_connect (stream-ssl.c:530)
==5345== by 0x443D38: scs_connecting (stream.c:315)
==5345== by 0x443D38: stream_connect (stream.c:338)
==5345== by 0x443FA1: stream_open_block (stream.c:266)
==5345== by 0x40AB79: open_jsonrpc (ovsdb-client.c:507)
==5345== by 0x40AB79: open_rpc (ovsdb-client.c:143)
==5345== by 0x40B06B: do_transact__ (ovsdb-client.c:871)
==5345== by 0x40B245: do_transact (ovsdb-client.c:893)
==5345== by 0x405F76: main (ovsdb-client.c:282)
==5345==
==5345== LEAK SUMMARY:
==5345== definitely lost: 184 bytes in 1 blocks
==5345== indirectly lost: 6,037 bytes in 117 blocks
==5345== possibly lost: 0 bytes in 0 blocks
==5345== still reachable: 110,330 bytes in 3,223 blocks
==5345== suppressed: 0 bytes in 0 blocks
==5345== Reachable blocks (those to which a pointer was found) are not shown.
==5345== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5345==
==5345== For counts of detected and suppressed errors, rerun with: -v
==5345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
....
This report was extracted from "index uniqueness checking" test and complains about
leaking memory in ovsdb-client application. The problem is not huge, since ovsdb-client
is CLI tool which is constantly reinvoked/restarted, thus leaked memory is not accumulated.
More problematic issue is that for the same test valgrind reports the similar problem also for
ovsdb-server:
....
==5290== Memcheck, a memory error detector
==5290== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5290== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5290== Command: ovsdb-server --log-file --detach --no-chdir --pidfile --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey2.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert2.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem --remote=pssl:0:127.0.0.1 db
==5290== Parent PID: 5289
==5290==
==5292== Warning: noted but unhandled ioctl 0x2403 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292== Warning: noted but unhandled ioctl 0x2400 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5290==
==5290== HEAP SUMMARY:
==5290== in use at exit: 2,066 bytes in 48 blocks
==5290== total heap usage: 87 allocs, 39 frees, 14,152 bytes allocated
==5290==
==5290== LEAK SUMMARY:
==5290== definitely lost: 0 bytes in 0 blocks
==5290== indirectly lost: 0 bytes in 0 blocks
==5290== possibly lost: 0 bytes in 0 blocks
==5290== still reachable: 2,066 bytes in 48 blocks
==5290== suppressed: 0 bytes in 0 blocks
==5290== Reachable blocks (those to which a pointer was found) are not shown.
==5290== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5290==
==5290== For counts of detected and suppressed errors, rerun with: -v
==5290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5292== Warning: noted but unhandled ioctl 0x2401 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292==
==5292== HEAP SUMMARY:
==5292== in use at exit: 164,018 bytes in 4,252 blocks
==5292== total heap usage: 17,910 allocs, 13,658 frees, 1,907,468 bytes allocated
==5292==
==5292== 49,720 (1,472 direct, 48,248 indirect) bytes in 8 blocks are definitely lost in loss record 580 of 580
==5292== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5292== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x4E53E00: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x4E55727: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x452C4B: ssl_connect (stream-ssl.c:530)
==5292== by 0x445B18: scs_connecting (stream.c:315)
==5292== by 0x445B18: stream_connect (stream.c:338)
==5292== by 0x445B91: stream_recv (stream.c:369)
==5292== by 0x432A9C: jsonrpc_recv.part.7 (jsonrpc.c:310)
==5292== by 0x433977: jsonrpc_recv (jsonrpc.c:1139)
==5292== by 0x433977: jsonrpc_session_recv (jsonrpc.c:1112)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run (jsonrpc-server.c:553)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==5292== by 0x40CCE3: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==5292== by 0x40682E: main_loop (ovsdb-server.c:209)
==5292== by 0x40682E: main (ovsdb-server.c:460)
==5292==
==5292== LEAK SUMMARY:
==5292== definitely lost: 1,472 bytes in 8 blocks
==5292== indirectly lost: 48,248 bytes in 936 blocks
==5292== possibly lost: 0 bytes in 0 blocks
==5292== still reachable: 114,298 bytes in 3,308 blocks
==5292== suppressed: 0 bytes in 0 blocks
==5292== Reachable blocks (those to which a pointer was found) are not shown.
==5292== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5292==
==5292== For counts of detected and suppressed errors, rerun with: -v
==5292== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
....
In this case ovsdb-server is running as daemon process (--detach option) and leaking memory is
accumulated whenever ovsdb-client is reconnected. Within observed test ovsdb-client CLI tool
connects 8 times to ovsdb-server. Leaked memory in ovsdb-client (for each invocation) is approx.
6K bytes, while leaked memory in ovsdb-server is aprox. 48Kbytes what is actually 8*6K. Thus per
each connection both ovsdb-client and ovsdb-server leak approx. 6K bytes.
I have done a small manual test to check if ovsdb-server is indeed accumulating leaked memory
by dumping ovsdb-server in a loop:
console1:
ovsdb-server \
--log-file \
--detach --no-chdir --pidfile \
--private-key=testpki-privkey2.pem \
--certificate=testpki-cert2.pem \
--ca-cert=testpki-cacert.pem \
--remote=pssl:0:127.0.0.1 \
db
while (true); do \
ovsdb-client \
--private-key=testpki-privkey.pem \
--certificate=testpki-cert.pem \
--ca-cert=testpki-cacert.pem \
dump ssl:127.0.0.1:42067; \
done
console2:
watch -n 0.5 'cat /proc/$(pidof ovsdb-server)/status | grep VmSize'
In console2 it was evidently seen ovsdb-server is constantly leaking memory. After a while
(i.e. after a certain number of reconnections) the OOM killer jumps out and kills ovsdb-server.
Very similar situation was already noticed and described in
openvswitch/ovs-issues#168. There, the problem pops up while connecting
controller to ovs-vswitchd daemon.
Valgrind reports point to a problem in openssl library, however after studying openssl code for
a while I have found out the problem is actually in ovs. When connection through SSL channel is
taken place openssl library allocates memory for keeping track of certificate. Reference to this
memory works very similar as std::shared_ptr pointer in recent C++ dialects. i.e. when allocated
memory is referenced its reference counter is incremented and decremented after the memory is
derefered. When reference counter becomes zero allocated memory is automatically deallocated.
In openssl library environment certificate is retrieved by calling SSL_get_peer_certificate()
where its reference counter is incremented. After retrieved certificate is not used any more its
reference counter must be decremented by calling X509_free(). If not, allocated memory is never
freed despite the ssl connection is properly closed.
The problem was caused in stream-ssl.c in function ssl_connect(), which retrieves common peer name
by calling SSL_get_peer_certificate() function and without calling X509_free() function afterwards.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
While checking valgrind reports after running "make check-valgrind" I have noticed
reports for several tests similar to the following:
....
==5345== Memcheck, a memory error detector
==5345== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5345== Command: ovsdb-client --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem transact ssl:127.0.0.1:40111 \ \ \ ["ordinals",
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["number",\ "==",\ 1]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 2,\ "name":\ "old\ two"}},
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["name",\ "==",\ "two"]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 1,\ "name":\ "old\ one"}}]
==5345== Parent PID: 5344
==5345==
==5345==
==5345== HEAP SUMMARY:
==5345== in use at exit: 116,551 bytes in 3,341 blocks
==5345== total heap usage: 5,134 allocs, 1,793 frees, 412,290 bytes allocated
==5345==
==5345== 6,221 (184 direct, 6,037 indirect) bytes in 1 blocks are definitely lost in loss record 498 of 500
==5345== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5345== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x4E56B27: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E5BA11: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E65145: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4522DF: ssl_connect (stream-ssl.c:530)
==5345== by 0x443D38: scs_connecting (stream.c:315)
==5345== by 0x443D38: stream_connect (stream.c:338)
==5345== by 0x443FA1: stream_open_block (stream.c:266)
==5345== by 0x40AB79: open_jsonrpc (ovsdb-client.c:507)
==5345== by 0x40AB79: open_rpc (ovsdb-client.c:143)
==5345== by 0x40B06B: do_transact__ (ovsdb-client.c:871)
==5345== by 0x40B245: do_transact (ovsdb-client.c:893)
==5345== by 0x405F76: main (ovsdb-client.c:282)
==5345==
==5345== LEAK SUMMARY:
==5345== definitely lost: 184 bytes in 1 blocks
==5345== indirectly lost: 6,037 bytes in 117 blocks
==5345== possibly lost: 0 bytes in 0 blocks
==5345== still reachable: 110,330 bytes in 3,223 blocks
==5345== suppressed: 0 bytes in 0 blocks
==5345== Reachable blocks (those to which a pointer was found) are not shown.
==5345== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5345==
==5345== For counts of detected and suppressed errors, rerun with: -v
==5345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
....
This report was extracted from "index uniqueness checking" test and complains about
leaking memory in ovsdb-client application. The problem is not huge, since ovsdb-client
is CLI tool which is constantly reinvoked/restarted, thus leaked memory is not accumulated.
More problematic issue is that for the same test valgrind reports the similar problem also for
ovsdb-server:
....
==5290== Memcheck, a memory error detector
==5290== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5290== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5290== Command: ovsdb-server --log-file --detach --no-chdir --pidfile --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey2.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert2.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem --remote=pssl:0:127.0.0.1 db
==5290== Parent PID: 5289
==5290==
==5292== Warning: noted but unhandled ioctl 0x2403 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292== Warning: noted but unhandled ioctl 0x2400 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5290==
==5290== HEAP SUMMARY:
==5290== in use at exit: 2,066 bytes in 48 blocks
==5290== total heap usage: 87 allocs, 39 frees, 14,152 bytes allocated
==5290==
==5290== LEAK SUMMARY:
==5290== definitely lost: 0 bytes in 0 blocks
==5290== indirectly lost: 0 bytes in 0 blocks
==5290== possibly lost: 0 bytes in 0 blocks
==5290== still reachable: 2,066 bytes in 48 blocks
==5290== suppressed: 0 bytes in 0 blocks
==5290== Reachable blocks (those to which a pointer was found) are not shown.
==5290== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5290==
==5290== For counts of detected and suppressed errors, rerun with: -v
==5290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5292== Warning: noted but unhandled ioctl 0x2401 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292==
==5292== HEAP SUMMARY:
==5292== in use at exit: 164,018 bytes in 4,252 blocks
==5292== total heap usage: 17,910 allocs, 13,658 frees, 1,907,468 bytes allocated
==5292==
==5292== 49,720 (1,472 direct, 48,248 indirect) bytes in 8 blocks are definitely lost in loss record 580 of 580
==5292== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5292== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x4E53E00: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x4E55727: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x452C4B: ssl_connect (stream-ssl.c:530)
==5292== by 0x445B18: scs_connecting (stream.c:315)
==5292== by 0x445B18: stream_connect (stream.c:338)
==5292== by 0x445B91: stream_recv (stream.c:369)
==5292== by 0x432A9C: jsonrpc_recv.part.7 (jsonrpc.c:310)
==5292== by 0x433977: jsonrpc_recv (jsonrpc.c:1139)
==5292== by 0x433977: jsonrpc_session_recv (jsonrpc.c:1112)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run (jsonrpc-server.c:553)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==5292== by 0x40CCE3: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==5292== by 0x40682E: main_loop (ovsdb-server.c:209)
==5292== by 0x40682E: main (ovsdb-server.c:460)
==5292==
==5292== LEAK SUMMARY:
==5292== definitely lost: 1,472 bytes in 8 blocks
==5292== indirectly lost: 48,248 bytes in 936 blocks
==5292== possibly lost: 0 bytes in 0 blocks
==5292== still reachable: 114,298 bytes in 3,308 blocks
==5292== suppressed: 0 bytes in 0 blocks
==5292== Reachable blocks (those to which a pointer was found) are not shown.
==5292== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5292==
==5292== For counts of detected and suppressed errors, rerun with: -v
==5292== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
....
In this case ovsdb-server is running as daemon process (--detach option) and leaking memory is
accumulated whenever ovsdb-client is reconnected. Within observed test ovsdb-client CLI tool
connects 8 times to ovsdb-server. Leaked memory in ovsdb-client (for each invocation) is approx.
6K bytes, while leaked memory in ovsdb-server is aprox. 48Kbytes what is actually 8*6K. Thus per
each connection both ovsdb-client and ovsdb-server leak approx. 6K bytes.
I have done a small manual test to check if ovsdb-server is indeed accumulating leaked memory
by dumping ovsdb-server in a loop:
console1:
ovsdb-server \
--log-file \
--detach --no-chdir --pidfile \
--private-key=testpki-privkey2.pem \
--certificate=testpki-cert2.pem \
--ca-cert=testpki-cacert.pem \
--remote=pssl:0:127.0.0.1 \
db
while (true); do \
ovsdb-client \
--private-key=testpki-privkey.pem \
--certificate=testpki-cert.pem \
--ca-cert=testpki-cacert.pem \
dump ssl:127.0.0.1:42067; \
done
console2:
watch -n 0.5 'cat /proc/$(pidof ovsdb-server)/status | grep VmSize'
In console2 it was evidently seen ovsdb-server is constantly leaking memory. After a while
(i.e. after a certain number of reconnections) the OOM killer jumps out and kills ovsdb-server.
Very similar situation was already noticed and described in
openvswitch/ovs-issues#168. There, the problem pops up while connecting
controller to ovs-vswitchd daemon.
Valgrind reports point to a problem in openssl library, however after studying openssl code for
a while I have found out the problem is actually in ovs. When connection through SSL channel is
taken place openssl library allocates memory for keeping track of certificate. Reference to this
memory works very similar as std::shared_ptr pointer in recent C++ dialects. i.e. when allocated
memory is referenced its reference counter is incremented and decremented after the memory is
derefered. When reference counter becomes zero allocated memory is automatically deallocated.
In openssl library environment certificate is retrieved by calling SSL_get_peer_certificate()
where its reference counter is incremented. After retrieved certificate is not used any more its
reference counter must be decremented by calling X509_free(). If not, allocated memory is never
freed despite the ssl connection is properly closed.
The problem was caused in stream-ssl.c in function ssl_connect(), which retrieves common peer name
by calling SSL_get_peer_certificate() function and without calling X509_free() function afterwards.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
While checking valgrind reports after running "make check-valgrind" I have noticed
reports for several tests similar to the following:
....
==5345== Memcheck, a memory error detector
==5345== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5345== Command: ovsdb-client --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem transact ssl:127.0.0.1:40111 \ \ \ ["ordinals",
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["number",\ "==",\ 1]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 2,\ "name":\ "old\ two"}},
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["name",\ "==",\ "two"]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 1,\ "name":\ "old\ one"}}]
==5345== Parent PID: 5344
==5345==
==5345==
==5345== HEAP SUMMARY:
==5345== in use at exit: 116,551 bytes in 3,341 blocks
==5345== total heap usage: 5,134 allocs, 1,793 frees, 412,290 bytes allocated
==5345==
==5345== 6,221 (184 direct, 6,037 indirect) bytes in 1 blocks are definitely lost in loss record 498 of 500
==5345== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5345== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x4E56B27: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E5BA11: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E65145: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4522DF: ssl_connect (stream-ssl.c:530)
==5345== by 0x443D38: scs_connecting (stream.c:315)
==5345== by 0x443D38: stream_connect (stream.c:338)
==5345== by 0x443FA1: stream_open_block (stream.c:266)
==5345== by 0x40AB79: open_jsonrpc (ovsdb-client.c:507)
==5345== by 0x40AB79: open_rpc (ovsdb-client.c:143)
==5345== by 0x40B06B: do_transact__ (ovsdb-client.c:871)
==5345== by 0x40B245: do_transact (ovsdb-client.c:893)
==5345== by 0x405F76: main (ovsdb-client.c:282)
==5345==
==5345== LEAK SUMMARY:
==5345== definitely lost: 184 bytes in 1 blocks
==5345== indirectly lost: 6,037 bytes in 117 blocks
==5345== possibly lost: 0 bytes in 0 blocks
==5345== still reachable: 110,330 bytes in 3,223 blocks
==5345== suppressed: 0 bytes in 0 blocks
==5345== Reachable blocks (those to which a pointer was found) are not shown.
==5345== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5345==
==5345== For counts of detected and suppressed errors, rerun with: -v
==5345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
....
This report was extracted from "index uniqueness checking" test and complains about
leaking memory in ovsdb-client application. The problem is not huge, since ovsdb-client
is CLI tool which is constantly reinvoked/restarted, thus leaked memory is not accumulated.
More problematic issue is that for the same test valgrind reports the similar problem also for
ovsdb-server:
....
==5290== Memcheck, a memory error detector
==5290== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5290== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5290== Command: ovsdb-server --log-file --detach --no-chdir --pidfile --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey2.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert2.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem --remote=pssl:0:127.0.0.1 db
==5290== Parent PID: 5289
==5290==
==5292== Warning: noted but unhandled ioctl 0x2403 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292== Warning: noted but unhandled ioctl 0x2400 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5290==
==5290== HEAP SUMMARY:
==5290== in use at exit: 2,066 bytes in 48 blocks
==5290== total heap usage: 87 allocs, 39 frees, 14,152 bytes allocated
==5290==
==5290== LEAK SUMMARY:
==5290== definitely lost: 0 bytes in 0 blocks
==5290== indirectly lost: 0 bytes in 0 blocks
==5290== possibly lost: 0 bytes in 0 blocks
==5290== still reachable: 2,066 bytes in 48 blocks
==5290== suppressed: 0 bytes in 0 blocks
==5290== Reachable blocks (those to which a pointer was found) are not shown.
==5290== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5290==
==5290== For counts of detected and suppressed errors, rerun with: -v
==5290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5292== Warning: noted but unhandled ioctl 0x2401 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292==
==5292== HEAP SUMMARY:
==5292== in use at exit: 164,018 bytes in 4,252 blocks
==5292== total heap usage: 17,910 allocs, 13,658 frees, 1,907,468 bytes allocated
==5292==
==5292== 49,720 (1,472 direct, 48,248 indirect) bytes in 8 blocks are definitely lost in loss record 580 of 580
==5292== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5292== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x4E53E00: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x4E55727: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x452C4B: ssl_connect (stream-ssl.c:530)
==5292== by 0x445B18: scs_connecting (stream.c:315)
==5292== by 0x445B18: stream_connect (stream.c:338)
==5292== by 0x445B91: stream_recv (stream.c:369)
==5292== by 0x432A9C: jsonrpc_recv.part.7 (jsonrpc.c:310)
==5292== by 0x433977: jsonrpc_recv (jsonrpc.c:1139)
==5292== by 0x433977: jsonrpc_session_recv (jsonrpc.c:1112)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run (jsonrpc-server.c:553)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==5292== by 0x40CCE3: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==5292== by 0x40682E: main_loop (ovsdb-server.c:209)
==5292== by 0x40682E: main (ovsdb-server.c:460)
==5292==
==5292== LEAK SUMMARY:
==5292== definitely lost: 1,472 bytes in 8 blocks
==5292== indirectly lost: 48,248 bytes in 936 blocks
==5292== possibly lost: 0 bytes in 0 blocks
==5292== still reachable: 114,298 bytes in 3,308 blocks
==5292== suppressed: 0 bytes in 0 blocks
==5292== Reachable blocks (those to which a pointer was found) are not shown.
==5292== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5292==
==5292== For counts of detected and suppressed errors, rerun with: -v
==5292== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
....
In this case ovsdb-server is running as daemon process (--detach option) and leaking memory is
accumulated whenever ovsdb-client is reconnected. Within observed test ovsdb-client CLI tool
connects 8 times to ovsdb-server. Leaked memory in ovsdb-client (for each invocation) is approx.
6K bytes, while leaked memory in ovsdb-server is aprox. 48Kbytes what is actually 8*6K. Thus per
each connection both ovsdb-client and ovsdb-server leak approx. 6K bytes.
I have done a small manual test to check if ovsdb-server is indeed accumulating leaked memory
by dumping ovsdb-server in a loop:
console1:
ovsdb-server \
--log-file \
--detach --no-chdir --pidfile \
--private-key=testpki-privkey2.pem \
--certificate=testpki-cert2.pem \
--ca-cert=testpki-cacert.pem \
--remote=pssl:0:127.0.0.1 \
db
while (true); do \
ovsdb-client \
--private-key=testpki-privkey.pem \
--certificate=testpki-cert.pem \
--ca-cert=testpki-cacert.pem \
dump ssl:127.0.0.1:42067; \
done
console2:
watch -n 0.5 'cat /proc/$(pidof ovsdb-server)/status | grep VmSize'
In console2 it was evidently seen ovsdb-server is constantly leaking memory. After a while
(i.e. after a certain number of reconnections) the OOM killer jumps out and kills ovsdb-server.
Very similar situation was already noticed and described in
openvswitch/ovs-issues#168. There, the problem pops up while connecting
controller to ovs-vswitchd daemon.
Valgrind reports point to a problem in openssl library, however after studying openssl code for
a while I have found out the problem is actually in ovs. When connection through SSL channel is
taken place openssl library allocates memory for keeping track of certificate. Reference to this
memory works very similar as std::shared_ptr pointer in recent C++ dialects. i.e. when allocated
memory is referenced its reference counter is incremented and decremented after the memory is
derefered. When reference counter becomes zero allocated memory is automatically deallocated.
In openssl library environment certificate is retrieved by calling SSL_get_peer_certificate()
where its reference counter is incremented. After retrieved certificate is not used any more its
reference counter must be decremented by calling X509_free(). If not, allocated memory is never
freed despite the ssl connection is properly closed.
The problem was caused in stream-ssl.c in function ssl_connect(), which retrieves common peer name
by calling SSL_get_peer_certificate() function and without calling X509_free() function afterwards.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
While checking valgrind reports after running "make check-valgrind" I have noticed
reports for several tests similar to the following:
....
==5345== Memcheck, a memory error detector
==5345== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5345== Command: ovsdb-client --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem transact ssl:127.0.0.1:40111 \ \ \ ["ordinals",
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["number",\ "==",\ 1]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 2,\ "name":\ "old\ two"}},
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["name",\ "==",\ "two"]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 1,\ "name":\ "old\ one"}}]
==5345== Parent PID: 5344
==5345==
==5345==
==5345== HEAP SUMMARY:
==5345== in use at exit: 116,551 bytes in 3,341 blocks
==5345== total heap usage: 5,134 allocs, 1,793 frees, 412,290 bytes allocated
==5345==
==5345== 6,221 (184 direct, 6,037 indirect) bytes in 1 blocks are definitely lost in loss record 498 of 500
==5345== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5345== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x4E56B27: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E5BA11: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E65145: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4522DF: ssl_connect (stream-ssl.c:530)
==5345== by 0x443D38: scs_connecting (stream.c:315)
==5345== by 0x443D38: stream_connect (stream.c:338)
==5345== by 0x443FA1: stream_open_block (stream.c:266)
==5345== by 0x40AB79: open_jsonrpc (ovsdb-client.c:507)
==5345== by 0x40AB79: open_rpc (ovsdb-client.c:143)
==5345== by 0x40B06B: do_transact__ (ovsdb-client.c:871)
==5345== by 0x40B245: do_transact (ovsdb-client.c:893)
==5345== by 0x405F76: main (ovsdb-client.c:282)
==5345==
==5345== LEAK SUMMARY:
==5345== definitely lost: 184 bytes in 1 blocks
==5345== indirectly lost: 6,037 bytes in 117 blocks
==5345== possibly lost: 0 bytes in 0 blocks
==5345== still reachable: 110,330 bytes in 3,223 blocks
==5345== suppressed: 0 bytes in 0 blocks
==5345== Reachable blocks (those to which a pointer was found) are not shown.
==5345== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5345==
==5345== For counts of detected and suppressed errors, rerun with: -v
==5345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
....
This report was extracted from "index uniqueness checking" test and complains about
leaking memory in ovsdb-client application. The problem is not huge, since ovsdb-client
is CLI tool which is constantly reinvoked/restarted, thus leaked memory is not accumulated.
More problematic issue is that for the same test valgrind reports the similar problem also for
ovsdb-server:
....
==5290== Memcheck, a memory error detector
==5290== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5290== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5290== Command: ovsdb-server --log-file --detach --no-chdir --pidfile --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey2.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert2.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem --remote=pssl:0:127.0.0.1 db
==5290== Parent PID: 5289
==5290==
==5292== Warning: noted but unhandled ioctl 0x2403 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292== Warning: noted but unhandled ioctl 0x2400 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5290==
==5290== HEAP SUMMARY:
==5290== in use at exit: 2,066 bytes in 48 blocks
==5290== total heap usage: 87 allocs, 39 frees, 14,152 bytes allocated
==5290==
==5290== LEAK SUMMARY:
==5290== definitely lost: 0 bytes in 0 blocks
==5290== indirectly lost: 0 bytes in 0 blocks
==5290== possibly lost: 0 bytes in 0 blocks
==5290== still reachable: 2,066 bytes in 48 blocks
==5290== suppressed: 0 bytes in 0 blocks
==5290== Reachable blocks (those to which a pointer was found) are not shown.
==5290== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5290==
==5290== For counts of detected and suppressed errors, rerun with: -v
==5290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5292== Warning: noted but unhandled ioctl 0x2401 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292==
==5292== HEAP SUMMARY:
==5292== in use at exit: 164,018 bytes in 4,252 blocks
==5292== total heap usage: 17,910 allocs, 13,658 frees, 1,907,468 bytes allocated
==5292==
==5292== 49,720 (1,472 direct, 48,248 indirect) bytes in 8 blocks are definitely lost in loss record 580 of 580
==5292== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5292== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x4E53E00: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x4E55727: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x452C4B: ssl_connect (stream-ssl.c:530)
==5292== by 0x445B18: scs_connecting (stream.c:315)
==5292== by 0x445B18: stream_connect (stream.c:338)
==5292== by 0x445B91: stream_recv (stream.c:369)
==5292== by 0x432A9C: jsonrpc_recv.part.7 (jsonrpc.c:310)
==5292== by 0x433977: jsonrpc_recv (jsonrpc.c:1139)
==5292== by 0x433977: jsonrpc_session_recv (jsonrpc.c:1112)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run (jsonrpc-server.c:553)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==5292== by 0x40CCE3: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==5292== by 0x40682E: main_loop (ovsdb-server.c:209)
==5292== by 0x40682E: main (ovsdb-server.c:460)
==5292==
==5292== LEAK SUMMARY:
==5292== definitely lost: 1,472 bytes in 8 blocks
==5292== indirectly lost: 48,248 bytes in 936 blocks
==5292== possibly lost: 0 bytes in 0 blocks
==5292== still reachable: 114,298 bytes in 3,308 blocks
==5292== suppressed: 0 bytes in 0 blocks
==5292== Reachable blocks (those to which a pointer was found) are not shown.
==5292== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5292==
==5292== For counts of detected and suppressed errors, rerun with: -v
==5292== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
....
In this case ovsdb-server is running as daemon process (--detach option) and leaking memory is
accumulated whenever ovsdb-client is reconnected. Within observed test ovsdb-client CLI tool
connects 8 times to ovsdb-server. Leaked memory in ovsdb-client (for each invocation) is approx.
6K bytes, while leaked memory in ovsdb-server is aprox. 48Kbytes what is actually 8*6K. Thus per
each connection both ovsdb-client and ovsdb-server leak approx. 6K bytes.
I have done a small manual test to check if ovsdb-server is indeed accumulating leaked memory
by dumping ovsdb-server in a loop:
console1:
ovsdb-server \
--log-file \
--detach --no-chdir --pidfile \
--private-key=testpki-privkey2.pem \
--certificate=testpki-cert2.pem \
--ca-cert=testpki-cacert.pem \
--remote=pssl:0:127.0.0.1 \
db
while (true); do \
ovsdb-client \
--private-key=testpki-privkey.pem \
--certificate=testpki-cert.pem \
--ca-cert=testpki-cacert.pem \
dump ssl:127.0.0.1:42067; \
done
console2:
watch -n 0.5 'cat /proc/$(pidof ovsdb-server)/status | grep VmSize'
In console2 it was evidently seen ovsdb-server is constantly leaking memory. After a while
(i.e. after a certain number of reconnections) the OOM killer jumps out and kills ovsdb-server.
Very similar situation was already noticed and described in
openvswitch/ovs-issues#168. There, the problem pops up while connecting
controller to ovs-vswitchd daemon.
Valgrind reports point to a problem in openssl library, however after studying openssl code for
a while I have found out the problem is actually in ovs. When connection through SSL channel is
taken place openssl library allocates memory for keeping track of certificate. Reference to this
memory works very similar as std::shared_ptr pointer in recent C++ dialects. i.e. when allocated
memory is referenced its reference counter is incremented and decremented after the memory is
derefered. When reference counter becomes zero allocated memory is automatically deallocated.
In openssl library environment certificate is retrieved by calling SSL_get_peer_certificate()
where its reference counter is incremented. After retrieved certificate is not used any more its
reference counter must be decremented by calling X509_free(). If not, allocated memory is never
freed despite the ssl connection is properly closed.
The problem was caused in stream-ssl.c in function ssl_connect(), which retrieves common peer name
by calling SSL_get_peer_certificate() function and without calling X509_free() function afterwards.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
Hi there, I'm now running on version 2.12.0 and the problem remains the same. If there is any information or help I can provide to find and fix this issue let me know. Thanks. |
|
I am sorry the fix described in https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/361062.html didn't solve your issue. Probably it was more related to #168 (comment) and #168 (comment) issues from which descriptions it was possible to deduce the problem was in the establishment of communication with ovs-vswitchd daemon. For the proper solution it is the most important to reproduce the issue properly. So, it would be nice if you could describe your use case in more detail. Also, it is possible that we are not talking about memory leak but about huge memory consumption which depends on variety of different traffic flows. Unfortunately, I am not the expert in functionality of ovs-vswithcd and I can only predict about possibility to limit memory consumption. There are a couple of experts (for example ben, dumitru, michelson, ilya, wiliamtu...) who I belive can tell you more about this. Anyway, if you can depict in more detail about your use case I am willing to dig again into this issue. |
Hi. The commit "stream_ssl: fix important memory leak in ssl_connect() function" from @damijans is not in v2.12.0 release. @casasfernando, Could you try building from branch-2.12 and not from the v2.12.0 tag? |
|
Hi, |
Please, don't be. I only wanted to update the report since I also reported the same issue to Alpine Linux maintainers and the pinged me to confirm this was still happening today. Regarding the scenario, I think my original post is pretty much detailed so I'm not sure what else I can add to it. The scenario didn't change at all, except that nowadays I'm running on newer versions of the software that I will provide below: OVS command line: /usr/sbin/ovs-vswitchd --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor --mlockall unix:/var/run/openvswitch/db.sock If there is any information that you feel it's missing in the issue description (1st post) don't hesitate to ask I will provide it. Thanks. |
Last time I did share one with Ben but I can't remember if it helped or not. I still have that one around (with the binary including debug symbols) if it makes any sense to share it or I can try to create a new one with the current version instead (probably it makes more sense). |
Is not impossible to do, but certainly not easy at the moment. |
chap will only show the leaked memory's size and content and the number of blocks leaked. Then we will have to guess what data structure in OVS it is, probably using pahole. And we will need much more information on your setup, ex: running a "ovs-bugtool -y". This is definitely not easy. If you want to wait for Alphine using latest OVS version, then that's also make sense. |
While checking valgrind reports after running "make check-valgrind" I have noticed
reports for several tests similar to the following:
....
==5345== Memcheck, a memory error detector
==5345== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5345== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5345== Command: ovsdb-client --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem transact ssl:127.0.0.1:40111 \ \ \ ["ordinals",
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["number",\ "==",\ 1]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 2,\ "name":\ "old\ two"}},
==5345== \ \ \ \ \ \ {"op":\ "update",
==5345== \ \ \ \ \ \ \ "table":\ "ordinals",
==5345== \ \ \ \ \ \ \ "where":\ [["name",\ "==",\ "two"]],
==5345== \ \ \ \ \ \ \ "row":\ {"number":\ 1,\ "name":\ "old\ one"}}]
==5345== Parent PID: 5344
==5345==
==5345==
==5345== HEAP SUMMARY:
==5345== in use at exit: 116,551 bytes in 3,341 blocks
==5345== total heap usage: 5,134 allocs, 1,793 frees, 412,290 bytes allocated
==5345==
==5345== 6,221 (184 direct, 6,037 indirect) bytes in 1 blocks are definitely lost in loss record 498 of 500
==5345== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5345== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5345== by 0x4E56B27: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E5BA11: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4E65145: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5345== by 0x4522DF: ssl_connect (stream-ssl.c:530)
==5345== by 0x443D38: scs_connecting (stream.c:315)
==5345== by 0x443D38: stream_connect (stream.c:338)
==5345== by 0x443FA1: stream_open_block (stream.c:266)
==5345== by 0x40AB79: open_jsonrpc (ovsdb-client.c:507)
==5345== by 0x40AB79: open_rpc (ovsdb-client.c:143)
==5345== by 0x40B06B: do_transact__ (ovsdb-client.c:871)
==5345== by 0x40B245: do_transact (ovsdb-client.c:893)
==5345== by 0x405F76: main (ovsdb-client.c:282)
==5345==
==5345== LEAK SUMMARY:
==5345== definitely lost: 184 bytes in 1 blocks
==5345== indirectly lost: 6,037 bytes in 117 blocks
==5345== possibly lost: 0 bytes in 0 blocks
==5345== still reachable: 110,330 bytes in 3,223 blocks
==5345== suppressed: 0 bytes in 0 blocks
==5345== Reachable blocks (those to which a pointer was found) are not shown.
==5345== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5345==
==5345== For counts of detected and suppressed errors, rerun with: -v
==5345== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
....
This report was extracted from "index uniqueness checking" test and complains about
leaking memory in ovsdb-client application. The problem is not huge, since ovsdb-client
is CLI tool which is constantly reinvoked/restarted, thus leaked memory is not accumulated.
More problematic issue is that for the same test valgrind reports the similar problem also for
ovsdb-server:
....
==5290== Memcheck, a memory error detector
==5290== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==5290== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==5290== Command: ovsdb-server --log-file --detach --no-chdir --pidfile --private-key=/home/damijan.skvarc/doma/ovs/tests/testpki-privkey2.pem --certificate=/home/damijan.skvarc/doma/ovs/tests/testpki-cert2.pem --ca-cert=/home/damijan.skvarc/doma/ovs/tests/testpki-cacert.pem --remote=pssl:0:127.0.0.1 db
==5290== Parent PID: 5289
==5290==
==5292== Warning: noted but unhandled ioctl 0x2403 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292== Warning: noted but unhandled ioctl 0x2400 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5290==
==5290== HEAP SUMMARY:
==5290== in use at exit: 2,066 bytes in 48 blocks
==5290== total heap usage: 87 allocs, 39 frees, 14,152 bytes allocated
==5290==
==5290== LEAK SUMMARY:
==5290== definitely lost: 0 bytes in 0 blocks
==5290== indirectly lost: 0 bytes in 0 blocks
==5290== possibly lost: 0 bytes in 0 blocks
==5290== still reachable: 2,066 bytes in 48 blocks
==5290== suppressed: 0 bytes in 0 blocks
==5290== Reachable blocks (those to which a pointer was found) are not shown.
==5290== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5290==
==5290== For counts of detected and suppressed errors, rerun with: -v
==5290== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1 from 1)
==5292== Warning: noted but unhandled ioctl 0x2401 with no size/direction hints.
==5292== This could cause spurious value errors to appear.
==5292== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==5292==
==5292== HEAP SUMMARY:
==5292== in use at exit: 164,018 bytes in 4,252 blocks
==5292== total heap usage: 17,910 allocs, 13,658 frees, 1,907,468 bytes allocated
==5292==
==5292== 49,720 (1,472 direct, 48,248 indirect) bytes in 8 blocks are definitely lost in loss record 580 of 580
==5292== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5292== by 0x5105E77: CRYPTO_malloc (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E1D23: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E4861: ??? (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E5414: ASN1_item_ex_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x51E546A: ASN1_item_d2i (in /lib/x86_64-linux-gnu/libcrypto.so.1.0.0)
==5292== by 0x4E53E00: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x4E55727: ??? (in /lib/x86_64-linux-gnu/libssl.so.1.0.0)
==5292== by 0x452C4B: ssl_connect (stream-ssl.c:530)
==5292== by 0x445B18: scs_connecting (stream.c:315)
==5292== by 0x445B18: stream_connect (stream.c:338)
==5292== by 0x445B91: stream_recv (stream.c:369)
==5292== by 0x432A9C: jsonrpc_recv.part.7 (jsonrpc.c:310)
==5292== by 0x433977: jsonrpc_recv (jsonrpc.c:1139)
==5292== by 0x433977: jsonrpc_session_recv (jsonrpc.c:1112)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run (jsonrpc-server.c:553)
==5292== by 0x40CCE3: ovsdb_jsonrpc_session_run_all (jsonrpc-server.c:586)
==5292== by 0x40CCE3: ovsdb_jsonrpc_server_run (jsonrpc-server.c:401)
==5292== by 0x40682E: main_loop (ovsdb-server.c:209)
==5292== by 0x40682E: main (ovsdb-server.c:460)
==5292==
==5292== LEAK SUMMARY:
==5292== definitely lost: 1,472 bytes in 8 blocks
==5292== indirectly lost: 48,248 bytes in 936 blocks
==5292== possibly lost: 0 bytes in 0 blocks
==5292== still reachable: 114,298 bytes in 3,308 blocks
==5292== suppressed: 0 bytes in 0 blocks
==5292== Reachable blocks (those to which a pointer was found) are not shown.
==5292== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5292==
==5292== For counts of detected and suppressed errors, rerun with: -v
==5292== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)
....
In this case ovsdb-server is running as daemon process (--detach option) and leaking memory is
accumulated whenever ovsdb-client is reconnected. Within observed test ovsdb-client CLI tool
connects 8 times to ovsdb-server. Leaked memory in ovsdb-client (for each invocation) is approx.
6K bytes, while leaked memory in ovsdb-server is aprox. 48Kbytes what is actually 8*6K. Thus per
each connection both ovsdb-client and ovsdb-server leak approx. 6K bytes.
I have done a small manual test to check if ovsdb-server is indeed accumulating leaked memory
by dumping ovsdb-server in a loop:
console1:
ovsdb-server \
--log-file \
--detach --no-chdir --pidfile \
--private-key=testpki-privkey2.pem \
--certificate=testpki-cert2.pem \
--ca-cert=testpki-cacert.pem \
--remote=pssl:0:127.0.0.1 \
db
while (true); do \
ovsdb-client \
--private-key=testpki-privkey.pem \
--certificate=testpki-cert.pem \
--ca-cert=testpki-cacert.pem \
dump ssl:127.0.0.1:42067; \
done
console2:
watch -n 0.5 'cat /proc/$(pidof ovsdb-server)/status | grep VmSize'
In console2 it was evidently seen ovsdb-server is constantly leaking memory. After a while
(i.e. after a certain number of reconnections) the OOM killer jumps out and kills ovsdb-server.
Very similar situation was already noticed and described in
openvswitch/ovs-issues#168. There, the problem pops up while connecting
controller to ovs-vswitchd daemon.
Valgrind reports point to a problem in openssl library, however after studying openssl code for
a while I have found out the problem is actually in ovs. When connection through SSL channel is
taken place openssl library allocates memory for keeping track of certificate. Reference to this
memory works very similar as std::shared_ptr pointer in recent C++ dialects. i.e. when allocated
memory is referenced its reference counter is incremented and decremented after the memory is
derefered. When reference counter becomes zero allocated memory is automatically deallocated.
In openssl library environment certificate is retrieved by calling SSL_get_peer_certificate()
where its reference counter is incremented. After retrieved certificate is not used any more its
reference counter must be decremented by calling X509_free(). If not, allocated memory is never
freed despite the ssl connection is properly closed.
The problem was caused in stream-ssl.c in function ssl_connect(), which retrieves common peer name
by calling SSL_get_peer_certificate() function and without calling X509_free() function afterwards.
Signed-off-by: Damijan Skvarc <damjan.skvarc@gmail.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
|
We just upgraded from openvswitch 2.10.4 to 2.10.5 for 5 hosts. Three have a RSS about 1g in The host is running ovn-controller managing a integration bridge. There is also another our own build of agent managing 2 bridges with ovs-vsctl and ovs-ofctl. Number of flows are in the lower thousands, about 2000-4000. I tried gcore and some simple chap commands. Here are some findings on rss 1g case
And summarize incoming and outging 0x2668 is 9832 in decimal. pahole on ovs-vswitchd binary indicates that the closest match is |
|
Likely not |
|
|
|
Now I am fairly sure |
|
@yousong I still have the dump from last year in case you want to analyze it and compare it to what you see in yours. At least to understand if it can be the same issue or not. Thanks |
|
Yes, I should have shared my coredump earlier. The xz-ed content is actually very small, only 2.2MB, vs. 3.6GB when decompressed. That's really unexpected. Below is the link, with rpm and debuginfo. https://drive.google.com/drive/folders/1Sr_fd6XitQISdy7gVLhlpdmYS6MWQg5J |
Thanks for sharing the dump but probably I didn’t explain myself correctly. |
Yes, please do. My guess is that it would it take me a while checking with the code and tracking down the actual cause. It will boost the process if other devs more familiar with the codebase got a chance to try. |
|
The patch below should fix my issue with 2.10. Only after blaming master branch I realized that it was already fixed there with commit openvswitch/ovs@584992d . The upstream fix was only included in 2.13 and 2.14. So likely it should also help your case with 2.12 From 6e56458229d81aa48cc0c7e2f4d03af23733f0ba Mon Sep 17 00:00:00 2001
From: Yousong Zhou <yszhou4tech@gmail.com>
Date: Tue, 29 Dec 2020 12:04:27 +0800
Subject: [PATCH] ofproto-dpif: fix xcache mem leak in nxt_resume
Fixes: 72700fe0 ("ofproto-dpif: Fix NXT_RESUME flow stats")
Signed-off-by: Yousong Zhou <yszhou4tech@gmail.com>
---
ofproto/ofproto-dpif.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 6ad7b85fc..51fd0f463 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -5117,6 +5117,7 @@ nxt_resume(struct ofproto *ofproto_,
/* Clean up. */
ofpbuf_uninit(&odp_actions);
dp_packet_uninit(&packet);
+ xlate_cache_uninit(&xcache);
return error;
} |
|
Thanks, @yousong for your analysis! The issue that fixed by commit openvswitch/ovs@584992d was introduced by a different bugfix commit openvswitch/ovs@2f355bf that got accepted to master in 2.11 release time frame. This buggy fix was backported to 2.10 and down to 2.8. So, it was released as part of 2.10.1 release. This matches with the description of this bug report giving at least some confidence that this is the root cause of the issue. I think we need to revisit the patch set https://patchwork.ozlabs.org/project/openvswitch/list/?series=130294&state=* and backport these fixes down to branches where issues were introduced. I will try to find some time for this somewhere next week. Branches 2.11 and below will reach their EOL with release of 2.15, but they will have at least one more stable release before that. |
|
I backported commit openvswitch/ovs@584992d to branches 2.8-2.11. Released as part of v2.11.5, v2.10.6, v2.9.8, and v2.8.10. |
|
Today I upgraded to Alpine 3.13.1 which provides ovs v2.12.2. Thanks |
|
Two and a half days in running v2.12.2 and the memory usage is stable at 12MB. Thanks |
|
Thanks, @casasfernando, for the update. Sounds good so far. |
|
After upgrading to v2.12.2 and monitoring memory usage for a week the problem is to be gone and I'm closing the issue. |
casasfernando commentedMar 2, 2019
•
edited
I'm running OVS on a Qemu/KVM virtualization host for a while now. The virtualization host is running around 12 VMs without a huge network activity. The OVS configuration is very simple, nothing fancy. Four VLANs, two bond ports with two slaves each, three internal ports and then one port for each VM (I can provide the output of ovs-vsctl show if needed).
Recently I noticed that when the ovs-vswitchd process starts it's consuming around 10MB of RAM but after two weeks (the server is running 24x7) ovs-vswitchd can be consuming around 2.5GB of RAM and after let's say a month or so it will be around 5GB or even a bit more.
What I'm doing now to keep ovs-vswitchd memory usage under control is to restart the service every couple of weeks but this is far from ideal.
Please find below some additional information about my setup and feel free to ask for any other details I may have missed.
OVS command line: /usr/sbin/ovs-vswitchd --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor --mlockall unix:/var/run/openvswitch/db.sock
OVS version: 2.10.1
Distro: Alpine Linux 3.9.0
Linux kernel version: 4.19.18
Qemu version: 3.1
Libvirt version: 4.10.0
OVS memory usage from log file:
Also I'm attaching two screenshots from Munin monitoring ovs-vswitchd rss size and meminfo for around five days and how memory usage growth is almost linear to time.
I can provide a memory dump of the process while consuming 500MB of RAM (collected using gcore) if that would help.
Please let me know if you need any more details.
The text was updated successfully, but these errors were encountered: