Handle multiple SSL_ERROR_WANT_ASYNC which is returned by ASYNC engine #1387

Open
wants to merge 5 commits into
base: main

nidhidamodaran

@afrind, please review the changes, which I have updated per your comments.

The changes are required to handle cases where the async engine returns SSL_ERROR_WANT_ASYNC multiple times.

I have also removed the code that closes the async fd. That fd is my crypto driver handle, and upper layers should not close it because I need it for the rest of the SSL connection.

Contributor

@afrind afrind left a comment

Can you add a unit test that reproduces the issue you are experiencing that demonstrates your fix solves the problem?

@@ -48,8 +48,6 @@ void AsyncPipeReader::close() {

  if (closeCb_) {
    closeCb_(fd_);
  } else {
    netops::close(fd_);
Contributor

Removing this seems wrong - close() no longer closes the underlying fd when closeCb_ is not defined?

Author

This fd was added to the async list by the underlying OpenSSL. Isn't it better to let the SSL library close the fd once the operation is complete?

Also, if I close it here, my driver handle will be closed and cannot be used for the rest of the SSL connection.

Author

@afrind I will elaborate on my use case here:

I have a crypto device to which asymmetric and symmetric operations from OpenSSL are offloaded. Instead of a pipe fd, I use the driver handle of this device as the async fd, which is signalled whenever a crypto operation completes. I use this same async fd throughout the handshake.

This handle has to stay open for the lifetime of the SSL connection (for record processing too). In this design, each crypto operation done in ASYNC mode returns SSL_ERROR_WANT_ASYNC. So after one such request is processed at the folly layer and SSL_accept is restarted, the next crypto operation may return the same error, producing multiple WANT_ASYNC errors in sequence, which the existing folly code does not handle correctly.

The issue is that if sslSocket_->restartSSLAccept again returns a WANT_ASYNC error, asyncOperationFinishCallback_ is not set to null and further events are not processed.
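The failure mode described above can be sketched with a toy model. This is not folly's implementation: `FakeSslSocket`, `HandshakeResult`, `signalAsyncFd`, `restartAccept`, and `runHandshake` are hypothetical names invented for illustration, and the model assumes that restarting the accept re-arms the finish callback whenever another async job is pending. Under that assumption, clearing the callback after the restart wipes out the freshly armed one, so the second WANT_ASYNC is never serviced.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in: an engine that reports WANT_ASYNC a fixed
// number of times before the handshake completes.
enum class HandshakeResult { WantAsync, Done };

class FakeSslSocket {
 public:
  FakeSslSocket(int pendingAsyncOps, bool clearBeforeRestart)
      : pending_(pendingAsyncOps), clearBeforeRestart_(clearBeforeRestart) {}

  void setAsyncOperationFinishCallback(std::function<void()> cb) {
    finishCb_ = std::move(cb);
  }

  // Models the driver signalling the async fd.
  void signalAsyncFd() {
    if (finishCb_) {
      auto cb = finishCb_;  // run a copy: the callback replaces itself
      cb();
    }
  }

  // Models restarting SSL_accept: either another async job is pending
  // (the callback is re-armed) or the handshake finishes.
  HandshakeResult restartAccept() {
    if (pending_ > 0) {
      --pending_;
      armCallback();
      return HandshakeResult::WantAsync;
    }
    done_ = true;
    return HandshakeResult::Done;
  }

  bool done() const { return done_; }

 private:
  void armCallback() {
    setAsyncOperationFinishCallback([this] {
      if (clearBeforeRestart_) {
        // Clear first, so a callback armed by the restart survives.
        setAsyncOperationFinishCallback(nullptr);
        restartAccept();
      } else {
        // Restart first: the clear below erases the re-armed callback.
        restartAccept();
        setAsyncOperationFinishCallback(nullptr);
      }
    });
  }

  int pending_;
  bool clearBeforeRestart_;
  bool done_ = false;
  std::function<void()> finishCb_;
};

// Drives the toy handshake; returns true if it completes.
bool runHandshake(int pendingAsyncOps, bool clearBeforeRestart) {
  FakeSslSocket sock(pendingAsyncOps, clearBeforeRestart);
  sock.restartAccept();
  // Each iteration models one readiness event on the async fd.
  for (int i = 0; i <= pendingAsyncOps && !sock.done(); ++i) {
    sock.signalAsyncFd();
  }
  return sock.done();
}
```

In this model a single async operation completes with either ordering, which matches the observation that the problem only shows up when WANT_ASYNC is returned more than once in a row.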

Contributor

AsyncPipeReader is generic and can have uses other than this one. I think what you want to do instead is set closeCb_ to a no-op in your case?
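A minimal sketch of the no-op close callback idea, using a self-contained stand-in rather than folly's class. `MiniPipeReader`, `isOpen`, and `fdSurvivesClose` are hypothetical names; only the `closeCb_` branch mirrors the diff snippet above, and the sketch assumes a POSIX system.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <functional>
#include <utility>

// Hypothetical miniature of a reader that closes its fd itself unless
// a close callback has been installed.
class MiniPipeReader {
 public:
  explicit MiniPipeReader(int fd) : fd_(fd) {}

  void setCloseCallback(std::function<void(int)> cb) {
    closeCb_ = std::move(cb);
  }

  void close() {
    if (fd_ < 0) {
      return;
    }
    if (closeCb_) {
      closeCb_(fd_);  // the owner decides what happens to the fd
    } else {
      ::close(fd_);   // default: the reader closes the fd
    }
    fd_ = -1;
  }

 private:
  int fd_;
  std::function<void(int)> closeCb_;
};

// True if fd still refers to an open file description.
bool isOpen(int fd) {
  return ::fcntl(fd, F_GETFD) != -1;
}

// Reports whether the read end of a pipe is still open after close().
bool fdSurvivesClose(bool installNoOp) {
  int fds[2];
  if (::pipe(fds) != 0) {
    return false;
  }
  MiniPipeReader reader(fds[0]);
  if (installNoOp) {
    reader.setCloseCallback([](int) {});  // no-op: keep the handle open
  }
  reader.close();
  bool open = isOpen(fds[0]);
  if (open) {
    ::close(fds[0]);
  }
  ::close(fds[1]);
  return open;
}
```

With the no-op installed the driver handle stays open for the caller to manage, while the default behaviour for every other user of the reader is unchanged.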

Author

@afrind, thanks for the review. I have removed the change so as not to break other use cases. Please review the rest of the code changes.

@nidhidamodaran
Author

nidhidamodaran commented Jun 22, 2020

test_patch.txt

@afrind, please use the above patch to reproduce the issue I currently face. I created a test case to return multiple SSL_ERROR_WANT_ASYNC errors, but when run it always returns an SSL_ERROR_WANT_READ between two consecutive SSL_ERROR_WANT_ASYNC errors. The attached patch does not really create an async job, but you can use it to quickly reproduce the error.

The original folly code hangs with this change, i.e. when two consecutive SSL_ERROR_WANT_ASYNC errors are received, and this PR fixes the issue. Please verify.
Let me know if you need any further details.

@nidhidamodaran
Author

Also please note: existing folly code works correctly if a non-fatal error code is returned between two SSL_ERROR_WANT_ASYNC error codes, as this cleans up all stale data structures.

@nidhidamodaran
Author

nidhidamodaran commented Jun 23, 2020

@afrind, I have added a test case that returns SSL_ERROR_WANT_ASYNC multiple times. I have also committed a cert/key/CA cert, as the ones in the repo don't seem to work.

Please review. Note that the error is only reproducible if no other error code, say SSL_ERROR_WANT_READ, is returned between two SSL_ERROR_WANT_ASYNC errors.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@afrind has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@yfeldblum
Contributor

Legit.

Can we avoid checking in new keys/certs? It should be possible to generate small keys quickly and in-memory at unit-test time.

@nidhidamodaran
Author

@afrind @yfeldblum, I have removed the certs committed earlier and used the existing certs. Please review.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@afrind has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@afrind afrind left a comment

These comments came from another developer internally

@@ -1525,6 +1525,79 @@ static int kRSAEvbExIndex = -1;
static int kRSASocketExIndex = -1;
static constexpr StringPiece kEngineId = "AsyncSSLSocketTest";

int customRsaPubDec(int flen, const unsigned char *from, unsigned char *to, RSA *rsa,
Contributor

Should be 'static'

return ret;
}

int verifyCb(X509_STORE_CTX *ctx, void *arg) {
Contributor

Should be 'static'


ssl::EvpPkeyUniquePtr publicEvpPkey(X509_get_pubkey(cert));
EVP_PKEY *pkey = publicEvpPkey.get();
RSA* rsa = EVP_PKEY_get1_RSA(publicEvpPkey.get());
Contributor

I think this is a leak (OpenSSL's _get1 functions increment the reference count of the underlying object by 1). The tests aren't catching it because we disable this test when running under ASAN.

Contributor

See line 1762 for where we disable
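The ownership rule behind the leak comment can be illustrated with a toy reference-counted object. Nothing here calls OpenSSL: `FakeKey`, `fakeGet1`, `fakeFree`, `FakeKeyPtr`, and `refsAfterUse` are hypothetical stand-ins. In the real test, the pointer returned by `EVP_PKEY_get1_RSA` would need a matching `RSA_free`, for example through a `std::unique_ptr` with a deleter that calls it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a reference-counted library object.
struct FakeKey {
  int refs = 1;
};

// Models a "_get1" accessor: the caller receives its own reference.
FakeKey* fakeGet1(FakeKey* key) {
  ++key->refs;
  return key;
}

// Models the matching "_free": drops exactly one reference.
void fakeFree(FakeKey* key) {
  --key->refs;
}

struct FakeKeyDeleter {
  void operator()(FakeKey* key) const { fakeFree(key); }
};
using FakeKeyPtr = std::unique_ptr<FakeKey, FakeKeyDeleter>;

// Returns the reference count left once the borrowed key is out of scope.
int refsAfterUse(bool useGuard) {
  FakeKey key;
  if (useGuard) {
    FakeKeyPtr guard(fakeGet1(&key));  // released when guard is destroyed
  } else {
    FakeKey* raw = fakeGet1(&key);     // never released: leaked reference
    (void)raw;
  }
  return key.refs;
}
```

The guarded path returns the count to its starting value, while the raw-pointer path leaves one extra reference outstanding, which is the pattern the reviewer flagged.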

pipeReader_->setReadCB(nullptr);
sslSocket_->setAsyncOperationFinishCallback(nullptr);
sslSocket_->restartSSLAccept();
Contributor

Please add a comment here explaining why order is significant

Author

@afrind I have addressed all your comments, please review

Contributor

@facebook-github-bot facebook-github-bot left a comment

@afrind has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@afrind
Contributor

afrind commented Jul 24, 2020

The test failed with a SEGV internally in opt mode?

*** Aborted at 1595628991 (Unix time, try 'date -d @1595628991') ***
*** Signal 11 (SIGSEGV) (0x0) received by PID 2984783 (pthread TID 0x7fbba672a3c0) (linux TID 2984783) (code: 128), stack trace: ***
    @ 000000000061b6f5 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       ./folly/experimental/symbolizer/SignalHandler.cpp:442
    @ 0000000000000000 (unknown)
    @ 00000000001d6e1e RSA_free
    @ 0000000000193f22 EVP_PKEY_free
    @ 000000000022cbfb pubkey_cb
    @ 00000000000a365e asn1_item_embed_free
    @ 00000000000a397f asn1_template_free
    @ 00000000000a3642 asn1_item_embed_free
    @ 00000000000a397f asn1_template_free
    @ 00000000000a3642 asn1_item_embed_free
    @ 00000000000a38b6 ASN1_item_free
    @ 000000000020b53a OPENSSL_sk_pop_free
    @ 000000000003dffa SSL_free
    @ 000000000056f100 folly::AsyncSSLSocket::~AsyncSSLSocket()
                       ./folly/Memory.h:302
                       -> ./folly/io/async/AsyncSSLSocket.cpp
    @ 000000000056f6bd folly::AsyncSSLSocket::~AsyncSSLSocket()
                       ./folly/io/async/AsyncSSLSocket.cpp:315
    @ 000000000058b634 folly::AsyncSocket::destroy()
                       ./folly/io/async/DelayedDestruction.h:55
                       -> ./folly/io/async/AsyncSocket.cpp
    @ 0000000000448019 folly::AsyncSSLSocketTest_OpenSSL110MultipleAsyncTest_Test::TestBody()
                       ./folly/io/async/DelayedDestruction.h:70
                       -> ./folly/io/async/test/AsyncSSLSocketTest.cpp
    @ 000000000069433e void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)
                       /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2473
                       -> /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest-all.cc
    @ 0000000000685281 testing::Test::Run() [clone .part.595]
                       /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2490
                       -> /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest-all.cc
    @ 0000000000685701 testing::TestInfo::Run() [clone .part.596]
                       /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2481
                       -> /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest-all.cc
    @ 0000000000685894 testing::TestCase::Run() [clone .part.597]
                       /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2801
                       -> /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest-all.cc
    @ 0000000000686e34 testing::internal::UnitTestImpl::RunAllTests()
                       /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:4735
                       -> /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest-all.cc
    @ 0000000000687124 testing::UnitTest::Run()
                       /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2473
                       -> /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest-all.cc
    @ 000000000048abe5 main
                       ./third-party-buck/platform007/build/googletest/include/gtest/gtest.h:2255
                       -> ./folly/io/async/test/AsyncSSLSocketTest2.cpp
    @ 00000000000211a5 __libc_start_main
    @ 0000000000420679 _start
                       /home/engshare/third-party2/glibc/2.26/src/glibc-2.26/csu/../sysdeps/x86_64/start.S:120

@nidhidamodaran
Author

@afrind please help me reproduce this error. How do I build folly in opt mode?

@afrind
Contributor

afrind commented Jul 28, 2020

I'm actually not sure, since we use a different build system internally. I think it's mostly a combination of -DNDEBUG and -O3? If you are linking against jemalloc, try MALLOC_CONF=junk:true,abort:true?
