TLS 1.3 Early data performance issue #3906

Closed
yctung opened this issue Jul 11, 2017 · 7 comments
@yctung commented Jul 11, 2017

Hi,

Thanks to the awesome OpenSSL community. I have successfully enabled TLS 1.3 + early data in my system! Awesome!

However, when I run some microbenchmarks on this fancy new TLS feature, I find that enabling early data actually takes more time to get the server response :(

Later I tried a simple test using OpenSSL's example code and found that the issue still exists. Here are the settings and results of this small test:

Test settings

  1. Testing environment: VMWare + Ubuntu 16.04
  2. Testing OpenSSL branch: master (so, TLS 1.3 draft 20)
  3. Socket delay: 50ms (i.e., sudo tc qdisc add dev lo root netem delay 50ms)
  4. Code modification: I just copied the early data read feature from s_server.c into its www_body(). Please see the reference at the bottom of this post for the added code
  5. Server test command: openssl s_server -key key.pem -cert cert.pem -accept 44330 -state -www -early_data
  6. Client test command: openssl s_client -state -connect 127.0.0.1:44330 -keylogfile sslkeylog.log -sess_out session.dat -sess_in session.dat -early_data http.txt
    (http.txt is just a dummy GET request)

Testing results

  1. The server receives the early data correctly and writes the early reply back
  2. The client can confirm that the early data has been accepted and receives the dummy early reply (i.e., dummy20early)
  3. Here is the resulting Wireshark trace: http://imgur.com/xt5vB4j.png

My problem

  1. As shown in the attached Wireshark trace, there is no 0-RTT. The early reply takes more than 300ms to be received (instead of 1 RTT = 100ms).
  2. There is a huge delay (150ms) between the early data and the following packets sent to the server (I guess those are the client Finished and EndOfEarlyData). Shouldn't they be sent back-to-back?
  3. The server has to wait for the session ticket to be sent before sending the early reply via SSL_write_early_data(). Is that necessary?
  4. Why aren't the ClientHello and the early data squeezed into the same packet? It seems the early data is sent only after the ACK of the ClientHello is received, so there is certainly no 0-RTT.

The above issues seem to make TLS 1.3 + early data not that useful. I haven't been able to check how the state machine works when the early data is sent, but this test result seems wrong to me. Is this something that has not yet been explored by the OpenSSL community, or am I using OpenSSL the wrong way?

(I would be happy to help fix this problem, but I might need some help)

Reference:

Code added before the for(;;) reading loop of www_body() in s_server.c.
This code is nothing special; it just returns a dummy HTML page ("dummy20early") when the server gets the early data sent by the client:

    /* *** ADDED BY YCTUNG *** */
    BIO_printf(bio_s_out, "early_data flag = %d\n", early_data);
    if (early_data) {
        BIO_printf(bio_s_out, "Going to check early data\n");
        int write_header = 1, edret = SSL_READ_EARLY_DATA_ERROR;
        size_t readbytes;

        while (edret != SSL_READ_EARLY_DATA_FINISH) {
            for (;;) {
                edret = SSL_read_early_data(con, buf, bufsize, &readbytes);
                if (edret != SSL_READ_EARLY_DATA_ERROR)
                    break;

                switch (SSL_get_error(con, 0)) {
                case SSL_ERROR_WANT_WRITE:
                case SSL_ERROR_WANT_ASYNC:
                case SSL_ERROR_WANT_READ:
                    /* Just keep trying - busy waiting */
                    continue;
                default:
                    BIO_printf(bio_err, "Error reading early data\n");
                    ERR_print_errors(bio_err);
                    goto err;
                }
            }
            if (readbytes > 0) {
                if (write_header) {
                    BIO_printf(bio_s_out, "Early data received, readbytes = %zu\n", readbytes);
                    write_header = 0;
                }
                raw_write_stdout(buf, (unsigned int)readbytes);
                (void)BIO_flush(bio_s_out);
            }
        }
        if (SSL_is_init_finished(con))
            print_connection_info(con);

        BIO_printf(bio_s_out, "edret = %d\n", edret);
        if (write_header)
            BIO_printf(bio_s_out, "No early data received\n");
        else {
            BIO_printf(bio_s_out, "\nEnd of early data, readbytes = %zu\n\n", readbytes);

            char body[] = "<HTML><BODY>dummy20early</BODY></HTML>\r\n\r\n";
            int bodySize = sizeof(body) - 1;

            char resp[500];
            int respSize = sprintf(resp, "HTTP/1.0 200 ok\r\nContent-type: text/html\r\nContent-Length: %d\r\n\r\n%s", bodySize, body);
            BIO_printf(bio_s_out, "*** early resp size = %d, data =\n%s\n", respSize, resp);

            /* quickly return it by sending it as server early data */
            size_t byteWritten = 0;
            edret = SSL_write_early_data(con, resp, respSize, &byteWritten);
            (void)BIO_flush(io);
        }
    }
@yctung (Author) commented Jul 11, 2017

Some updates from my follow-up experiments:
(1) Writing the early response as soon as SSL_read_early_data() returns > 0 bytes (instead of waiting for SSL_READ_EARLY_DATA_FINISH) helps (see the sketch after this comment), but in that case, how does the server know it has received all the early data?
(2) I tried to tweak the BIO, and there is a way to squeeze the ClientHello + early data into the same packet (not sure if this is ongoing work in OpenSSL?)
(3) I suspect I need to do the same thing on the server side, otherwise there will never be 0-RTT.

I still have one problem. Is there any way to write the "end of early data" by calling a function? I notice it is always sent after the client Finished message (by calling SSL_connect()). Is that necessary? I would like to move it into the same packet as the ClientHello message as well, because my system only sends one piece of early data.
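
A minimal sketch of the idea in (1), assuming the same con, buf and bufsize variables as in the s_server snippet above (illustrative only, not the code actually used; the non-blocking handling is simplified). Per the SSL_read_early_data() documentation, a return value of SSL_READ_EARLY_DATA_FINISH is what tells the server that no more early data will arrive:

int edret;
size_t readbytes, written;
int replied = 0;

for (;;) {
    edret = SSL_read_early_data(con, buf, bufsize, &readbytes);
    if (edret == SSL_READ_EARLY_DATA_ERROR) {
        if (SSL_get_error(con, 0) == SSL_ERROR_WANT_READ)
            continue;                     /* non-blocking: just retry */
        break;                            /* real error */
    }
    if (readbytes > 0 && !replied) {
        /* Reply immediately as server early data (0.5-RTT data) instead of
         * waiting for the client's early data to finish. */
        static const char resp[] =
            "HTTP/1.0 200 ok\r\nContent-type: text/html\r\n\r\ndummy20early\r\n";

        if (SSL_write_early_data(con, resp, sizeof(resp) - 1, &written) != 1)
            break;
        replied = 1;
    }
    if (edret == SSL_READ_EARLY_DATA_FINISH)
        break;    /* EndOfEarlyData processed: all of the client's early data has arrived */
}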

@yctung (Author) commented Jul 11, 2017

New update:

I have solved this problem in my system. I did some hacks to squeeze the early data or the early reply into the same packet as the ClientHello/ServerHello where possible.

Here is the screenshot of this result: http://imgur.com/vR5uGWg.png

This change reduces the latency of getting an HTTP response from 3 RTT to 1 RTT.

Not sure if people think this is important for OpenSSL. I currently work around the issue by setting my own BIO bwrite function and batching the client/server messages outside of OpenSSL. It would be better/cleaner if this could be handled by OpenSSL's state machine.
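
For illustration only (this is not yctung's actual change): one way to experiment with this kind of batching is a filter BIO whose bwrite callback buffers outgoing records and whose ctrl callback holds back BIO_CTRL_FLUSH until the application decides everything it wants coalesced has been queued. All names below (coalesce_*, release_pending, install_coalescing_bio, rbio, sock_bio) are made up for this sketch:

#include <string.h>
#include <openssl/bio.h>
#include <openssl/ssl.h>

/* Hypothetical coalescing filter BIO (single connection, not thread-safe):
 * buffer outgoing records and hold back flushes until the application says
 * that everything it wants batched has been queued. */
static char pending[16384];
static size_t pending_len;
static int release_pending;   /* set by the app once ClientHello + early data are queued */

static int coalesce_write(BIO *b, const char *data, int len)
{
    if (len < 0 || pending_len + (size_t)len > sizeof(pending))
        return -1;
    memcpy(pending + pending_len, data, (size_t)len);
    pending_len += (size_t)len;
    return len;               /* report the write as done; it is only buffered */
}

static long coalesce_ctrl(BIO *b, int cmd, long num, void *ptr)
{
    if (cmd == BIO_CTRL_FLUSH) {
        if (!release_pending)
            return 1;         /* swallow the flush issued right after the ClientHello */
        if (pending_len > 0
                && BIO_write(BIO_next(b), pending, (int)pending_len) <= 0)
            return 0;
        pending_len = 0;
    }
    return BIO_ctrl(BIO_next(b), cmd, num, ptr);
}

/* Wiring it up; rbio and sock_bio are assumed to already exist. */
static void install_coalescing_bio(SSL *ssl, BIO *rbio, BIO *sock_bio)
{
    BIO_METHOD *meth = BIO_meth_new(BIO_TYPE_FILTER, "coalescing filter");
    BIO *filter;

    BIO_meth_set_write(meth, coalesce_write);
    BIO_meth_set_ctrl(meth, coalesce_ctrl);
    filter = BIO_new(meth);
    BIO_set_init(filter, 1);
    SSL_set_bio(ssl, rbio, BIO_push(filter, sock_bio));
}

The idea is that the flush OpenSSL issues right after the ClientHello is swallowed, the early data records are buffered behind it, and the application then sets release_pending = 1 and calls BIO_flush() on the filter so everything leaves in a single write.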

@mattcaswell (Member)

These are some interesting results. I'll need to do some tests to investigate further and get a better understanding of what is happening.

@InfoHunter (Member)

Is it statem_flush() that causes the ClientHello to be sent before the state machine returns to SSL_write_early_data()?

WORK_STATE ossl_statem_client_post_work(SSL *s, WORK_STATE wst)
{
    OSSL_STATEM *st = &s->statem;

    s->init_num = 0;

    switch (st->hand_state) {
    default:
        /* No post work to be done */
        break;

    case TLS_ST_CW_CLNT_HELLO:
        if (wst == WORK_MORE_A && statem_flush(s) != 1)
            return WORK_MORE_A;

@mattcaswell (Member)

The reason this happens is because of Nagle's algorithm (https://en.wikipedia.org/wiki/Nagle%27s_algorithm). Quoting that page:

Nagle's algorithm works by combining a number of small outgoing messages, and sending them all at once. Specifically, as long as there is a sent packet for which the sender has received no acknowledgment, the sender should keep buffering its output until it has a full packet's worth of output, thus allowing output to be sent all at once.

So, OpenSSL immediately sends the ClientHello packet. It then also immediately sends the early application data packet. However at the network stack layer Nagle's algorithm kicks in. We have not yet received an ACK for the initial ClientHello packet so the early application data gets buffered waiting for either more application data (until we have a full packet's worth), or until we've received the ACK. If no more early application data is forthcoming then we do not actually see the early data get sent until we have that ACK come back.

For the purposes of a quick test to prove this theory, I made the following change:

diff --git a/apps/s_socket.c b/apps/s_socket.c
index b2bf244f23..e75bfe0526 100644
--- a/apps/s_socket.c
+++ b/apps/s_socket.c
@@ -92,6 +92,16 @@ int init_client(int *sock, const char *host, const char *port,
             continue;
         }
 
+        if (type == SOCK_STREAM) {
+            ret = BIO_set_tcp_ndelay(*sock, 1);
+            if (ret < 0) {
+                BIO_closesocket(*sock);
+                ERR_print_errors(bio_err);
+                return 0;
+            }
+            ret = 0;
+        }
+
 #ifndef OPENSSL_NO_SCTP
         if (protocol == IPPROTO_SCTP) {
             /*

This has the effect of setting the TCP_NODELAY option on the underlying socket in s_client. With this change the problem is gone - the early application data gets sent immediately.

Therefore there are a few options for fixing this:

  1. We could simply note this effect in the documentation and leave it up to application developers to set TCP_NODELAY as appropriate

  2. We could attempt to automatically set TCP_NODELAY if appropriate

  3. We could ensure that the ClientHello isn't flushed immediately, and we put it into the same packet as the early application data.

The problem with (2) is that it breaks the BIO abstraction. We have to peek inside the BIO to see if it is really a socket and mess with socket options that are really none of our business to play around with. Some applications may have non-trivial BIO set ups (e.g. BIO chains, or mem-BIOs etc) - so this would be difficult/impossible to do in these circumstances.

Option (3) has some merit but only really helps with the first packet of early application data. If you send second or subsequent packets then you hit the same problem.

Therefore I am currently thinking that option 1 might be the best way forward.
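
For reference, option (1) essentially asks application developers to do something like the following on their own socket before handing it to OpenSSL. A minimal sketch using plain setsockopt(); the helper name disable_nagle is made up here, and OpenSSL's BIO_set_tcp_ndelay() used in the diff above achieves the same thing:

#include <netinet/in.h>
#include <netinet/tcp.h>    /* TCP_NODELAY */
#include <sys/socket.h>

/* Disable Nagle's algorithm so that small records (ClientHello, early data)
 * go out immediately instead of waiting for the previous segment's ACK.
 * "fd" is the connected TCP socket that will be handed to OpenSSL,
 * e.g. via SSL_set_fd(). Returns 0 on success, -1 on error. */
static int disable_nagle(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}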

mattcaswell added a commit to mattcaswell/openssl that referenced this issue Jul 18, 2017
In particular add information about the effect of Nagle's algorithm on
early data.

Fixes openssl#3906
@mattcaswell (Member)

I went with option (1), so there is now some discussion of this issue in the man page.

@yctung (Author) commented Jul 19, 2017

Hi Matt,

You are awesome! I did notice that the problem might be caused by Nagle's algorithm, but my idea was incorrect. I thought the problem was caused by enabling "TCP no delay", so that packets would be sent right away before they could be batched together into a single packet. It turns out to be the other way around, LoL...

I have confirmed that setting TCP_NODELAY solves this issue :)

Yu-Chih
