ssl: Fix stateless ticket in_window comparison #6019

sindrip · 2022-05-24T14:02:38Z

If a stateless ticket is older than the WindowSize of the bloom filter
then it will be rejected. By subtracting the ReportedAge from the
RealAge we have the drift time, which should become too large during
a replay. Previously the absolute age of the ticket was being used.

Fixes #6014

github-actions · 2022-05-24T14:04:08Z

CT Test Results

      2 files     64 suites 47m 43s ⏱️
  735 tests   668 ✔️   67 💤 0 ❌
3 494 runs 2 725 ✔️ 769 💤 0 ❌

Results for commit 612fe8f.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

IngelaAndin · 2022-06-02T09:50:35Z

Your test case actually fails on some of our test machines. It might be due to making the window size smaller the possibility for false negatives become higher. There actually is no guarantee with stateless tickets that all valid tickets will be considered valid and used. I think this section of the RFC is what we should go on is:

8.3.  Freshness Checks

   Because the ClientHello indicates the time at which the client sent
   it, it is possible to efficiently determine whether a ClientHello was
   likely sent reasonably recently and only accept 0-RTT for such a
   ClientHello, otherwise falling back to a 1-RTT handshake.  This is
   necessary for the ClientHello storage mechanism described in
   Section 8.2 because otherwise the server needs to store an unlimited
   number of ClientHellos, and is a useful optimization for self-
   contained single-use tickets because it allows efficient rejection of
   ClientHellos which cannot be used for 0-RTT.

   In order to implement this mechanism, a server needs to store the
   time that the server generated the session ticket, offset by an
   estimate of the round-trip time between client and server.  I.e.,

       adjusted_creation_time = creation_time + estimated_RTT

   This value can be encoded in the ticket, thus avoiding the need to
   keep state for each outstanding ticket.  The server can determine the
   client's view of the age of the ticket by subtracting the ticket's
   "ticket_age_add" value from the "obfuscated_ticket_age" parameter in
   the client's "pre_shared_key" extension.  The server can determine
   the expected_arrival_time of the ClientHello as:

     expected_arrival_time = adjusted_creation_time + clients_ticket_age

   When a new ClientHello is received, the expected_arrival_time is then
   compared against the current server wall clock time and if they
   differ by more than a certain amount, 0-RTT is rejected, though the
   1-RTT handshake can be allowed to complete.

 There are several potential sources of error that might cause
   mismatches between the expected_arrival_time and the measured time.
   Variations in client and server clock rates are likely to be minimal,
   though potentially the absolute times may be off by large values.
   Network propagation delays are the most likely causes of a mismatch
   in legitimate values for elapsed time.  Both the NewSessionTicket and
   ClientHello messages might be retransmitted and therefore delayed,
   which might be hidden by TCP.  For clients on the Internet, this
   implies windows on the order of ten seconds to account for errors in
   clocks and variations in measurements; other deployment scenarios may
   have different needs.  Clock skew distributions are not symmetric, so
   the optimal tradeoff may involve an asymmetric range of permissible
   mismatch values.

   Note that freshness checking alone is not sufficient to prevent
   replays because it does not detect them during the error window,
   which -- depending on bandwidth and system capacity -- could include
   billions of replays in real-world settings.  In addition, this
   freshness checking is only done at the time the ClientHello is
   received and not when subsequent early Application Data records are
   received.  After early data is accepted, records may continue to be
   streamed to the server over a longer time period.

I do not think our implementation takes estimated round trip time into account, and that is probably what we want. But I do not have an immediate good idea of how to best do that.

sindrip · 2022-06-02T11:44:52Z

Wouldn't adding the estimated round-trip time be considered an enhancement? As it stands, this fix is to implement the original intention of the code, because the anti-replay mechanism is unusable in its current state. Currently the deviation from the RFC is that we calculate expected_arrival_time without adjusting for the estimated connection latency estimated_RTT. It is also not clear to me how one is supposed to derive estimated_RTT when receiving a ClientHello, if it is a constant or per connection.

Regarding the point that not all valid tickets will be considered valid and used. I'm assuming this is induced by the bloom filter that is used to validate tickets. Doesn't that imply that each test-case for valid tickets should be ran multiple times and expected to pass x% of times (97% with the default bloom filter values)? Alternatively the tests could be ran with a deterministic data structure for the ticket validation storage, but that is I believe a big rework and requires explicit testing of the bloom filter (I think it's only indirectly tested right now).

8.2. Client Hello Recording
...
Servers MAY
also implement data stores with false positives, such as Bloom
filters, in which case they MUST respond to apparent replay by
rejecting 0-RTT but MUST NOT abort the handshake.

Regarding my original concern about the clock drift time:

8.2. Client Hello Recording
...
Note: If the client's clock is running much faster than the server's,
then a ClientHello may be received that is outside the window in the
future, in which case it might be accepted for 1-RTT, causing a
client retry, and then acceptable later for 0-RTT. This is another
variant of the second form of attack described in Section 8.

Regarding this specific PR, I originally reduced the window_size to keep the test execution time down. I could try tweaking the Lifetime and WindowSize and we could see how that works. Let me know if you want to try this.

u3s · 2022-06-07T08:27:52Z

I would like clarify discussion in related issue #6014 , before proceeding with this PR.

u3s · 2022-06-22T16:56:56Z

@peterdmv wrote:

Regarding the solution I would prefer using the naming scheme of the RFC and keeping variables even if they are not used for future reference and maintainability e.g. EstimatedRTT.

I'm not sure how to read that. Examples provided in issue discussion are
different than existing code and also PR content. At the moment, I'm not sure if it is worth to rewrite more code just to match the RFC content - assuming the logic is good and code execution result as expected.

Other comments

changing that might be potentially incompatible(i think it should be targeted to master branch) - because tickets will start being usable(or useful as is suggested) after WindowSize seconds.
- I understand it that with stateless tickets, user can potentially use them as many times
  as they want (although it is recommended for the clients to use it only once -
  RFC C.4)
- Today with anti_replay: this is limited to Bloom filter WindowSize seconds
  (which is suggestd to not be correct) - tickets with RealAge bigger than WindowSize will be rejected
- also Today without anti_replay: this is limited to ticket lifetime
- With the change proposed: re-use possibility will be limited only with ticket lifetime
  (no matter if with or without anti_replay enabled)
I wonder if readability of code could be improved a little (so conclusion from our
discussion is reflected):
- I was considering adding some explanatory comment and/or renaming some functions
  (I'm not sure which is better - what do you think)?
  - function names tls_server_session_ticket:stateless*ticket are kind of ticket oriented
  - they don't cover ClientHello replay protection (anti_replay) flow and might
    be misguiding - I mean it is not only ticket that is checked with anti_replay
docs - maybe anti_replay option manual could be improved so our discussion is
somehow reflected there as well.

lib/ssl/src/tls_server_session_ticket.erl

u3s · 2022-06-23T11:26:27Z

I've added more tests for anti_replay in #6055.
ticket_reuse_anti_replay will be failing with PR6019, because I wrote it to verify existing behavior which we agreed is not expected.

I hope you will adjust it in this PR, once I merge PR-6055 to master (plan to do it within a day or two).

This will be needed:

@@ -243,7 +243,7 @@ ticketage_smaller_than_windowsize_anti_replay(Config) when is_list(Config) ->
 ticketage_bigger_than_windowsize_anti_replay() ->
     [{doc, "Session resumption with stateless tickets and anti_replay enabled."
       "Fresh ClientHellos."
-      "Ticket age bigger than windowsize. 0-RTT is expected to fail."
+      "Ticket age bigger than windowsize. 0-RTT is expected to succeed."
       "(Erlang client - Erlang server)"}].
 ticketage_bigger_than_windowsize_anti_replay(Config) when is_list(Config) ->
     WindowSize = 3,
@@ -252,10 +252,10 @@ ticketage_bigger_than_windowsize_anti_replay(Config) when is_list(Config) ->
     ssl_test_lib:check_result(Server0, ok, Client0, ok),
     Client1 = anti_replay_helper_connect(Server0, Client0, Port0, ClientNode,
                                          Hostname, ClientOpts,
-                                         {seconds, WindowSize + 2}, false),
+                                         {seconds, WindowSize + 2}, true),
     Client2 = anti_replay_helper_connect(Server0, Client0, Port0, ClientNode,
                                          Hostname, ClientOpts,
-                                         {seconds, 2*WindowSize + 2}, false),
+                                         {seconds, 2*WindowSize + 2}, true),
     process_flag(trap_exit, false),
     [ssl_test_lib:close(A) || A <- [Server0, Client0, Client1, Client2]].

sindrip · 2022-06-28T13:27:29Z

Let me know once #6055 is merged and I can make the suggested changes

u3s · 2022-06-28T15:01:23Z

#6055 is merged. please rebase and fix the ticketage_bigger_than_windowsize_anti_replay.

sindrip · 2022-06-30T12:45:48Z

I rebased on master and updated the testcase from #6055, what do you think about the testcases that I wrote originally in this PR, the new testcases you added test the same thing @u3s?

I wonder if readability of code could be improved a little (so conclusion from our
discussion is reflected):

I was considering adding some explanatory comment and/or renaming some functions
(I'm not sure which is better - what do you think)?

function names tls_server_session_ticket:stateless*ticket are kind of ticket oriented

they don't cover ClientHello replay protection (anti_replay) flow and might
be misguiding - I mean it is not only ticket that is checked with anti_replay

I think comments surrounding the checks with references to the RFC would be fine by me. Do you want me to cover this?

docs - maybe anti_replay option manual could be improved so our discussion is
somehow reflected there as well.

Do you want me to add it to this PR?

u3s · 2022-06-30T13:13:52Z

I rebased on master and updated the testcase from #6055, what do you think about the testcases that I wrote originally in this PR, the new testcases you added test the same thing @u3s?

I'm fine with leaving them as your tests are more of a white box tests. My recent addition is more end2end.

think comments surrounding the checks with references to the RFC would be fine by me. Do you want me to cover this?

yes please.

   docs - maybe anti_replay option manual could be improved so our discussion is
   somehow reflected there as well.

Do you want me to add it to this PR?

yes, if not a problem for you to make a small adjustment. Maybe its a matter of adding ClientHello around "freshness checks" - to indicate what we're recording and checking with anti_replay - I'm looking here.

sindrip · 2022-07-04T10:55:44Z

I did some slight documentation updates, but it is kind of adressed here: https://www.erlang.org/doc/apps/ssl/using_ssl.html#anti-replay-protection-in-tls-1.3

lib/ssl/doc/src/using_ssl.xml

If a stateless ticket is older than the WindowSize of the bloom filter then it will be rejected. By subtracting the ReportedAge from the RealAge we have the DeltaAge, which will become too large during a replay. Previously the absolute age of the ticket was being used.

u3s · 2022-07-19T14:58:23Z

thanks for contribution!

sindrip mentioned this pull request May 24, 2022

TLS 1.3 stateless session resumption with anti_replay ignores ticket lifetime #6014

Closed

IngelaAndin added team:PS Assigned to OTP team PS testing currently being tested, tag is used by OTP internal CI labels May 25, 2022

IngelaAndin assigned u3s Jun 1, 2022

IngelaAndin removed testing currently being tested, tag is used by OTP internal CI labels Jun 2, 2022

IngelaAndin added the in progress label Jun 18, 2022

u3s reviewed Jun 22, 2022

View reviewed changes

lib/ssl/src/tls_server_session_ticket.erl Outdated Show resolved Hide resolved

u3s mentioned this pull request Jun 23, 2022

ssl: add bloom filter tests #6055

Merged

sindrip force-pushed the ssl/stateless-anti-replay-window-fix branch from 52d334e to 3d1b225 Compare June 28, 2022 13:25

sindrip force-pushed the ssl/stateless-anti-replay-window-fix branch 2 times, most recently from 61494f0 to 6360074 Compare June 30, 2022 12:31

sindrip changed the base branch from maint to master June 30, 2022 12:38

u3s added the testing currently being tested, tag is used by OTP internal CI label Jun 30, 2022

sindrip force-pushed the ssl/stateless-anti-replay-window-fix branch from 6360074 to b3618af Compare July 4, 2022 09:56

u3s reviewed Jul 5, 2022

View reviewed changes

lib/ssl/doc/src/using_ssl.xml Outdated Show resolved Hide resolved

sindrip force-pushed the ssl/stateless-anti-replay-window-fix branch from b3618af to 612fe8f Compare July 6, 2022 12:52

u3s added testing currently being tested, tag is used by OTP internal CI and removed testing currently being tested, tag is used by OTP internal CI labels Jul 6, 2022

u3s merged commit 09c601f into erlang:master Jul 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ssl: Fix stateless ticket in_window comparison #6019

ssl: Fix stateless ticket in_window comparison #6019

sindrip commented May 24, 2022 •

edited

Loading

github-actions bot commented May 24, 2022 •

edited

Loading

IngelaAndin commented Jun 2, 2022 •

edited by u3s

Loading

sindrip commented Jun 2, 2022

u3s commented Jun 7, 2022

u3s commented Jun 22, 2022 •

edited

Loading

u3s commented Jun 23, 2022 •

edited

Loading

sindrip commented Jun 28, 2022

u3s commented Jun 28, 2022

sindrip commented Jun 30, 2022

u3s commented Jun 30, 2022

sindrip commented Jul 4, 2022

u3s commented Jul 19, 2022

ssl: Fix stateless ticket in_window comparison #6019

ssl: Fix stateless ticket in_window comparison #6019

Conversation

sindrip commented May 24, 2022 • edited Loading

github-actions bot commented May 24, 2022 • edited Loading

CT Test Results

Artifacts

IngelaAndin commented Jun 2, 2022 • edited by u3s Loading

sindrip commented Jun 2, 2022

u3s commented Jun 7, 2022

u3s commented Jun 22, 2022 • edited Loading

Other comments

u3s commented Jun 23, 2022 • edited Loading

sindrip commented Jun 28, 2022

u3s commented Jun 28, 2022

sindrip commented Jun 30, 2022

u3s commented Jun 30, 2022

sindrip commented Jul 4, 2022

u3s commented Jul 19, 2022

sindrip commented May 24, 2022 •

edited

Loading

github-actions bot commented May 24, 2022 •

edited

Loading

IngelaAndin commented Jun 2, 2022 •

edited by u3s

Loading

u3s commented Jun 22, 2022 •

edited

Loading

u3s commented Jun 23, 2022 •

edited

Loading