Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssl: Fix stateless ticket in_window comparison #6019

Merged
merged 1 commit into from
Jul 19, 2022

Conversation

sindrip
Copy link
Contributor

@sindrip sindrip commented May 24, 2022

If a stateless ticket is older than the WindowSize of the bloom filter
then it will be rejected. By subtracting the ReportedAge from the
RealAge we have the drift time, which should become too large during
a replay. Previously the absolute age of the ticket was being used.

Fixes #6014

@github-actions
Copy link
Contributor

github-actions bot commented May 24, 2022

CT Test Results

       2 files       64 suites   47m 43s ⏱️
   735 tests    668 ✔️   67 💤 0
3 494 runs  2 725 ✔️ 769 💤 0

Results for commit 612fe8f.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@IngelaAndin IngelaAndin added team:PS Assigned to OTP team PS testing currently being tested, tag is used by OTP internal CI labels May 25, 2022
@IngelaAndin
Copy link
Contributor

IngelaAndin commented Jun 2, 2022

Your test case actually fails on some of our test machines. It might be due to making the window size smaller the possibility for false negatives become higher. There actually is no guarantee with stateless tickets that all valid tickets will be considered valid and used. I think this section of the RFC is what we should go on is:

8.3.  Freshness Checks

   Because the ClientHello indicates the time at which the client sent
   it, it is possible to efficiently determine whether a ClientHello was
   likely sent reasonably recently and only accept 0-RTT for such a
   ClientHello, otherwise falling back to a 1-RTT handshake.  This is
   necessary for the ClientHello storage mechanism described in
   Section 8.2 because otherwise the server needs to store an unlimited
   number of ClientHellos, and is a useful optimization for self-
   contained single-use tickets because it allows efficient rejection of
   ClientHellos which cannot be used for 0-RTT.

   In order to implement this mechanism, a server needs to store the
   time that the server generated the session ticket, offset by an
   estimate of the round-trip time between client and server.  I.e.,

       adjusted_creation_time = creation_time + estimated_RTT

   This value can be encoded in the ticket, thus avoiding the need to
   keep state for each outstanding ticket.  The server can determine the
   client's view of the age of the ticket by subtracting the ticket's
   "ticket_age_add" value from the "obfuscated_ticket_age" parameter in
   the client's "pre_shared_key" extension.  The server can determine
   the expected_arrival_time of the ClientHello as:

     expected_arrival_time = adjusted_creation_time + clients_ticket_age

   When a new ClientHello is received, the expected_arrival_time is then
   compared against the current server wall clock time and if they
   differ by more than a certain amount, 0-RTT is rejected, though the
   1-RTT handshake can be allowed to complete.

 There are several potential sources of error that might cause
   mismatches between the expected_arrival_time and the measured time.
   Variations in client and server clock rates are likely to be minimal,
   though potentially the absolute times may be off by large values.
   Network propagation delays are the most likely causes of a mismatch
   in legitimate values for elapsed time.  Both the NewSessionTicket and
   ClientHello messages might be retransmitted and therefore delayed,
   which might be hidden by TCP.  For clients on the Internet, this
   implies windows on the order of ten seconds to account for errors in
   clocks and variations in measurements; other deployment scenarios may
   have different needs.  Clock skew distributions are not symmetric, so
   the optimal tradeoff may involve an asymmetric range of permissible
   mismatch values.

   Note that freshness checking alone is not sufficient to prevent
   replays because it does not detect them during the error window,
   which -- depending on bandwidth and system capacity -- could include
   billions of replays in real-world settings.  In addition, this
   freshness checking is only done at the time the ClientHello is
   received and not when subsequent early Application Data records are
   received.  After early data is accepted, records may continue to be
   streamed to the server over a longer time period.

I do not think our implementation takes estimated round trip time into account, and that is probably what we want. But I do not have an immediate good idea of how to best do that.

@IngelaAndin IngelaAndin removed testing currently being tested, tag is used by OTP internal CI labels Jun 2, 2022
@sindrip
Copy link
Contributor Author

sindrip commented Jun 2, 2022

Wouldn't adding the estimated round-trip time be considered an enhancement? As it stands, this fix is to implement the original intention of the code, because the anti-replay mechanism is unusable in its current state. Currently the deviation from the RFC is that we calculate expected_arrival_time without adjusting for the estimated connection latency estimated_RTT. It is also not clear to me how one is supposed to derive estimated_RTT when receiving a ClientHello, if it is a constant or per connection.

Regarding the point that not all valid tickets will be considered valid and used. I'm assuming this is induced by the bloom filter that is used to validate tickets. Doesn't that imply that each test-case for valid tickets should be ran multiple times and expected to pass x% of times (97% with the default bloom filter values)? Alternatively the tests could be ran with a deterministic data structure for the ticket validation storage, but that is I believe a big rework and requires explicit testing of the bloom filter (I think it's only indirectly tested right now).

8.2. Client Hello Recording
...
Servers MAY
also implement data stores with false positives, such as Bloom
filters, in which case they MUST respond to apparent replay by
rejecting 0-RTT but MUST NOT abort the handshake.

Regarding my original concern about the clock drift time:

8.2. Client Hello Recording
...
Note: If the client's clock is running much faster than the server's,
then a ClientHello may be received that is outside the window in the
future, in which case it might be accepted for 1-RTT, causing a
client retry, and then acceptable later for 0-RTT. This is another
variant of the second form of attack described in Section 8.

Regarding this specific PR, I originally reduced the window_size to keep the test execution time down. I could try tweaking the Lifetime and WindowSize and we could see how that works. Let me know if you want to try this.

@u3s
Copy link
Contributor

u3s commented Jun 7, 2022

I would like clarify discussion in related issue #6014 , before proceeding with this PR.

@u3s
Copy link
Contributor

u3s commented Jun 22, 2022

@peterdmv wrote:

Regarding the solution I would prefer using the naming scheme of the RFC and keeping variables even if they are not used for future reference and maintainability e.g. EstimatedRTT.

I'm not sure how to read that. Examples provided in issue discussion are
different than existing code and also PR content. At the moment, I'm not sure if it is worth to rewrite more code just to match the RFC content - assuming the logic is good and code execution result as expected.

Other comments

  1. changing that might be potentially incompatible(i think it should be targeted to master branch) - because tickets will start being usable(or useful as is suggested) after WindowSize seconds.
    • I understand it that with stateless tickets, user can potentially use them as many times
      as they want (although it is recommended for the clients to use it only once -
      RFC C.4)
    • Today with anti_replay: this is limited to Bloom filter WindowSize seconds
      (which is suggestd to not be correct) - tickets with RealAge bigger than WindowSize will be rejected
    • also Today without anti_replay: this is limited to ticket lifetime
    • With the change proposed: re-use possibility will be limited only with ticket lifetime
      (no matter if with or without anti_replay enabled)
  2. I wonder if readability of code could be improved a little (so conclusion from our
    discussion is reflected):
    • I was considering adding some explanatory comment and/or renaming some functions
      (I'm not sure which is better - what do you think)?
      • function names tls_server_session_ticket:stateless*ticket are kind of ticket oriented
      • they don't cover ClientHello replay protection (anti_replay) flow and might
        be misguiding - I mean it is not only ticket that is checked with anti_replay
  3. docs - maybe anti_replay option manual could be improved so our discussion is
    somehow reflected there as well.

@u3s u3s mentioned this pull request Jun 23, 2022
@u3s
Copy link
Contributor

u3s commented Jun 23, 2022

I've added more tests for anti_replay in #6055.
ticket_reuse_anti_replay will be failing with PR6019, because I wrote it to verify existing behavior which we agreed is not expected.

I hope you will adjust it in this PR, once I merge PR-6055 to master (plan to do it within a day or two).

This will be needed:

@@ -243,7 +243,7 @@ ticketage_smaller_than_windowsize_anti_replay(Config) when is_list(Config) ->
 ticketage_bigger_than_windowsize_anti_replay() ->
     [{doc, "Session resumption with stateless tickets and anti_replay enabled."
       "Fresh ClientHellos."
-      "Ticket age bigger than windowsize. 0-RTT is expected to fail."
+      "Ticket age bigger than windowsize. 0-RTT is expected to succeed."
       "(Erlang client - Erlang server)"}].
 ticketage_bigger_than_windowsize_anti_replay(Config) when is_list(Config) ->
     WindowSize = 3,
@@ -252,10 +252,10 @@ ticketage_bigger_than_windowsize_anti_replay(Config) when is_list(Config) ->
     ssl_test_lib:check_result(Server0, ok, Client0, ok),
     Client1 = anti_replay_helper_connect(Server0, Client0, Port0, ClientNode,
                                          Hostname, ClientOpts,
-                                         {seconds, WindowSize + 2}, false),
+                                         {seconds, WindowSize + 2}, true),
     Client2 = anti_replay_helper_connect(Server0, Client0, Port0, ClientNode,
                                          Hostname, ClientOpts,
-                                         {seconds, 2*WindowSize + 2}, false),
+                                         {seconds, 2*WindowSize + 2}, true),
     process_flag(trap_exit, false),
     [ssl_test_lib:close(A) || A <- [Server0, Client0, Client1, Client2]].

@sindrip sindrip force-pushed the ssl/stateless-anti-replay-window-fix branch from 52d334e to 3d1b225 Compare June 28, 2022 13:25
@sindrip
Copy link
Contributor Author

sindrip commented Jun 28, 2022

Let me know once #6055 is merged and I can make the suggested changes

@u3s
Copy link
Contributor

u3s commented Jun 28, 2022

#6055 is merged. please rebase and fix the ticketage_bigger_than_windowsize_anti_replay.

@sindrip sindrip force-pushed the ssl/stateless-anti-replay-window-fix branch 2 times, most recently from 61494f0 to 6360074 Compare June 30, 2022 12:31
@sindrip sindrip changed the base branch from maint to master June 30, 2022 12:38
@sindrip
Copy link
Contributor Author

sindrip commented Jun 30, 2022

I rebased on master and updated the testcase from #6055, what do you think about the testcases that I wrote originally in this PR, the new testcases you added test the same thing @u3s?

  • I wonder if readability of code could be improved a little (so conclusion from our
    discussion is reflected):

    • I was considering adding some explanatory comment and/or renaming some functions
      (I'm not sure which is better - what do you think)?

      • function names tls_server_session_ticket:stateless*ticket are kind of ticket oriented
      • they don't cover ClientHello replay protection (anti_replay) flow and might
        be misguiding - I mean it is not only ticket that is checked with anti_replay

I think comments surrounding the checks with references to the RFC would be fine by me. Do you want me to cover this?

  • docs - maybe anti_replay option manual could be improved so our discussion is
    somehow reflected there as well.

Do you want me to add it to this PR?

@u3s u3s added the testing currently being tested, tag is used by OTP internal CI label Jun 30, 2022
@u3s
Copy link
Contributor

u3s commented Jun 30, 2022

I rebased on master and updated the testcase from #6055, what do you think about the testcases that I wrote originally in this PR, the new testcases you added test the same thing @u3s?

I'm fine with leaving them as your tests are more of a white box tests. My recent addition is more end2end.

think comments surrounding the checks with references to the RFC would be fine by me. Do you want me to cover this?

yes please.

   docs - maybe anti_replay option manual could be improved so our discussion is
   somehow reflected there as well.

Do you want me to add it to this PR?

yes, if not a problem for you to make a small adjustment. Maybe its a matter of adding ClientHello around "freshness checks" - to indicate what we're recording and checking with anti_replay - I'm looking here.

@sindrip sindrip force-pushed the ssl/stateless-anti-replay-window-fix branch from 6360074 to b3618af Compare July 4, 2022 09:56
@sindrip
Copy link
Contributor Author

sindrip commented Jul 4, 2022

I did some slight documentation updates, but it is kind of adressed here: https://www.erlang.org/doc/apps/ssl/using_ssl.html#anti-replay-protection-in-tls-1.3

lib/ssl/doc/src/using_ssl.xml Outdated Show resolved Hide resolved
If a stateless ticket is older than the WindowSize of the bloom filter
then it will be rejected. By subtracting the ReportedAge from the
RealAge we have the DeltaAge, which will become too large during
a replay. Previously the absolute age of the ticket was being used.
@sindrip sindrip force-pushed the ssl/stateless-anti-replay-window-fix branch from b3618af to 612fe8f Compare July 6, 2022 12:52
@u3s u3s added testing currently being tested, tag is used by OTP internal CI and removed testing currently being tested, tag is used by OTP internal CI labels Jul 6, 2022
@u3s u3s merged commit 09c601f into erlang:master Jul 19, 2022
@u3s
Copy link
Contributor

u3s commented Jul 19, 2022

thanks for contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
in progress team:PS Assigned to OTP team PS testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TLS 1.3 stateless session resumption with anti_replay ignores ticket lifetime
3 participants