
Asynchronous fatal alerts are sometimes never sent #2638

Open
Scottmitch opened this issue Feb 15, 2017 · 20 comments
Labels
triaged: bug The issue/pr is/fixes a bug
Comments

@Scottmitch

If, when attempting to send an alert, the write buffer is busy or BIO_write does not complete synchronously, there is no good way to tell the system to try again.

This can happen if for example the BIO doesn't have enough free space (~7 bytes) to write an alert.

OpenSSL Version

1.0.2j

Related

https://bugs.chromium.org/p/boringssl/issues/detail?id=130
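The failure mode can be sketched with Python's stdlib `ssl` module, which wraps OpenSSL and exposes memory BIOs. Everything below (the client-only setup, the garbage "server reply") is a hypothetical reproduction of the asynchronous pattern, not the reporter's code:

```python
import ssl

# Client-side TLS engine driven entirely through memory BIOs,
# mimicking a non-blocking application that owns the transport.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

incoming = ssl.MemoryBIO()   # network -> OpenSSL (the rbio)
outgoing = ssl.MemoryBIO()   # OpenSSL -> network (the wbio)
tls = ctx.wrap_bio(incoming, outgoing)

# Step 1: OpenSSL writes the ClientHello into the write BIO and
# signals WANT_READ so we retry once the peer has answered.
hello = b""
try:
    tls.do_handshake()
except ssl.SSLWantReadError:
    hello = outgoing.read()          # starts with 0x16, a handshake record

# Step 2: the "peer" answers with garbage, so the handshake fails
# fatally.  Whether a fatal alert now sits in `outgoing` is the crux
# of this issue: a MemoryBIO always has room, but a real write BIO
# may not, and then there is no way to retry sending the alert.
incoming.write(b"this is not a TLS record")
err = None
try:
    tls.do_handshake()
except ssl.SSLError as e:
    err = e                          # fatal error, not WANT_READ/WANT_WRITE
```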

@Scottmitch
Author

fyi @davidben @richsalz

@richsalz
Contributor

Thanks. I agree with @davidben 's assessment :)

@bernd-edlinger
Member

I think nothing can be done while SSL_write is in the middle of sending data,
so the last blocked SSL_write needs to be allowed to complete.
And wouldn't SSL_shutdown actually return WANT_WRITE until
the pending alert is sent out?

@Scottmitch
Author

This is the behavior I observed:

  • SSL_read returns -1 with an error of SSL_ERROR_SSL (error:140800FF:SSL routines:ssl3_accept:unknown state)
  • SSL_write returns -1 with error of SSL_ERROR_SYSCALL
  • SSL_pending returns 0
  • SSL_shutdown returns 0 and doesn't touch the BIO

Is there more information I can provide to help?
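For what it's worth, the probes above translate roughly as follows under a memory-BIO setup in Python's stdlib `ssl` module (a sketch; the garbage "server reply" is an assumption used to force a fatal error):

```python
import ssl

# A memory-BIO client driven into a fatal handshake error.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
tls = ctx.wrap_bio(incoming, outgoing)

try:
    tls.do_handshake()               # flushes the ClientHello
except ssl.SSLWantReadError:
    outgoing.read()                  # drain it, as a real app would

incoming.write(b"garbage instead of a ServerHello")
err = None
try:
    tls.do_handshake()               # counterpart of the failing SSL_read
except ssl.SSLError as e:
    err = e                          # fatal, not a retryable WANT_* error

# Counterparts of the probes listed above: no buffered plaintext,
# and no WANT_WRITE was ever surfaced for a possibly-blocked alert.
assert tls.pending() == 0            # SSL_pending() returns 0
assert not isinstance(err, ssl.SSLWantWriteError)
```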

@davidben
Contributor

For failures during the handshake, perhaps the right behavior is just to have SSL_do_handshake return SSL_ERROR_WANT_WRITE and defer reporting failure until the alert is successfully written. Though this is a behavior change for consumers; perhaps someone was relying on failing a callback to synchronously fail SSL_do_handshake.

(Failures during SSL_read because a record was bad seem perfectly reasonable to just drop on the floor for simplicity. For those the peer is definitely speaking the protocol wrong rather than having an incompatible configuration, so punting on the alert isn't all that interesting.)

@davidben
Contributor

davidben commented Feb 16, 2017

Another option is to perhaps add an API to flush all the pending outgoing data and have that work for both cases. Note this gets pretty tricky if an alert was queued while an SSL_write was pending. The wpend_buf mess means flushing "someone else's" queued data needs tweaking for the return value to still flow to the correct place.

Things get even messier with warning alerts because they're not closure alerts. Actually, we recently noticed that warning alerts sent during the handshake basically don't work asynchronously. It was only working by accident due to the handshake BIO buffer, and can fail if the buffer boundaries line up at exactly the wrong place. We've since fixed this by taking it out completely and having each protocol (TLS vs. DTLS) be responsible for writing entire handshake flights at a time. This made things dramatically simpler and avoids a ton of state transitions.

Edit: Hrm. Come to think of it, I bet that handshake BIO buffer further makes in-handshake alerts complicated here in OpenSSL...

@bernd-edlinger
Member

@Scottmitch yes, I see, but you should consider the first SSL_ERROR_SSL/SYSCALL
fatal: any further use of the SSL object after the SSL_read in your scenario may crash at any time.
Avoid even SSL_shutdown. Only SSL_free is safe, and even that is only true after my recent
patch set.

@Scottmitch
Author

@bernd-edlinger - Gotcha. I was just using these methods to see if they would kick OpenSSL into writing the pending alert to my BIO, for testing purposes.

The goal is to send any alerts to the peer before freeing the SSL and shutting down the socket. This improves situational awareness and reduces support tickets along the lines of "why did the handshake fail" and/or "why did you close the connection on me".
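One application-level mitigation along these lines (a sketch, not an OpenSSL API) is to drain whatever OpenSSL managed to queue into the write BIO and send it before tearing the connection down. The `failing_handshake` helper below is hypothetical, built on Python's stdlib `ssl` memory BIOs:

```python
import ssl

def failing_handshake():
    """Hypothetical helper: drive a memory-BIO handshake into a fatal error."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    inc, out = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(inc, out)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass                         # ClientHello is now sitting in `out`
    inc.write(b"garbage instead of a ServerHello")
    try:
        tls.do_handshake()
    except ssl.SSLError:
        pass                         # fatal; any alert that fit is queued in `out`
    return out

out = failing_handshake()
# Before closing the socket, flush everything OpenSSL queued: at
# minimum the ClientHello, plus a fatal alert when one was written.
last_words = out.read()
# sock.sendall(last_words)          # hypothetical socket, then close it
```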

@bernd-edlinger
Member

Okay, so before the UNKNOWN_STATE from SSL_read,
was there an SSL_read that returned WANT_WRITE?

@davidben
Contributor

SSL_read does not surface WANT_WRITE when it tries to send a fatal alert and gets blocked.

@bernd-edlinger
Member

Hmm...
Would it be an improvement to have a callback, invoked in ssl3_dispatch_alert immediately
before a fatal alert is about to be sent, to allow the application to make some room in the
wbio so that the alert always fits without blocking?

@davidben
Contributor

I think that's way too complicated for this.

@mattcaswell
Member

No action required. Closing.

@Scottmitch
Author

@mattcaswell - What is the expected behavior and application action required if OpenSSL wants to write an error but is not able to as described above?

@mattcaswell
Member

I'd just close the connection without an alert.

@Scottmitch
Author

IIRC this is the current behavior. The goal is to communicate alerts whenever possible to decrease support load in these situations (as described in #2638 (comment)).

Can you clarify the rationale for closing as "no action required"? I'm assuming the hesitation on this issue involves the following:

  • this is challenging to tackle with the current API
  • it is assumed unlikely to occur, so not worth the effort

Are there other reasons (e.g. will be fixed in release X, not an issue, etc.)?

@bernd-edlinger
Member

When you have reasonably sized buffers, chances are you will see the alert eventually,
provided the error reproduces often enough.

@mattcaswell
Member

this is challenging to tackle with the current API
assumed unlikely to occur so not worth the effort

Exactly. This would be significant effort to resolve for relatively little gain. I think it is unlikely that we will attempt to resolve this. That's not to say we wouldn't look at PRs if someone else wants to have a go at it though.

@Scottmitch
Author

Scottmitch commented Jan 23, 2018

chances are you get to see the alert eventually

The issue is not that "the alert will never be seen", but that under certain circumstances the alert may be dropped, and these edge cases can be complex and time-consuming to root-cause. These scenarios often involve multiple teams (application devs, framework devs, etc.) to pin down, and because they can be rare they are that much more difficult to reproduce. Having a reliable way to propagate errors is critical to empowering individuals with varied exposure to OpenSSL (and TLS in general) to debug issues.

Exactly ... That's not to say we wouldn't look at PRs if someone else wants to have a go at it though.

Fair enough. However, if there are API changes on the table in the future, I would encourage the OpenSSL community to consider "reliable propagation of errors / control messages".

@mattcaswell mattcaswell added this to the 1.2.0 milestone Jan 23, 2018
@mattcaswell
Member

Fair enough. However if there are API changes on the table in the future I would encourage the OpenSSL community to consider "reliable propagation of errors / control messages".

I reopened this and put it against the 1.2.0 milestone, which is the next time we will make any changes to existing APIs. At the least we should review this issue then and see if we want to do anything about it.

@mattcaswell mattcaswell reopened this Jan 23, 2018
@richsalz richsalz modified the milestones: 1.2.0, Post 1.1.0 May 6, 2018
@richsalz richsalz modified the milestones: Post 1.1.0, Assessed May 6, 2018
@t8m t8m modified the milestones: Assessed, Post 3.0.0 May 4, 2021
@t8m t8m added the triaged: bug The issue/pr is/fixes a bug label May 4, 2021