Eliminate unnecessary uses of functions that may panic #447
Comments
There are a lot of uses of |
This depends on issue #453, which is about replacing the |
PR #454 demonstrates a generalization of the |
I'm going to step away from this issue to give room for somebody else to contribute, as this is one of the easiest ways to get started learning the Rustls codebase and improving it. If nobody takes it over in the next week or so then I'll work the rest of this into my schedule over February. |
PR #462 is a refactoring that eliminates some use of the |
Issue #461 is about refactoring some of the state held during the handshake. Such a refactoring will be needed to complete this project, as the reason we need so many
|
I'm midway in a follow-up that splits up |
Here's my assessment of this project so far: We have removed a lot of unwraps and other potential panic points. However, having removed so many, we can now see that there are a lot more
I feel like our efforts on improving the code for the fourth class of panics, the least urgent "impossible" ones, are taking too much time from the first two classes, which are urgent. The main benefit of fixing the fourth class is that it lets us grep for "unwrap" and "[" with less noise, so it is easier to find the more important classes of issues. I think, though, we're now probably better off just sorting through the noise and explicitly prioritizing the remaining ones we can find. |
I think your categories are very useful, thanks for writing that up. I've already started triaging some of the remaining panics from your sheet and will continue with that work. I do think the changes that have been made don't only focus on fixing panics, but also arguably make the code easier to navigate and reason about. As such, while the benefits may not be immediate, I do think the work done is valuable. For panics in category 3+, we should eventually annotate the invariants/causes in the source code rather than in a separate GSheet, so that we can keep track of the invariants more easily when assessing future changes. |
In the spreadsheet, we still have ~20 untriaged |
Thanks!
I agree. I am not judging the importance of the work. The refactoring work is very valuable. The Rustls codebase is in much better shape already, and I'm excited about the improvements that continue. I'm just trying to get us to reschedule the more critical work ahead of the continued cleanup efforts.
I agree that once the most urgent work is done, any array indexing and/or I also created a priority #0 in the spreadsheet to mark changes that seem to require public API changes. |
Specifically, I'm hoping we can do something like this:
|
Based on what's left untriaged in the spreadsheet, these are the unwraps that we're still unsure about:
I used |
@repi gave me a list of clippy lints that are useful for this project:
Especially It seems like if we |
@briansmith Hi. There is also https://github.com/philipc/findpanics. It is Linux-only (ELF, to be more precise) and works only on binaries, but it can also be very useful. I'm also interested in some kind of static verification, but there are no good, user-friendly tools yet. And Rust's std has a lot of panics. I mean, even |
I have a plan to deal with the |
Awesome! That would leave just:
|
(The last one is probably also in scope of #667.) |
OK, great, then we'd be down to just:
|
(The unwrap in |
Seeing #283, this issue, https://www.abetterinternet.org/post/preparing-rustls-for-wider-adoption/ (which led me here), and https://daniel.haxx.se/blog/2020/10/09/rust-in-curl-with-hyper/ discussing OOM failures in particular, I am wondering if rust-lang/rust#84266, to make it easier to audit allocation failure paths, would help with this? |
Another class of panics, or worse, undefined behavior, is arithmetic underflows/overflows. Normally only debug builds panic on overflow/underflow, but release builds can be configured to also panic in those situations. In ring I try to use checked arithmetic ( |
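As a minimal sketch of the checked-arithmetic style described in the comment above (this is not code from ring or Rustls; the function names `record_end` and `remaining` are hypothetical and chosen only for illustration), the standard library's `checked_add` and `checked_sub` return `None` instead of panicking or wrapping:

```rust
/// Hypothetical helper: computes the end offset of a record that starts at
/// `start` and is `len` bytes long, inside a buffer of `buf_len` bytes.
/// Returns `None` if the addition overflows or the record does not fit.
pub fn record_end(start: usize, len: usize, buf_len: usize) -> Option<usize> {
    // `checked_add` yields `None` on overflow in both debug and release builds.
    let end = start.checked_add(len)?;
    if end <= buf_len {
        Some(end)
    } else {
        None
    }
}

/// Hypothetical helper: bytes left after `consumed` of `total` have been read.
/// Returns `None` instead of underflowing when `consumed > total`.
pub fn remaining(total: usize, consumed: usize) -> Option<usize> {
    total.checked_sub(consumed)
}
```

A caller can then turn `None` into an ordinary error value, so the outcome does not depend on whether overflow checks are enabled for the build.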
@ctz I see that this is the last remaining issue in the 0.20 release milestone. However, this issue is very broad. Could we get more specific about which parts of this work are remaining for the 0.20 milestone? |
Here's what I suggest for this project:
I am investigating a potential reachable panic in ring that would warrant a new version of Rustls that enforces a higher minimum version of ring. |
Would the higher minimum version of ring still be 0.16.x, or do you mean requiring ring 0.17? |
Is there a publicly accessible list of remaining issues that need tackling for this? |
There is not. What is your interest? |
I was initially just curious about the plan for a 0.20 release and I came across this issue. Depending on what needs tackling, I may have some time to work on some smaller items. I realise though that publishing this list of issues is essentially publishing a list of unfixed exploits, so I understand why it's not public. |
At this point I don't think this issue is blocking the 0.20 release. If you'd like to work on something, it might be useful to eliminate the different forms of panicking (for example, (This is similar to how we have a Would be great if you can help out with this; let me know if you have further questions! |
Can you elaborate on where you mean by |
In the
|
It seems like the next step in this is enabling the clippy and/or other static verification in CI that we're avoiding panicking APIs, with the appropriate |
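To illustrate what such a lint gate could look like, here is a hedged sketch. The four Clippy lints named below are an illustrative selection from Clippy's restriction group and are an assumption, not the list discussed in this thread; the module `codec` and the function `read_u16` are hypothetical. In a real crate the attribute would more likely be a crate-level `#![deny(...)]`; it is placed on a module here only to keep the snippet self-contained.

```rust
// Assumption: this lint selection is illustrative, not the thread's list.
// Under `cargo clippy`, any `unwrap()`, `expect()`, `panic!()` or `buf[i]`
// inside this module would be reported as an error.
#[deny(
    clippy::unwrap_used,
    clippy::expect_used,
    clippy::panic,
    clippy::indexing_slicing
)]
pub mod codec {
    /// Hypothetical parser: reads a big-endian `u16` from the front of `buf`.
    /// A slice pattern replaces `buf[0]` and `buf[1]`, so input that is too
    /// short yields `None` instead of an out-of-bounds panic.
    pub fn read_u16(buf: &[u8]) -> Option<u16> {
        match buf {
            [hi, lo, ..] => Some(u16::from_be_bytes([*hi, *lo])),
            _ => None,
        }
    }
}
```

The same lints can also be enabled from the command line, for example `cargo clippy -- -D clippy::unwrap_used`, which may be easier to wire into CI than source attributes.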
Last I checked there were still some obvious offenders, notably the client-side early data implementation. |
In the past, we've seen many issues filed, and many PRs merged, attempting to minimize the potential for panics. However, there has been no concerted effort to do so before now. We should make a concerted effort to make Rustls panic-free, tracked by this issue.

We should prefer to refactor the code so that, when practical (and sometimes maybe even when not so practical), functions that may panic are not used. This includes `panic!()`, `Option::unwrap()`, `Result::unwrap()`, etc.

In the past, we've seen that we were often able to rewrite code so that no failure was possible in the new version. I.e. the old code used some function that returned a `Result`, indicating it might fail, but the new code uses only functions that always succeed. This is what we should prefer to do when we can.

In some cases, we are certain that a panic cannot occur, but we must use a panicking function like `Result::unwrap()` because the underlying APIs and/or the Rust type system provide us no alternative. In these cases we should try to encapsulate the thinking we're using to decide that the panic is truly impossible into a reusable and clearly-documented function and/or type, and use that type to eliminate "bare" uses of `unwrap()`. We should prefer doing this over bubbling truly impossible errors up to the caller.

When we must use a function that truly is fallible, we must avoid using `unwrap()` and similar on the `Result` (or `Option` or whatever) it returns. If practical, we should handle the failure in some way that allows us to keep making forward progress. Otherwise, we must bubble the error up to the caller so that the application can decide what to do with it.

We should have some static analysis in CI that gates the introduction of potentially-panicking code, especially hard-to-see cases like array indexing using the `[]` operator.

In some cases, a combination of the above probably needs to be done. For example, we should refactor the code so that Mutexes are held only for the bare minimum number of operations, and we must ensure that none of those operations panic. We should create a reusable abstraction for taking a mutex that we're certain will never be poisoned, and replace all uses of `unwrap()` in mutex acquisition with that abstraction. Such an abstraction should have a clear name that allows us to easily find uses of it, so that people can regularly find and audit them, asynchronously and independently from the development of the project.

In some cases, we should improve the underlying libraries. FWIW, I am open to and eager to improve ring and webpki to help Rustls achieve certainty that it is statically panic-free using the Rust type system.
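One possible shape for the mutex abstraction described above is sketched here. This is an assumption about how it could look, not Rustls code: the name `lock_ignoring_poison` is hypothetical, and instead of asserting that poisoning is impossible, this variant recovers the guard from a poisoned mutex via the standard `PoisonError::into_inner`, so acquisition itself never panics. The `demo_poisoned_lock` function exists only to show the behavior and deliberately panics in a worker thread to poison the lock.

```rust
use std::sync::{Arc, Mutex, MutexGuard};
use std::thread;

/// Hypothetical helper: acquires `mutex` without `unwrap()`.
/// If another thread panicked while holding the lock, the poisoned guard is
/// recovered instead of propagating a panic. A distinctive name makes every
/// call site easy to find and audit.
pub fn lock_ignoring_poison<T>(mutex: &Mutex<T>) -> MutexGuard<'_, T> {
    mutex
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Demonstration only: poisons a mutex on purpose, then shows that the
/// helper still returns the protected value.
pub fn demo_poisoned_lock() -> u32 {
    let shared = Arc::new(Mutex::new(7u32));
    let worker = Arc::clone(&shared);

    // The worker panics while holding the guard, which poisons the mutex.
    // `join` reports that panic as an `Err`, which is ignored here.
    let _ = thread::spawn(move || {
        let _guard = worker.lock();
        panic!("deliberate panic while holding the lock");
    })
    .join();

    // A bare `shared.lock().unwrap()` would panic at this point.
    let value = *lock_ignoring_poison(&shared);
    value
}
```

Whether recovering from poison is acceptable depends on the invariants of the protected data; if the data could be left half-updated, returning an error to the caller would be the safer design.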