ErrorKind: rename Transient → Unexpected and revise uses of Unavailable #255

dhardy · 2018-02-08T12:06:52Z

I've not been happy with the Transient error kind for a while, notably because it was never used, and because Unavailable was used for errors which just might be resolvable on retry. This PR keeps Unavailable but replaces most uses with Unexpected.

Further thoughts:

we could keep Transient for cases where we legitimately know there is a good chance of success on retry — although we still have no actual use cases yet, and I suspect mostly it would be handled in the same way as Unexpected
Other is redundant with Unexpected, right? Remove?
Unavailable now has two uses — in JitterRng (see comment in PR) and for WASM (which has an auto-failing OsRng implementation) — so we really can assume it's irrecoverable when handling.

dhardy · 2018-02-15T17:29:16Z

@pitdicker I take it life's found some other priority for you by now; no complaints there. It would however be nice to get your thoughts on this if you can find the time.

vks · 2018-02-16T10:10:28Z

I don't like the name Unexpected, because if a retry might succeed, isn't it an expected failure?

dhardy · 2018-02-16T10:22:28Z

No, it's for unexpected failures; e.g. read failing because no file handles are available. There's a chance this will succeed later and its an unexpected failure. I don't understand your point?

vks · 2018-02-16T10:46:31Z

I was confused by the name and the description and what unexpected means:

Unexpected failure; there is a slim chance of recovery on retry.

If it is is recoverable by retrying, it sounds to me like it is an expected failure, but maybe that is just me.

dhardy · 2018-02-16T10:52:42Z

Oh, by "expected failure" you mean a failure whose possibility has been forseen? Well, so surely that applies to just about any possible failure (including out-of-memory), excepting things like SEGFAULT which don't normally happen in Rust code anyway.

So okay, we're talking about foreseen failures, but not expected ones (statistically, these are all highly unlikely).

pitdicker · 2018-02-17T19:50:12Z

@pitdicker I take it life's found some other priority for you by now; no complaints there. It would however be nice to get your thoughts on this if you can find the time.

A little, sorry. But balance is the trick 😄.

My first thought was: haven't we already been over all the options? But I like the change here.

dhardy · 2018-02-17T20:05:28Z

Yes, we've been over the options, but somehow practice didn't work out as planned (especially since you mentioned wanting to retry on some Unavailable errors).

pitdicker · 2018-02-17T20:12:01Z

Other is redundant with Unexpected, right? Remove?

Seems like a good idea to me. I can't think of any meaningful difference between them.

And using both Unavailable and Unexpected seems definitely better for OsRng.

So 👍 from me.

dhardy · 2018-02-22T10:25:16Z

Rebased to merge with latest master.

dhardy · 2018-02-22T12:16:22Z

@vks @pitdicker I made another significant change.

Unexpected → Failure, because we don't know anything about the probability
Added Transient back, because we do have a use
Added Custom
Reworked handling, especially in ReseedingRng

pitdicker · 2018-02-22T12:35:55Z

To be honest I don't immediately like almost any of the new changes. Can you explain your reasoning behind it a bit more?

vks · 2018-02-22T12:53:43Z

From the description, the difference between Failure ("retry later"), Transient ("transient") and NotReady ("needs more time") is not clear. All three basically suggest retrying later? Looking at how they are handled in the code, maybe they should be called ShouldRetry and ShouldWait? This seems more precise.

The use case for Custom is not clear to me.

dhardy · 2018-02-22T17:31:57Z

Thanks @pitdicker, review is definitely needed.

@vks didn't like the name Unexpected because often this is used for known unlikely errors; I reasoned that these are unlikely — but in general we don't know that. Still, I don't think Unexpected is too bad a name.

Transient I found useful for ReseedingRng, although the necessity of a separate error kind (and the utility of ReseedingRng) are questionable. It doesn't cost much though.

Custom was added to allow users to include other codes, but I don't actually have a use for it. We could remove it. It shouldn't increase the size of Error on 64-bit due to alignment; I did consider using only u16 to make the same true on 32-bit. Maybe I shouldn't be adding it without actual requirement; I don't know.

The changes to handling were simplifications, mostly. ReseedingRng::try_fill_bytes actually had an order dependence between generating via the internal RNG and reseeding due to the possibility of early return, although that may not matter at all due to branch prediction. We don't have an existing benchmark on that and I didn't bother adding one.

dhardy · 2018-02-22T17:34:28Z

Actually @vks I put some effort into documenting exactly what all these different kinds are for. The names you suggested sound better than Transient and NotReady though, thanks.

Possible change: Transient → ShouldRetry, NotReady → ShouldWait, remove Custom

pitdicker · 2018-02-22T17:45:32Z

I will have a careful look, and sorry for the maybe hard words.

To drop two links in here: For dhardy#9 (comment) I looked at the error kinds of other libraries, and "they mostly describe the cause of the error, not how it should be handled". And also we discussed in that issue names like Failure. The design of the error enum started with rust-lang/rfcs#2152 (comment) (nice to see how it evolved).

pitdicker · 2018-02-22T17:46:53Z

src/error.rs

+    /// Take the cause, if any. This allows the embedded cause to be extracted.
+    /// This uses `Option::take`, leaving `self` with no cause.
+    #[cfg(feature="std")]
+    pub fn take_cause(&mut self) -> Option<Box<stdError + Send + Sync>> {


Just curious, when would you use this?

stdError does not imply Clone and I didn't wish to add that, so this (or unwrap_cause) would be the only way to get the cause out, which I think might sometimes be useful.

I did consider just making cause a public field (with a hidden dummy field to prevent construction) but wasn't so sure about that.

pitdicker · 2018-02-22T17:50:41Z

src/error.rs

@@ -52,24 +73,41 @@ impl ErrorKind {
    /// A description of this error kind
    pub fn description(self) -> &'static str {
        match self {
-            ErrorKind::Unavailable => "permanent failure",
+            ErrorKind::Unavailable => "permanently unavailable",
+            ErrorKind::Failure => "failure",


Is not any error a 'failure'? I like Unexpected, UnexpectedFailure, Other, Unforeseen more.

pitdicker · 2018-02-22T17:50:55Z

src/error.rs

            ErrorKind::Transient => "transient failure",
            ErrorKind::NotReady => "not ready yet",
-            ErrorKind::Other => "uncategorised",
+            ErrorKind::Custom(_) => "custom error code",


Don't we have the error cause for this?

This method can only return a static string and it's not guaranteed that we have a cause.

That is an argument. Reminds me a lot of that experiment I did with keeping the OS errno we decided against, because we don't assume an OS with no_std (https://gist.github.com/pitdicker/2411a4e3aae245c217e4fdd8a0b99917).

Like you suggested and @vks said, I would wait until people start asking for this solution.

pitdicker · 2018-02-22T17:51:29Z

src/error.rs

            ErrorKind::__Nonexhaustive => unreachable!(),
        }
    }
 }

+impl fmt::Display for ErrorKind {


This is mostly necessary because of Custom?

pitdicker · 2018-02-22T17:52:22Z

src/error.rs

 /// Error type of random number generators
 /// 
 /// This is a relatively simple error type, designed for compatibility with and
 /// without the Rust `std` library. It embeds a "kind" code, a message (static
-/// string only), and an optional chained cause (`std` only).
+/// string only), and an optional chained cause (`std` only). The `kind` and
+/// `msg` fields can be accessed directly; cause can be accessed via


👍 Private fields feel neater, but this is just practical and I can't imagine any disadvantages.

I specifically wanted to replace kind then return the error, and realised it was difficult with the old API. Further, I want to emphasise that Error is a simple type which won't change much (at least the first two fields).

pitdicker · 2018-02-22T17:53:06Z

src/error.rs

+#[test]
+fn test_kind() {
+    use core::mem::size_of;
+    assert!(size_of::<ErrorKind>() <= 2 * size_of::<usize>());


Good to test 👍. Why 2 * size_of::<usize>()?

Because on 32-bit that should be the size (or at least 32 + 3 bits), unless we change the payload to something < 32 bits.

Without Custom this can be size_of::<usize>() now?

Whoops, without Custom this is pointless

pitdicker · 2018-02-22T17:57:56Z

src/jitter.rs

@@ -145,6 +145,8 @@ impl ::std::error::Error for TimerError {

 impl From<TimerError> for Error {
    fn from(err: TimerError) -> Error {
+        // Timer check is already quite permissive of failures so we can assume
+        // any errors reported are irrecoverable.


I am not sure I understand what you say with "Timer check is already quite permissive of failures".
Maybe it is safer to say that we use Unavailable because the timer jitter failed basic quality tests and we don't expect the hardware to change much.

The timer tests allow for things like non-monotonic (decreasing) behaviour several times; I checked that all the tests have plenty of scope for handling unusual but non-fatal output before throwing an error. If the test was less permissive I would use Failure/Unexpected here.

Does that explain my reasoning sufficiently? Perhaps changes are still required.

Ah, you mean we don't really expect false positives. Ok. Would you mind rewording it a little?

pitdicker · 2018-02-22T18:02:59Z

src/reseeding.rs

-    /// from this method: `ErrorKind::Transient` and `ErrorKind::NotReady`.
+    /// in case of error we simply delay reseeding, allowing the generator to
+    /// continue its output of random data and try reseeding again later;
+    /// because of this we always return kind `ErrorKind::Transient`.


Okay, I think keeping Transient is a good idea. So basically we only changed ErrorKind::Other to some other name?

pitdicker · 2018-02-22T18:07:15Z

src/reseeding.rs

-            break; // Successfully reseeded, delayed, or given up.
-        }
+        // Behaviour is identical to `try_reseed`; we just squelch the error.
+        let _res = self.try_reseed();


Nice clean-up, and I think I can get behind the choice to unify the two reseeding methods. But I think doing the retries immediately is something useful to keep. At least if the error is Transient, with RDRAND in mind.

You think? I don't expect RDRAND would report an error often, and we're only delaying until the next call, so I thought it was reasonable?

Doesn't it wait until self.threshold >> 8, which could be large? But okay, I think this is a reasonable solution. The time to pick between reseeds is not exact science anyway...

No, the code sets bytes_until_reseed = 0 in case of Transient errors.

pitdicker · 2018-02-22T18:12:44Z

src/reseeding.rs

+        if let Err(e) = res1 {
+            // In the unlikely event the internal PRNG fails, we don't know
+            // whether this is resolvable; reseed immediately and return
+            // original error kind.


I think this is a nicely found use of the reseeding wrapper. Does it impact benchmarks? Probably not because we benchmark with large slices to fill...

I think this is unnecessary because just about every PRNG (everything that implements SeedableRng) never fails. Personally I would leave it up to the user if he wants to handle it.

We don't benchmark this method at all currently, but yes, I might revert the changes to this method.

How do you think about this now?

Frankly I don't care much, especially since your "block rng" idea might significantly alter this. But eventually I guess we should investigate the performance of thread_rng more.

I don't care much about changes to this particular method; I can revert if you like but it's probably a small improvement?

I don't think the performance is a big deal, but I am not sure adding this functionality is a good idea. Are there situations where reseeding the RNG when try_fill fails is not the right action, or maybe even insecure?

I can't think of any, actually I have a trouble imagining when a CSPRNG can fail...
But I can see some use in this: it allows a wrapped RNG to signal it wants to be reseeded.

I have some trouble finishing that Block Rng experiment. It is a bit too big for me. Do you want to help / take over, probably after 0.5 is done?

Well, I'll leave this for now then.

Can you create an issue regarding the BlockRng experiment? Sounds like it may not be in time for the 0.5 release but I'll try to take a look.

vks · 2018-02-22T18:21:27Z

Actually @vks I put some effort into documenting exactly what all these different kinds are for.

Sorry, I did not mean to imply otherwise. It was just not clear to me.

I looked at the error kinds of other libraries, and "they mostly describe the cause of the error, not how it should be handled".

Maybe OsError or FailedSyscall is more descriptive for Failure? Transient and NotReady are probably more descriptive than my suggestions. I think it makes sense to remove Custom without a use case, it can be added later in a backwards compatible way.

dhardy · 2018-02-22T18:38:23Z

Failure was for expected-resolvable errors but now it covers most error causes... I like the new structure myself (other than names and Custom).

I looked at the error kinds of other libraries, and "they mostly describe the cause of the error, not how it should be handled".

This is exactly where I think many error libraries go wrong. I'm not against adding a few more categories, but ultimately the only purpose of ErrorKind is to hint at what a handler should do with the error, so it should make this part easy, not confusing. When a user wants to locate the cause, it is the msg and cause fields which help with that, not kind.

That's not to say this approach is right; e.g. io::ErrorKind strikes a compromise in having lots of categories listed but not making them too specific. This is okay, but handling usually comes down to aborting with a message no matter what the kind is. Since RNGs can sometimes overcome errors on retry I want to emphasise that.

pitdicker · 2018-02-22T18:49:35Z

I'm not against adding a few more categories, but ultimately the only purpose of ErrorKind is to hint at what a handler should do with the error

Yes, you pushed for that in the original issue, and I agree. Doing something meaningful with errors is maybe a pattern that is not common enough in the Rust world...

We already picked the names in that direction, with a name that indicates both the 'reason' or 'category' of the error, and a hint of the right way to handle it. NotReady indicates it should retry over a while (or manually initialize etc.), Transient indicates an immediate retry could help, Unavailable indicates you should just give up, and Other or Unexpected says we have no idea what happened and what you should do about it, although a retry might work.

I just do not want to drop the 'meaning' part and only have an 'instruction' left.

pitdicker · 2018-02-22T19:26:23Z

From your first comment:

Unavailable now has two uses — in JitterRng (see comment in PR) and for WASM (which has an auto-failing OsRng implementation) — so we really can assume it's irrecoverable when handling.

I thought this PR initially let OsRng keep returning Unavailable of NotReady if new() failed, and Unexpected for failures in try_fill_bytes. Did this change? I still think Unavailable is the best kind to return if new fails.

dhardy · 2018-02-23T08:22:48Z

OsRng::new() can fail if, for example, there are no file descriptors available. You pointed out yourself that you thought this should retry at least occasionally, hence it seems to me that Unavailable is not appropriate? But also Transient is not appropriate because we don't know that the feature will work later, so Unexpected seems more appropriate?

pitdicker · 2018-02-23T10:19:30Z

Yes, I wanted that, but that is mostly with long-running processes in mind. I think we can honestly say OsRng is unavailable if OsRng::new() fails. Especially when the modern system call interfaces fail, because that should only happen if the OS is somehow deeply broken/misconfigured. Still, if the problem is 'magically' fixed (probably by a human intervening), it is nice to eventually start using OsRng again. That was my reasoning in ReseedingRng and later EntropyRng. And the same is true with a problem like no available file descriptors because of an attack. That will probably also not fix itself, or at least not quickly.

While writing this comment I looked again at the possible errors that can occur when opening /dev/urandom:

EMFILE or ENFILE: No file descriptors or other 'resource exhaustion attack'.
EACCESS, ENOENT, ENODEV, ENXIO: No permission, random device deleted, some deep OS misconfiguration
EINTR: Interrupted while opening /dev/urandom
EAGAIN, EWOULDBLOCK: Not ready

I think we could map EINTR to Transient, and we already map the last ones to NotReady. For the others, they will probably not fix themselves without human intervention, so I would say Unavailable.

pitdicker · 2018-02-23T10:27:58Z

What did you think about Unforeseen instead of Unexpected?

dhardy · 2018-02-23T10:33:37Z

Unforseen would imply that no one foresaw this failure mode — but that would obviously not be the case if there was code to return an Unforseen error there. This is exactly what @vks didn't like about Unexpected, only more literal.

"Expectations" are usually about what is likely to happen, but the word also gets used in other ways, e.g. "expect the unexpected" (which should probably be "expect not to foresee everything").

So I guess we stick with Unexpected

Edit: so not everyone agrees about the meaning of expected: OED, Merriam-Webster, Wiktionary. Perhaps its just my maths/stats background making me assume expected = likely.

pitdicker · 2018-02-23T10:56:41Z

src/read.rs

@@ -62,7 +62,7 @@ impl<R: Read> RngCore for ReadRng<R> {
        if dest.len() == 0 { return Ok(()); }
        // Use `std::io::read_exact`, which retries on `ErrorKind::Interrupted`.
        self.reader.read_exact(dest).map_err(|err| {
-            Error::with_cause(ErrorKind::Unavailable, "ReadRng: read error", err)


Does it make sense to split this in two errors? To differentiate between unrecoverable errors and others? If I look at std::io::ErrorKind I would only map :io::ErrorKind::Other to ErrorKind::Unexpected. And all the others seem like ErrorKind::Unavailable to me.

Good point, but I would think only NotFound or PermissionDenied would indicate Unavailable. If Read gets used over the network, any of the refused/reset/aborted might indicate that human input is needed to resolve, but it's harder to say; all the rest should be Unexpected as I see it.

Ah, your right. If we can't imagine the error to occur, best to map it to Unexpected. I think besides the 5/6 you mentioned also BrokenPipe could also mean Unavailable.

I don't see how a broken pipe is possible when reading directly from a file so Unexpected is fine.

I updated the PR BTW.

You can read from stdin, just like PractRand does.

True. Okay, BrokenPipe maps → Unavailable

pitdicker · 2018-02-23T10:57:12Z

It is strange to be discussing / advocating things I do not care strongly about 😄.

For things like reading random bytes in OsRng there are not really any documented errors once OsRng::new() succeeded. I do not foresee any errors here. But we did add an error path, so in some way we did foresee it? No problem, let's stick with Unexpected.

pitdicker · 2018-02-23T13:37:09Z

src/os.rs

+            let file = File::open(path).map_err(|err| {
+                use std::io::ErrorKind::*;
+                match err.kind() {
+                    NotFound | PermissionDenied | ConnectionRefused =>


This does not really match the same posix errors, see https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/mod.rs#L96. I don't want to be picky, but think it is better to match those (can be extracted).

I don't get you? The names are all basically the same except for ENOENT and EAGAIN. You think we should match the constants defined in libc instead? Or you would choose a different selection of constants to map to Unavailable?

I don't think ConnectionRefused is an error we can encounter for OsRng, and am missing EMFILE, ENFILE, ENODEV and ENXIO. Interesting, if you search for "ENXIO random" python uses a similar list when opening /dev/random. Maybe using the libc constants is best here?

New Unexpected error kind Error's kind and msg fields are now public Some error handling simplifications and revisions

dhardy · 2018-02-26T11:59:09Z

Removed test and rebased. I think this is ready now?

pitdicker · 2018-02-26T17:10:50Z

Looks good to me. Only the check in OsRng that may be worth changing, but I'll leave the choice up to you.

dhardy · 2018-02-26T17:19:17Z

Ah sorry, GitHub hid those comments.

…re hard errors

dhardy · 2018-02-27T10:01:03Z

@pitdicker going by your previous comment it doesn't look like matching all individual codes is the best option (also a problem because ReadRng is also used by Redox); better just to map most codes to Unavailable?

This means that Unexpected is now unused, equivalent to the previous Other... feels like we're going round in circles (although some of the error handling is definitely better).

pitdicker · 2018-02-27T10:25:58Z

This means that Unexpected is now unused, equivalent to the previous Other... feels like we're going round in circles (although some of the error handling is definitely better).

I feel the same. But still you made good improvements here! Ready to merge?

ErrorKind: rename Transient → Unexpected and revise uses of Unavailable

dhardy force-pushed the failure branch from 934ef6e to 6ae3d2d Compare February 22, 2018 10:24

pitdicker reviewed Feb 22, 2018

View reviewed changes

pitdicker reviewed Feb 23, 2018

View reviewed changes

Error handling revisions

8e3134e

New Unexpected error kind Error's kind and msg fields are now public Some error handling simplifications and revisions

dhardy force-pushed the failure branch from 455728c to 8e3134e Compare February 26, 2018 11:58

Error handling: all IO errors other than Interrupted and WouldBlock a…

2b04427

…re hard errors

dhardy merged commit 7069151 into rust-random:master Feb 27, 2018

dhardy deleted the failure branch February 27, 2018 14:02

pitdicker mentioned this pull request Mar 3, 2018

Add BlockRng abstraction #281

Merged

dhardy mentioned this pull request Mar 11, 2018

Error handling: error sources / kinds dhardy/rand#9

Closed

pitdicker pushed a commit that referenced this pull request Apr 4, 2018

Merge pull request #255 from dhardy/failure

5def561

ErrorKind: rename Transient → Unexpected and revise uses of Unavailable

ErrorKind: rename Transient → Unexpected and revise uses of Unavailable #255

ErrorKind: rename Transient → Unexpected and revise uses of Unavailable #255

Conversation

dhardy commented Feb 8, 2018

dhardy commented Feb 15, 2018

vks commented Feb 16, 2018

dhardy commented Feb 16, 2018

vks commented Feb 16, 2018

dhardy commented Feb 16, 2018

pitdicker commented Feb 17, 2018

dhardy commented Feb 17, 2018

pitdicker commented Feb 17, 2018

dhardy commented Feb 22, 2018

dhardy commented Feb 22, 2018

pitdicker commented Feb 22, 2018

vks commented Feb 22, 2018

dhardy commented Feb 22, 2018

dhardy commented Feb 22, 2018 • edited

pitdicker commented Feb 22, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pitdicker Feb 22, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhardy Feb 22, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vks commented Feb 22, 2018 • edited

dhardy commented Feb 22, 2018 • edited

pitdicker commented Feb 22, 2018

pitdicker commented Feb 22, 2018

dhardy commented Feb 23, 2018

pitdicker commented Feb 23, 2018

pitdicker commented Feb 23, 2018

dhardy commented Feb 23, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pitdicker commented Feb 23, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dhardy commented Feb 26, 2018

pitdicker commented Feb 26, 2018

dhardy commented Feb 26, 2018

dhardy commented Feb 27, 2018

pitdicker commented Feb 27, 2018

dhardy commented Feb 22, 2018 •

edited

pitdicker Feb 22, 2018 •

edited

dhardy Feb 22, 2018 •

edited

vks commented Feb 22, 2018 •

edited

dhardy commented Feb 22, 2018 •

edited

dhardy commented Feb 23, 2018 •

edited