Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inspect use of error #3836

Closed
coot opened this issue Jun 27, 2022 · 1 comment · Fixed by #4660
Closed

Inspect use of error #3836

coot opened this issue Jun 27, 2022 · 1 comment · Fixed by #4660
Labels
galois-review Issues / PRs related to Galois review technical debt

Comments

@coot
Copy link
Contributor

coot commented Jun 27, 2022

Unrecoverable errors, such as raised by error, might fall into various categories (at least)

  1. use in ’unreachable code’ / impossible to reach.
  2. invariants between various functions & modules have been violated.
  3. unexpected input from users, the environment, etc.
    Upon first glance, it appears all the instances of error fall into category 1. Suggestions:
  • When feasible, rewrite to eliminate the call to error. Of course, this is sometimes much more trouble than it’s worth.
  • When rewriting away is infeasible, use a panic (a la GHC source code, for practical purposes the same as error) to indicate this is ”category 1”. And even better, insert a comment as to ”why we believe it’s impossible vs why ghc doesn’t think it’s impoossible” (so the next coder who looks
    at the code doesn’t need to figure this all out for himself).

The result of the above would be the following:

  • No calls to error in program.
  • Unreachable/impossible code is clearly marked so.
  • Category 2 errors are caught with assert
  • Category 3 errors are all embedded into SomeException (as appears to be
    the case).
@coot coot added the galois-review Issues / PRs related to Galois review label Aug 2, 2022
@samcowger samcowger mentioned this issue Aug 26, 2023
8 tasks
@samcowger
Copy link
Contributor

While addressing this issue in #4660, I put together a document enumerating and contextualizing the uses of error. In lieu of putting that document in the repo itself, I'm duplicating it here for posterity. If desired, it could be migrated it to a wiki page, but I have no issue leaving it here.

Issue #3836: Inspect use of error

We began this work by branching at 77b54c71ca67a759f82872791749fea549ee3bc7, committed 18 August 2023.

Enumerating Files of Interest

Per Mark's advice, I have classified files into two tiers.

The first tier consists of files in the following packages, excluding those used in testing within those packages:

  • ouroboros-network
  • ouroboros-network-api
  • ouroboros-network-protocols
  • ouroboros-network-framework

The second tier consists of files in the following packages, excluding those used in testing within those packages:

  • cardano-client
  • cardano-ping
  • monoidal-synchronisation
  • network-mux
  • ntp-client
  • ouroboros-network-mock
  • ouroboros-network-testing

For the moment, we are not focusing on uses of error in testing files within these packages. We consider error to be more plausibly useful in testing than in production code, and its removal to be less necessary. If need be, we can create a third tier of files consisting of those used in testing across all packages.

Tier 1

Tier 1 files contain 17 uses of error.

1: ouroboros-network/demo/chain-sync.hs:577

nextState [] = error "chainSyncServer: impossible"

The list on which nextState is called originates from chainGenerator in the same file. nextState will never see an empty list because chainGenerator creates an infinite list.

To avoid mentioning error here, I've lifted the chainGenerator result into an Infinite construction, courtesy of infinite-list.

2: ouroboros-network/src/Ouroboros/Network/PeerSelection/Governor/Types.hs:674

error $ "A pick policy requested an attribute for peer address "
     ++ " which is outside of the set given to pick from"

This use of error appears in pickPeers, which is a "PickPolicy wrapper function". pickPeers is provided:

  • A PeerSelectionState containing i.a. localRootPeers, publicRootPeers, and knownPeers
  • A PickPolicy called pick
  • A Set peeraddr called available

A PickPolicy is "an action that picks a subset of elements from a [collection] of peers". It makes this choice by examining each peer in this collection and examining them via several functions (generally of the shape peeraddr -> _) it is provided.

pickPeers provides pick, its PickPolicy, with functions of its own creation. Those functions check their input peer for membership in localRootPeers, publicRootPeers, and knownPeers, and trigger the error if the peer is not found in knownPeers, but pick will only ever apply them to peers in available. Therefore, the error may be triggered if a peer in available is not in knownPeers (i.e. if available is partially disjoint from knownPeers).

In practice, available appears to be constructed from the various peer collections in a PeerSelectionState, whose invariant (as described in assertPeerSelectionState in this same module) ensures that knownPeers is a superset of all other peer sets, so this error is quite unlikely to be triggered as long as any extant PeerSelectionState adheres to its invariant.

I've added documentation at this error site of the rationale I've derived, and have included a reference to the invariant-documenting function at the declaration site of PeerSelectionState.

I've also added a precondition that checks whether available is a subset of knownPeers.

3: ouroboros-network/src/Ouroboros/Network/PeerSelection/Governor.hs:544

-- impossible since guardedDecisions always has something to wait for
error "peerSelectionGovernorLoop: impossible: nothing to do"

The comment is correct, but I will elaborate on why.

guardedDecisions's result is of type Guarded:

data Guarded m a =
    GuardedSkip !(Maybe (Min Time))
  | Guarded' !(Maybe (Min Time)) (FirstToFinish m a)

Guarded has a Semigroup instance:

instance Alternative m => Semigroup (Guarded m a) where
  Guarded'    ta a <> Guarded'    tb b = Guarded'    (ta <> tb) (a <> b)
  Guarded'    ta a <> GuardedSkip tb   = Guarded'    (ta <> tb)  a
  GuardedSkip ta   <> Guarded'    tb b = Guarded'    (ta <> tb)  b
  GuardedSkip ta   <> GuardedSkip tb   = GuardedSkip (ta <> tb)

Note that the presence of any Guarded'-constructed value in a <> operation dominates - that is, will guarantee a Guarded'-constructed value.

Here, "something to wait for" means "a Guarded value constructed via Guarded' rather than GuardedSkip. guardedDecisions constructs a Guarded value by concatenating (via <>) several Guarded values. At least one of these values is always constructed via Guarded': Ouroboros.Network.PeerSelection.Governor.Monitor.connections. (In fact, several others are as well.) The presence of this value guarantees that guardedDecisions produces a Guarded' value - i.e., "something to wait for".

4: ouroboros-network/src/Ouroboros/Network/PeerSelection/RootPeersDNS.hs:217

Nothing  -> error "localRootPeersProvider: impossible happened"

The error is triggered only if waitAnyCatchSTM were to disobey its contract.

waitAnyCatchSTM:

A version of waitAnyCatch that can be used inside an STM transaction".

waitAnyCatch :

Wait for any of the supplied asynchronous operations to complete. The value returned is a pair of the Async that completed, and the result that would be returned by wait on that Async.

If multiple Asyncs complete or have completed, then the value returned corresponds to the first completed Async in the list.

waitAnyCatchSTM, then, will always yield an asynchronous action that was part of the original list, so not finding one should necessitate a panic. (This analysis assumes a coherent instance of Eq (Async m Void), one that will "correctly" compare an action as equal with itself.) The index of this asynchronous action is used to get ahold of a DomainAccessPoint from domains, which formed the basis of the async computation list.

To avoid mentioning error, I've implemented withAsyncAllWithCtx and waitAnyCatchSTMWithCtx, which are siblings of withAsyncAll and waitAnyCatchSTM that pass around a context value with the results of the async computations they process. The DomainAccessPoint that the rest of the code expects is the context value, so it needn't be looked up.

5: ouroboros-network/src/Ouroboros/Network/PeerSelection/PeerMetric.hs:387

Nothing -> error "impossible empty pq"

The surrounding code documents well the impossibility of this error. The code inserts an element into an IntPSQ, then gets the minimum value from the new IntPSQ. The new queue is guaranteed not to be empty, by IntPSQ.insert's contract, so there will always be a minimum to find.

6: ouroboros-network/src/Ouroboros/Network/BlockFetch/ClientRegistry.hs:301

unless ok $ error "setFetchClientContext: called more than once"

The surrounding code tries to set a StrictTMVar, via tryPutTMVar, and triggers this error if the operation fails, which it does iff the StrictTMVar is already full. I believe this error is not impossible to trigger, but instead represents a failure of program logic - accidentaly resetting some state instead of initializing it from scratch - severe enough to warrant program termination.

I say "I believe" because the surrounding function (setFetchClientContext) is undocumented. It should probably have some commentary attesting to this logic and the rationale for this error. I've opened issue #4678 for this.

7: ouroboros-network/src/Ouroboros/Network/Diffusion/P2P.hs:296

error ("socketAddressType: unexpected address " ++ show addr)

The error is triggered iff the surrounding function, socketAddressType is called on a SockAddr that isn't SockAddrInet- or SockAddrInet6-constructed. I presume these applications expect IP sockets and that program termination is warranted if they see something else, so this error is logically appropriate.

However, in addition to being partial via error, socketAddressType is also partial via Maybe (SockAddr -> Maybe AddressType), which seems to be to match a pattern of other record fields (diNtnAddressType, cmAddressType, perhaps inter alia) being similarly partial. I have explicitly matched the remaining AddressType constructor and replaced this error with Nothing, per this discussion.

8: ouroboros-network/src/Ouroboros/Network/KeepAlive.hs:103

-- 'decisionSTM' above cannot return 'Quiesce'
Quiesce   -> error "keepAlive: impossible happened"

The comment is correct, decisionSTM only returns a Terminate- or Continue-constructed ControlMessage. After discussing the ramifications of introducing a local type that mirrors just the two possible constructors of ControlMessage, Marcin and I decided to leave this as-is.

9: ouroboros-network/src/Ouroboros/Network/TxSubmission/Outbound.hs:142

(Nothing, TokNonBlocking) -> error "txSubmissionOutbound: impossible happend!"

The surrounding context makes clear that this error is impossible. The matched variables are mbtxs and blocking, and mbtxs is initialized based on the state of blocking. If blocking is TokNonBlocking, the only possible constructor for mbtxs is Just.

I've refactored the code to eliminate this error by lifting more functionality into the pattern-match on blocking.

10: ouroboros-network/src/Ouroboros/Network/TxSubmission/Outbound.hs:163

Nothing -> error "txSubmissionOutbound: empty transaction's list"
-- Assert txs is non-empty: we blocked until txs was non-null,
-- and we know reqNo > 0, hence take reqNo txs is non-null.

The accompanying comment explains well the impossibility of this code path. No semantic changes are required, though I have bracketed the expression take reqNo txs for clarity.

11: ouroboros-network-api/src/Ouroboros/Network/PeerSelection/PeerSharing.hs:91

encodeRemoteAddress (SockAddrUnix _) = error "Should never be encoding a SockAddrUnix!"

This error is similar to no. 7, above - the code expects and supports IP sockets, and does not expect or support Unix sockets. No change is warranted.

12: ouroboros-network-api/src/Ouroboros/Network/AnchoredSeq.hs:810

fromMaybe (error "could not join sequences") $

The enclosing function exposes a total/irrefutable interface for join. It passes an always-passing predicate to join, which guarantees a Just-constructed result from join, so the error will never trigger. The function is documented to describe why this irrefutability is sound.

13: ouroboros-network-api/src/Ouroboros/Network/NodeToNode/Version.hs:174

_ -> error "decodeTerm: impossible happened!",

The error is triggered when a peerSharing value, in a context in which it is guarded to be between 0 and 2 (| ..., peerSharing >= 0, peerSharing <= 2), pattern-matches to a number other than 0, 1, or 2. Failure of the primary guard already triggers failure via Left, so I've eliminated the mention of error by reorganized the pattern-matching while preserving the existing circumstances in which failure via Left is indicated.

14: ouroboros-network-api/src/Ouroboros/Network/AnchoredFragment.hs:165

anchorFromPoint GenesisPoint _     = error "anchorFromPoint: genesis point"

I don't fully understand, logically, why calling anchorFromPoint on GenesisPoint warrants an error. It seems like the natural result would be AnchorGenesis, an Anchor constructor seemingly for this purpose, rather than an error. Such a change would bring anchorFromPoint and anchorToPoint closer to being logical inverses of one another, if this is desirable.

I note that this function is exported but does not appear to be used elsewhere within the IOG org (per an org-wide Github search). Perhaps it could be marked deprecated and/or removed? I've opened issue #4690 to discuss the merits of this.

15: ouroboros-network-framework/src/Ouroboros/Network/Protocol/Handshake/Version.hs:73

[] -> error "foldMapVersions: precondition violated"

This error appears in foldMapVersions, which converts a collection of things into a collection of Versions (represented as a Versions value). The precondition in question is that the collection of things be nonempty.

foldMapVersions is, almost precisely, foldMap1 from Data.Foldable1. I opened issue #4677 to encourage leveraging this instance once the codebase is upgraded to a version of GHC that includes this class.

16: ouroboros-network-framework/src/Ouroboros/Network/Snocket.hs:543

toLocalAddress _                   = error "localSnocket.toLocalAddr: impossible happened"

This error is triggered if a non-Unix address is encountered in the process of creating a local snocket. Local snockets seem to rely on Unix sockets or named pipes, and cannot be constructed from IP sockets, so the error seems justified. I've filled out the function clauses to explicitly match on other socket constructors, for robustness's sake and to improve the error messages.

17: ouroboros-network-framework/src/Ouroboros/Network/Channel.hs:231

if full then error failureMsg

This error is triggered when trying to write to a full buffer. This behavior is intentional: the enclosing function makes its potentially partial semantics clear in its documentation, and declares itself "primarily useful for testing protocols." No changes are warranted.

Tier 2

Tier 2 files contain 11 uses of error.

1: cardano-ping/src/Cardano/Network/Ping.hs:183

handshakeReqEnc [] _ = error "null version list"

The error is triggered when handshakeReqEnc is provided an empty list of NodeVersion - but its only caller, handshakeReq, also guards against providing an empty list. I've changed handshakeReqEnc to expect a NonEmpty NodeVersion to eliminate the error.

2: network-mux/src/Network/Mux/DeltaQ/TraceStats.hs:37

Nothing -> error "step: missing referenceTimePoint"

The error is triggered when the referenceTimePoint is Nothing - but this eventuality is handled in a prior function guard, so the error is never triggered. More careful pattern-matching eliminates the call to error.

3: network-mux/src/Network/Mux/DeltaQ/TraceStats.hs:253

= error "Infeasible sampleInterval"

This error is triggered when a developer tries to provide an unsuitable sample interval. The existing interval will not trigger the error.

4: network-mux/src/Network/Mux/Ingress.hs:185

Nothing -> error ("setupDispatchTable: missing " ++ show miniProtocolNum)

The error occurs if, in the building of an array (ptclArray), an intermediate array (pnumArray) resolves an index (miniProtocolNum) to an "empty" (represented as Nothing) slot.

pnumArray is constructed such that every miniProtocolNum in ptcls maps to a Just-constructed slot. ptclArray is constructed by using every miniProtocolNum in ptcls to index pnumArray, and will trigger this error on encountering Nothing as a result of that indexing. Therefore, by construction, the error will never trigger. I've included documentation to this effect.

5: ntp-client/src/Network/NTP/Client.hs:105

Right NtpSyncPending -> error "ntpClientThread: impossible happened"

ntpClientThread calls ntpQuery and decomposes its NtpStatus result. A NtpSyncPending-constructed value result triggers this error - but ntpQuery will only ever produce NtpSyncUnavailable or NtpDrift values.

I've created a new abstraction for NTP statuses that can only represent the two "completed" versions of an NtpStatus, and made ntpQuery yield that instead, so the match for the impossible case no longer exists.

6: ouroboros-network-mock/src/Ouroboros/Network/Mock/ConcreteBlock.hs:222

partialField n = error ("mkPartialBlock: you didn't fill in field " ++ n)

This error is used to populate fields of an intentionally-partial BlockHeader value. The value is used as the starting point in a carefully-constructed but somewhat-brittle blockchain generation procedure relying heavily on fix and lazy evaluation, and I am taken to believe that these errors are overwritten or otherwise left unevaluated with standard usage of this function. (For example, it seems that results of mkPartialBlockHeader, the enclosing function, are passed immediately to fixupBlock, which overwrites the stubbed-out fields.)

I think it would be possible to refactor this chain generation to avoid having to utter error, but it would not be a quick fix, and the testing-based nature and use of this package leads me to believe that such a fix may not provide substantial value.

Regardless, I've changed the error message to describe the function enclosing it, which is mkPartialBlockHeader, not mkPartialBlock.

7: ouroboros-network-mock/src/Ouroboros/Network/Mock/Chain.hs:235

go _ = error "successorBlock: point not on chain"

The enclosing function, successorBlock, tries to find the successor block of a given point on a given chain. At the moment, this error is triggered iff the point is not on the chain.

All callers already check point membership on the chain, via pointOnChain, before calling this function. I've documented successorBlock to alert users of the need to perform this check.

8/9: ouroboros-network-testing/src/Ouroboros/Network/Testing/Data/Script.hs:221/223

= error "interpretPickScript: given empty map to pick from"
-- ...
= error "interpretPickScript: given invalid pickNum"

interpretPickScript is used to create PickPolicys to initialize a PeerSelectionPolicy. Its two errors both ensure the same precondition that Ouroboros.Network.PeerSelection.Governor.Types.pickPeers ensures when executing a PickPolicy - that there are peers available to pick from, and that we are picking more than zero of them.

10/11: ouroboros-network-testing/src/Ouroboros/Network/Testing/Data/Signal.hs:251/305

linger = error "TODO: Signal.linger"
-- ...
until _ = error "TODO: Signal.until"

These are stubs of unimplemented functions. As far as I can tell, these functions are unused.

samcowger added a commit to samcowger/ouroboros-network that referenced this issue Oct 25, 2023
The document this deletes has been migrated to issue IntersectMBO#3836.
coot pushed a commit to samcowger/ouroboros-network that referenced this issue Oct 27, 2023
The document this deletes has been migrated to issue IntersectMBO#3836.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
galois-review Issues / PRs related to Galois review technical debt
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants