Better error message for query utxo on oops #4777

Closed

Contributor

@newhoggy newhoggy commented Jan 12, 2023

This PR introduces the use of the oops library for simplifying error handling. Functions that use oops for error handling end in an underscore. For example:

  • determineEraExpr_
  • getNtcVersion_
  • getSbeInQuery_
  • handleQueryConvenienceErrors_
  • maybeQueryExpr_
  • queryEraHistory_
  • queryExpr_
  • queryNodeLocalState_
  • queryProtocolParams_
  • queryStakePools_
  • queryStateForBalancedTx_
  • querySystemStart_
  • queryUtxo_
  • setupLocalStateQueryExpr_
  • writeStakePools_

In most cases, equivalent functions that don't use oops types in their type signature are retained, but have been re-implemented to call the oops version of the function. This is to allow code to gradually migrate to the oops error handling style and give time to learn the new style.

As a result of these changes executeQueryCardanoMode and executeQueryAnyMode become unused and are removed.

Reading notes

This PR adds code that uses a different error handling style supported by the oops library. Functions that have an oops-enabled type signature end in an underscore so that they are easily distinguishable from others.

Use of BlockArguments has been removed from the PR because it was optional in the first place and there was too much happening in the one PR.

Stylistic choices

As part of implementing this feature, select parts of the code base have been converted to use the oops error handling library to solve a particularly nasty problem.

The shape of the problem can be described like this:

We have a monadic type called LocalStateQueryExpr which is used to express local state queries.

This monad was introduced to allow complex expressions to be written. These expressions can contain multiple queries.

Packaging multiple queries into the one expression means that the CLI can perform multiple interdependent queries over the same connection. Prior to the introduction of this monadic type, the only way to implement a CLI command that required multiple queries was to connect to the validator node multiple times, once per query.

The new function executeLocalStateQueryExpr_ acts as the interface between the IO monad and this query expression monad.

The new monadic type introduces new problems of its own. For example executeLocalStateQueryExpr_ can fail with an AcquireFailure. The possibility of this error is imposed by the networking library we use.

Moreover, some, but not all query expressions may fail with UnsupportedNtcVersionError. This is the feature that this PR implements.

A further problem is that a query expression written by the developer may additionally want to fail with its own error.

The difficulty with multiple parts of the same system wanting to throw their own errors is that executeLocalStateQueryExpr_ must now accommodate errors raised by each of these possibilities.

The simplest solution might have been to return a stack of Eithers, but large stacks of Eithers are unwieldy. I would have to worry about the ordering of the errors in the stack, carefully unpack the error I want to handle (which could be nested), and rebuild the rest of the Either stack that I don't want to handle just yet so that those errors can be propagated.
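
To make the cost of a stack of Eithers concrete, here is a minimal sketch. The error types ErrA, ErrB and ErrC and the function handleB are hypothetical and exist only for illustration; they are not taken from the PR.

```haskell
-- Hypothetical error types, for illustration only.
data ErrA = ErrA deriving (Eq, Show)
data ErrB = ErrB deriving (Eq, Show)
data ErrC = ErrC deriving (Eq, Show)

-- Errors are stacked in a fixed order: ErrA outermost, then ErrB, then ErrC.
type Stacked a = Either ErrA (Either ErrB (Either ErrC a))

-- Handle only ErrB by substituting a default value. Every other layer has to
-- be unpacked and rebuilt by hand so that ErrA and ErrC keep propagating.
handleB :: a -> Stacked a -> Either ErrA (Either ErrC a)
handleB _   (Left ErrA)                 = Left ErrA
handleB def (Right (Left ErrB))         = Right (Right def)
handleB _   (Right (Right (Left ErrC))) = Right (Left ErrC)
handleB _   (Right (Right (Right a)))   = Right (Right a)
```

Handling one error in the middle of the stack takes four clauses, and each of them would need rewriting if the order of the stack changed or another error were added.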

Without the help of a good error handling library that allows multiple errors to be thrown generically and transparently across a function boundary, the burden of juggling multiple error types is imposed on the user.

Attempts have been made to solve this problem some other way. For example, executeLocalStateQueryExpr_ at one stage took an additional argument that served as a mapping function from the errors that executeLocalStateQueryExpr_ needs to throw to some generic error that the query expression could throw.

The problem was further compounded by having to support UnsupportedNtcVersionError which means that often the developer will need to map the error twice to cross two function boundaries.
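
The mapping-function approach described above can be sketched roughly as follows. This is not the code that was tried in the earlier attempt: runWithMapping, AcquireErr and CmdError are hypothetical stand-ins, and the connection is simulated with a Bool. It uses plain ExceptT from transformers.

```haskell
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)
import Data.Functor.Identity (runIdentity)

-- Hypothetical stand-in for the error the runner itself can raise.
data AcquireErr = AcquireErr deriving (Eq, Show)

-- Hypothetical caller-side error type that must be able to hold AcquireErr.
data CmdError = CmdAcquire AcquireErr | CmdOther String deriving (Eq, Show)

-- The runner can fail on its own, so every caller has to pass a function
-- that embeds the runner's error into the caller's error type e.
runWithMapping
  :: Monad m
  => (AcquireErr -> e)  -- mapping function supplied by the caller
  -> Bool               -- simulated outcome of acquiring the connection
  -> ExceptT e m a      -- the query expression to run
  -> ExceptT e m a
runWithMapping embed acquired body
  | acquired  = body
  | otherwise = throwE (embed AcquireErr)

example :: Bool -> Either CmdError Int
example acquired =
  runIdentity (runExceptT (runWithMapping CmdAcquire acquired (pure 42)))
```

Each additional error source needs one more mapping argument, or a second mapping step at the next function boundary.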

All of this means that it is often too difficult to write monadic queries and developers hardly ever write them and instead just run queries over separate connections.

The problem is a long-lived one that I've been trying to solve since July 2021 in PR #2957.

Using the oops library allows us to neatly solve the above problem. For example, executeLocalStateQueryExpr_ is now:

executeLocalStateQueryExpr_
  :: forall e mode a . ()
  => e `OO.CouldBe` AcquireFailure
  => LocalNodeConnectInfo mode
  -> Maybe ChainPoint
  -> (NodeToClientVersion -> ExceptT (OO.Variant e) (LocalStateQueryExpr (BlockInMode mode) ChainPoint (QueryInMode mode) () IO) a)
  -> ExceptT (OO.Variant e) IO a
executeLocalStateQueryExpr_ connectInfo mpoint f = ...

In the above, e is a type list of all possible types that could be thrown.

There is a CouldBe constraint that says that AcquireFailure must at least be in the type list.

This means that executeLocalStateQueryExpr_ is able to throw AcquireFailure while the query expression in function f is free to throw whatever it wants.
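
The idea of a type list with a membership constraint can be illustrated with a toy model. The sketch below is not how oops is implemented: Variant, CouldBe and inject here are simplified stand-ins written from scratch, and AcquireErr and VersionErr are hypothetical error types.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)

-- A toy open sum: a value of exactly one of the types in the list xs.
data Variant (xs :: [Type]) where
  Here  :: x -> Variant (x ': xs)
  There :: Variant xs -> Variant (x ': xs)

-- Toy membership constraint: x occurs somewhere in the list xs.
class CouldBe (xs :: [Type]) (x :: Type) where
  inject :: x -> Variant xs

instance {-# OVERLAPPING #-} CouldBe (x ': xs) x where
  inject = Here

instance {-# OVERLAPPABLE #-} CouldBe xs x => CouldBe (y ': xs) x where
  inject = There . inject

-- Hypothetical error types.
newtype AcquireErr = AcquireErr String
newtype VersionErr = VersionErr Int

-- Only requires that AcquireErr is in the list; the rest of e stays open.
failAcquire :: CouldBe e AcquireErr => Either (Variant e) a
failAcquire = Left (inject (AcquireErr "connection lost"))

-- A caller decides the full list, in whatever order it likes.
describe :: Variant '[VersionErr, AcquireErr] -> String
describe (Here (VersionErr n))           = "version " ++ show n
describe (There (Here (AcquireErr msg))) = "acquire: " ++ msg

demo :: String
demo = either describe id
  (failAcquire :: Either (Variant '[VersionErr, AcquireErr]) String)
```

The point of the sketch is that failAcquire never mentions VersionErr, yet it can be used where VersionErr is also a possible error.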

To show how functions using oops are implemented, the executeQuery function is reproduced below:

executeQuery
  :: forall result era mode. CardanoEra era
  -> ConsensusModeParams mode
  -> LocalNodeConnectInfo mode
  -> QueryInMode mode (Either EraMismatch result)
  -> ExceptT ShelleyQueryCmdError IO result
executeQuery era cModeP localNodeConnInfo q = do
  eraInMode <- calcEraInMode era $ consensusModeOnly cModeP
  case eraInMode of
    ByronEraInByronMode -> left ShelleyQueryCmdByronEra
    _ -> OO.runOops1 @ShelleyQueryCmdError do
      executeLocalStateQueryExpr_ localNodeConnInfo Nothing
        do \_ -> do
            result <- queryExpr q
            case result of
              Right a -> return a
              Left e -> OO.throw (ShelleyQueryCmdLocalStateQueryError $ EraMismatchError e)
        & do OO.catch @AcquireFailure \e -> OO.throw $ ShelleyQueryCmdAcquireFailure $ toAcquiringFailure e
        & do OO.catch @UnsupportedNtcVersionError \e -> OO.throw $ ShelleyQueryCmdUnsupportedNtcVersion e

The first thing to note is that whilst this function uses oops, its type signature is just ExceptT ShelleyQueryCmdError.

The interface between code that uses oops and regular ExceptT code is demarcated by runOops1 @ShelleyQueryCmdError.

This function embeds an ExceptT (Variant '[x]) expression into an ExceptT x expression.

This means the oops expression in the following do block may only throw ShelleyQueryCmdError.

If functions inside this do block throw something else, they must be caught to return a value or re-throw as ShelleyQueryCmdError.

The catch and re-throw is done by the lines:

        & do OO.catch @AcquireFailure \e -> OO.throw $ ShelleyQueryCmdAcquireFailure $ toAcquiringFailure e
        & do OO.catch @UnsupportedNtcVersionError \e -> OO.throw $ ShelleyQueryCmdUnsupportedNtcVersion e

Failing to catch an exception that the block is not allowed to throw results in a compile-time error that looks like this:

    • Uh oh! I couldn't find UnsupportedNtcVersionError inside the variant!
      If you're pretty sure I'm wrong, perhaps the variant type is ambiguous;
      could you add some annotations?

This is a sign that the developer must do something with UnsupportedNtcVersionError. This can be one of the following options:

  • Catch and re-throw another error that can be thrown. In this case ShelleyQueryCmdError.
  • Catch and return a value instead.
  • Adjust the type signature of the surrounding function to allow propagation of UnsupportedNtcVersionError by adding CouldBe e UnsupportedNtcVersionError. This is not an option in this case because runOops1 imposes the restriction that only one error type, ShelleyQueryCmdError, may propagate, but in other contexts this may be possible.

An interesting feature of executeQuery is that it makes heavy use of the BlockArguments Haskell extension:

    _ -> OO.runOops1 @ShelleyQueryCmdError do
      executeLocalStateQueryExpr_ localNodeConnInfo Nothing
        do \_ -> do
            result <- queryExpr q
            case result of
              Right a -> return a
              Left e -> OO.throw (ShelleyQueryCmdLocalStateQueryError $ EraMismatchError e)
        & do OO.catch @AcquireFailure \e -> OO.throw $ ShelleyQueryCmdAcquireFailure $ toAcquiringFailure e
        & do OO.catch @UnsupportedNtcVersionError \e -> OO.throw $ ShelleyQueryCmdUnsupportedNtcVersion e

Firstly the extension allows us to write do instead of $ do.

Secondly, the expression that follows a do does not need to have a monadic or applicative type. It can be a value of any type.

This happens here: do \_ -> ....

In this way the do is acting like a pair of parentheses around the expression.

The expression is equivalent to (\_ -> ...).

The contents of these virtual parentheses are determined by indentation.

The second do in do \_ -> do is a regular do block.

Finally there is & do OO.catch ....

Again, this is like & (OO.catch ...).

Written with explicit parentheses instead of BlockArguments, the same function looks like this:

executeQuery
  :: forall result era mode. CardanoEra era
  -> ConsensusModeParams mode
  -> LocalNodeConnectInfo mode
  -> QueryInMode mode (Either EraMismatch result)
  -> ExceptT ShelleyQueryCmdError IO result
executeQuery era cModeP localNodeConnInfo q = do
  eraInMode <- calcEraInMode era $ consensusModeOnly cModeP
  case eraInMode of
    ByronEraInByronMode -> left ShelleyQueryCmdByronEra
    _ -> OO.runOops1 @ShelleyQueryCmdError do
      ( executeLocalStateQueryExpr_ localNodeConnInfo Nothing
          (\_ -> do
              result <- queryExpr q
              case result of
                Right a -> return a
                Left e -> OO.throw (ShelleyQueryCmdLocalStateQueryError $ EraMismatchError e)))
        & (OO.catch @AcquireFailure \e -> OO.throw $ ShelleyQueryCmdAcquireFailure $ toAcquiringFailure e)
        & (OO.catch @UnsupportedNtcVersionError \e -> OO.throw $ ShelleyQueryCmdUnsupportedNtcVersion e)

BlockArguments allows us to not use parentheses which means the developer doesn't have to worry about closing parentheses lining up. They can look at indent instead which is easier to see and typical of other Haskell code.
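
The equivalence between a do block and a pair of parentheses can be checked on a much smaller example. The function applyTwice below is hypothetical and only there to have something that takes a function as its last argument.

```haskell
{-# LANGUAGE BlockArguments #-}

import Data.Function ((&))

-- Hypothetical helper: takes the function as its last argument.
applyTwice :: Int -> (Int -> Int) -> Int
applyTwice x f = f (f x)

-- Grouping with parentheses.
withParens :: Int
withParens =
  applyTwice 5 (\n -> n * 2) & (\r -> r + 1)

-- The same expression grouped with do blocks. Each block is a single
-- expression of function type and ends where the indentation decreases.
withBlocks :: Int
withBlocks =
  applyTwice 5
    do \n -> n * 2
    & do \r -> r + 1
```

Both definitions evaluate to the same value; only the grouping syntax differs.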

An added benefit is that fewer lines are touched when expressions need to be regrouped.

For example, the three closing parentheses at the end of the line Left e -> OO.throw (ShelleyQueryCmdLocalStateQueryError $ EraMismatchError e))) match opening parentheses that are distributed across three different lines, so regrouping at any of the three levels involves touching this line.

Additionally, note that the type signature of catch is not typical of catch functions.

Catch functions typically take the handler in the second argument. catch however takes the handler in the first argument.

This is important because it means adding and removing handlers in the above code touches the fewest lines. We don't have to worry about nesting expressions further (which can touch a lot of lines and cause large git diffs) or breaking out handlers into a where clause.
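
The effect of putting the handler first can be sketched with plain Either. handleLeft is a hypothetical helper written for this illustration, not an oops function, and ErrA and ErrB are made-up error types.

```haskell
import Data.Function ((&))

-- Hypothetical error types.
data ErrA = ErrA deriving (Eq, Show)
data ErrB = ErrB deriving (Eq, Show)

-- Hypothetical handler-first catch: the handler is the first argument and
-- the computation being protected is the last one.
handleLeft :: (e -> Either e' a) -> Either e a -> Either e' a
handleLeft handler = either handler Right

-- Because the computation comes last, handlers stack with (&), one per line.
-- Adding or removing a handler adds or removes exactly one line.
pipeline :: Either ErrA Int -> Either String Int
pipeline step =
  step
    & handleLeft (\ErrA -> Left ErrB)
    & handleLeft (\ErrB -> Left "unrecoverable")
```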

The use of BlockArguments is entirely optional. However when evaluating its merits it is worth considering that Haskell provides two mechanisms for grouping expressions: parentheses and do blocks.

When refactoring code that has a mixture of both, the developer will have to maintain both the indent and the parentheses and ensure they play well together. BlockArguments allows the developer to no longer have to think about parentheses and just think about grouping code with indents. This halves the cognitive burden that grouping of expressions imposes on the developer.

Finally, oops has decent type inference.

For example GHC and HLS can infer that the type of this expression:

queryExpr_ q = do
  let minNtcVersion = ntcVersionOf q
  ntcVersion <- getNtcVersion
  if ntcVersion >= minNtcVersion
    then lift
      $ LocalStateQueryExpr $ ReaderT $ \_ -> ContT $ \f -> pure $
        Net.Query.SendMsgQuery q $
          Net.Query.ClientStQuerying
          { Net.Query.recvMsgResult = f
          }
    else OO.throw $ UnsupportedNtcVersionError minNtcVersion ntcVersion

is this:

queryExpr_ :: OO.CouldBeF e UnsupportedNtcVersionError =>
                   QueryInMode mode b
                   -> ExceptT
                        (OO.Variant e)
                        (LocalStateQueryExpr block point (QueryInMode mode) r IO)
                        b

Which I can easily rewrite to my preferred type signature:

queryExpr_ :: ()
  => e `OO.CouldBe` UnsupportedNtcVersionError
  => QueryInMode mode a
  -> ExceptT (OO.Variant e) (LocalStateQueryExpr block point (QueryInMode mode) r IO) a

Why not use another library

I have evaluated a number of other options: plucky, fused-effects, control-monad-exceptions, polysemy and haskus-utils-variant.

They all suffer from a common problem, which is that their catch functions take the handler in the second argument, making stacking handlers more fiddly than it needs to be.

Additionally they can have bad error messages. Take for example plucky:

src/Cardano/CLI/Shelley/Run/Query.hs:1367:7: error:
    • No (remind me to write a better error message)
    • In the expression:
        executeLocalStateQueryExpr_ localNodeConnInfo Nothing
      In the second argument of ‘($)’, namely
        ‘executeLocalStateQueryExpr_ localNodeConnInfo Nothing
           $ \ _
               -> do result <- queryExpr q
                     case result of
                       Right a -> ...
                       Left e -> ...’
      In the expression:
        wrapExceptT ShelleyQueryCmdUnsupportedNtcVersion
          $ executeLocalStateQueryExpr_ localNodeConnInfo Nothing
              $ \ _
                  -> do result <- queryExpr q
                        case result of
                          Right a -> ...
                          Left e -> ...
     |
1367 |       executeLocalStateQueryExpr_ localNodeConnInfo Nothing $ \_ -> do
     |       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some libraries like fused-effects and polysemy make it difficult to interoperate with the rest of the Haskell ecosystem. For example, it is difficult to get ResourceT working, if it can be done at all, and a lot of libraries in the ecosystem use ResourceT.

Other common interoperability pain points include callbacks (for example with FFI callbacks), lazy IO and UnliftIO.

oops, like plucky, is based on ExceptT, which is common in the Haskell ecosystem, and most developers know how to use ExceptT. oops imposes very little friction when interoperating with libraries that don't use oops.

In some cases, a library's type inference is not up to scratch. For example in oops, if I try to infer the type of the expression:

foo_ = OO.throw () & OO.catch (\() -> pure ())

I get this:

foo_ :: ExceptT (OO.Variant e') ghc-prim-0.6.1:GHC.Types.Any ()

Which isn't quite right, but I can guess what the type signature should be:

foo_ :: Monad m => ExceptT (OO.Variant e') m ()

In plucky I'm not actually sure what the type of this is:

foo = PKY.throwT () & PKY.catchT (\() -> pure ())

The type doesn't compile to start with and the inferred type is wrong:

foo :: ExceptT e' m a

I would expect the type to be something like this to indicate no exceptions are thrown:

foo :: Monad m => ExceptT Void m a

but that doesn't work.

The following doesn't work either:

foo :: Monad m => ExceptT e m ()

The example might seem academic, but it is not. If a library fails to infer a valid type (is there even one?) when the code catches all errors, then a developer could easily get into this situation by catching all errors. Code with all errors caught is valid code, so it ought to have a type, and that type should be inferrable.

Finally, oops does one more thing to help the developer past type inference issues. Every function that is used for error handling has an explicit forall, and the generic type argument for the error x is always the first argument. This is because when writing error handling code where there may be multiple possible error types in play, it is very beneficial for the developer to be able to use TypeApplications to fix the error type to a particular type. Doing this will often resolve type inference problems.
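
The benefit of putting the error type first under an explicit forall can be seen in a small sketch. handleAs is a hypothetical function written for this illustration and is not part of oops.

```haskell
{-# LANGUAGE ExplicitForAll #-}
{-# LANGUAGE TypeApplications #-}

-- Hypothetical handler. The explicit forall lists the error type x first,
-- so a caller can fix it with a single type application.
handleAs :: forall x a. (x -> a) -> Either x a -> a
handleAs handler = either handler id

-- Without the @Int, x would be ambiguous here: it is only constrained by
-- Show and Read, so GHC cannot pick a type for it.
fixed :: String
fixed = handleAs @Int show (Left (read "3"))
```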

@@ -263,6 +277,23 @@ data QueryInShelleyBasedEra era result where
:: PoolId
-> QueryInShelleyBasedEra era (SerialisedStakeSnapshots era)

instance NtcVersionOf (QueryInShelleyBasedEra era result) where
Contributor

@Jimbo4350 Jimbo4350 Jan 12, 2023

There is a lot happening in this PR. Can you break it up, starting with the introduction of the NodeToClientVersion class? And also how this can be used in the non-monadic query interface?

Contributor Author

@newhoggy newhoggy Jan 12, 2023

The NodeToClientVersion is never used by the users of the API. It is used internally to map from the query to which version of the protocol supports it.

Users will call queryExpr, passing in a QueryInEra. If that query is supported, queryExpr will return normally. Otherwise UnsupportedNtcVersionError is "thrown".

Users can catch UnsupportedNtcVersionError if they want to handle it, for example to fall back to another query, return no results or throw a different error.
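
Two of these options can be sketched with plain ExceptT from transformers. This does not use the oops API or the real query types: UnsupportedVersion, queryNew and the version number 12 are hypothetical stand-ins chosen for the illustration.

```haskell
import Control.Monad.Trans.Except (Except, catchE, runExcept, throwE)

-- Hypothetical stand-in for UnsupportedNtcVersionError.
data UnsupportedVersion = UnsupportedVersion
  { requiredVersion :: Int
  , actualVersion   :: Int
  } deriving (Eq, Show)

-- Hypothetical query that is only supported from protocol version 12 on.
queryNew :: Int -> Except UnsupportedVersion [String]
queryNew nodeVersion
  | nodeVersion >= 12 = pure ["result from new query"]
  | otherwise         = throwE (UnsupportedVersion 12 nodeVersion)

-- Option: catch the error and return no results.
queryOrEmpty :: Int -> Except String [String]
queryOrEmpty v = queryNew v `catchE` \_ -> pure []

-- Option: catch the error and throw a different one.
queryOrFail :: Int -> Except String [String]
queryOrFail v = queryNew v `catchE` \err ->
  throwE ("node too old, need version " ++ show (requiredVersion err))
```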

@newhoggy newhoggy force-pushed the newhoggy/better-error-message-for-query-utxo-on-oops branch 6 times, most recently from c879560 to a89d79f Compare January 15, 2023 06:23
Contributor

@abailly-iohk abailly-iohk left a comment

It's hard to comment on precise changes without having a deep knowledge of the code, which I don't have, but here are my 2 cts:

  1. Being unfamiliar with the BlockArguments extension makes the code harder to read for me. Maybe it's only me, maybe it's not uncommon, in which case we've made potential external contributors' work harder
  2. I fail to see how the pattern OO.catchM @AcquireFailure \e -> OO.throwM $ TxGenError $ show e is better than using catches with a list of Handler a
  3. This code adds quite a few layers (Oops monad, ExceptT monad), which adds more burden to the potential future person wanting to refactor this or add a new feature
  4. This requires wrapping things in ExceptT and at the same time using exceptions, which seems somewhat contradictory to me: I would expect a strategy for error handling to embrace either exceptions or explicit return types, but not both for the same layer
  5. This PR introduces heavy changes to code used quite often without a single test being changed, which would make me nervous if I were a maintainer of this codebase
  6. As already discussed on slack, this code adds yet another dependency on an external library which implies new contributors would need to know and understand this library

@abailly-iohk
Contributor

As an additional comment: I unfortunately don't have the time to provide some counterproposal that would solve the problem while IMO being simpler and easier to maintain, which I regret and implies you should take my comments with a grain of salt.

@newhoggy newhoggy force-pushed the newhoggy/better-error-message-for-query-utxo-on-oops branch 5 times, most recently from 49f4915 to b186e93 Compare January 15, 2023 22:23
@newhoggy
Contributor Author

newhoggy commented Jan 15, 2023

Being unfamiliar with the BlockArguments extension makes the code harder to read for me. Maybe it's only me, maybe it's not uncommon, in which case we've made potential external contributors' work harder

BlockArguments is optional. I'm recommending it because I think it makes the layout of the code nicer.

I fail to see how the pattern OO.catchM @AcquireFailure \e -> OO.throwM $ TxGenError $ show e is better than using catches with a list of Handler a

As it is currently written, the benefits are non-obvious, but over time benefits accumulate. The following are the things we can expect.

  1. There is a lot of catching and rethrowing happening because only a small amount of code is currently written in the proposed style. The catch and rethrow is necessary because the surrounding code that calls it is still using the old style, so the new code has to accommodate it.

In new code, one of the things we can do is to not handle the error and allow it to propagate, which is a thing we cannot easily do with the old code. This means we would have a lot less marshalling.

  2. We don't have to think about ordering. If we have Either a (Either b (Either c a)) and I only want to handle b, I would have to unpack the nested Eithers, handle the thing I want and repackage the rest for a higher layer to handle. This means refactoring to add or remove an error can impact a lot of code.

With oops, I can handle b and ignore everything else, and the rest would propagate naturally.

  3. I don't have to needlessly marshal errors between types. We have different error types that we translate between only because the error types of the caller and the callee are different.

  4. I don't have to worry about handling errors that are never thrown because the errors are expressed as constructors in a sum type.

Suppose errors are expressed as a sum type, for example data MyError = MyError1 | MyError2 | MyError3. Even if the function call only ever raises MyError2, I would have to handle MyError1 and MyError3.

This ends up being too hard, and at the moment we tend to propagate everything.

This is extremely risky. Maybe I add a new MyError4 and I know it ought to be handled and not propagated. Because we currently just propagate everything, I get no compiler errors when I add my constructor, and I fail to add the required error handling as a result.

Therefore, to the extent that maintenance of the current code base isn't as bad as it could be, it's because we are being lax with our error handling in a way where we could be introducing bugs.

  5. We have duplicate copies of functions that only differ by the error type. This is a consequence of using constructors of sum types to represent specific errors.
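
The risk described in the point about sum-type errors can be sketched as follows. MyError is the type from the example above; lookupPositive and recover are hypothetical functions added for illustration.

```haskell
data MyError = MyError1 | MyError2 | MyError3 deriving (Eq, Show)

-- Hypothetical function that only ever fails with MyError2, although its
-- type says it could fail with any MyError constructor.
lookupPositive :: Int -> Either MyError Int
lookupPositive n
  | n > 0     = Right n
  | otherwise = Left MyError2

-- The caller must either write cases for constructors that cannot occur or
-- fall back to a wildcard. With the wildcard, a MyError4 added later lands
-- in the last branch without any compiler error.
recover :: Int -> Int
recover n = case lookupPositive n of
  Right v       -> v
  Left MyError2 -> 0
  Left _        -> (-1)
```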

This requires wrapping things in ExceptT and at the same time using exceptions, which seems somewhat contradictory to me: I would expect a strategy for error handling to embrace either exceptions or explicit return types, but not both for the same layer.

The PR as it currently stands uses a lot of ExceptT wrapping, but it does so because it interfaces with existing code.

The use of the wrapping shows that interoperating with existing code is possible and that there is no need to rewrite the entire code base in one go.

Note that our codebase already does a lot of ExceptT wrapping via newExceptT, so this isn't new.

As more of the code transitions to the proposed style the ExceptT wrapping would disappear.

This PR introduces heavy changes to code used quite often without a single test being changed, which would make me nervous if I were a maintainer of this codebase.

I think this is good. The code being modified is being tested indirectly through our integration tests.

If a lot of tests had to be modified, I think this would be bad because then I would have to worry about whether changes to the tests change what the tests are testing.

As already discussed on slack, this code adds yet another dependency on an external library which implies new contributors would need to know and understand this library.

I want the code to be intuitive. Contributors adding a new function to our code ought to be able to imitate existing code.

Good error messages and type inference and good documentation will be what gets new contributors over the line.

Over time the code will also become more regular. The library promotes the writing of code that reads straight down as a sequence of statements.

The goal is to be able to cobble a bunch of statements together in a new function and the compiler or HLS can mostly just infer the type (which is a thing that fails enough with the current codebase that it bothers me).

@newhoggy newhoggy force-pushed the newhoggy/better-error-message-for-query-utxo-on-oops branch 13 times, most recently from 6949851 to 197a213 Compare January 17, 2023 03:25
@newhoggy newhoggy force-pushed the newhoggy/better-error-message-for-query-utxo-on-oops branch from 7c2307c to 98e2811 Compare May 16, 2023 01:08
@newhoggy newhoggy force-pushed the newhoggy/better-error-message-for-query-utxo-on-oops branch from a0467f3 to a59f0f8 Compare May 16, 2023 13:28
@hamishmack hamishmack force-pushed the newhoggy/better-error-message-for-query-utxo-on-oops branch from a59f0f8 to ae0980b Compare May 17, 2023 00:25
@newhoggy newhoggy closed this Dec 9, 2023