Skip to content

Commit

Permalink
Merge #2419
Browse files Browse the repository at this point in the history
2419: Fix flaky stake pools integration test r=Anviking a=rvl

### Issue Number

Flaky test #2415

### Overview

This test failure seemed to be happening quite a lot the other day. Now I can't reproduce it.

I am pretty sure the error is from somewhere in `stakeDistribution` which is used by `listPools`.
My guess of the cause is that rollback causes ledger queries to temporarily fail. (confirmed: ADP-647)

The error was never logged, and the exception message was not part of the HTTP 503 response.

1. [x] Fix some minor test environment issues.
2. [x] If a ledger query fails, make sure that the exception is logged.
3. [x] Correct the return type of `listPools` and functions that it calls. 
4. [x] Reproduce error with logs, diagnose problem. Problem 1: unreproduceable. Problem 2: epoch rollover between tx submission and rewards balance check.
5. [x]  Fix the ~code and/or~ test case.
6. [x] Add utility functions for transforming BracketLog traces into time deltas and logging timings.
7. [x] Add back logging of how long ledger queries take.
 
### Comments

- I haven't reproduced the error, but it's worth merging these fixes anyway.

- The new error, which seems to happen reliably on the same test case, is this:
  ```
  [1 2021-01-05 17:33:29] Failures:
  [1 2021-01-05 17:33:29] 
  [1 2021-01-05 17:33:29]   src/Test/Integration/Scenario/API/Shelley/StakePools.hs:324:26: 
  [1 2021-01-05 17:33:29]   1) API Specifications, SHELLEY_STAKE_POOLS, STAKE_POOLS_JOIN_01rewards - Can join a pool, earn rewards and collect them
  [1 2021-01-05 17:33:29]        While verifying (Status {statusCode = 200, statusMessage = "OK"},Right (ApiWallet {id = ApiT {getApiT = WalletId {getWalletId = c83c7e41054d4f96238c00e1a8abf75c7a93f485}}, addressPoolGap = ApiT {getApiT = AddressPoolGap {getAddressPoolGap = 20}}, balance = ApiT {getApiT = WalletBalance {available = Quantity {getQuantity = 999998581353}, total = Quantity {getQuantity = 1000007589123}, reward = Quantity {getQuantity = 9007770}}}, delegation = ApiWalletDelegation {active = ApiWalletDelegationNext {status = Delegating, target = Just (ApiT {getApiT = PoolId {getPoolId = "\187\DC1L\179}u\250\ENQ&\ETX(\194\&5\163\218\226\149\163=\v\166t\165\235\RS>V\142"}}), changesAt = Nothing}, next = []}, name = ApiT {getApiT = WalletName {getWalletName = "Faucet Wallet"}}, passphrase = Just (ApiWalletPassphraseInfo {lastUpdatedAt = 2021-01-05 09:22:50.035321314 UTC}), state = ApiT {getApiT = Ready}, tip = ApiBlockReference {absoluteSlotNumber = ApiT {getApiT = SlotNo 5056}, slotId = ApiSlotId {epochNumber = ApiT {getApiT = EpochNo {unEpochNo = 101}}, slotNumber = ApiT {getApiT = SlotInEpoch {unSlotInEpoch = 6}}}, time = 2021-01-05 09:25:03.2 UTC, block = ApiBlockInfo {height = Quantity {getQuantity = 2533}}}}))
  [1 2021-01-05 17:33:29]        Waited longer than 90s to resolve action: "Wallet has consumed rewards".
  [1 2021-01-05 17:33:29]        expected: Quantity {getQuantity = 0}
  [1 2021-01-05 17:33:29]         but got: Quantity {getQuantity = 9007770}
  [1 2021-01-05 17:33:29] 
  [1 2021-01-05 17:33:29]   To rerun use: --match "/API Specifications/SHELLEY_STAKE_POOLS/STAKE_POOLS_JOIN_01rewards - Can join a pool, earn rewards and collect them/"
  [1 2021-01-05 17:33:29] 
  [1 2021-01-05 17:33:29] Randomized with seed 1938796546
  [1 2021-01-05 17:33:29] 
  [1 2021-01-05 17:33:29] Finished in 1514.5249 seconds
  [1 2021-01-05 17:33:29] 711 examples, 1 failure, 9 pending
  [1 2021-01-05 17:33:29] NO_CLEANUP of temporary directory /run/user/1000/test-06a07e3d6c739d45

  real	25m15.583s
  user	27m19.699s
  sys	2m15.754s
  ```

  The cause of this failure (according to the logs) is epoch rollover happening just after the rewards withdrawal transaction is submitted. New rewards accrue before the test case can check that previous rewards were withdrawn. Inherently flaky.

Co-authored-by: Rodney Lorrimar <rodney.lorrimar@iohk.io>
  • Loading branch information
iohk-bors[bot] and rvl committed Jan 11, 2021
2 parents f5f921c + dba653a commit 05a66cf
Show file tree
Hide file tree
Showing 23 changed files with 363 additions and 325 deletions.
2 changes: 1 addition & 1 deletion .buildkite/pipeline.yml
Expand Up @@ -17,7 +17,7 @@ steps:

- label: 'Stack Rebuild'
command:
- "mkdir -p $TESTS_LOGDIR"
- "rm -rf $TESTS_LOGDIR && mkdir $TESTS_LOGDIR"
- "nix-build .buildkite/default.nix -o sr"
- "./sr/bin/rebuild --build-dir=$BUILD_DIR --cache-dir=$CACHE_DIR"
timeout_in_minutes: 120
Expand Down
1 change: 0 additions & 1 deletion lib/cli/cardano-wallet-cli.cabal
Expand Up @@ -37,7 +37,6 @@ library
, cardano-addresses
, cardano-addresses-cli
, cardano-wallet-core
, contra-tracer
, directory
, filepath
, fmt
Expand Down
75 changes: 1 addition & 74 deletions lib/cli/src/Cardano/CLI.hs
Expand Up @@ -94,11 +94,9 @@ module Cardano.CLI
, requireFilePath
, getDataDir
, setupDirectory
, waitForService
, getPrometheusURL
, getEKGURL
, ekgEnabled
, WaitForServiceLog (..)
) where

import Prelude hiding
Expand Down Expand Up @@ -127,8 +125,6 @@ import Cardano.BM.Data.Severity
( Severity (..) )
import Cardano.BM.Data.SubTrace
( SubTrace (..) )
import Cardano.BM.Data.Tracer
( HasPrivacyAnnotation (..), HasSeverityAnnotation (..) )
import Cardano.BM.Setup
( setupTrace_, shutdown )
import Cardano.BM.Trace
Expand Down Expand Up @@ -167,8 +163,6 @@ import Cardano.Wallet.Api.Types
, WalletPutPassphraseData (..)
, fmtAllowedWords
)
import Cardano.Wallet.Network
( ErrNetworkUnavailable (..) )
import Cardano.Wallet.Orphans
()
import Cardano.Wallet.Primitive.AddressDerivation
Expand Down Expand Up @@ -201,8 +195,6 @@ import Control.Monad
( forM_, forever, join, unless, void, when )
import Control.Monad.IO.Class
( MonadIO )
import Control.Tracer
( Tracer, traceWith )
import Data.Aeson
( ToJSON (..), (.:), (.=) )
import Data.Bifunctor
Expand Down Expand Up @@ -323,7 +315,7 @@ import System.IO
import UnliftIO.Concurrent
( threadDelay )
import UnliftIO.Exception
( bracket, catch )
( bracket )

import qualified Cardano.BM.Configuration.Model as CM
import qualified Cardano.BM.Data.BackendKind as CM
Expand Down Expand Up @@ -1962,68 +1954,3 @@ optionalE
optionalE parse = \case
m | m == mempty -> Right Nothing
m -> Just <$> parse m

{-------------------------------------------------------------------------------
Polling for service availability
-------------------------------------------------------------------------------}

-- | Wait for a service to become available on a given TCP port. Exit on failure
-- with a proper error message.
waitForService
:: Service
-- ^ Name of the service
-> Tracer IO WaitForServiceLog
-- ^ A 'Trace' for logging
-> Port "node"
-- ^ TCP Port of the service
-> IO ()
-- ^ Service we're waiting after.
-> IO ()
waitForService service tracer port action = do
let handler (ErrNetworkInvalid net) = do
traceWith tracer $ MsgServiceErrNetworkInvalid net
exitFailure
handler _ = do
traceWith tracer $ MsgServiceTimedOut service
exitFailure

traceWith tracer $ MsgServiceWaiting service port
action `catch` handler
traceWith tracer $ MsgServiceReady service

-- | Log messages from 'waitForService'
data WaitForServiceLog
= MsgServiceWaiting Service (Port "node")
| MsgServiceReady Service
| MsgServiceTimedOut Service
| MsgServiceErrNetworkInvalid Text
deriving (Show, Eq)

instance ToText WaitForServiceLog where
toText = \case
MsgServiceWaiting (Service service) port -> mconcat
[ "Waiting for "
, service
, " to be ready on tcp/"
, T.pack (showT port)
]
MsgServiceReady (Service service) ->
service <> " is ready."
MsgServiceTimedOut (Service service) -> mconcat
[ "Waited too long for "
, service
, " to become available. Giving up!"
]
MsgServiceErrNetworkInvalid net -> mconcat
[ "The node backend is not running on the \"", net, "\" "
, "network. Please start the wallet server and the node "
, "backend on the same network. Exiting now."
]

instance HasPrivacyAnnotation WaitForServiceLog
instance HasSeverityAnnotation WaitForServiceLog where
getSeverityAnnotation = \case
MsgServiceWaiting _ _ -> Info
MsgServiceReady _ -> Info
MsgServiceTimedOut _ -> Info
MsgServiceErrNetworkInvalid _ -> Info
Expand Up @@ -51,6 +51,8 @@ data Context = Context
:: IORef [PoolGarbageCollectionEvent]
-- ^ The complete list of pool garbage collection events.
-- Most recent events are stored at the head of the list.
, _smashUrl :: Text
-- ^ Base URL of the mock smash server.
}
deriving Generic

Expand Down
Expand Up @@ -82,8 +82,6 @@ import Data.Text.Class
( showT, toText )
import Numeric.Natural
( Natural )
import System.Environment
( getEnv )
import Test.Hspec
( SpecWith, describe, pendingWith )
import Test.Hspec.Expectations.Lifted
Expand Down Expand Up @@ -325,6 +323,7 @@ spec = describe "SHELLEY_STAKE_POOLS" $ do
"withdrawal": "self"
}|]

waitForNextEpoch ctx
rTx <- request @(ApiTransaction n) ctx
(Link.createTransaction @'Shelley src)
Default (Json payloadWithdrawal)
Expand Down Expand Up @@ -1152,8 +1151,7 @@ spec = describe "SHELLEY_STAKE_POOLS" $ do
it "STAKE_POOLS_SMASH_01 - fetching metadata from SMASH works with delisted pools" $
\ctx -> runResourceT $ bracketSettings ctx $ do
liftIO $ flakyBecauseOf "#2337 (theorized)"
smashUrl <- liftIO $ getEnv "CARDANO_WALLET_SMASH_URL"
updateMetadataSource ctx (T.pack smashUrl)
updateMetadataSource ctx (_smashUrl ctx)
eventually "metadata is fetched" $ do
r <- listPools ctx arbitraryStake
verify r
Expand Down Expand Up @@ -1192,8 +1190,7 @@ spec = describe "SHELLEY_STAKE_POOLS" $ do

it "STAKE_POOLS_SMASH_HEALTH_01 - Can check SMASH health when configured" $
\ctx -> runResourceT $ bracketSettings ctx $ do
smashUrl <- liftIO $ getEnv "CARDANO_WALLET_SMASH_URL"
updateMetadataSource ctx (T.pack smashUrl)
updateMetadataSource ctx (_smashUrl ctx)
r <- request @ApiHealthCheck
ctx Link.getCurrentSMASHHealth
Default Empty
Expand All @@ -1212,9 +1209,8 @@ spec = describe "SHELLEY_STAKE_POOLS" $ do

it "STAKE_POOLS_SMASH_HEALTH_03 - Can check SMASH health via url" $
\ctx -> runResourceT $ do
smashUrl <- liftIO $ getEnv "CARDANO_WALLET_SMASH_URL"
let withUrl f (method, link) = (method, link <> "?url=" <> T.pack f)
let link = withUrl smashUrl Link.getCurrentSMASHHealth
let withUrl f (method, link) = (method, link <> "?url=" <> f)
let link = withUrl (_smashUrl ctx) Link.getCurrentSMASHHealth

r <- request @ApiHealthCheck ctx link Default Empty
expectResponseCode HTTP.status200 r
Expand Down
1 change: 1 addition & 0 deletions lib/core/cardano-wallet-core.cabal
Expand Up @@ -99,6 +99,7 @@ library
, text-class
, time
, tls
, tracer-transformers
, transformers
, typed-protocols
, unliftio
Expand Down
7 changes: 3 additions & 4 deletions lib/core/src/Cardano/Wallet.hs
Expand Up @@ -158,7 +158,6 @@ module Cardano.Wallet
, ErrGetTransaction (..)
, ErrNoSuchTransaction (..)
, ErrNetworkUnavailable (..)
, ErrCurrentNodeTip (..)
, ErrStartTimeLaterThanEndTime (..)

-- ** Root Key
Expand Down Expand Up @@ -202,10 +201,10 @@ import Cardano.Wallet.DB
, sparseCheckpoints
)
import Cardano.Wallet.Network
( ErrCurrentNodeTip (..)
, ErrGetAccountBalance (..)
( ErrGetAccountBalance (..)
, ErrNetworkUnavailable (..)
, ErrPostTx (..)
, ErrStakeDistribution (..)
, FollowAction (..)
, FollowExit (..)
, FollowLog (..)
Expand Down Expand Up @@ -2572,7 +2571,7 @@ data ErrWithdrawalNotWorth

-- | Errors that can occur when trying to list stake pool.
data ErrListPools
= ErrListPoolsNetworkError ErrNetworkUnavailable
= ErrListPoolsQueryFailed ErrStakeDistribution
| ErrListPoolsPastHorizonException PastHorizonException
deriving (Show)
{-------------------------------------------------------------------------------
Expand Down
36 changes: 18 additions & 18 deletions lib/core/src/Cardano/Wallet/Api/Server.hs
Expand Up @@ -227,8 +227,8 @@ import Cardano.Wallet.Api.Types
import Cardano.Wallet.DB
( DBFactory (..) )
import Cardano.Wallet.Network
( ErrCurrentNodeTip (..)
, ErrNetworkUnavailable (..)
( ErrNetworkUnavailable (..)
, ErrStakeDistribution (..)
, NetworkLayer
, timeInterpreter
)
Expand Down Expand Up @@ -1746,14 +1746,14 @@ getNetworkInformation
=> SyncTolerance
-> NetworkLayer IO Block
-> Handler ApiNetworkInformation
getNetworkInformation st nl = do
now <- liftIO $ currentRelativeTime ti
nodeTip <- liftHandler (NW.currentNodeTip nl)
apiNodeTip <- liftIO $ makeApiBlockReferenceFromHeader
getNetworkInformation st nl = liftIO $ do
now <- currentRelativeTime ti
nodeTip <- NW.currentNodeTip nl
apiNodeTip <- makeApiBlockReferenceFromHeader
(neverFails "node tip is within safe-zone" $ timeInterpreter nl)
nodeTip
nowInfo <- liftIO $ runMaybeT $ networkTipInfo now
progress <- liftIO $ syncProgress
nowInfo <- runMaybeT $ networkTipInfo now
progress <- syncProgress
st
(neverFails "syncProgress" $ timeInterpreter nl)
nodeTip
Expand Down Expand Up @@ -2662,15 +2662,6 @@ instance LiftHandler ErrNoSuchTransaction where
, toText tid
]

instance LiftHandler ErrCurrentNodeTip where
handler = \case
ErrCurrentNodeTipNetworkUnreachable e -> handler e
ErrCurrentNodeTipNotFound -> apiError err503 NetworkTipNotFound $ mconcat
[ "I couldn't get the current network tip at the moment. It's "
, "probably because the node is down or not started yet. Retrying "
, "in a bit might give better results!"
]

instance LiftHandler ErrSelectForDelegation where
handler = \case
ErrSelectForDelegationNoSuchWallet e -> handler e
Expand Down Expand Up @@ -2799,9 +2790,18 @@ instance LiftHandler ErrWithdrawalNotWorth where

instance LiftHandler ErrListPools where
handler = \case
ErrListPoolsNetworkError e -> handler e
ErrListPoolsQueryFailed e -> handler e
ErrListPoolsPastHorizonException e -> handler e

instance LiftHandler ErrStakeDistribution where
handler = \case
ErrStakeDistributionQuery _e ->
apiError err503 NetworkQueryFailed $ mconcat
[ "Unable to query the ledger at the moment. "
, "This error has been logged. "
, "Trying again in a bit might work."
]

instance LiftHandler ErrSignMetadataWith where
handler = \case
ErrSignMetadataWithRootKey e -> handler e
Expand Down
3 changes: 1 addition & 2 deletions lib/core/src/Cardano/Wallet/Api/Types.hs
Expand Up @@ -911,7 +911,7 @@ data ApiErrorCode
| InvalidCoinSelection
| NetworkUnreachable
| NetworkMisconfigured
| NetworkTipNotFound
| NetworkQueryFailed
| CreatedInvalidTransaction
| RejectedByCoreNode
| BadRequest
Expand Down Expand Up @@ -2221,4 +2221,3 @@ instance FromJSON (ApiT SmashServer) where

instance ToJSON (ApiT SmashServer) where
toJSON = toJSON . toText . getApiT

0 comments on commit 05a66cf

Please sign in to comment.