New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed 0.70.2 hotfix #2231

Merged
merged 2 commits into from Sep 21, 2017

Conversation

Projects
None yet
3 participants
@ximinez
Contributor

ximinez commented Sep 19, 2017

Recover old open ledger transactions to the queue:

  • Recover to the open ledger once, then to the queue.
  • If transaction fails to queue for any reason, drop it.
  • New result codes for transactions that can not queue.
  • Add minimum queue size
  • RIPD-1530
    fix #2215
@ximinez

This comment has been minimized.

Contributor

ximinez commented Sep 19, 2017

Build and test problems. Closing to fix.

@ximinez ximinez closed this Sep 19, 2017

if (app.getHashRouter().shouldRecover(tx->getTransactionID()))
return ripple::apply(
app, view, *tx, flags, j);
else

This comment has been minimized.

@nbougalis

nbougalis Sep 19, 2017

Member

Sorry to nitpick, but please use {} here. Yes, it's not strictly necessary, but the comment block following the else is large and the curlies will help make the code easier to read.

This comment has been minimized.

@ximinez

ximinez Sep 19, 2017

Contributor

Fixed, and entirely sensible.

auto const result = [&]
{
if (app.getHashRouter().shouldRecover(tx->getTransactionID()))
return ripple::apply(

This comment has been minimized.

@nbougalis

nbougalis Sep 19, 2017

Member

micro-nit: single line works fine here.

This comment has been minimized.

@ximinez

ximinez Sep 19, 2017

Contributor

Fixed.

@note The limit is signed while the counter is unsigned.
A negative limit will retry forever.
*/
bool shouldRecover(std::int32_t limit)

This comment has been minimized.

@nbougalis

nbougalis Sep 19, 2017

Member

If we don't care about negative limits (why should we?) then this can be neatly simplified as the following, if we increase limit by 1:

bool shouldRecover(std::int32_t limit)
{
    return ++recoveries_ % limit != 0;
}

This reads better, at least to me. And, as an extra benefit, it tracks exactly how many times we've tried to recover the item.

Also, I'd also make both limit and recoveries_ be unsigned int.

This comment has been minimized.

@ximinez

ximinez Sep 19, 2017

Contributor

I wanted to leave the option available to basically remove the limitations by setting a negative limit.

I also wanted an option to have a limit of 0, which would basically prevent retries entirely.

If we don't want either of those, your optimization makes sense (and was my initial approach, too). (I did have another concern about integer overflow, but if a tx retries 2^32 times, that's a whole other issue.)

This comment has been minimized.

@ximinez

ximinez Sep 19, 2017

Contributor

Per discussion, changed for the simplification

@@ -303,6 +308,7 @@ RCLConsensus::onClose(
// Build SHAMap containing all transactions in our open ledger
for (auto const& tx : initialLedger->txs)
{
JLOG(j_.debug()) << "Adding open ledger TX " << tx.first->getTransactionID();

This comment has been minimized.

@nbougalis

nbougalis Sep 19, 2017

Member

Micro-nit: indentation

This comment has been minimized.

@ximinez

ximinez Sep 19, 2017

Contributor

Fixed

canBeHeld = accountIter == byAccount_.end() ||
replacementIter ||
accountIter->second.getTxnCount() <
setup_.maximumTxnPerAccount;
setup_.maximumTxnPerAccount ||

This comment has been minimized.

@nbougalis

nbougalis Sep 19, 2017

Member

This construct has now, officially, become impossible to read.

Is this better maybe?

/* Limit the number of transactions an individual account
   can queue. Mitigates the lost cost of relaying should
   an early one fail or get dropped.
 */
if (canBeHeld)
{
    if (accountIter != byAccount_.end())
        canBeHeld = false;

    if (!canBeHeld && replacementIter)
        canBeHeld = true;
    
    if (!canBeHeld && 
            accountIter->second.getTxnCount() < setup_.maximumTxnPerAccount)
        canBeHeld =  true;

    // Allow the transaction to get in front of the first queued
    // transaction. This allows recovery of open ledger transactions
    // and stuck transactions.
    if (!canBeHeld &&
            tx.getSequence() < accountIter->second.transactions.begin()->first)
        canBeHeld = true;
}

P.S.: I am convinced that the above logic is correct and equivalent to what's already there, but please double-check before using this.

This comment has been minimized.

@ximinez

ximinez Sep 19, 2017

Contributor

I updated it differently, but I think you'll like it.

@ximinez ximinez reopened this Sep 19, 2017

@nbougalis

Please address the issue with bool shouldRecover(std::uint32_t limit)

@@ -163,6 +164,10 @@ RCLConsensus::relay(RCLCxTx const& tx)
app_.overlay().foreach (send_always(
std::make_shared<Message>(msg, protocol::mtTRANSACTION)));
}
else
{
JLOG(j_.debug()) << "Not relaying disputedtx " << tx.id();

This comment has been minimized.

@nbougalis

nbougalis Sep 20, 2017

Member

space between disputed and tx

This comment has been minimized.

@ximinez

ximinez Sep 20, 2017

Contributor

Fixed

Serializer s;
tx->add(s);
msg.set_rawtransaction(&s.getData().front(), s.getLength());

This comment has been minimized.

@nbougalis

nbougalis Sep 20, 2017

Member

You can do:
msg.set_rawtransaction(s.data(), s.size());

*/
bool shouldRecover(std::uint32_t limit)
{
return ++recoveries_ % limit != 0;

This comment has been minimized.

@nbougalis

nbougalis Sep 20, 2017

Member

So getDefaultRecoverLimit returns 1 and since x mod 1 is always 0 this will always return false. If we make getDefaultRecoverLimit return 2, then this function will oscillate between true and false. Not sure if that's what we want.

I know that I was the one that suggested this change, but I'm now thinking that this should be reduced to:

bool shouldRecover(std::uint32_t limit)
{
    return recoveries_++ < limit;
}

With limit == 1 this function will return to the sequence true, false, false, false, ...

This comment has been minimized.

@ximinez

ximinez Sep 20, 2017

Contributor
    HashRouter (Stopwatch& clock, std::chrono::seconds entryHoldTimeInSeconds,
        std::uint32_t recoverLimit)
        : suppressionMap_(clock)
        , holdTime_ (entryHoldTimeInSeconds)
        , recoverLimit_ (recoverLimit + 1u)
    {
    }

That does the right thing, works after rediscovery, and allows the API to specify "recover this many times before giving up."

setup_.maximumTxnPerAccount;
// Allow if the account is not in the queue at all
canBeHeld = accountIter == byAccount_.end();

This comment has been minimized.

@nbougalis

nbougalis Sep 20, 2017

Member

Thanks! This is much more readable!

/** Determines if this item should be recovered from the open ledger.
Counts the number of times the item has been recovered.
If it hits the limit, reset the counter and return false.

This comment has been minimized.

@wilsonianb

wilsonianb Sep 20, 2017

reset the counter

This comment has been minimized.

@ximinez

ximinez Sep 20, 2017

Contributor

Rephrased.

// already had a chance via disputes
hint->second = false;
else
shouldRecover.emplace_hint(hint, txID,

This comment has been minimized.

@wilsonianb

wilsonianb Sep 20, 2017

why emplace_hint if hint is end()? Is it because we know this is the highest tx added from current_ (even though there may be larger disputed tx ids in the map)?

This comment has been minimized.

@ximinez

ximinez Sep 20, 2017

Contributor

Good catch. I mixed my metaphors and used the wrong function. Should be right now.

@@ -1454,6 +1482,7 @@ setup_TxQ(Config const& config)
TxQ::Setup setup;
auto const& section = config.section("transaction_queue");
set(setup.ledgersInQueue, "ledgers_in_queue", section);
set(setup.queueSizeMin, "minimum_queue_size", section);

This comment has been minimized.

@wilsonianb

wilsonianb Sep 21, 2017

Should this be documented in rippled-example.cfg

This comment has been minimized.

@ximinez

ximinez Sep 21, 2017

Contributor

Probably, yeah.

{ telCAN_NOT_QUEUE_BLOCKS, { "telCAN_NOT_QUEUE_BLOCKS", "Can not queue at this time: would block later queued transaction(s)." } },
{ telCAN_NOT_QUEUE_BLOCKED, { "telCAN_NOT_QUEUE_BLOCKED", "Can not queue at this time: blocking transaction in queue." } },
{ telCAN_NOT_QUEUE_FEE, { "telCAN_NOT_QUEUE_FEE", "Can not queue at this time: fee insufficient to replace queued transaction." } },
{ telCAN_NOT_QUEUE_FULL, { "telCAN_NOT_QUEUE_FULL", "Can not queue at this time: queue is full." } },

This comment has been minimized.

@wilsonianb

wilsonianb Sep 21, 2017

closing curly braces aren't aligned with the rest of the file

This comment has been minimized.

@ximinez

ximinez Sep 21, 2017

Contributor

Doesn't really help readability for me, since the lines are already so insane long, but fixed.

ximinez added some commits Sep 12, 2017

Recover open ledger transactions to the queue (RIPD-1530):
* If the transaction can't be queued, recover to the open ledger once,
  and drop it on the next attempt.
* New result codes for transactions that can not queue.
* Add minimum queue size.
* Remove the obsolete and incorrect SF_RETRY flag.
* fix #2215

@nbougalis nbougalis merged commit cd2d52a into ripple:master Sep 21, 2017

2 of 3 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
default Build finished.
Details

@ximinez ximinez deleted the ximinez:relayopenledger.70 branch Sep 21, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment