Retries `retry_block` called BEFORE the retry, contrary to documentation #1207

luizkowalski · 2020-11-23T11:14:53Z

Basic Info

Faraday Version: 1.1.0
Ruby Version: 2.7.1

Issue description

I'm dealing with an unstable service that fails randomly and the solution provided by them is "just retry" (I know...)
I've been playing around with the Retry middleware but I can't get my head around a detail: how to know that the retries exhausted

Steps to reproduce

This is my retry configuration for 3 attempts:

def retry_options
  {
    max: 3,
    interval: 0.5,
    interval_randomness: 0.5,
    backoff_factor: 3,
    retry_statuses: Array(500..599),
    retry_block: lambda do |env, _options, retries, _exc|
      log_retry(env, retries)
    end
  }
end

and this is how I log the retry attempt:

def log_retry(env, retries)
  request_data = env.to_h.slice(:url, :status, :response_headers, :request_body, :response_body)

  ::Chronicler.warn(
    'API call failed with 5xx',
    request_data.merge(retries: retries)
  )

  Snitch.error(message: 'Service is down', payload: { request: request_data }) if retries == 3
end

I expected to see 3 entries on our logging platform but instead, I always see 2:

Digging in Faraday's code looks like that my assumption (retries == 3) is wrong since what Faraday does is decrease the counting instead of increasing the retry counting so looks like I should do retries.zero?

But these API call logs I showed above actually failed for good and I didn't get the third retry.

So my question is: how can I check for exhausted retries?

I can gladly update the documentation later if someone helps me understand it and clarify the retry/exhaust part

The text was updated successfully, but these errors were encountered:

luizkowalski · 2020-11-24T09:47:45Z

I was studying the Retry middleware code and I found a couple more things:

retries -= 1 should probably be the last step before the retry itself. as I showed on the logs above, it logs 2 but the way I see it should show 3 because...
the docs say that retry_block is block that is executed after every retry. but reading the code, retry block is invoked before the retry itself. this is what happens: the API call fails, it is included in the HTTP codes and the exception is caught. right away, the retry count is decreased (shouldn't happen yet since the retry hasn't happened at this point), the block that is supposed to be called after the retry is invoked and THEN the actual retry happens.

I was excepting to log something like 3,2,1 but I only see 2,1. I mean, it would be fine if I would also see 0 in the logs but it is not the case

iMacTia · 2020-11-24T10:23:51Z

Hi @luizkowalski and thanks for raising this issue!
I believe the confusion is around the fact that when the retry middleware was written, the very first call attempt was also considered a "retry". I agree with you this doesn't really make sense semantically (the first call is not technically a RE-try 😄), but if you saw it that way would that help understanding what's going on?

Still, I'm a bit conflicted on the next steps: should we update the documentation to make this clearer or attempt a fix so that setting the max option effectively specifies the number of retries AFTER the original call?
I'd very much go for the latter, but that would definitely become a backwards-incompatible fix that might surprise a few people...

iMacTia · 2020-11-24T10:37:01Z

Trash my last comment, reviewing the code after a coffee I realised the max setting actually works as expected (and it's confirmed by tests as well).
So yeah your observations about the mismatch between documentation and code are spot on.

The retry_block is actually called BEFORE each retry, not after; and
The retries variable is confusing and should be updated after the retry.

I'm a bit conflicted on the next steps: should we update the documentation to make this clearer or attempt a fix so that setting the max option effectively specifies the number of retries AFTER the original call?
I'd very much go for the latter, but that would definitely become a backwards-incompatible fix that might surprise a few people...

Unfortunately the above still applies though, I'm a bit reluctant about introducing breaking changes now as this might affect production applications where developers noticed this issue and coded around it instead of raising it as you did.
I'd rather adjust the documentation for this time and postpone the changes to when the middleware will be isolated into a separate gem. We're in fact in the process of moving all adapters and middleware outside of the core Faraday gem so doing this kind of changes will be much easier and safer

luizkowalski · 2020-11-24T11:12:23Z

I realised the max setting actually works as expected (and it's confirmed by tests as well).

So then you would say that the retry mechanism is actually working on my application because I don't see logging with 0 as retry count meaning that the API call went through on the last try? Or should I check for exhausted retries as retry == 1 on my case?

I'm a bit reluctant about introducing breaking changes now as this might affect production applications where developers noticed this issue and coded around it

I totally agree with you though I think maybe a major upgrade should do fine and most developers should be aware of it when upgrading their gems, reading the Changelog, tests, etc.

middleware will be isolated into a separate gem

well, that changes everything I guess...this would be a breaking change and a nice time to introduce those changes

personally, I would be satisfied with documentation changes for now. I'm still a bit confused about the retry count I'm seeing on the logs, the whole 2 and 1 and no 0

iMacTia · 2020-11-24T15:42:51Z

I'm still a bit confused about the retry count I'm seeing on the logs, the whole 2 and 1 and no 0

@luizkowalski indeed, that's definitely something we should explore more 🤔!
I would definitely expect log_retry to be called a 3rd time with retries = 0 there.
The only plausible explanation I can find is, as you already pointed out, that the 3rd request was actually successful. But is there a way to confirm that somehow? Or alternatively, can you create the context where the request would fail all 3 times (e.g. making the service unreachable)?

luizkowalski · 2020-11-24T16:49:12Z

@iMacTia I did some investigation on our logging platform and actually found that it DID work on the third retry so we are safe here hahaha we didn't see a retry = 0 because it wasn't needed. In the beginning, I thought it was weird that all retries are like 2, 1 but apparently, the third time is the charm.

and now, looking back at it, I think it makes sense that we are seeing this 2, 1, 0 instead of 3, 2, 1 for retries because ruby is 0-index based and it is nicer to use retries.zero? to check if it exhausted other than retries == 1

maybe for now update the docs to reflect that the retry_block is called before the retry should be enough

luizkowalski · 2020-11-30T15:18:38Z

Closing this now, thank you @iMacTia for the clarifications

iMacTia · 2020-11-30T16:45:12Z

Thanks for the feedback @luizkowalski and apologies for not replying before.
Yes agree we should be updating the documentation, I'll keep your issue open to remind me that if you don't mind 😃!

luizkowalski · 2020-12-01T11:21:11Z

@iMacTia sure, no worries! thanks again!

Fixes #1207

luizkowalski closed this as completed Nov 30, 2020

iMacTia reopened this Nov 30, 2020

iMacTia added the documentation label Nov 30, 2020

iMacTia changed the title ~~Retries are not shown correctly when you log them~~ Retries retry_block called BEFORE the retry, contrary to documentation Nov 30, 2020

iMacTia added a commit that referenced this issue Dec 31, 2020

Improves retry middleware documentation.

dbb777c

Fixes #1207

iMacTia mentioned this issue Dec 31, 2020

Improve retry middleware documentation. #1228

Merged

iMacTia closed this as completed in #1228 Dec 31, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retries `retry_block` called BEFORE the retry, contrary to documentation #1207

Retries `retry_block` called BEFORE the retry, contrary to documentation #1207

luizkowalski commented Nov 23, 2020

luizkowalski commented Nov 24, 2020

iMacTia commented Nov 24, 2020

iMacTia commented Nov 24, 2020

luizkowalski commented Nov 24, 2020

iMacTia commented Nov 24, 2020

luizkowalski commented Nov 24, 2020

luizkowalski commented Nov 30, 2020 •

edited

iMacTia commented Nov 30, 2020

luizkowalski commented Dec 1, 2020

Retries retry_block called BEFORE the retry, contrary to documentation #1207

Retries retry_block called BEFORE the retry, contrary to documentation #1207

Comments

luizkowalski commented Nov 23, 2020

Basic Info

Issue description

Steps to reproduce

luizkowalski commented Nov 24, 2020

iMacTia commented Nov 24, 2020

iMacTia commented Nov 24, 2020

luizkowalski commented Nov 24, 2020

iMacTia commented Nov 24, 2020

luizkowalski commented Nov 24, 2020

luizkowalski commented Nov 30, 2020 • edited

iMacTia commented Nov 30, 2020

luizkowalski commented Dec 1, 2020

Retries `retry_block` called BEFORE the retry, contrary to documentation #1207

Retries `retry_block` called BEFORE the retry, contrary to documentation #1207

luizkowalski commented Nov 30, 2020 •

edited