Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retries retry_block called BEFORE the retry, contrary to documentation #1207

Closed
luizkowalski opened this issue Nov 23, 2020 · 9 comments · Fixed by #1228
Closed

Retries retry_block called BEFORE the retry, contrary to documentation #1207

luizkowalski opened this issue Nov 23, 2020 · 9 comments · Fixed by #1228

Comments

@luizkowalski
Copy link

Basic Info

  • Faraday Version: 1.1.0
  • Ruby Version: 2.7.1

Issue description

I'm dealing with an unstable service that fails randomly and the solution provided by them is "just retry" (I know...)
I've been playing around with the Retry middleware but I can't get my head around a detail: how to know that the retries exhausted

Steps to reproduce

This is my retry configuration for 3 attempts:

def retry_options
  {
    max: 3,
    interval: 0.5,
    interval_randomness: 0.5,
    backoff_factor: 3,
    retry_statuses: Array(500..599),
    retry_block: lambda do |env, _options, retries, _exc|
      log_retry(env, retries)
    end
  }
end

and this is how I log the retry attempt:

def log_retry(env, retries)
  request_data = env.to_h.slice(:url, :status, :response_headers, :request_body, :response_body)

  ::Chronicler.warn(
    'API call failed with 5xx',
    request_data.merge(retries: retries)
  )

  Snitch.error(message: 'Service is down', payload: { request: request_data }) if retries == 3
end

I expected to see 3 entries on our logging platform but instead, I always see 2:

image

Digging in Faraday's code looks like that my assumption (retries == 3) is wrong since what Faraday does is decrease the counting instead of increasing the retry counting so looks like I should do retries.zero?

But these API call logs I showed above actually failed for good and I didn't get the third retry.

So my question is: how can I check for exhausted retries?

I can gladly update the documentation later if someone helps me understand it and clarify the retry/exhaust part

@luizkowalski
Copy link
Author

I was studying the Retry middleware code and I found a couple more things:

  1. retries -= 1 should probably be the last step before the retry itself. as I showed on the logs above, it logs 2 but the way I see it should show 3 because...

  2. the docs say that retry_block is block that is executed after every retry. but reading the code, retry block is invoked before the retry itself. this is what happens: the API call fails, it is included in the HTTP codes and the exception is caught. right away, the retry count is decreased (shouldn't happen yet since the retry hasn't happened at this point), the block that is supposed to be called after the retry is invoked and THEN the actual retry happens.

I was excepting to log something like 3,2,1 but I only see 2,1. I mean, it would be fine if I would also see 0 in the logs but it is not the case

@iMacTia
Copy link
Member

iMacTia commented Nov 24, 2020

Hi @luizkowalski and thanks for raising this issue!
I believe the confusion is around the fact that when the retry middleware was written, the very first call attempt was also considered a "retry". I agree with you this doesn't really make sense semantically (the first call is not technically a RE-try 😄), but if you saw it that way would that help understanding what's going on?

Still, I'm a bit conflicted on the next steps: should we update the documentation to make this clearer or attempt a fix so that setting the max option effectively specifies the number of retries AFTER the original call?
I'd very much go for the latter, but that would definitely become a backwards-incompatible fix that might surprise a few people...

@iMacTia
Copy link
Member

iMacTia commented Nov 24, 2020

Trash my last comment, reviewing the code after a coffee I realised the max setting actually works as expected (and it's confirmed by tests as well).
So yeah your observations about the mismatch between documentation and code are spot on.

  • The retry_block is actually called BEFORE each retry, not after; and
  • The retries variable is confusing and should be updated after the retry.

I'm a bit conflicted on the next steps: should we update the documentation to make this clearer or attempt a fix so that setting the max option effectively specifies the number of retries AFTER the original call?
I'd very much go for the latter, but that would definitely become a backwards-incompatible fix that might surprise a few people...

Unfortunately the above still applies though, I'm a bit reluctant about introducing breaking changes now as this might affect production applications where developers noticed this issue and coded around it instead of raising it as you did.
I'd rather adjust the documentation for this time and postpone the changes to when the middleware will be isolated into a separate gem. We're in fact in the process of moving all adapters and middleware outside of the core Faraday gem so doing this kind of changes will be much easier and safer

@luizkowalski
Copy link
Author

I realised the max setting actually works as expected (and it's confirmed by tests as well).

So then you would say that the retry mechanism is actually working on my application because I don't see logging with 0 as retry count meaning that the API call went through on the last try? Or should I check for exhausted retries as retry == 1 on my case?

I'm a bit reluctant about introducing breaking changes now as this might affect production applications where developers noticed this issue and coded around it

I totally agree with you though I think maybe a major upgrade should do fine and most developers should be aware of it when upgrading their gems, reading the Changelog, tests, etc.

middleware will be isolated into a separate gem

well, that changes everything I guess...this would be a breaking change and a nice time to introduce those changes

personally, I would be satisfied with documentation changes for now. I'm still a bit confused about the retry count I'm seeing on the logs, the whole 2 and 1 and no 0

@iMacTia
Copy link
Member

iMacTia commented Nov 24, 2020

I'm still a bit confused about the retry count I'm seeing on the logs, the whole 2 and 1 and no 0

@luizkowalski indeed, that's definitely something we should explore more 🤔!
I would definitely expect log_retry to be called a 3rd time with retries = 0 there.
The only plausible explanation I can find is, as you already pointed out, that the 3rd request was actually successful. But is there a way to confirm that somehow? Or alternatively, can you create the context where the request would fail all 3 times (e.g. making the service unreachable)?

@luizkowalski
Copy link
Author

@iMacTia I did some investigation on our logging platform and actually found that it DID work on the third retry so we are safe here hahaha we didn't see a retry = 0 because it wasn't needed. In the beginning, I thought it was weird that all retries are like 2, 1 but apparently, the third time is the charm.

and now, looking back at it, I think it makes sense that we are seeing this 2, 1, 0 instead of 3, 2, 1 for retries because ruby is 0-index based and it is nicer to use retries.zero? to check if it exhausted other than retries == 1

maybe for now update the docs to reflect that the retry_block is called before the retry should be enough

@luizkowalski
Copy link
Author

luizkowalski commented Nov 30, 2020

Closing this now, thank you @iMacTia for the clarifications

@iMacTia
Copy link
Member

iMacTia commented Nov 30, 2020

Thanks for the feedback @luizkowalski and apologies for not replying before.
Yes agree we should be updating the documentation, I'll keep your issue open to remind me that if you don't mind 😃!

@iMacTia iMacTia reopened this Nov 30, 2020
@iMacTia iMacTia changed the title Retries are not shown correctly when you log them Retries retry_block called BEFORE the retry, contrary to documentation Nov 30, 2020
@luizkowalski
Copy link
Author

@iMacTia sure, no worries! thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants