Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(provider): choose between constant & exponential backoff for actions #798

Merged
merged 3 commits into from
Dec 1, 2023

Conversation

beagleknight
Copy link
Contributor

@beagleknight beagleknight commented Nov 22, 2023

This changes the poll backoff function to use ConstantBackoff instead of ExponentialBackoff. The current poll_interval option is a bit misleading because assumes a constant polling interval instead of an exponential one. This causes to wait much more time in some scenarios where you requested the resource status to the API a few times.

In scenarios where you expect a specific amount of time, such as creating a cloud instance with a custom snapshot, this change makes the behavior more predictable.

@jooola jooola changed the title Use ConstantBackoff in poll interval feat: use constant backoff in poll interval Nov 22, 2023
@jooola
Copy link
Member

jooola commented Nov 22, 2023

I agree that poll_interval suggests a constant backoff. But we need to be cautious with such change as it might introduce rate limits errors for some users.

The origin of this change is c8d8905

Maybe we want to provide an extra setting such as poll_function = oneOf('constant', 'exponential').

The same change must be made to the plugin provider:

opts = append(opts, hcloud.WithPollBackoffFunc(hcloud.ExponentialBackoff(2, pollInterval)))

@jooola jooola requested a review from apricote November 22, 2023 16:24
@apricote
Copy link
Member

I agree with what @jooola said.

To explain the reasoning for switching it to exponential: The default rate limit for actions is 2 requests per second. If Terraform creates and waits for 5 resources (with refreshes every 500ms), you can get into a situation where the rate limit is exhausted and you can not really make progress, because every request you make is just using up the limit once again.

By using an exponential backoff you avoid this issue, because you just wait for longer between the API calls.

In scenarios where you expect a specific amount of time, such as creating a cloud instance with a custom snapshot, this change makes the behavior more predictable.

In my experience this time is highly variable, depending on the load on the snapshot storage. How long did you wait and how long did you expect it to take? Perhaps we need some kind of upper limit in the exponential backoff (maybe 30-60 seconds), to avoid situations like this.

Making constant vs. exponential configurable sounds good to me, I would prefer the exponential default though.

@fcsonline
Copy link

I have to say that I share the same thoughts as @beagleknight about the expected constant behaviour, but I also understand the risk of introducing a new behaviour combined with the 500ms that will cause much pain. The best solution, as you said is a new configuration option with the exponential default for backward compatibility.

@beagleknight
Copy link
Contributor Author

Thanks for your replies @jooola and @apricote .
I just updated the code to allow users to configure the polling function. I'm not sure if the code is ok, since I don't have much knowledge about go. Can you please review it?

@apricote
Copy link
Member

Thanks! I will take another look tomorrow :)

@apricote apricote changed the title feat: use constant backoff in poll interval feat(provider): choose between constant & exponential backoff for actions Nov 24, 2023
Copy link
Member

@apricote apricote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already looking pretty good :) I have updated the PR title to match the new content (as we use it for generating the Changelog).

There is one remaining inconsistency between the two providers. Do not worry about the failed integration tests, they do not work for external contributors. If you want, you can fix the formatting (see unit tests & lint job), otherwise, I can also take care of it before merging :)

EDIT: Docs are also missing, but I can also do them before merging if you dont want to.

hcloud/plugin_provider.go Outdated Show resolved Hide resolved
…ions

This adds a new configuration parameter called `poll_function` to decide which
type of function is used during the polling.

Allowed values are `constant` and `exponential`. The default value is `exponential`
to avoid changing the current behavior.
@beagleknight
Copy link
Contributor Author

Hey @apricote ! Thanks for your comments. I already addressed them and squashed all commits. Can you review it when you have some time? Thank you!

Copy link
Member

@apricote apricote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more issue i found through the unit tests.

I tried to fix and push to the branch myself, but looks like this is disabled.

hcloud/provider.go Outdated Show resolved Hide resolved
Co-authored-by: Julian Tölle <julian.toelle97@gmail.com>
Copy link
Member

@apricote apricote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are still failing because an import for "github.com/hashicorp/terraform-plugin-sdk/v2/helper/validation" is missing in hcloud/provider.go, could you add this and run make test locally to verify that it works?

@beagleknight
Copy link
Contributor Author

@apricote Sorry for the late response! I just added the missing import to the code (and finally installed the devtools in vscode 😄 )

Copy link
Member

@apricote apricote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much @beagleknight! Will be included in the next release :)

@apricote apricote merged commit fa7ea1f into hetznercloud:main Dec 1, 2023
2 of 4 checks passed
@fcsonline fcsonline deleted the chore/use-constant-backoff branch December 10, 2023 19:05
@beagleknight
Copy link
Contributor Author

Hey @apricote ! Do you know when the next version will be released? We would like to use this change in our template :). Thanks!

jooola pushed a commit that referenced this pull request Jan 11, 2024
🤖 I have created a release *beep* *boop*
---


##
[1.45.0](v1.44.1...v1.45.0)
(2024-01-11)


### Features

* **provider:** choose between constant & exponential backoff for
actions
([#798](#798))
([fa7ea1f](fa7ea1f))
* **server:** add `primary_disk_size` attribute in resource and
datasource
([#801](#801))
([98c2f2d](98c2f2d))


### Bug Fixes

* **network:** mark data source fields as computed
([#805](#805))
([63e157c](63e157c))
* **placement-group:** mark data source fields as computed
([#806](#806))
([53069ac](53069ac))
* **server:** missing field primary_disk_size
([#811](#811))
([9bd0ef0](9bd0ef0))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
@apricote
Copy link
Member

Hey @apricote ! Do you know when the next version will be released? We would like to use this change in our template :). Thanks!

Released in v1.45.0

@fcsonline
Copy link

Thank you so much for the release, @apricote !! Testing it right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants