Releases: health-checks, auto-rollback, gradual rollout and a/b releases #139

Open
2 tasks done
friism opened this issue Feb 7, 2023 · 4 comments

@friism
Contributor

friism commented Feb 7, 2023


What service(s) is this request for?

runtime

Tell us about what you're trying to solve. What challenges are you facing?

We should improve how code changes (releases) are rolled out with Heroku. We should consider adding:

  • Healthchecks: Currently Heroku deems a dyno healthy once it has bound to $PORT. This is too simplistic, since the app may not actually be ready to serve traffic by then.
  • Auto-rollback: We don't automatically fail a release and roll back, even when the platform can determine that dynos running the new release are crashing.
  • Gradual rollout: We have three different rollout mechanisms: the standard Common Runtime behavior, Preboot in the Common Runtime, and gradual rollout in Private Spaces (PS). The PS behavior is often not gradual enough and causes latency spikes for apps under load.
  • A/B releases and canary releases: For even more assurance and flexibility.
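To make the gap between "bound to $PORT" and "ready to serve traffic" concrete, here is a minimal Python sketch of a web process that binds its port immediately but only answers `200` on a health endpoint once booting has finished. The `/up` path, the `503` response while booting, and the `app_ready` flag are illustrative assumptions, not an existing Heroku contract.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set by the app once booting is done (caches warmed, DB connected, etc.).
# The port is bound before this happens, so binding alone proves nothing.
app_ready = threading.Event()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/up":  # hypothetical health-check path
            status = 200 if app_ready.is_set() else 503
            body = b"ok" if status == 200 else b"booting"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_server(port=0):
    """Bind the port right away and serve requests on a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Under this sketch, a platform-side check would treat the dyno as healthy only once `/up` returns `200`, rather than as soon as the port accepts connections.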

For non-web dynos, we should also establish a healthcheck convention and support rolling deploys (currently, non-web dynos don't support any form of gradual rollout).
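Non-web dynos have no HTTP port to probe, so a convention has to look different. One possible option, sketched below under loud assumptions, is a heartbeat file: the worker rewrites a timestamp after each unit of work, and a checker calls the worker unhealthy if the timestamp is missing or stale. The file-based approach, the `beat`/`is_healthy` helpers, and the 30-second threshold are all hypothetical, not something Heroku provides.

```python
import time
from pathlib import Path
from typing import Optional

# Hypothetical threshold: a worker silent for longer than this is unhealthy.
HEARTBEAT_MAX_AGE_SECONDS = 30.0


def beat(path: Path, now: Optional[float] = None) -> None:
    """Record that the worker is alive by writing the current timestamp."""
    path.write_text(repr(time.time() if now is None else now))


def is_healthy(path: Path, now: Optional[float] = None,
               max_age: float = HEARTBEAT_MAX_AGE_SECONDS) -> bool:
    """Healthy only if a heartbeat exists and is at most max_age seconds old."""
    try:
        last = float(path.read_text())
    except (FileNotFoundError, ValueError):
        return False  # no heartbeat yet, or an unreadable one
    current = time.time() if now is None else now
    return current - last <= max_age
```

The same check could gate a rolling deploy of workers: start a new worker, wait for its first fresh heartbeat, then stop an old one.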

@trevorturk

I'm excited to see this on the roadmap!

I'm especially interested in gradual rollout and canary deploys. This sort of thing has been on my Heroku wishlist for a long time.

I'd also like to suggest considering an "adaptive preboot", for example using Rails' recent addition, rails/rails#46936.

If an app had a standard endpoint that could return 200 OK when everything is booted, we could make the zero downtime deploy via preboot much quicker, instead of waiting for a static 3 minutes. This could also be leveraged for auto-rollback, as in, don't switch over to the new code unless the health/heartbeat/up endpoint responds 200 OK.
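The adaptive preboot idea can be sketched as a polling loop: keep probing the new release's health endpoint and switch over as soon as it answers `200`, treating the current 3-minute wait only as an upper bound. In this hypothetical sketch, `probe` stands in for an HTTP GET to the new dynos' endpoint and returns a status code; the 5-second interval is an assumption. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], int],
    timeout: float = 180.0,   # the static 3-minute preboot wait becomes a ceiling
    interval: float = 5.0,    # hypothetical polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it returns 200 or the deadline would be exceeded.

    True means the new release is healthy and traffic can be switched early.
    False means it never became healthy, so the old release should keep
    serving (the auto-rollback case).
    """
    deadline = clock() + timeout
    while True:
        if probe() == 200:
            return True
        if clock() + interval > deadline:
            return False
        sleep(interval)
```

With a fake clock, a release that answers `503, 503, 200` is accepted after 10 simulated seconds instead of 180.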

Also worth mentioning: I'd like to see an option added to rollback that bypasses the preboot delay, for emergency use.

Thanks!

@stevenharman

re: Gradual Rollout.

I'd be happy just to see preboot get the boot, and instead see the Common Runtime get a rolling restart like Private Spaces has. A cherry on top would be the ability to configure the percentage of each roll; it's hard-coded to 25% on Dogwood, IIRC. In a large enough formation, it'd be nice to tune that down even further.
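A configurable roll percentage boils down to splitting the formation into batches. This is a hypothetical helper, not a Heroku setting; the 25% figure above is the commenter's recollection, so the sketch takes the percentage as a parameter. Each batch is rounded up so every step restarts at least one dyno, and the last batch takes whatever remains.

```python
import math


def rollout_batches(dyno_count: int, percent: float) -> list[int]:
    """Split a formation into restart batches of roughly `percent`% each."""
    if dyno_count < 0:
        raise ValueError("dyno_count must be non-negative")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Round up so progress is always made, even for tiny percentages.
    batch = max(1, math.ceil(dyno_count * percent / 100))
    full, rest = divmod(dyno_count, batch)
    return [batch] * full + ([rest] if rest else [])
```

For example, 10 dynos at 25% roll as `[3, 3, 3, 1]`, while the same formation at 10% rolls one dyno at a time, which is the kind of finer tuning described above.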

@locofocos

I'd be excited to start with a limited version of this: a healthcheck endpoint plus auto-rollback. There have been cases where we pushed code changes that caused our Rails application to fail to boot. A simple GET to a healthcheck endpoint would have returned a 500. I would love it if Heroku made such a request to our new dynos during preboot, then halted the rest of the deploy if it couldn't get a 200 response.
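The requested behavior (probe each new dyno, halt on the first non-200, revert what was already moved) can be sketched as follows. `gated_deploy`, `probe`, and `rollback` are hypothetical names: `probe(dyno)` is assumed to return the status of a GET to that dyno's health endpoint, and `rollback` is whatever puts those dynos back on the previous release.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class DeployResult:
    succeeded: bool
    healthy: List[str] = field(default_factory=list)  # new dynos that answered 200
    failed_dyno: Optional[str] = None                  # first dyno that did not


def gated_deploy(
    dynos: List[str],
    probe: Callable[[str], int],
    rollback: Callable[[List[str]], None],
) -> DeployResult:
    """Move dynos to the new release one by one, health-checking each.

    The first non-200 response halts the deploy; every dyno already moved,
    including the failing one, is handed to rollback().
    """
    healthy: List[str] = []
    for dyno in dynos:
        if probe(dyno) != 200:
            rollback(healthy + [dyno])
            return DeployResult(False, healthy, dyno)
        healthy.append(dyno)
    return DeployResult(True, healthy)
```

In the failed-boot case described above, the first new dyno would return 500, so no traffic would ever reach the new release.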

@nightpool

For canary deployments, having something gradual that would be controlled by the error rate of each release would be great.
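One way to make a canary "controlled by the error rate" is a step function evaluated each interval: widen the canary's traffic share while its error rate stays within a tolerance of the current release's, and drop it to zero the moment it doesn't. This is a hypothetical sketch; the absolute tolerance, the fixed step size, and the function name are assumptions rather than an existing Heroku mechanism.

```python
def next_canary_weight(
    weight: float,
    canary_error_rate: float,
    baseline_error_rate: float,
    tolerance: float = 0.01,  # allowed excess error rate, as an absolute fraction
    step: float = 0.1,        # how much traffic share to add per healthy interval
) -> float:
    """Return the canary release's traffic share for the next interval.

    If the canary's error rate is within `tolerance` of the current release's,
    widen its share by `step`, capped at 1.0 (full rollout). Otherwise return
    0.0, meaning all traffic goes back to the current release.
    """
    if canary_error_rate > baseline_error_rate + tolerance:
        return 0.0
    return min(1.0, weight + step)
```

Calling this once per interval with fresh error-rate measurements gives a rollout that speeds up or aborts based on observed errors rather than on a fixed schedule.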

Status: 📋 Researching
Development

No branches or pull requests

6 participants