Implement disable procedure for BigTent proxies #130

Closed
ozten opened this Issue Mar 6, 2013 · 19 comments

Member

ozten commented Mar 6, 2013

During the initial launch, or later during normal operations, it should be possible to disable a BigTent-proxied service.

We'll use this bug to document process and create any missing artifacts.

This should work for

  • Disabling yahoo.com after initial launch
  • Disabling later launches like gmail.com while leaving yahoo.com running
  • Disabling a specific service at a later point in time
  • Re-enabling a service easily once the regression is fixed

@ozten ozten was assigned Mar 6, 2013

Member

ozten commented Mar 6, 2013

An earlier suggestion was to remove Persona's proxy_idps config. This won't work, as we have a "recently seen primary" timeout.

We should ship an rpm of bigtent whose server/views/well_known_browserid.ejs contains

{
    "disabled": true
}

which will cause browserid to fall back to secondary authentication for users who had used the Yahoo proxy.

For this to work, the BigTent setting pub_key_ttl should be set to 0 at launch, so that browserid isn't working from a cached copy of the support document. Once we gain confidence, we can bump this TTL up.

Note: We don't touch Persona's config. It will still have proxy_idps with yahoo.com enabled.

Turning yahoo.com back on is easy! We just push an rpm which has a proper server/views/well_known_browserid.ejs. BrowserID will see support has returned and do the secondary -> primary transition.
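
For reference, a minimal sketch of the consuming side (illustrative only; not browserid's actual implementation): the verifier fetches the support document and honors the disabled flag, and with pub_key_ttl at 0 it does so fresh on every check.

const https = require('https');

// Fetch https://<domain>/.well-known/browserid and report whether the
// domain should still be treated as a primary. A body of
// { "disabled": true } means fall back to secondary authentication.
function checkPrimarySupport(domain, cb) {
  https.get({ host: domain, path: '/.well-known/browserid' }, (res) => {
    let body = '';
    res.on('data', (chunk) => { body += chunk; });
    res.on('end', () => {
      let doc;
      try { doc = JSON.parse(body); } catch (e) { return cb(e); }
      cb(null, doc.disabled !== true);
    });
  }).on('error', cb);
}

checkPrimarySupport('yahoo.com', (err, supported) => {
  if (err) return console.error(err);
  console.log(supported ? 'primary enabled' : 'primary disabled');
});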

Member

ozten commented Mar 6, 2013

We can do this a couple ways...

  1. Provide a script which is run on each webhead to re-write server/views/well_known_browserid.ejs.
  2. Provide a different branch, which is the same as the release train, but which has a disabled server/views/well_known_browserid.ejs.
  3. ???
Member

ozten commented Mar 6, 2013

  4. Add a new configuration element which disables the current bigtent service. This would cause the { "disabled": true } document to be returned (see the sketch below).
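
A minimal sketch of what Option 4 could look like, assuming an Express app and a hypothetical disabled flag fed from configuration (bigtent's real config system and routing will differ):

const express = require('express');
const path = require('path');

// Hypothetical: in practice this flag would come from bigtent's real
// configuration system, not an environment variable.
const config = { disabled: process.env.BIGTENT_DISABLED === 'true' };

const app = express();
app.set('views', path.join(__dirname, 'server/views'));
app.set('view engine', 'ejs');

app.get('/.well-known/browserid', (req, res) => {
  res.type('application/json');
  if (config.disabled) {
    // The new configuration element short-circuits the normal support
    // document and returns the "disabled" marker instead.
    return res.send(JSON.stringify({ disabled: true }));
  }
  res.render('well_known_browserid'); // the normal support document
});

app.listen(3000);
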
Member

ozten commented Mar 7, 2013

@lloyd What do you think of this proposal and specifically Option 4?

Owner

callahad commented Mar 7, 2013

I like option 4 -- push a config change, bounce the bigtent instances, done.

Member

ozten commented Mar 7, 2013

@gene1wood and @jrgm - How does this proposal sound?

That is: the process in the original description, plus the new configuration element mentioned in Option 4.

Owner

gene1wood commented Mar 7, 2013

I also prefer option 4.

This may be out of scope for this ticket, but instead of:

  • change the config file on disk
  • restart the node services, causing a potentially bad user experience during the restart; or do an orchestrated draining of traffic node by node, so services restart while no traffic is on the node

it would be much nicer to:

  • change the config file on disk
  • have the node app re-read the on-disk config every 30 seconds (or some interval) and bring any changes live. This could be done by checking a timer with every inbound request and, once the interval has elapsed, re-reading the config and resetting the timer (see the sketch below)

Or alternatively:

  • change the config file on disk
  • have the administrator send some signal to the running node app to reload its config without stopping the service
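
A sketch of the timer-per-request idea above (the config path and interval are assumptions; this is not existing bigtent code):

const fs = require('fs');

const CONFIG_PATH = '/etc/bigtent/config.json'; // assumed location
const RELOAD_INTERVAL_MS = 30 * 1000;           // "30 seconds (or some number)"

let config = JSON.parse(fs.readFileSync(CONFIG_PATH, 'utf8'));
let lastCheck = Date.now();

// Express/connect-style middleware: on each inbound request, once the
// interval has elapsed, re-read the on-disk config and bring any
// changes live; route handlers read `config` and see the new values.
function maybeReloadConfig(req, res, next) {
  if (Date.now() - lastCheck > RELOAD_INTERVAL_MS) {
    lastCheck = Date.now();
    fs.readFile(CONFIG_PATH, 'utf8', (err, raw) => {
      if (err) return; // keep the old config on read errors
      try { config = JSON.parse(raw); } catch (e) { /* ditto for bad JSON */ }
    });
  }
  next();
}
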
Member

ozten commented Mar 7, 2013

> restart the node services, causing a potentially bad user experience during the restart; or do an orchestrated draining of traffic node by node, so services restart while no traffic is on the node

Draining the traffic makes sense.

Depending on the severity of the reason we're disabling bigtent, it may not be a big deal to restart the services.

> have the node app re-read the on-disk config every 30 seconds (or some interval) and bring any changes live

This sounds complicated.

We already have something like this in the /.well-known/browserid TTL setting (mentioned in comment 1).

Ideally, you'd want this 30-second timer coordinated across the Yahoo bigtent instances, which is even more complicated / error prone.

> have the administrator send some signal to the running node app to reload its config without stopping the service

bigtent startup is very fast. Although existing connections would be broken, you effectively get this with daemontools and killing the process, no?

Owner

callahad commented Mar 7, 2013

It should be simple enough to catch SIGUSR1 or something like that to trigger a re-read of a single config value...
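
A sketch of the signal approach. One wrinkle: Node reserves SIGUSR1 for its debugger, so SIGHUP -- the conventional reload signal for daemons like named/httpd/squid -- is a safer pick (the config path is again an assumption):

const fs = require('fs');

const CONFIG_PATH = '/etc/bigtent/config.json'; // assumed location
let config = JSON.parse(fs.readFileSync(CONFIG_PATH, 'utf8'));

// An administrator runs `kill -HUP <pid>` to reload the config without
// stopping the service; on a bad file we keep the old config.
process.on('SIGHUP', () => {
  try {
    config = JSON.parse(fs.readFileSync(CONFIG_PATH, 'utf8'));
    console.log('config reloaded');
  } catch (e) {
    console.error('config reload failed; keeping old config:', e.message);
  }
});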

Owner

gene1wood commented Mar 7, 2013

> Although existing connections would be broken, you effectively get this with daemontools and killing the process, no?

I hadn't thought so, but I don't know. I'm unsure what happens to users in the middle of an exchange, and to users who connect in the window between when the service stops and when daemontools starts it again. I would imagine the user would get a 502 Bad Gateway from nginx during that period.

> It should be simple enough to catch SIGUSR1 or something like that to trigger a re-read of a single config value...

Ya, that's the kind of thing I was thinking. Essentially I'm thinking of the "reload" capability that exists in most daemons (named, httpd, squid, etc.).

Member

ozten commented Mar 7, 2013

My concerns with "dynamic config" are:

  1. It is easy to shoot yourself in the foot
  2. We're optimizing for a feature that will only be used once or twice

If this is really important, I'll add it, but I'd rather use our existing code deployment technique for changing config values, where we've already solved these problems (such as draining traffic, etc).

Owner

gene1wood commented Mar 7, 2013

Oh ya I meant this as something general to persona and bigtent, not specific to this scenario of rolling back a yahoo change.

For config changes it would be sweet if persona or bigtent had a config reload function like many apps have.

Member

jrgm commented Mar 8, 2013

So, yeah option 4 with a simple config change, drain, restart.

(Off topic, but a re-read-your-configs signal could someday-but-not-now be a useful thing, as it's much easier than the orchestrated drain. But yeah, dynamic config can be a foot gun, although there are use cases for it in some environments). /cc @jbonacci

Contributor

lloyd commented Mar 11, 2013

I like option 4.

Contributor

lloyd commented Mar 11, 2013

Off Topic

Configuration reload is interesting; we could build a .reload() function into convict. /cc @zaach
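
For illustration, a hypothetical sketch of what that could look like (convict's loadFile() and validate() are real; the reload() wrapper does not exist in convict today):

const convict = require('convict');

const conf = convict({
  disabled: {
    doc: 'Serve { "disabled": true } from /.well-known/browserid',
    format: Boolean,
    default: false
  }
});

const CONFIG_PATH = '/etc/bigtent/config.json'; // assumed location
conf.loadFile(CONFIG_PATH);
conf.validate();

// Hypothetical reload(): re-read and re-validate. This part is easy; the
// hard part (see below) is every caller that cached a value at startup
// and never sees the change.
function reload() {
  conf.loadFile(CONFIG_PATH);
  conf.validate();
}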

But like everyone else, I'm unsure we should actually have dynamic config. The argument for it is quicker configuration changes, so that we can, e.g., disable bigtent more quickly and reliably. The argument against is that writing code that expects config to change is hard, and testing it is harder. It would be a shame to put our software in a subtly broken state and affect users because a cached version of a config param doesn't match the dynamic version...

So how about the other way: what operational changes could we make so that a config change / restart works better?

@ozten ozten closed this Mar 13, 2013

Owner

gene1wood commented Mar 13, 2013

@lloyd: ya, I'm not for dynamic config, just a reload option like apache, squid, bind, etc. have -- one that allows me to force the app to reload its config without having to stop accepting connections or restart the app.

Member

jbonacci commented Mar 14, 2013

Late to the party - I also prefer Option 4.
(And I am wondering why I did not get a "signal" from GitHub when @jrgm referenced my name six comments back...)
