provider/gce: Don't base the name of firewall rules on hash of CIDRs #7246

Merged
merged 11 commits into from Apr 24, 2017

Conversation

Projects
None yet
4 participants
Member

babbageclunk commented Apr 18, 2017

Description of change

Found when testing #7234 - the updated addresses would be logged for updating the firewall, but the update wouldn't happen. The error on updating was being swallowed somewhere so it wasn't clear what was happening from the logs. (I'll fix that in a separate PR when I track it down.)

We were generating the name of the firewall using a hash of the new rules, and trying to update the firewall to have the new name. GCE will prevent a mismatch between the URL and the name with this
error:

opening port(s) [3306/tcp from 35.187.146.143/32]: googleapi: Error 400: Invalid value for field 'resource.name': 'juju-893de4-0-d4bc50'. The name of the resource in request body must be the same as the resource name in the URL, invalid

Since we can't update the name of the firewall rules it seems like the options are:

  1. Create a new one (with the name still based on the hashed CIDRs) and then remove the old one.
  2. Change the naming to not be based on hashing the CIDRs - use a random unique string instead.

Option 1 can't be done atomically, and cleaning up the mess if the create happened but the delete didn't seems like it would be nasty. So this change implements option 2.

Since the names will be unpredictable we need to hang on to them when we get the existing rules from GCE so we can send them back with updates/deletes. The old version of the code wasn't doing that - names were thrown away when the compute.Firewall is converted into []network.IngressRule because it could regenerate the name from the CIDRs.

QA steps

  • Bootstrap on GCE, create two models with a cross-model relation between mysql and wordpress.
  • Expose wordpress
  • Check that you can successfully see the wordpress setup page at the wordpress public address.
  • In the GCE console, stop the wordpress machine and then start it again - note that its public address has changed.
  • Wait for the Juju instance poller to notice the updated public address.
  • Check that the wordpress setup page is available at the new address.
Member

babbageclunk commented Apr 18, 2017

!!chittychitty!!

Owner

jameinel commented Apr 20, 2017

Member

babbageclunk commented Apr 20, 2017

@jameinel: yeah, it does cause that problem, @wallyworld pointed it out to me in IRC. Since we can't update the name of the firewall rules it seems like the options are:

  1. Create a new one (with the name still based on the hashed CIDRs) and then remove the old one.
  2. Change the naming to not be based on hashing the CIDRs - use a random unique string instead.

Option 1 can't be done atomically, and cleaning up the mess if the create happened but the delete didn't seems like it would be nasty.

So I'm in the middle of doing 2 at the moment.

Since the names will be unpredictable we need to hang on to them when we get the existing rules from GCE so we can send them back with updates/deletes. The old version of the code wasn't doing that - names were thrown away when the compute.Firewall is converted into []network.IngressRule because it could regenerate the name from the CIDRs.

(Sorry, I should have marked this as a WIP - I'm still fixing up tests.)

@babbageclunk babbageclunk changed the title from provider/gce: Don't change the name of firewall when updating to WIP provider/gce: Don't change the name of firewall when updating Apr 20, 2017

Member

babbageclunk commented Apr 21, 2017

!!chittychitty!!

@babbageclunk babbageclunk changed the title from WIP provider/gce: Don't change the name of firewall when updating to provider/gce: Don't base the name of firewall rules on hash of CIDRs Apr 21, 2017

babbageclunk added some commits Apr 18, 2017

Don't rename the firewall when updating the rules
GCE will prevent a mismatch between the URL and the name with this
error:

opening port(s) [3306/tcp from 35.187.146.143/32]: googleapi: Error 400: Invalid value for field 'resource.name': 'juju-893de4-0-d4bc50'. The name of the resource in request body must be the same as the resource name in the URL, invalid
Decouple GCE firewall name from source CIDRs
We can't update the firewall name when we change the source CIDRs in it,
because GCE rejects the update (since the name doesn't match the URL).
That means we can't use hashes of the CIDRs as the name - when we change the
CIDRs then we'll no longer be able to find the matching rule since it
won't have the name we expect based on the CIDRs.

Instead, assign random names to the firewall rules as they're created,
and ensure we preserve them in the RuleSet so that we can send them back
to GCE when updating an existing rule.

We still use hashing of the CIDRs to match input rules against existing
rules to determine whether to update or create a rule.

UNTESTED
Update OpenPorts to use RandomSuffixNamer
This will check that the suffix generated is unique (except in the
special case of the rule for 0.0.0.0/0 - but there should only be one
like that).
Don't require a FirewallNamer for OpenPorts
Add OpenPortsWithNamer that tests call, everything else can just use the
default namer and doesn't need to worry about passing one in to OpenPorts.
Fix bug in RandomSuffixNamer
It meant that everything was being named "target" instead of the right
thing.
Added a test for RandomSuffixNamer
Since I found a bug in it in manual testing.
SourceCIDRs should be considered unordered
We need to sort them before hashing to ensure that ingress rules with
the same source CIDRs in different orders still get combined.
Increase length of hash used for CIDR key
Chasing down a bug caused by unrelated CIDRs colliding would be really
annoying, this will reduce the likelihood.

It seems to be we're missing a test - the existing connSuite tests have been updated to reflect the new naming algorithm, but a test to check the update bug which necessitated this PR I think is needed

provider/gce/google/conn_network.go
+
+// FirewallNamer generates a unique name for a firewall given the firewall, a
+// prefix and a set of current firewall rule names.
+type FirewallNamer func(*firewall, string, set.Strings) (string, error)
@wallyworld

wallyworld Apr 24, 2017

Owner

Best to name the function params to make parsing the description easier.

provider/gce/google/conn_network.go
- }
+// RandomSuffixNamer tries to find a unique name for the firewall by
+// appending a random suffix.
+func RandomSuffixNamer(fw *firewall, prefix string, names set.Strings) (string, error) {
@wallyworld

wallyworld Apr 24, 2017

Owner

could we call "names" "existingNames" for clarity

Member

babbageclunk commented Apr 24, 2017

$$merge$$

Contributor

jujubot commented Apr 24, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Apr 24, 2017

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10728

Member

babbageclunk commented Apr 24, 2017

These are spurious failures - one's in a new VSphere test that Andrew's looking at, the other is one I've seen before.

$$merge$$

Contributor

jujubot commented Apr 24, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

Contributor

jujubot commented Apr 24, 2017

Build failed: Tests failed
build url: http://juju-ci.vapour.ws:8080/job/github-merge-juju/10731

Member

babbageclunk commented Apr 24, 2017

grr
$$merge$$

Contributor

jujubot commented Apr 24, 2017

Status: merge request accepted. Url: http://juju-ci.vapour.ws:8080/job/github-merge-juju

@jujubot jujubot merged commit 97f544c into juju:develop Apr 24, 2017

@babbageclunk babbageclunk deleted the babbageclunk:gce-firewaller-fix branch Apr 24, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment