Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New config parser, HCL support, multiple bind addrs #3480

Merged
merged 70 commits into from Sep 25, 2017

Conversation

Projects
None yet
4 participants
@magiconair
Copy link
Contributor

commented Sep 18, 2017

This patch implements a new config parser for the consul agent which
makes the following changes to the previous implementation:

  • add HCL support
  • all configuration fragments in tests and for default config are
    expressed as HCL fragments
  • HCL fragments can be provided on the command line so that they
    can eventually replace the command line flags.
  • HCL/JSON fragments are parsed into a temporary Config structure
    which can be merged using reflection (all values are pointers).
    The existing merge logic of overwrite for values and append
    for slices has been preserved.
  • A single builder process generates a typed runtime configuration
    for the agent.

The new implementation is more strict and fails in the builder process if no
valid runtime configuration can be generated. Therefore, additional
validations in other parts of the code should be removed.

The builder also pre-computes all required network addresses so that no
address/port magic should be required where the configuration is used and
should therefore be removed.

As a side effect the code also supports multiple bind addresses for HTTP,
HTTPS and DNS via go-sockaddr and static ip addresses.

Fixes #895
Fixes #2217
Fixes #2825

@magiconair

This comment has been minimized.

Copy link
Contributor Author

commented Sep 18, 2017

There are still a handful of failing tests which I'll address tomorrow.

@magiconair magiconair requested a review from slackpad Sep 18, 2017

@magiconair magiconair self-assigned this Sep 20, 2017

@magiconair

This comment has been minimized.

Copy link
Contributor Author

commented Sep 20, 2017

Still need to update all the docs ...

@magiconair magiconair force-pushed the refactor-config branch from d8b8905 to a923011 Sep 20, 2017

@magiconair magiconair changed the title New config parser New config parser, HCL support, multiple bind addrs Sep 20, 2017

@shilov

This comment has been minimized.

Copy link
Contributor

commented Sep 20, 2017

👍 big thanks for adding HCL support + enabling multiple bind addresses

cfg.ACLEnforceVersion8 = Bool(false)
a := NewTestAgent(t.Name(), cfg)
a := NewTestAgent(t.Name(), `
acl_enforce_version_8 = true

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

This actually set it to true when it was supposed to be false. I'll fix this and make sure this test positively ensures that things are working.

This comment has been minimized.

Copy link
@slackpad

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 24, 2017

Author Contributor

The default is set to a wrong key acl_enforce_version8 (not the lacking final underscore) which may contribute to the test failing.

I am working on a patch which detects invalid field names again and restores the previous functionality. However, since encoding/json and hashicorp/hcl decode embedded structs differently (map[string]interface{} vs []map[string]interface{}). Therefore, just using mapstructure as before to detect unused fields won't work. This is a bug in HCL itself for which there is no easy fix AFAICT. Also, using the embedded JSON parser in hcl will also return []map[string]interface{} which will trip up mapstructure. hashicorp/hcl#209

Unless there is an option in mapstructure to do the mapping I think there are several viable approaches:

  1. Patching all []map to map and then use mapstructure for the mapping and invalid field detection. However, watches has []map[string]interface{} so there would need to be an exception for that field and others in the future.

  2. Build a custom invalid field detector which given a map[string]interface{} and struct type+tag name can detect which fields will not be mappable.

  3. Enhance the hashicorp/hcl decoder with an invalid field detector and do the same thing with encoding/json by first forking it and then adding that to the custom code until golang/go#15314 is added to the stdlib.

  4. Add that capability to mapstructure.

For the 1.0 milestone, the first option seems to be the one with the least amount of work so I'll try that first.

@@ -167,13 +101,20 @@ func TestAgent_TokenStore(t *testing.T) {
}

func TestAgent_CheckPerformanceSettings(t *testing.T) {
t.Skip("this test fails because of the change in agent.consulConfig(). We need to decide what to do")

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

TODO - I'll take a look at this.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 25, 2017

Author Contributor

I've followed your suggestion and dropped the PerformanceRaftMultiplier parameter and instead compute the right values during the build step.

defer a.wgServers.Done()

err := s.ListenAndServe(p.Net, p.Addr, func() { notif <- p })
// todo(fs): does this also work for unix sockets?

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

We should add unix socket unit tests before we finish the beta.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 24, 2017

Author Contributor

The underlying DNS lib does not support Unix sockets. We can add this at a later stage.

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

Got it - yeah if DNS never could have supported it previously it's not important to get that in now (also seems like a fairly rare / low priority use case).

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 24, 2017

Author Contributor

unless you want to test DNS on the socket without exposing it on a network interface. But lets add that later.

return tc.IncomingTLSConfig()
}

func (c *RuntimeConfig) Sanitized() RuntimeConfig {

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

We should add a test for this - It would be good to fuzz the structure, sanitize, and make sure anything with token in there is redacted. Maybe we should just use reflect here and a small list of other places to redact and make the logic generic.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 25, 2017

Author Contributor

Added a test and implemented it reflectively. The expected test result is static so the test should fail once a new field with token, key or secret in the name is added. gofuzz failed on me with some error but I don't think it is necessary with this approach.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 25, 2017

Author Contributor

I've used different values for the WAN retry join case in a later commit.

Head: []Source{DefaultSource()},
}

if b.stringVal(b.Flags.DeprecatedDatacenter) != "" && b.stringVal(b.Flags.Config.Datacenter) == "" {

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

Why does this need a special case here? It's also handled in Build().

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 24, 2017

Author Contributor

This looks like a left over from moving code around.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 24, 2017

Author Contributor

It is a bit more complex than that but why do we want to carry on deprecated flags and fields to 1.0 in the first place? This looks like a good time to finally remove them.

CleanupDeadServers *bool `json:"cleanup_dead_servers,omitempty" hcl:"cleanup_dead_servers"`
DisableUpgradeMigration *bool `json:"disable_upgrade_migration,omitempty" hcl:"disable_upgrade_migration"`
LastContactThreshold *string `json:"last_contact_threshold,omitempty" hcl:"last_contact_threshold"`
// todo(fs): do we need uint64 here? If yes, then I need to write a special parser b/c of JSON limit of 2^53-1 for ints

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

This will always be much lower than the max. We could even accept an int here and cast it later.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 24, 2017

Author Contributor

thought so. will fix

rt.DataDir = dataDir
},
},
{ // todo(fs): shouldn't this be '-encrypt-key'?

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

No this looks correct.

rt.DataDir = dataDir
},
},
{ // todo(fs): shouldn't this be '-node-name'?

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

This is correct for the CLI flags (it's node_name in JSON).

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 24, 2017

Author Contributor

yeah, the fact that they are different is one of the issues.

rt.DataDir = dataDir
},
},
{

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

This is a duplicate of the previous test case - did you have something else in mind here?

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 24, 2017

Author Contributor

no, copy/paste left over when creating additional test cases.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 25, 2017

Author Contributor

I've removed it.

},
},
{
desc: "client addr, addresses and ports < 0",

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

Paused here.

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 24, 2017

Contributor

Got to this spot - I'll continue from here...

@slackpad

This comment has been minimized.

Copy link
Contributor

commented Sep 24, 2017

@magiconair

This comment has been minimized.

Copy link
Contributor Author

commented Sep 24, 2017

I've pushed a change which removes all deprecated fields and flags and keeps a list in agent/config/doc.go The last remaining thing is recursor since we have both recursor and recursors and usually merge the single recursor into the list.

@magiconair magiconair added this to the 1.0 milestone Sep 25, 2017

@slackpad
Copy link
Contributor

left a comment

Made it through the whole thing and I think we are super close. If we can get a check in there to catch unknown fields (the simple plan with an exception for watches sounded reasonable) before the beta I think that would a good cross-check for the all the unit tests and to help users.

patch: func(rt *RuntimeConfig) {
rt.Bootstrap = true
rt.ServerMode = true
rt.LeaveOnTerm = false

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 25, 2017

Contributor

In client mode the LeaveOnTerm and SkipLeaveOnInt values should be the opposite of server mode, so true and false, respectively. We should add a case that covers that behavior.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 25, 2017

Author Contributor

done.

flags: []string{
`-data-dir=runtime_test.go`,
},
patch: func(rt *RuntimeConfig) {

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 25, 2017

Contributor

Do we need a patch section when it's checking for an error?

hcl: []string{`dns_config = { udp_answer_limit = 0 }`},
err: "dns_config.udp_answer_limit cannot be 0. Must be positive",
},
{

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 25, 2017

Contributor

Duplicate of previous test.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 25, 2017

Author Contributor

removed

"dogstatsd_addr": "0wSndumK",
"dogstatsd_tags": [ "3N81zSUB","Xtj8AnXZ" ],
"filter_default": true,
"prefix_fi

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 25, 2017

Contributor

Seems like we can eventually divorce the runtime config from the structure with both check and checks.

@@ -2913,7 +2858,7 @@ func TestDNS_ServiceLookup_AnswerLimits(t *testing.T) {
expectedANYQuery int
expectedANYQueryID int
}{
{"0", 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
//{"0", 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 25, 2017

Contributor

Curious what happened here? Does this fail to validate then we can just delete it.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 25, 2017

Author Contributor

The new code did not allow dns_config.udp_answer_limit to be zero which made this test fail. I guess we commented this out while getting things to work. I've restored the previous behavior.

@@ -0,0 +1 @@
9e8edccb-1f67-0ef6-a36f-e5121d91d8fb

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 25, 2017

Contributor

This file should be deleted (or does a test use this)?


return nil
}
// done(fs): func ValidateSegments(conf *Config) error {

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 25, 2017

Contributor

Did this get moved elsewhere? There's an enterprise version of this file that's in the enterprise fork, so we had this out here so we can swap it with build tags.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 25, 2017

Author Contributor

I had incorporated this into the normal validation. However, to support different types of validation for enterprise I've split up the tests so that we can overload them. I'll send you the ent fragments you can add.


func TestDefaultConfig(t *testing.T) {
for i := 0; i < 500; i++ {
t.Run("bla", func(t *testing.T) {

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 25, 2017

Contributor

bla might as well be an empty string


// GetPrivateIPv4 returns the list of private network IPv4 addresses on
// all active interfaces.
func GetPrivateIPv4() ([]*net.IPAddr, error) {

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 25, 2017

Contributor

We could probably remove these helpers and change the default config to go-sockaddr templates that accomplishes this as well.

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 25, 2017

Author Contributor

That makes mocking the address resolution more difficult. Let me think about a solution.

@@ -58,7 +58,8 @@ func NewIPv4Addr(ipv4Str string) (IPv4Addr, error) {
// Strip off any bogus hex-encoded netmasks that will be mis-parsed by Go. In
// particular, clients with the Barracuda VPN client will see something like:
// `192.168.3.51/00ffffff` as their IP address.
trailingHexNetmaskRe := trailingHexNetmaskRE.Copy()
// trailingHexNetmaskRe := trailingHexNetmaskRE.Copy()

This comment has been minimized.

Copy link
@slackpad

slackpad Sep 25, 2017

Contributor

Is this a monkey patch? Why was this needed?

This comment has been minimized.

Copy link
@magiconair

magiconair Sep 25, 2017

Author Contributor

Yes, for some reason this creates a data race which made tests fail randomly at high concurrency. I still need to understand why that is.

@slackpad

This comment has been minimized.

Copy link
Contributor

commented Sep 25, 2017

I'll try to sort the Raft multiplier stuff.

@slackpad

This comment has been minimized.

Copy link
Contributor

commented Sep 25, 2017

@magiconair I'm thinking we can get rid of PerformanceRaftMultiplier from the runtime struct and do this scaling at the end of builder.go:Build(), like we do for other derived parameters:

base.RaftConfig.HeartbeatTimeout = a.config.ConsulRaftHeartbeatTimeout * time.Duration(a.config.PerformanceRaftMultiplier)
base.RaftConfig.LeaderLeaseTimeout = a.config.ConsulRaftLeaderLeaseTimeout * time.Duration(a.config.PerformanceRaftMultiplier)
base.RaftConfig.ElectionTimeout = a.config.ConsulRaftElectionTimeout * time.Duration(a.config.PerformanceRaftMultiplier)
@slackpad

This comment has been minimized.

Copy link
Contributor

commented Sep 25, 2017

@magiconair Heads up I pushed my small stuff from the -slackpad branch up onto yours, so please pull that.

@magiconair

This comment has been minimized.

Copy link
Contributor Author

commented Sep 25, 2017

Will do and will go through the comments. I have something working for the invalid fields. Not pretty but it should do for the beta. I'll go through the rest of the comments and fix/respond

magiconair added some commits Aug 28, 2017

new config parser for agent
This patch implements a new config parser for the consul agent which
makes the following changes to the previous implementation:

 * add HCL support
 * all configuration fragments in tests and for default config are
   expressed as HCL fragments
 * HCL fragments can be provided on the command line so that they
   can eventually replace the command line flags.
 * HCL/JSON fragments are parsed into a temporary Config structure
   which can be merged using reflection (all values are pointers).
   The existing merge logic of overwrite for values and append
   for slices has been preserved.
 * A single builder process generates a typed runtime configuration
   for the agent.

The new implementation is more strict and fails in the builder process
if no valid runtime configuration can be generated. Therefore,
additional validations in other parts of the code should be removed.

The builder also pre-computes all required network addresses so that no
address/port magic should be required where the configuration is used
and should therefore be removed.

magiconair added some commits Sep 25, 2017

@magiconair magiconair force-pushed the refactor-config branch from 29f1ee4 to dddc6af Sep 25, 2017

magiconair added some commits Sep 25, 2017

@magiconair

This comment has been minimized.

Copy link
Contributor Author

commented Sep 25, 2017

I've rebased the branch onto master and added the code for detecting invalid fields. I've immediately found two tests which were broken so this is very useful. The approach is a bit hacky but I've added a long description of why we're doing this. If I can come up with a better approach then I'll try over the next days.

The open issues are:

  • drop recursor in favor of recursors
  • look at the commented out section int TestDNS_ServiceLookup_AnswerLimits
  • document/follow up the monkey patches for hashicorp/go-sockaddr and hashicorp/hcl
  • require .json or .hcl extension for all config files. (need input/decision)

@slackpad you need to incorporate the enterprise test fragments for the segment code I've sent you.

@slackpad

This comment has been minimized.

Copy link
Contributor

commented Sep 25, 2017

magiconair added some commits Sep 25, 2017

@magiconair

This comment has been minimized.

Copy link
Contributor Author

commented Sep 25, 2017

We also need to decide where to put the documentation that was part of agent/config.Config. We can either restore it to the Config struct, add it to the RuntimeConfig or omit it from code altogether and put it in the public documentation. Opinions?

magiconair and others added some commits Sep 25, 2017

@slackpad slackpad merged commit 1221658 into master Sep 25, 2017

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details

@slackpad slackpad deleted the refactor-config branch Sep 25, 2017

@magiconair

This comment has been minimized.

Copy link
Contributor Author

commented Sep 25, 2017

F I N A L L Y !

@slackpad slackpad referenced this pull request Sep 25, 2017

Closed

Config refactor burn down #3492

9 of 10 tasks complete

HadarGreinsmark added a commit to HadarGreinsmark/consul that referenced this pull request Oct 19, 2017

New config parser, HCL support, multiple bind addrs (hashicorp#3480)
* new config parser for agent

This patch implements a new config parser for the consul agent which
makes the following changes to the previous implementation:

 * add HCL support
 * all configuration fragments in tests and for default config are
   expressed as HCL fragments
 * HCL fragments can be provided on the command line so that they
   can eventually replace the command line flags.
 * HCL/JSON fragments are parsed into a temporary Config structure
   which can be merged using reflection (all values are pointers).
   The existing merge logic of overwrite for values and append
   for slices has been preserved.
 * A single builder process generates a typed runtime configuration
   for the agent.

The new implementation is more strict and fails in the builder process
if no valid runtime configuration can be generated. Therefore,
additional validations in other parts of the code should be removed.

The builder also pre-computes all required network addresses so that no
address/port magic should be required where the configuration is used
and should therefore be removed.

* Upgrade github.com/hashicorp/hcl to support int64

* improve error messages

* fix directory permission test

* Fix rtt test

* Fix ForceLeave test

* Skip performance test for now until we know what to do

* Update github.com/hashicorp/memberlist to update log prefix

* Make memberlist use the default logger

* improve config error handling

* do not fail on non-existing data-dir

* experiment with non-uniform timeouts to get a handle on stalled leader elections

* Run tests for packages separately to eliminate the spurious port conflicts

* refactor private address detection and unify approach for ipv4 and ipv6.

Fixes hashicorp#2825

* do not allow unix sockets for DNS

* improve bind and advertise addr error handling

* go through builder using test coverage

* minimal update to the docs

* more coverage tests fixed

* more tests

* fix makefile

* cleanup

* fix port conflicts with external port server 'porter'

* stop test server on error

* do not run api test that change global ENV concurrently with the other tests

* Run remaining api tests concurrently

* no need for retry with the port number service

* monkey patch race condition in go-sockaddr until we understand why that fails

* monkey patch hcl decoder race condidtion until we understand why that fails

* monkey patch spurious errors in strings.EqualFold from here

* add test for hcl decoder race condition. Run with go test -parallel 128

* Increase timeout again

* cleanup

* don't log port allocations by default

* use base command arg parsing to format help output properly

* handle -dc deprecation case in Build

* switch autopilot.max_trailing_logs to int

* remove duplicate test case

* remove unused methods

* remove comments about flag/config value inconsistencies

* switch got and want around since the error message was misleading.

* Removes a stray debug log.

* Removes a stray newline in imports.

* Fixes TestACL_Version8.

* Runs go fmt.

* Adds a default case for unknown address types.

* Reoders and reformats some imports.

* Adds some comments and fixes typos.

* Reorders imports.

* add unix socket support for dns later

* drop all deprecated flags and arguments

* fix wrong field name

* remove stray node-id file

* drop unnecessary patch section in test

* drop duplicate test

* add test for LeaveOnTerm and SkipLeaveOnInt in client mode

* drop "bla" and add clarifying comment for the test

* split up tests to support enterprise/non-enterprise tests

* drop raft multiplier and derive values during build phase

* sanitize runtime config reflectively and add test

* detect invalid config fields

* fix tests with invalid config fields

* use different values for wan sanitiziation test

* drop recursor in favor of recursors

* allow dns_config.udp_answer_limit to be zero

* make sure tests run on machines with multiple ips

* Fix failing tests in a few more places by providing a bind address in the test

* Gets rid of skipped TestAgent_CheckPerformanceSettings and adds case for builder.

* Add porter to server_test.go to make tests there less flaky

* go fmt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.