Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to register watch handlers (regression with 0.8.4) #3177

Closed
preetapan opened this issue Jun 22, 2017 · 15 comments
Closed

Unable to register watch handlers (regression with 0.8.4) #3177

preetapan opened this issue Jun 22, 2017 · 15 comments
Assignees

Comments

@preetapan
Copy link
Member

If you have a question, please direct it to the
consul mailing list if it hasn't been
addressed in either the FAQ or in one
of the Consul Guides.

When filing a bug, please include the following:

consul version for both Client and Server

Client: 0.8.4
Server: master

Operating system and Environment details

Description of the Issue (and unexpected/desired result)

Creating watches fail with the error [ERR] Failed to run watch: %v Failed to connect to agent: Unknown protocol scheme: http+tcp

Reproduction steps

Run consul with a watch handler configuration https://www.consul.io/docs/agent/watches.html. I will link to the one I used in a gist as a comment below.

Log Fragments or Link to gist

[ERR] Failed to run watch: %v Failed to connect to agent: Unknown protocol scheme: http+tcp
Include appropriate Client or Server log fragments. If the log is longer
than a few dozen lines, please include the URL to the
gist.

TIP: Use -log-level=TRACE on the client and server to capture the maximum log detail.

@slackpad slackpad added this to Backlog in Consul 0.9.0 Jun 22, 2017
@preetapan
Copy link
Member Author

preetapan commented Jun 22, 2017

@magiconair I traced the root cause down to the String() method for ProtoAddr, it is used in the [registerWatches][https://github.com/hashicorp/consul/blob/31a310f551c36243d2aae9ed80054c200e079b81/agent/agent.go#L515] method, and now the String() method returns something like http+tcp://addr:port instead of http://addr:port, causing the Run method on the watch to fail.

I was able to fix it locally changing the String() method to not add the Net part of the struct but that likely will break something else so wanted you to take a look. This looks like it broke prior to 0.8.4 release.

@slackpad
Copy link
Contributor

That error print looks like it's missing the right number of args, while we are in here.

@magiconair
Copy link
Contributor

yes and yes. will fix :/

@preetapan
Copy link
Member Author

yeah I noticed/fixed that locally too,

@magiconair
Copy link
Contributor

@preetapan thx for chasing this down.

@magiconair
Copy link
Contributor

The ProtoAddr approach I've added is broken. Should just use net.Addr and distinguish between HTTP and HTTPS endpoints.

@magiconair
Copy link
Contributor

This should fix it. Testing.

diff --git a/agent/agent.go b/agent/agent.go
index 0564b0b6..53b26f75 100644
--- a/agent/agent.go
+++ b/agent/agent.go
@@ -512,12 +512,12 @@ func (a *Agent) registerWatches() error {
                go func(wp *watch.Plan) {
                        wp.Handler = makeWatchHandler(a.LogOutput, wp.Exempt["handler"])
                        wp.LogOutput = a.LogOutput
-                       addr := addrs[0].String()
+                       addr := addrs[0].Proto
                        if addrs[0].Net == "unix" {
                                addr = "unix://" + addr
                        }
                        if err := wp.Run(addr); err != nil {
-                               a.logger.Println("[ERR] Failed to run watch: %v", err)
+                               a.logger.Printf("[ERR] Failed to run watch: %v", err)
                        }
                }(wp)
        }

@preetapan
Copy link
Member Author

I don't think it should be just the Proto part, looks like the run method expects something like "http://ipaddr:port"

@preetapan
Copy link
Member Author

Here's how I tested watches - https://gist.github.com/preetapan/70b918e03be2b82a2b1f0e732f039b1e

@slackpad
Copy link
Contributor

Would be good to add an e2e test as part of this because I think we test both sides separately but must not have an integrated test if this got through.

magiconair added a commit that referenced this issue Jun 22, 2017
This patch fixes watch registration through the config file
and a broken log line when the watch registration fails.

Fixes #3177
@magiconair
Copy link
Contributor

pushed a temp fix to a branch. Will check on how to add a good test for that.

@slackpad slackpad moved this from Backlog to Frank in Consul 0.9.0 Jun 22, 2017
@magiconair
Copy link
Contributor

I think we have tests that check whether watches work. What failed here is whether watches from the config get registered. So when you start an agent which has watches in the config will there be watch handlers running. What happens when you reload? This might also be harder to catch when the the problem is in the config parsing. Splitting user and runtime config will make this easier to detect.

magiconair added a commit that referenced this issue Jun 23, 2017
This patch fixes watch registration through the config file
and a broken log line when the watch registration fails.

Fixes #3177
@magiconair
Copy link
Contributor

The way we're registering watches is slightly different for load and reload

https://github.com/hashicorp/consul/blob/master/agent/agent.go#L499-L525
https://github.com/hashicorp/consul/blob/master/agent/agent.go#L2342-L2362

Load supports HTTPS and unix sockets whereas reload supports only HTTP. The returned addr from ClientListener can be a net.UnixAddr but we're not adding the unix:// prefix.

I'd like to roll this into a single function which registers watches and stops existing watches. This would probably also mean to move the WatchPlans out of the Config since they are state and not configuration AFAICT.

What we should test IMO is that watches get registered and re-registered, i.e. that loading a configuration mutates the WatchPlans since we have other tests to check whether the watches actually work. However, this may not be so simple since the watch.Plan isn't just a plan but keeps the watch state as well. To be able to test this we would have to split the watch.Plan into a Plan and a Watcher which executes the plan so that loading the config reliably mutates the state but doesn't spawn any go routines.

This is a larger change that we might want to do over several iterations. I suggest to merge #3181 to fix the issue at hand and create another issue to refactor watches to make watch registration more testable. What I could do is to equalize the behavior of loading and reloading in terms of endpoints (support HTTP and HTTPS on both TCP and sockets) unless I'm missing something fundamental here.

@preetapan
Copy link
Member Author

preetapan commented Jun 23, 2017

Read the code quite a bit because it is not clear at all. TLDR during load though it looks like it supports unix and https, it actually does not. The if block that supports unix sockets is actually never executed, because config.HttpAddrs() only returns http/https addresses, so that if statement will never be true. Another misleading thing is though it gets a slice of Addrs, it only looks at the first one when creating the watcher goroutine. So it ignores any https addresses returned by config.HttpAddrs, because that function adds http addresses before https addresses.

To cut a long story short, the load part is functionally equivalent to reload, but written in a very brittle way (e.g. only using addrs[0]), so doing some small refactoring to unify both methods would help a lot towards code clarity.

The rest of your comment above (WatchPlan vs Watcher), I would try to find an easier way to make it testable rather than more refactoring, but i am not sure what that is right now. Perhaps something like an atomic counter that gets set and reset when watches are loaded, and check the value of that in the unit test.

@preetapan preetapan changed the title Unable to register watch handlers (potential regression with 0.8.4) Unable to register watch handlers (regression with 0.8.4) Jun 23, 2017
@magiconair
Copy link
Contributor

Quick comment: you can do http and https over TCP or UNIX sockets. I think you can disable HTTP access which would provide the HTTPS address. This doesn't work in the reload case. If http is disabled then watch reload should not work. Also if http is done via UNIX sockets then watch reload should not work. Need to test though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

3 participants