Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discoverd hashicorp/raft #1612

Merged
merged 1 commit into from Sep 9, 2015

Conversation

Projects
None yet
4 participants
@benbjohnson
Copy link
Contributor

benbjohnson commented Jun 23, 2015

Overview status

This pull request replaces the etcd service dependency in discoverd with a built-in Raft implementation. Previously, discoverd required internal state to be maintained by the server.State object which was kept in sync with the etcd server. Now the state is held inside the Raft cluster so the State and Backend objects were merged together into a Store object.

Notes

Merging the state and backend together caused all the tests to break. The test suite previously contained mostly multi-step integration tests which made it difficult to narrow down the source of errors that were occurring. After struggling with debugging the errors, I refactored the test suite into unit tests to limit the scope of each test and used the standard library testing package.

TODO

  • Cluster join flags on discoverd.
  • Build discoverd using tup.
  • CLI testing
  • Integration testing
// Create and serve an SSE stream.
s := sse.NewStream(w, ch, nil)
s.Serve()
s.Wait()

This comment has been minimized.

Copy link
@lmars

This comment has been minimized.

Copy link
@benbjohnson

benbjohnson Sep 8, 2015

Author Contributor

I tried using sse.ServeStream but it doesn't give me the stream reference to use with s.CloseWithError() a couple lines below.


// Command type header bytes.
const (
addServiceCommandType = byte(0)

This comment has been minimized.

Copy link
@lmars

lmars Jun 23, 2015

Member

Use byte(iota)?

This comment has been minimized.

Copy link
@benbjohnson

benbjohnson Jun 26, 2015

Author Contributor

I prefer to hardcode the values when they'll be persisted to disk. Otherwise it's too easy to insert a value in the middle and corrupt the other values.

This comment has been minimized.

Copy link
@lmars

lmars Jun 26, 2015

Member

👍


// Set default test settings.
s.BindAddress = ":20000"
s.Advertise, _ = net.ResolveTCPAddr("tcp", "localhost:20000")

This comment has been minimized.

Copy link
@lmars

lmars Jun 23, 2015

Member

In the etcd tests, we get a random port, probably worth doing that here.

This comment has been minimized.

Copy link
@benbjohnson

benbjohnson Jun 26, 2015

Author Contributor

Does that safely work for you? I've had issues where it'll reassign the port after the net.Listener is closed. Although that may have only been an issue because I was running go test with -parallel.

This comment has been minimized.

Copy link
@benbjohnson

benbjohnson Jun 26, 2015

Author Contributor

I usually prefer to pass in the net.Listener created with 127.0.0.1:0 to the object accepting the connections but hashicorp/raft makes it difficult to do without implementing your own raft.StreamLayer.

This comment has been minimized.

Copy link
@lmars

lmars Jun 26, 2015

Member

I agree the random port method isn't ideal, but I haven't had issues with it myself.

This comment has been minimized.

Copy link
@benbjohnson

benbjohnson Sep 8, 2015

Author Contributor

Fixed in cfcd183

}

// Wait for leadership.
<-s.LeaderCh()

This comment has been minimized.

Copy link
@lmars

lmars Jun 23, 2015

Member

Is it worth panicking if this takes a while?

This comment has been minimized.

Copy link
@benbjohnson

benbjohnson Sep 8, 2015

Author Contributor

Fixed in cfcd183

. "github.com/flynn/flynn/discoverd/testutil/etcdrunner"
)

func RunDiscoverdServer(t TestingT, port string, etcdAddr string) (string, func()) {

This comment has been minimized.

Copy link
@lmars

lmars Jun 23, 2015

Member

This is now used in the discoverd client tests, so we will need to keep it.

This comment has been minimized.

Copy link
@benbjohnson

benbjohnson Jun 26, 2015

Author Contributor

OK, i'll put it back in.

hh "github.com/flynn/flynn/pkg/httphelper"
"github.com/flynn/flynn/pkg/stream"
"github.com/hashicorp/raft"
"github.com/hashicorp/raft-boltdb"

This comment has been minimized.

Copy link
@lmars

lmars Jun 23, 2015

Member

These two pakages need to be vendored using godep.

@benbjohnson benbjohnson force-pushed the benbjohnson:discoverd-raft branch from aaf03ac to e84239c Jun 25, 2015

@lmars

This comment has been minimized.

Copy link
Member

lmars commented Jun 26, 2015

@benbjohnson I have come across two issues trying to bootstrap the latest code:

@benbjohnson

This comment has been minimized.

Copy link
Contributor Author

benbjohnson commented Jun 29, 2015

@lmars Sorry for dropping off abruptly on Friday. Thanks for looking into the bootstrapping. I'm still wrapping my head around all the moving parts. I'm AFK for most of tomorrow but I'll dive into this on Tuesday.

@benbjohnson benbjohnson force-pushed the benbjohnson:discoverd-raft branch from 745a2d0 to aa53875 Jun 30, 2015

@benbjohnson benbjohnson force-pushed the benbjohnson:discoverd-raft branch from 66a0f98 to 2df721a Jul 15, 2015

@benbjohnson benbjohnson force-pushed the benbjohnson:discoverd-raft branch 2 times, most recently from f382e8e to 9a3a740 Jul 28, 2015

@titanous titanous referenced this pull request Aug 4, 2015

Closed

TImeout on deployments #1659

@benbjohnson benbjohnson force-pushed the benbjohnson:discoverd-raft branch from ca8db41 to 78c5981 Aug 5, 2015

@benbjohnson benbjohnson force-pushed the benbjohnson:discoverd-raft branch 9 times, most recently from 5108f45 to 21a89c4 Aug 13, 2015

@benbjohnson benbjohnson changed the title WIP: discoverd hashicorp/raft discoverd hashicorp/raft Aug 18, 2015

@benbjohnson benbjohnson force-pushed the benbjohnson:discoverd-raft branch 4 times, most recently from 27e1541 to f6b092d Aug 18, 2015

}

if inst != nil {
log.Printf("XXX> INVALIDATE.4: service=%v, entry=%v (BROADCAST LEADER)", service, inst.ID)

This comment has been minimized.

Copy link
@titanous

titanous Sep 5, 2015

Member

Remove all the debug prints from this method.

var stderr, stdout io.Reader
if os.Getenv("DEBUG") != "" {
if true || os.Getenv("DEBUG") != "" {

This comment has been minimized.

Copy link
@titanous

titanous Sep 5, 2015

Member

Debug short circuit?

}

// Ping HTTP API.
resp, err := http.Get("http://" + host + "/raft/leader")

This comment has been minimized.

Copy link
@titanous

titanous Sep 5, 2015

Member

Use fmt.Sprintf

@@ -23,6 +23,8 @@ import (
"github.com/flynn/flynn/pkg/cluster"
"github.com/flynn/flynn/pkg/shutdown"
"github.com/flynn/flynn/pkg/version"

_ "github.com/flynn/flynn/Godeps/_workspace/src/github.com/inconshreveable/gonative"

This comment has been minimized.

Copy link
@titanous

This comment has been minimized.

Copy link
@benbjohnson

benbjohnson Sep 8, 2015

Author Contributor

It was getting deleted when I ran godeps because there's no import reference in the project. This probably isn't the best spot for it though.

This comment has been minimized.

Copy link
@titanous

titanous Sep 8, 2015

Member

I'm going to nuke it in the 1.5 branch, so we can probably leave this out.

@@ -143,6 +145,9 @@ func (s *Stream) Error(err error) {
}

func (s *Stream) Close() {
s.mu.Lock()
defer s.mu.Unlock()

This comment has been minimized.

Copy link
@titanous

titanous Sep 5, 2015

Member

Use sync.Once?

func (s *DiscoverdSuite) TestMultiNodeRestart(t *c.C) {
// FIXME: Retrieve image ID of discoverd: imageURIs["discoverd"]
// FIXME: exec.Run 3 instances: image uri, args
}

This comment has been minimized.

Copy link
@titanous

titanous Sep 5, 2015

Member

Drop this file for now.

@@ -160,6 +160,7 @@ func (s *PostgresSuite) testDeploy(t *c.C, d *pgDeploy) {
if event.Kind != discoverd.EventKindServiceMeta {
continue
}

This comment has been minimized.

Copy link
@titanous

titanous Sep 5, 2015

Member

Drop this line.

@@ -111,6 +111,7 @@ var imageURIs = map[string]string{
"test-apps": "",
"postgresql": "",
"controller-examples": "",
"discoverd": "",

This comment has been minimized.

Copy link
@titanous

titanous Sep 5, 2015

Member

Drop this for now.

@titanous

This comment has been minimized.

Copy link
Member

titanous commented Sep 5, 2015

First review pass done, overall this looks great, just minor style nits.

@benbjohnson benbjohnson force-pushed the benbjohnson:discoverd-raft branch 5 times, most recently from eede265 to 83a60fe Sep 7, 2015

@lmars

This comment has been minimized.

Copy link
Member

lmars commented Sep 8, 2015

I ran this through custom CI and got the following panic in the logs of a newly booted host in SchedulerSuite.TestOmniJobs:

https://gist.github.com/lmars/312d92fc1aabe33f97d2

The output is from an on-disk log file and the formatting is a bit off, not sure why.

The raw multipart build log is here if you need to see more:

https://s3.amazonaws.com/flynn-ci-logs/20150908001758-d2cfb9f7-build-83a60fed5ba648a15819a4f4cf74f08e2c761031-2015-09-08-00-23-35.txt

@titanous

This comment has been minimized.

Copy link
Member

titanous commented Sep 8, 2015

Looks like a similar panic is in this build: https://ci.flynn.io/builds/20150907203159-72ac9fd5

(just search "panic")

@benbjohnson benbjohnson force-pushed the benbjohnson:discoverd-raft branch 2 times, most recently from 363c4d3 to 4aa2b9f Sep 8, 2015

@benbjohnson benbjohnson force-pushed the benbjohnson:discoverd-raft branch 2 times, most recently from cfcd183 to 13f1731 Sep 8, 2015

discoverd: Add raft-backed implementation
This commit removes the etcd dependency from discoverd and replaces it
with Hashicorp's raft library.

Signed-off-by: Ben Johnson <benbjohnson@yahoo.com>

@benbjohnson benbjohnson force-pushed the benbjohnson:discoverd-raft branch from 13f1731 to fc17c05 Sep 9, 2015

@titanous

This comment has been minimized.

Copy link
Member

titanous commented Sep 9, 2015

LGTM!

benbjohnson added a commit that referenced this pull request Sep 9, 2015

@benbjohnson benbjohnson merged commit 89089a9 into flynn:master Sep 9, 2015

1 check failed

continuous-integration/flynn The Flynn CI build failed
Details

@benbjohnson benbjohnson deleted the benbjohnson:discoverd-raft branch Sep 9, 2015

@lmars

This comment has been minimized.

Copy link
Member

lmars commented Sep 9, 2015

😍

@kwando

This comment has been minimized.

Copy link

kwando commented Sep 9, 2015

wie!

jvatic added a commit that referenced this pull request Oct 19, 2015

Godeps: Restore go-bindata entry in Godeps.json
Accidentally removed in fc17c05 (#1612)

Signed-off-by: Jesse Stuart <jesse@jessestuart.ca>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.