Support etcd v2.0 #48

Merged 2 commits on Mar 8, 2015
31 changes: 20 additions & 11 deletions etcd/src/jepsen/system/etcd.clj
@@ -24,17 +24,26 @@
 (def log-file "/var/log/etcd.log")

 (defn peer-addr [node]
-  (str (name node) ":7001"))
+  (str (name node) ":2380"))

 (defn addr [node]
-  (str (name node) ":4001"))
+  (str (name node) ":2380"))

+(defn cluster-url [node]
+  (str "http://" (name node) ":2380"))
+
+(defn listen-client-url [node]
+  (str "http://" (name node) ":2379"))
+
+(defn cluster-info [node]
+  (str (name node) "=http://" (name node) ":2380"))
+
 (defn peers
   "The command-line peer list for an etcd cluster."
   [test]
   (->> test
        :nodes
-       (map peer-addr)
+       (map cluster-info)
        (str/join ",")))

 (defn running?
@@ -58,14 +67,14 @@
 :--exec binary
 :--no-close
 :--
-:-peer-addr (peer-addr node)
-:-addr (addr node)
-:-peer-bind-addr "0.0.0.0:7001"
-:-bind-addr "0.0.0.0:4001"
 :-data-dir data-dir
 :-name (name node)
-(when-not (= node (core/primary test))
-  [:-peers (peers test)])
+:-advertise-client-urls (cluster-url node)
+:-listen-peer-urls (cluster-url node)
+:-listen-client-urls (listen-client-url node)
+:-initial-advertise-peer-urls (cluster-url node)
+:-initial-cluster-state "new"
+:-initial-cluster (peers test)
 :>> log-file
 (c/lit "2>&1")))

@@ -127,7 +136,7 @@
 (swap! running assoc node (running?)))

 ; And spin some more until Raft is ready
-(let [c (v/connect (str "http://" (name node) ":4001"))]
+(let [c (v/connect (str "http://" (name node) ":2379"))]
   (while (try+ (v/reset! c :test "ok") false
                (catch [:status 500] e true)
                (catch [:status 307] e true))
@@ -144,7 +153,7 @@
 (defrecord CASClient [k client]
   client/Client
   (setup! [this test node]
-    (let [client (v/connect (str "http://" (name node) ":4001"))]
+    (let [client (v/connect (str "http://" (name node) ":2379"))]
       (v/reset! client k (json/generate-string nil))
       (assoc this :client client)))
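
To make the new `-initial-cluster` value concrete, here is a minimal standalone sketch of what `peers` produces after this change. `cluster-info` and `peers` follow the definitions in the diff; the three-node test map with `:n1` to `:n3` is a hypothetical input chosen for illustration, not taken from the PR.

```clojure
(require '[clojure.string :as str])

;; Same definition as in the diff: one name=peer-URL entry per node.
(defn cluster-info [node]
  (str (name node) "=http://" (name node) ":2380"))

;; Same shape as `peers` in the diff: a comma-separated list
;; suitable for the -initial-cluster flag.
(defn peers [test]
  (->> test
       :nodes
       (map cluster-info)
       (str/join ",")))

;; Hypothetical test map; real node names come from the Jepsen test.
(peers {:nodes [:n1 :n2 :n3]})
;; => "n1=http://n1:2380,n2=http://n2:2380,n3=http://n3:2380"
```

In the new flag list every node gets this same string, together with `-initial-cluster-state "new"`, whereas the old code passed `-peers` only to non-primary nodes.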

1 change: 0 additions & 1 deletion lxc.md
@@ -123,7 +123,6 @@ Set up hostfiles on each box with hardcoded IP addresses


 ```
-127.0.0.1 localhost n1 n1.local
 ::1 localhost ip6-localhost ip6-loopback
Collaborator

Heyyyy I'm concerned this is gonna break a ton of stuff. Why change it?

Contributor Author

I've addressed the net/ip issue, so that's fixed. But the localhost issue remains: when the line 127.0.0.1 localhost n1 n1.local is removed from /etc/hosts, everything works fine, but when localhost is specified in /etc/hosts, the cluster machines cannot talk to each other:

2015/01/30 11:48:24 raft: 5440ff22fe632778 [logterm: 1, index: 5] sent vote request to 9b116f88cab4dc9 at term 2
2015/01/30 11:48:24 sender: error posting to 5aa594b5d9b66c42: dial tcp 192.168.122.12:2380: connection refused
2015/01/30 11:48:24 sender: the connection with 5aa594b5d9b66c42 becomes inactive

Slightly strange.

Collaborator

Ohhh hang on, I wonder if your nodes aren't listening on loopback, and they're trying to initiate a connection back to their own hostname.

Contributor Author

I just followed the guide https://github.com/coreos/etcd/blob/master/Documentation/clustering.md#static where this doesn't seem to be an issue. I'll ping the etcd crowd and see if they can offer any advice.

Contributor Author

Yicheng Qin from etcd says [1] that a possible reason is that listen-peer-urls defaults to localhost:2380, which cannot receive messages from 192.168.122.12:2380.

[1] https://groups.google.com/d/msg/etcd-dev/AslsLq2Lddk/83L_3jBcfNUJ
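
One way to probe the loopback hypothesis from this thread is to check how a hostname resolves on each box. The sketch below uses the JDK's `java.net.InetAddress`; the helper name `resolves-to-loopback?` is hypothetical. It rests on an assumption not confirmed in this thread: that a listener configured with a hostname binds to whatever address that name resolves to locally.

```clojure
(import 'java.net.InetAddress)

(defn resolves-to-loopback?
  "True if `host` resolves to a loopback address such as 127.0.0.1 or ::1."
  [host]
  (.isLoopbackAddress (InetAddress/getByName host)))

;; Literal addresses are parsed without a lookup:
(resolves-to-loopback? "127.0.0.1")      ; true, a loopback address
(resolves-to-loopback? "192.168.122.12") ; false, the LAN address from the log

;; On a node, calling this with the node's own name (for example "n1")
;; shows what the /etc/hosts entries do to that name.
```

If a node's own name comes back as loopback while its peers dial the 192.168.122.x address, that would be consistent with the `connection refused` lines quoted above, though it does not prove the cause.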

 fe00::0 ip6-localnet
 ff00::0 ip6-mcastprefix