
*: add 'endpoint hashkv' command #8351

Merged (5 commits, Aug 7, 2017)

Conversation

@gyuho (Contributor) commented Aug 2, 2017:

Fix #8348.
Address #8305.

+------------------------+------------+
| ENDPOINT | HASH |
+------------------------+------------+
| http://127.0.0.1:2379 | 1084519789 |
+------------------------+------------+
Contributor:

extra space here.
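
For context, a minimal clientv3 sketch of how such a per-endpoint hash could be fetched programmatically, assuming the Maintenance.HashKV signature this PR adds (an endpoint string plus an int64 revision, where 0 means all revisions):

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	ep := "http://127.0.0.1:2379"
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{ep}})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// rev=0 hashes the entire KV keyspace, matching the table above.
	resp, err := cli.HashKV(context.Background(), ep, 0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("endpoint %s hash %d\n", ep, resp.Hash)
}
```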

@xiang90 (Contributor) commented Aug 2, 2017:

@gyuho Thanks for getting this done SO FAST!

@@ -28,6 +28,12 @@ Sometimes an etcd cluster will possibly have v3 data which should not be overwri
ETCDCTL_API=3 etcdctl get "" --from-key --keys-only --limit 1 | wc -l
```

In case v2 data were migrated with TTLs, ensure that stores are consistent by checking the hashes:
Contributor:

s/were/was

for i := 0; i < 3; i++ {
	cli := clus.Client(i)

	hresp, err := cli.HashKV(context.Background(), clus.Members[0].GRPCAddr(), 0)
Contributor:

s/[0]/[i]
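
For clarity, a sketch of the loop with that fix applied, reusing the integration-test context from the quoted snippet (clus, t) and comparing each member's hash against the first:

```go
var hv uint32
for i := 0; i < 3; i++ {
	cli := clus.Client(i)

	// Hash member i's own KV state (the original hashed Members[0] each time).
	hresp, err := cli.HashKV(context.Background(), clus.Members[i].GRPCAddr(), 0)
	if err != nil {
		t.Fatal(err)
	}
	if i == 0 {
		hv = hresp.Hash
	} else if hv != hresp.Hash {
		t.Fatalf("hash mismatch: member %d returned %d, expected %d", i, hresp.Hash, hv)
	}
}
```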

var hv uint32
for i := 0; i < 3; i++ {
	cli := clus.Client(i)

Contributor:

can get rid of the unreliable time.Sleep:

if _, err := cli.Get(context.TODO(), "foo"); err != nil {
	t.Fatal(err)
}

@@ -50,6 +51,11 @@ type Maintenance interface {
	// Status gets the status of the endpoint.
	Status(ctx context.Context, endpoint string) (*StatusResponse, error)

	// HashKV returns the state of backend database at the time of the RPC.
Contributor:

s/the state of/a hash of the/
s/backend database/KV state/? it's not the entire backend...

@@ -50,6 +51,11 @@ type Maintenance interface {
	// Status gets the status of the endpoint.
	Status(ctx context.Context, endpoint string) (*StatusResponse, error)

	// HashKV returns the state of backend database at the time of the RPC.
	// If revision is non-zero, hash is computed only in key bucket and keys at or below
Contributor:

// If revision is zero, the hash is computed on all keys. If the revision is non-zero, the hash is computed on all keys at or below the given revision.
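
Folding both suggestions into the interface entry might read as follows (a sketch; the signature is inferred from the test call quoted earlier):

```go
// HashKV returns a hash of the KV state at the time of the RPC.
// If revision is zero, the hash is computed on all keys. If the revision
// is non-zero, the hash is computed on all keys at or below the given revision.
HashKV(ctx context.Context, endpoint string, rev int64) (*HashKVResponse, error)
```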

Member:

I wonder, should the comment for maintenance HashKV be the same as the one in rpc?
HashKV computes the hash of all MVCC keys up to a given revision.
https://github.com/coreos/etcd/blob/master/etcdserver/etcdserverpb/rpc.proto#L187

Contributor:

clientv3 Compact and the rpc.proto Compact have different comments too; the difference here seems similar. Although to be closer to clientv3 Compact, it should talk about KV history instead of a "backend database".

Duplicated comments (/anything) can easily get out of sync and I think there are too many exceptions to using the RPC comment directly to justify tooling to keep it synced up.

@@ -641,6 +641,49 @@ Get the status for all endpoints in the cluster associated with the default endp
+------------------------+------------------+----------------+---------+-----------+-----------+------------+
```

### ENDPOINT HASH

ENDPOINT HASH fetches the backend state of each endpoint.
Contributor:

fetches the hash of the key-value store of an endpoint.

Short: "Prints out backend database states specified in `--endpoints` flag",
Run: epHashCommandFunc,
}
hc.PersistentFlags().Int64Var(&epHashRev, "rev", 0, "if non-zero, hash keys at or below the given revision")
Contributor:

"maximum revision to hash (default: all revisions)"

display.EndpointHash(hashList)

if err != nil {
	os.Exit(ExitError)
Contributor:

ExitWithError?
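
That is, assuming etcdctl's existing ExitWithError(code, err) helper, the suggestion is presumably:

```go
if err != nil {
	ExitWithError(ExitError, err)
}
```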

@gyuho added the WIP label Aug 2, 2017
@gyuho force-pushed the hash branch 2 times, most recently from c0491b0 to 518a6f7 on August 3, 2017 09:21
@gyuho removed the WIP label Aug 3, 2017
@gyuho force-pushed the hash branch 2 times, most recently from cc99ec8 to 33282b5 on August 3, 2017 09:25
@heyitsanthony (Contributor) left a comment:

looks ok in general

In case v2 data was migrated with TTLs, ensure that stores are consistent by checking the hashes:

```sh
ETCDCTL_API=3 etcdctl endpoint hash --cluster
```

Contributor:

Can this be hashkv? The naming should be consistent with the underlying RPC / client call if possible (e.g., what if etcdctl ever gets support for the old hash rpc)

Contributor:

Given that the failure mode for inconsistent data is "member is completely broken", I would recommend increasing the warnings given here.

"You must ensure all members have consistent data before migrating, otherwise some members will not be able to correctly serve queries for new data."

This really should also explain what to do if someone has frequently expiring keys - for instance, someone with keys expiring every second will have an extremely high chance of having inconsistent data. Should they keep retrying?
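
One illustrative answer to that question (not from this PR) is to poll until every endpoint reports the same hash. A hypothetical helper, with imports as in the earlier sketch:

```go
// hashesMatch reports whether all endpoints currently agree on the KV hash.
// With fast-expiring TTL keys, a caller could retry this check a few times
// and only treat persistent disagreement as real inconsistency.
func hashesMatch(ctx context.Context, cli *clientv3.Client, endpoints []string) (bool, error) {
	var first uint32
	for i, ep := range endpoints {
		resp, err := cli.HashKV(ctx, ep, 0)
		if err != nil {
			return false, err
		}
		if i == 0 {
			first = resp.Hash
		} else if resp.Hash != first {
			return false, nil
		}
	}
	return true, nil
}
```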


##### Simple format

Prints a humanized table of each endpoint URL and backend hash.
Contributor:

s/backend/KV history/?

func newEpHashCommand() *cobra.Command {
	hc := &cobra.Command{
		Use:   "hash",
		Short: "Prints out backend database states specified in `--endpoints` flag",
Contributor:

"Prints the KV history hash for each endpoint in --endpoints"?

@gyuho changed the title from *: add 'endpoint hash' command to *: add 'endpoint hashkv' command on Aug 4, 2017
@gyuho added the WIP label Aug 4, 2017
@gyuho force-pushed the hash branch 3 times, most recently from 80b2412 to 1ee4267 on August 4, 2017 18:09
For etcd v3.3+, run `ETCDCTL_API=3 etcdctl endpoint hashkv --cluster` to ensure key-value stores are consistent post migration.

**Warn**: When v2 store has expiring TTL keys and migrate command intends to preserve TTLs, it is possible that inconsistent data is migrated. TODO?

@gyuho (Contributor, Author) commented Aug 4, 2017:

Any thoughts on this?

@heyitsanthony @xiang90

ref. #8305 (comment)

Contributor:

s/it is possible that inconsistent data is migrated. TODO/migration may be inconsistent with the last committed v2 state when run on any member with a raft index less than the last leader's raft index./
?

Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
Signed-off-by: Gyu-Ho Lee <gyuhox@gmail.com>
@gyuho removed the WIP label Aug 6, 2017
@heyitsanthony (Contributor) left a comment:

lgtm thanks

@gyuho merged commit a9b9ef5 into etcd-io:master on Aug 7, 2017
@gyuho deleted the hash branch on August 7, 2017 16:21