
Add placement change operational guide #998

Merged
merged 12 commits into master from ra/placement_change_guide
Oct 3, 2018

Conversation

richardartoul
Contributor

No description provided.

@codecov

codecov bot commented Oct 1, 2018

Codecov Report

Merging #998 into master will increase coverage by 0.03%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #998      +/-   ##
==========================================
+ Coverage   77.82%   77.85%   +0.03%     
==========================================
  Files         411      411              
  Lines       34516    34516              
==========================================
+ Hits        26863    26874      +11     
+ Misses       5778     5770       -8     
+ Partials     1875     1872       -3
| Flag | Coverage Δ |
|------|------------|
| #dbnode | 81.42% <ø> (+0.04%) ⬆️ |
| #m3ninx | 75.25% <ø> (ø) ⬆️ |
| #query | 64.35% <ø> (ø) ⬆️ |
| #x | 84.72% <ø> (ø) ⬆️ |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 918ee35...061fbdc. Read the comment docs.

@richardartoul richardartoul mentioned this pull request Oct 1, 2018
@richardartoul richardartoul changed the title [WIP] - Add placement change operational guide Add placement change operational guide Oct 1, 2018

## Overview

M3DB was designed from the ground up to a be a distributed / clustered database that is isolation group aware. Clusters will seamlessly scale with your data, and you can start with a cluster as small as 3 nodes and grow it to a size of several hundred nodes with no downtime or expensive migrations.
Collaborator

I get that the 3 nodes comment is referring to clusters, but I wonder if this will confuse people in thinking it's not possible to test M3DB on their own (single) machine?

Collaborator

Yeah, perhaps just say "small number of nodes".


In other words, all you have to do is issue the desired instruction and the M3 stack will take care of making sure that your data is distributed with appropriate replication and isolation.

In the case of the M3DB nodes, nodes that have received new shards will immediately begin receiving writes (but not serving reads) for the new shards that they are responsible for. They will also begin streaming in all the data for their newly acquired shards from the peers that already have data for those shards. Once the nodes have finished streaming in the data for the shards that they have acquired, they will mark their status for those shards as `Available` in the placement and begin accepting writes. Simultaneously, the nodes that are losing ownership of any shards will mark their status for those shards as `LEAVING`. Once all the nodes accepting ownership of the new shards have finished streaming data from them, they will relinquish ownership of those shards and remove all the data associated with the shards they lost from memory and from disk.
Collaborator

Should Available be in all caps?

Contributor Author

no but LEAVING should not be capitalized



M3Coordinator nodes will also pick up the new placement from etcd and alter which M3DB nodes they issue writse and reads to appropriately.
Collaborator

s/writse/writes
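
To make the shard-state transitions described above easier to observe, here is a rough sketch of how you might watch them from the coordinator. It assumes the placement endpoint returns the placement under a `placement` key with an `instances` map and that `jq` is installed; the exact response shape may differ between versions, so verify against the OpenAPI docs.

```bash
# Summarize how many shards are in each state (e.g. INITIALIZING, AVAILABLE, LEAVING)
# across the whole cluster. Hostname and port are placeholders.
curl -sSf "http://<M3_COORDINATOR_HOST_NAME>:7201/api/v1/placement" \
  | jq '[.placement.instances[].shards[].state] | group_by(.) | map({state: .[0], count: length})'
```

When the counts for the non-Available states drop to zero, the placement change has completed.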


#### Isolation Group

This value controls how nodes that own the same M3DB shards are isolated from each other. For example, in a single datacenter configuration this value could be set to the rack that the M3DB node lives on. As a result, the placement will guarantee that nodes that exist on the same rack do not share any shards, allowing the cluster to survive the failure of an entire rack. Alternatively, if M3DB was deployed in an AWS region, the isolation group could be set to the regions availability zone and that would ensure that the cluster would survive the loss of an entire availability zone.
Collaborator

s/regions/region's
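
For illustration, an instance entry in a placement request might set its isolation group to the availability zone it runs in. This is a sketch only: the snake_case field names follow the placement API referenced in this guide, and all values are placeholders.

```json
{
  "id": "m3db-node-01",
  "isolation_group": "us-east-1a",
  "zone": "embedded",
  "weight": 100,
  "endpoint": "m3db-node-01:9000",
  "hostname": "m3db-node-01",
  "port": 9000
}
```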


#### Weight

This value should be an integer and controls how the cluster will weigh the number of shards that an individual node will own. If you're running the M3DB cluster on homogeneous hardware, then you probably want to assign all M3DB nodes the same weight so that shards are distributed evenly. On the other hand, if you're running the cluster on heterogeneous hardware, then this value should be higher for nodes with higher resources for whatever the limiting factor is in your cluster setup. For example, if disk space (as opposed to memory or CPU) is the limiting factor in how many shards any given node in your cluster can tolerate, then you could assign a higher value to nodes in your cluster that have larger disks and the placement calcualtions would assign them a higher number of shards.
Collaborator

s/calcualtions/calculations
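
As a concrete (hypothetical) example of weighting heterogeneous hardware: if disk is the limiting factor, a node with twice the disk capacity could be given twice the weight so the placement assigns it roughly twice as many shards. The request fragment below is a sketch with placeholder names and values.

```json
{
  "instances": [
    { "id": "node-small", "isolation_group": "rack-a", "zone": "embedded",
      "weight": 100, "endpoint": "node-small:9000", "hostname": "node-small", "port": 9000 },
    { "id": "node-large", "isolation_group": "rack-b", "zone": "embedded",
      "weight": 200, "endpoint": "node-large:9000", "hostname": "node-large", "port": 9000 }
  ]
}
```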


The instructions below all contain sample curl commands, but you can always review the API documentation by navigating to

`http://<M3_COORDINATOR_HOST_NAME>:<CONFIGURED_PORT(default 7201)>/api/v1/openapi`
Contributor Author

nice....how does that get updated

Collaborator

Netlify will automatically update this during the build script assuming that the assets folder is up to date (via make asset-gen-query).


#### Replacing a Node
TODO
Collaborator

Isn't this just a remove and then an add?

Contributor Author

@robskillington Is this correct
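
Assuming the answer is yes (i.e. a replace really is a remove followed by an add), a minimal sketch might look like the following. The DELETE endpoint is the one documented below for removing a node, and the POST body reuses the instance fields shown earlier; node IDs, hosts, and ports are placeholders.

```bash
# 1. Remove the node being replaced and wait for its shards to finish
#    streaming to (and become Available on) the remaining nodes.
curl -X DELETE "http://<M3_COORDINATOR_HOST_NAME>:7201/api/v1/placement/<OLD_NODE_ID>"

# 2. Add the replacement node; it will begin streaming data for the shards it is assigned.
curl -X POST "http://<M3_COORDINATOR_HOST_NAME>:7201/api/v1/placement" -d '{
  "instances": [{
    "id": "<NEW_NODE_ID>",
    "isolation_group": "<ISOLATION_GROUP>",
    "zone": "embedded",
    "weight": 100,
    "endpoint": "<NEW_NODE_HOST>:9000",
    "hostname": "<NEW_NODE_HOST>",
    "port": 9000
  }]
}'
```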


Collaborator
@robskillington robskillington Oct 1, 2018

nit with opening sentence: "to a be" -> "to be a" and "distributed / clustered database" -> "distributed (clustered)".

Perhaps:

M3DB was designed from the ground up to be a distributed (clustered) database that is availability zone or rack aware (by using isolation groups).

@@ -82,7 +82,7 @@ TODO

The operations below include sample CURLs, but you can always review the API documentation by navigating to

-`http://<M3_COORDINATOR_IP_ADDRESS>:<CONFIGURED_PORT(default 7201)>/api/v1/openapi`
+`http://<M3_COORDINATOR_HOST_NAME>:<CONFIGURED_PORT(default 7201)>/api/v1/openapi` or our [online API documentation](https://m3db.io/openapi/).
Collaborator

unrelated to your change but the deleting a namespace HTTP endpoint isn't correctly quoted: https://github.com/m3db/m3/blob/2eba3b8e0abc28b962d9d0132a7b0e82328a5c4e/docs/operational_guide/namespace_changes.md. mind updating.

Contributor Author

FIXED

@@ -402,14 +419,14 @@ definitions:
 instances:
   type: "array"
   items:
-    $ref: "#/definitions/Instance"
+    $ref: "#/definitions/InstanceRequest"
Collaborator

is this just to avoid the extra fields?

Contributor Author

correct :(


#### Removing a Node

Send a DELETE request to the `/api/v1/placement/<NODE_ID>` endpoint.
Contributor

After sending a DELETE, can the node be removed right away or do we need to wait for the cluster to rebalance?

Contributor Author

Good point I will clarify, but yes you need to wait for all the other nodes to stream data from the node you removed.
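
To make the waiting step concrete, here is a sketch (placeholders throughout; it assumes `jq` and the same response shape used in the earlier example) that removes a node and then blocks until every remaining shard is back in the Available state before the host is decommissioned.

```bash
# Remove the node from the placement.
curl -X DELETE "http://<M3_COORDINATOR_HOST_NAME>:7201/api/v1/placement/<NODE_ID>"

# Poll until no shard is left in a non-AVAILABLE state.
while curl -sSf "http://<M3_COORDINATOR_HOST_NAME>:7201/api/v1/placement" \
    | jq -e '[.placement.instances[].shards[].state] | any(. != "AVAILABLE")' > /dev/null; do
  echo "waiting for shards to become AVAILABLE..."
  sleep 30
done
```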


Before reading the rest of this document, we recommend familiarizing yourself with the [M3DB placement documentation](placement.md).

**Note**: The primary limiting factor for the maximum size of an M3DB cluster is the number of shards. TODO: Explain how to pick an appropriate number of shards and the tradeoff with a (small) linear increase in required node resources with the number of shards.
Contributor Author

@robskillington @prateek Can I get some guidance on this?

Collaborator
@robskillington robskillington Oct 2, 2018

Gah, I wish we would do some work to remove this silly requirement to choose some kind of size that matters. If groups of shards use about as many resources as one single shard, then this greatly improves the ergonomics/usability of the DB. You could grow a cluster of size 1 to a size of a few thousand nodes, if say you start off with 16k shards.

Contributor Author

Yeah that's fair, but we should give some guidance for now. How about 64 shards for development / testing purposes, 1024 for anything you're gonna use in production, and 4096 if you know it's gonna be a big cluster and you really don't want to worry about it? We've run 1024 and 4096 in production so we know those values are ok

@@ -120,7 +120,7 @@ Adding a namespace does not require restarting M3DB, but will require modifying

Deleting a namespace is as simple as using the `DELETE` `/api/v1/namespace` API on an M3Coordinator instance.

-curl -X DELETE <M3_COORDINATOR_IP_ADDRESS>:<CONFIGURED_PORT(default 7201)>api/v1/namespace/<NAMESPACE_NAME>
+`curl -X DELETE <M3_COORDINATOR_IP_ADDRESS>:<CONFIGURED_PORT(default 7201)>api/v1/namespace/<NAMESPACE_NAME>`
Collaborator

Missing / before api/v1/... here.
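
With that fix applied, the command would read (placeholders unchanged from the original line):

```bash
curl -X DELETE <M3_COORDINATOR_IP_ADDRESS>:<CONFIGURED_PORT(default 7201)>/api/v1/namespace/<NAMESPACE_NAME>
```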


- Resource Constrained / Development clusters: `64 shards`
- Production clusters: `1024 shards`
- Production clusters with high-resource nodes (Over 128GiB of ram, etc) and an expected cluster size of several hundred nodes: `4096 shards`
Collaborator

Nice, this is a good recommendation for now.
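
To connect the recommendation to the API, here is a hedged sketch of initializing a small three-node production placement with 1024 shards. The endpoint path and body fields follow the placement API referenced in this guide, but verify them against the OpenAPI docs; all names and values are placeholders.

```bash
curl -X POST "http://<M3_COORDINATOR_HOST_NAME>:7201/api/v1/placement/init" -d '{
  "num_shards": 1024,
  "replication_factor": 3,
  "instances": [
    { "id": "host1", "isolation_group": "zone-a", "zone": "embedded",
      "weight": 100, "endpoint": "host1:9000", "hostname": "host1", "port": 9000 },
    { "id": "host2", "isolation_group": "zone-b", "zone": "embedded",
      "weight": 100, "endpoint": "host2:9000", "hostname": "host2", "port": 9000 },
    { "id": "host3", "isolation_group": "zone-c", "zone": "embedded",
      "weight": 100, "endpoint": "host3:9000", "hostname": "host3", "port": 9000 }
  ]
}'
```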

@CLAassistant

CLAassistant commented Oct 3, 2018

CLA assistant check
All committers have signed the CLA.

Collaborator
@robskillington robskillington left a comment

LGTM

@richardartoul richardartoul merged commit ccf2888 into master Oct 3, 2018
@prateek prateek deleted the ra/placement_change_guide branch October 13, 2018 06:28