Documentation: operational aspects, hints and tips #252

shlomi-noach · 2017-08-07T06:49:03Z

Documenting operational aspects:

- howtos
- troubleshooting
- deployment examples

sjmudd · 2017-08-08T19:01:49Z

docs/deployment-raft.md

+
+- For `SQLite`:
+  - `SQLite` is bundled with `orchestrator`.
+  - Make sure the `SQLite3DataFile` is writable to the `orchestrator` user.


Suggestion: "writable by".
Question: does orchestrator try to ensure this file has any specific permissions?

Fixed. Answer: nope.

sjmudd · 2017-08-08T19:03:21Z

docs/deployment-raft.md

+  As suggested, you may want to put `orchestrator` service and `MySQL` service on same box. If using `SQLite` there's nothing else to do.
+
+- Consider adding a proxy on top of the service boxes; the proxy would redirect all traffic to the leader node. There is one and only one leader node, and the status check endpoint is `/api/leader-check`.
+  - Clients may _only iteract with the leader_. Setting up a proxy is one way to ensure that. See [proxy section](raft.md#proxy).


"may" ? Should this not be "must" ?

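To illustrate the leader-check endpoint mentioned in the hunk above, here is a minimal sketch of how a client-side script (or a home-grown proxy check) might locate the leader. It assumes `/api/leader-check` answers `200` on the leader and something else on the other nodes; the node URLs and the `find_leader` / `http_status` helper names are hypothetical and not part of `orchestrator`.

```python
from typing import Callable, Iterable, Optional
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx answers still carry a status code
    except OSError:
        return None  # connection refused, timeout, DNS failure, ...


def find_leader(nodes: Iterable[str],
                probe: Callable[[str], Optional[int]] = http_status) -> Optional[str]:
    """Return the first node whose /api/leader-check answers 200, else None.

    Assumption: only the leader answers 200 on this endpoint.
    """
    for node in nodes:
        if probe(f"{node}/api/leader-check") == 200:
            return node
    return None


if __name__ == "__main__":
    # Hypothetical node addresses; replace with your own service boxes.
    print(find_leader(["http://orc1:3000", "http://orc2:3000", "http://orc3:3000"]))
```

The `probe` parameter exists only so the lookup can be exercised without a live cluster; a real proxy such as HAProxy would do the equivalent with its own HTTP health check.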
sjmudd · 2017-08-08T19:05:17Z

docs/deployment-raft.md

+- Run failure detection
+- Register their own health check
+
+None-leader nodes may _NOT_:


"may" ? Should this not be "must" ?

sjmudd · 2017-08-08T19:10:04Z

docs/deployment-raft.md

+
+- Run arbitrary commands (e.g. `relocate`, `begin-downtime`)
+- Run recoveries per human request.
+- Serve client HTTP requests (but some endpoints, such as load-balancer and health checks, are valid).


Could the non-leaders redirect these requests to the leader? That might simplify the proxy setup, but it assumes connectivity that may not exist: it probably requires the non-leaders to know the public interface of the leader, and currently orchestrator is not necessarily aware of that.

sjmudd · 2017-08-08T19:18:16Z

docs/deployment-raft.md

+- Copy backend DB data:
+  - If `MySQL`, run backup/restore, either logical or physical.
+  - If `SQLite`, run `.dump` + restore, see [10. Converting An Entire Database To An ASCII Text File](https://sqlite.org/cli.html).
+


Is it not possible to "bootstrap" a new, empty orchestrator node into a raft cluster just by talking to the cluster? While dumping the SQLite or MySQL backend DB and restoring it on the new node will work, it would be attractive to be able to pull this information down from the cluster directly.

I may have missed this, but are access "credentials" required to join the cluster?

bootstrap: see #246 (comment)
credentials: no.
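
For the `SQLite` case in the hunk above, the `.dump` + restore step could be sketched as follows. This is only an illustration using Python's standard `sqlite3` module rather than the `sqlite3` CLI; the `copy_sqlite_backend` name and the file paths are hypothetical, and it assumes nothing is writing to the source file while the copy runs and that the destination is a new, empty file.

```python
import sqlite3


def copy_sqlite_backend(src_path: str, dst_path: str) -> None:
    """Dump src_path to SQL text and replay it into dst_path.

    Assumption: dst_path is a new/empty database file.
    """
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        # iterdump() yields SQL statements in the same spirit as the CLI `.dump` command.
        dst.executescript("\n".join(src.iterdump()))
    finally:
        src.close()
        dst.close()


if __name__ == "__main__":
    # Hypothetical paths; use the value of SQLite3DataFile on each box.
    copy_sqlite_backend("/var/lib/orchestrator/orchestrator.db",
                        "/tmp/orchestrator-copy.db")
```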

sjmudd · 2017-08-08T19:20:57Z

docs/deployment-shared-backend.md

+
+In a shared backend setup multiple `orchestrator` services will all speak to the same backend.
+
+- For **synchronous replication**, the advise is:


suggestion: "advice"

sjmudd · 2017-08-08T19:25:39Z

docs/deployment.md

-  - Only one service will be the leader at any given time.
-  - The leader is the one polling for servers; doing database cleanup; checking for crash recovery scenarios etc.
- You may choose to have all your `orchestrator` services load-balanced
+However how does `orchestrator` discover completely new topologies?


Suggestion: "However, how ..." (the comma is important here).

sjmudd · 2017-08-08T19:26:37Z

docs/deployment.md

- The (single) MySQL backend has the necessary state for managing concurrent operations.
- `orchestrator` has "maintenance locks" which prevent destructive concurrent operations on the same instance. At worst an
-  operation will be rejected due to not being able to acquire maintenance lock.
+- You may ask `orchestrator` to _discover_ (probe) any single server in such topology, and from there on it will crawl its way across the entire topology.


suggestion: " ... in such a topology ..."
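
As a rough illustration of the discovery step described in the hunk above, the sketch below builds the HTTP API request that asks `orchestrator` to probe a single server. It assumes the API exposes a `discover/:host/:port` endpoint under `/api`; the base URL, host name, and the `discover_url` helper are hypothetical.

```python
from urllib.parse import quote


def discover_url(api_base: str, host: str, port: int = 3306) -> str:
    """Build the URL asking orchestrator to discover (probe) host:port.

    Assumption: the API endpoint has the form <api_base>/discover/<host>/<port>.
    """
    return f"{api_base.rstrip('/')}/discover/{quote(host, safe='')}/{port}"


if __name__ == "__main__":
    # Hypothetical service address and MySQL host.
    print(discover_url("http://orchestrator.example.com:3000/api", "db1.example.com"))
```

Issuing a GET on the resulting URL against the leader should be enough to seed the crawl of the rest of the topology.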

sjmudd · 2017-08-08T19:30:21Z

docs/deployment.md

+```
+
+This setup comes from production environments. The cron entries get updated by `puppet` to reflect the appropriate `promotion_rule`. A server may have `prefer` at this time, and `prefer_not` in 5 minutes from now. Integrate your own service discovery method, your own scripting, to provide with your up-to-date `promotion-rule`.
+


You can also use the HTTP interface to achieve the same result. That avoids the need for direct (write) access to the orchestrator database, so it may be preferred.

Thanks for pointing this out. I'll be giving examples using the orchestrator CLI, orchestrator-client (which actually uses the HTTP interface), and direct access to the API.

Actually, the very example you responded to used orchestrator-client...
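
A sketch of the HTTP route discussed above, for a script that periodically re-asserts a server's promotion rule. It assumes the API exposes `register-candidate/:host/:port/:promotionRule` and that the listed rule names are accepted; the base URL, host name, and helper name are hypothetical.

```python
from urllib.parse import quote

# Assumed set of accepted promotion rules; check your orchestrator version.
PROMOTION_RULES = {"prefer", "neutral", "prefer_not", "must_not"}


def register_candidate_url(api_base: str, host: str, port: int, rule: str) -> str:
    """Build the URL that registers host:port with the given promotion rule."""
    if rule not in PROMOTION_RULES:
        raise ValueError(f"unknown promotion rule: {rule!r}")
    return (f"{api_base.rstrip('/')}/register-candidate/"
            f"{quote(host, safe='')}/{port}/{rule}")


if __name__ == "__main__":
    # Hypothetical service address and MySQL host; a cron job would GET this URL.
    print(register_candidate_url("http://orchestrator.example.com:3000/api",
                                 "db1.example.com", 3306, "prefer"))
```

Rejecting unknown rule names before the request is made keeps a typo in the cron entry from silently leaving a stale rule in place.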

shlomi-noach · 2017-08-09T08:24:09Z

cc @github/database-infrastructure as interested party.

Shlomi Noach added 18 commits August 7, 2017 09:48

- `a39d4de` Documentation: operational aspects, hints and tips
- `d498b66` shared backend db deployment docs
- `16da227` shared backend db deployment docs
- `afaf157` shared backend db deployment docs
- `56585be` shared backend db deployment docs
- `be72e49` deployments: raft
- `ae2220e` deployments: raft
- `24be71e` deployments: raft
- `c6273ab` deployments: raft
- `0d763a5` general deployment
- `878fb23` raft ops
- `ad71300` general deployment
- `38d4835` general deployment
- `9f8be5e` general deployment
- `7d36756` general deployment
- `6d6ed89` general deployment
- `b10994b` general deployment
- `fd32bb3` general deployment

sjmudd reviewed Aug 8, 2017


Shlomi Noach added 3 commits August 9, 2017 10:26

- `1b9abf9` typos and grammar
- `97418f7` Merge branch 'master' into docs-ops
- `649b09d` general deployment

Shlomi Noach added 5 commits August 9, 2017 11:10

- `cf9c959` pseudo-gtid screencapture
- `2bd65d4` pseudo-gtid doc
- `4270757` pseudo-gtid doc
- `265a353` pseudo-gtid doc
- `8f7aa2a` pseudo-gtid doc

Shlomi Noach added 9 commits August 9, 2017 11:40

- `9889b14` metadata deployment
- `2dd668f` metadata deployment
- `ee1dcb9` deployment
- `70dccce` deployment
- `55497e3` readme
- `317ddd1` general wrap up
- `0093ab6` raft accepts IP or hostnames
- `1564d09` Merge branch 'master' into docs-ops
- `9c00ed7` Merge branch 'master' into docs-ops

shlomi-noach merged commit 07de2fc into master Aug 10, 2017

shlomi-noach deleted the docs-ops branch August 10, 2017 05:50
