Documentation: operational aspects, hints and tips #252

Merged
shlomi-noach merged 35 commits into master from docs-ops
Aug 10, 2017

shlomi-noach
Collaborator

Documenting operational aspects:

  • howtos
  • troubleshooting
  • deployment examples


- For `SQLite`:
- `SQLite` is bundled with `orchestrator`.
- Make sure the `SQLite3DataFile` is writable to the `orchestrator` user.
Collaborator

Suggestion: writable by
Question: does orchestrator try to ensure this file has any specific permissions?

Collaborator Author

fixed. Answer: nope.
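
Since `orchestrator` does not enforce any permissions on the data file, a pre-flight check run as the `orchestrator` user can catch a misconfigured path early. The following is a minimal sketch, not part of `orchestrator`; the function name is hypothetical, and the path passed in stands for whatever `SQLite3DataFile` is set to. It also checks the containing directory, because SQLite creates its journal files next to the database.

```python
import os


def data_file_writable(path: str) -> bool:
    """Return True if the current user can write (or create) the file at `path`.

    Hypothetical helper: `path` stands for the configured SQLite3DataFile.
    """
    parent = os.path.dirname(os.path.abspath(path))
    # The directory must be writable and searchable: SQLite places its
    # journal files alongside the database file.
    dir_ok = os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)
    if os.path.exists(path):
        # Existing data file: it must be a regular, writable file.
        return dir_ok and os.path.isfile(path) and os.access(path, os.W_OK)
    # File not created yet: a writable directory is enough to create it.
    return dir_ok
```

Run it under the same account that runs the service, for example from a deployment script, since the result depends on the calling user.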

As suggested, you may want to put the `orchestrator` service and the `MySQL` service on the same box. If using `SQLite` there's nothing else to do.

- Consider adding a proxy on top of the service boxes; the proxy would redirect all traffic to the leader node. There is one and only one leader node, and the status check endpoint is `/api/leader-check`.
- Clients may _only interact with the leader_. Setting up a proxy is one way to ensure that. See [proxy section](raft.md#proxy).
Collaborator

"may" ? Should this not be "must" ?

Collaborator Author

fixed.
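
To illustrate the leader check a proxy would perform against `/api/leader-check`, here is a rough sketch of the same probe done client-side. It assumes that only the leader answers the endpoint with HTTP 200; the node URLs and port in the usage note are placeholders, and both function names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET on `url`, or 0 if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses are raised as HTTPError; the code is still useful.
        return err.code
    except OSError:
        # Covers connection errors and timeouts.
        return 0


def find_leader(
    nodes: Iterable[str],
    status_of: Callable[[str], int] = http_status,
) -> Optional[str]:
    """Return the first node whose leader check answers 200, else None.

    Assumption: only the leader returns 200 on /api/leader-check.
    """
    for node in nodes:
        if status_of(f"{node}/api/leader-check") == 200:
            return node
    return None
```

For example, `find_leader(["http://orc1:3000", "http://orc2:3000"])` would return the base URL of whichever node currently passes the check. A proxy health check does the same thing continuously, which is why the proxy remains the more robust option.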

- Run failure detection
- Register their own health check

Non-leader nodes may _NOT_:
Collaborator

"may" ? Should this not be "must" ?


- Run arbitrary commands (e.g. `relocate`, `begin-downtime`)
- Run recoveries per human request.
- Serve client HTTP requests (but some endpoints, such as load-balancer and health checks, are valid).
Collaborator

Could the non-leaders redirect these requests to the leader? That might simplify the proxy setup, but it assumes connectivity that may not exist: it probably requires the non-leaders to know the public interface of the leader, and currently orchestrator is not necessarily aware of that.

Collaborator Author

See #246

- Copy backend DB data:
- If `MySQL`, run backup/restore, either logical or physical.
- If `SQLite`, run `.dump` + restore, see [10. Converting An Entire Database To An ASCII Text File](https://sqlite.org/cli.html).

Collaborator

  • Is it not possible to "bootstrap" a new, empty orchestrator node into a raft cluster just by talking to the cluster? Dumping the SQLite or MySQL backend DB to the new node will work, but it would be attractive to pull this data down from the cluster directly.
  • I may have missed this, but are access "credentials" required to join the cluster?

Collaborator Author

bootstrap: see #246 (comment)
credentials: no.
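
For the `SQLite` `.dump` + restore step in the excerpt above, the same round trip can be sketched with Python's standard `sqlite3` module, whose `iterdump()` produces SQL text comparable to the CLI's `.dump`. This is only an illustration under that assumption; the function names and file paths are hypothetical, and it says nothing about when it is safe to copy a live `orchestrator` backend.

```python
import sqlite3


def dump_to_sql(src_path: str) -> str:
    """Return the whole database at `src_path` as SQL text."""
    src = sqlite3.connect(src_path)
    try:
        # iterdump() yields the schema and data as SQL statements.
        return "\n".join(src.iterdump())
    finally:
        src.close()


def restore_from_sql(sql_text: str, dst_path: str) -> None:
    """Replay a SQL dump into a new, empty database file at `dst_path`."""
    dst = sqlite3.connect(dst_path)
    try:
        dst.executescript(sql_text)
        dst.commit()
    finally:
        dst.close()
```

Usage would be along the lines of `restore_from_sql(dump_to_sql("/path/on/old/node.db"), "/path/on/new/node.db")`, with the dump text shipped between boxes in whatever way suits the deployment.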


In a shared backend setup multiple `orchestrator` services will all speak to the same backend.

- For **synchronous replication**, the advise is:
Collaborator

suggestion: "advice"

- Only one service will be the leader at any given time.
- The leader is the one polling for servers; doing database cleanup; checking for crash recovery scenarios etc.
- You may choose to have all your `orchestrator` services load-balanced
However how does `orchestrator` discover completely new topologies?
Collaborator

Suggestion: "However, how ..." ?? (comma is important here).

- The (single) MySQL backend has the necessary state for managing concurrent operations.
- `orchestrator` has "maintenance locks" which prevent destructive concurrent operations on the same instance. At worst an
operation will be rejected due to not being able to acquire maintenance lock.
- You may ask `orchestrator` to _discover_ (probe) any single server in such topology, and from there on it will crawl its way across the entire topology.
Collaborator

suggestion: " ... in such a topology ..."
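
The "discover one server, then crawl the rest" idea in the excerpt above can be pictured as a breadth-first traversal. The sketch below is illustrative only and is not `orchestrator`'s discovery code; it assumes each probed server can report the servers it replicates from or to, and `neighbors_of` is a hypothetical stand-in for that probe.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seed: str, neighbors_of: Callable[[str], Iterable[str]]) -> List[str]:
    """Visit every server reachable from `seed`, each exactly once.

    `neighbors_of(host)` is a hypothetical probe returning the host's
    master and replicas.
    """
    seen = {seed}
    order: List[str] = []
    queue = deque([seed])
    while queue:
        host = queue.popleft()
        order.append(host)
        for peer in neighbors_of(host):
            if peer not in seen:
                # Remember the peer before queueing it so cycles
                # (master <-> replica links) cannot loop forever.
                seen.add(peer)
                queue.append(peer)
    return order
```

Starting from any single replica, the traversal reaches the master and from there every other replica, which is why seeding one server per topology is enough.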

This setup comes from production environments. The cron entries get updated by `puppet` to reflect the appropriate `promotion_rule`. A server may have `prefer` at this time, and `prefer_not` 5 minutes from now. Integrate your own service discovery method and your own scripting to provide `orchestrator` with your up-to-date `promotion-rule`.

Collaborator

@sjmudd Aug 8, 2017

You can also use the http interface to achieve the same result. That avoids the need for direct access to the orchestrator database (for writes) so may be preferred.

Collaborator Author

Thanks for pointing this out. I'll be giving examples using orchestrator CLI, orchestrator-client (which is actually using the HTTP interface) and directly accessing the API.

Collaborator Author

actually, the very example you responded to used orchestrator-client...
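
As a sketch of the HTTP route discussed in this thread, the helper below builds the request URL a cron job could call to register a promotion rule. Treat the details as assumptions to verify against your `orchestrator` version: the endpoint path `/api/register-candidate/<host>/<port>/<rule>` and the list of accepted rule names are not confirmed anywhere in this conversation, and the function name and hostnames are hypothetical.

```python
from urllib.parse import quote

# Assumed set of rule names; only `prefer` and `prefer_not` appear in this thread.
PROMOTION_RULES = ("prefer", "neutral", "prefer_not", "must_not")


def register_candidate_url(base_url: str, host: str, port: int, promotion_rule: str) -> str:
    """Build the (assumed) API URL that registers `host:port` with a promotion rule."""
    if promotion_rule not in PROMOTION_RULES:
        raise ValueError(f"unknown promotion rule: {promotion_rule!r}")
    # Escape the hostname so it is safe as a single path segment.
    safe_host = quote(host, safe="")
    return f"{base_url.rstrip('/')}/api/register-candidate/{safe_host}/{port}/{promotion_rule}"
```

A scheduled script would recompute the rule from its own service discovery and issue a GET on the resulting URL against the leader, which avoids write access to the backend database.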

@shlomi-noach
Collaborator Author

cc @github/database-infrastructure as interested party.

@shlomi-noach shlomi-noach merged commit 07de2fc into master Aug 10, 2017
@shlomi-noach shlomi-noach deleted the docs-ops branch August 10, 2017 05:50
2 participants