
Provide some suggestions for larger orchestrator environments #1204

Merged: 5 commits from smudd/configuration-large into openark:master, Jul 26, 2020

Conversation

sjmudd
Collaborator

@sjmudd commented Jul 5, 2020

Add some suggestions on settings for orchestrator in a large environment.

@MOON-CLJ
Contributor

MOON-CLJ commented Jul 5, 2020

In addition to #1203, we have one more modification.

We comment this line out:

go orcraft.PublishCommand("async-snapshot", "")

We do frequent integration tests in our production environment, and because we patched AuditPurgeDays to 31 (we have since reverted it to 7), we have a lot of data in our backend db:

select count(*) from topology_recovery;
+----------+
| count(*) |
+----------+
|    62421 |
+----------+
1 row in set (0.02 sec)

select count(*) from topology_failure_detection;
+----------+
| count(*) |
+----------+
|    95159 |
+----------+
1 row in set (0.02 sec)

Each snapshot reads the full data set from the backend db as well as the raft state, which takes up a lot of memory and Go GC resources. That in turn takes up a lot of goroutine scheduling resources, leading to raft heartbeat timeouts and frequent elections.

So we now only take snapshots periodically:

snapshotInterval = 30 * time.Minute

Maybe we need a config option for this behavior:

go orcraft.PublishCommand("async-snapshot", "")

cc @shlomi-noach
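
As an illustration of the periodic approach described above, a minimal sketch might look like the following. The package name, import paths and function name are assumptions made for illustration; only the orcraft.PublishCommand call and the 30-minute interval come from this comment.

package logic // assumed placement, not orchestrator's actual file layout

import (
    "time"

    orcraft "github.com/openark/orchestrator/go/raft" // import path assumed
)

// snapshotInterval mirrors the 30-minute value mentioned above.
const snapshotInterval = 30 * time.Minute

// publishSnapshotsPeriodically is a hypothetical loop that requests a raft
// snapshot at most once per snapshotInterval instead of on every trigger.
func publishSnapshotsPeriodically() {
    ticker := time.NewTicker(snapshotInterval)
    defer ticker.Stop()
    for range ticker.C {
        // Same call as the line commented out above, just rate limited.
        go orcraft.PublishCommand("async-snapshot", "")
    }
}

Whether such an interval belongs in the raft layer or in orchestrator's configuration file is exactly the question raised here.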

@sjmudd
Collaborator Author

sjmudd commented Jul 6, 2020

Hm. I have configured snapshotting without touching the code:

"SnapshotTopologiesIntervalHours": 0,

Judging from the configuration description, I think this is sufficient. You should not need to change the code to make orchestrator work the way you need it to.

If extra configuration is required, I'm sure @shlomi-noach would prefer you to provide a PR with a new configuration setting that makes the behaviour configurable as needed. The process for doing that is quite straightforward, and it is much better in the long term. Having had the experience of maintaining a patched fork of orchestrator, I would really recommend that you push changes back upstream to Shlomi, explaining why they are needed; if they make sense I'm sure he'll accept them.
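
For reference, both settings mentioned so far are plain keys in orchestrator's JSON configuration file. A minimal fragment using the values discussed in this thread (not general recommendations) would be:

{
  "SnapshotTopologiesIntervalHours": 0,
  "AuditPurgeDays": 7
}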

@shlomi-noach
Collaborator

CI fails because there's no incoming link to this new documentation page. Please add a link in configuration.md

@sjmudd
Collaborator Author

sjmudd commented Jul 6, 2020

@shlomi-noach good morning. I hadn't realised this was necessary, and it's good to require it. It should be fixed, I think, with the extra commit I've pushed.

@MOON-CLJ
Contributor

MOON-CLJ commented Jul 6, 2020

go orcraft.PublishCommand("async-snapshot", "")

Hm. I have configured snapshotting without touching the code: "SnapshotTopologiesIntervalHours": 0,

@sjmudd SnapshotTopologiesIntervalHours is not the same thing as snapshotInterval; snapshotInterval is about raft snapshots.

So what we suggest is to add a configuration item for the call below:

go orcraft.PublishCommand("async-snapshot", "")
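
A sketch of what such a configuration item could look like is below. The key name RaftAsyncSnapshotEnabled and the surrounding function are invented purely for illustration and are not existing orchestrator settings or code.

package logic // assumed placement

import (
    "github.com/openark/orchestrator/go/config" // import path assumed
    orcraft "github.com/openark/orchestrator/go/raft"
)

// publishAsyncSnapshot wraps the call discussed above behind a hypothetical
// boolean config key, RaftAsyncSnapshotEnabled, which would default to true
// so that current behavior is preserved.
func publishAsyncSnapshot() {
    if !config.Config.RaftAsyncSnapshotEnabled { // hypothetical field
        return
    }
    go orcraft.PublishCommand("async-snapshot", "")
}

An interval-based key (for example, a minimum number of minutes between snapshot requests) would serve the same purpose; the point is simply to make the behavior tunable from configuration rather than by patching the code.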

@shlomi-noach merged commit edf1a10 into openark:master Jul 26, 2020
@sjmudd deleted the smudd/configuration-large branch December 25, 2021 20:16