High Availability and Scaling setup #2975

Open
genofire opened this issue Feb 16, 2023 · 2 comments
Labels
T-Other Questions, user support, anything else.

Comments

@genofire
Contributor

Without polylith mode, is it still possible to scale Dendrite and run it in a high-availability setup?

Is it possible to run multiple Dendrite instances connected to the same database?

@S7evinK
Contributor

S7evinK commented Feb 17, 2023

Dendrite wasn't able to run in "HA" mode to begin with, as in you couldn't have more than one component of the same type running at the same time.
Dendrite will still be more performant than Synapse for small deployments. (While we have anonymous usage stats, we currently don't know whether there are any huge Dendrite deployments in the wild, e.g. >1k users, or how they are performing.)

Going to quote @kegsay for the reason behind this change here:

hey folks, after much discussion we've finally decided on a direction for dendrite, instead of being constantly tugged between embedded/p2p and massively-scalable deployments. We've ultimately decided to go down the embedded/p2p route, and will be making changes over the next few months to reflect this new reality. This is a significant change: probably the biggest one since we moved from Kafka to NATS. The ramifications include:

  • polylith mode will be removed from the project, including internal HTTP calls.
  • component databases will change to take advantage of monolith mode: for example we currently store every event twice, once in roomserver and once in syncapi. This will be optimised so we only store it once.
  • we want to make dendrite more modular: running in embedded mode should not attach appservice code for example, ideally not even build it to keep binary sizes low.
  • we will be adding runtime/trace support, which makes it significantly easier to debug performance bottlenecks in single processes. We're aware performance is an issue, and having tracing support here will be a game changer in allowing us to see if it's due to bad SQL (it usually is..), GC, or poor O(n^2)+ algorithms in code.
  • "internal API"-like functions are now as cheap as regular function calls, so code which previously assumed this was expensive and tried to minimise these calls will be re-evaluated. This is a big change for dendrite devs, as it's a huge change in thinking.
  • the directory structure of the project will be revised, given we no longer have components. E.g. where should a shared events database live when both roomserver/syncapi need it?
  • Our use of NATS will be re-evaluated, and it may be removed from the project. Much of its benefits came from being able to seamlessly provide a message queue for both mono and polylith modes: but now it's increasingly a liability and source of bugs (jetstream directory bloat, random "timed out sending to NATS" when used in P2P, etc).

We will also be taking this time to land in a few unrelated but also likely breaking changes, some of which came from folks here who I met at FOSDEM, including but not limited to:

  • Config YAML: All secrets will now be referred to by file path, and not hard-coded in the YAML. This will make systemd and k8s deployments easier, as secrets can be mounted to files and the YAML needn't be secret. The YAML will also go through a major version bump to strip out polylith sections and rejig sections entirely given components like roomserver don't really have any meaning anymore. We'll likely have a migration script which can automatically map from v2 to v3 format wise.
  • Registration/Login: we are looking into adding native OIDC support - this keeps the Dendrite codebase small and maintainable in this area, whilst providing more options for server admins to add things like SSO which we are aware is a sore point currently. Basic password login/registration will remain as a simple way to provide accounts for users, and keeps sytest/complement happy, but anything more complex than that and we'll be looking to OIDC for answers.

We will be keeping both Postgres/SQLite support, even though we won't use Postgres in embedded scenarios. The maintenance burden for us here is significant, but postgres' performance, coupled with the fact that we've basically told people to run using postgres and actively pushed people to do so, mean we will maintain support for it as a first class citizen.

We're aiming to land as much of this as possible over the next few months, incrementally. Breaking changes from a server admin's pov will be kept to a minimum and will be associated with a version bump.

@S7evinK S7evinK added the T-Other Questions, user support, anything else. label Feb 17, 2023
@genofire
Contributor Author

If you count deactivated accounts (from registrations without a captcha ...), here is a server with more than 1k users:

dendrite_sum7=# select count(*) from userapi_accounts;
 count
-------
  3374
(1 row)

dendrite_sum7=# select count(*) from userapi_daily_visits ;
 count
-------
  1630
(1 row)

dendrite_sum7=# SELECT
    pg_database.datname,
    pg_size_pretty(pg_database_size(pg_database.datname)) AS size
    FROM pg_database ORDER BY pg_database_size(pg_database.datname) DESC;
      datname       |  size
--------------------+---------
 dendrite_sum7      | 30 GB

(see #2464)


Until you release v1.0.0, you will not find any huge servers ....

For a zero-downtime update process in Kubernetes, it would be nice to be able to run two Dendrite instances at the same time (on the same database) ...

holgersson32644 pushed a commit to holgersson32644/holgersson-overlay that referenced this issue Mar 21, 2023
Upstream dropped the polylith mode in 0.12.0 as announced in earlier releases.
This leads to some renaming of files, see e.g. upstream issue 2975[1].

As I've got no OpenRC system or container right now, I can't test the modified
OpenRC init scripts. In case you have one, feedback either way is appreciated
- confirmations that it's working, bug reports if it fails or even just
suggestions for improvement.

As the ebuild didn't build (failed in the install phase) before this fixup
there is no revbump necessary.

This commit also adds a service file which is based upon upstream's
example for their monolith setup (but with paths in /usr instead of opt)[2].

[1] matrix-org/dendrite#2975
[2] https://github.com/matrix-org/dendrite/blob/main/docs/systemd/monolith-example.service

Signed-off-by: Nils Freydank <nils.freydank@posteo.de>