Improve advice regarding poor performance #4276

Merged
richvdh merged 2 commits into matrix-org:develop from Ralith:performance-advice on Jun 18, 2019

Conversation

6 participants
@Ralith
Contributor

commented Dec 8, 2018

This documents the assorted work-arounds that finally recovered my HS from total dysfunction, and which I've successfully applied to rescue others in similar straits on multiple occasions now.

@aaronraimist

Contributor

commented Dec 8, 2018

Fixes #3939

@aaronraimist
Contributor

left a comment

You are missing a CHANGELOG file and a sign-off (that’s why the test failed). See CONTRIBUTING.rst

@Ralith Ralith force-pushed the Ralith:performance-advice branch from ac4ba12 to cd498ef Dec 8, 2018

@jcaesar


commented Dec 9, 2018

From elsewhere I heard that using (a tuned) postgres also helps. Not sure if you want to add that.

@Ralith

Contributor Author

commented Dec 9, 2018

Good idea. Is an overloaded sqlite most strongly associated with adverse memory, CPU, or disk use?

@Half-Shot

Contributor

commented Dec 9, 2018

@Ralith The general wisdom I've heard these days is that if your homeserver does any of the following:

  • Federates
  • Hosts a "large" number of users
  • You are in a lot of rooms

Then running postgres is a must, and hopefully you can tune postgres so it doesn't consume too much if you are on a small box. Sqlite is really only good for spinning up quick demos or developing on, not for communities or federating with the outside world. Personally I think we should make it a requirement of our packages and make it damn clear in the README that folks should run postgres if they are experiencing slowness.

@Ralith Ralith force-pushed the Ralith:performance-advice branch 2 times, most recently from 2d4599c to afb8ad7 Dec 9, 2018

@Ralith

Contributor Author

commented Dec 9, 2018

I merged the sections, added mention of postgres, and tweaked the wording a bit.

@richvdh richvdh changed the base branch from master to develop Dec 10, 2018

README.rst Outdated
excess of outgoing federation requests (see `discussion
<https://github.com/matrix-org/synapse/issues/3971>`_). If your server is
also issuing far more outgoing federation requests than can be accounted
for by your users' activity, this is a likely cause. The misbehavior can

@richvdh

richvdh Dec 10, 2018

Member

I'd really rather we fixed the bug than baked this into the README so that it becomes normality. I'm particularly concerned that, even if we do fix the bug, we'll end up forgetting to update the readme. #3971 already documents the workaround in the issue description - is there any need to repeat it here?

@Ralith

Ralith Dec 10, 2018

Author Contributor

I've been encountering a somewhat steady stream of people having this problem, and who knows how many go unseen. The situation right now is pretty dire and it is difficult for an affected server operator to find this information if it's not published somewhere central.

A better solution than mentioning it in the README would be to disable presence by default until it's no longer a performance disaster. Is that on the table?

@aaronraimist

aaronraimist Feb 12, 2019

Contributor

I'd argue that since the title of the section is "Help!! Synapse eats all my RAM!", it is already somewhat baked into the README, just without all the pertinent information. Not everyone knows to look for that issue.

I think it is important to get this section included in Synapse 1.0 until more performance improvements come in the future.

@ghost


commented Dec 23, 2018

> Good idea. Is an overloaded sqlite most strongly associated with adverse memory, CPU, or disk use?

To maybe answer the question: SQLite allows only one writer at a time, so the bottleneck is the database itself. On an SSD, federating the 3 main rooms on matrix & riot, only 2 of my 24 cores were being used, at around 60%, and the SSD was pretty idle as well; the database was simply the limit.
It's great for what it's for, but large scale isn't really what it's for x.x
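
The single-writer behaviour described above can be reproduced with Python's built-in `sqlite3` module. This is only a minimal sketch, not how Synapse itself talks to its database: the function name, table name, and temporary file are made up for illustration, and it assumes SQLite's default rollback-journal mode.

```python
import os
import sqlite3
import tempfile


def second_writer_blocked() -> bool:
    """Return True if a second connection is refused the write lock
    while a first connection has an open write transaction."""
    # Hypothetical throwaway database file, used only for this demo.
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # timeout=0: fail immediately instead of waiting for the lock.
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    a = sqlite3.connect(path, timeout=0, isolation_level=None)
    b = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        a.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")

        # Connection A takes the database-wide write lock.
        a.execute("BEGIN IMMEDIATE")
        a.execute("INSERT INTO events (body) VALUES ('from a')")

        # Connection B now asks for the same lock.
        try:
            b.execute("BEGIN IMMEDIATE")
            b.execute("ROLLBACK")
            blocked = False
        except sqlite3.OperationalError:
            # SQLite reports "database is locked" to the second writer.
            blocked = True

        a.execute("COMMIT")
        return blocked
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("second writer blocked:", second_writer_blocked())
```

With many rooms and federation traffic all wanting to write, every writer queues behind this one lock, which would be consistent with the idle cores and idle SSD reported above.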

@aaronraimist

Contributor

commented Feb 12, 2019

Just adding a note here saying that this PR will need to be updated since the README has been reorganized a bit recently. I think this PR should be included in 1.0.

@Ralith

Contributor Author

commented Feb 13, 2019

I'd be happy to update it if there was any indication that it might be merged. It was in merge-ready state for some time following the previous iteration, after all.

@richvdh

Member

commented Feb 13, 2019

OK, if you could update it, I'll merge it. I'm still unhappy that we're encouraging people to disable presence, but I guess we can remove it once we fix the bug.

@richvdh

Member

commented Feb 13, 2019

(thanks)

@anoadragon453

Member

commented Apr 12, 2019

@Ralith Do you plan to continue this PR?

Improve advice regarding poor performance
Signed-off-by: Benjamin Saunders <ben.e.saunders@gmail.com>

@Ralith Ralith force-pushed the Ralith:performance-advice branch from afb8ad7 to 047486a Jun 9, 2019

@codecov


commented Jun 9, 2019

Codecov Report

Merging #4276 into develop will increase coverage by 9.97%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4276      +/-   ##
===========================================
+ Coverage    62.53%   72.51%   +9.97%     
===========================================
  Files          326      333       +7     
  Lines        35649    33883    -1766     
  Branches      5848        0    -5848     
===========================================
+ Hits         22293    24570    +2277     
+ Misses       11803     9313    -2490     
+ Partials      1553        0    -1553
Impacted Files Coverage Δ
synapse/replication/slave/storage/directory.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/receipts.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/profile.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/transactions.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/registration.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/keys.py 0% <0%> (-100%) ⬇️
...se/replication/slave/storage/_slaved_id_tracker.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/appservice.py 0% <0%> (-100%) ⬇️
synapse/util/logformatter.py 0% <0%> (-88.89%) ⬇️
synapse/replication/slave/storage/events.py 0% <0%> (-88.71%) ⬇️
... and 334 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2decc92...047486a. Read the comment docs.

@codecov


commented Jun 9, 2019

Codecov Report

Merging #4276 into develop will increase coverage by 9.97%.
The diff coverage is n/a.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4276      +/-   ##
===========================================
+ Coverage    62.53%   72.51%   +9.97%     
===========================================
  Files          326      333       +7     
  Lines        35649    33883    -1766     
  Branches      5848        0    -5848     
===========================================
+ Hits         22293    24569    +2276     
+ Misses       11803     9314    -2489     
+ Partials      1553        0    -1553
Impacted Files Coverage Δ
synapse/replication/slave/storage/directory.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/receipts.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/profile.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/transactions.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/registration.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/keys.py 0% <0%> (-100%) ⬇️
...se/replication/slave/storage/_slaved_id_tracker.py 0% <0%> (-100%) ⬇️
synapse/replication/slave/storage/appservice.py 0% <0%> (-100%) ⬇️
synapse/util/logformatter.py 0% <0%> (-88.89%) ⬇️
synapse/replication/slave/storage/events.py 0% <0%> (-88.71%) ⬇️
... and 334 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2decc92...b36de88. Read the comment docs.

@Ralith

Contributor Author

commented Jun 9, 2019

Rebased and slightly edited.

@richvdh richvdh added this to In progress in Homeserver Task Board via automation Jun 18, 2019

@richvdh richvdh requested a review from matrix-org/synapse-core Jun 18, 2019

@richvdh
Member

left a comment

looks great, thank you!

@richvdh richvdh merged commit 8fcd2ca into matrix-org:develop Jun 18, 2019

20 of 22 checks passed

ci/circleci: sytestpy2merged Your tests failed on CircleCI
ci/circleci: sytestpy2postgresmerged Your tests failed on CircleCI
buildkite/synapse Build #2231 passed (20 minutes, 16 seconds)
buildkite/synapse/check-sample-config Passed (1 minute, 9 seconds)
buildkite/synapse/isort Passed (47 seconds)
buildkite/synapse/newspaper-newsfile Passed (45 seconds)
buildkite/synapse/packaging Passed (50 seconds)
buildkite/synapse/pep-8 Passed (1 minute, 16 seconds)
buildkite/synapse/pipeline Passed (10 seconds)
buildkite/synapse/python-2-dot-7-slash-postgres-9-dot-4 Passed (15 minutes, 50 seconds)
buildkite/synapse/python-2-dot-7-slash-postgres-9-dot-5 Passed (16 minutes)
buildkite/synapse/python-2-dot-7-slash-sqlite Passed (7 minutes, 24 seconds)
buildkite/synapse/python-2-dot-7-slash-sqlite-slash-old-deps Passed (7 minutes, 44 seconds)
buildkite/synapse/python-3-dot-5-slash-postgres-9-dot-4 Passed (16 minutes, 26 seconds)
buildkite/synapse/python-3-dot-5-slash-postgres-9-dot-5 Passed (16 minutes, 38 seconds)
buildkite/synapse/python-3-dot-5-slash-sqlite Passed (8 minutes, 10 seconds)
buildkite/synapse/python-3-dot-6-slash-sqlite Passed (8 minutes, 13 seconds)
buildkite/synapse/python-3-dot-7-slash-postgres-11 Passed (16 minutes, 3 seconds)
buildkite/synapse/python-3-dot-7-slash-postgres-9-dot-5 Passed (16 minutes, 25 seconds)
buildkite/synapse/python-3-dot-7-slash-sqlite Passed (8 minutes, 3 seconds)
ci/circleci: sytestpy3merged Your tests passed on CircleCI!
ci/circleci: sytestpy3postgresmerged Your tests passed on CircleCI!

Homeserver Task Board automation moved this from In progress to Done Jun 18, 2019

hawkowl added a commit that referenced this pull request Jul 5, 2019

Merge tag 'v1.1.0' into shhs
Synapse 1.1.0 (2019-07-04)
==========================

As of v1.1.0, Synapse no longer supports Python 2, nor Postgres version 9.4.
See the [upgrade notes](UPGRADE.rst#upgrading-to-v110) for more details.

This release also deprecates the use of environment variables to configure the
docker image. See the [docker README](https://github.com/matrix-org/synapse/blob/release-v1.1.0/docker/README.md#legacy-dynamic-configuration-file-support)
for more details.

No changes since 1.1.0rc2.

Synapse 1.1.0rc2 (2019-07-03)
=============================

Bugfixes
--------

- Fix regression in 1.1rc1 where OPTIONS requests to the media repo would fail. ([\#5593](#5593))
- Removed the `SYNAPSE_SMTP_*` docker container environment variables. Using these environment variables prevented the docker container from starting in Synapse v1.0, even though they didn't actually allow any functionality anyway. ([\#5596](#5596))
- Fix a number of "Starting txn from sentinel context" warnings. ([\#5605](#5605))

Internal Changes
----------------

- Update github templates. ([\#5552](#5552))

Synapse 1.1.0rc1 (2019-07-02)
=============================

As of v1.1.0, Synapse no longer supports Python 2, nor Postgres version 9.4.
See the [upgrade notes](UPGRADE.rst#upgrading-to-v110) for more details.

Features
--------

- Added possibility to disable local password authentication. Contributed by Daniel Hoffend. ([\#5092](#5092))
- Add monthly active users to phonehome stats. ([\#5252](#5252))
- Allow expired user to trigger renewal email sending manually. ([\#5363](#5363))
- Statistics on forward extremities per room are now exposed via Prometheus. ([\#5384](#5384), [\#5458](#5458), [\#5461](#5461))
- Add --no-daemonize option to run synapse in the foreground, per issue #4130. Contributed by Soham Gumaste. ([\#5412](#5412), [\#5587](#5587))
- Fully support SAML2 authentication. Contributed by [Alexander Trost](https://github.com/galexrt) - thank you! ([\#5422](#5422))
- Allow server admins to define implementations of extra rules for allowing or denying incoming events. ([\#5440](#5440), [\#5474](#5474), [\#5477](#5477))
- Add support for handling pagination APIs on client reader worker. ([\#5505](#5505), [\#5513](#5513), [\#5531](#5531))
- Improve help and cmdline option names for --generate-config options. ([\#5512](#5512))
- Allow configuration of the path used for ACME account keys. ([\#5516](#5516), [\#5521](#5521), [\#5522](#5522))
- Add --data-dir and --open-private-ports options. ([\#5524](#5524))
- Split public rooms directory auth config in two settings, in order to manage client auth independently from the federation part of it. Obsoletes the "restrict_public_rooms_to_local_users" configuration setting. If "restrict_public_rooms_to_local_users" is set in the config, Synapse will act as if both new options are enabled, i.e. require authentication through the client API and deny federation requests. ([\#5534](#5534))
- The minimum TLS version used for outgoing federation requests can now be set with `federation_client_minimum_tls_version`. ([\#5550](#5550))
- Optimise devices changed query to not pull unnecessary rows from the database, reducing database load. ([\#5559](#5559))
- Add new metrics for number of forward extremities being persisted and number of state groups involved in resolution. ([\#5476](#5476))

Bugfixes
--------

- Fix bug processing incoming events over federation if call to `/get_missing_events` fails. ([\#5042](#5042))
- Prevent more than one room upgrade happening simultaneously on the same room. ([\#5051](#5051))
- Fix a bug where running synapse_port_db would cause the account validity feature to fail because it didn't set the type of the email_sent column to boolean. ([\#5325](#5325))
- Warn about disabling email-based password resets when a reset occurs, and remove warning when someone attempts a phone-based reset. ([\#5387](#5387))
- Fix email notifications for unnamed rooms with multiple people. ([\#5388](#5388))
- Fix exceptions in federation reader worker caused by attempting to renew attestations, which should only happen on master worker. ([\#5389](#5389))
- Fix handling of failures fetching remote content to not log failures as exceptions. ([\#5390](#5390))
- Fix a bug where deactivated users could receive renewal emails if the account validity feature is on. ([\#5394](#5394))
- Fix missing invite state after exchanging 3PID invites over federation. ([\#5464](#5464))
- Fix intermittent exceptions on Apple hardware. Also fix bug that caused database activity times to be under-reported in log lines. ([\#5498](#5498))
- Fix logging error when a tampered event is detected. ([\#5500](#5500))
- Fix bug where clients could tight loop calling `/sync` for a period. ([\#5507](#5507))
- Fix bug with `jinja2` preventing Synapse from starting. Users who had this problem should now simply need to run `pip install matrix-synapse`. ([\#5514](#5514))
- Fix a regression where homeservers on private IP addresses were incorrectly blacklisted. ([\#5523](#5523))
- Fixed m.login.jwt using unregistered user_id and added pyjwt>=1.6.4 as jwt conditional dependencies. Contributed by Pau Rodriguez-Estivill. ([\#5555](#5555), [\#5586](#5586))
- Fix a bug that would cause invited users to receive several emails for a single 3PID invite in case the inviter is rate limited. ([\#5576](#5576))

Updates to the Docker image
---------------------------
- Add ability to change the Docker container's [timezone](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) with the `TZ` variable. ([\#5383](#5383))
- Update docker image to use Python 3.7. ([\#5546](#5546))
- Deprecate the use of environment variables for configuration, and make the use of a static configuration the default. ([\#5561](#5561), [\#5562](#5562), [\#5566](#5566), [\#5567](#5567))
- Increase default log level for docker image to INFO. It can still be changed by editing the generated log.config file. ([\#5547](#5547))
- Send synapse logs to the docker logging system, by default. ([\#5565](#5565))
- Open the non-TLS port by default. ([\#5568](#5568))
- Fix failure to start under docker with SAML support enabled. ([\#5490](#5490))
- Use a sensible location for data files when generating a config file. ([\#5563](#5563))

Deprecations and Removals
-------------------------

- Python 2.7 is no longer a supported platform. Synapse now requires Python 3.5+ to run. ([\#5425](#5425))
- PostgreSQL 9.4 is no longer supported. Synapse requires Postgres 9.5 or above for Postgres support. ([\#5448](#5448))
- Remove support for cpu_affinity setting. ([\#5525](#5525))

Improved Documentation
----------------------
- Improve README section on performance troubleshooting. ([\#4276](#4276))
- Add information about how to install and run `black` on the codebase to code_style.rst. ([\#5537](#5537))
- Improve install docs on choosing server_name. ([\#5558](#5558))

Internal Changes
----------------

- Add logging to 3pid invite signature verification. ([\#5015](#5015))
- Update example haproxy config to a more compatible setup. ([\#5313](#5313))
- Track deactivated accounts in the database. ([\#5378](#5378), [\#5465](#5465), [\#5493](#5493))
- Clean up code for sending federation EDUs. ([\#5381](#5381))
- Add a sponsor button to the repo. ([\#5382](#5382), [\#5386](#5386))
- Don't log non-200 responses from federation queries as exceptions. ([\#5383](#5383))
- Update Python syntax in contrib/ to Python 3. ([\#5446](#5446))
- Update federation_client dev script to support `.well-known` and work with python3. ([\#5447](#5447))
- SyTest has been moved to Buildkite. ([\#5459](#5459))
- Demo script now uses python3. ([\#5460](#5460))
- Synapse can now handle RestServlets that return coroutines. ([\#5475](#5475), [\#5585](#5585))
- The demo servers talk to each other again. ([\#5478](#5478))
- Add an EXPERIMENTAL config option to try and periodically clean up extremities by sending dummy events. ([\#5480](#5480))
- Synapse's codebase is now formatted by `black`. ([\#5482](#5482))
- Some cleanups and sanity-checking in the CPU and database metrics. ([\#5499](#5499))
- Improve email notification logging. ([\#5502](#5502))
- Fix "Unexpected entry in 'full_schemas'" log warning. ([\#5509](#5509))
- Improve logging when generating config files. ([\#5510](#5510))
- Refactor and clean up Config parser for maintainability. ([\#5511](#5511))
- Make the config clearer in that email.template_dir is relative to the Synapse's root directory, not the `synapse/` folder within it. ([\#5543](#5543))
- Update v1.0.0 release changelog to include more information about changes to password resets. ([\#5545](#5545))
- Remove non-functioning check_event_hash.py dev script. ([\#5548](#5548))
- Synapse will now only allow TLS v1.2 connections when serving federation, if it terminates TLS. As Synapse's allowed ciphers were only able to be used in TLSv1.2 before, this does not change behaviour. ([\#5550](#5550))
- Logging when running GC collection on generation 0 is now at the DEBUG level, not INFO. ([\#5557](#5557))
- Reduce the amount of stuff we send in the docker context. ([\#5564](#5564))
- Point the reverse links in the Purge History contrib scripts at the intended location. ([\#5570](#5570))