This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Apparent memory leak with use_presence option #12053

Closed
hyossing opened this issue Feb 21, 2022 · 8 comments
Labels
S-Minor: Blocks non-critical functionality, workarounds exist.
T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@hyossing

hyossing commented Feb 21, 2022

Hello,

Description

GC (garbage collection) time increases without bound for a specific setting in homeserver.yaml:

use_presence: False

Our Synapse homeserver.yaml is quite old, so we had simply kept the use_presence option.
This is the result:

Screen Shot 2022-02-22 at 7 08 10 AM

In my opinion, the GC time increases further as the number of requests increases.

The problem is not easy to catch because it is a kind of aging issue.

  • Restarting Synapse resolves the problem because the leaked memory is freed.

So it took some time to identify.

If we switch to the latest settings format, the unbounded increase no longer appears.

# Presence tracking allows users to see the state (e.g online/offline)
# of other local and remote users.
#
presence:
  # Uncomment to disable presence tracking on this homeserver. This option
  # replaces the previous top-level 'use_presence' option.
  #
  # enabled: false

  # Presence routers are third-party modules that can specify additional logic
  # to where presence updates from users are routed.
  #
  presence_router:
    # The custom module's class. Uncomment to use a custom presence router module.
    #
    #module: "my_custom_router.PresenceRouter"

    # Configuration options of the custom module. Refer to your module's
    # documentation for available options.
    #
    #config:
    #  example_option: 'something'

Steps to reproduce

Archive.zip

  1. Unzip the archive.
  2. yarn install
  3. docker-compose up -d
  4. Create the PostgreSQL database:
     • CREATE DATABASE synapse ENCODING 'UTF8' LC_COLLATE='C' LC_CTYPE='C' template=template0
  5. Open Grafana at http://localhost:3000
  6. Import the Synapse dashboard (synapse_rev1.json) and check that it works.
  7. node synapse-gc-test.js

  • Let the script run for one or two days, then check how much the GC time has increased.
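
For readers who want to watch GC time from inside a Python process rather than through the Grafana dashboard, here is a minimal sketch. It is not Synapse's metrics code: `GcTimer` and `timed_full_collection` are names made up for this illustration, and the `leaked` list merely stands in for whatever structure is growing. It relies only on the standard `gc.callbacks` hook.

```python
import gc
import time
from collections import defaultdict


class GcTimer:
    """Accumulates wall-clock time spent in garbage collection, per generation.

    Hypothetical helper for illustration; not part of Synapse.
    """

    def __init__(self):
        self.total_seconds = defaultdict(float)  # generation -> seconds
        self.collections = defaultdict(int)      # generation -> run count
        self._started_at = None

    def __call__(self, phase, info):
        # CPython calls every entry in gc.callbacks with phase "start" or "stop".
        if phase == "start":
            self._started_at = time.perf_counter()
        elif phase == "stop" and self._started_at is not None:
            generation = info["generation"]
            self.total_seconds[generation] += time.perf_counter() - self._started_at
            self.collections[generation] += 1
            self._started_at = None

    def install(self):
        gc.callbacks.append(self)

    def uninstall(self):
        gc.callbacks.remove(self)


def timed_full_collection(timer):
    """Force a full collection and return the GC time it added, in seconds."""
    before = sum(timer.total_seconds.values())
    gc.collect()
    return sum(timer.total_seconds.values()) - before


timer = GcTimer()
timer.install()
leaked = []  # stands in for a structure that is filled but never drained
for round_number in range(3):
    # Lists are container objects, so the collector has to traverse each one.
    leaked.extend([i] for i in range(50_000))
    print(round_number, len(leaked), timed_full_collection(timer))
timer.uninstall()
```

If the printed collection time keeps rising while the workload stays the same, that points to a growing set of live objects rather than to heavier traffic alone.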

Thank you.
Hyosung

@erikjohnston
Member

Thanks, do you see an increase in memory usage too? This may be due to caches slowly filling up, so you may see this fixed if you enable time-based expiry for caches (which is enabled by default in v1.53.0). Not all caches are affected by time-based expiry, so please let us know whether it fixes this issue.
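
To illustrate the idea behind time-based expiry (this is a toy sketch, not Synapse's cache implementation; `ExpiringCache` and its methods are hypothetical names), entries that have not been accessed within a TTL are evicted, so a cache that would otherwise only grow is kept bounded by recent activity:

```python
import time
from collections import OrderedDict


class ExpiringCache:
    """Toy cache that evicts entries not accessed within ttl_seconds.

    Hypothetical illustration; not Synapse's cache code.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (last_access_time, value), ordered oldest access first.
        self._entries = OrderedDict()

    def set(self, key, value):
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        _, value = entry
        self.set(key, value)  # a read refreshes the last-access time
        return value

    def evict_expired(self):
        """Drop entries last accessed at or before now - ttl; return the count."""
        cutoff = self._clock() - self._ttl
        evicted = 0
        while self._entries:
            key, (last_access, _) = next(iter(self._entries.items()))
            if last_access > cutoff:
                break  # everything after this was accessed more recently
            del self._entries[key]
            evicted += 1
        return evicted

    def __len__(self):
        return len(self._entries)
```

A periodic task would call `evict_expired()`. A cache that is exempt from such a task can still grow without bound, which is why the distinction between affected and unaffected caches matters here.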

@erikjohnston added the X-Needs-Info label (This issue is blocked awaiting information from the reporter) on Feb 23, 2022
@hyossing
Author

Hello,

I'll test the new expiry_time config option against the GC time issue.

I'll let you know the result.

Thank you.

@hyossing
Author

hyossing commented Mar 8, 2022

Hello,

I've tested it.

However, the result is the same.

In my opinion, the use_presence option causes the problem.

Please deprecate the option.

Screen Shot 2022-03-08 at 3 52 01 PM

Screen Shot 2022-03-08 at 3 53 27 PM

@erikjohnston
Member

Oh, hmm. I wonder if it's because we've disabled time-based expiry for the get_users_in_room and get_users_who_share_room_with_user caches. In the caches section, do you see those caches increasing in size?

@hyossing
Author

hyossing commented Mar 9, 2022

I didn't see that.

Screen Shot 2022-03-10 at 4 37 51 AM

My test code only sends to-device messages and then calls sync to receive them.

Only device_inbox is used.

Thank you.
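
For context, the workload described above can be sketched as the two Matrix client-server requests involved. This is not the script from Archive.zip: the helper names, the `m.test` event type, and the user and device IDs are hypothetical, and the sketch only builds request descriptions without sending anything. The `/v3` paths are the current spec paths; older clients used `/r0`.

```python
import json
from urllib.parse import quote, urlencode

CLIENT_API = "/_matrix/client/v3"


def build_send_to_device(event_type, txn_id, user_id, device_id, content):
    """Describe a PUT .../sendToDevice/{eventType}/{txnId} request."""
    path = (
        f"{CLIENT_API}/sendToDevice/"
        f"{quote(event_type, safe='')}/{quote(txn_id, safe='')}"
    )
    body = json.dumps({"messages": {user_id: {device_id: content}}})
    return ("PUT", path, body)


def build_sync(since=None, timeout_ms=30000):
    """Describe a GET .../sync request, optionally continuing from a token."""
    params = {"timeout": timeout_ms}
    if since is not None:
        params["since"] = since
    return ("GET", f"{CLIENT_API}/sync?{urlencode(params)}", None)


# Hypothetical example values, for illustration only.
send_request = build_send_to_device(
    "m.test", "txn1", "@alice:example.org", "DEVICEID", {"ping": 1}
)
sync_request = build_sync()
```

Each sync is a point at which the server may update the syncing user's presence state, so a loop of these two calls exercises the presence code on every iteration.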

@erikjohnston
Member

Oh, that is very interesting. Looks like my assumption that it was due to caches is wrong.

Can I ask what your test code does? Does it create lots of new users? Maybe the leak is that we're storing a presence update for each new user forever?

@hyossing
Author

Hello,

My test code is attached to the first comment in this issue (Archive.zip).

I explained there how to reproduce the issue.

Thank you.

@erikjohnston added the S-Minor and T-Defect labels and removed the X-Needs-Info label on Mar 21, 2022
@callahad changed the title from "Unbounded GC Time Increase for use_presence option." to "Apparent memory leak when presence enabled" on Mar 24, 2022
@callahad changed the title from "Apparent memory leak when presence enabled" to "Apparent memory leak with use_presence option" on Mar 24, 2022
@erikjohnston
Member

erikjohnston commented May 6, 2022

This is likely fixed by #12213, which is in v1.58.0.

The cause was that /sync would call set_state, which did not correctly no-op when presence was disabled. As a result, we kept adding entries to the wheel timer without ever taking them out.
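
The failure mode can be sketched with a toy model. This is not Synapse's code: `SimpleWheelTimer` and `PresenceHandlerSketch` are hypothetical names, and the sketch assumes the leak arises when presence tracking is turned off (as with `use_presence: False` in the report) because the task that drains the timer then never runs.

```python
from collections import defaultdict


class SimpleWheelTimer:
    """Toy wheel timer: entries are bucketed by expiry time and are only
    removed when fetch() is called. Hypothetical; not Synapse's class."""

    def __init__(self, bucket_size_ms=5000):
        self.bucket_size_ms = bucket_size_ms
        self._buckets = defaultdict(list)

    def insert(self, obj, then_ms):
        self._buckets[then_ms // self.bucket_size_ms].append(obj)

    def fetch(self, now_ms):
        """Remove and return every entry whose bucket is due by now_ms."""
        now_key = now_ms // self.bucket_size_ms
        due = []
        for key in sorted(k for k in self._buckets if k <= now_key):
            due.extend(self._buckets.pop(key))
        return due

    def __len__(self):
        return sum(len(bucket) for bucket in self._buckets.values())


class PresenceHandlerSketch:
    """Models set_state with and without the early return (assumed fix)."""

    def __init__(self, presence_enabled, guard_set_state):
        self.presence_enabled = presence_enabled
        self.guard_set_state = guard_set_state
        self.wheel_timer = SimpleWheelTimer()

    def set_state(self, user_id, now_ms):
        if self.guard_set_state and not self.presence_enabled:
            return  # correct no-op: nothing is queued while presence is off
        self.wheel_timer.insert(user_id, now_ms + 30_000)

    def handle_timeouts(self, now_ms):
        # Assumption: the drain step only runs when presence is on.
        if not self.presence_enabled:
            return []
        return self.wheel_timer.fetch(now_ms)


def simulate(handler, syncs):
    """Call set_state once per simulated /sync and return the timer size."""
    for i in range(syncs):
        handler.set_state("@user:example.org", i * 1000)
        handler.handle_timeouts(i * 1000)
    return len(handler.wheel_timer)
```

In this model, `simulate(PresenceHandlerSketch(False, False), n)` leaves `n` entries in the timer, while the guarded variant leaves none. That matches the reported symptom of growth proportional to the number of requests, which a restart clears.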
