Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

synapse explodes with a stack overflow and logging errors #4240

Closed
richvdh opened this issue Nov 29, 2018 · 21 comments
Closed

synapse explodes with a stack overflow and logging errors #4240

richvdh opened this issue Nov 29, 2018 · 21 comments
Labels
z-bug (Deprecated Label)

Comments

@richvdh
Copy link
Member

richvdh commented Nov 29, 2018

We've had a bunch of reports that synapse goes into a meltdown with logging errors. For example (from #4086):

    2018-10-25 15:58:33,973 - twisted - 243 - ERROR - POST-299240- Traceback (most recent call last): 
    2018-10-25 15:58:33,973 - twisted - 243 - ERROR - POST-299240-   File "/usr/lib/python2.7/logging/handlers.py", line 76, in emit
    2018-10-25 15:58:33,974 - twisted - 243 - ERROR - POST-299240-     if self.shouldRollover(record):
    2018-10-25 15:58:33,974 - twisted - 243 - ERROR - POST-299240-   File "/usr/lib/python2.7/logging/handlers.py", line 156, in shouldRollover
    2018-10-25 15:58:33,974 - twisted - 243 - ERROR - POST-299240-     msg = "%s\n" % self.format(record)
    2018-10-25 15:58:33,974 - twisted - 243 - ERROR - POST-299240-   File "/usr/lib/python2.7/logging/__init__.py", line 734, in format 
    2018-10-25 15:58:33,975 - twisted - 243 - ERROR - POST-299240-     return fmt.format(record)
    2018-10-25 15:58:33,975 - twisted - 243 - ERROR - POST-299240-   File "/usr/lib/python2.7/logging/__init__.py", line 476, in format 
    2018-10-25 15:58:33,975 - twisted - 243 - ERROR - POST-299240-     raise e
    2018-10-25 15:58:33,977 - twisted - 243 - ERROR - POST-299240- UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 46: ordinal not in range(128)
    2018-10-25 15:58:33,978 - twisted - 243 - ERROR - POST-299240- Logged from file _base.py, line 254
    2018-10-25 15:58:33,978 - twisted - 243 - ERROR - POST-299240- Traceback (most recent call last): 
    2018-10-25 15:58:33,978 - twisted - 243 - ERROR - POST-299240-   File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
    2018-10-25 15:58:33,978 - twisted - 243 - ERROR - POST-299240-     msg = self.format(record)
    2018-10-25 15:58:33,979 - twisted - 243 - ERROR - POST-299240-   File "/usr/lib/python2.7/logging/__init__.py", line 734, in format 
    2018-10-25 15:58:33,979 - twisted - 243 - ERROR - POST-299240-     return fmt.format(record)
    2018-10-25 15:58:33,979 - twisted - 243 - ERROR - POST-299240-   File "/usr/lib/python2.7/logging/__init__.py", line 476, in format 
    2018-10-25 15:58:33,979 - twisted - 243 - ERROR - POST-299240-     raise e
    2018-10-25 15:58:33,980 - twisted - 243 - ERROR - POST-299240- UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 46: ordinal not in range(128)
    2018-10-25 15:58:33,980 - twisted - 243 - ERROR - POST-299240- Logged from file _base.py, line 254
    (etc)

And from #4191:

2018-11-15 12:24:45,358 - twisted - 243 - ERROR - POST-117962- Traceback (most recent call last): 
2018-11-15 12:24:45,359 - twisted - 243 - ERROR - POST-117962-   File "/usr/lib/python2.7/logging/handlers.py", line 76, in emit 
2018-11-15 12:24:45,360 - twisted - 243 - ERROR - POST-117962-     if self.shouldRollover(record):
2018-11-15 12:24:45,360 - twisted - 243 - ERROR - POST-117962-   File "/usr/lib/python2.7/logging/handlers.py", line 156, in shouldRollover
2018-11-15 12:24:45,360 - twisted - 243 - ERROR - POST-117962-     msg = "%s\n" % self.format(record)
2018-11-15 12:24:45,360 - twisted - 243 - ERROR - POST-117962-   File "/usr/lib/python2.7/logging/__init__.py", line 732, in format
2018-11-15 12:24:45,361 - twisted - 243 - ERROR - POST-117962-     return fmt.format(record)
2018-11-15 12:24:45,361 - twisted - 243 - ERROR - POST-117962-   File "/usr/lib/python2.7/logging/__init__.py", line 474, in format
2018-11-15 12:24:45,361 - twisted - 243 - ERROR - POST-117962-     s = self._fmt % record.__dict__
2018-11-15 12:24:45,361 - twisted - 243 - ERROR - POST-117962- UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 68: ordinal not in range(128)

(Note that both of the above reports are from python 2.7 instances. I don't know if it also applies to python 3.)

I think what's happening here is that we are attempting to log a non-ascii character, which then causes a logging error; the error is written to stderr, which is directed back into the logger, so we end up with a stack-overflow and high CPU usage.

However, I've been unable to reproduce the problem, and certainly non-ascii characters are normally logged without any problem. I don't really understand why self._fmt % record.__dict__ throws a UnicodeDecodeError.

@richvdh
Copy link
Member Author

richvdh commented Nov 29, 2018

Part of the problem here is of course that we loop stderr back into the logger - for which see #1539. However, I think there is another problem to be resolved here.

@richvdh
Copy link
Member Author

richvdh commented Nov 29, 2018

A workaround for the stack overflows is to run synapse in the foreground rather than via synctl, and set the -n option (no redirect stdio): python -B -m synapse.app.homeserver -n -c homeserver.yaml.

@richvdh
Copy link
Member Author

richvdh commented Nov 29, 2018

#4202 also contains a stacktrace that looks related, from python3.

@richvdh
Copy link
Member Author

richvdh commented Nov 29, 2018

I don't really understand why self._fmt % record.__dict__ throws a UnicodeDecodeError.

This happens (on python 2) when we log a unicode string and a non-ascii byte object in the same line:

>>> l.info("a %s %s", u"a", "\xe1")
2018-11-29 17:01:46,831 - twisted - 244 - ERROR -  - Traceback (most recent call last):
2018-11-29 17:01:46,832 - twisted - 244 - ERROR -  -   File "/usr/lib/python2.7/logging/handlers.py", line 76, in emit
2018-11-29 17:01:46,834 - twisted - 244 - ERROR -  -     if self.shouldRollover(record):
2018-11-29 17:01:46,835 - twisted - 244 - ERROR -  -   File "/usr/lib/python2.7/logging/handlers.py", line 156, in shouldRollover
2018-11-29 17:01:46,836 - twisted - 244 - ERROR -  -     msg = "%s\n" % self.format(record)
2018-11-29 17:01:46,837 - twisted - 244 - ERROR -  -   File "/usr/lib/python2.7/logging/__init__.py", line 741, in format
2018-11-29 17:01:46,839 - twisted - 244 - ERROR -  -     return fmt.format(record)
2018-11-29 17:01:46,840 - twisted - 244 - ERROR -  -   File "/usr/lib/python2.7/logging/__init__.py", line 465, in format
2018-11-29 17:01:46,841 - twisted - 244 - ERROR -  -     record.message = record.getMessage()
2018-11-29 17:01:46,842 - twisted - 244 - ERROR -  -   File "/usr/lib/python2.7/logging/__init__.py", line 329, in getMessage
2018-11-29 17:01:46,845 - twisted - 244 - ERROR -  -     msg = msg % self.args
2018-11-29 17:01:46,846 - twisted - 244 - ERROR -  - UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 0: ordinal not in range(128)
2018-11-29 17:01:46,847 - twisted - 244 - ERROR -  - Logged from file <console>, line 1
2018-11-29 17:01:46,848 - twisted - 244 - ERROR -  - Traceback (most recent call last):
2018-11-29 17:01:46,849 - twisted - 244 - ERROR -  -   File "/usr/lib/python2.7/logging/__init__.py", line 868, in emit
2018-11-29 17:01:46,849 - twisted - 244 - ERROR -  -     msg = self.format(record)
2018-11-29 17:01:46,850 - twisted - 244 - ERROR -  -   File "/usr/lib/python2.7/logging/__init__.py", line 741, in format
2018-11-29 17:01:46,851 - twisted - 244 - ERROR -  -     return fmt.format(record)
2018-11-29 17:01:46,852 - twisted - 244 - ERROR -  -   File "/usr/lib/python2.7/logging/__init__.py", line 465, in format
2018-11-29 17:01:46,853 - twisted - 244 - ERROR -  -     record.message = record.getMessage()
2018-11-29 17:01:46,854 - twisted - 244 - ERROR -  -   File "/usr/lib/python2.7/logging/__init__.py", line 329, in getMessage
2018-11-29 17:01:46,854 - twisted - 244 - ERROR -  -     msg = msg % self.args
2018-11-29 17:01:46,855 - twisted - 244 - ERROR -  - UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 0: ordinal not in range(128)
2018-11-29 17:01:46,856 - twisted - 244 - ERROR -  - Logged from file <console>, line 1

However, this doesn't explain why we end up in an infinite loop. It also doesn't explain why it affects python3 (if indeed #4202 is the same problem), because there is no implicit encode/decode on py3.

@richvdh
Copy link
Member Author

richvdh commented Nov 29, 2018

I feel like this might be related to non-ascii event ids, which could end up in the logcontext; however I still can't quite figure out how that can make everything explode.

@progserega
Copy link

After I start matrix-synapse as service (not in screen with '-n' option) - it is crash after ~23 hour...
It is critical bug.

    2018-11-30 18:26:58,125 - synapse.handlers.sync - 909 - INFO - GET-306472- Calculating sync response for @progserega:rsprim.ru
    2018-11-30 18:26:58,130 - synapse.metrics - 374 - INFO - - Collecting gc 0
    2018-11-30 18:26:58,133 - synapse.metrics - 374 - INFO - - Collecting gc 1
    2018-11-30 18:26:58,143 - synapse.access.http.8008 - 302 - INFO - GET-306472- 127.0.0.1 - 8008 - {@progserega:rsprim.ru} Processed request: 0.019sec/0.001sec (0.007sec, 0.000sec) (0.001sec/0.010sec/2) 1394B 200 "GET /_matrix/client/r0/sync?filter=0&timeout=30000&since=s254845_9330045_1901_315478_75945_322_7228_640    __62_317 HTTP/1.0" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/70.0.3538.77 Chrome/70.0.3538.77 Safari/537.36" [0 dbevts]
    2018-11-30 18:26:58,143 - synapse.metrics - 374 - INFO - - Collecting gc 2
    2018-11-30 18:26:58,568 - synapse.access.https.8448 - 233 - INFO - PUT-306473- 5.9.9.18 - 8448 - Received request: PUT /_matrix/federation/v1/send/1541183832897/
    2018-11-30 18:26:58,569 - synapse.federation.transport.server - 130 - INFO - PUT-306473- Request from amorgan.xyz
    2018-11-30 18:26:58,570 - synapse.federation.transport.server - 336 - INFO - PUT-306473- Received txn 1541183832897 from amorgan.xyz. (PDUs: 0, EDUs: 1)
    2018-11-30 18:26:58,570 - synapse.util.async_helpers - 205 - INFO - PUT-306473- Acquired uncontended linearizer lock 'fed_txn_handler' for key (u'amorgan.xyz', u'1541183832897')
    2018-11-30 18:26:58,571 - twisted - 243 - INFO - - Timing out client: IPv4Address(TCP, '37.221.195.157', 56728) 
    2018-11-30 18:26:58,572 - synapse.access.http.8008 - 233 - INFO - PUT-306474- 127.0.0.1 - 8008 - Received request: PUT /_matrix/client/unstable/sendToDevice/m.room.encrypted/m1543566418185.67?
    2018-11-30 18:26:58,575 - synapse.access.http.8008 - 233 - INFO - PUT-306475- 127.0.0.1 - 8008 - Received request: PUT /_matrix/client/unstable/sendToDevice/m.room.encrypted/m1543566418228.68?
    2018-11-30 18:26:58,577 - synapse.access.http.8008 - 233 - INFO - PUT-306476- 127.0.0.1 - 8008 - Received request: PUT /_matrix/client/unstable/sendToDevice/m.room.encrypted/m1543566418286.69?
    2018-11-30 18:26:58,579 - synapse.access.http.8008 - 233 - INFO - PUT-306477- 127.0.0.1 - 8008 - Received request: PUT /_matrix/client/unstable/sendToDevice/m.room.encrypted/m1543566418345.70?
    2018-11-30 18:26:58,583 - twisted - 243 - ERROR - PUT-306475- Traceback (most recent call last): 
    2018-11-30 18:26:58,583 - twisted - 243 - ERROR - PUT-306475-   File "/usr/lib/python2.7/logging/handlers.py", line 76, in emit
    2018-11-30 18:26:58,583 - twisted - 243 - ERROR - PUT-306475-     if self.shouldRollover(record):
    2018-11-30 18:26:58,584 - twisted - 243 - ERROR - PUT-306475-   File "/usr/lib/python2.7/logging/handlers.py", line 156, in shouldRollover
    2018-11-30 18:26:58,585 - twisted - 243 - ERROR - PUT-306475-     msg = "%s\n" % self.format(record)
    2018-11-30 18:26:58,586 - twisted - 243 - ERROR - PUT-306475-   File "/usr/lib/python2.7/logging/__init__.py", line 734, in format 
    2018-11-30 18:26:58,586 - twisted - 243 - ERROR - PUT-306475-     return fmt.format(record)
    2018-11-30 18:26:58,586 - twisted - 243 - ERROR - PUT-306475-   File "/usr/lib/python2.7/logging/__init__.py", line 476, in format 
    2018-11-30 18:26:58,587 - twisted - 243 - ERROR - PUT-306475-     raise e
    2018-11-30 18:26:58,587 - twisted - 243 - ERROR - PUT-306475- UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 51: ordinal not in range(128)
    2018-11-30 18:26:58,587 - twisted - 243 - ERROR - PUT-306475- Logged from file _base.py, line 254
    2018-11-30 18:26:58,582 - synapse.metrics - 374 - INFO - - Collecting gc 0
    2018-11-30 18:26:58,590 - twisted - 243 - ERROR - PUT-306476- Traceback (most recent call last): 

@richvdh
Copy link
Member Author

richvdh commented Nov 30, 2018

@progserega when you say crash you mean... stops responding? uses 100% cpu? segfaults?

@progserega
Copy link

progserega commented Nov 30, 2018

@progserega when you say crash you mean... stops responding? uses 100% cpu? segfaults?

No, 100 not use:
1

But my script check service by execute command every 10 minutes:
curl --connect-timeout 60 --max-time 120 -XGET "http://localhost:8008/_matrix/client/versions"

If command fail - script save homeserver.log and restart matrix-synapse service.
In previouse comment I post last lines of this saved log.

@progserega
Copy link

So it is "stop responding".

@richvdh
Copy link
Member Author

richvdh commented Nov 30, 2018

ok so maybe it's not actually an infinite recursion. Can you send me (@richvdh:sw1v.org) the full output from stderr (say, a couple of days' worth) after running with -n ?

@neilisfragile neilisfragile added z-bug (Deprecated Label) p1 labels Nov 30, 2018
@progserega
Copy link

progserega commented Dec 1, 2018

I start synapse as:
/usr/bin/python -m "synapse.app.homeserver" -n --config-path /etc/matrix-synapse/homeserver.yaml --config-path /etc/matrix-synapse/conf.d/ &> synapse_n_stdout_file.log

I will send synapse_n_stdout_file.log to you after two days.

@progserega
Copy link

ok so maybe it's not actually an infinite recursion. Can you send me (@richvdh:sw1v.org) the full output from stderr (say, a couple of days' worth) after running with -n ?

I was send log to you in private chat.

@richvdh
Copy link
Member Author

richvdh commented Dec 3, 2018

OK so sadly nothing particularly enlightening in those logs. I have figured out where @progserega 's UnicodeDecode errors are coming from (#4252), but still don't understand why his server stops responding.

@progserega: it might be worth disabling the StreamHandler in your logging config and leaving it a bit longer.

@progserega
Copy link

progserega commented Dec 3, 2018

Now I start synapse as service with "-n" options (I think this is may help not crash to server). Do the "-n" doing such as disabling "console" in log config?

@progserega: it might be worth disabling the StreamHandler in your logging config and leaving it a bit longer.

How I must change log config?
Change handlers: [file, console] to handlers: [file]?

Now my /etc/matrix-synapse/log.yaml:

    version: 1
    
    formatters:
      precise:
       format: '%(asctime)s - %(name)s - %(lineno)d - %(levelname)s - %(request)s- %(message)s'
    
    filters:
      context:
        (): synapse.util.logcontext.LoggingContextFilter
        request: ""
    
    handlers:
      file:
        class: logging.handlers.RotatingFileHandler
        formatter: precise
        filename: /var/log/matrix-synapse/homeserver.log
        maxBytes: 104857600
        backupCount: 10
        filters: [context]
      console:
        class: logging.StreamHandler
        formatter: precise
        level: WARN
    
    loggers:
        synapse:
            level: INFO
    
        synapse.storage.SQL:
            level: INFO
    
    root:
        level: INFO
        handlers: [file, console]

@gilanghamidy
Copy link

I also experienced stack overflow error caused by this logging, and it does not want to recover by restarting. The stack overflow somehow happened at some point in time due to unknown origin.

Below is some entry of the log

synapse[3261]: Fatal Python error: Cannot recover from stack overflow.
synapse[3261]: Current thread 0x00007fa5ac6066c0 (most recent call first):
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 367 in getMessage
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 608 in format
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 869 in format
synapse[3261]:   File "/usr/lib64/python3.7/logging/handlers.py", line 185 in shouldRollover
synapse[3261]:   File "/usr/lib64/python3.7/logging/handlers.py", line 69 in emit
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 894 in handle
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 1586 in callHandlers
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 1524 in handle
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 1514 in _log
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 1444 in log
synapse[3261]:   File "/usr/lib64/python3.7/site-packages/twisted/logger/_stdlib.py", line 116 in __call__
synapse[3261]:   File "/usr/lib/python3.7/site-packages/synapse/config/logger.py", line 192 in _log
synapse[3261]:   File "/usr/lib64/python3.7/site-packages/twisted/logger/_observer.py", line 131 in __call__
synapse[3261]:   File "/usr/lib64/python3.7/site-packages/twisted/logger/_logger.py", line 144 in emit
synapse[3261]:   File "/usr/lib64/python3.7/site-packages/twisted/logger/_io.py", line 170 in write
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 946 in handleError
synapse[3261]:   File "/usr/lib64/python3.7/logging/handlers.py", line 73 in emit
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 894 in handle
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 1586 in callHandlers
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 1524 in handle
synapse[3261]:   File "/usr/lib64/python3.7/logging/__init__.py", line 1514 in _log
...

I tried to make a workaround by preventing the recursive call of log function in synapse/config/logger.py by making the function return if it is previously called. Below is the modification:

# Global variable to store the "semaphore"
inside_log = False

def _setup_stdlib_logging(config, log_config, logBeginner: LogBeginner):
    # .....
    # Route Twisted's native logging through to the standard library logging
    # system.
    observer = STDLibLogObserver()

    def _log(event):
        # If the function is previously entered, just return
        if inside_log:
            return observer(None)
        # Close the semaphore
        inside_log = True

        # .....
        # Temporary store the result
        result = observer(event)
        
        # Open the semaphore
        inside_log = False
        
        # Previously return observer(event)
        return result 

At least now the Synapse server is running again. But I cannot identify the root cause of the error. The log for original crash is already overwritten after several restarting attempt. So I am not sure what causes it.

@symphorien
Copy link

I also can reproduce the fatal python error:

ep 06 10:11:54 blitiri homeserver[18837]: Fatal Python error: Cannot recover from stack overflow.
Sep 06 10:11:54 blitiri homeserver[18837]: Thread 0x00007f5897a3e700 (most recent call first):
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 296 in wait
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/queue.py", line 170 in get
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/rk6naxarmc4s069f4kkh1r9zk645id3f-python3.7-Twisted-20.3.0/lib/python3.7/site-packages/twisted/_threads/_threadworker.py", line 45 in work
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 870 in run
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 926 in _bootstrap_inner
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 890 in _bootstrap
Sep 06 10:11:54 blitiri homeserver[18837]: Thread 0x00007f58c9bfb700 (most recent call first):
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 296 in wait
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/queue.py", line 170 in get
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/rk6naxarmc4s069f4kkh1r9zk645id3f-python3.7-Twisted-20.3.0/lib/python3.7/site-packages/twisted/_threads/_threadworker.py", line 45 in work
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 870 in run
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 926 in _bootstrap_inner
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 890 in _bootstrap
Sep 06 10:11:54 blitiri homeserver[18837]: Thread 0x00007f58ca7fc700 (most recent call first):
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 296 in wait
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/queue.py", line 170 in get
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/rk6naxarmc4s069f4kkh1r9zk645id3f-python3.7-Twisted-20.3.0/lib/python3.7/site-packages/twisted/_threads/_threadworker.py", line 45 in work
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 870 in run
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 926 in _bootstrap_inner
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 890 in _bootstrap
Sep 06 10:11:54 blitiri homeserver[18837]: Thread 0x00007f58caffd700 (most recent call first):
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 296 in wait
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/queue.py", line 170 in get
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/rk6naxarmc4s069f4kkh1r9zk645id3f-python3.7-Twisted-20.3.0/lib/python3.7/site-packages/twisted/_threads/_threadworker.py", line 45 in work
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 870 in run
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 926 in _bootstrap_inner
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 890 in _bootstrap
Sep 06 10:11:54 blitiri homeserver[18837]: Thread 0x00007f58cb7fe700 (most recent call first):
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 296 in wait
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/queue.py", line 170 in get
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/rk6naxarmc4s069f4kkh1r9zk645id3f-python3.7-Twisted-20.3.0/lib/python3.7/site-packages/twisted/_threads/_threadworker.py", line 45 in work
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 870 in run
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 926 in _bootstrap_inner
Sep 06 10:11:54 blitiri homeserver[18837]:   File "/nix/store/r94aa2gj4drkhfvkm2p4ab6cblb6kxlq-python3-3.7.6/lib/python3.7/threading.py", line 890 in _bootstrap
Sep 06 10:11:54 blitiri homeserver[18837]: Thread 0x00007f58cbfff700 (most recent call first):
...

It takes about 10-20 minutes to trigger this again.
I use this logging configuration:

 {"disable_existing_loggers":false,"root":{"level":"WARNING"},"version":1}

@richvdh
Copy link
Member Author

richvdh commented Sep 7, 2020

@symphorien it looks like you've cut out the important part of that error log?

@richvdh
Copy link
Member Author

richvdh commented Sep 7, 2020

I've raised a PR which I hope will stop synapse aborting when the error happens, but it won't solve the underlying problem.

The most likely cause of an error is that you've configured a rotating log handler, but the synapse process doesn't have permissions to write to the directory containing the log files.

@richvdh
Copy link
Member Author

richvdh commented Sep 7, 2020

fixed by #8268

@richvdh richvdh closed this as completed Sep 7, 2020
@symphorien
Copy link

Thanks, it works, but it seems that with my logging configuration {"disable_existing_loggers":false,"root":{"level":"WARNING"},"version":1}, all logging errors out:

Sep 07 22:47:11 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:27:47+0200 [stderr#error] ERROR:twisted:WARNING:root:***** STARTING SERVER *****
Sep 07 22:47:11 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:27:47+0200 [stderr#error] ERROR:twisted:WARNING:root:Server /nix/store/rjj4hmzxyqc4g01khpgzs5362cybc1hr-matrix-synapse-1.19.1/bin/homeserver version 1.19.1
Sep 07 22:47:11 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:36:00+0200 [stderr#error] ERROR:twisted:WARNING:synapse.http.server:Not sending response to request <XForwardedForRequest at 0x7f2cdec94850 method='GET' uri='/_matrix/client/r0/sync?access_token=<redacted>&since=s342414_990566_0_167205_10737_15_11307_69910_1&full_state=false&timeout=30000' clientproto='HTTP/1.0' site=8008>, already disconnected.
Sep 07 22:47:11 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:36:00+0200 [stderr#error] ERROR:twisted:WARNING:synapse.http.server:Not sending response to request <XForwardedForRequest at 0x7f2cdc72fb50 method='GET' uri='/_matrix/client/r0/sync?access_token=<redacted>&since=s342414_990566_0_167205_10737_15_11307_69910_1&timeout=30000&filter=%7B%22room%22:%7B%22state%22:%7B%22lazy_load_members%22:true%7D%7D%7D' clientproto='HTTP/1.0' site=8008>, already disconnected.
Sep 07 22:47:11 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:36:18+0200 [stderr#error] ERROR:twisted:WARNING:synapse.logging.context:Re-starting finished log context PUT-183-$1599510557118966PFBFh:matrix.org
Sep 07 22:47:11 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:36:18+0200 [stderr#error] ERROR:twisted:WARNING:synapse.logging.context:Re-starting finished log context PUT-183-$1599510761120565iMFPX:matrix.org
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:36:18+0200 [stderr#error] ERROR:twisted:WARNING:synapse.logging.context:Re-starting finished log context PUT-183-$1599510784120702dOCit:matrix.org
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:39:06+0200 [stderr#error] ERROR:twisted:WARNING:synapse.rest.media.v1._base:Failed to write to consumer: <class 'Exception'> Consumer asked us to stop producing
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:39:14+0200 [stderr#error] ERROR:twisted:WARNING:synapse.storage.txn:[TXN OPERROR] {insert_linearized_receipt-a25} could not serialize access due to concurrent update
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:39:14+0200 [stderr#error] ERROR:twisted: 0/5
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:39:25+0200 [stderr#error] ERROR:twisted:WARNING:synapse.storage.txn:[TXN OPERROR] {insert_graph_receipt-b3e} could not serialize access due to concurrent delete
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:39:25+0200 [stderr#error] ERROR:twisted: 0/5
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:39:56+0200 [stderr#error] ERROR:twisted:WARNING:synapse.http.server:Not sending response to request <XForwardedForRequest at 0x7f2cdc757510 method='PUT' uri='/_matrix/federation/v1/send/1599495298626' clientproto='HTTP/1.0' site=8008>, already disconnected.
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:39:56+0200 [stderr#error] ERROR:twisted:WARNING:synapse.http.server:Not sending response to request <XForwardedForRequest at 0x7f2cb7b0f6d0 method='PUT' uri='/_matrix/federation/v1/send/1599495298626' clientproto='HTTP/1.0' site=8008>, already disconnected.
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:39:56+0200 [stderr#error] ERROR:twisted:WARNING:synapse.http.server:Not sending response to request <XForwardedForRequest at 0x7f2cb5cf4d90 method='PUT' uri='/_matrix/federation/v1/send/1599495298626' clientproto='HTTP/1.0' site=8008>, already disconnected.
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:41:21+0200 [stderr#error] ERROR:twisted:WARNING:synapse.http.server:Not sending response to request <XForwardedForRequest at 0x7f2cb8aaf910 method='GET' uri='/_matrix/client/r0/sync?access_token=<redacted>&since=s342414_990566_0_167205_10737_15_11307_69910_1&full_state=true' clientproto='HTTP/1.0' site=8008>, already disconnected.
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:41:59+0200 [stderr#error] ERROR:twisted:WARNING:synapse.storage.txn:[TXN OPERROR] {insert_linearized_receipt-dfd} could not serialize access due to concurrent update
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:41:59+0200 [stderr#error] ERROR:twisted: 0/5
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:42:02+0200 [stderr#error] ERROR:twisted:WARNING:synapse.federation.transport.server:client disconnected before we started processing request
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:42:02+0200 [stderr#error] ERROR:twisted:WARNING:synapse.http.server:Not sending response to request <XForwardedForRequest at 0x7f2c9d4419d0 method='PUT' uri='/_matrix/federation/v1/send/1599465045289' clientproto='HTTP/1.0' site=8008>, already disconnected.
Sep 07 22:47:12 blitiri homeserver[16085]: logging during logging: 2020-09-07T22:44:05+0200 [stderr#error] ERROR:twisted:WARNING:synapse.rest.media.v1._base:Failed to write to consumer: <class 'Exception'> Consumer asked us to stop producing

@richvdh
Copy link
Member Author

richvdh commented Sep 8, 2020

@symphorien: that log config will send logging to redirected stderr (ie, back to the logging system). This will be the reason for your earlier stack overflows.

You need to configure a handler:

version: 1

handlers:
    console:
        class: logging.StreamHandler

root:
    level: INFO
    handlers: [console]

disable_existing_loggers: false

This was referenced Jan 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
z-bug (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

5 participants