Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On un-sleeping laptop, element desktop sometimes goes entirely white (because EW has OOMed during the night) #680

Open
ara4n opened this issue Mar 24, 2022 · 65 comments · Fixed by matrix-org/matrix-js-sdk#2392
Labels

Comments

@ara4n
Copy link
Member

ara4n commented Mar 24, 2022

Steps to reproduce

  1. Unsleep mac laptop while running nightly
  2. Observe that the app is entirely blank (and inspector won't open)

Screenshot 2022-03-24 at 08 37 46

The app quite cleanly via File: Quit however.

Will send logs at next launch.

Outcome

What did you expect?

No white

What happened instead?

All the white

Operating system

macOS 12.3 on M1

Application version

Nightly

How did you install the app?

nightly

Homeserver

matrix.org

Will you send logs?

Yes

@ara4n ara4n added the T-Defect label Mar 24, 2022
@ara4n
Copy link
Member Author

ara4n commented Apr 4, 2022

I am currently getting this almost every morning.

@ara4n
Copy link
Member Author

ara4n commented Apr 10, 2022

Yup, i've had this every single time i've unslept my laptop over the last week.

@ara4n
Copy link
Member Author

ara4n commented May 9, 2022

See also #672 and #824 and electron/electron#2357.

I am getting this literally every day i open my M1 mac. @novocaine says here that:

A white screen means Electron's renderer process has crashed. Element code can't crash that process as it runs in a container - there's some problem with either Electron or your system (drivers, run out of resources etc).

Just for some additional diagnostics, the pstree I have running while it's stuck in this state are:

 |-+= 33570 matthew /Applications/Element Nightly.app/Contents/MacOS/Element Nightly
 | |--- 33571 matthew /Applications/Element Nightly.app/Contents/Frameworks/Element Nightly Helper (GPU).app/Contents/MacOS/Element Nightly Helper (GPU) --type=gpu-process --user-data-dir=/Users/matthew/Library/Application Support/Element Nightly --gpu-preferences=UAAAAAAAAAAgAAAAAAAAAAAAAAAAAAAAAABgAAAAAAAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJgEAAAAAAAAmAQAAAAAAACIAQAAMAAAAIABAAAAAAAAiAEAAAAAAACQAQAAAAAAAJgBAAAAAAAAoAEAAAAAAACoAQAAAAAAALABAAAAAAAAuAEAAAAAAADAAQAAAAAAAMgBAAAAAAAA0AEAAAAAAADYAQAAAAAAAOABAAAAAAAA6AEAAAAAAADwAQAAAAAAAPgBAAAAAAAAAAIAAAAAAAAIAgAAAAAAABACAAAAAAAAGAIAAAAAAAAgAgAAAAAAACgCAAAAAAAAMAIAAAAAAAA4AgAAAAAAAEACAAAAAAAASAIAAAAAAABQAgAAAAAAAFgCAAAAAAAAYAIAAAAAAABoAgAAAAAAAHACAAAAAAAAeAIAAAAAAACAAgAAAAAAAIgCAAAAAAAAkAIAAAAAAACYAgAAAAAAAKACAAAAAAAAqAIAAAAAAACwAgAAAAAAALgCAAAAAAAAwAIAAAAAAADIAgAAAAAAANACAAAAAAAA2AIAAAAAAADgAgAAAAAAAOgCAAAAAAAA8AIAAAAAAAD4AgAAAAAAABAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAHAAAAEAAAAAAAAAAAAAAACAAAABAAAAAAAAAAAAAAAAkAAAAQAAAAAAAAAAAAAAALAAAAEAAAAAAAAAAAAAAADAAAABAAAAAAAAAAAAAAAA4AAAAQAAAAAAAAAAAAAAAPAAAAEAAAAAAAAAABAAAAAAAAABAAAAAAAAAAAQAAAAcAAAAQAAAAAAAAAAEAAAAIAAAAEAAAAAAAAAABAAAACQAAABAAAAAAAAAAAQAAAAsAAAAQAAAAAAAAAAEAAAAMAAAAEAAAAAAAAAABAAAADgAAABAAAAAAAAAAAQAAAA8AAAAQAAAAAAAAAAQAAAAAAAAAEAAAAAAAAAAEAAAABwAAABAAAAAAAAAABAAAAAgAAAAQAAAAAAAAAAQAAAAJAAAAEAAAAAAAAAAEAAAACwAAABAAAAAAAAAABAAAAAwAAAAQAAAAAAAAAAQAAAAOAAAAEAAAAAAAAAAEAAAADwAAABAAAAAAAAAABwAAAAAAAAAQAAAAAAAAAAcAAAAHAAAAEAAAAAAAAAAHAAAACAAAABAAAAAAAAAABwAAAAkAAAAQAAAAAAAAAAcAAAALAAAAEAAAAAAAAAAHAAAADAAAABAAAAAAAAAABwAAAA4AAAAQAAAAAAAAAAcAAAAPAAAAEAAAAAAAAAAIAAAAAAAAABAAAAAAAAAACAAAAAcAAAAQAAAAAAAAAAgAAAAIAAAAEAAAAAAAAAAIAAAACQAAABAAAAAAAAAACAAAAAsAAAAQAAAAAAAAAAgAAAAMAAAAEAAAAAAAAAAIAAAADgAAABAAAAAAAAAACAAAAA8AAAAQAAAAAAAAAAoAAAAAAAAAEAAAAAAAAAAKAAAABwAAABAAAAAAAAAACgAAAAgAAAAQAAAAAAAAAAoAAAAJAAAAEAAAAAAAAAAKAAAACwAAABAAAAAAAAAACgAAAAwAAAAQAAAAAAAAAAoAAAAOAAAAEAAAAAAAAAAKAAAADwAAAAgAAAAAAAAACAAAAAAAAAA= --shared-files --field-trial-handle=1718379636,9501207313218371973,7896253060103614826,131072 --enable-features=WebRTCPipeWireCapturer --disable-features=HardwareMediaKeyHandling,MediaSessionService,PlzServiceWorker,SpareRendererForSitePerProcess --seatbelt-client=40
 | \--- 33572 matthew /Applications/Element Nightly.app/Contents/Frameworks/Element Nightly Helper.app/Contents/MacOS/Element Nightly Helper --type=utility --utility-sub-type=network.mojom.NetworkService --lang=en-GB --service-sandbox-type=network --user-data-dir=/Users/matthew/Library/Application Support/Element Nightly --standard-schemes=vector --enable-sandbox --secure-schemes=vector --bypasscsp-schemes --cors-schemes --fetch-schemes=vector --service-worker-schemes --streaming-schemes --shared-files --field-trial-handle=1718379636,9501207313218371973,7896253060103614826,131072 --enable-features=WebRTCPipeWireCapturer --disable-features=HardwareMediaKeyHandling,MediaSessionService,PlzServiceWorker,SpareRendererForSitePerProcess --seatbelt-client=41

although fwiw Instruments (having sampled via the Activity Monitor instrument) also claims there's MTLCompilerService and VTDecoderXPCService and VTEncoderXPCServicer with a "responsible process" of 33570 (the main Element Nightly).

Meanwhile, taking a thread dump of the three main processes via the Activity Monitor app (not Instruments) gives:

nightly-sample.txt
nightly-helper-sample.txt
nightly-helper-gpu-sample.txt

Ah - I just found the crash log for the renderer process, which apparently exited with SIGTRAP (EXC_BREAKPOINT?!) at 02:43:08, when i was asleep:

nightly-helper-renderer-crashdump.txt

For the sake of completeness, here are the last few other crashdumps from the same in the middle of the night:

Element Nightly Helper (Renderer)-2022-05-04-043838.ips.txt
Element Nightly Helper (Renderer)-2022-05-07-025828.ips.txt

Looking at the JSON stackstraces in these thread dumps:

crash.json.txt

...it looks like thread ID 1248168 is the one which crashed, due to v8 running out of memory (while trying to GC):

  {
    "triggered": true,
    "id": 1248168,
    "name": "ThreadPoolForegroundWorker",
    "threadState": {
      "x": [
        {
          "value": 4732437384
        },
        {
          "value": 105553245181312
        },
        {
          "value": 1024
        },
        {
          "value": 6272652880
        },
        {
          "value": 6272652032
        },
        {
          "value": 6272651744
        },
        {
          "value": 729647118309407400
        },
        {
          "value": 1280
        },
        {
          "value": 0
        },
        {
          "value": 0
        },
        {
          "value": 0
        },
        {
          "value": 2
        },
        {
          "value": 2
        },
        {
          "value": 0
        },
        {
          "value": 0
        },
        {
          "value": 0
        },
        {
          "value": 7241649484,
          "symbolLocation": 0,
          "symbol": "pthread_mutex_unlock"
        },
        {
          "value": 6272692224
        },
        {
          "value": 0
        },
        {
          "value": 6272653200
        },
        {
          "value": 6272653216
        },
        {
          "value": 6272653224
        },
        {
          "value": 6272651880
        },
        {
          "value": 6272688863
        },
        {
          "value": 4731625472,
          "symbolLocation": 15664,
          "symbol": "v8_inspector::protocol::Debugger::API::Paused::ReasonEnum::XHR"
        },
        {
          "value": 56
        },
        {
          "value": 8683873544,
          "symbolLocation": 0,
          "symbol": "__stderrp"
        },
        {
          "value": 534836150272
        },
        {
          "value": 134227373
        }
      ],
      "flavor": "ARM_THREAD_STATE64",
      "lr": {
        "value": 4651545376
      },
      "cpsr": {
        "value": 1610616832
      },
      "fp": {
        "value": 6272653184
      },
      "sp": {
        "value": 6272651856
      },
      "esr": {
        "value": 4060086272,
        "description": "(Breakpoint) brk 0"
      },
      "pc": {
        "value": 4651545904,
        "matchesCrashFrame": 1
      },
      "far": {
        "value": 6272653624
      }
    },
    "frames": [
      {
        "imageOffset": 42005808,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 8767684,
        "imageIndex": 3
      },
      {
        "imageOffset": 42005280,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 8767156,
        "imageIndex": 3
      },
      {
        "imageOffset": 60872656,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 27634532,
        "imageIndex": 3
      },
      {
        "imageOffset": 13763320,
        "symbol": "v8::internal::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*)",
        "symbolLocation": 724,
        "imageIndex": 3
      },
      {
        "imageOffset": 13763112,
        "symbol": "v8::internal::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*)",
        "symbolLocation": 516,
        "imageIndex": 3
      },
      {
        "imageOffset": 15541168,
        "symbol": "v8::internal::Heap::StartIncrementalMarking(int, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags)",
        "symbolLocation": 1108,
        "imageIndex": 3
      },
      {
        "imageOffset": 16367628,
        "symbol": "v8::internal::GlobalSafepointScope::~GlobalSafepointScope()",
        "symbolLocation": 241132,
        "imageIndex": 3
      },
      {
        "imageOffset": 16183292,
        "symbol": "v8::internal::GlobalSafepointScope::~GlobalSafepointScope()",
        "symbolLocation": 56796,
        "imageIndex": 3
      },
      {
        "imageOffset": 16128432,
        "symbol": "v8::internal::GlobalSafepointScope::~GlobalSafepointScope()",
        "symbolLocation": 1936,
        "imageIndex": 3
      },
      {
        "imageOffset": 16127796,
        "symbol": "v8::internal::GlobalSafepointScope::~GlobalSafepointScope()",
        "symbolLocation": 1300,
        "imageIndex": 3
      },
      {
        "imageOffset": 60775440,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 27537316,
        "imageIndex": 3
      },
      {
        "imageOffset": 60770400,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 27532276,
        "imageIndex": 3
      },
      {
        "imageOffset": 42241136,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 9003012,
        "imageIndex": 3
      },
      {
        "imageOffset": 42376056,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 9137932,
        "imageIndex": 3
      },
      {
        "imageOffset": 42373568,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 9135444,
        "imageIndex": 3
      },
      {
        "imageOffset": 42559652,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 9321528,
        "imageIndex": 3
      },
      {
        "imageOffset": 42373028,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 9134904,
        "imageIndex": 3
      },
      {
        "imageOffset": 42432288,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 9194164,
        "imageIndex": 3
      },
      {
        "imageOffset": 42431332,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 9193208,
        "imageIndex": 3
      },
      {
        "imageOffset": 42560896,
        "symbol": "node::GetEnvironmentIsolateData(node::Environment*)",
        "symbolLocation": 9322772,
        "imageIndex": 3
      },
      {
        "imageOffset": 29292,
        "symbol": "_pthread_start",
        "symbolLocation": 148,
        "imageIndex": 6
      },
      {
        "imageOffset": 8332,
        "symbol": "thread_start",
        "symbolLocation": 8,
        "imageIndex": 6
      }
    ]
  },

So: what could be causing an explosion of v8 memory utilisation on the renderer process, and how do I stop it?

@ara4n ara4n changed the title On un-sleeping laptop, element desktop sometimes goes entirely white On un-sleeping laptop, element desktop sometimes goes entirely white (because EW has OOMed during the night) May 9, 2022
@ara4n
Copy link
Member Author

ara4n commented May 10, 2022

So I just got this again, except this time the inspector was open. This didn't help much, given the inspector disconnected from the app when the renderer crashed, however I think I have an accurate memory heap size estimate from the point of the crash (701MB)...

Screenshot 2022-05-10 at 09 31 25

...although this probably don't reflect the sudden memory spike which killed it.

The crash happened at 05:27:15 GMT+1 according to the crash log:

Element Nightly Helper (Renderer)-2022-05-10-052716.ips.txt

The last thing on the console was:

05:11:43.118 rageshake.ts:72 got device keys for @ajbura:matrix.org: {CEWMASBLTX: {…}, IBHOESCDUX: {…}, IXANQRLYTC: {…}, KRLRINCZCW: {…}, OBVENZDKQL: {…}, …}
05:11:43.118 rageshake.ts:72 got cross-signing keys for @ajbura:matrix.org: {master: {…}, self_signing: {…}, user_signing: undefined}
05:11:43.121 rageshake.ts:72 Completed key download for @ajbura:matrix.org
05:11:43.121 rageshake.ts:72 Device list for @ajbura:matrix.org now up to date
05:11:43.705 ://vector/webapp/872e090734592b905ed9.worker.js:885 Persisting sync data up to m2928537925~3.2928537933~36.2928537933~2.2928537933_757284974_46494294_1350728213_1364023735_3238494_483925928_3618151627_212414
05:11:44.311 rageshake.ts:72 Saving device tracking data m2928537925~3.2928537933~36.2928537933~2.2928537933_757284974_46494294_1350728213_1364023735_3238494_483925928_3618151627_212414

...so significantly before the crash.

@ara4n
Copy link
Member Author

ara4n commented May 10, 2022

Options for debugging this further:

  • Get a coredump and hope that lldb can walk its v8 stack, or failing that manually look in the data of the native stack to see what the exploding thread was up to. Trying this tonight via https://developer.apple.com/forums/thread/694233
  • Monitor the native RSZ and SIGSTOP when it spikes so you can then hopefully attach the chrome inspector and breakpoint it before it dies
  • Figure out how to monitor the v8 heap from within electron and call debugger before it dies.
  • Disable SIP and try to run it under lldb?
  • Build a debug electron, and run that under lldb?

Also discussed at https://matrix.to/#/!zKcwXhRfcrDOBEYsgu:matrix.org/$fHG1BhGvNW4-Z-LvRjVowzVHrelDewTWzJs7J7_0j6A?via=matrix.org&via=element.io&via=vector.modular.im

@MadLittleMods
Copy link

@ara4n Please document any debugging steps to get any of those options working. I'm extremely interested in how to get information for scenarios like these as well.

@ara4n
Copy link
Member Author

ara4n commented May 11, 2022

So it crashed again last night (at 03:22:35), but it didn't dump a core (despite following @justkwin's instructions at https://developer.apple.com/forums/thread/694233). However, did get another datapoint simply from having run Electron from the commandline, which is that its dying words were:

<--- Last few GCs --->

[48339:0x158050000] 15205366 ms: Scavenge 2986.0 (3102.8) -> 2984.6 (3104.1) MB, 6.5 / 0.0 ms  (average mu = 0.982, current mu = 0.984) allocation failure 
[48339:0x158050000] 15205390 ms: Scavenge 2988.0 (3104.1) -> 2988.1 (3104.1) MB, 16.5 / 0.0 ms  (average mu = 0.982, current mu = 0.984) allocation failure 
[48339:0x158050000] 15205404 ms: Scavenge 2988.3 (3104.1) -> 2988.3 (3104.1) MB, 14.5 / 0.0 ms  (average mu = 0.982, current mu = 0.984) allocation failure 


<--- JS stacktrace --->

[48339:0511/032235.122008:FATAL:v8_initializer.cc(820)] 
Error sending from webFrameMain:  Error: Render frame was disposed before WebFrameMain could be accessed
    at EventEmitter._send (<anonymous>)
    at EventEmitter.n.send (node:electron/js2c/browser_init:165:417)
    at EventEmitter.b.send (node:electron/js2c/browser_init:161:2494)
    at IpcMainImpl.<anonymous> (/Applications/Element Nightly.app/Contents/Resources/app.asar/lib/electron-main.js:803:32)
    at Generator.next (<anonymous>)
    at fulfilled (/Applications/Element Nightly.app/Contents/Resources/app.asar/lib/electron-main.js:42:58)
Error sending from webFrameMain:  Error: Render frame was disposed before WebFrameMain could be accessed
    at EventEmitter.n.send (node:electron/js2c/browser_init:165:417)
    at EventEmitter.b.send (node:electron/js2c/browser_init:161:2494)
    at ipcChannelSendUpdateStatus (/Applications/Element Nightly.app/Contents/Resources/app.asar/lib/updater.js:90:35)
    at AutoUpdater.<anonymous> (/Applications/Element Nightly.app/Contents/Resources/app.asar/lib/updater.js:95:5)
    at AutoUpdater.emit (node:events:390:28)

In other words, v8 heap spiked up to 3.1GB before it died (up from the ~600MB it was when it went to sleep).

I guess i try to find out why it didn't dump a core, or failing that try to run it under lldb, or failing that do the SIGSTOP hack to try to catch it in the act.

@ara4n
Copy link
Member Author

ara4n commented May 11, 2022

So i can get naive core dumps okay following Apple's instructions:

matthew@shadowfax ~ % cat crash.c          
#include <stdio.h>

int main() {
	char * c = 0;
	printf("%c", *c);
	return(0);
}
matthew@shadowfax ~ % cc -o crash crash.c
matthew@shadowfax ~ % ulimit -c unlimited
matthew@shadowfax ~ % /usr/libexec/PlistBuddy -c "Add :com.apple.security.get-task-allow bool true" tmp.entitlements
File Doesn't Exist, Will Create: tmp.entitlements
matthew@shadowfax ~ % codesign -s - -f --entitlements tmp.entitlements crash
crash: replacing existing signature
matthew@shadowfax ~ % time ./crash
zsh: segmentation fault (core dumped)  ./crash
./crash  0.00s user 0.50s system 61% cpu 0.813 total
matthew@shadowfax ~ % ls -al /cores/core.55325 
-r--------  1 matthew  wheel  3023405056 11 May 09:31 /cores/core.55325
matthew@shadowfax ~ % lldb -c /cores/core.55325 
(lldb) target create --core "/cores/core.55325"
Core file '/cores/core.55325' (arm64) was loaded.
(lldb) bt
* thread vector-im/element-web#1, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x0)
  * frame #0: 0x0000000100377f7c crash`main + 32
    frame vector-im/element-web#1: 0x0000000100695088 dyld`start + 516

@ara4n
Copy link
Member Author

ara4n commented May 11, 2022

So it looks like the reason the core dumps failed for the renderer process is that i failed to set the entitlements correctly:

matthew@shadowfax MacOS % codesign -s - -f --entitlements tmp.entitlements Element\ Nightly\ Helper\ \(Renderer\) 
Element Nightly Helper (Renderer): replacing existing signature
Element Nightly Helper (Renderer): code object is not signed at all
In subcomponent: /Applications/Element Nightly.app/Contents/Frameworks/Element Nightly Helper (Renderer).app/Contents/MacOS/tmp.entitlements

and "code object is not signed at all" is a cryptic error implying that it couldn't re-sign it for some reason. The binary absolutely is signed though; codesign -dvvvvvvvv gives:

Signature size=9078
Authority=Developer ID Application: Vector Creations Limited (7J4U792NQT)
Authority=Developer ID Certification Authority
Authority=Apple Root CA
Timestamp=11 May 2022 at 08:05:12

Perhaps it needs to be re-signed with a real identity rather than an adhoc identity ("-"). Eitherway, after running codesign -s, the output of codesign -dvvvvvvvvvvvvv Element\ Nightly\ Helper\ \(Renderer\) is the same as before, and the entitlements have not been updated to include the required get-task-allow entitlement:

matthew@shadowfax MacOS % codesign --display --entitlements - Element\ Nightly\ Helper\ \(Renderer\)                    
Executable=/Applications/Element Nightly.app/Contents/Frameworks/Element Nightly Helper (Renderer).app/Contents/MacOS/Element Nightly Helper (Renderer)
[Dict]
	[Key] com.apple.security.cs.allow-jit
	[Value]
		[Bool] true
	[Key] com.apple.security.cs.disable-library-validation
	[Value]
		[Bool] true
	[Key] com.apple.security.cs.allow-unsigned-executable-memory
	[Value]
		[Bool] true

So, either i fight against the signing system or i disable SIP...

@ara4n
Copy link
Member Author

ara4n commented May 11, 2022

Having poked further at codesign it turns out it barfed because the tmp.entitlements file was in the cwd - the "code object is not signed at all" was referring to the temporary entitlements file. So, on creating a new entitlements file with the additional get-task-allow entitlement:

% for i in cs.allow-jit cs.disable-library-validation cs.allow-unsigned-executable-memory get-task-allow; do /usr/libexec/PlistBuddy -c "Add :com.apple.security.$i bool true" ~/tmp.entitlements; done
File Doesn't Exist, Will Create: /Users/matthew/tmp.entitlements
% cat ~/tmp.entitlements 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>com.apple.security.cs.allow-jit</key>
	<true/>
	<key>com.apple.security.cs.allow-unsigned-executable-memory</key>
	<true/>
	<key>com.apple.security.cs.disable-library-validation</key>
	<true/>
	<key>com.apple.security.get-task-allow</key>
	<true/>
</dict>
</plist>

...and signing it with a real identity (just because i have one easily to hand, thanks to XCode, and because I don't want to get bitten by weird & wonderful adhoc codesigning quirks):

% codesign -s 9A3ZBW6V78 -f --entitlements ~/tmp.entitlements Element\ Nightly\ Helper\ \(Renderer\)                                                                                                            
Element Nightly Helper (Renderer): replacing existing signature
% codesign --display --entitlements - Element\ Nightly\ Helper\ \(Renderer\)
Executable=/Applications/Element Nightly.app/Contents/Frameworks/Element Nightly Helper (Renderer).app/Contents/MacOS/Element Nightly Helper (Renderer)
[Dict]
	[Key] com.apple.security.cs.allow-jit
	[Value]
		[Bool] true
	[Key] com.apple.security.cs.allow-unsigned-executable-memory
	[Value]
		[Bool] true
	[Key] com.apple.security.cs.disable-library-validation
	[Value]
		[Bool] true
	[Key] com.apple.security.get-task-allow
	[Value]
		[Bool] true
% codesign -dvvvvvvvvvvvvvv Element\ Nightly\ Helper\ \(Renderer\) 2>&1 | egrep 'Sign|Auth'
Signature size=4877
Authority=Apple Development: Matthew Hodgson (9A3ZBW6V78)
Authority=Apple Worldwide Developer Relations Certification Authority
Authority=Apple Root CA
Signed Time=11 May 2022 at 10:19:24

...looks much more plausible. So, setting it running again with ulimit -c unlimited to see what happens. It's still a bit risky, given digging around inside the native core could be a bit painful without a debug build of electron, but it might be good enough to spot what v8 is doing. Failing that, it'll be time for a debug electron build as per https://www.nylas.com/blog/low-level-electron-debugging/.

@ara4n
Copy link
Member Author

ara4n commented May 12, 2022

(of course, having done this, it didn't crash overnight; v8 heap is 700MB in the morning)

@ara4n
Copy link
Member Author

ara4n commented May 13, 2022

good news - it crashed overnight (2022-05-13 02:48:28.7125 +0100). bad news: it still didn't dump core. Perhaps the rlimits got reset on the child process, or perhaps there's something daft like Termination Reason: Namespace SIGNAL, Code 5 Trace/BPT trap: 5 doesn't actually trigger a core dump.

@ara4n
Copy link
Member Author

ara4n commented May 13, 2022

Testing with a new crash test jig (infinite loop), signed with the same entitlements as used on the renderer process, it dumps core as expected when killed with signal 5:

./crash
zsh: trace trap (core dumped)  ./crash

So i can only assume Electron is setting its own rlimits to stop itself from dumping core somehow.

Next stop: disable SIP; attach lldb.

@ara4n
Copy link
Member Author

ara4n commented May 13, 2022

Right, off we go:

matthew@shadowfax ~ % csrutil status
System Integrity Protection status: disabled.
% ps axuw | fgrep -i '(renderer)' | grep -i nightly
matthew           6517 263.9  1.9 471273088 1299728   ??  R    10:20am   0:05.99 /Applications/Element Nightly.app/Contents/Frameworks/Element Nightly Helper (Renderer).app/Contents/MacOS/Element Nightly Helper ...
matthew@shadowfax ~ % lldb -p 6517
(lldb) process attach --pid 6517
Process 6517 stopped
* thread vector-im/element-web#1, name = 'CrRendererMain', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8
libsystem_kernel.dylib`mach_msg_trap:
->  0x1af4e08d0 <+8>: ret    

libsystem_kernel.dylib`mach_msg_overwrite_trap:
    0x1af4e08d4 <+0>: mov    x16, #-0x20
    0x1af4e08d8 <+4>: svc    #0x80
    0x1af4e08dc <+8>: ret    
Target 0: (Element Nightly Helper (Renderer)) stopped.
Executable module set to "/Applications/Element Nightly.app/Contents/Frameworks/Element Nightly Helper (Renderer).app/Contents/MacOS/Element Nightly Helper (Renderer)".
Architecture set to: arm64e-apple-macosx-.
(lldb) thread list
Process 6517 stopped
* thread vector-im/element-web#1: tid = 0x6736, 0x00000001155f43f0 Electron Framework`v8::internal::JSObject::MigrateToMap(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::Map>, int) + 232, name = 'CrRendererMain', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  thread vector-im/element-web#3: tid = 0x6749, 0x00000001af4ec378 libsystem_kernel.dylib`kevent64 + 8, name = 'ThreadPoolServiceThread'
  thread vector-im/element-web#4: tid = 0x674a, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'ThreadPoolForegroundWorker'
  thread vector-im/element-web#5: tid = 0x674b, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'ThreadPoolBackgroundWorker'
  thread vector-im/element-web#6: tid = 0x674c, 0x00000001af4ec378 libsystem_kernel.dylib`kevent64 + 8, name = 'Chrome_ChildIOThread'
  thread vector-im/element-web#7: tid = 0x674d, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'ThreadPoolForegroundWorker'
  thread vector-im/element-web#8: tid = 0x674e, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'GpuMemoryThread'
  thread vector-im/element-web#9: tid = 0x674f, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'Compositor'
  thread vector-im/element-web#10: tid = 0x6750, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'ThreadPoolSingleThreadSharedForegroundBlocking0'
  thread vector-im/element-web#11: tid = 0x6751, 0x00000001af4e4290 libsystem_kernel.dylib`__psynch_cvwait + 8, name = 'CompositorTileWorker1'
  thread vector-im/element-web#12: tid = 0x6752, 0x00000001af4e4290 libsystem_kernel.dylib`__psynch_cvwait + 8, name = 'CompositorTileWorker2'
  thread vector-im/element-web#13: tid = 0x6753, 0x00000001af4e4290 libsystem_kernel.dylib`__psynch_cvwait + 8, name = 'CompositorTileWorker3'
  thread vector-im/element-web#14: tid = 0x6754, 0x00000001af4e4290 libsystem_kernel.dylib`__psynch_cvwait + 8, name = 'CompositorTileWorker4'
  thread vector-im/element-web#15: tid = 0x6755, 0x00000001af4e4290 libsystem_kernel.dylib`__psynch_cvwait + 8, name = 'CompositorTileWorkerBackground'
  thread vector-im/element-web#16: tid = 0x677f, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'Media'
  thread vector-im/element-web#17: tid = 0x6785, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'ThreadPoolForegroundWorker'
  thread vector-im/element-web#18: tid = 0x6786, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'ThreadPoolForegroundWorker'
  thread vector-im/element-web#19: tid = 0x6787, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'ThreadPoolForegroundWorker'
  thread vector-im/element-web#20: tid = 0x6788, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'ThreadPoolForegroundWorker'
  thread vector-im/element-web#21: tid = 0x678a, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'ThreadPoolForegroundWorker'
  thread vector-im/element-web#22: tid = 0x678b, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'ThreadPoolForegroundWorker'
  thread vector-im/element-web#23: tid = 0x678c, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'ThreadPoolForegroundWorker'
  thread vector-im/element-web#24: tid = 0x6791, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'DedicatedWorker thread'
  thread vector-im/element-web#25: tid = 0x6798, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'WebCrypto'
  thread vector-im/element-web#26: tid = 0x682e, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'DedicatedWorker thread'
  thread vector-im/element-web#27: tid = 0x71c2, 0x00000001af4e274c libsystem_kernel.dylib`__workq_kernreturn + 8
(lldb) bt
* thread vector-im/element-web#1, name = 'CrRendererMain', queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001155f43f0 Electron Framework`v8::internal::JSObject::MigrateToMap(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::Map>, int) + 232
    frame vector-im/element-web#1: 0x0000000115619e2c Electron Framework`v8::internal::LookupIterator::ApplyTransitionToDataProperty(v8::internal::Handle<v8::internal::JSReceiver>) + 212
    frame vector-im/element-web#2: 0x0000000115656d9c Electron Framework`v8::internal::Object::SetProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) + 13172
    frame vector-im/element-web#3: 0x00000001155ebff0 Electron Framework`v8::internal::JSReceiver::DeletePropertyOrElement(v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Name>, v8::internal::LanguageMode) + 12400
    frame vector-im/element-web#4: 0x00000001155ebb98 Electron Framework`v8::internal::JSReceiver::DeletePropertyOrElement(v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Name>, v8::internal::LanguageMode) + 11288
    frame vector-im/element-web#5: 0x00000001155ea6d0 Electron Framework`v8::internal::JSReceiver::DeletePropertyOrElement(v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Name>, v8::internal::LanguageMode) + 5968
    frame vector-im/element-web#6: 0x00000001155e948c Electron Framework`v8::internal::JSReceiver::DeletePropertyOrElement(v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Name>, v8::internal::LanguageMode) + 1292
    frame vector-im/element-web#7: 0x00000001155e9868 Electron Framework`v8::internal::JSReceiver::DeletePropertyOrElement(v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Name>, v8::internal::LanguageMode) + 2280
    frame vector-im/element-web#8: 0x00000001152ac3f0 Electron Framework`v8::internal::Accessors::MakeAccessor(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Name>, void (*)(v8::Local<v8::Name>, v8::PropertyCallbackInfo<v8::Value> const&), void (*)(v8::Local<v8::Name>, v8::Local<v8::Value>, v8::PropertyCallbackInfo<v8::Boolean> const&)) + 258068
    frame vector-im/element-web#9: 0x0000007c07f0a1ac
    frame vector-im/element-web#10: 0x0000007c00511e4c
    frame vector-im/element-web#11: 0x0000007c00664a50
    frame vector-im/element-web#12: 0x0000007c006d208c
    frame vector-im/element-web#13: 0x0000007c006582ec
    frame vector-im/element-web#14: 0x0000007c07ec3bdc
    frame vector-im/element-web#15: 0x0000007c07f59358
    frame vector-im/element-web#16: 0x0000007c07eb4fa8
    frame vector-im/element-web#17: 0x0000007c07e8b308
    frame vector-im/element-web#18: 0x000000011532c14c Electron Framework`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) + 5356
    frame vector-im/element-web#19: 0x000000011532ca28 Electron Framework`v8::internal::Execution::TryCallScript(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::FixedArray>, v8::internal::Execution::MessageHandling, v8::internal::MaybeHandle<v8::internal::Object>*, bool) + 332
    frame vector-im/element-web#20: 0x000000011532cb94 Electron Framework`v8::internal::Execution::TryCall(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Execution::MessageHandling, v8::internal::MaybeHandle<v8::internal::Object>*, bool) + 200
    frame vector-im/element-web#21: 0x0000000115352d58 Electron Framework`v8::internal::MicrotaskQueue::PerformCheckpointInternal(v8::Isolate*) + 444
    frame vector-im/element-web#22: 0x0000000115231c7c Electron Framework`v8::MicrotasksScope::~MicrotasksScope() + 124
    frame vector-im/element-web#23: 0x0000000117ef2a70 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 27655684
    frame vector-im/element-web#24: 0x0000000118da1e20 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 1700676
    frame vector-im/element-web#25: 0x0000000118da5034 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 1713496
    frame vector-im/element-web#26: 0x000000011841eab4 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 33078856
    frame vector-im/element-web#27: 0x0000000118105be0 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29831028
    frame vector-im/element-web#28: 0x000000011810a278 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29849100
    frame vector-im/element-web#29: 0x00000001181098cc Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29846624
    frame vector-im/element-web#30: 0x000000011904bfd0 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 4494580
    frame vector-im/element-web#31: 0x0000000119052f14 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 4523064
    frame vector-im/element-web#32: 0x00000001184bc748 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 33725148
    frame vector-im/element-web#33: 0x0000000116d28c70 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9003012
    frame vector-im/element-web#34: 0x0000000116d42b54 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9109224
    frame vector-im/element-web#35: 0x0000000116d7d058 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9348076
    frame vector-im/element-web#36: 0x0000000116d7c2a0 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9344564
    frame vector-im/element-web#37: 0x0000000116d7c844 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9346008
    frame vector-im/element-web#38: 0x00000001af5ea9a0 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 28
    frame vector-im/element-web#39: 0x00000001af5ea8ec CoreFoundation`__CFRunLoopDoSource0 + 208
    frame vector-im/element-web#40: 0x00000001af5ea5ec CoreFoundation`__CFRunLoopDoSources0 + 268
    frame vector-im/element-web#41: 0x00000001af5e8f6c CoreFoundation`__CFRunLoopRun + 828
    frame vector-im/element-web#42: 0x00000001af5e8490 CoreFoundation`CFRunLoopRunSpecific + 600
    frame vector-im/element-web#43: 0x00000001b04c5210 Foundation`-[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 236
    frame vector-im/element-web#44: 0x0000000116d7d874 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9350152
    frame vector-im/element-web#45: 0x0000000116d7c324 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9344696
    frame vector-im/element-web#46: 0x0000000116d4339c Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9111344
    frame vector-im/element-web#47: 0x0000000116d12ad0 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 8912484
    frame vector-im/element-web#48: 0x0000000119e5adf4 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 19235608
    frame vector-im/element-web#49: 0x000000011477b05c Electron Framework`v8::internal::ExternalReference::fast_c_call_caller_pc_address(v8::internal::Isolate*) + 2444
    frame vector-im/element-web#50: 0x000000011477ba50 Electron Framework`v8::internal::ExternalReference::fast_c_call_caller_pc_address(v8::internal::Isolate*) + 4992
    frame vector-im/element-web#51: 0x000000011477a084 Electron Framework`v8::internal::compiler::BasicBlock::set_loop_header(v8::internal::compiler::BasicBlock*) + 10812
    frame vector-im/element-web#52: 0x000000011477a9b8 Electron Framework`v8::internal::ExternalReference::fast_c_call_caller_pc_address(v8::internal::Isolate*) + 744
    frame vector-im/element-web#53: 0x00000001144f880c Electron Framework`ElectronMain + 128
    frame vector-im/element-web#54: 0x00000001043909dc Element Nightly Helper (Renderer)`_mh_execute_header + 2524
    frame vector-im/element-web#55: 0x00000001045a1088 dyld`start + 516

etc. However, v8 extensions don't seem to be present - looks like these are provided by llnode, which looks like it's geared up for debugging node rather than the v8 in an Electron (but perhaps it'll work anyway). So either i'm stuck roaming around a native stack without symbols, or perhaps via llnode, or failing that, a debug Electron build.

Meanwhile, I just realised that I could have tested and debugged getting a coredump out of the renderer process just by hitting it with kill -5 🤦🏻.

@ara4n
Copy link
Member Author

ara4n commented May 13, 2022

So i tried with llnode (brew install llnode, and then plugin load /opt/homebrew/opt/llnode/llnode.dylib on lldb run), but v8 bt doesn't give any different info to a normal bt, even when the renderer process has been nicely breakpointed in JS the depths of a big chunky v8 stack:

(lldb) v8 bt
 * thread vector-im/element-web#1: tid = 0x6736, 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8, name = 'CrRendererMain', queue = 'com.apple.main-thread', stop reason = signal SIGTRAP
  * frame #0: 0x00000001af4e08d0 libsystem_kernel.dylib`mach_msg_trap + 8
    frame vector-im/element-web#1: 0x00000001af4e0d40 libsystem_kernel.dylib`mach_msg + 76
    frame vector-im/element-web#2: 0x00000001af5eac1c CoreFoundation`__CFRunLoopServiceMachPort + 372
    frame vector-im/element-web#3: 0x00000001af5e90cc CoreFoundation`__CFRunLoopRun + 1180
    frame vector-im/element-web#4: 0x00000001af5e8490 CoreFoundation`CFRunLoopRunSpecific + 600
    frame vector-im/element-web#5: 0x00000001b04c5210 Foundation`-[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 236
    frame vector-im/element-web#6: 0x0000000116d7d874 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9350152
    frame vector-im/element-web#7: 0x0000000116d7c324 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9344696
    frame vector-im/element-web#8: 0x0000000116d433e4 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9111416
    frame vector-im/element-web#9: 0x0000000116d12ad0 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 8912484
    frame vector-im/element-web#10: 0x0000000117e57284 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 27018776
    frame vector-im/element-web#11: 0x0000000118f7c0a8 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 3642828
    frame vector-im/element-web#12: 0x0000000115a05e64 Electron Framework`v8_inspector::DumpAsyncTaskStacksStateForTest(v8_inspector::V8Inspector*) + 96624
    frame vector-im/element-web#13: 0x0000000115300f7c Electron Framework`v8::internal::Debug::OnDebugBreak(v8::internal::Handle<v8::internal::FixedArray>, v8::internal::StepAction, v8::base::EnumSet<v8::debug::BreakReason, int>) + 648
    frame vector-im/element-web#14: 0x0000000115300744 Electron Framework`v8::internal::Debug::Break(v8::internal::JavaScriptFrame*, v8::internal::Handle<v8::internal::JSFunction>) + 560
    frame vector-im/element-web#15: 0x000000011578566c Electron Framework`v8::internal::RegExp::CompileForTesting(v8::internal::Isolate*, v8::internal::Zone*, v8::internal::RegExpCompileData*, v8::base::Flags<v8::internal::RegExpFlag, int>, v8::internal::Handle<v8::internal::String>, v8::internal::Handle<v8::internal::String>, bool) + 89592
    frame vector-im/element-web#16: 0x0000007c07f0a74c
    frame vector-im/element-web#17: 0x0000007c07f963cc
    frame vector-im/element-web#18: 0x0000007c07e8d758
    frame vector-im/element-web#19: 0x0000007c002ec75c
    frame vector-im/element-web#20: 0x0000007c002f15b4
    frame vector-im/element-web#21: 0x0000007c002eccc0
    frame vector-im/element-web#22: 0x0000007c002edfe8
    frame vector-im/element-web#23: 0x0000007c07e8b42c
    frame vector-im/element-web#24: 0x0000007c07e8b0c8
    frame vector-im/element-web#25: 0x000000011532aed0 Electron Framework`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) + 624
    frame vector-im/element-web#26: 0x000000011521e31c Electron Framework`v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) + 624
    frame vector-im/element-web#27: 0x0000000117ef2a34 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 27655624
    frame vector-im/element-web#28: 0x0000000118da2578 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 1702556
    frame vector-im/element-web#29: 0x0000000118da8cb8 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 1728988
    frame vector-im/element-web#30: 0x0000000118105be0 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29831028
    frame vector-im/element-web#31: 0x000000011810a278 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29849100
    frame vector-im/element-web#32: 0x00000001181098cc Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29846624
    frame vector-im/element-web#33: 0x00000001180feddc Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29802864
    frame vector-im/element-web#34: 0x00000001180feb70 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29802244
    frame vector-im/element-web#35: 0x0000000118200ffc Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 30860176
    frame vector-im/element-web#36: 0x00000001180fdc50 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29798372
    frame vector-im/element-web#37: 0x000000011810bc24 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29855672
    frame vector-im/element-web#38: 0x0000000118179200 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 30303636
    frame vector-im/element-web#39: 0x0000000118190388 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 30398236
    frame vector-im/element-web#40: 0x0000000118192830 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 30407620
    frame vector-im/element-web#41: 0x0000000118192174 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 30405896
    frame vector-im/element-web#42: 0x0000000118500dac Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 34005312
    frame vector-im/element-web#43: 0x00000001180ff080 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29803540
    frame vector-im/element-web#44: 0x00000001180feb8c Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29802272
    frame vector-im/element-web#45: 0x00000001180fdc50 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29798372
    frame vector-im/element-web#46: 0x0000000118500d48 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 34005212
    frame vector-im/element-web#47: 0x0000000118503654 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 34015720
    frame vector-im/element-web#48: 0x0000000118503810 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 34016164
    frame vector-im/element-web#49: 0x0000000118504050 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 34018276
    frame vector-im/element-web#50: 0x00000001180ff080 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29803540
    frame vector-im/element-web#51: 0x00000001180feb8c Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29802272
    frame vector-im/element-web#52: 0x00000001180fdc50 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29798372
    frame vector-im/element-web#53: 0x0000000118503c74 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 34017288
    frame vector-im/element-web#54: 0x0000000118882e78 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 37683724
    frame vector-im/element-web#55: 0x00000001183bd154 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 32679144
    frame vector-im/element-web#56: 0x0000000118c41fa4 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 259272
    frame vector-im/element-web#57: 0x0000000118c521c4 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 325352
    frame vector-im/element-web#58: 0x0000000118c52220 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 325444
    frame vector-im/element-web#59: 0x0000000118c3d5a0 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 240324
    frame vector-im/element-web#60: 0x0000000118c3d8d8 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 241148
    frame vector-im/element-web#61: 0x0000000118c3ced4 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 238584
    frame vector-im/element-web#62: 0x0000000116d28c70 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9003012
    frame vector-im/element-web#63: 0x0000000116d42b54 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9109224
    frame vector-im/element-web#64: 0x0000000116d7cdcc Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9347424
    frame vector-im/element-web#65: 0x0000000116d7c2a0 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9344564
    frame vector-im/element-web#66: 0x0000000116d7c7f4 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9345928
    frame vector-im/element-web#67: 0x00000001af6052c0 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 32
    frame vector-im/element-web#68: 0x00000001af604ea8 CoreFoundation`__CFRunLoopDoTimer + 1064
    frame vector-im/element-web#69: 0x00000001af604930 CoreFoundation`__CFRunLoopDoTimers + 392
    frame vector-im/element-web#70: 0x00000001af5e93c0 CoreFoundation`__CFRunLoopRun + 1936
    frame vector-im/element-web#71: 0x00000001af5e8490 CoreFoundation`CFRunLoopRunSpecific + 600
    frame vector-im/element-web#72: 0x00000001b04c5210 Foundation`-[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 236
    frame vector-im/element-web#73: 0x0000000116d7d874 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9350152
    frame vector-im/element-web#74: 0x0000000116d7c324 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9344696
    frame vector-im/element-web#75: 0x0000000116d4339c Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9111344
    frame vector-im/element-web#76: 0x0000000116d12ad0 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 8912484
    frame vector-im/element-web#77: 0x0000000119e5adf4 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 19235608
    frame vector-im/element-web#78: 0x000000011477b05c Electron Framework`v8::internal::ExternalReference::fast_c_call_caller_pc_address(v8::internal::Isolate*) + 2444
    frame vector-im/element-web#79: 0x000000011477ba50 Electron Framework`v8::internal::ExternalReference::fast_c_call_caller_pc_address(v8::internal::Isolate*) + 4992
    frame vector-im/element-web#80: 0x000000011477a084 Electron Framework`v8::internal::compiler::BasicBlock::set_loop_header(v8::internal::compiler::BasicBlock*) + 10812
    frame vector-im/element-web#81: 0x000000011477a9b8 Electron Framework`v8::internal::ExternalReference::fast_c_call_caller_pc_address(v8::internal::Isolate*) + 744
    frame vector-im/element-web#82: 0x00000001144f880c Electron Framework`ElectronMain + 128
    frame vector-im/element-web#83: 0x00000001043909dc Element Nightly Helper (Renderer)`_mh_execute_header + 2524
    frame vector-im/element-web#84: 0x00000001045a1088 dyld`start + 516

...which corresponds to:

Screenshot 2022-05-13 at 10 46 35

...in the Chrome debugger.

Without symbols, frame var doesn't pick up anything in the native stack, despite what frame you might select - e.g. frame select 25 looks like a juicy one above. Meanwhile things like v8 findjsobjects or even v8 nodeinfo seems to tightloop and never complete.

EDIT: they do eventually complete after a few minutes:

(lldb) v8 findjsobjects
 Instances  Total Size Name
 ---------- ---------- ----
 ---------- ---------- 
          0          0 

...suggests that llnode is just failing outright.

On the plus side, kill -5 nicely drops the renderer into the lldb debugger, and one can even continue the process afterwards, which is interesting. So one option for moving forwards might be to just let it run, wait for OOM to trigger a signal 5, and then hope that lldb can override the breakpoint and cont execution on the renderer process, and then use the JS debugger (before the renderer totally crashes) to breakpoint and get a stacktrace and debug...

@ara4n
Copy link
Member Author

ara4n commented May 13, 2022

Just in case I can't attach the v8 debugger successfully (because it's busy tightlooping and OOMing to death), i had a quick look around the stacks of the backtrace above, when it's breakpointed normally. I suspect there's a better way to do this with lldb, but for a poor man's frame inspector:

for (0..84) {
	say "frame select $_";
	for (0..8) {
		say 'x "*(uint64_t *)($fp + ' . ($_*8) . ')"';
	}
}

...tries to deference the first 8 addresses on each frame in the stack and hexdump the first 32 bytes, thus sniffing for strings (C-style or UCS2). You run the generated commands in lldb with command source -e false ..., and it gives you stuff like:

(lldb) frame select 84
frame vector-im/element-web#84: 0x00000001045a1088 dyld`start + 516
dyld`start:
->  0x1045a1088 <+516>: mov    x20, x0
    0x1045a108c <+520>: ldr    x8, [x19, #0x8]
    0x1045a1090 <+524>: ldr    w0, [x8, #0x34]
    0x1045a1094 <+528>: mov    x1, #0x0
(lldb) x "*(uint64_t *)($fp + 0)"
error: memory read failed for 0x0
(lldb) x "*(uint64_t *)($fp + 8)"
error: memory read failed for 0x0
(lldb) x "*(uint64_t *)($fp + 16)"
0x104390000: cf fa ed fe 0c 00 00 01 00 00 00 00 02 00 00 00  ................
0x104390010: 11 00 00 00 b8 08 00 00 85 00 a0 00 00 00 00 00  ................
(lldb) x "*(uint64_t *)($fp + 24)"
error: memory read failed for 0x0
(lldb) x "*(uint64_t *)($fp + 32)"
0x16ba6f8e8: 2f 41 70 70 6c 69 63 61 74 69 6f 6e 73 2f 45 6c  /Applications/El
0x16ba6f8f8: 65 6d 65 6e 74 20 4e 69 67 68 74 6c 79 2e 61 70  ement Nightly.ap
(lldb) x "*(uint64_t *)($fp + 40)"
0x16ba6f975: 2d 2d 74 79 70 65 3d 72 65 6e 64 65 72 65 72 00  --type=renderer.
0x16ba6f985: 2d 2d 75 73 65 72 2d 64 61 74 61 2d 64 69 72 3d  --user-data-dir=
(lldb) x "*(uint64_t *)($fp + 48)"
0x16ba6f985: 2d 2d 75 73 65 72 2d 64 61 74 61 2d 64 69 72 3d  --user-data-dir=
0x16ba6f995: 2f 55 73 65 72 73 2f 6d 61 74 74 68 65 77 2f 4c  /Users/matthew/L
(lldb) x "*(uint64_t *)($fp + 56)"
0x16ba6f9d0: 2d 2d 73 74 61 6e 64 61 72 64 2d 73 63 68 65 6d  --standard-schem
0x16ba6f9e0: 65 73 3d 76 65 63 74 6f 72 00 2d 2d 65 6e 61 62  es=vector.--enab
(lldb) x "*(uint64_t *)($fp + 64)"
0x16ba6f9ea: 2d 2d 65 6e 61 62 6c 65 2d 73 61 6e 64 62 6f 78  --enable-sandbox
0x16ba6f9fa: 00 2d 2d 73 65 63 75 72 65 2d 73 63 68 65 6d 65  .--secure-scheme

for the initial frame, complete with argv[] on the stack - or for a more interesting frame like 25, you get:

(lldb) frame select 25
frame vector-im/element-web#25: 0x000000011532aed0 Electron Framework`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) + 624
Electron Framework`v8::internal::Execution::Call:
->  0x11532aed0 <+624>: mov    x21, x0
    0x11532aed4 <+628>: ldr    x0, [sp, #0x90]
    0x11532aed8 <+632>: cbnz   x0, 0x11532b560           ; <+2304>
    0x11532aedc <+636>: add    x0, sp, #0x70
(lldb) x "*(uint64_t *)($fp + 0)"
0x16ba6c660: c0 c7 a6 6b 01 00 00 00 34 2a ef 17 01 00 00 00  ...k....4*......
0x16ba6c670: c8 c7 a6 6b 01 00 00 00 00 00 00 00 00 00 00 00  ...k............
(lldb) x "*(uint64_t *)($fp + 8)"
0x11521e31c: 80 10 00 b4 f5 03 00 aa 88 02 40 b9 69 7a 41 b9  ..........@.izA.
0x11521e32c: 1f 01 09 6b e1 24 00 54 a8 02 40 f9 88 02 00 f9  ...k.$.T..@.....
(lldb) x "*(uint64_t *)($fp + 16)"
0x60000885dc80: 70 00 6f 00 73 00 74 00 4d 00 65 00 73 00 73 00  p.o.s.t.M.e.s.s.
0x60000885dc90: 61 00 67 00 65 00 00 00 61 00 74 00 65 00 00 00  a.g.e...a.t.e...
(lldb) x "*(uint64_t *)($fp + 24)"
0x138008000: 00 00 00 00 7c 00 00 00 00 80 00 38 01 00 00 00  ....|......8....
0x138008010: 30 60 99 6b 01 00 00 00 30 60 99 6b 01 00 00 00  0`.k....0`.k....
(lldb) x "*(uint64_t *)($fp + 32)"
0x16ba6c568: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x16ba6c578: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
(lldb) x "*(uint64_t *)($fp + 40)"
0x16ba6c870: 88 92 84 33 01 00 00 00 85 00 e6 d0 7e 9b 9d 92  ...3........~...
0x16ba6c880: 00 00 00 00 00 00 00 00 30 92 84 33 01 00 00 00  ........0..3....
(lldb) x "*(uint64_t *)($fp + 48)"
0x1338492a0: 8d 43 20 08 7c 00 00 00 8d 43 20 08 7c 00 00 00  .C .|....C .|...
0x1338492b0: 00 00 00 00 00 00 00 00 29 c7 57 4c 7c 00 00 00  ........).WL|...
(lldb) x "*(uint64_t *)($fp + 56)"
0x13384b1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x13384b200: 58 f1 00 33 01 00 00 00 00 00 00 00 00 00 00 00  X..3............
(lldb) x "*(uint64_t *)($fp + 64)"
error: memory read failed for 0x0

...and so you can see that this looks likely to be an attempt to call postMessage (in UCS2). Perhaps this will be enough to guess what the v8 stack was when it next crashes?

@ara4n
Copy link
Member Author

ara4n commented May 14, 2022

(of course, it didn't crash overnight this time)

@ara4n
Copy link
Member Author

ara4n commented May 14, 2022

Other postmortem v8 debuggers are https://github.com/tjfontaine/lldb-v8 (python 2, doesn't work after porting it to python 3, thanks to RuntimeError: This resolves the SBAddress using the SBTarget from lldb.target so this property can ONLY be used in the interactive script interpreter (i.e. under the lldb script command). For things like Python based commands and breakpoint callbacks use GetLoadAddress instead. - and the venerable mdb v8 (https://bryce.is/writing/code/2016/02/06/getting-started-with-mdb_v8.html) which I can't face installing solaris to use.

Playing around with a corefile (via gcore) i am failing to get much of a useful v8 stack trace, and at this rate i'm going to end up having to port llnode to work with whatever v8 sits inside Chromium these days. So instead, might be easier to just run under a debug electron build, so trying that instead.

@ara4n
Copy link
Member Author

ara4n commented May 15, 2022

So, i caught in the act this morning, running under lldb (not using the debug electron i built yesterday). Turns out my poor man's trick of "look for string references on the stack" might be good enough for at least pointing at the problematic part of the world. Irritatingly, neither gcore or process save-core will save a core of the process while it's breakpointed in lldb, and it turns out I can't cont in as it just re-throws the breakpoint signal. Therefore, i'm stuck debugging the breakpointed process.

So, the backtrace of the exploding thread is:

* thread vector-im/element-web#16, name = 'DedicatedWorker thread'
  * frame #0: 0x00000001af4e4290 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame vector-im/element-web#1: 0x00000001af51e83c libsystem_pthread.dylib`_pthread_cond_wait + 1236
    frame vector-im/element-web#2: 0x0000000116d7627c Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9319952
    frame vector-im/element-web#3: 0x0000000117ed4194 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 27530536
    frame vector-im/element-web#4: 0x0000000117ed42d0 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 27530852
    frame vector-im/element-web#5: 0x0000000117ed4ac0 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 27532884
    frame vector-im/element-web#6: 0x00000001154703f4 Electron Framework`v8::internal::GlobalSafepointScope::~GlobalSafepointScope() + 192980
    frame vector-im/element-web#7: 0x00000001153bd54c Electron Framework`v8::internal::Heap::AllocatedExternalMemorySinceMarkCompact() + 7720
    frame vector-im/element-web#8: 0x0000000115386a30 Electron Framework`v8::internal::FactoryBase<v8::internal::Factory>::NewDescriptorArray(int, int, v8::internal::AllocationType) + 64
    frame vector-im/element-web#9: 0x0000000115659868 Electron Framework`v8::internal::JSArray::SetLength(v8::internal::Handle<v8::internal::JSArray>, unsigned int) + 3552
    frame vector-im/element-web#10: 0x000000011562fd60 Electron Framework`v8::internal::Map::TransitionToDataProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::PropertyConstness, v8::internal::StoreOrigin) + 3968
    frame vector-im/element-web#11: 0x0000000115619c0c Electron Framework`v8::internal::LookupIterator::PrepareTransitionToDataProperty(v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::StoreOrigin) + 184
    frame vector-im/element-web#12: 0x0000000115656d90 Electron Framework`v8::internal::Object::SetProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) + 13160
    frame vector-im/element-web#13: 0x00000001155ebff0 Electron Framework`v8::internal::JSReceiver::DeletePropertyOrElement(v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Name>, v8::internal::LanguageMode) + 12400
    frame vector-im/element-web#14: 0x00000001155ebbec Electron Framework`v8::internal::JSReceiver::DeletePropertyOrElement(v8::internal::Handle<v8::internal::JSReceiver>, v8::internal::Handle<v8::internal::Name>, v8::internal::LanguageMode) + 11372
    frame vector-im/element-web#15: 0x00000001156a47f4 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 38048
    frame vector-im/element-web#16: 0x00000001156a1e64 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 27408
    frame vector-im/element-web#17: 0x00000001156a17dc Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 25736
    frame vector-im/element-web#18: 0x00000001156a4484 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 37168
    frame vector-im/element-web#19: 0x00000001156a1e64 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 27408
    frame vector-im/element-web#20: 0x00000001156a17dc Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 25736
    frame vector-im/element-web#21: 0x00000001156a479c Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 37960
    frame vector-im/element-web#22: 0x00000001156a1e64 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 27408
    frame vector-im/element-web#23: 0x00000001156a17dc Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 25736
    frame vector-im/element-web#24: 0x00000001156a4484 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 37168
    frame vector-im/element-web#25: 0x00000001156a1e64 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 27408
    frame vector-im/element-web#26: 0x00000001156a17dc Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 25736
    frame vector-im/element-web#27: 0x00000001156a2510 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 29116
    frame vector-im/element-web#28: 0x00000001156a14a8 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 24916
    frame vector-im/element-web#29: 0x00000001156a4484 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 37168
    frame vector-im/element-web#30: 0x00000001156a1e64 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 27408
    frame vector-im/element-web#31: 0x00000001156a17dc Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 25736
    frame vector-im/element-web#32: 0x00000001156a4484 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 37168
    frame vector-im/element-web#33: 0x00000001156a1e64 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 27408
    frame vector-im/element-web#34: 0x00000001156a17dc Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 25736
    frame vector-im/element-web#35: 0x00000001156a479c Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 37960
    frame vector-im/element-web#36: 0x00000001156a1e64 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 27408
    frame vector-im/element-web#37: 0x00000001156a17dc Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 25736
    frame vector-im/element-web#38: 0x00000001156a4484 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 37168
    frame vector-im/element-web#39: 0x00000001156a1e64 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 27408
    frame vector-im/element-web#40: 0x00000001156a17dc Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 25736
    frame vector-im/element-web#41: 0x00000001156a479c Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 37960
    frame vector-im/element-web#42: 0x00000001156a1e64 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 27408
    frame vector-im/element-web#43: 0x00000001156a17dc Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 25736
    frame vector-im/element-web#44: 0x0000000115210784 Electron Framework`v8::ValueDeserializer::ReadValue(v8::Local<v8::Context>) + 256
    frame vector-im/element-web#45: 0x0000000117f042f8 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 27727500
    frame vector-im/element-web#46: 0x0000000119e2bf64 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 19043464
    frame vector-im/element-web#47: 0x0000000119046970 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 4472468
    frame vector-im/element-web#48: 0x000000011904c2d0 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 4495348
    frame vector-im/element-web#49: 0x0000000119060ba8 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 4579532
    frame vector-im/element-web#50: 0x0000000119060658 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 4578172
    frame vector-im/element-web#51: 0x00000001190606b4 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 4578264
    frame vector-im/element-web#52: 0x000000011904a9f4 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 4488984
    frame vector-im/element-web#53: 0x00000001152703dc Electron Framework`v8::internal::Accessors::MakeAccessor(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Name>, void (*)(v8::Local<v8::Name>, v8::PropertyCallbackInfo<v8::Value> const&), void (*)(v8::Local<v8::Name>, v8::Local<v8::Value>, v8::PropertyCallbackInfo<v8::Boolean> const&)) + 12288
    frame vector-im/element-web#54: 0x0000007c07f0a1ac
    frame vector-im/element-web#55: 0x0000007c07e8d758
    frame vector-im/element-web#56: 0x0000007c07e8d758
    frame vector-im/element-web#57: 0x0000007c07e8d758
    frame vector-im/element-web#58: 0x0000007c0019dc14
    frame vector-im/element-web#59: 0x0000007c0019b46c
    frame vector-im/element-web#60: 0x0000007c07e8b42c
    frame vector-im/element-web#61: 0x0000007c07e8b0c8
    frame vector-im/element-web#62: 0x000000011532aed0 Electron Framework`v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) + 624
    frame vector-im/element-web#63: 0x000000011521e31c Electron Framework`v8::Function::Call(v8::Local<v8::Context>, v8::Local<v8::Value>, int, v8::Local<v8::Value>*) + 624
    frame vector-im/element-web#64: 0x0000000117ef2a34 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 27655624
    frame vector-im/element-web#65: 0x0000000118da1e20 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 1700676
    frame vector-im/element-web#66: 0x0000000118da5034 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 1713496
    frame vector-im/element-web#67: 0x000000011841eab4 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 33078856
    frame vector-im/element-web#68: 0x0000000118105be0 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29831028
    frame vector-im/element-web#69: 0x000000011810a278 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29849100
    frame vector-im/element-web#70: 0x0000000118109420 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 29845428
    frame vector-im/element-web#71: 0x0000000118a361dc Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 39466352
    frame vector-im/element-web#72: 0x0000000118a2f2cc Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 39437920
    frame vector-im/element-web#73: 0x0000000118a2ed18 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 39436460
    frame vector-im/element-web#74: 0x0000000116d28c70 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9003012
    frame vector-im/element-web#75: 0x0000000116d42b54 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9109224
    frame vector-im/element-web#76: 0x0000000116cf502c Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 8790976
    frame vector-im/element-web#77: 0x0000000116d4339c Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9111344
    frame vector-im/element-web#78: 0x0000000116d12ad0 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 8912484
    frame vector-im/element-web#79: 0x0000000115f42a74 Electron Framework`v8::internal::SetupIsolateDelegate::SetupHeap(v8::internal::Heap*) + 2473788
    frame vector-im/element-web#80: 0x0000000116d76d80 Electron Framework`node::GetEnvironmentIsolateData(node::Environment*) + 9322772

...and so I hit it with my poor-man's stack inspector, to get the first 32 bytes of each of the first 16 addresses on the stack for each stack frame:

for (0..81) {
	say "frame select $_";
	for (0..16) {
		say 'x "*(uint64_t *)($fp + ' . ($_*8) . ')"';
	}
}

I'm not going to post the whole dump as it's massive and probably contains quite a lot of personal data and random collateral heap contents, but it's enough to get an idea of what's going on.

The bottommost frame with relevant human-readable stringy data is frame 58:

(lldb) frame select 58
frame vector-im/element-web#58: 0x0000007c0019dc14
...
(lldb) x "*(uint64_t *)($fp + 24)"
0x7cce63ff85: 2d 94 0a 31 22 00 08 31 22 00 08 00 00 00 00 ad  -..1"..1".......
0x7cce63ff95: 25 00 08 03 00 00 00 0e 00 00 00 73 79 6e 63 54  %..........syncT
...
(lldb) x "*(uint64_t *)($fp + 64)"
0x7cce63ff95: 25 00 08 03 00 00 00 0e 00 00 00 73 79 6e 63 54  %..........syncT
0x7cce63ffa5: 6f 44 61 74 61 62 61 73 65 00 00 41 44 94 0a 06  oDatabase..AD...
  • interesting that there's some kind of syncToDatabase function name hanging around the stack. Moving up the stack:
(lldb) frame select 57
frame vector-im/element-web#57: 0x0000007c07e8d758
...
(lldb) x "*(uint64_t *)($fp + 24)"
0x7c4e4e72a9: 25 00 08 03 00 00 00 7f 00 00 00 6d 32 39 34 32  %..........m2942
0x7c4e4e72b9: 34 34 36 32 39 36 7e 32 2e 32 39 34 32 34 34 36  446296~2.2942446

...oh hello, that looks like like a sync token...

(lldb) frame select 56
frame vector-im/element-web#56: 0x0000007c07e8d758
...
(lldb) x "*(uint64_t *)($fp + 24)"
0x7cd1dbffdd: 22 94 0a 31 22 00 08 31 22 00 08 fd a3 7e 0a 35  "..1"..1"....~.5
0x7cd1dbffed: ff a7 d1 a5 79 0a 3a 21 8e 14 00 f1 22 00 08 02  ....y.:!...."...
(lldb) x "*(uint64_t *)($fp + 32)"
0x7c4e4e72a9: 25 00 08 03 00 00 00 7f 00 00 00 6d 32 39 34 32  %..........m2942
0x7c4e4e72b9: 34 34 36 32 39 36 7e 32 2e 32 39 34 32 34 34 36  446296~2.2942446
(lldb) x "*(uint64_t *)($fp + 40)"
0x7c3a0a7875: 22 00 08 ea c0 b8 92 1a 00 00 00 50 65 72 73 69  "..........Persi
0x7c3a0a7885: 73 74 69 6e 67 20 73 79 6e 63 20 64 61 74 61 20  sting sync data 

Ah ha! That's both a a sync token, and also the "Persisting sync data" log line from indexeddb-local-backend. Moving on up...

(lldb) frame select 51
frame vector-im/element-web#51: 0x00000001190606b4 Electron Framework`node::CommonEnvironmentSetup::event_loop() const + 4578264
Electron Framework`node::CommonEnvironmentSetup::event_loop:
...
(lldb) x "*(uint64_t *)($fp + 88)"
0x11a6fbb98: 49 44 42 4f 62 6a 65 63 74 53 74 6f 72 65 00 00  IDBObjectStore..
0x11a6fbba8: 69 6e 64 65 78 4e 61 6d 65 73 00 00 00 00 00 00  indexNames......
(lldb) x "*(uint64_t *)($fp + 96)"
0x11a3443e8: 70 75 74 00 00 00 00 00 00 00 00 00 00 00 00 00  put.............
0x11a3443f8: 73 74 61 72 74 20 74 61 62 6c 65 20 6d 6f 72 78  start table morx

...right that looks like it could be doing an IndexedDB 'put' operation inside a general v8/node event loop...

(lldb) frame select 44
frame vector-im/element-web#44: 0x0000000115210784 Electron Framework`v8::ValueDeserializer::ReadValue(v8::Local<v8::Context>) + 256
Electron Framework`v8::ValueDeserializer::ReadValue:
...
(lldb) x "*(uint64_t *)($fp + 24)"
0x7031010000: ff 14 ff 0d 6f 22 07 63 6c 6f 62 62 65 72 22 01  ....o".clobber".
0x7031010010: 2d 22 09 6e 65 78 74 42 61 74 63 68 22 7f 6d 32  -".nextBatch".m2

Oh, that's really interesting - that looks like the parameters to the store.put() in persistSyncData() here. Moving on up...

(lldb) frame select 39
frame vector-im/element-web#39: 0x00000001156a1e64 Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 27408
Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo:
...
(lldb) x "*(uint64_t *)($fp + 32)"
0x703101009f: 72 6f 6f 6d 73 44 61 74 61 6f 22 04 6a 6f 69 6e  roomsDatao".join
0x70310100af: 6f 63 3c 21 00 79 00 56 00 69 00 63 00 4a 00 55  oc<!.y.V.i.c.J.U

Okay, so that looks like it could be roomsData param passed into persistSyncData - starting with a join block and then some keys which are presumably UCS2 room IDs (!yVicJU...). Then, as you go up the stack, it looks like it keeps drilling further and further into this data structure (although it's unclear if it's reading or writing), in all of the HasIntegrityLevelTransitionTo frames (presumably pure v8 frames), for instance:

frame 38
(lldb) x "*(uint64_t *)($fp + 72)"
0x70310100a9: 22 04 6a 6f 69 6e 6f 63 3c 21 00 79 00 56 00 69  ".joinoc<!.y.V.i
0x70310100b9: 00 63 00 4a 00 55 00 69 00 44 00 71 00 46 00 55  .c.J.U.i.D.q.F.U

frame 35
(lldb) x "*(uint64_t *)($fp + 72)"
0x70310100b0: 63 3c 21 00 79 00 56 00 69 00 63 00 4a 00 55 00  c<!.y.V.i.c.J.U.
0x70310100c0: 69 00 44 00 71 00 46 00 55 00 4f 00 5a 00 61 00  i.D.q.F.U.O.Z.a.

frame 33
(lldb) x "*(uint64_t *)($fp + 32)"
0x7037123b17: 21 44 7a 6e 74 6c 79 46 71 44 77 59 5a 53 71 6f  !DzntlyFqDwYZSqo
0x7037123b27: 59 42 56 3a 6d 61 75 6e 69 75 6d 2e 6e 65 74 6f  YBV:maunium.neto

frame 32
(lldb) x "*(uint64_t *)($fp + 72)"
0x7037123b37: 22 09 65 70 68 65 6d 65 72 61 6c 6f 22 06 65 76  ".ephemeralo".ev
0x7037123b47: 65 6e 74 73 41 01 6f 22 04 74 79 70 65 22 09 6d  entsA.o".type".m

frame 29
(lldb) x "*(uint64_t *)($fp + 72)"
0x7037123b43: 22 06 65 76 65 6e 74 73 41 01 6f 22 04 74 79 70  ".eventsA.o".typ
0x7037123b53: 65 22 09 6d 2e 72 65 63 65 69 70 74 22 07 72 6f  e".m.receipt".ro

frame 26
(lldb) x "*(uint64_t *)($fp + 104)"
0x7037123b4c: 01 6f 22 04 74 79 70 65 22 09 6d 2e 72 65 63 65  .o".type".m.rece
0x7037123b5c: 69 70 74 22 07 72 6f 6f 6d 5f 69 64 22 1f 21 44  ipt".room_id".!D

frame 24
(lldb) x "*(uint64_t *)($fp + 72)"
0x7037123b4e: 22 04 74 79 70 65 22 09 6d 2e 72 65 63 65 69 70  ".type".m.receip
0x7037123b5e: 74 22 07 72 6f 6f 6d 5f 69 64 22 1f 21 44 7a 6e  t".room_id".!Dzn

frame 21
(lldb) x "*(uint64_t *)($fp + 72)"
0x7037123b93: 22 2c 24 4e 76 74 54 49 4d 73 41 4d 61 6d 58 31  ",$NvtTIMsAMamX1
0x7037123ba3: 48 6d 61 45 45 5f 63 4f 36 53 34 45 75 6b 76 4d  HmaEE_cO6S4EukvM

frame 19
(lldb) x "*(uint64_t *)($fp + 32)"
0x703714f954: 24 72 74 38 73 58 79 4f 6a 53 79 68 30 71 65 75  $rt8sXyOjSyh0qeu
0x703714f964: 55 47 66 32 47 34 66 5f 58 72 4f 34 4c 67 58 4b  UGf2G4f_XrO4LgXK

frame 18
(lldb) x "*(uint64_t *)($fp + 72)"
0x703714f981: 22 06 6d 2e 72 65 61 64 6f 22 14 40 6e 6f 6e 61  ".m.reado".@nona
0x703714f991: 6d 65 31 34 3a 6d 61 74 72 69 78 2e 6f 72 67 6f  me14:matrix.orgo

frame 15
(lldb) x "*(uint64_t *)($fp + 72)"
0x703714f98a: 22 14 40 6e 6f 6e 61 6d 65 31 34 3a 6d 61 74 72  ".@noname14:matr
0x703714f99a: 69 78 2e 6f 72 67 6f 22 02 74 73 4e 00 80 0f 8b  ix.orgo".tsN....

...and then it actually comes out trying to set a property on an object:

frame vector-im/element-web#12: 0x0000000115656d90 Electron Framework`v8::internal::Object::SetProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::StoreOrigin, v8::Maybe<v8::internal::ShouldThrow>) + 13160
(lldb) x "*(uint64_t *)($fp + 72)"
0x7c2e9fff39: 9e 3d 6e 0e fc 9d 8e 20 00 00 00 8d e4 27 30 a1  .=n.... .....'0.
0x7c2e9fff49: 22 00 08 38 00 00 00 78 2e 66 61 6d 69 6c 69 65  "..8...x.familie

...and finally it OOMs while trying to do a minor GC whilst allocating heap in order to set the property:

(lldb) frame select 6
frame vector-im/element-web#6: 0x00000001154703f4 Electron Framework`v8::internal::GlobalSafepointScope::~GlobalSafepointScope() + 192980
...
(lldb) x "*(uint64_t *)($fp + 72)"
0x11a441f28: 4d 69 6e 6f 72 47 43 00 00 00 00 00 00 00 00 00  MinorGC.........
0x11a441f38: 5b 49 6e 63 72 65 6d 65 6e 74 61 6c 4d 61 72 6b  [IncrementalMark

So, this looks anecdotally an awful lot like it's trying to clone the sync accmuulator response while trying to store it in indexeddb. There's nothing very special about any of the room/user/event IDs flying around the stack; it looks to literally have been caught in the act walking over a sync response while iterating over the room read receipts in the ephemeral data for the mautrix whatsapp room, and finally exploded trying to allocate heap for one of the components of the read receipts.

To dig a bit more, i re-ran the stack sniffer but grabbing more data (512 bytes), looking only at offset $fp+72, given empirically that's where most of the strings seem to be (especially for the v8 interpreted frames). I was particularly interested in grabbing the sync token that it was trying to persist in nextBatch:

for (0..81) {
	say "frame select $_";
	say 'x -c 512 "*(uint64_t *)($fp + 72)"';
}

This produced much juicier stuff like:

(lldb) frame select 41
frame vector-im/element-web#41: 0x00000001156a479c Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo(v8::internal::Map, v8::internal::Symbol*, v8::internal::PropertyAttributes*) + 37960
Electron Framework`v8::internal::TransitionsAccessor::HasIntegrityLevelTransitionTo:
->  0x1156a479c <+37960>: cbz    x0, 0x1156a4828           ; <+38100>
    0x1156a47a0 <+37964>: mov    x22, x0
    0x1156a47a4 <+37968>: ldr    x1, [x19]
    0x1156a47a8 <+37972>: add    x0, sp, #0x78
(lldb) x -c 512 "*(uint64_t *)($fp + 72)"
0x7031010005: 22 07 63 6c 6f 62 62 65 72 22 01 2d 22 09 6e 65  ".clobber".-".ne
0x7031010015: 78 74 42 61 74 63 68 22 7f 6d 32 39 34 32 34 34  xtBatch".m294244
0x7031010025: 36 32 39 36 7e 32 2e 32 39 34 32 34 34 36 33 33  6296~2.294244633
0x7031010035: 36 7e 31 2e 32 39 34 32 34 34 36 33 33 36 7e 33  6~1.2942446336~3
0x7031010045: 2e 32 39 34 32 34 34 36 33 33 36 5f 37 35 37 32  .2942446336_7572
0x7031010055: 38 34 39 37 34 5f 31 36 38 33 30 39 30 30 5f 31  84974_16830900_1
0x7031010065: 33 35 38 33 39 38 37 32 37 5f 31 33 37 34 30 36  358398727_137406
0x7031010075: 38 30 37 36 5f 33 32 35 32 32 33 37 5f 34 38 37  8076_3252237_487
0x7031010085: 36 34 32 31 35 39 5f 33 36 36 31 31 39 33 36 35  642159_366119365
0x7031010095: 31 5f 32 31 32 34 31 34 22 09 72 6f 6f 6d 73 44  1_212414".roomsD
0x70310100a5: 61 74 61 6f 22 04 6a 6f 69 6e 6f 63 3c 21 00 79  atao".joinoc<!.y
0x70310100b5: 00 56 00 69 00 63 00 4a 00 55 00 69 00 44 00 71  .V.i.c.J.U.i.D.q
0x70310100c5: 00 46 00 55 00 4f 00 5a 00 61 00 59 00 6e 00 4d  .F.U.O.Z.a.Y.n.M
0x70310100d5: 00 67 00 3a 00 6d 00 61 00 74 00 72 00 69 00 78  .g.:.m.a.t.r.i.x
0x70310100e5: 00 2e 00 6f 00 72 00 67 00 6f 22 09 65 70 68 65  ...o.r.g.o".ephe
0x70310100f5: 6d 65 72 61 6c 6f 22 06 65 76 65 6e 74 73 41 01  meralo".eventsA.
0x7031010105: 6f 22 04 74 79 70 65 22 09 6d 2e 72 65 63 65 69  o".type".m.recei
0x7031010115: 70 74 22 07 72 6f 6f 6d 5f 69 64 63 3c 21 00 79  pt".room_idc<!.y
0x7031010125: 00 56 00 69 00 63 00 4a 00 55 00 69 00 44 00 71  .V.i.c.J.U.i.D.q
0x7031010135: 00 46 00 55 00 4f 00 5a 00 61 00 59 00 6e 00 4d  .F.U.O.Z.a.Y.n.M
0x7031010145: 00 67 00 3a 00 6d 00 61 00 74 00 72 00 69 00 78  .g.:.m.a.t.r.i.x
0x7031010155: 00 2e 00 6f 00 72 00 67 00 22 07 63 6f 6e 74 65  ...o.r.g.".conte
0x7031010165: 6e 74 6f 22 2c 24 72 34 30 53 58 79 4f 4a 58 6a  nto",$r40SXyOJXj
0x7031010175: 64 4b 39 58 62 75 52 77 2d 5a 46 6e 30 70 56 45  dK9XbuRw-ZFn0pVE
0x7031010185: 49 6c 4e 43 61 72 42 6f 30 77 56 35 79 6f 61 54  IlNCarBo0wV5yoaT
0x7031010195: 4d 6f 22 06 6d 2e 72 65 61 64 6f 22 13 40 6d 61  Mo".m.reado".@ma
0x70310101a5: 74 74 68 65 77 3a 6d 61 74 72 69 78 2e 6f 72 67  tthew:matrix.org
0x70310101b5: 6f 22 02 74 73 4e 00 d0 5d 2e ce 01 78 42 22 19  o".tsN..]...xB".
0x70310101c5: 6f 72 67 2e 6d 61 74 72 69 78 2e 6d 73 63 32 32  org.matrix.msc22
0x70310101d5: 38 35 2e 68 69 64 64 65 6e 46 7b 02 7b 01 7b 01  85.hiddenF{.{.{.
0x70310101e5: 22 2c 24 75 4c 30 64 5f 4b 33 4c 45 54 59 44 53  ",$uL0d_K3LETYDS
0x70310101f5: 64 34 47 64 32 6d 42 65 6c 48 6a 68 32 51 62 32  d4Gd2mBelHjh2Qb2

...and so we can see that it died while trying to persist sync token: m2942446296~2.2942446336~1.2942446336~3.2942446336_757284974_16830900_1358398727_1374068076_3252237_487642159_3661193651_212414.

Now, looking at the server logs for my account:

2022-05-15 05:04:56,155 - synapse.access.http.21126 - 448 - INFO - GET-7874601 - x.x.x.x - 21126 - {@matthew:matrix.org} Processed request: 2.286sec/0.013sec (1.293sec, 0.122sec) (2.356sec/5.976sec/379) 129629B 200 "GET /_matrix/client/r0/sync?filter=139&timeout=0&since=s2942329283_757284974_16691904_1358345931_1373997043_3252073_487619632_3660822395_212414 HTTP/1.1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) ElementNightly/2022051301 Chrome/98.0.4758.141 Electron/17.4.2 Safari/537.36" [88 dbevts]
2022-05-15 05:05:02,659 - synapse.access.http.21126 - 448 - INFO - GET-7874634 - x.x.x.x - 21126 - {@matthew:matrix.org} Processed request: 4.931sec/0.001sec (0.112sec, 0.004sec) (0.011sec/0.112sec/11) 1232B 200 "GET /_matrix/client/r0/sync?filter=139&timeout=30000&since=s2942387211_757284974_16763373_1358373833_1374033078_3252155_487631726_3660899184_212414 HTTP/1.1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) ElementNightly/2022051301 Chrome/98.0.4758.141 Electron/17.4.2 Safari/537.36" [2 dbevts]
2022-05-15 05:36:26,061 - synapse.access.http.21126 - 448 - INFO - GET-7896468 - x.x.x.x - 21126 - {@matthew:matrix.org} Processed request: 2.233sec/0.009sec (0.974sec, 0.156sec) (2.839sec/5.804sec/377) 106825B 200 "GET /_matrix/client/r0/sync?filter=139&timeout=0&since=s2942387211_757284974_16763373_1358373833_1374033078_3252155_487631726_3660899184_212414 HTTP/1.1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) ElementNightly/2022051301 Chrome/98.0.4758.141 Electron/17.4.2 Safari/537.36" [97 dbevts]
2022-05-15 05:36:27,820 - synapse.access.http.21126 - 448 - INFO - GET-7896517 - x.x.x.x - 21126 - {@matthew:matrix.org} Processed request: 0.098sec/0.001sec (0.060sec, 0.000sec) (0.003sec/0.023sec/4) 936B 200 "GET /_matrix/client/r0/sync?filter=139&timeout=30000&since=s2942427005_757284974_16809540_1358390961_1374056452_3252178_487637643_3661100463_212414 HTTP/1.1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) ElementNightly/2022051301 Chrome/98.0.4758.141 Electron/17.4.2 Safari/537.36" [1 dbevts]
2022-05-15 05:53:06,191 - synapse.access.http.21126 - 448 - INFO - GET-7908114 - x.x.x.x - 21126 - {@matthew:matrix.org} Processed request: 1.977sec/0.006sec (1.074sec, 0.104sec) (2.007sec/4.567sec/211) 83602B 200 "GET /_matrix/client/r0/sync?filter=139&timeout=30000&since=m2942427127~36.2942427132~2.2942427132~3.2942427132_757284974_16809640_1358390992_1374056488_3252178_487637660_3661100464_212414 HTTP/1.1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) ElementNightly/2022051301 Chrome/98.0.4758.141 Electron/17.4.2 Safari/537.36" [81 dbevts]
2022-05-15 06:09:18,752 - synapse.access.http.21126 - 448 - INFO - GET-7919334 - x.x.x.x - 21126 - {@matthew:matrix.org} Processed request: 1.616sec/0.013sec (0.940sec, 0.114sec) (1.876sec/4.237sec/340) 83424B 200 "GET /_matrix/client/r0/sync?filter=139&timeout=30000&since=m2942446296~2.2942446336~1.2942446336~3.2942446336_757284974_16830900_1358398727_1374068076_3252237_487642159_3661193651_212414 HTTP/1.1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) ElementNightly/2022051301 Chrome/98.0.4758.141 Electron/17.4.2 Safari/537.36" [98 dbevts]

...are the last incremental syncs before it exploded, and it's somewhat interesting that the final sync request was from this access token. Meanwhile the console logging in the app in this timeframe was (with some irrelevant reqs removed):

05:47:17.038 rageshake.ts:72 MatrixClient sync state => RECONNECTING
06:04:56.309 rageshake.ts:72 Request to reset timeline in room  !MgHWuNJsOwvwVVVJLQ:matrix.org  viewing: !yHWhpxlXVaLcsgDUKb:matrix.org
06:04:56.314 rageshake.ts:72 Marking device list outdated for @neb_alertmanager:matrix.org
06:04:56.316 rageshake.ts:72 Starting key download for ['@neb_alertmanager:matrix.org']
06:04:56.321 rageshake.ts:72 MatrixClient sync state => SYNCING
06:04:56.736 rageshake.ts:72 Saving device tracking data s2942387211_757284974_16763373_1358373833_1374033078_3252155_487631726_3660899184_212414
06:04:57.566 rageshake.ts:72 got device keys for @neb_alertmanager:matrix.org: {YIOURJLAFS: {…}}
06:04:57.566 rageshake.ts:72 got cross-signing keys for @neb_alertmanager:matrix.org: {master: undefined, self_signing: undefined, user_signing: undefined}
06:04:57.567 rageshake.ts:72 Completed key download for @neb_alertmanager:matrix.org
06:04:57.567 rageshake.ts:72 Device list for @neb_alertmanager:matrix.org now up to date
06:04:58.547 logger.ts:52 Persisting sync data up to s2942387211_757284974_16763373_1358373833_1374033078_3252155_487631726_3660899184_212414
06:04:59.275 rageshake.ts:72 Saving device tracking data s2942387211_757284974_16763373_1358373833_1374033078_3252155_487631726_3660899184_212414
06:20:23.465 rageshake.ts:72 /sync error ORG.MATRIX.JSSDK_TIMEOUT: Locally timed out waiting for a response
06:20:23.465 rageshake.ts:72 ORG.MATRIX.JSSDK_TIMEOUT: Locally timed out waiting for a response
06:20:23.465 rageshake.ts:72 Number of consecutive failed sync requests: 1
06:20:23.465 rageshake.ts:72 Starting keep-alive
06:20:23.469 rageshake.ts:72 MatrixClient sync state => RECONNECTING
06:36:26.100 rageshake.ts:72 Request to reset timeline in room  !MgHWuNJsOwvwVVVJLQ:matrix.org  viewing: !yHWhpxlXVaLcsgDUKb:matrix.org
06:36:26.105 rageshake.ts:72 Marking device list outdated for @neb_alertmanager:matrix.org
06:36:26.107 rageshake.ts:72 Starting key download for ['@neb_alertmanager:matrix.org']
06:36:26.110 rageshake.ts:72 MatrixClient sync state => SYNCING
06:36:26.581 rageshake.ts:72 Saving device tracking data s2942427005_757284974_16809540_1358390961_1374056452_3252178_487637643_3661100463_212414
06:36:27.438 rageshake.ts:72 got device keys for @neb_alertmanager:matrix.org: {YIOURJLAFS: {…}}
06:36:27.438 rageshake.ts:72 got cross-signing keys for @neb_alertmanager:matrix.org: {master: undefined, self_signing: undefined, user_signing: undefined}
06:36:27.438 rageshake.ts:72 Completed key download for @neb_alertmanager:matrix.org
06:36:27.438 rageshake.ts:72 Device list for @neb_alertmanager:matrix.org now up to date
06:36:28.337 logger.ts:52 Persisting sync data up to s2942427005_757284974_16809540_1358390961_1374056452_3252178_487637643_3661100463_212414
06:36:28.840 rageshake.ts:72 Saving device tracking data s2942427005_757284974_16809540_1358390961_1374056452_3252178_487637643_3661100463_212414
06:53:04.745 logger.ts:52 Persisting sync data up to m2942427127~36.2942427132~2.2942427132~3.2942427132_757284974_16809640_1358390992_1374056488_3252178_487637660_3661100464_212414
07:09:15.563 rageshake.ts:72 Marking device list outdated for @travis:t2l.io
07:09:15.566 rageshake.ts:72 Starting key download for ['@travis:t2l.io']
07:09:15.898 rageshake.ts:72 Saving device tracking data m2942446296~2.2942446336~1.2942446336~3.2942446336_757284974_16830900_1358398727_1374068076_3252237_487642159_3661193651_212414
07:09:17.546 rageshake.ts:72 got device keys for @travis:t2l.io: {DPJHLOXMRV: {…}, PQTUMTVHCC: {…}, QOYKSAYWSN: {…}, ZBYDDKFEBW: {…}}
07:09:17.546 rageshake.ts:72 got cross-signing keys for @travis:t2l.io: {master: {…}, self_signing: {…}, user_signing: undefined}
07:09:17.547 rageshake.ts:72 Completed key download for @travis:t2l.io
07:09:17.547 rageshake.ts:72 Device list for @travis:t2l.io now up to date
07:09:17.753 logger.ts:52 Persisting sync data up to m2942446296~2.2942446336~1.2942446336~3.2942446336_757284974_16830900_1358398727_1374068076_3252237_487642159_3661193651_212414
07:09:18.696 rageshake.ts:72 Saving device tracking data m2942446296~2.2942446336~1.2942446336~3.2942446336_757284974_16830900_1358398727_1374068076_3252237_487642159_3661193651_212414

...which matches the idea that it exploded while trying to persist the sync data.

So, i think next steps are:

  • Put more logging around sync accumulator persistence - e.g. when do the persists complete? If we have multiple persists stacking up due to the laptop only running intermittently while asleep, then that could explain why we run out of heap. For instance, could the persist from 06:53 still be running when our one from 07:09 runs? If so, does it need a lock?
  • Possibly pick through the sync tokens to try to infer whether they are stacking up somehow, correlating timings of reqs on the server to persists on the client

The sync accumulator approach is horrid anyway and about to be killed with sliding sync, so this problem might go away - but there is a chance that persistence exploding is just a side-effect of a memory leak somewhere else (given persistsync is always going to spike RAM by a few hundred MB), so it's worth investigating this a little more to get to the bottom of it.

@ara4n
Copy link
Member Author

ara4n commented May 15, 2022

Hmm. indexeddb.ts has:

    private reallySave = this.degradable((): Promise<void> => {
        this.syncTs = Date.now(); // set now to guard against multi-writes

...which is all very well, but i don't think that it guards against multi-writes - it looks like it just stops writes happening more frequently than every 5 minutes. But if the write from 5 minutes ago is still running...

@ara4n
Copy link
Member Author

ara4n commented May 15, 2022

Another datapoint: i set Nightly running again, breakpointed in persistSyncData, and JSON.stringify(roomsData).length is 143988497 bytes - so ~140MB, which really shouldn't be enough (even if allocated twice while bouncing back and forth between the main js-sdk and the sync accumulator worker) to cause a disaster. Next step; add more debugging to persisting the sync accumulator.

@ara4n
Copy link
Member Author

ara4n commented May 15, 2022

An argument against overlapping persists is the fact that the memory seems to explode very rapidly (e.g. before the inspector can notice, assuming it's updating every second, as one would expect).

@ara4n
Copy link
Member Author

ara4n commented May 15, 2022

So, my debug electron build is borked somehow - the Renderer immediately crashes with:

3   libsystem_c.dylib             	       0x1af449d48 __stack_chk_fail + 96
4   Electron Framework            	       0x30c06d14c content::RenderFrameImpl::PrepareFrameForCommit(GURL const&, blink::mojom::CommitNavigationParams const&) + 280

So i've set a debug element-web running in normal electron, with the following js-sdk patch to try to track what's happening when persisting sync:

diff --git a/src/store/indexeddb-local-backend.ts b/src/store/indexeddb-local-backend.ts
index a7a7e2574..22f079e91 100644
--- a/src/store/indexeddb-local-backend.ts
+++ b/src/store/indexeddb-local-backend.ts
@@ -418,7 +418,7 @@ export class LocalIndexedDBStoreBackend implements IIndexedDBBackend {
         nextBatch: string,
         roomsData: ISyncResponse["rooms"],
     ): Promise<void> {
-        logger.log("Persisting sync data up to", nextBatch);
+        logger.log("Persisting sync data (" + JSON.stringify(roomsData).length + " bytes) up to", nextBatch);
         return utils.promiseTry<void>(() => {
             const txn = this.db.transaction(["sync"], "readwrite");
             const store = txn.objectStore("sync");
@@ -427,7 +427,9 @@ export class LocalIndexedDBStoreBackend implements IIndexedDBBackend {
                 nextBatch,
                 roomsData,
             }); // put == UPSERT
-            return txnAsPromise(txn).then();
+            return txnAsPromise(txn).then(()=>{
+                logger.log("Persisted sync data up to", nextBatch);
+            });
         });
     }

So far it seems to be running nicely, with:

Persisting sync data (118103281 bytes) up to m2943770749~1.2943770759~2.2943770759~3.2943770759_757284974_18384151_1359046117_1374928761_3253669_488017884_3664966327_212414
19:53:24.059 logger.ts:52 Persisted sync data up to m2943770749~1.2943770759~2.2943770759~3.2943770759_757284974_18384151_1359046117_1374928761_3253669_488017884_3664966327_212414

etc. Let's see what it logs when it explodes tonight...

@ara4n
Copy link
Member Author

ara4n commented May 16, 2022

(It didn't crash overnight. Worth noting that the sync accumulator size has shrunk to 118103281 from 143988497 thanks to it being a fresh login - which might be enough of a difference to avoid problems. Also, the overnight syncs are all very boring:

05:09:10.929 logger.ts:52 Persisting sync data (118890285 bytes) up to s2944706664_757284974_19560827_1359546318_1375575358_3254442_488249981_3666896507_212414
05:09:14.555 logger.ts:52 Persisted sync data up to s2944706664_757284974_19560827_1359546318_1375575358_3254442_488249981_3666896507_212414
05:44:10.106 logger.ts:52 Persisting sync data (118911685 bytes) up to s2944752144_757284974_19612840_1359565978_1375603125_3254464_488260649_3666972501_212414
05:44:13.184 logger.ts:52 Persisted sync data up to s2944752144_757284974_19612840_1359565978_1375603125_3254464_488260649_3666972501_212414

No sign of any so far taking more than 4 secs.

@novocaine
Copy link
Contributor

novocaine commented May 18, 2022

thread element-hq/element-web#16

are you sure this is the crashing thread? this thread is waiting on a pthread lock, judging by

  * frame #0: 0x00000001af4e4290 libsystem_kernel.dylib`__psynch_cvwait + 8
    frame vector-im/element-web#1: 0x00000001af51e83c libsystem_pthread.dylib`_pthread_cond_wait + 1236

I'd expect a crashed thread to be inside some function, not waiting for a lock. In your other crash dumps, the crashing threads are inside node::GetEnvironmentIsolateData in a thread called ThreadPoolForegroundWorker inside v8::internal::FatalProcessOutOfMemory called from v8::internal::Heap::StartIncrementalMarking

https://github.com/vector-im/element-web/files/8650110/nightly-helper-renderer-crashdump.txt
Crashed Thread: 20 ThreadPoolForegroundWorker

Thread 20 Crashed:: ThreadPoolForegroundWorker
0   Electron Framework            	       0x11540f530 node::GetEnvironmentIsolateData(node::Environment*) + 8767684
1   Electron Framework            	       0x11540f320 node::GetEnvironmentIsolateData(node::Environment*) + 8767156
2   Electron Framework            	       0x11660d7d0 node::GetEnvironmentIsolateData(node::Environment*) + 27634532
3   Electron Framework            	       0x1139202f8 v8::internal::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*) + 724
4   Electron Framework            	       0x113920228 v8::internal::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*) + 516
5   Electron Framework            	       0x113ad23b0 v8::internal::Heap::StartIncrementalMarking(int, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) + 1108
6   Electron Framework            	       0x113b9c00c v8::internal::GlobalSafepointScope::~GlobalSafepointScope() + 241132
7   Electron Framework            	       0x113b6effc v8::internal::GlobalSafepointScope::~GlobalSafepointScope() + 56796
8   Electron Framework            	       0x113b619b0 v8::internal::GlobalSafepointScope::~GlobalSafepointScope() + 1936
9   Electron Framework            	       0x113b61734 v8::internal::GlobalSafepointScope::~GlobalSafepointScope() + 1300
10  Electron Framework            	       0x1165f5c10 node::GetEnvironmentIsolateData(node::Environment*) + 27537316
11  Electron Framework            	       0x1165f4860 node::GetEnvironmentIsolateData(node::Environment*) + 27532276
12  Electron Framework            	       0x115448c70 node::GetEnvironmentIsolateData(node::Environment*) + 9003012
13  Electron Framework            	       0x115469b78 node::GetEnvironmentIsolateData(node::Environment*) + 9137932
14  Electron Framework            	       0x1154691c0 node::GetEnvironmentIsolateData(node::Environment*) + 9135444
15  Electron Framework            	       0x1154968a4 node::GetEnvironmentIsolateData(node::Environment*) + 9321528
16  Electron Framework            	       0x115468fa4 node::GetEnvironmentIsolateData(node::Environment*) + 9134904
17  Electron Framework            	       0x115477720 node::GetEnvironmentIsolateData(node::Environment*) + 9194164
18  Electron Framework            	       0x115477364 node::GetEnvironmentIsolateData(node::Environment*) + 9193208
19  Electron Framework            	       0x115496d80 node::GetEnvironmentIsolateData(node::Environment*) + 9322772
20  libsystem_pthread.dylib       	       0x1afa3226c _pthread_start + 148
21  libsystem_pthread.dylib       	       0x1afa2d08c thread_start + 8

https://github.com/vector-im/element-web/files/8650214/Element.Nightly.Helper.Renderer.-2022-05-04-043838.ips.txt

These json format dumps are really annoying. It contains "faultingThread" : 19 which isn't a thread ID - I think its an index into the threads list. They also aren't properly formed JSON. But, some python later

{'frames': [{'imageIndex': 3,
             'imageOffset': 42005808,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 8767684},
            {'imageIndex': 3,
             'imageOffset': 42005280,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 8767156},
            {'imageIndex': 3,
             'imageOffset': 60872656,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 27634532},
            {'imageIndex': 3,
             'imageOffset': 13763320,
             'symbol': 'v8::internal::FatalProcessOutOfMemory(v8::internal::Isolate*, '
                       'char const*)',
             'symbolLocation': 724},
            {'imageIndex': 3,
             'imageOffset': 13763112,
             'symbol': 'v8::internal::FatalProcessOutOfMemory(v8::internal::Isolate*, '
                       'char const*)',
             'symbolLocation': 516},
            {'imageIndex': 3,
             'imageOffset': 15541168,
             'symbol': 'v8::internal::Heap::StartIncrementalMarking(int, '
                       'v8::internal::GarbageCollectionReason, '
                       'v8::GCCallbackFlags)',
             'symbolLocation': 1108},
            {'imageIndex': 3,
             'imageOffset': 16367628,
             'symbol': 'v8::internal::GlobalSafepointScope::~GlobalSafepointScope()',
             'symbolLocation': 241132},
            {'imageIndex': 3,
             'imageOffset': 16184732,
             'symbol': 'v8::internal::GlobalSafepointScope::~GlobalSafepointScope()',
             'symbolLocation': 58236},
            {'imageIndex': 3,
             'imageOffset': 16128432,
             'symbol': 'v8::internal::GlobalSafepointScope::~GlobalSafepointScope()',
             'symbolLocation': 1936},
            {'imageIndex': 3,
             'imageOffset': 16127796,
             'symbol': 'v8::internal::GlobalSafepointScope::~GlobalSafepointScope()',
             'symbolLocation': 1300},
            {'imageIndex': 3,
             'imageOffset': 60775440,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 27537316},
            {'imageIndex': 3,
             'imageOffset': 60770400,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 27532276},
            {'imageIndex': 3,
             'imageOffset': 42241136,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9003012},
            {'imageIndex': 3,
             'imageOffset': 42376056,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9137932},
            {'imageIndex': 3,
             'imageOffset': 42373568,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9135444},
            {'imageIndex': 3,
             'imageOffset': 42559652,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9321528},
            {'imageIndex': 3,
             'imageOffset': 42373028,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9134904},
            {'imageIndex': 3,
             'imageOffset': 42432288,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9194164},
            {'imageIndex': 3,
             'imageOffset': 42431332,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9193208},
            {'imageIndex': 3,
             'imageOffset': 42560896,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9322772},
            {'imageIndex': 6,
             'imageOffset': 29292,
             'symbol': '_pthread_start',
             'symbolLocation': 148},
            {'imageIndex': 6,
             'imageOffset': 8332,
             'symbol': 'thread_start',
             'symbolLocation': 8}],
 'id': 1121028,
 'name': 'ThreadPoolForegroundWorker',

https://github.com/vector-im/element-web/files/8650215/Element.Nightly.Helper.Renderer.-2022-05-04-043838.ips.txt

also "faultingThread" : 19,

{'frames': [{'imageIndex': 3,
             'imageOffset': 42009108,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 8767756},
            {'imageIndex': 3,
             'imageOffset': 42008580,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 8767228},
            {'imageIndex': 3,
             'imageOffset': 60875828,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 27634476},
            {'imageIndex': 3,
             'imageOffset': 13763256,
             'symbol': 'v8::internal::FatalProcessOutOfMemory(v8::internal::Isolate*, '
                       'char const*)',
             'symbolLocation': 724},
            {'imageIndex': 3,
             'imageOffset': 13763048,
             'symbol': 'v8::internal::FatalProcessOutOfMemory(v8::internal::Isolate*, '
                       'char const*)',
             'symbolLocation': 516},
            {'imageIndex': 3,
             'imageOffset': 15541104,
             'symbol': 'v8::internal::Heap::StartIncrementalMarking(int, '
                       'v8::internal::GarbageCollectionReason, '
                       'v8::GCCallbackFlags)',
             'symbolLocation': 1108},
            {'imageIndex': 3,
             'imageOffset': 16367564,
             'symbol': 'v8::internal::GlobalSafepointScope::~GlobalSafepointScope()',
             'symbolLocation': 241132},
            {'imageIndex': 3,
             'imageOffset': 16315740,
             'symbol': 'v8::internal::GlobalSafepointScope::~GlobalSafepointScope()',
             'symbolLocation': 189308},
            {'imageIndex': 3,
             'imageOffset': 16128308,
             'symbol': 'v8::internal::GlobalSafepointScope::~GlobalSafepointScope()',
             'symbolLocation': 1876},
            {'imageIndex': 3,
             'imageOffset': 16127732,
             'symbol': 'v8::internal::GlobalSafepointScope::~GlobalSafepointScope()',
             'symbolLocation': 1300},
            {'imageIndex': 3,
             'imageOffset': 60778612,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 27537260},
            {'imageIndex': 3,
             'imageOffset': 60773572,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 27532220},
            {'imageIndex': 3,
             'imageOffset': 42244436,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9003084},
            {'imageIndex': 3,
             'imageOffset': 42379356,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9138004},
            {'imageIndex': 3,
             'imageOffset': 42376868,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9135516},
            {'imageIndex': 3,
             'imageOffset': 42562952,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9321600},
            {'imageIndex': 3,
             'imageOffset': 42376328,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9134976},
            {'imageIndex': 3,
             'imageOffset': 42435588,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9194236},
            {'imageIndex': 3,
             'imageOffset': 42434632,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9193280},
            {'imageIndex': 3,
             'imageOffset': 42564196,
             'symbol': 'node::GetEnvironmentIsolateData(node::Environment*)',
             'symbolLocation': 9322844},
            {'imageIndex': 6,
             'imageOffset': 29292,
             'symbol': '_pthread_start',
             'symbolLocation': 148},
            {'imageIndex': 6,
             'imageOffset': 8332,
             'symbol': 'thread_start',
             'symbolLocation': 8}],
 'id': 57193,
 'name': 'ThreadPoolForegroundWorker',

@novocaine
Copy link
Contributor

novocaine commented May 18, 2022

It appears v8 shares GC work between threads, so if the OOM is occurring during GC, the crashing thread might not be the thread making the large number of allocations that lead to memory exhaustion.

Did I misunderstand what you meant by "exploding thread" and you actually meant "the thread with the absurdly deep number of frames" rather than the crashing thread? I guess that would indicate excessive stack usage but not necessarily which one is allocating too much heap memory, although if its infinitely recursing, maybe fair enough

@novocaine
Copy link
Contributor

novocaine commented May 18, 2022

Also, looking again at thread 16 - V8 is a JIT compiler, not an interpreter - so I think the unannotated frames 54 - 61 might correspond to the actual JS frames, having been compiled to machine code, and the later frames correspond to execution of v8 syscalls from that JS. Trying to understand exactly what's on the stack for the JS frames might be impossible - although maybe there's some consistently placed debugging info like the name of the JS function at a consistent offset?

For calls like MakeAccessor or DeletePropertyOrElement you can cross-reference with source like https://chromium.googlesource.com/v8/v8/+/roll/src/accessors.cc#21 to inspect args and locals to maybe understand exactly the names of properties being accessed.

ara4n referenced this issue in matrix-org/matrix-js-sdk Aug 7, 2022
…flight

this should hopefully reduce chances of
https://github.com/vector-im/element-web/issues/21541 a bit more
as we were incorrectly loading the sync accumulator even
if a sync persist was already in flight, thus wasting RAM
and increasing the chance of the renderer process OOMing
@ara4n
Copy link
Member Author

ara4n commented Aug 9, 2022

matrix-org/matrix-js-sdk#2569 may help solve this; now it's merged let's see...

@ara4n
Copy link
Member Author

ara4n commented Aug 13, 2022

nope, still happening.

@Constaline
Copy link

how about using interval to check whether the content of the window has become "blank" ?

jerrod-lankford/google-voice-desktop-app#95

@ara4n
Copy link
Member Author

ara4n commented Jan 18, 2023

this has gone back to happening every night for the last week or so.

how about using interval to check whether the content of the window has become "blank" ?

The reason the window goes blank is because the browser process that powers electron runs out of memory and crashes, so I don't think that an interval check is going to help us. We need to find the v8/native memory leak and fix it, alas (which in our case might be a matter of finally switching to Sliding Sync, to avoid having so much data flying around)

@turt2live
Copy link
Member

fwiw, last week I was seeing crashes ~daily, and this week it's every 5-6 hours regardless of usage. All cases look like OOMs (white screen when I get back to it).

My machine doesn't sleep, but the process is still crashed by the time I get to it in the morning, then crashes twice more through the day.

@NeonMinnen
Copy link

Just chiming in. Had this issue with the most recent Nightly build. It was only running for 30 minutes prior to crash. This is Nightly desktop edition, running on Arch Linux.

This is definitely not a new issue by any means. I was having this issue 6 months ago with the Flatpak version of Element as well.

@t3chguy
Copy link
Member

t3chguy commented Mar 24, 2023

This is Nightly desktop edition, running on Arch Linux.

This is definitely not a new issue by any means. I was having this issue 6 months ago with the Flatpak version of Element as well.

Both the Arch package and Flatpak are community maintained and have additional crashes specific to them

@NeonMinnen
Copy link

This is Nightly desktop edition, running on Arch Linux.

This is definitely not a new issue by any means. I was having this issue 6 months ago with the Flatpak version of Element as well.

Both the Arch package and Flatpak are community maintained and have additional crashes specific to them

Which ones? I'm running Nightly pulled directly from https://packages.element.io/debian/pool/main/e/element-nightly/element-nightly_2023032401_amd64.deb

@t3chguy
Copy link
Member

t3chguy commented Mar 24, 2023

@NeonMinnen if you're running a deb on Arch then that's not exactly supported. The dependencies won't match exactly etc.

@the-moog
Copy link

I updated element-hq/element-web#25120 (though 'closed') with more information. I can also add that if your kernel is configured for OOM-KILLER (I.e. there is a /proc//oom-adjust and /proc//oom-score then the reason you get a white page is the kernel has killed it off.
I updated #25120 with snippet of top showing the huge (multiple TB) pre-allocation of VM.

@ara4n
Copy link
Member Author

ara4n commented May 2, 2023

I did another deep dive on this today. My theory is that OOMs happen most often when laptops are asleep (but not plugged in), meaning that v8 runs very infrequently and something stacks up and runs out of RAM (perhaps serialising the sync accumulator for postmessage, or handing it to indexeddb, or handing it to sqlite). Logging heap usage in the console with the ugly hack of:

while (true) {
    await new Promise(r=>setTimeout(r, 1000));
    s = performance.memory.usedJSHeapSize;
    console.warn(s);
    if (s > 1.5 * 1024 * 1024 * 1024)
        debugger ;
}

produces a trace like this when it goes bang:

image

The spike at 14:26 catches a persist in the act, so if the same thing happened at 16:00 but starting from 1.3GB (modulo quantising) then it might be close enough to the 1.5GB limit where it explodes.

So, in terms of solutions:
1 We could apply a heuristic to stop persisting the sync accumulator when the app is running slowly (e.g. due to the laptop being asleep) - measure the passage of time and if sleep(10000) keeps taking significantly longer than 10s to run, then we flag ourselves as in slowmo and stop trying to persist the accumulator.
2. Or we could try to lock the sync accumulator persistence more intelligently to stop it from stacking up.
3. Or we could figure out why the base heap usage is so massive - it's meant to idle at 300-500MB on a clean launch, not 1.2GB! This is probably the actual core problem, rather than the fact that sync persistence spikes RAM when the computer's asleep.
4. Or we could kill the sync accumulator by moving to sliding sync.

I ended up doing option 3 at element-hq/element-web#25264. The workaround is that if your Element Desktop starts OOMing lots; log out and log in again (being sure to save your encryption keys).

Meanwhile option 4 is the preferred solution.

@t3chguy
Copy link
Member

t3chguy commented May 5, 2023

facebook/react#18116 looks like the upstream issue about React holding onto detached DOM if key changes. Upgrading to React 18 may alleviate this as it allows for more aggressive GC.

Found an event emitter handler leak in ThreadView

@turt2live
Copy link
Member

fwiw, I found this was happening on a timescale of minutes (making OOM fairly unlikely). Letting Element run normally, then having the machine go to sleep/turn off on its own, then a couple minutes later unsleeping the laptop only to be met with a blank screen in Element.

It was not 100% reproducible, but for the 2 hours I had to force kill and restart Element every 10 minutes it was a tad annoying :p

@turt2live
Copy link
Member

crashes are now happening every 4 hours for me, ftr. Curiously, it's only happening while I'm working - rarely on weekends/evenings/overnight. Since nearly all of our internal rooms are encrypted, I'm curious if it's related to that at all?

I unfortunately can't get a memory profile in any reasonable sense, as memory profiling takes enough memory to crash the process :|

@turt2live
Copy link
Member

This is getting worse: crashes are now happening every 2-3 hours when I'm trying to send messages in encrypted rooms. If I don't use the client for anything, it's fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants