Startup is extremely slow (or hangs) #1842

Open
sesse opened this Issue Nov 30, 2017 · 57 comments

Comments

9 participants

sesse commented Nov 30, 2017

  • I have searched open and closed issues for duplicates

I see there are several near-duplicates, but I'm unsure if there's anything exactly like this, so I'm reporting with the risk of a duplicate :-)

After upgrading from the Chrome extension to the “standalone” client (which is still based on Chrome), startup has become unbearably slow. Signal shows “Loading…” and uses 100% CPU for ages. After maybe 5 minutes, it says “10 so far…”, then after a few more minutes “20 so far” and so on. Recently, it seems to simply never load.

Startup in the Chrome extension was slow, but it was more like 30 seconds, and I haven't had a whole lot of new messages.

Platform info

Operating System: Debian GNU/Linux sid/unstable
Signal Version: 1.0.39

Link to debug log

Here's what happens on the console after I start Signal and allow it to run for a while:

klump:~> signal-desktop
Gtk-Message: Failed to load module "atk-bridge"
reading package.json
configuring
NODE_ENV production
NODE_CONFIG_DIR /opt/Signal/resources/app.asar/config
NODE_CONFIG {}
ALLOW_CONFIG_MUTATIONS undefined
HOSTNAME undefined
NODE_APP_INSTANCE undefined
SUPPRESS_NO_CONFIG_WARNING undefined
setting AUMID
making app single instance
userData: /home/sesse/.config/Signal
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"app ready","time":"2017-11-30T22:35:31.817Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":50,"msg":"Problem loading messages for locale nb Error: ENOENT, _locales/nb/messages.json not found in /opt/Signal/resources/app.asar\n    at notFoundError (ELECTRON_ASAR.js:115:19)\n    at Object.fs.readFileSync (ELECTRON_ASAR.js:514:9)\n    at getLocaleMessages (/opt/Signal/resources/app.asar/app/locale.js:27:24)\n    at load (/opt/Signal/resources/app.asar/app/locale.js:47:16)\n    at App.<anonymous> (/opt/Signal/resources/app.asar/main.js:274:14)\n    at emitTwo (events.js:111:20)\n    at App.emit (events.js:191:7)","time":"2017-11-30T22:35:31.819Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":50,"msg":"Falling back to en locale","time":"2017-11-30T22:35:31.819Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Initializing BrowserWindow config: {\"width\":1918,\"height\":990,\"minWidth\":700,\"minHeight\":360,\"autoHideMenuBar\":false,\"webPreferences\":{\"nodeIntegration\":false,\"preload\":\"/opt/Signal/resources/app.asar/preload.js\"},\"maximized\":false,\"x\":1,\"y\":63}","time":"2017-11-30T22:35:31.819Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Detected Linux. Setting up spell check with locale nb_NO and dictionary location /usr/share/hunspell","time":"2017-11-30T22:35:32.243Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"pre-main prep time: 3 ms","time":"2017-11-30T22:35:32.647Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Build expires:  2018-02-20T01:28:40.000Z","time":"2017-11-30T22:35:32.675Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"background page reloaded","time":"2017-11-30T22:35:32.700Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"environment: production","time":"2017-11-30T22:35:32.700Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"ConversationController: starting initial fetch","time":"2017-11-30T22:35:32.921Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"ConversationController: done with initial fetch","time":"2017-11-30T22:35:32.985Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"listening for registration events","time":"2017-11-30T22:35:32.988Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Next signed key rotation scheduled for \"2017-12-01T15:11:42.107Z\"","time":"2017-11-30T22:35:32.990Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"connect","time":"2017-11-30T22:35:32.992Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"getAllFromCache","time":"2017-11-30T22:35:32.992Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"opening message socket https://textsecure-service.whispersystems.org","time":"2017-11-30T22:35:32.993Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"open inbox","time":"2017-11-30T22:35:33.015Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"websocket open","time":"2017-11-30T22:35:34.496Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"got request PUT /api/v1/queue/empty","time":"2017-11-30T22:35:34.518Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"getAllFromCache loaded 1 saved envelopes","time":"2017-11-30T22:35:35.748Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Sending a keepalive message","time":"2017-11-30T22:36:29.532Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"queueing decrypted envelope +[REDACTED]189.1 1511217578791","time":"2017-11-30T22:36:54.029Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"read messages from +[REDACTED]189.1","time":"2017-11-30T22:36:54.045Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"read sync +[REDACTED]887 1511215701301","time":"2017-11-30T22:36:54.046Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"No message for read sync +[REDACTED]887 1511215701301","time":"2017-11-30T22:36:54.050Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Sending a keepalive message","time":"2017-11-30T22:37:25.212Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Sending a keepalive message","time":"2017-11-30T22:38:20.420Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Sending a keepalive message","time":"2017-11-30T22:39:15.650Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Sending a keepalive message","time":"2017-11-30T22:40:10.798Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Sending a keepalive message","time":"2017-11-30T22:41:05.945Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Sending a keepalive message","time":"2017-11-30T22:42:01.117Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Sending a keepalive message","time":"2017-11-30T22:42:56.366Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Sending a keepalive message","time":"2017-11-30T22:43:51.513Z","v":0}
{"name":"log","hostname":"klump","pid":9258,"level":30,"msg":"Sending a keepalive message","time":"2017-11-30T22:44:46.765Z","v":0}

At this point, it's used 100% CPU for more than ten minutes, and is still in “Loading…”.

sesse commented Nov 30, 2017

I let it run for half an hour or so. From looking at the developer tools, it appears that it's not actually doing anything; all the CPU is going into repainting the loading bar… which also explains why the entire system is sluggish even if I nice Signal to idle priority. (It's driving the compositor.)

Screenshot:
hung

Member

scottnonnenberg commented Dec 1, 2017

It sounds like you did an export from the Chrome extension to Standalone. How big is your ~/.config/Signal directory? Also, can you provide your full log so far by going to View -> Debug Log in the main menu?
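
For anyone checking, a quick way to get that number on Linux (a sketch assuming the default profile location under ~/.config):

du -sh ~/.config/Signal                            # total size of the profile
du -h --max-depth=1 ~/.config/Signal | sort -h     # per-subdirectory breakdown, to see where the bulk is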

mohonk commented Dec 1, 2017

I see similar behavior, and I did an export from the Chrome extension, which was about 1 GB.
The ~/.config/Signal directory is 1.1 GB.
After I renamed the Chrome extension export and restarted Signal Desktop, the startup went smoothly and I can actually use Signal Desktop now :)

sesse commented Dec 1, 2017

Yes, I did an export from Chrome. ~/.config/Signal is 2.2 GB, although I'm not sure why it would be so big. :-)

The debug log contained basically what I already pasted, but I started Signal anew and made it post one (URL below). As you will see, it ran from 22:35 to 23:15 before I gave up (and the whole time it was stuck in “Loading”), so my assumption is that it has really gotten stuck somewhere, not just slowed down.

However, interestingly, today it got further. After I posted a debug log, I let it sit for a few more minutes, and after four minutes it got to “Loading messages. 50 so far…” and then appeared to hang again. I gave it a few more minutes to make sure it didn't progress further, then posted another debug log. Here it is: https://gist.github.com/anonymous/199256d56064f6fc392797f694889883

sesse commented Dec 1, 2017

I started with LC_ALL=C signal-desktop. It booted up in maybe five seconds. Closed, booted again, still works. Restarted with my normal locale (nb_NO.UTF-8) and now it boots really fast again. So something got unstuck; I don't know if it had anything to do with the locale or not.

New debug log: https://gist.github.com/anonymous/c1c07b134d59dd802192682470ce36a2

Member

scottnonnenberg commented Dec 1, 2017

@sesse Is it true that whenever it hangs like that it uses 100% CPU? Was it also using a lot of memory?

sesse commented Dec 1, 2017

Yes, it uses 100% CPU, although I wonder if maybe it's using all of it (and the GPU) to show the loading animation, not for useful work.

I don't think it was using a lot of memory—I would have noticed.

Member

scottnonnenberg commented Dec 1, 2017

My suspicion is that it's some sort of database cleanup thing going on. Did you happen to attempt to import more than once?

sesse commented Dec 1, 2017

I don't honestly remember (it's a month ago now). It may have failed the first time?

Member

scottnonnenberg commented Dec 1, 2017

It sounds like it has happened a couple times now. Is there any pattern to it?

Can you also describe your general usage of the app? How many sent/received messages per day? How many images or other attachments?

sesse commented Dec 1, 2017

I don't know of any pattern, no. Usage is fairly light. Think 10–20 messages a day, maybe one or two images per day. There's the odd video or audio file, but those are rare. There are two groups that seem to be fairly large by Signal standards (~20 people).

Occasionally I get “Error handling incoming message” on incoming group messages. I doubt it's related.

Member

scottnonnenberg commented Dec 1, 2017

Okay, to describe my thinking on this a little more: the database we use doesn't delete data immediately. It comes back later to clean up. With a large amount of data stored, that cleanup can take some time.

I think that's what is happening in your case, because the actual JavaScript code being run was minimal - you can see it in your trace. I expect that our requests to the database just didn't return. We do have some 2-minute timeouts in place in the app for just this kind of thing, so I'm disappointed that you were on the loading screen forever.

Anyway, you have two choices.

  1. Tough it out and continue to use that installation as-is. You can perhaps help us track this behavior down by poking and prodding the app over time.
  2. Start over from scratch. Close the app and erase ~/.config/Signal (see the sketch below). Start up and re-link.

I suppose there might be another option too, deleting some of the conversations you don't really care about, to see if they eventually reduce the size of your database. Up to you!
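
If anyone takes option 2, a rough sketch (renaming rather than erasing, so the old profile can be restored if starting over doesn't help; the path is the default Linux location):

# with Signal closed, set the old profile aside rather than deleting it
mv ~/.config/Signal ~/.config/Signal.old
# start fresh and re-link the device; remove Signal.old once everything works
signal-desktop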

sesse commented Dec 1, 2017

This is LevelDB, right? :-) I know fairly well how it works (as in, the basic ideas); I was involved in open-sourcing parts of it.

If you think it's spending its time in native code, I can run perf on it next time it's slow to start up? I guess the theory is that LevelDB needs more than two minutes to return, Signal times out the database call and doesn't manage to recover?
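
A rough sketch of how such a profile could be captured, assuming the busy work shows up in one of the signal-desktop processes (PID picked by hand, e.g. from top):

top -p "$(pgrep -d, -f signal-desktop)"    # see which Signal process is burning CPU
perf record -g -p <pid> -- sleep 30        # sample that process (all threads) with call graphs for 30 seconds
perf report                                # inspect where the time went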

Member

scottnonnenberg commented Dec 1, 2017

Yup, IndexedDB is built on top of LevelDB in Chrome. Any analysis you can do there would be really, really helpful! :0)

Looking at your log from 22:35 to 23:15, it's interesting: everything appears to be happening normally, except there's no 'empty' event fired, which is what would get rid of that loading screen. And we should see one, since there were no messages from the websocket and only one message that we retry.

If you could continue providing logs, telling us which sessions started up properly and which didn't, that would help us ensure that it isn't something in our application startup. My best guess at this point is that we're doing something on startup which is really sensitive to high CPU usage, maybe a number of defers to the event loop due to all of our Promise-related operations, and the event loop is running very slowly against all that CPU usage spent on LevelDB cleanup.

sesse commented Jan 6, 2018

I had another relatively slow start; only 30 seconds or so, but enough that I got a few seconds of profile. It's indeed IndexedDB that's taking time:

  17,29%  IndexedDB        [kernel.kallsyms]            [k] copy_user_enhanced_fast_string                       ▒
  14,50%  IndexedDB        signal-desktop               [.] 0x0000000001c6b163                                   ▒
   7,02%  IndexedDB        [kernel.kallsyms]            [k] clear_page_erms                                      ▒
   2,91%  IndexedDB        signal-desktop               [.] 0x0000000001c6b177                                   ▒
   2,15%  IndexedDB        [kernel.kallsyms]            [k] native_irq_return_iret                               ▒
   2,07%  IndexedDB        [unknown]                    [k] 0x00007f35e74a96bf                                   ▒
   2,00%  IndexedDB        [kernel.kallsyms]            [k] __handle_mm_fault                                    ▒
   1,61%  IndexedDB        libc-2.26.so                 [.] __memmove_avx_unaligned_erms                         ▒
   1,47%  IndexedDB        [kernel.kallsyms]            [k] _raw_spin_lock                                       ▒
   1,29%  IndexedDB        [kernel.kallsyms]            [k] unmap_page_range                                     ▒
   1,16%  IndexedDB        [kernel.kallsyms]            [k] find_get_entry                                       ▒
   1,14%  IndexedDB        [kernel.kallsyms]            [k] get_page_from_freelist                               ▒
   1,05%  IndexedDB        [kernel.kallsyms]            [k] __rmqueue                                            ▒
   0,99%  IndexedDB        [kernel.kallsyms]            [k] release_pages                                        ▒
   0,94%  IndexedDB        [kernel.kallsyms]            [k] free_pcppages_bulk                                   ▒
   0,66%  IndexedDB        [kernel.kallsyms]            [k] __pagevec_lru_add_fn                                 ▒
   0,57%  IndexedDB        [kernel.kallsyms]            [k] __do_page_fault                                      ▒
   0,56%  IndexedDB        [kernel.kallsyms]            [k] error_entry                                          ▒
   0,54%  IndexedDB        [kernel.kallsyms]            [k] free_hot_cold_page                                   ▒
   0,49%  IndexedDB        [kernel.kallsyms]            [k] generic_file_read_iter                               ▒
   0,48%  IndexedDB        [kernel.kallsyms]            [k] __radix_tree_lookup                                  ▒
   0,47%  IndexedDB        [kernel.kallsyms]            [k] mem_cgroup_try_charge                                ▒
   0,38%  IndexedDB        [kernel.kallsyms]            [k] handle_mm_fault                                      ▒
   0,35%  IndexedDB        [kernel.kallsyms]            [k] __alloc_pages_nodemask                               ▒
   0,34%  IndexedDB        [kernel.kallsyms]            [k] swapgs_restore_regs_and_return_to_usermode           ▒
   0,32%  IndexedDB        [kernel.kallsyms]            [k] __mod_zone_page_state                                ▒
   0,30%  IndexedDB        [kernel.kallsyms]            [k] __mod_node_page_state                                ▒
   0,29%  IndexedDB        [kernel.kallsyms]            [k] pagevec_lru_move_fn                                  ▒
   0,28%  IndexedDB        [kernel.kallsyms]            [k] alloc_pages_vma                                      ▒
   0,27%  IndexedDB        [kernel.kallsyms]            [k] free_pages_and_swap_cache                            ▒
   0,26%  IndexedDB        libc-2.26.so                 [.] __GI_wcrtomb          

It's interesting that a significant fraction of the time is being spent in the kernel, not in userspace. (This is a KPTI-enabled kernel, so syscalls are a bit slower than they'd be a few weeks ago, but still, it shouldn't be that bad.)
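
(For anyone comparing numbers: whether KPTI is active can be checked roughly like this; the sysfs entry only exists on kernels new enough to report mitigations.)

cat /sys/devices/system/cpu/vulnerabilities/meltdown   # newer kernels: e.g. "Mitigation: PTI"
dmesg | grep -i 'page table'                           # otherwise, look for the boot-time isolation message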

MartinX3 commented Jan 6, 2018

10 seconds on my side with a KPTI-disabled kernel.
But I only have 4 conversations with a limit of 500 messages each, and I didn't use the Chrome plugin before.
I started fresh with the standalone app and a first sync that didn't work.

Signal 1.1.0
Linux Ubuntu Mate 17.10 x64
Kernel 4.13.0-21-generic

Member

scottnonnenberg commented Jan 8, 2018

@MartinX3 We really need logs, especially in situations where it "doesn't work." And we need a lot more information about what "not working" really means. What exactly do you see?

MartinX3 commented Jan 8, 2018

@scottnonnenberg
I would like to provide a log like @sesse did,
but I don't know how to profile the IndexedDB process on Linux.

sesse commented Jan 8, 2018

After some stracing, I wonder if maybe the IndexedDB thread spends most of its CPU time talking to some other Signal process… I.e., the problem isn't the database itself, but the multi-process model causing huge overhead per query.

This is just a theory at this stage, though.

MartinX3 commented Jan 8, 2018

So the SignalDB isn't loaded by a single thread?
That's sad in this use case.

Member

scottnonnenberg commented Jan 8, 2018

@sesse We do encryption via a service worker so we don't block the main thread. And it does go to the database for some things. You're saying that we might see some benefits by ensuring that our DB access is from one thread only? You've seen this in other LevelDB implementations?

sesse commented Jan 8, 2018

When I say process, I really mean process—not thread. There are certainly two processes here (well, actually three), communicating by a UNIX domain socket. I don't know what they are talking about :-)

I don't know how service workers map to threads or processes. This is one of the problems of using JS for desktop applications; you're very far abstracted from the metal, so performance debugging can become very difficult.

Member

scottnonnenberg commented Jan 8, 2018

@sesse The way Electron works, built on top of Chromium, is that there are usually three processes. The overall management process, then a renderer process, then usually what appears to be a Chromium helper process - I'm not sure what goes on in that one; maybe visual compositing or something?

Anyway, you're saying that IndexedDB performance is slowing down because the process interacting with IndexedDB is frequently being interrupted for inter-process communication?

sesse commented Jan 8, 2018

I'm not saying I have the answer by any means (I am definitely not an expert on Chrome internals!), but my theory is that IndexedDB lives in the helper process, so it sits behind (slow) inter-process communication.

janLo commented Jan 16, 2018

I have the same problem on all my computers. The ones with an SSD are a bit slower but it blocks generally between 2 and 10 minutes on startup.

Member

scottnonnenberg commented Jan 16, 2018

@janLo Please provide a bit more detail - do you see a 'messages so far' number counting up? How fast? How many messages with attachments do you generally receive? (we're downloading them on startup)

This issue is generally about situations where the number doesn't count up at all, or does so extremely slowly. Either way, a log from your large download would be useful. View -> Debug Log.

janLo commented Jan 17, 2018

I see the "counting up" - depending on how many messages the machine had received in the past, I've waited for numbers between 300 and 1500. I receive messages with attachments occasionally, but not that often, maybe 1-5 per day. It also seems that these are all the "old" messages, as the new ones that arrived while the app was closed come in once the UI is started.

The log is filled with

INFO  2018-01-16T22:58:15.334Z queueing envelope +[REDACTED]084.12 1515975123664

entonjackson commented Jan 18, 2018

Same here. Takes forever to load conversation. 3 different devices. 3 different internet qualities. Makes no difference.

Member

scottnonnenberg commented Jan 18, 2018

@entonjackson Thanks for weighing in. Please provide logs including times you experience slow startup and slow conversation load, as well as details about your computer. It will really help us figure out what we can do to improve things.

janLo commented Jan 18, 2018

I'd like to mention that this has only happened since I started using the "standalone" version. It never happened with the Chrome app.

Member

scottnonnenberg commented Jan 18, 2018

@janLo The Chrome app had the same 'counting up' screen. Are you saying that you never saw it, or are you saying that it never took this long? What OS are you on, by the way? Logs would be useful as well.

janLo commented Jan 18, 2018

It was never long enough for me to notice.

DracoLlasa commented Jan 19, 2018

I just wanted to chime in to see if I can get confirmation on whether this is a confirmed issue or not.
I easily send 100+ messages a day using the desktop app. As others have noted, after a couple hundred to a thousand or two messages, the app gets slow to start, specifically to load a contact window that has a lot of history. Worse, once I'm at the point where I have many thousands of messages with a specific contact, the client will actually take a few seconds to send: I will type a message, press Enter, and it will sit a bit, during which time I could still change the message (which was nice sometimes, but I saw that might have been fixed).

If this is a confirmed issue and my info can be of any assistance, please let me know.

Member

scottnonnenberg commented Jan 19, 2018

@DracoLlasa Your logs would be useful, since they will show initial startup time as well as conversation load time. Please don't hesitate to include them in the future. :0)

DracoLlasa commented Jan 19, 2018

@scottnonnenberg OK, no problem. This is the first time I have contributed here; would the log you want be from the View -> Debug Log menu option? If not, let me know from where.

If it is that debug log, should I use the 'submit' button there, or just pull the relevant logs from a startup and post them here or in an attached txt file?

Member

scottnonnenberg commented Jan 19, 2018

@DracoLlasa View -> Debug Log, then upload that log, then take the URL of that upload and paste it here.

sesse commented Jan 22, 2018

I had another fairly slow startup today. This time, I captured a perf with -g, which told me this:

-   13,70%     0,00%  IndexedDB        libpthread-2.26.so                       [.] __GI___libc_pread64                      ▒
   - __GI___libc_pread64                                                                                                     ▒
      - 6,92% entry_SYSCALL_64_fastpath                                                                                      ▒
         - sys_pread64                                                                                                       ▒
            - 6,90% vfs_read                                                                                                 ▒
               - 6,90% __vfs_read                                                                                            ▒
                  - 6,89% generic_file_read_iter                                                                             ▒
                     - 6,31% copy_page_to_iter                                                                               ▒
                        - copyout                                                                                            ▒
                           - 6,26% copy_user_enhanced_fast_string                                                            ▒
                              - 4,50% page_fault                                                                             ▒
                                 - 4,44% __do_page_fault                                                                     ▒
                                    - 4,27% handle_mm_fault                                                                  ▒
                                       - 4,21% __handle_mm_fault                                                             ▒
                                          + 2,68% alloc_pages_vma                                                            ▒
                                            0,65% _raw_spin_lock                                                             ▒
        5,15% copy_user_enhanced_fast_string                                                                                 ▒
      + 0,75% return_from_SYSCALL_64              

It seems that IndexedDB opens every single shard I have and wants to read a bit from it (probably the bloom filter or something similar). Since I have 647 shards, this takes a fair amount of time…

I think a relevant question for the other reporters in this thread is what this returns:

ls -l ~/.config/Signal/IndexedDB/file__0.indexeddb.leveldb/ |  wc -l

It would be interesting to see if there's a correlation between number of shards and startup slowness.

There's also seemingly VM pressure from just moving data around (I don't know for what; the stack unwinding somehow doesn't extend far enough):

-   13,58%     4,10%  IndexedDB        libc-2.26.so                             [.] __memmove_avx_unaligned_erms             ▒
   - 11,41% __memmove_avx_unaligned_erms                                                                                     ▒
      - 4,95% page_fault                                                                                                     ▒
         - 4,90% __do_page_fault                                                                                             ▒
            - 4,64% handle_mm_fault                                                                                          ▒
               - 4,52% __handle_mm_fault                                                                                     ▒
                  - 2,94% alloc_pages_vma                                                                                    ▒
                     - 2,91% __alloc_pages_nodemask                                                                          ▒
                        - get_page_from_freelist                                                                             ▒
                             2,27% clear_page_erms                                                                           ▒
                    0,57% _raw_spin_lock                                                                                     ▒
        2,93% __memmove_avx_unaligned_erms                                                                                   ▒
        0,70% error_entry 

MartinX3 commented Jan 22, 2018

Would a solution for this problem speed up the sync process with the app?

DracoLlasa commented Jan 24, 2018

@scottnonnenberg Thanks, I was checking it out yesterday and think I sorted out what to select to send. I was going to do it tonight after work, but I got the new version and I can tell it's much quicker to load. I will keep an eye on it and not do anything to intentionally clean up old messages and such, like I did last time it got really bad.

janLo commented Jan 24, 2018

Not as slow as it used to be, but still slower than any version of the Chrome app: https://gist.github.com/e5b771b160d813dfb91a7bcbcdce17e7

Member

scottnonnenberg commented Jan 24, 2018

@janLo Apologies if I've mentioned this before, but one reason the Chrome App was 'faster' is because it was always running - it was running anytime you had your Chrome browser running. This meant that it had to download large amounts of messages less often, and when it did, it started downloading before you even opened the window.

sesse commented Jan 27, 2018

I did an iostat during a slow load:

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              6,78    4,32    250,61    144,32     0,36     5,80   5,02  57,34    1,24   61,79   0,28    36,96    33,45   4,18   4,63
sda             99,00    0,00   7044,00      0,00     0,00     0,00   0,00   0,00    0,36    0,00   0,04    71,15     0,00   0,28   2,80
sda             89,00   93,00  10444,00    620,00     0,00    11,00   0,00  10,58    0,40    0,00   0,04   117,35     6,67   0,20   3,60
sda            286,00    0,00  35160,00      0,00     0,00     0,00   0,00   0,00    0,43    0,00   0,12   122,94     0,00   0,39  11,20
sda             98,00    0,00  11864,00      0,00     0,00     0,00   0,00   0,00    0,45    0,00   0,04   121,06     0,00   0,33   3,20
sda            104,00    0,00  13092,00      0,00     0,00     0,00   0,00   0,00    0,35    0,00   0,04   125,88     0,00   0,31   3,20
sda            511,00    3,00  64692,00     76,00     0,00    17,00   0,00  85,00    0,47    8,00   0,26   126,60    25,33   0,45  23,20
sda            134,65    0,00  16760,40      0,00     0,00     0,00   0,00   0,00    0,50    0,00   0,06   124,47     0,00   0,41   5,54
sda             90,00    0,00  10832,00      0,00     0,00     0,00   0,00   0,00    0,40    0,00   0,04   120,36     0,00   0,36   3,20
sda             87,00    1,00  10348,00   1076,00     0,00     0,00   0,00   0,00    0,83    4,00   0,08   118,94  1076,00   0,68   6,00

How much stuff are you loading from the DB on startup, anyway? Either IndexedDB has a huge startup overhead, or your schema is designed so that you need to do too many queries.
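
One way to confirm those reads are actually coming from the Signal processes rather than something else on the box (a sketch using pidstat from sysstat, sampling once a second):

pidstat -d -p "$(pgrep -d, -f signal-desktop)" 1    # per-process kB read/written each second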

Member

scottnonnenberg commented Jan 29, 2018

@sesse I think the large amount of data saved and loaded has to do with our caching mechanism for not-yet-processed messages. Before we attempt to decrypt a message, we put it in the database. Then when we've successfully decrypted it, we update it in the database. And then finally when we've processed it fully, we create the real message in the database and delete the cached data.

This is a lot of creates, updates and deletes, especially when we're processing hundreds of messages because the queue has filled up since we were last running.

But it does ensure that we're resilient in the face of crashes, or even the user closing the app while we're loading messages.
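
Roughly, that lifecycle looks like the sketch below. This is illustrative only, not our actual code; the store names ('unprocessed', 'messages') and the decrypt/toMessageRecord helpers are made up for the example.

// Sketch of the unprocessed-message cache lifecycle described above.
// Store names and helper functions are hypothetical, not Signal's actual schema.

function putInStore(db, storeName, value) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction(storeName, 'readwrite');
    tx.objectStore(storeName).put(value);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

function deleteFromStore(db, storeName, key) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction(storeName, 'readwrite');
    tx.objectStore(storeName).delete(key);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}

async function handleIncoming(db, envelope) {
  // 1. Cache the raw envelope before we even try to decrypt it.
  await putInStore(db, 'unprocessed', { id: envelope.id, raw: envelope.raw });

  // 2. After a successful decrypt, update the cached entry.
  const decrypted = await decrypt(envelope); // hypothetical helper
  await putInStore(db, 'unprocessed', { id: envelope.id, decrypted });

  // 3. Once fully processed, write the real message and drop the cache entry.
  await putInStore(db, 'messages', toMessageRecord(decrypted)); // hypothetical helper
  await deleteFromStore(db, 'unprocessed', envelope.id);
}

In a sketch like this, each step is its own small readwrite transaction, which is why a backlog of a few hundred messages turns into a lot of churn in a short window.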

sesse commented Jan 29, 2018

FWIW, I doubt I ever start up Signal with hundreds of messages waiting in the queue, but I could be mistaken.

sesse commented Jan 29, 2018

Also, hundreds of messages or not, you are reading hundreds of megabytes of data on startup, as far as I can tell. It can't all be a few hundred kilobyte-sized messages being stored two or three times :-)

MartinX3 commented Jan 29, 2018

@sesse Maybe there are metadata entries for every single message, plus serialized photos and videos, in your chats?

sesse commented Jan 29, 2018

There are, but still, why should they need to be loaded on startup? I don't suppose Signal needs to open every single old message I have just to boot?

scottnonnenberg commented Jan 29, 2018

@sesse We do a few index-based queries on startup, like checking for expired disappearing messages, or looking for any cached messages which didn't finish processing. When we do database schema updates, we sometimes do loop through every single object from a certain store, but that's uncommon. When we process a message, we first determine whether it's a duplicate by checking our database, again using an index-based query.
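
For concreteness, those startup queries are all roughly of this shape (a sketch only; the 'messages' store and 'expires_at' index are stand-in names, not our real schema):

// Sketch of an index-based query, e.g. finding disappearing messages whose
// timer has expired. Store and index names are stand-ins, not the real schema.
function getExpiredMessages(db, now) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('messages', 'readonly');
    const index = tx.objectStore('messages').index('expires_at');
    const range = IDBKeyRange.upperBound(now); // everything expiring at or before `now`
    const request = index.getAll(range);
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}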

As an expert in IndexedDB, perhaps you can talk about how index scans are implemented? Surely we don't read that much data from disk when we do an index-based query?

sesse commented Jan 29, 2018

I'm afraid I'm nowhere near an IndexedDB expert. :-) I was involved in open-sourcing parts of LevelDB (in particular, the parts relating to compression), and I've also used its internal predecessor (SSTable) a fair amount. But IndexedDB is on top of that again, and that's where any secondary indexing sits—and I've never touched it.

scottnonnenberg commented Jan 29, 2018

@sesse Do you have the log to go along with that 'reading hundreds of megabytes' slow load?

sesse commented Jan 29, 2018

I think this is the one: https://gist.github.com/anonymous/d54f911f5e5083bf24eca6678b172ede . It looks as if it may have scrolled out, though :-/

In general, it appears LevelDB is performing very poorly on my system. There shouldn't be hundreds of these really small files; see e.g. this excerpt from the LevelDB log (the LOG file in the database directory):

2018/01/28-19:52:10.889 11030 Compacting 1@2 + 29@3 files
2018/01/28-19:52:10.917 11030 Generated table #18250@2: 8 keys, 711 bytes
2018/01/28-19:52:10.943 11030 Generated table #18251@2: 3 keys, 1092 bytes
2018/01/28-19:52:10.969 11030 Generated table #18252@2: 2 keys, 1096 bytes
2018/01/28-19:52:11.000 11030 Generated table #18253@2: 3 keys, 830423 bytes
2018/01/28-19:52:11.027 11030 Generated table #18254@2: 4 keys, 1627 bytes
2018/01/28-19:52:11.055 11030 Generated table #18255@2: 7 keys, 2812 bytes
2018/01/28-19:52:11.101 11030 Generated table #18256@2: 1 keys, 3740570 bytes
2018/01/28-19:52:11.127 11030 Generated table #18257@2: 5 keys, 1815 bytes
2018/01/28-19:52:11.154 11030 Generated table #18258@2: 2 keys, 1087 bytes
2018/01/28-19:52:11.180 11030 Generated table #18259@2: 2 keys, 1158 bytes
2018/01/28-19:52:11.206 11030 Generated table #18260@2: 2 keys, 949 bytes
2018/01/28-19:52:11.232 11030 Generated table #18261@2: 2 keys, 1102 bytes
2018/01/28-19:52:11.258 11030 Generated table #18262@2: 1 keys, 841 bytes
2018/01/28-19:52:11.285 11030 Generated table #18263@2: 2 keys, 916 bytes
2018/01/28-19:52:11.311 11030 Generated table #18264@2: 2 keys, 1308 bytes
2018/01/28-19:52:11.337 11030 Generated table #18265@2: 3 keys, 1331 bytes
2018/01/28-19:52:11.363 11030 Generated table #18266@2: 1 keys, 841 bytes
2018/01/28-19:52:11.390 11030 Generated table #18267@2: 1 keys, 657 bytes
2018/01/28-19:52:11.419 11030 Generated table #18268@2: 1 keys, 723 bytes
2018/01/28-19:52:11.459 11030 Generated table #18269@2: 2 keys, 2863460 bytes
2018/01/28-19:52:11.486 11030 Generated table #18270@2: 1 keys, 733 bytes
2018/01/28-19:52:11.512 11030 Generated table #18271@2: 2 keys, 1119 bytes
2018/01/28-19:52:11.540 11030 Generated table #18272@2: 1 keys, 733 bytes
2018/01/28-19:52:11.566 11030 Generated table #18273@2: 1 keys, 713 bytes
2018/01/28-19:52:11.592 11030 Generated table #18274@2: 1 keys, 783 bytes
2018/01/28-19:52:11.618 11030 Generated table #18275@2: 4 keys, 1906 bytes
2018/01/28-19:52:11.645 11030 Generated table #18276@2: 3 keys, 1314 bytes
2018/01/28-19:52:11.672 11030 Generated table #18277@2: 3 keys, 1655 bytes
2018/01/28-19:52:11.698 11030 Generated table #18278@2: 2 keys, 1339 bytes
2018/01/28-19:52:11.739 11030 Generated table #18279@2: 1 keys, 2804578 bytes
2018/01/28-19:52:11.739 11030 Compacted 1@2 + 29@3 files => 10269392 bytes
2018/01/28-19:52:11.766 11030 compacted to: files[ 0 0 24 261 400 0 0 ]

All these level-2 shards at a few kilobytes? That looks strange to me.

sesse commented Jan 29, 2018

As an expert in IndexedDB, perhaps you can talk about how index scans are implemented? Surely we don't read that much data from disk when we do an index-based query?

I took a look at the IndexedDB documentation, and it seems simple enough: for each entry in the index, there is a LevelDB entry. That entry gives you the primary key, which tells you where the real record is.

So if you're doing a scan along an index, you get nice, regular reads for the index itself, and then a bunch of essentially random lookups throughout the actual data. Index scans are certainly more costly than reading along the primary key (not surprising), although hundreds of megabytes still sounds a bit excessive. How many records are you expecting to read in the typical case?
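
To make that concrete, the two access patterns show up directly in the IndexedDB API: a key cursor over an index only walks the index entries, while a value cursor also has to fetch every record the entries point at. A rough sketch, with made-up store/index names:

// Walking an index two ways. openKeyCursor reads only the index entries
// (sequential), while openCursor also loads each referenced record (the
// random lookups). 'messages' and 'conversation' are made-up names.
function scanIndex(db) {
  const tx = db.transaction('messages', 'readonly');
  const index = tx.objectStore('messages').index('conversation');

  // Index-only scan: the primary key comes straight from the index entry.
  index.openKeyCursor().onsuccess = (event) => {
    const cursor = event.target.result;
    if (!cursor) return;
    console.log('index key:', cursor.key, '-> primary key:', cursor.primaryKey);
    cursor.continue();
  };

  // Full scan: each step also fetches the record the index entry points at.
  index.openCursor().onsuccess = (event) => {
    const cursor = event.target.result;
    if (!cursor) return;
    console.log('record:', cursor.value);
    cursor.continue();
  };
}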

scottnonnenberg commented Jan 30, 2018

@sesse Clearly we're not in the right ballpark with all this guessing. We don't do massive reads. What I suspect is that some sort of cleanup/compaction operation is going on because we've fragmented the database, or added or deleted enough data.

sesse commented Jan 30, 2018

I don't see massive compactions being done; if there were, there would be write activity, not just read activity.

If you can reproduce this, have a look at the LOG file LevelDB leaves in the database directory. It records all new shards, compactions, etc. as they happen (but not reads). In my case, it looks like the compaction scheme has gone awry (way too many level-2 and level-3 files, and any level-0 file that gets written pretty much immediately triggers a compaction), and you'd probably need some real LevelDB expertise to figure out why.
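
If it helps, a quick Node script like this one summarizes the compaction activity from the LOG; the path is only an example, adjust it to wherever your profile's IndexedDB LevelDB directory actually lives:

// Quick-and-dirty summary of LevelDB compaction activity from its LOG file.
// The path below is an example only; the directory name varies per profile.
const fs = require('fs');
const os = require('os');
const path = require('path');

const logPath = path.join(
  os.homedir(),
  '.config/Signal/IndexedDB/file__0.indexeddb.leveldb/LOG' // example path
);

const lines = fs.readFileSync(logPath, 'utf8').split('\n');
const compactions = lines.filter((l) => l.includes('Compacting')).length;
const tables = lines.filter((l) => l.includes('Generated table')).length;
const tinyTables = lines.filter((l) => {
  const m = l.match(/Generated table #\d+@\d+: \d+ keys, (\d+) bytes/);
  return m && Number(m[1]) < 4096; // "tiny" = under 4 KB, arbitrary threshold
}).length;

console.log({ compactions, tables, tinyTables });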

MartinX3 commented Feb 9, 2018

https://gist.github.com/1033d5b648075d17d830ec7224ec0060
Linux Ubuntu Mate x64 17.10

Here is a new log.
I restarted my PC and I can hear my HDD working hard.
There is a lot of HDD access slowing down the whole system while this app starts up.

Terenceskill commented Feb 9, 2018

Same problem here.
Using Linux Mint 18.3 x64 Mate.
All 4 CPU cores are at ~100%.
The HDD is under heavy load.

Here is my log:
https://gist.github.com/anonymous/315bc653b9243b06aeefa9920ba26b8f

konklone commented Feb 14, 2018

I'll chip in a set of logs, on a Dell XPS 13 running Debian Buster (testing):
https://gist.github.com/konklone/e67d649c95b32d1c1f9709e454c27dcd

I didn't record the CPU load or disk load, but it definitely felt like a 100% saturated set of CPUs, for at least 5 minutes, maybe 10. My .config/Signal directory reads 55M when I run du -sh . in it.

I've observed this behavior several times before: a long, CPU-consuming stretch while the message count goes up (very slowly) by 10 at a time. It doesn't happen on every single app startup, but it's fairly frequent -- I imagine it relates to the unpredictable timing and volume of conversations I have in Signal between application opens.

scottnonnenberg added this to Backlog in Next steps on Mar 1, 2018

gasi referenced this issue on Mar 2, 2018: [Windows 7 Pro]: Slow Startup #2092 (Closed)
