Sync still very slow on Linux #1023

Closed
bablokb opened this issue Dec 1, 2018 · 18 comments
Labels
backlog (We'll get to it... eventually...), bug (It's a bug), linux

Comments

@bablokb

bablokb commented Dec 1, 2018

Operating system

  • Linux

Application

  • Desktop

This is a sort of reopen of #312, which IMHO has a wrong title and is misleading. Sync is still very slow even after the release that "maybe" fixed #312.

My investigations don't show any influence of GUI-activities on the sync process itself. The observations are purely anecdotal.

What I found out so far is:

  • all items that are stored in the database sync fast
  • resources sync very slowly

The logic during sync needs some improvement, since it seems to work like this:

loop:
  open connection to webdav
  sync one item
  close connection to webdav
  do nothing for a long time

I still have to verify the "do nothing" part (I will run strace when I find time), but whatever Joplin does during this time does not use any measurable OS resources.
So one thing that really has to be done is to move the open/close out of the loop. This is not responsible for the slow sync, but it is a sensible optimization anyway.

I also tried to understand where all this happens in the code, but without success so far. Maybe someone can give me some pointers to the relevant source files; then I would investigate the issue further.

Bernhard

@tsumare

tsumare commented Dec 10, 2018

I have been experiencing similar issues for some time now (since around v1.0.10 I would guess, but don't quote me on that). Interestingly, I have observed that when syncing to a webdav server on my local network, I have no problems. When syncing my Android phones, I have no problems. When syncing to the same webdav server from elsewhere on the internet, it can take many minutes to sync one or two (or even zero) text notes. I never had enough details to report, as there is nothing interesting in the server logs, but I can at least provide that much.

In dav.log, you can see a sync that creates a single one-line (+title) text note taking 2 minutes 3 seconds due to a very long idle period.

This is from Joplin 1.0.117, against a wsgidav server behind an nginx reverse proxy (not using nginx dav). According to my memory (which may be faulty), a similar sync from a system on the same LAN as that server completes in 1-2 seconds. Using the same joplin profile, workstation and server with the command line Joplin client, the sync is also completed in 1-2 seconds, so this appears to be a desktop-exclusive issue.

@scuba-tech

scuba-tech commented Dec 16, 2018

Same problem on my end. VERY slow sync. During sync, the "Updated remote items: 1." stays on my screen for 5~20 minutes. This is manifesting even when only changing a single word in a single text note. (All encrypted, FWIW).

Am on a gigabit connection. All other applications and web services responsive. Dropbox client responsive. Mobile app (phone and tablet) synced with same account are responsive and take under 3 seconds to sync the same changes.

Ubuntu 18.10; Joplin 1.0.117 (prod, linux)

I hope this report helps!

EDIT: just saw v.118 Beta released, popped-up right after posting this reply. Will install and report back if this is fixed on my end. :)

@laurent22
Owner

I think this is a duplicate of another issue where the Linux GUI doesn't respond during sync. There's currently no fix for this. Maybe someone with knowledge of Linux desktop could figure it out.

@bablokb
Author

bablokb commented Dec 16, 2018

As I said in the beginning, it is not a GUI-update problem: sync is slow regardless of how often you update the GUI. And some primitive stracing just confirms my assumption: Joplin is doing nothing, it is not busy with anything. It seems more of a problem of internal deadlock (waiting for some resources), and once in a while the sync-task will have access to the resource, sync an item, and then wait again.

I have good knowledge in linux programming (also in JavaScript), so I would be willing to invest some time to track down the problem. But I need some basic understanding of the code (the big picture, i.e. architecture, structure and flow of control) to get started. Of course I could also re-engineer the code, but that would be a waste of time assuming that you already know about that.

@laurent22
Owner

There's no deadlock problem on other platforms so I don't think it's an issue with the algorithm itself. My assumption is that there's something in some Linux distros that makes the app sleep or give it very low priority, maybe a power management feature, and that slows down or stops the sync process. Or maybe something that slows down the web requests?

In terms of architecture, the synchronisation happens in the same thread as everything else. The algorithm basically compares what's on the WebDAV service with what's in the SQLite database and creates/deletes/updates items as needed. The whole thing is JS code running on the V8 engine. I can provide more info if needed.

It seems more of a problem of internal deadlock (waiting for some resources), and once in a while the sync-task will have access to the resource, sync an item, and then wait again.

When it seems to be waiting like this, does it unlock if you do something in the UI? For example, open the config screen, change something and save?

@bablokb
Author

bablokb commented Dec 29, 2018

My assumption is that there's something in some Linux distros that makes the app sleep or give it very low priority, maybe a power management feature, and that slows down or stops the sync process

Only a very, very misconfigured system would do that. Even embedded systems would "race to idle", i.e. get work done and then go to sleep.

In terms of architecture, the synchronisation happens in the same thread as everything else.

I don't believe that. Joplin (i.e. the underlying chrome) uses many threads with a lot of interprocess communication. The UI is always responsive, so the actual sync cannot happen in the UI-thread.

When it seems to be waiting like this, does it unlock if you do something in the UI?

When I look at my Apache logs, this indeed seems to be the case. When I don't do anything in the UI, Joplin sends an item to the WebDAV server every 120 seconds. When I switch between notebooks, it triggers a send at once. So Joplin does not need a UI update to synchronize an item, but it surely speeds it up.

Since the Joplin-code is the same on other platforms (?), this might be a problem of the chrome-implementation of Linux.

@deltafunction

I confirm this behavior. Looking forward to a fix.

@e-ihrke

e-ihrke commented Oct 8, 2019

For documentation purposes:
Before creating the PR above, I tested different things to figure out what is really causing this problem.

First test: Disable the scheduled GUI events, which were triggered very often.

While debugging the application I noticed that a GUI event is triggered every 50 ms, which checks whether the size of the application window has changed. The interval causing this was quite easy to find and could also be disabled.

After starting the synchronization again, I tried to find out what the CPU is doing while the actual request is sent to the server. But there was nothing. The application was waiting for something, but it was not an HTTP response from the WebDAV server. And the server did not receive any bytes either.

Second test: Extract the WebDAV sync code and try to send a backup to the WebDAV server with it.

I copied the important functions for creating new files on the WebDAV server (such as the ones creating the options/headers for the node-fetch call) from your code into a script, which I placed inside the project. This way I also wanted to make sure that the testscript would not accidentally pick up different dependencies.

Result: The synchronization was fast and had no problems. (I’ve tested this multiple times.)

Third test: Change the GUI to call the testscript instead of the actual synchronization code.

This should show whether the problem lies within the WebDAV synchronization code (which I thought was unlikely, because it works on all other platforms and should be platform-independent) or whether there is a global (configuration) problem.

Result: Calling the testscript from within the GUI resulted in the same extremely slow synchronization. So it looked like a global (configuration) problem.

Fourth test: Look into the network calls to see what the application does and when it is waiting.

This should show whether the application really waits for the WebDAV server or another network resource, or whether the CPU simply does not trigger the call because it is doing something else.

Here I found something:

On a Linux host I found that the application tries to establish the connection to the WebDAV server. The server responds correctly with a “SYN, ACK”, to which the application responds with an “ACK” packet. Shortly after that it turned out that the server had not received this last “ACK”: it resends the “SYN, ACK” packet after around 20 seconds. The application then sends another “ACK” packet, which only rarely fixes the connection. More often the WebDAV server resets and closes the connection. Somehow node-fetch, http, or something deeper down does not notice this. Then the application waits for the configured timeout (from shim.fetchWithRetry).
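
The stall described above can be pictured as a request promise that never settles until a timer fires. The following is a minimal sketch of a per-attempt timeout with retry. `withTimeout`, `requestWithRetry` and `flakyRequest` are hypothetical names used for illustration only; this is not Joplin's `shim.fetchWithRetry`, whose actual implementation is not shown in this thread. The 50 ms timeout merely stands in for the much longer configured one.

```javascript
// Hypothetical helper: reject if startRequest() has not settled within timeoutMs.
function withTimeout(startRequest, timeoutMs) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
    startRequest().then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}

// Hypothetical helper: try again after a timeout or error, up to maxAttempts times.
async function requestWithRetry(startRequest, { timeoutMs, maxAttempts }) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const value = await withTimeout(startRequest, timeoutMs);
      return { value, attempts: attempt };
    } catch (error) {
      lastError = error; // remember the failure and move on to the next attempt
    }
  }
  throw lastError;
}

// Stand-in for the broken handshake: the first attempt never settles,
// the second one succeeds immediately.
let calls = 0;
function flakyRequest() {
  calls += 1;
  if (calls === 1) return new Promise(() => {}); // stalls forever
  return Promise.resolve('201 Created');
}

const demo = requestWithRetry(flakyRequest, { timeoutMs: 50, maxAttempts: 3 });
demo.then(({ value, attempts }) => {
  console.log(`${value} after ${attempts} attempts`);
});
```

In this sketch a stalled attempt costs exactly one timeout period before the retry starts. With a 2-minute timeout in place of the 50 ms used here, that would be in the same range as the idle periods reported in this thread.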

On a Windows host I could not find these “lost” packets.

After searching for similar issues I found some threads on different sites, but no suitable, clean solution (other than updating to a newer Node.js version, and whether that would really help is quite uncertain).

Because of this I first thought about decreasing the connection timeout, but I did not know which other parts of the application this would affect or whether those actually need the 2-minute timeout. And even then the sync would still be slower than on the other platforms.
So the smallest change (read: lowest impact, at least at that time, not yet knowing of the potential sleeping issue in the mobile app) would be to keep an already established connection alive, and to do this only for the WebDAV sync part. From experience I had also expected a performance increase on the other platforms, which I could verify (on Windows).

After implementing the keep-alive, I tested it with multiple full syncs of my notes and also with syncing updates.

It is not really satisfying to fall back to this workaround though.
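
The keep-alive idea can be sketched with Node's built-in `http.Agent`. This is an illustration under assumptions, not the code from the PR: the local test server, `get` and `countConnections` are made up for the demonstration, and the actual change may be wired into node-fetch differently. The sketch counts how many TCP connections a server sees for three sequential requests, with and without keep-alive.

```javascript
const http = require('http');

// Issue one GET through the given agent and resolve with the status code.
function get(url, agent) {
  return new Promise((resolve, reject) => {
    http.get(url, { agent }, (res) => {
      res.resume(); // drain the body so the socket can go back to the pool
      res.on('end', () => resolve(res.statusCode));
    }).on('error', reject);
  });
}

// Start a throwaway local server, send `requests` sequential GETs through
// `agent`, and report how many TCP connections the server accepted.
async function countConnections(agent, requests) {
  let connections = 0;
  const server = http.createServer((req, res) => res.end('ok'));
  server.on('connection', () => { connections += 1; });
  await new Promise((resolve) => server.listen(0, '127.0.0.1', resolve));
  const url = `http://127.0.0.1:${server.address().port}/`;
  try {
    for (let i = 0; i < requests; i++) {
      await get(url, agent);
    }
  } finally {
    agent.destroy(); // close pooled sockets so the server can shut down
    await new Promise((resolve) => server.close(resolve));
  }
  return connections;
}

const result = (async () => ({
  withoutKeepAlive: await countConnections(new http.Agent({ keepAlive: false }), 3),
  withKeepAlive: await countConnections(new http.Agent({ keepAlive: true }), 3),
}))();

result.then((r) => console.log(r));
```

With keep-alive the TCP handshake happens once instead of once per request, so a handshake that occasionally stalls has far fewer chances to hold up the sync. As noted above, this works around the lost packets rather than explaining them.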

@e-ihrke

e-ihrke commented Oct 9, 2019

Fifth test: Using the command line version
Since the testscript from the second test was fast on the command line but not when called from the GUI, I wanted to make sure that it is really not GUI related.

Result: The synchronization was fast.

@laurent22 Is there a way to use both the GUI and the command line client with the same working directory? Would they interfere / break each other if done so?
Maybe syncing from the CLI would be a possible workaround for Linux users until this is fixed. Though the sync interval should probably be disabled then.

@laurent22
Owner

@laurent22 Is there a way to use both the GUI and the command line client with the same working directory? Would they interfere / break each other if done so?

Yes, it's likely something will break. For example, if the CLI app syncs while the desktop app is running, the desktop app won't know it needs to reload the data from the database, so it might overwrite items that were synced. There are probably a ton of other edge cases I can't think of because it's not meant to be used that way.

@ghost

ghost commented Oct 22, 2019

I don't know if it's helpful, but I have the same problem with sync on three different Kubuntu 18.10/19.04 machines. While syncing with my NextCloud from my Android device takes only a few seconds (it feels like less than 2 seconds on the local WLAN, and 6 seconds from the outside world over the internet), it takes minutes with the Linux client.

It's not a networking problem. One Linux client is connected via 1 Gbps Ethernet to the NextCloud / Apache web server, the other Linux client is a notebook with 1.x Gbps WLAN, and the third is a Linux guest VM running on the same host as the NextCloud / Apache web server.

During the minutes of syncing I can use the Joplin GUI without reduced speed; it stays fully responsive.

When sync'ing from Android device, the Apache logfile says:

[22/Oct/2019:13:07:38 +0200] "MKCOL /remote.php/webdav/Joplin/.sync/ HTTP/1.1" 405 5429 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:38 +0200] "MKCOL /remote.php/webdav/Joplin/.lock/ HTTP/1.1" 405 1675 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:38 +0200] "MKCOL /remote.php/webdav/Joplin/.resource/ HTTP/1.1" 405 1679 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:39 +0200] "PROPFIND /remote.php/webdav/Joplin/.lock HTTP/1.1" 207 1826 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:39 +0200] "GET /remote.php/webdav/Joplin/.sync/version.txt HTTP/1.1" 200 1540 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:39 +0200] "PUT /remote.php/webdav/Joplin/.sync/version.txt HTTP/1.1" 204 1394 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:40 +0200] "PROPFIND /remote.php/webdav/Joplin/3f03a88c27494a50a5da7664b9399ba0.md HTTP/1.1" 404 1651 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:40 +0200] "PUT /remote.php/webdav/Joplin/3f03a88c27494a50a5da7664b9399ba0.md HTTP/1.1" 201 1448 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:41 +0200] "PROPFIND /remote.php/webdav/Joplin/059beeef09d0416ea06b48e9dc5d6bf6.md HTTP/1.1" 404 1655 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:41 +0200] "PUT /remote.php/webdav/Joplin/059beeef09d0416ea06b48e9dc5d6bf6.md HTTP/1.1" 201 1454 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:42 +0200] "PROPFIND /remote.php/webdav/Joplin/7e01db3deda74d5b9e3c91dc3b639343.md HTTP/1.1" 404 1651 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:42 +0200] "PUT /remote.php/webdav/Joplin/7e01db3deda74d5b9e3c91dc3b639343.md HTTP/1.1" 201 1442 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:43 +0200] "PROPFIND /remote.php/webdav/Joplin/66c41a16086a4734aa75effbe2be61b8.md HTTP/1.1" 404 1649 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:43 +0200] "PUT /remote.php/webdav/Joplin/66c41a16086a4734aa75effbe2be61b8.md HTTP/1.1" 201 1444 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:43 +0200] "PROPFIND /remote.php/webdav/Joplin/ HTTP/1.1" 207 8507 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:44 +0200] "GET /remote.php/webdav/Joplin/3f03a88c27494a50a5da7664b9399ba0.md HTTP/1.1" 200 5977 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:44 +0200] "GET /remote.php/webdav/Joplin/059beeef09d0416ea06b48e9dc5d6bf6.md HTTP/1.1" 200 5538 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:44 +0200] "GET /remote.php/webdav/Joplin/7e01db3deda74d5b9e3c91dc3b639343.md HTTP/1.1" 200 4487 "-" "okhttp/3.12.1"
[22/Oct/2019:13:07:44 +0200] "GET /remote.php/webdav/Joplin/66c41a16086a4734aa75effbe2be61b8.md HTTP/1.1" 200 9586 "-" "okhttp/3.12.1"

Finished in 6 seconds. Same thing to the same NextCloud-URL from the Linux client:

[22/Oct/2019:13:10:40 +0200] "MKCOL /remote.php/webdav/Joplin/.sync/ HTTP/1.1" 405 4934 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
[22/Oct/2019:13:11:00 +0200] "MKCOL /remote.php/webdav/Joplin/.lock/ HTTP/1.1" 405 1795 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
[22/Oct/2019:13:12:40 +0200] "MKCOL /remote.php/webdav/Joplin/.resource/ HTTP/1.1" 405 1791 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
[22/Oct/2019:13:13:00 +0200] "PROPFIND /remote.php/webdav/Joplin/.lock HTTP/1.1" 207 1946 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
[22/Oct/2019:13:14:40 +0200] "GET /remote.php/webdav/Joplin/.sync/version.txt HTTP/1.1" 200 1658 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
[22/Oct/2019:13:15:01 +0200] "PUT /remote.php/webdav/Joplin/.sync/version.txt HTTP/1.1" 204 1510 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
[22/Oct/2019:13:15:01 +0200] "PROPFIND /remote.php/webdav/Joplin/ HTTP/1.1" 207 8632 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
[22/Oct/2019:13:15:01 +0200] "GET /remote.php/webdav/Joplin/3f03a88c27494a50a5da7664b9399ba0.md HTTP/1.1" 200 6100 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
[22/Oct/2019:13:15:01 +0200] "GET /remote.php/webdav/Joplin/66c41a16086a4734aa75effbe2be61b8.md HTTP/1.1" 200 5955 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
[22/Oct/2019:13:15:01 +0200] "GET /remote.php/webdav/Joplin/059beeef09d0416ea06b48e9dc5d6bf6.md HTTP/1.1" 200 5116 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
[22/Oct/2019:13:15:01 +0200] "GET /remote.php/webdav/Joplin/7e01db3deda74d5b9e3c91dc3b639343.md HTTP/1.1" 200 4077 "-" "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"

Finished in 4m 21s... same problem on each Linux machine.
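
The idle periods in the Linux log above can be made explicit by computing the gap between consecutive requests. The sketch below uses the first seven lines of that log, trimmed after the response size; `parseApacheTimestamp` and `gapsInSeconds` are hypothetical helper names, and the parser only handles the timestamp format seen in these logs.

```javascript
const MONTHS = {
  Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
  Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11,
};

// Parse "[22/Oct/2019:13:10:40 +0200]" into milliseconds since the epoch (UTC).
function parseApacheTimestamp(line) {
  const m = /\[(\d{2})\/([A-Za-z]{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\]/.exec(line);
  if (!m) return null;
  const [, day, mon, year, hh, mm, ss, sign, offH, offM] = m;
  const localAsUtc = Date.UTC(
    Number(year), MONTHS[mon], Number(day),
    Number(hh), Number(mm), Number(ss)
  );
  const offsetMs = (sign === '-' ? -1 : 1) * (Number(offH) * 60 + Number(offM)) * 60000;
  return localAsUtc - offsetMs; // local time minus the zone offset gives UTC
}

// Seconds elapsed between each pair of consecutive log lines.
function gapsInSeconds(lines) {
  const times = lines.map(parseApacheTimestamp).filter((t) => t !== null);
  return times.slice(1).map((t, i) => (t - times[i]) / 1000);
}

const linuxLog = [
  '[22/Oct/2019:13:10:40 +0200] "MKCOL /remote.php/webdav/Joplin/.sync/ HTTP/1.1" 405 4934',
  '[22/Oct/2019:13:11:00 +0200] "MKCOL /remote.php/webdav/Joplin/.lock/ HTTP/1.1" 405 1795',
  '[22/Oct/2019:13:12:40 +0200] "MKCOL /remote.php/webdav/Joplin/.resource/ HTTP/1.1" 405 1791',
  '[22/Oct/2019:13:13:00 +0200] "PROPFIND /remote.php/webdav/Joplin/.lock HTTP/1.1" 207 1946',
  '[22/Oct/2019:13:14:40 +0200] "GET /remote.php/webdav/Joplin/.sync/version.txt HTTP/1.1" 200 1658',
  '[22/Oct/2019:13:15:01 +0200] "PUT /remote.php/webdav/Joplin/.sync/version.txt HTTP/1.1" 204 1510',
  '[22/Oct/2019:13:15:01 +0200] "PROPFIND /remote.php/webdav/Joplin/ HTTP/1.1" 207 8632',
];

const gaps = gapsInSeconds(linuxLog);
const total = gaps.reduce((sum, gap) => sum + gap, 0);
console.log(gaps);  // → [ 20, 100, 20, 100, 21, 0 ]
console.log(total); // → 261
```

The gaps alternate between roughly 20 and 100 seconds and add up to the 4m 21s reported above. Each pair sums to about 120 seconds, which may be related to the 120-second interval and the 2-minute timeout mentioned earlier in the thread.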

@ghost

ghost commented Oct 24, 2019

Next thing...

  • Syncing with mobile version on Android device: < 7 seconds
  • Syncing with portable Windows version on Wine on Kubuntu 19.04: < 7 seconds
  • Syncing with Linux version on any Linux machine with Kubuntu 18.10/19.04: > 4 minutes

@stale

stale bot commented Jan 22, 2020

Hey there, it looks like there has been no activity on this issue recently. Has the issue been fixed, or does it still require the community's attention? This issue may be closed if no further activity occurs. You may comment on the issue and I will leave it open. Thank you for your contributions.

@stale stale bot added the stale An issue that hasn't been active for a while... label Jan 22, 2020
@dcervenkov

This is still a problem, dear stale bot.

@stale stale bot removed the stale An issue that hasn't been active for a while... label Jan 22, 2020
@laurent22
Owner

Some ideas there for a solution: #1931

@laurent22 laurent22 added bug It's a bug linux backlog We'll get to it... eventually... labels Jan 22, 2020
@carlbordum
Contributor

I am experiencing this too. It appears to me as if the Linux Desktop client hangs while syncing. It uses no network while syncing for long periods of time until you click somewhere in the GUI.

@bedwardly-down
Contributor

Lately, this bug has been cropping up across all three of my platforms: Linux, iOS, and Android. The Webdav provider I'm using is pCloud.

@Babber

Babber commented Feb 20, 2020

I have the same issue on Linux Mint 19.3. Syncing is very slow, and on top of this, it completely ignores the setting for syncing frequency. Even though it is set to 24h, it appears to me that it is syncing every 5 min.

@lock lock bot locked and limited conversation to collaborators Mar 10, 2020

10 participants