Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

using a db transaction when scanning (external) files with sqlite causes database locked errors #11391

Closed
butonic opened this issue Oct 2, 2014 · 17 comments

Comments

@butonic
Copy link
Member

butonic commented Oct 2, 2014

I assume the database transaction used to commit updates of the file scanning process in batches causes an sqlite database to lock for extended periods of time, leading to the database locked error when another request comes in. With preview generation, background jobs, hooks, search lucene and maybe sync clients coming along that is very likely to happen. The chance that a lock is in place increases with the latency of file scans, meaning scanning a remote files_external storage, such as amazon s3 is highly likely to produce the db lock.

for reference, a little history:
16 Jan 2013 @icewind1991 adding a transaction to improve scan performance 6871a15

3 Nov 2013 @jancborchardt the original issue #5680
21 Aug 2014 @PVince81 assuming an open db cursor to be the issue #10566

8 Sep 2014 @icewind1991 making the transaction contain even more updates in master 644755d

30 Sep 2014 @butonic being able to reproduce the issue #11280 (comment)
1 Oct 2014 @jancborchardt confirming he was likely using files_external with sqlite #11280 (comment)

With 644755d has added a flag to the scanner that we could use to do without transactions on sqlite. AFAICT atomicity is not the reason for the transaction. Opinions @DeepDiver1975 @karlitschek @schiesbn @icewind1991

@craigpg
Copy link

craigpg commented Oct 2, 2014

@butonic, is this sqlite only?

@craigpg craigpg added this to the 2014-sprint-06-next milestone Oct 2, 2014
@butonic
Copy link
Member Author

butonic commented Oct 2, 2014

@craigpg yes, sqlite only

@icewind1991
Copy link
Contributor

Disabling transitions in the scanner would be a solution but it would cause a massive performance panelty

@karlitschek
Copy link
Contributor

@icewind1991 Why does the transaction improve the performance? I´m not sure I understand the reason by reading the code.

@butonic
Copy link
Member Author

butonic commented Oct 6, 2014

Without the transaction scanning 10000 files makes 10000 individual updates, auto committing them. Which in turn causes the affected indexes to be updated - which is slow on all DBMS, but especially on sqlite and lower end hardware. (more details on sqlite insert performance in http://stackoverflow.com/questions/1711631/how-do-i-improve-insert-per-second-performance-of-sqlite)

@karlitschek
Copy link
Contributor

It would be great to measure this. In my experience smaller atomic updates are scale better because they can reordered and parallelized. This is especially interesting if you have a DB cluster.
And it prevents the scenario that the full DB is locked for a long time as we see here. This is a usual nightmare for DBAs. The linked article is interesting but I´m not sure this is doable with a DB that is in production and used by other processes too.
Sorry but I´m not convinced that having a transaction around a huge number of updates makes things better here. :-)

@icewind1991
Copy link
Contributor

When I first introduced the transactions in the scanner it gave a 3x-5x performance increase

@karlitschek
Copy link
Contributor

But we have to make sure that it´s not locking the DB for a long time and triggering other bugs

@craigpg craigpg modified the milestones: 2014-sprint-07-next, 2014-sprint-06-current Oct 12, 2014
@dragotin
Copy link
Contributor

Yes, transactions speed up sqlite tremendously. Actually, write and update operations should not happen without transactions because they are way too slow. Same experience on the client.

@guruz
Copy link
Contributor

guruz commented Oct 23, 2014

I have no idea how the file scanner or server works, just my 0.005 BTC cent:

Can you guys code it in a way that it uses a transaction but commits every 200msec or so?

However that still might be problematic because other concurrent requests (using https://sqlite.org/c3ref/busy_timeout.html or whatever you're using on the server, that's what we use in the client) might not suceed even when you're "between transactions".

Maybe the file scanner could be "sleeping" not-in-any-transaction or so for some time to give other concurrent requests a chance to do something with the DB? (not relevant with real DB, only for sqlite3)

@karlitschek
Copy link
Contributor

Can we do smaller transactions to not run into a lock timeout?
Speed is good but not working code is worse.

@cdamken
Copy link
Contributor

cdamken commented Oct 24, 2014

@butonic

@craigpg yes, sqlite only

Does this problem only affect SQLlite? @dragotin referred this case in owncloud/client#2378 but I have it with MySQL.

@craigpg craigpg modified the milestones: 2014-sprint-08-next, 2014-sprint-07-current Oct 27, 2014
@craigpg craigpg added this to the 2014-sprint-09-next milestone Nov 10, 2014
@craigpg craigpg removed this from the 2014-sprint-08-current milestone Nov 10, 2014
@DeepDiver1975 DeepDiver1975 modified the milestones: 2014-sprint-09-next, backlog Jan 31, 2015
@dragotin
Copy link
Contributor

Pull request that adds some closeCursors: #14329

@MorrisJobke
Copy link
Contributor

@PVince81 @icewind1991 Isn't this solved by high level file locking that was enabled in 8.2+?

@PVince81
Copy link
Contributor

Such issues haven't been reported since a while, so it might be solved by locking, yes.
Also we did add many closeCursor calls.

@MorrisJobke
Copy link
Contributor

Okay let's close this here and maybe reopen if this appears again.

@lock
Copy link

lock bot commented Aug 7, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 7, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants