Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database is locked #55

Closed
aletheia7 opened this issue Sep 23, 2020 · 23 comments
Closed

database is locked #55

aletheia7 opened this issue Sep 23, 2020 · 23 comments

Comments

@aletheia7
Copy link

I am getting occasional errors in dovecot:

...
Sep 23 12:31:52 orange dovecot[17049]: indexer-worker(ray)<156088><ybAEGqiia1+3YQIAfYjuOg:iI6CG6iia1+4YQIAfYjuOg>: Error: FTS Xapian: Error in expunge fct (SELECT DOCID,ID from IDS ORDER BY DOCID LIMIT 5) (database is locked) : unset_box
Sep 23 12:31:52 orange dovecot[17049]: lmtp(156087): Error: FTS Xapian: adding expunge IDs (REPLACE INTO IDS (DOCID,ID) VALUES (3550,144670);) : err(5,database is locked)
...

fts-xapian: 68ea96 (master)
OS: Debian Linux testing/bullseye
Dovecot: v2.3.11.3 (latest)

grosjo added a commit that referenced this issue Sep 27, 2020
@grosjo
Copy link
Owner

grosjo commented Sep 27, 2020

Can you set verbose=1 with the latest git , and share messages before and after the error ?

@aletheia7
Copy link
Author

Received One

Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630, me): FTS Xapian: Starting with partial=4 full=20 attachments=0 verbose=1
Sep 29 04:20:37 orange dovecot[1429]: lmtp(me)<47630><KMRRMIUYc18OugAAfYjuOg>: sieve: msgid=<0.1.44.FA.1D6965289EA7E42.0@omp.e.lenscrafters.com>: fileinto action: stored mail into mailbox 'Junk/Other'
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Unset box '(null)' ((null))
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Expunge UID=8971
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: adding expunge IDs : REPLACE INTO IDS (DOCID,ID) VALUES (6507,8971);
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Starting with partial=4 full=20 attachments=0 verbose=1
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Unset box '(null)' ((null))
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Get last UID of Junk/Other (ececcd190285a45da97100007d88ee3a) = 9164
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Get last UID of Junk/Other (ececcd190285a45da97100007d88ee3a) = 9164
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTX Xapian: Set box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Start indexing 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Expunge UID=8972
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: adding expunge IDs : REPLACE INTO IDS (DOCID,ID) VALUES (6508,8972);
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Unset box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: FTS Xapian: Done indexing 'Junk/Other' (ececcd190285a45da97100007d88ee3a) (1 msgs in 15 ms, rate: 66.7)
Sep 29 04:20:37 orange dovecot[1429]: indexer-worker(me)<47637><KMRRMIUYc18OugAAfYjuOg:WT7GMYUYc18VugAAfYjuOg>: Error: FTS Xapian: Error in expunge fct (SELECT DOCID,ID from IDS ORDER BY DOCID LIMIT 5) (database is locked) : unset_box
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Unset box 'Junk/Other' (ececcd190285a45da97100007d88ee3a)
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Box is empty
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Unset box '(null)' ((null))
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): FTS Xapian: Committed 'unset_box' in 0 ms
Sep 29 04:20:37 orange dovecot[1429]: lmtp(47630): Disconnect from local: Client has quit the connection (state=READY)

@aletheia7
Copy link
Author

The indexer had successfully completed an index operation before and after this problem at exactly the same minute/second.

grosjo added a commit that referenced this issue Oct 3, 2020
@grosjo
Copy link
Owner

grosjo commented Oct 3, 2020

Hello, can you try with additional updates ?

grosjo added a commit that referenced this issue Oct 3, 2020
@aletheia7
Copy link
Author

Upgraded and watching.

grosjo added a commit that referenced this issue Oct 4, 2020
grosjo added a commit that referenced this issue Oct 4, 2020
@aletheia7
Copy link
Author

aletheia7 commented Oct 7, 2020

@grosjo
Copy link
Owner

grosjo commented Oct 8, 2020

Do you have a process of expunging the mailbox in parallel to main core binary ?

@aletheia7
Copy link
Author

Yes.
The following job runs daily at 1:00 am as root via a systemd.timer:

    /usr/bin/doveadm expunge -A mailbox '*' from 'no_reply @ooma.com' since 2017-06-28 before 30d header Subject 'New voicemail

The following expunge rules are defined in the dovecot local.conf file:

namespace inbox {
  ...
  mailbox Junk {
    special_use = \Junk
    auto = subscribe
    autoexpunge = 7d
  }
  mailbox Junk/Other {
    special_use = \Junk
    auto = no
    autoexpunge = 7d
  }
  mailbox "Junk E-mail" {
    special_use = \Junk   
    autoexpunge = 7d
  }
  mailbox Backups/baobab {
    auto = no
    autoexpunge = 180days
  }
  mailbox Backups/peach {
    auto = no
    autoexpunge = 180days
  }
  ...
}

The Junk and Backups/baobab mailbox expunge rules get executed when a new email arrives. They get executed often because backups occur in the evening at ~ 22:00 and, well, we get a lot of Junk/Spam.

@aletheia7
Copy link
Author

Maybe the sqlite timeout should be increased. What is the timeout?

https://sqlite.org/c3ref/busy_timeout.html

Is the code using sqlite wal mode?
https://www.sqlite.org/wal.html

Is the sqlite code in dovecot or fts-xapian?

grosjo added a commit that referenced this issue Oct 10, 2020
@grosjo
Copy link
Owner

grosjo commented Oct 10, 2020

All sqlite calls are in fts-xapian.
I added a "timeout" function.
Please knidly check again

@aletheia7
Copy link
Author

I upgraded now. Also, it's happening to another email account too.

@aletheia7
Copy link
Author

One yesterday. Two today.

grosjo added a commit that referenced this issue Oct 16, 2020
@grosjo
Copy link
Owner

grosjo commented Oct 16, 2020

I increased the timeout to see

@aletheia7
Copy link
Author

I used the 100 timeout on 2020-10-16. I received one "database is locked" on 2020-10-17.

I changed 100 -> 500.

@slavkoja
Copy link

Please, @grosjo, for what is SQLite used? Cannot this be stored directly into xapian DB?

AFAIK, the timeout is in msec. Default timeout in Python & Exim is 5000 msec, Dovecot uses 1000 msec. IMO 100 msec is too low.

@grosjo
Copy link
Owner

grosjo commented Oct 20, 2020

I used SQlite to handle expunges outside xapuan, as xapian does not explicitely handle concurrency.

https://getting-started-with-xapian.readthedocs.io/en/latest/concepts/concurrency.html

Does 5000 timeout solve the issue ?
Do you have a better idea ?

grosjo added a commit that referenced this issue Oct 20, 2020
grosjo added a commit that referenced this issue Oct 20, 2020
@grosjo
Copy link
Owner

grosjo commented Oct 20, 2020

Actually tried with https://tartarus.org/james/xapian-apidoc/html/namespaceXapian.html#a296fe000d6d9525bb8e80f72838026a6
from recent version of Xapian

Make sense ?

@slavkoja
Copy link

I cannot comment xapian locking, but if i properly understand provided link, there will be no timeout, but infinite waiting for lock, i am not sure if it is good idea.

I can comment SQLite locking, but i have only basic C/C++ experiences, thus i didn't inspect your code (i will lost in int), thus some of my notes can be applied already (you simple ignore them):

  • use transactions to write multiple records, for better performance
  • keep transactions as short as possible, any write transaction will lock the database for write & read
  • make sure that proper indexes are defined, otherwise SQLite (by default) will (re)create dynamic index on any read, which can be slow
  • writing to indexes can slow write performace, if missing indexes are OK, disable its auto creation by PRAGMA automatic_index command https://sqlite.org/pragma.html#pragma_automatic_index

Using WAL mode can improve performance, especially it will allow concurrent write/read access. Its setting is persisent and can be enabled/disabled anytime, even by CLI tool. It can little slowdown reads, but concurrent access can improve it at all. Because it is persistent, it can be enabled on DB file creation (only once) and because it is not error to enable it more time, it can be enabled ag. in time plugin initialization, to ensure enabling it for existing DB file .

grosjo added a commit that referenced this issue Oct 20, 2020
grosjo added a commit that referenced this issue Oct 20, 2020
grosjo added a commit that referenced this issue Oct 20, 2020
grosjo added a commit that referenced this issue Oct 20, 2020
@grosjo
Copy link
Owner

grosjo commented Oct 20, 2020

Please kindly test again, and let me know.

@aletheia7
Copy link
Author

Please kindly test again, and let me know.

I will. I am currently testing 3f7fe69 with sqlite3_busy_timeout(500).

After 24 h I will test b8cad27 (latest).

grosjo added a commit that referenced this issue Oct 20, 2020
grosjo added a commit that referenced this issue Oct 20, 2020
grosjo added a commit that referenced this issue Oct 20, 2020
@aletheia7
Copy link
Author

An sqlite3_busy_timeout of 500 works for me.

Now upgrade with the latest.

@aletheia7
Copy link
Author

Problem fixed in fee8502.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants