Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database is locked error in some scenarios #10838

Closed
saledjenic opened this issue May 29, 2023 · 11 comments · Fixed by #10870
Closed

database is locked error in some scenarios #10838

saledjenic opened this issue May 29, 2023 · 11 comments · Fixed by #10870
Assignees
Labels
bug Something isn't working
Milestone

Comments

@saledjenic
Copy link
Contributor

database is locked error started to appear after db improvements were applied. Not sure when, under which circumstances that's happening, but folks from mobile team faced this issue as well. One flow where it is reproducible in 100% cases is when user is migrating his profile keypair to a keycard. That flow ends with an error due to database is locked error which is happening when status-go ConvertToKeycardAccount function is called in this line addedKc, _, err := accountDB.AddKeycardOrAddAccountsIfKeycardIsAdded(kc).

@saledjenic saledjenic added bug Something isn't working performance labels May 29, 2023
@osmaczko osmaczko self-assigned this May 29, 2023
@osmaczko osmaczko added this to the 0.13 milestone May 29, 2023
@alexjba
Copy link
Contributor

alexjba commented May 29, 2023

Changing the flow a little bit results in a different error - FOREIGN KEY constraint failed while inserting in keycards.

In AddKeycardOrAddAccountsIfKeycardIsAdded changing the flow
From: Begin transaction -> Select from keycards -> Insert into keycards -> Database is locked
To: Select from keycards -> Begin transaction -> Insert into keycards -> FOREIGN KEY constraint failed

@alexjba
Copy link
Contributor

alexjba commented May 29, 2023

@saledjenic Currently if the ConvertToKeycardAccount fails, the backend access will be locked and nim won't be able to reach it anymore. This is the variable used to lock it:

  if DB_BLOCKED_DUE_TO_PROFILE_MIGRATION:
    debug "DB blocked due to profile migration, unable to proceed with the rpc call", rpc_method=methodName
    raise newException(RpcException, "db closed due to profile migration")

On top of fixing the database is locked issue, it might be good to also improve the error handling in nim and status-go in order not to break the current app session or the profile. IMO there's two missing pieces (might be more and I'm missing something):

  1. Restore the profile to use password login
  2. Release the DB_BLOCKED_DUE_TO_PROFILE_MIGRATION lock

WDYT of adding another task to keep track of it once we plan to release for keycard support?

@alexjba
Copy link
Contributor

alexjba commented May 29, 2023

Changing the flow a little bit results in a different error - FOREIGN KEY constraint failed while inserting in keycards.

In AddKeycardOrAddAccountsIfKeycardIsAdded changing the flow From: Begin transaction -> Select from keycards -> Insert into keycards -> Database is locked To: Select from keycards -> Begin transaction -> Insert into keycards -> FOREIGN KEY constraint failed

There's a circular foreign key definition between keycards and keycards_accounts. Based on sqlite documentation we shouldn't be able to insert into any of these tables. Now I'm not sure why it worked when having only 1 db connection.
Based on this documentation, we would need what is defined in the doc as Deferred Foreign Key Constraints. This should allow us to insert in these tables if we're running the inserts on the same transaction.

@saledjenic
Copy link
Contributor Author

Currently if the ConvertToKeycardAccount fails, the backend access will be locked and nim won't be able to reach it anymore. This is the variable used to lock it:

ConvertToKeycardAccount function is called only in case of profile keypair migration. Actually that may fail only if any db operation fails. The sad thing is that if that happens we cannot recover the app, but at least for not it is like that. The result of that process is converted account and the only option available for logged in user to do is to quit the app.

Now about this part if DB_BLOCKED_DUE_TO_PROFILE_MIGRATION: ... while migration is ongoing (since that's an async job) many other parts of the app may do some db calls, if those calls come the db encryption has changed (from pass to encrypted pub key) the user may face an error (cannot recall maybe even a crash). That's the reason why we have that check.

WDYT of adding another task to keep track of it once we plan to release for keycard support?

Not sure that I understand this, but keycard support will be released with mvp.

@saledjenic
Copy link
Contributor Author

There's a circular foreign key definition between keycards and keycards_accounts.

How do you see it "circular"?
In keypairs table primary key is key_uid.
In keycards table primary key is keycard_uid. There is also a key_uid column which is a foreign key which references keypairs.key_uid

keypairs_accounts table has a foreign key key_uid which references keypairs.key_uid
keycards_accounts table has a foreign key keycard_uid which references keycards.keycard_uid

Based on described relations I don't see any circular dependency?

@alexjba
Copy link
Contributor

alexjba commented May 30, 2023

Based on described relations I don't see any circular dependency?

You are right, thanks! Got lost in keycards and keypairs. Back to square one to fully understand this behaviour.

Currently if the ConvertToKeycardAccount fails, the backend access will be locked and nim won't be able to reach it anymore. This is the variable used to lock it:

ConvertToKeycardAccount function is called only in case of profile keypair migration. Actually that may fail only if any db operation fails. The sad thing is that if that happens we cannot recover the app, but at least for not it is like that. The result of that process is converted account and the only option available for logged in user to do is to quit the app.

Now about this part if DB_BLOCKED_DUE_TO_PROFILE_MIGRATION: ... while migration is ongoing (since that's an async job) many other parts of the app may do some db calls, if those calls come the db encryption has changed (from pass to encrypted pub key) the user may face an error (cannot recall maybe even a crash). That's the reason why we have that check.

WDYT of adding another task to keep track of it once we plan to release for keycard support?

Not sure that I understand this, but keycard support will be released with mvp.

What I meant is that currently when there is an error in profile migration the account becomes unusable. It is partially migrated to keycard, but the keycard login won't work. The question is if it's feasible (also if it's something we want) to fix this so that the user can still use the same account in case of migration error?

@saledjenic
Copy link
Contributor Author

What I meant is that currently when there is an error in profile migration the account becomes unusable. It is partially migrated to keycard, but the keycard login won't work. The question is if it's feasible (also if it's something we want) to fix this so that the user can still use the same account in case of migration error?

Yes, that's true, if something goes wrong there is no a recovering process which will revert applied changes and keep the profile valid. Otherwise if an error occurs even if user is able to login, that account is corrupted, cause a migration was partially done and the data in the db are inconsistent. That's something we can think of and possibly improve, but I guess that will require wider changes cause I used the existing ConvertToKeycardAccount function and just updated it a bit.
In short something may go wrong in one of the following cases:

  1. if there is any db error running queries (which is the issue we're experiencing now)
  2. if changing db password fails
  3. if deleting local keystore files fails

for any of those we actually cannot do anything to recover from that state:

  • In case 1. what we can do? But that's actually very hard to happen (the db locked issue is because of those improvements, but once we fix that, it should be hard to get into db error)
  • in case 2. if that fails we don't have too much to do, actually nothing
  • in case 3. the same, cause if deletion any of keystore files fails in order to re-create previously delete files we need seed phrase which we don't have at that moment and no way to get it from anywhere

All in all (at least for now) that's very sensitive action (which may take very long, depends on the db size), which should go smooth, but if something goes wrong, the only what user can do is to remove local Status folder and recover his account.

@saledjenic
Copy link
Contributor Author

I'm facing the same issue when I was trying to change a display name in the latest master branch, here is the log:

DBG 2023-05-30 13:03:21.643+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=wakuext_getLatestVerificationRequestFrom
DBG 2023-05-30 13:03:25.465+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=wakuext_getLatestVerificationRequestFrom
2023-05-30T13:03:26.120+0200	WARN	peerstore	pstoremem/addr_book.go:248	Was passed p2p address with a different peerId. found: 16Uiu2HAkyScd7DiwgMwzfw8CFFhznH3wRzciqEUfjDzn7vyimR8c, expected: 16Uiu2HAm7v2ZkJVDPH9z9NaN7G81E8iUzSiZ6FDFcq6a4taBcAjd
DBG 2023-05-30 13:03:50.148+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=wakuext_setDisplayName
ERR 2023-05-30 13:03:50.153+02:00 rpc response error                         topics="rpc" tid=3398430 file=core.nim:36 err="\nstatus-go error [methodName:wakuext_setDisplayName, code:-32000, message:database is locked ]\n"
ERR 2023-05-30 13:03:50.153+02:00 error doing rpc request                    topics="rpc" tid=3398430 file=core.nim:40 methodName=wakuext_setDisplayName exception="\nstatus-go error [methodName:wakuext_setDisplayName, code:-32000, message:database is locked ]\n"
ERR 2023-05-30 13:03:50.153+02:00 error:                                     topics="profile-service" tid=3398430 file=service.nim:81 procName=setDisplayName errName=RpcException errDesription="\nstatus-go error [methodName:wakuext_setDisplayName, code:-32000, message:database is locked ]\n"
DBG 2023-05-30 13:03:50.154+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=settings_setBio
DBG 2023-05-30 13:03:50.155+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=settings_setSocialLinks
DBG 2023-05-30 13:03:50.155+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=settings_getSocialLinks
DBG 2023-05-30 13:03:50.157+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=multiaccounts_deleteIdentityImage
DBG 2023-05-30 13:03:50.158+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=wakuext_getLatestVerificationRequestFrom

@alexjba @osmaczko could we revert that commit where db improvements are added and merge it once we solve this issue?

@caybro
Copy link
Member

caybro commented May 30, 2023

I'm facing the same issue when I was trying to change a display name in the latest master branch, here is the log:

DBG 2023-05-30 13:03:21.643+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=wakuext_getLatestVerificationRequestFrom
DBG 2023-05-30 13:03:25.465+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=wakuext_getLatestVerificationRequestFrom
2023-05-30T13:03:26.120+0200	WARN	peerstore	pstoremem/addr_book.go:248	Was passed p2p address with a different peerId. found: 16Uiu2HAkyScd7DiwgMwzfw8CFFhznH3wRzciqEUfjDzn7vyimR8c, expected: 16Uiu2HAm7v2ZkJVDPH9z9NaN7G81E8iUzSiZ6FDFcq6a4taBcAjd
DBG 2023-05-30 13:03:50.148+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=wakuext_setDisplayName
ERR 2023-05-30 13:03:50.153+02:00 rpc response error                         topics="rpc" tid=3398430 file=core.nim:36 err="\nstatus-go error [methodName:wakuext_setDisplayName, code:-32000, message:database is locked ]\n"
ERR 2023-05-30 13:03:50.153+02:00 error doing rpc request                    topics="rpc" tid=3398430 file=core.nim:40 methodName=wakuext_setDisplayName exception="\nstatus-go error [methodName:wakuext_setDisplayName, code:-32000, message:database is locked ]\n"
ERR 2023-05-30 13:03:50.153+02:00 error:                                     topics="profile-service" tid=3398430 file=service.nim:81 procName=setDisplayName errName=RpcException errDesription="\nstatus-go error [methodName:wakuext_setDisplayName, code:-32000, message:database is locked ]\n"
DBG 2023-05-30 13:03:50.154+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=settings_setBio
DBG 2023-05-30 13:03:50.155+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=settings_setSocialLinks
DBG 2023-05-30 13:03:50.155+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=settings_getSocialLinks
DBG 2023-05-30 13:03:50.157+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=multiaccounts_deleteIdentityImage
DBG 2023-05-30 13:03:50.158+02:00 NewBE_callPrivateRPC                       topics="rpc" tid=3398430 file=core.nim:27 rpc_method=wakuext_getLatestVerificationRequestFrom

@alexjba @osmaczko could we revert that commit where db improvements are added and merge it once we solve this issue?

Hmmm... I cannot reproduce on latest master (under Linux).

But... we have the very same issue open already: #10827

osmaczko added a commit to status-im/status-go that referenced this issue May 30, 2023
- reverts db pool until `database is locked` error is resolved

related to: status-im/status-desktop#10838
osmaczko added a commit to status-im/status-go that referenced this issue May 30, 2023
- reverts db pool until `database is locked` error is resolved

related to: status-im/status-desktop#10838
@alexjba
Copy link
Contributor

alexjba commented May 30, 2023

Apparently there can be read transactions and write transactions (based on the first statement used in the transaction). In our case the the db lock was happening when the first transaction statement is select. The transaction is set to read transaction and it can't be upgraded to write due to multiple connections/transactions. In this case the database is locked error is sent. (https://www.sqlite.org/lang_transaction.html)

The fix in our case seems to be to set _txlock to immediate. This means all transactions will be considered write transactions. And it makes sense for us as all transactions we open are write transactions.

alexjba added a commit to status-im/status-go that referenced this issue May 30, 2023
The default transaction lock is `deferred`. This means that the transaction will automatically become read or write transaction based on the first DB operation. In case the first operation is `SELECT` the transaction becomes read transaction, otherwise write transaction. When a read transaction tries to write the DB sqlite will promote the transaction to a write transaction if there is no other transaction that holds a lock. When the promotion fails `database is locked` error is returned. The error is returned immediately and does not use the busy handler.
In our case almost all read transaction would fail with `database is locked` error.

This fix is changing the default transaction lock to `IMMEDIATE`. It translates to `BEGIN IMMEDIATE` instead of `BEGIN`. In this mode, the transaction will be created as a write transaction no matter what DB operation will run as part of the transaction. The write transaction will try to obtain the DB lock immediately when `BEGIN IMMEDIATE` is called and the busy handler is used when the DB is locked by other transaction.

Fixing: status-im/status-desktop#10838
@noeliaSD
Copy link
Contributor

But... we have the very same issue open already: #10827

Scenario suite_settings/tst_userIdentity/The user can change own display name in profile has been skipped since displayName is not correctly updated in profile page. Please, enable it again when the bug is solved.

osmaczko added a commit to status-im/status-go that referenced this issue May 31, 2023
- reverts db pool until `database is locked` error is resolved

related to: status-im/status-desktop#10838
osmaczko added a commit to status-im/status-go that referenced this issue May 31, 2023
- reverts db pool until `database is locked` error is resolved

related to: status-im/status-desktop#10838
osmaczko pushed a commit to status-im/status-go that referenced this issue May 31, 2023
The default transaction lock is `deferred`. This means that the transaction will automatically become read or write transaction based on the first DB operation. In case the first operation is `SELECT` the transaction becomes read transaction, otherwise write transaction. When a read transaction tries to write the DB sqlite will promote the transaction to a write transaction if there is no other transaction that holds a lock. When the promotion fails `database is locked` error is returned. The error is returned immediately and does not use the busy handler.
In our case almost all read transaction would fail with `database is locked` error.

This fix is changing the default transaction lock to `IMMEDIATE`. It translates to `BEGIN IMMEDIATE` instead of `BEGIN`. In this mode, the transaction will be created as a write transaction no matter what DB operation will run as part of the transaction. The write transaction will try to obtain the DB lock immediately when `BEGIN IMMEDIATE` is called and the busy handler is used when the DB is locked by other transaction.

Fixing: status-im/status-desktop#10838
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants