LOCK TABLES can lead to crashes or locks when used with Galera #27071

butonic · 2017-02-01T14:43:53Z

See https://mariadb.com/kb/en/mariadb/lock-tables-and-unlock-tables/#limitations

Also we are seeing installations that have to do a 'repair table' to fix the cluster. The specific problem is

ERROR 1100 (HY000): Table oc_appconfig' was not locked with LOCK TABLES

The documentation only explains:

While a connection holds an explicit lock on a table, it cannot access a non-locked table. If you try, the [...] error will be produced [...]

AFAICT locking is only used for the oc_jobs table: https://github.com/owncloud/core/blob/master/lib/private/BackgroundJob/JobList.php#L191-L210 However a WRITE lock is intended to prevent other connections from updating the table. Galera does not propagate locks ... no Idea what exactly is causing the error.

Introduced with d0a2fa0 ... seems to be released with 9.1.0

Any other installation running 9.1 on galera without this problem?

cc @DeepDiver1975 @felixboehm @PhilippSchaffrath @dercorn @IljaN

The text was updated successfully, but these errors were encountered:

phisch · 2017-02-01T14:52:19Z

What is weird, is that an instance with this error works fine after a repair table and might randomly fail again after a few hours or a day.

Even if this is caused by jobs, a request to index.php should hold its own connection which should run without any problems. But we could see that no requests at all will work, all requests get the same error. The table is accessible perfectly fine if you connect to the database itself and run the query manual.

DeepDiver1975 · 2017-02-01T15:37:56Z

https://mariadb.com/kb/en/mariadb/mariadb-galera-cluster-known-limitations/

Bloody hell ..... is there a docker with galery inside? I'll immendiatly put this into the ci pipeline

butonic · 2017-02-01T15:43:34Z

http://galeracluster.com/2015/05/getting-started-galera-with-docker-part-1/
http://galeracluster.com/2015/05/getting-started-galera-with-docker-part-2-2/

?

DeepDiver1975 · 2017-02-01T15:45:46Z

https://hub.docker.com/search/?isAutomated=0&isOfficial=0&page=1&pullCount=0&q=galera&starCount=0

DeepDiver1975 · 2017-02-01T15:52:18Z

let's try this one

docker pull erkules/galera

Helios07 · 2017-02-02T09:42:46Z

Which version of MariaDB/MySQL are we talking about? At least in my Galera cluster working with MariaDB 5.5 a repair table is not possible for an InnoDB table (and there will most probably no myISAM tables):

MariaDB [owncloud]> repair table oc_appconfig; +-----------------------+--------+----------+---------------------------------------------------------+ | Table | Op | Msg_type | Msg_text | +-----------------------+--------+----------+---------------------------------------------------------+ | owncloud.oc_appconfig | repair | note | The storage engine for the table doesn't support repair | +-----------------------+--------+----------+---------------------------------------------------------+ 1 row in set (0.01 sec)

Even if this works in newer versions, will the repair table be propagated? This would explain to happen the error to occur again at a later time when you might be connected to another database node.

By the way, I have a test installation with this Galera cluster where I do not get the error.

phisch · 2017-02-02T10:28:51Z

AFAIK it says that the engine doesn't support repair, but repair actually does multiple steps and the message comes from one of those.

butonic · 2017-02-02T11:44:45Z

@Helios07 Indeed, we got the same output with 'The storage engine for the table doesn't support repair' BUT the table worked afterwards.
Reading the docs it would be great to get the output of:

SHOW OPEN TABLES WHERE in_use <> 0
SHOW FULL PROCESSLIST;
from http://www.dbrnd.com/2016/02/mysql-script-to-identify-the-locks-and-blocking-transactions/

SELECT 
    pl.id
    ,pl.user
    ,pl.state
    ,it.trx_id 
    ,it.trx_mysql_thread_id 
    ,it.trx_query AS query
    ,it.trx_id AS blocking_trx_id
    ,it.trx_mysql_thread_id AS blocking_thread
    ,it.trx_query AS blocking_query
FROM information_schema.processlist AS pl 
INNER JOIN information_schema.innodb_trx AS it
	ON pl.id = it.trx_mysql_thread_id
INNER JOIN information_schema.innodb_lock_waits AS ilw
	ON it.trx_id = ilw.requesting_trx_id 
        AND it.trx_id = ilw.blocking_trx_id

We may also try https://github.com/innotop/innotop to monitor the locks.

AFAICT it should be the oc_jobs table. If so, we will know where to look for an alternative solution.

phisch · 2017-02-02T14:53:37Z

@butonic i tried listing open tables, but at the time the instance was unuseable there was no table in use or locked. I did not check the processlist though.

dercorn · 2017-02-06T09:53:25Z

@butonic @phisch @DeepDiver1975 any progress here? Or any ideas on how we might isolate the cause of the problem?

butonic · 2017-02-10T09:34:08Z

{"reqId":"uY8\/fpZZIdA964fn8\/hn","remoteAddr":"10.10.2.245","app":"remote","message":"Exception: {\"Exception\":\"Doctrine\\\\DBAL\\\\Exception\\\\DriverException\",\"Message\":\"An exception occurred while executing 'UPDATE `oc_authtoken` SET `last_activity` = ? WHERE `id` = ?' with params [1486376134, 247]:\\n\\nSQLSTATE[HY000]: General error: 1100 Table 'oc_authtoken' was not locked with LOCK TABLES\",\"Code\":0,\"Trace\":\"#0 \\\/var\\\/www\\\/owncloud\\\/3rdparty\\\/doctrine\\\/dbal\\\/lib\\\/Doctrine\\\/DBAL\\\/DBALException.php(116): Doctrine\\\\DBAL\\\\Driver\\\\AbstractMySQLDriver->convertException('An exception oc...', Object(Doctrine\\\\DBAL\\\\Driver\\\\PDOException))\\n#1 \\\/var\\\/www\\\/owncloud\\\/3rdparty\\\/doctrine\\\/dbal\\\/lib\\\/Doctrine\\\/DBAL\\\/Statement.php(174): Doctrine\\\\DBAL\\\\DBALException::driverExceptionDuringQuery(Object(Doctrine\\\\DBAL\\\\Driver\\\\PDOMySql\\\\Driver), Object(Doctrine\\\\DBAL\\\\Driver\\\\PDOException), 'UPDATE `oc_auth...', Array)\\n#2 \\\/var\\\/www

butonic · 2017-02-10T10:39:28Z

after a LOCK TABLES oc_jobs WRITE we can see the lock with SHOW OPEN TABLES WHERE in_use <> 0 trying to select from another lable correctly gives us the was not locked with LOCK TABLES error. Nothing unusual. @phisch is going to write a patch that tries to log the locked tables on a \Doctrine\DBAL\Exception\DriverException that has a message containing was not locked with LOCK TABLES so we can narrow down the cause. Logic dictates it is a lock on oc_jobs because that is the only table we lock in core.

Note that other connections are not affected: when we lock the oc_jobs table via mysql CLI the web ui still works as designed because it does not require the oc_jobs table. But the error is seen in the web ui. In theory a web request must have created a lock and then tried to touch another table. How is that possible?

Hm ... their netscale load balancer sends all queries to one of two galera cluster master nodes in active passive mode. maybe a connection is reused that hasn't been closed correctly?

Also if @Helios07 doesn't see this problem in his test instance we may neet to ping people with more galera cluster know how. Someone like @ayurchen or @temeo.

PVince81 · 2017-04-07T07:09:11Z

Any update on this ? We need to do another RC2 with a fix for this

PVince81 · 2017-04-07T07:28:26Z

Should we try getting rid of any SQL "LOCK" commands and implement a custom locking ? But it sucks if we can't use native DB commands.

butonic · 2017-04-07T07:33:52Z

I think we should be able to get around a table lock. I implemented the necessary kind of begin transaction, update and change, select updated value, commit in https://github.com/owncloud/core/pull/25771/files I am currently trying to understand if we can use that kind of atomicity to get rid of the table lock.

A different approach may be #25100.

PVince81 · 2017-04-07T07:34:56Z

#25100 is not backportable, unless you mean extract the locking logic

PVince81 · 2017-04-07T07:51:02Z

What's the next step ? Reopening/porting https://github.com/owncloud/core/pull/25771/files ? Who can work on this ?

PVince81 · 2017-04-07T07:56:13Z

a web request must have created a lock

Some web requests like deleting a file to trash or overwriting a file might schedule a trashbin/version expiration by inserting a row into oc_jobs. I don't think there is any explicit LOCK command there, but maybe it happens implicitly.

butonic · 2017-04-07T08:01:39Z

no core/#25771 does not remove the lock. Working on it

butonic · 2017-04-07T11:07:54Z

I took a detour with subselect magic and dark arts of bending the QueryBuilder to do what I want it to do ... didn't work out as expected. Now have a much simpler solution: #27597

PVince81 · 2017-04-07T16:58:40Z

Fix to be released with 9.1.5

lock · 2019-08-01T02:00:03Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

butonic added blue-ticket Type:Bug db:mariadb feature:backgroundjobs sev2-high labels Feb 1, 2017

PVince81 added this to the 9.1.5 milestone Feb 1, 2017

PVince81 assigned phisch Mar 27, 2017

PVince81 added green-ticket sev1-critical and removed blue-ticket sev2-high labels Apr 7, 2017

butonic assigned butonic and unassigned phisch Apr 7, 2017

butonic mentioned this issue Apr 7, 2017

[stable9.1] make JobList::next() lock free #27597

Merged

9 tasks

PVince81 closed this as completed Apr 7, 2017

ownclouders added the status/STALE label Dec 21, 2017

lock bot locked as resolved and limited conversation to collaborators Aug 1, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LOCK TABLES can lead to crashes or locks when used with Galera #27071

LOCK TABLES can lead to crashes or locks when used with Galera #27071

butonic commented Feb 1, 2017 •

edited by DeepDiver1975

phisch commented Feb 1, 2017

DeepDiver1975 commented Feb 1, 2017

butonic commented Feb 1, 2017

DeepDiver1975 commented Feb 1, 2017

DeepDiver1975 commented Feb 1, 2017

Helios07 commented Feb 2, 2017

phisch commented Feb 2, 2017

butonic commented Feb 2, 2017

phisch commented Feb 2, 2017

dercorn commented Feb 6, 2017

butonic commented Feb 10, 2017

butonic commented Feb 10, 2017

PVince81 commented Apr 7, 2017

PVince81 commented Apr 7, 2017

butonic commented Apr 7, 2017

PVince81 commented Apr 7, 2017

PVince81 commented Apr 7, 2017

PVince81 commented Apr 7, 2017

butonic commented Apr 7, 2017

butonic commented Apr 7, 2017

PVince81 commented Apr 7, 2017

lock bot commented Aug 1, 2019

LOCK TABLES can lead to crashes or locks when used with Galera #27071

LOCK TABLES can lead to crashes or locks when used with Galera #27071

Comments

butonic commented Feb 1, 2017 • edited by DeepDiver1975

phisch commented Feb 1, 2017

DeepDiver1975 commented Feb 1, 2017

butonic commented Feb 1, 2017

DeepDiver1975 commented Feb 1, 2017

DeepDiver1975 commented Feb 1, 2017

Helios07 commented Feb 2, 2017

phisch commented Feb 2, 2017

butonic commented Feb 2, 2017

phisch commented Feb 2, 2017

dercorn commented Feb 6, 2017

butonic commented Feb 10, 2017

butonic commented Feb 10, 2017

PVince81 commented Apr 7, 2017

PVince81 commented Apr 7, 2017

butonic commented Apr 7, 2017

PVince81 commented Apr 7, 2017

PVince81 commented Apr 7, 2017

PVince81 commented Apr 7, 2017

butonic commented Apr 7, 2017

butonic commented Apr 7, 2017

PVince81 commented Apr 7, 2017

lock bot commented Aug 1, 2019

butonic commented Feb 1, 2017 •

edited by DeepDiver1975