LOCK TABLES can lead to crashes or locks when used with Galera #27071
What is weird is that an instance with this error works fine after a repair table. Even if this is caused by background jobs, a request to index.php should hold its own connection, which should run without any problems. But we could see that no requests at all would work; all requests got the same error. The table is perfectly accessible if you connect to the database directly and run the query manually.
https://mariadb.com/kb/en/mariadb/mariadb-galera-cluster-known-limitations/ Bloody hell ..... is there a Docker image with Galera inside? I'll immediately put this into the CI pipeline.
Let's try this one.
Which version of MariaDB/MySQL are we talking about? At least in my Galera cluster running MariaDB 5.5, a repair table is not possible for an InnoDB table (and there will most probably be no MyISAM tables):
Even if this works in newer versions, will the repair table be propagated? That would explain the error occurring again at a later time, when you might be connected to another database node. By the way, I have a test installation with such a Galera cluster where I do not get the error.
AFAIK it says that the engine doesn't support repair, but repair actually does multiple steps and the message comes from one of those.
@Helios07 Indeed, we got the same output with 'The storage engine for the table doesn't support repair' BUT the table worked afterwards.
SELECT
     pl.id
    ,pl.user
    ,pl.state
    ,r.trx_id
    ,r.trx_mysql_thread_id
    ,r.trx_query           AS waiting_query
    ,b.trx_id              AS blocking_trx_id
    ,b.trx_mysql_thread_id AS blocking_thread
    ,b.trx_query           AS blocking_query
FROM information_schema.innodb_lock_waits AS ilw
INNER JOIN information_schema.innodb_trx AS r
    ON r.trx_id = ilw.requesting_trx_id
INNER JOIN information_schema.innodb_trx AS b
    ON b.trx_id = ilw.blocking_trx_id
INNER JOIN information_schema.processlist AS pl
    ON pl.id = r.trx_mysql_thread_id

We may also try https://github.com/innotop/innotop to monitor the locks. AFAICT it should be the oc_jobs table. If so, we will know where to look for an alternative solution.
@butonic I tried listing open tables, but at the time the instance was unusable there was no table in use or locked. I did not check the processlist, though.
@butonic @phisch @DeepDiver1975 any progress here? Or any ideas on how we might isolate the cause of the problem? |
{"reqId":"uY8\/fpZZIdA964fn8\/hn","remoteAddr":"10.10.2.245","app":"remote","message":"Exception: {\"Exception\":\"Doctrine\\\\DBAL\\\\Exception\\\\DriverException\",\"Message\":\"An exception occurred while executing 'UPDATE `oc_authtoken` SET `last_activity` = ? WHERE `id` = ?' with params [1486376134, 247]:\\n\\nSQLSTATE[HY000]: General error: 1100 Table 'oc_authtoken' was not locked with LOCK TABLES\",\"Code\":0,\"Trace\":\"#0 \\\/var\\\/www\\\/owncloud\\\/3rdparty\\\/doctrine\\\/dbal\\\/lib\\\/Doctrine\\\/DBAL\\\/DBALException.php(116): Doctrine\\\\DBAL\\\\Driver\\\\AbstractMySQLDriver->convertException('An exception oc...', Object(Doctrine\\\\DBAL\\\\Driver\\\\PDOException))\\n#1 \\\/var\\\/www\\\/owncloud\\\/3rdparty\\\/doctrine\\\/dbal\\\/lib\\\/Doctrine\\\/DBAL\\\/Statement.php(174): Doctrine\\\\DBAL\\\\DBALException::driverExceptionDuringQuery(Object(Doctrine\\\\DBAL\\\\Driver\\\\PDOMySql\\\\Driver), Object(Doctrine\\\\DBAL\\\\Driver\\\\PDOException), 'UPDATE `oc_auth...', Array)\\n#2 \\\/var\\\/www
The instance works fine again after a repair table. Note that other connections are not affected: when we lock the oc_jobs table via the mysql CLI, the web UI still works as designed because it does not require the oc_jobs table. Yet the error is seen in the web UI. In theory a web request must have created a lock and then tried to touch another table. How is that possible? Hm ... their NetScaler load balancer sends all queries to one of two Galera cluster master nodes in active/passive mode. Maybe a connection is reused that hasn't been closed correctly? Also, since @Helios07 doesn't see this problem in his test instance, we may need to ping people with more Galera cluster know-how. Someone like @ayurchen or @temeo.
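For what it's worth, error 1100 can be reproduced on a plain single-node MariaDB, no Galera required, whenever a session holding an explicit LOCK TABLES touches a table it did not lock. A minimal sketch (table names taken from the log above, values illustrative):

```sql
LOCK TABLES oc_jobs WRITE;

-- While this session holds the lock, any statement on a table that was
-- not named in the LOCK TABLES statement fails in this same session:
UPDATE oc_authtoken SET last_activity = 1486376134 WHERE id = 247;
-- ERROR 1100 (HY000): Table 'oc_authtoken' was not locked with LOCK TABLES

UNLOCK TABLES;
```

So if a reused/pooled connection were handed to a web request while a leftover LOCK TABLES from a previous use was still in effect, every query on other tables would fail exactly like in the log above.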
Any update on this? We need to do another RC2 with a fix for this.
Should we try getting rid of any SQL LOCK commands and implement custom locking? It sucks if we can't use native DB commands, though.
I think we should be able to get around a table lock. I implemented the necessary kind of locking already. A different approach may be #25100.
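One way to get around LOCK TABLES entirely is to claim a job with a single atomic UPDATE that acts as a compare-and-swap; Galera certifies it like any ordinary row change. A sketch, assuming a hypothetical reserved_at column on oc_jobs (not the actual core schema):

```sql
-- Claim job 42 only if nobody has reserved it within the last hour.
UPDATE oc_jobs
   SET reserved_at = UNIX_TIMESTAMP()
 WHERE id = 42
   AND reserved_at < UNIX_TIMESTAMP() - 3600;
```

If the affected-row count is 1, this worker owns the job; if it is 0, another node got there first. No cross-node lock is needed, because conflicting writes to the same row are resolved by Galera's certification.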
#25100 is not backportable, unless you mean extract the locking logic |
What's the next step? Reopening/porting https://github.com/owncloud/core/pull/25771/files? Who can work on this?
Some web requests, like deleting a file to trash or overwriting a file, might schedule a trashbin/version expiration by inserting a row into oc_jobs. I don't think there is any explicit LOCK command there, but maybe it happens implicitly.
No, core#25771 does not remove the lock. Working on it.
I took a detour through subselect magic and the dark arts of bending the QueryBuilder to do what I want it to do ... it didn't work out as expected. Now I have a much simpler solution: #27597
Fix to be released with 9.1.5 |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
See https://mariadb.com/kb/en/mariadb/lock-tables-and-unlock-tables/#limitations
Also, we are seeing installations that have to do a 'repair table' to fix the cluster. The specific problem is:
The documentation only explains:
AFAICT locking is only used for the oc_jobs table: https://github.com/owncloud/core/blob/master/lib/private/BackgroundJob/JobList.php#L191-L210 However, a WRITE lock is intended to prevent other connections from updating the table. Galera does not propagate locks ... no idea what exactly is causing the error.
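For context, the pattern in JobList.php amounts to roughly the following (a paraphrase, not the exact core code):

```sql
LOCK TABLES oc_jobs WRITE;   -- only the local node knows about this lock
SELECT @id := id FROM oc_jobs ORDER BY last_checked ASC LIMIT 1;
UPDATE oc_jobs SET last_checked = UNIX_TIMESTAMP() WHERE id = @id;
UNLOCK TABLES;
```

Since Galera does not replicate LOCK TABLES, a second cluster node can run the same sequence concurrently and both nodes can pick the same job; the lock only serializes connections hitting the same node.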
Introduced with d0a2fa0 ... seems to be released with 9.1.0
Any other installation running 9.1 on Galera without this problem?
cc @DeepDiver1975 @felixboehm @PhilippSchaffrath @dercorn @IljaN