Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread deadlocks on statement prepare under high load involving explicit locks #865

Closed
tvenhola opened this issue Oct 8, 2018 · 6 comments

Comments

@tvenhola
Copy link

tvenhola commented Oct 8, 2018

Issue description

Occasionally (race condition, I presume) preparing a SELECT / UPDATE / INSERT statement after issuing LOCK TABLES foo WRITE, bar READ; never returns. The transaction that is hanging has also been granted the locks effectively preventing other SQL connections to run their queries.

We have not been able to produce this problem in small example code but we have observed it in automated unit testing (only when tests are run in parallel) and validated that the transaction that is holding the locks is idle and inside a Tx.Prepare() function call. Killing the query connection enables the other transactions to continue normally.

In trying to reproduce the bug in smaller code we observed that we are able to produce degraded performance (connections and transactions hanging on for a minute before resolving) but not a complete deadlock. We suspect that this is because of our application queries and transactions are more complex and may be running for longer.

MariaDB [fuzz]> show processlist;
+-------+------+-----------------+------+---------+------+---------------------------------+-------------------------------------+----------+
| Id    | User | Host            | db   | Command | Time | State                           | Info                                | Progress |
+-------+------+-----------------+------+---------+------+---------------------------------+-------------------------------------+----------+
| 56455 | root | localhost       | fuzz | Query   |    0 | init                            | show processlist                    |    0.000 |
| 56493 | fuzz | localhost:51604 | fuzz | Query   |   53 | Waiting for table metadata lock | LOCK TABLES b WRITE                 |    0.000 |
| 56498 | fuzz | localhost:51614 | fuzz | Query   |   53 | Waiting for table metadata lock | LOCK TABLES a READ, b READ, c READ  |    0.000 |
| 56499 | fuzz | localhost:51616 | fuzz | Query   |   53 | Waiting for table metadata lock | LOCK TABLES a WRITE                 |    0.000 |
| 56500 | fuzz | localhost:51618 | fuzz | Query   |   52 | Waiting for table metadata lock | LOCK TABLES c WRITE, a READ, b READ |    0.000 |
| 56501 | fuzz | localhost:51622 | fuzz | Query   |   49 | Waiting for table metadata lock | LOCK TABLES b WRITE                 |    0.000 |
| 56502 | fuzz | localhost:51636 | fuzz | Query   |   53 | Opening tables                  | LOCK TABLES b WRITE, a READ, c READ |    0.000 |
| 56503 | fuzz | localhost:51648 | fuzz | Query   |   40 | Waiting for table metadata lock | LOCK TABLES a WRITE                 |    0.000 |
| 56504 | fuzz | localhost:51680 | fuzz | Query   |   40 | Waiting for table metadata lock | LOCK TABLES b READ                  |    0.000 |
| 56505 | fuzz | localhost:51682 | fuzz | Query   |   40 | Waiting for table metadata lock | LOCK TABLES a READ                  |    0.000 |
| 56506 | fuzz | localhost:51684 | fuzz | Query   |   53 | Waiting for table metadata lock | LOCK TABLES b WRITE                 |    0.000 |
+-------+------+-----------------+------+---------+------+---------------------------------+-------------------------------------+----------+
11 rows in set (0.00 sec)

Example code

With this code the jamming (which resolves eventually here) is present but full deadlock is not. This code will assume that a MariaDB database with name "fuzz" accessible for user "fuzz" without password is available. This will create 3 tables a, b and c and randomly read and write data in those. Eventually it will jam with 100% CPU utilization but no queries progressing and no disk activity.

package main

import (
	"database/sql"
	"fmt"
	_ "github.com/go-sql-driver/mysql"
	"log"
	"math/rand"
	"runtime/debug"
	"strings"
	"time"
)

var tables = []string{"a", "b", "c"}
var initStatements = []string{
	"DROP TABLE IF EXISTS c",
	"DROP TABLE IF EXISTS b",
	"DROP TABLE IF EXISTS a",
	"CREATE TABLE a (id int not null primary key auto_increment, val int not null)",
	"CREATE TABLE b (id int not null primary key auto_increment, val int not null, aid int not null, constraint foreign key (aid) references a(id))",
	"CREATE TABLE c (id int not null primary key auto_increment, val int not null, bid int not null, constraint foreign key (bid) references b(id))",
	"INSERT INTO a (val) values (1), (2), (3)",
	"INSERT INTO b (val, aid) values (1, 1), (2, 2), (3, 3)",
	"INSERT INTO c (val, bid) values (1, 1), (2, 2), (3, 3)",
}
var endTime time.Time
const maxRunTime = time.Hour
const threads = 10

func main() {
	endTime = time.Now().Add(maxRunTime)
	conn, err := connect("fuzz@tcp(localhost:3306)/fuzz")
	if err != nil {
		panic(err.Error())
	}

	err = initDB(conn)
	if err != nil {
		panic(err.Error())
	}

	for i := 0; i < threads; i++ {
		go fuzz(conn)
	}
	for time.Now().Before(endTime) {
	}
}

func fuzz(conn *sql.DB) {
	var err error
	for time.Now().Before(endTime) {
		op := rand.Intn(2)
		randomTables := randomTableSet(tables)
		switch op {
		case 0:
			err = updateTables(conn, randomTables)
		case 1:
			err = readTables(conn, randomTables)
		}
		if err != nil {
			fmt.Println(err.Error())
		}
	}
}

func connect(ds string) (conn *sql.DB, err error) {
	conn, err = sql.Open("mysql", ds)
	if err != nil {
		debug.PrintStack()
		log.Fatal(err)
	}
	err = conn.Ping()
	return
}

func initDB(conn *sql.DB) (err error) {
	for _, s := range initStatements {
		_, err = conn.Exec(s)
		if err != nil {
			fmt.Println("Failed " + s)
			return
		}
	}
	return
}

func updateTables(conn *sql.DB, tableNames []string) (err error) {
	table, others := randomTable(tableNames)
	fmt.Printf("WRITE %s READ %s\n",table, strings.Join(others, ", "))
	tx, err := conn.Begin()
	if err != nil {
		return
	}

	lockQuery := "LOCK TABLES " + table + " WRITE"
	if len(others) > 0 {
		lockQuery = lockQuery + ", " + strings.Join(others, " READ, ") + " READ"
	}

	_, err = tx.Exec(lockQuery)
	if err != nil {
		return
	}
	time.Sleep(time.Millisecond*150)
	id := randomId()
	stmt, err := tx.Prepare("UPDATE " + table + " SET val = val + round(rand()*2 - 1) WHERE id = ?")
	if err != nil {
		return
	}
	_, err = stmt.Exec(id)
	if err != nil {
		return
	}
	time.Sleep(time.Millisecond*50)
	err = stmt.Close()
	if err != nil {
		return
	}
	_, err = tx.Exec("UNLOCK TABLES")
	if err != nil {
		return
	}
	tx.Commit()

	return
}

func readTables(conn *sql.DB, tableNames []string) (err error) {
	fmt.Println("READ " + strings.Join(tableNames, ", "))
	tx, err := conn.Begin()
	if err != nil {
		return
	}
	_, err = tx.Exec("LOCK TABLES " + strings.Join(tableNames, " READ, ") + " READ")
	if err != nil {
		return
	}
	time.Sleep(time.Millisecond*150)
	tables := strings.Join(tableNames, ",")
	stmt, err := tx.Prepare("SELECT * FROM " + tables)
	if err != nil {
		return
	}
	_, err = stmt.Exec()
	if err != nil {
		return
	}
	time.Sleep(time.Millisecond*50)
	err = stmt.Close()
	if err != nil {
		return
	}
	_, err = tx.Exec("UNLOCK TABLES")
	if err != nil {
		return
	}
	err = tx.Commit()
	if err != nil {
		return
	}
	return
}

func randomId() int {
	return 1 + rand.Intn(3)
}

func randomTable(tables []string) (tab string, others []string) {
	i := rand.Intn(len(tables))
	tab = tables[i]
	others = make([]string, 0)
	for _, t := range tables {
		if t != tab {
			others = append(others, t)
		}
	}
	return
}

func randomTableSet(tables []string) []string {
	i := rand.Intn(len(tables))
	j := 1+rand.Intn(len(tables))
	if j < i {
		j, i = i, j
	}
	if i == j {
		j = j + 1
	}
	return tables[i:j]
}

Error log

From unit test run that jammed (first row lists the tx that's owned by the thread not progressing; 3 tables are locked for writes for that transaction)

MariaDB [test]> SHOW ENGINE INNODB STATUS;

---clip---

LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
mysql tables in use 3, locked 3
MySQL thread id 10187, OS thread handle 0x7f57a9f1fb00, query id 1576395 localhost 127.0.0.1 test 
---TRANSACTION 0, not started
MySQL thread id 10192, OS thread handle 0x7f57ae933b00, query id 1576444 localhost 127.0.0.1 test Waiting for table metadata lock
LOCK TABLES c WRITE,s WRITE,o READ
---TRANSACTION 11561723, not started
MySQL thread id 10188, OS thread handle 0x7f5710a22b00, query id 1576479 localhost 127.0.0.1 test Waiting for table metadata lock
LOCK TABLES u READ,o READ,c READ,v WRITE,vp WRITE,s READ
---TRANSACTION 11561721, not started
MySQL thread id 10183, OS thread handle 0x7f57ae97db00, query id 1576459 localhost 127.0.0.1 test Waiting for table metadata lock
LOCK TABLES c READ
---TRANSACTION 11561661, not started
MySQL thread id 10173, OS thread handle 0x7f579831fb00, query id 1576439 localhost 127.0.0.1 test 
---TRANSACTION 11561713, not started
MySQL thread id 10137, OS thread handle 0x7f57ae8e9b00, query id 1576415 localhost 127.0.0.1 test 
---TRANSACTION 11561702, not started
MySQL thread id 10136, OS thread handle 0x7f5721ecab00, query id 1576442 localhost 127.0.0.1 test Waiting for table metadata lock
LOCK TABLES v WRITE,vp WRITE,vs READ,vc READ
---TRANSACTION 0, not started
MySQL thread id 9633, OS thread handle 0x7f57a9fb3b00, query id 1576520 localhost root init
show engine innodb status
---TRANSACTION 11045635, not started
MySQL thread id 1, OS thread handle 0x7f57b2b40b00, query id 0 Waiting for background binlog tasks

---clip---

After killing connection 10187 the error that was produced in go was
[mysql] 2018/10/04 15:20:12 packets.go:36: unexpected EOF
and the rest of the test threads executed normally.

Configuration

Driver version (or git SHA): 1.4.0

Go version: go version go1.11 linux/amd64

Server version: 10.1.36-MariaDB-1~xenial

Server OS: Ubuntu 16.04

@methane
Copy link
Member

methane commented Oct 8, 2018

Here is issue tracker of driver. This is not a MySQL user forum or MySQL support.
Why do you post it here??
It seems just a behavior of MariaDB, and we are not free MariaDB consultant or tech support.

@tvenhola
Copy link
Author

tvenhola commented Oct 8, 2018

This is not a MariaDB issue. This is the driver thread deadlocking and not returning from statement prepare for a query.

@methane
Copy link
Member

methane commented Oct 8, 2018

driver doesn't use any locks between threads(connections).
Deadlock must not happened in driver's side.

@methane
Copy link
Member

methane commented Oct 8, 2018

Can you get goroutine dump when deadlock happens?
I think prepare statement is just waiting metadata lock for long time.

@julienschmidt

This comment has been minimized.

@methane
Copy link
Member

methane commented Oct 9, 2018

After looking and running your code, it seems normal deadlock on the server.
100% CPU utilization may be caused by MariaDB's lock. Same deadlock is happend on MySQL 8, but MySQL 8 doesn't use 100% CPU.

@methane methane closed this as completed Oct 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants