[BUG] Gateway downtime due to repeated unlocking #3147

@soluty

Description

OpenIM Server Version

3.8.2 (the latest version is probably affected as well)

Operating System and CPU Architecture

Linux (AMD)

Deployment Method

Source Code Deployment

Bug Description and Steps to Reproduce

The error is as follows:

fatal error: sync: unlock of unlocked mutex
goroutine 460 [running]:
sync.fatal({0x16d89f0?, 0x121e2dc?})
	/root/.vmr/versions/go_versions/go/src/runtime/panic.go:1007 +0x18
sync.(*Mutex).unlockSlow(0xc02f5d34e0, 0xffffffff)
	/root/.vmr/versions/go_versions/go/src/sync/mutex.go:229 +0x35
sync.(*Mutex).Unlock(...)
	/root/.vmr/versions/go_versions/go/src/sync/mutex.go:223
github.com/openimsdk/open-im-server/v3/internal/msggateway.(*Client).writeBinaryMsg(0xc03b294820, {0x7d1, {0x0, 0x0}, {0xc03117c7b0, 0x22}, 0x0, {0x0, 0x0}, {0xc02d218000, ...}})
	/repo/open-im-server/internal/msggateway/client.go:369 +0x1a8
github.com/openimsdk/open-im-server/v3/internal/msggateway.(*Client).PushMessage(0xc03b294820, {0x19176d8, 0xc006be2630}, 0xc006be42c0)
	/repo/open-im-server/internal/msggateway/client.go:325 +0x36b
github.com/openimsdk/open-im-server/v3/internal/msggateway.(*Server).pushToUser(0xc0001d07e0, {0x19176d8, 0xc006be2630}, {0xc02b8b8f90, 0x14}, 0xc006be42c0)
	/repo/open-im-server/internal/msggateway/hub_server.go:150 +0x3b0
github.com/openimsdk/open-im-server/v3/internal/msggateway.(*Server).SuperGroupOnlineBatchPushOneMsg.func1()
	/repo/open-im-server/internal/msggateway/hub_server.go:177 +0x45
github.com/openimsdk/tools/mq/memamq.(*MemoryQueue).initialize.func1()
	/root/go/pkg/mod/github.com/openimsdk/tools@v0.0.50-alpha.32/mq/memamq/queue.go:54 +0x75
created by github.com/openimsdk/tools/mq/memamq.(*MemoryQueue).initialize in goroutine 1
	/root/go/pkg/mod/github.com/openimsdk/tools@v0.0.50-alpha.32/mq/memamq/queue.go:51 +0x65

The relevant code and my analysis are below.
In msggateway/client.go, the Client has a lock, c.w, which is touched by both the writeBinaryMsg method and the ResetClient method:

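func (c *Client) writeBinaryMsg(resp Resp) error {
	if c.closed.Load() {
		return nil
	}

	encodedBuf, err := c.Encoder.Encode(resp)
	if err != nil {
		return err
	}

	// The write path locks whatever mutex c.w points to at this moment.
	c.w.Lock()
	defer c.w.Unlock()
	// ... (rest of the function omitted)
}

func (c *Client) ResetClient(ctx *UserConnContext, conn LongConn, longConnServer LongConnServer) {
	// Every reset replaces the mutex with a brand-new, unlocked one.
	c.w = new(sync.Mutex)
	// ... (rest of the function omitted)
}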

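writeBinaryMsg acquires the lock through c.w, but ResetClient then replaces c.w with a new mutex. From that point on, c.w in writeBinaryMsg refers to the replacement mutex, which was never locked, so the program crashes with "unlock of unlocked mutex" when it reaches the unlock.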
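
The mechanism can be illustrated in isolation. The following is only a minimal sketch, not the gateway code: `client`, `reset`, and `swapDemo` are hypothetical names, and the interleaving is forced sequentially instead of coming from two goroutines. It uses `TryLock` (Go 1.18+) to probe both mutexes without triggering the fatal error.

```go
package main

import (
	"fmt"
	"sync"
)

// client is a hypothetical stand-in for msggateway.Client;
// only the mutex field is modeled here.
type client struct {
	w *sync.Mutex
}

// reset mimics the ResetClient line quoted above:
// it replaces the mutex pointer with a fresh mutex.
func (c *client) reset() {
	c.w = new(sync.Mutex)
}

// swapDemo locks c.w, resets the client while the lock is held,
// and reports the state of the old and the new mutex.
func swapDemo() (oldStillLocked, newIsUnlocked bool) {
	c := &client{w: new(sync.Mutex)}
	old := c.w

	c.w.Lock() // the writer acquires the original mutex
	c.reset()  // the client is reset; c.w now points elsewhere

	// TryLock fails on a held mutex and succeeds on a free one.
	oldStillLocked = !old.TryLock()
	newIsUnlocked = c.w.TryLock()
	if newIsUnlocked {
		c.w.Unlock() // undo the probe
	}

	// Calling c.w.Unlock() here without the probe would be the
	// "unlock of unlocked mutex" case. Release the original instead.
	old.Unlock()
	return oldStillLocked, newIsUnlocked
}

func main() {
	fmt.Println(swapDemo()) // true true
}
```

The original mutex stays locked while `c.w` already points at a mutex nobody holds, so any unlock that goes through `c.w` after the swap hits an unlocked mutex.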
My current change is as follows:

func (c *Client) ResetClient(ctx *UserConnContext, conn LongConn, longConnServer LongConnServer) {
	// Create the mutex only once; later resets reuse it.
	if c.w == nil {
		c.w = new(sync.Mutex)
	}
	// ... (rest of the function omitted)
}

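A single lock per Client object should be enough. I am not sure whether this change could cause other problems, and I would also like to ask how you plan to fix this bug.
This issue also seems to have roughly the same root cause as the concurrent write problem I ran into earlier.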

Screenshots Link

No response

Labels

bug (Categorizes issue or PR as related to a bug.)
