rewriteaof may deadlock #32

Closed
huangleee opened this issue Jul 26, 2021 · 7 comments

Comments

@huangleee

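The AOF rewrite proceeds as follows:

  1. Start the rewrite (the startRewrite method)

    1. Acquire the write lock, pausing AOF writes

    2. Record the current size of the AOF file

    3. Create the rewriteBuffer channel (a buffered channel)

    4. Create a temporary file

    5. Release the write lock, resuming AOF writes

  2. Read the AOF file and load its contents into a temporary DB object

  3. Generate commands from the data in the temporary DB object and write them to the temporary file

  4. Finish the rewrite

    1. Acquire the write lock, pausing AOF writes

    2. Drain rewriteBuffer into the temporary AOF file

    3. Close rewriteBuffer and set it to nil

    4. Rename the temporary file to the AOF file

    5. Open the new AOF file and assign it to db.aofFile

    6. Release the write lock, resuming AOF writes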

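Between steps 1 and 4, the main program can keep writing to the aof chan. While handling data from the aof chan, handleAof also writes a copy of each command to the rewriteBuffer chan. This causes a problem: if enough commands are written to the AOF during the rewrite to exceed the capacity of rewriteBuffer, handleAof acquires the read lock, then blocks on the send to rewriteBuffer and cannot release the read lock.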
image

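finishRewrite, for its part, must acquire the write lock before it starts receiving from the rewriteBuffer chan. Since the lock is already held by handleAof, finishRewrite can never acquire it, and the result is a deadlock.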

image
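
Since the screenshots are not reproduced here, the following is a minimal sketch of the lock ordering described above, not the project's actual code. The aofState type, the pausingAof field name, the locked signal channel, the timeout, and runScenario are all assumptions made for illustration; only handleAof, finishRewrite, and rewriteBuffer are names taken from this thread.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// aofState is a hypothetical, stripped-down stand-in for the DB fields
// discussed in this issue; it is not the project's real struct.
type aofState struct {
	pausingAof    sync.RWMutex
	rewriteBuffer chan string
}

// handleAof takes the read lock, then mirrors cmd into rewriteBuffer.
// If the buffer is full, the send blocks while the read lock is still held.
// locked is closed once the read lock has been acquired.
func (s *aofState) handleAof(cmd string, locked chan<- struct{}) {
	s.pausingAof.RLock()
	defer s.pausingAof.RUnlock()
	close(locked)
	s.rewriteBuffer <- cmd
}

// finishRewrite takes the write lock and only then drains rewriteBuffer.
// It returns the number of drained commands, or -1 if the write lock
// could not be acquired within timeout.
func (s *aofState) finishRewrite(timeout time.Duration) int {
	acquired := make(chan struct{})
	go func() {
		s.pausingAof.Lock()
		close(acquired)
	}()
	select {
	case <-acquired:
	case <-time.After(timeout):
		return -1 // the locking goroutine is leaked; acceptable in a demo
	}
	defer s.pausingAof.Unlock()
	drained := 0
	for {
		select {
		case <-s.rewriteBuffer:
			drained++
		default:
			return drained
		}
	}
}

// runScenario pre-fills the buffer with queued commands (queued must not
// exceed bufSize), lets handleAof mirror one more command, and then
// attempts finishRewrite.
func runScenario(bufSize, queued int) int {
	s := &aofState{rewriteBuffer: make(chan string, bufSize)}
	for i := 0; i < queued; i++ {
		s.rewriteBuffer <- "SET k v"
	}
	locked := make(chan struct{})
	go s.handleAof("SET k2 v2", locked)
	<-locked // handleAof now holds the read lock
	return s.finishRewrite(200 * time.Millisecond)
}

func main() {
	fmt.Println("buffer already full:", runScenario(2, 2))
	fmt.Println("buffer has room:   ", runScenario(2, 1))
}
```

In this sketch the first scenario should report -1, because handleAof is stuck on the send while holding the read lock and the only reader of the buffer is waiting for the write lock. The second scenario should drain both commands, since the send completes and the read lock is released.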

@HDT3213
Owner

HDT3213 commented Jul 26, 2021

Got it. This is genuinely tricky to handle. I'll move the rewrite buffer into a temporary file instead.

@huangleee
Author

Here is one possible fix:

image

@HDT3213
Owner

HDT3213 commented Jul 28, 2021

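This approach doesn't solve the problem that aofRewriteBuffer <- cmd can block. Once handleAof is blocked, db.aofChan soon fills up as well, which in turn blocks the addAof function on the main goroutine. Eventually all write commands are blocked while the number of goroutines climbs rapidly. I think it is better to fix the root cause, namely that aofRewriteBuffer can fill up.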
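
To illustrate the knock-on effect on addAof, here is a small sketch under simplifying assumptions: the consumer (handleAof) is modelled as permanently stalled, each writer does a single addAof-style send, and countStalledWriters and its parameters are hypothetical names, not part of the project.

```go
package main

import (
	"fmt"
	"time"
)

// countStalledWriters starts `writers` goroutines that each perform one
// addAof-style send into a buffered channel that nobody drains, which
// models a stalled handleAof. After `wait` it reports how many sends
// completed and how many goroutines are still blocked.
func countStalledWriters(aofChanSize, writers int, wait time.Duration) (completed, blocked int) {
	aofChan := make(chan string, aofChanSize)
	done := make(chan struct{}, writers)
	for i := 0; i < writers; i++ {
		go func() {
			aofChan <- "SET k v" // blocks once aofChan is full
			done <- struct{}{}
		}()
	}
	deadline := time.After(wait)
	for {
		select {
		case <-done:
			completed++
		case <-deadline:
			return completed, writers - completed
		}
	}
}

func main() {
	completed, blocked := countStalledWriters(3, 10, 200*time.Millisecond)
	fmt.Printf("completed=%d blocked=%d\n", completed, blocked)
}
```

Only as many sends as the channel has capacity can complete; every further writer stays parked on the send, which is the goroutine build-up described above.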

@huangleee
Author

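Right, the approach above only fixes the deadlock. To fix the blocking on aofRewriteBuffer, there is another option:

  1. Remove aofRewriteBuffer entirely. In handleAof, after receiving data from aofChan, acquire the lock and no longer mirror the data into aofRewriteBuffer.
  2. startRewrite already records the AOF file's size, filesize A, which finishRewrite can make use of.
  3. Once finishRewrite holds the lock, handleAof cannot acquire it and so cannot keep writing to the AOF file. finishRewrite then syncs the AOF file to disk. At that point, everything from filesize A to the end of the AOF file is exactly the set of requests handled during the rewrite. Read those operations out, write them to the temporary file, and finally mv it into place.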
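
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the change that was eventually committed: copyTail and rewriteDemo are hypothetical helpers, the line-based "SET k v" records are a placeholder for the real AOF encoding, and locking is omitted because the demo is single-threaded.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// copyTail appends to dst everything in the file at aofPath from byte
// offset fileSize to the end, and returns the number of bytes copied.
func copyTail(aofPath string, fileSize int64, dst io.Writer) (int64, error) {
	src, err := os.Open(aofPath)
	if err != nil {
		return 0, err
	}
	defer src.Close()
	if _, err := src.Seek(fileSize, io.SeekStart); err != nil {
		return 0, err
	}
	return io.Copy(dst, src)
}

// rewriteDemo walks through the proposed flow on a throwaway file and
// returns the final AOF contents.
func rewriteDemo() (string, error) {
	dir, err := os.MkdirTemp("", "aof-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	aofPath := filepath.Join(dir, "appendonly.aof")

	// AOF contents before the rewrite starts.
	if err := os.WriteFile(aofPath, []byte("SET a 0\nSET a 1\n"), 0o644); err != nil {
		return "", err
	}
	// startRewrite: remember the current file size ("filesize A").
	info, err := os.Stat(aofPath)
	if err != nil {
		return "", err
	}
	fileSize := info.Size()

	// A write command arrives while the rewrite is running.
	aof, err := os.OpenFile(aofPath, os.O_APPEND|os.O_WRONLY, 0o644)
	if err != nil {
		return "", err
	}
	defer aof.Close()
	if _, err := aof.WriteString("SET b 2\n"); err != nil {
		return "", err
	}
	// finishRewrite (the write lock would be held here): flush to disk first.
	if err := aof.Sync(); err != nil {
		return "", err
	}

	// Temporary file holding the compacted data, followed by the tail.
	tmp, err := os.CreateTemp(dir, "rewrite-*.aof")
	if err != nil {
		return "", err
	}
	if _, err := tmp.WriteString("SET a 1\n"); err != nil {
		tmp.Close()
		return "", err
	}
	if _, err := copyTail(aofPath, fileSize, tmp); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}
	// Replace the old AOF file with the rewritten one.
	if err := os.Rename(tmp.Name(), aofPath); err != nil {
		return "", err
	}
	out, err := os.ReadFile(aofPath)
	return string(out), err
}

func main() {
	out, err := rewriteDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Because the tail is read from the file rather than from a channel, there is no bounded buffer that a burst of writes could fill during the rewrite.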

@HDT3213
Owner

HDT3213 commented Jul 30, 2021

Good idea

@holicc

holicc commented Aug 17, 2021

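Read the AOF file and load its contents into a temporary DB object

If the current database holds 1 GB of data, wouldn't loading the AOF file again take up another 1 GB of memory?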

@HDT3213
Owner

HDT3213 commented Oct 24, 2021

fixed in 48ba261

@HDT3213 HDT3213 closed this as completed Oct 24, 2021