Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugReport] Memory Leak on multipart_reader::currentHeaders #113

Closed
vrqq opened this issue Jun 30, 2019 · 8 comments · Fixed by #115
Closed

[BugReport] Memory Leak on multipart_reader::currentHeaders #113

vrqq opened this issue Jun 30, 2019 · 8 comments · Fixed by #115
Labels
bug Something isn't working

Comments

@vrqq
Copy link
Contributor

vrqq commented Jun 30, 2019

在上传小文件时会偶尔翻车,加AddressSanitizer提示function multipart_reader::cbHeaderEnd()函数内heap-use-after-free on address。

如图所示,将HTTP POST请求拆成2部分送达feed()函数
图链接
备用地址:draw.io

  • 我们称第一个request fragment送来的地址是bufferA
    第一次把bufferA送入multipart_parser::feed()后,状态机调用multipart_reader::cbHeaderEnd()函数,将两个string_view存入multipart_headers::currentHeaders中,结束后currentHeaders.size()==1

  • 第二个request fragment随之到来,存于bufferB
    依旧是feed()调用cbHeaderEnd(),
    此时bufferA已经释放,导致上述两个multipart_headers::currentHeaders中的string_view悬空
    程序再次调用multipart_reader::cbHeaderEnd()时,multimap试图读取原有string_views内容时读到了已经被释放的地址,在此处crash。

如果幸运的话,分片不会把某一组header隔开,就不会触发bug。但也可以构造一个超级长的header,然后强行分片送抵服务器,就一定会触发。

解决方案?
想到把using multipart_headers = std::multimap<std::string_view, std::string_view>;
改成using multipart_headers = std::multimap<std::string, std::string>;
(我看到已经写了注释??)

//self->currentHeaders.emplace(std::string{ self->currentHeaderName.data(), self->currentHeaderName.length() },
// std::string{ self->currentHeaderValue.data(), self->currentHeaderValue.length() });

或者设计一个垃圾回收,触发multipart_reader::cbHeadersEnd()时,把之前的buffer再收掉。

还有一个想法 buffer改用类似链表结构,把已经submit的string_view直接砍掉,会不会造成内存很零散?。。。

另外还有一个enhancement
在class upload_file里面,打开文件失败了请给个提示。。

第二组bug: feed()状态机不能处理多次送达的buffer

针对这几行举例,依然是两个post碎片,CR没有随着第一次feed(buffer1)到来,而在下一次feed(buffer2)时才送来。但是string_view引用的buffer1已经没了。

case HEADER_VALUE_START:
if (c == SPACE) {
break;
}
headerValueMark = i;
state = HEADER_VALUE;
case HEADER_VALUE:
if (c == CR) {
dataCallback(onHeaderValue, headerValueMark, buffer, i, len, true, true);
callback(onHeaderEnd);
state = HEADER_VALUE_ALMOST_DONE;
}
break;

第一次送来的buffer:

-----------------------------4833311154639
Content-Disposition: form-data; name="file"; filename="20190628001428.png" (结尾没有\r)

第二次feed送来的buffer:

\r
Content-Type: image/png\r

PNG<binary data......binary data......>
-----------------------------4833311154639--

错误依旧为 heap-use-after-free in cinatra::multipart_reader::cbHeaderEnd

解决方案?
同上吧,我觉得存下来比较好,我们没办法保证数据包在哪里断掉。。

第三组bug:当feed()分片到达时,可能会打断正在解析的header内容

在multipart_reader.hpp中
self->currentHeaderNameself->currentHeaderValue 两个变量,第二次到达的会覆盖前一次到达的值。

同样我觉得用string比较好操作,直接连接起来。。或者就滚动buffer,改下状态机不做multipart_parser.hpp#L192-L194

这个bug的确定理由:

Breakpoint at multipart_reader.hpp:90
(gdb) p self->currentHeaderValue
	form-data; name=\"file\"; filename=\"2019

Breakpoint at multipart_reader.hpp:90
(gdb) p self->currentHeaderValue
	"0628001428.png\""

Breakpoint at connection.hpp:562
(gdb) p headers
	std::multimap with 2 elements = {["Content-Disposition"] = "1428.png\"", ["Content-Type"] = "image/png"}
@qicosmos qicosmos added the bug Something isn't working label Jul 1, 2019
@qicosmos
Copy link
Owner

qicosmos commented Jul 1, 2019

谢谢你提交issue,我会关注这个问题。

@vrqq
Copy link
Contributor Author

vrqq commented Jul 2, 2019

可以在feed()函数第一句输出buffer地址来确认这个问题,可以发现对于单次post请求,确实会有概率多次触发feed()函数。

@qicosmos
Copy link
Owner

qicosmos commented Jul 3, 2019

用postman怎么做这个测试的?

@vrqq
Copy link
Contributor Author

vrqq commented Jul 3, 2019

sorry我没有用postman,就在用firefox不断上传 27,046Bytes的图

@vrqq
Copy link
Contributor Author

vrqq commented Jul 3, 2019

测试我觉得可以模拟网络丢包试下,也许可以看到分片了。。
还有可以用超长文件名,构建一个加长header。。

server端我是在feed()输出下第一个包的大小,如果第一个包差不多能把header分开,就在断点停一下。
(我更新了第一条,又加了一个bug)

@vrqq
Copy link
Contributor Author

vrqq commented Jul 20, 2019

Hello, I submit a PR, could you have interesting to merge it after testing?...
(It also easy me to use this project)

@qicosmos
Copy link
Owner

你能准备一个测试的页面或者程序发出来吗,我想先用你提供的测试程序看看问题,确定问题在哪儿,然后再看pr

@vrqq
Copy link
Contributor Author

vrqq commented Jul 22, 2019

build cinatra_example and run this code with python 3:

import socket
import time

s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('127.0.0.1', 8080))
print('*Connected')

data1 = b'POST /upload_multipart HTTP/1.1\r\n' \
b'Host: 127.0.0.1\r\n' \
b'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0\r\n' \
b'Accept: */*\r\n' \
b'Accept-Language: en-US,en;q=0.5\r\n' \
b'Accept-Encoding: gzip, deflate\r\n' \
b'Content-Type: multipart/form-data; boundary=---------------------------191691572411478\r\n' \
b'Content-Length: 925\r\n' \
b'Connection: keep-alive\r\n' \
b'Pragma: no-cache\r\n' \
b'Cache-Control: no-cache\r\n' \
b'\r\n' \
b'-----------------------------191691572411478\r\n' \
b'Content-Disposition: form-data; name="na'

data2 = b'me"\r\n' \
b'\r\n' \
b'plaintext_document_withlongname.txt\r\n' \
b'-----------------------------191691572411478\r\n' \
b'Content-Disposition: form-data; name="type"\r\n' \
b'\r\n' \
b'text/plain\r\n' \
b'-----------------------------191691572411478\r\n' \
b'Content-Disposition: form-data; name="size"\r\n' \
b'\r\n' \
b'259\r\n' \
b'-----------------------------191691572411478\r\n' \
b'Content-Disposition: form-data; name="lastModifiedDate"\r\n' \
b'\r\n' \
b'undefined\r\n' \
b'-----------------------------191691572411478\r\n' \
b'Content-Disposition: form-data; name="file"; filename="plaintext_docu'

data3 = b'ment_withlongname.txt"\r\n' \
b'Content-Type: text/plain\r\n' \
b'\r\n' \
b'The 2011 European Women Basketball Championship, commonly called EuroBasket Women 2011, was the 33rd regional championship held by FIBA Europe. The competition was held in Poland from 2011. This was the 4th time that the EuroBasket Women was hosted by Poland.\r\n' \
b'-----------------------------191691572411478--\r\n'

print (data1 + data2 + data3)

s.send(data1)
print('*SENT -A')
time.sleep(1)
s.send(data2)
print('*SENT -B')
time.sleep(1)
s.send(data3)
print('*SENT -C')
time.sleep(1)

s.close()
print('*Close')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants