Stream parser that doesn't buffer the entire message #105

evpopov · 2022-12-16T13:14:59Z

Hi,
Parts of this may have been touched on in #100 but I wanted to start a clean discussion here.

I'm trying to redesign an RPC-like protocol to use MessagePack. The protocol was historically based on TLVs and
runs on a bare-metal system that is quite constrained on memory. The idea is for my device to accept
calls over a TCP stream. Each call would consist of a command, parameters, data, etc., and the device would execute
whatever is being requested. One of those requests could be a file upload or firmware upgrade, and naturally
such an RPC call would be hundreds of KB, if not a megabyte or two. I don't have anywhere near enough RAM to buffer
the entire message. Historically, I'd have separate TLVs for the command and the data, which lets me know how long
each section is and skip over parts of the request that I don't have to parse. I'd parse the command TLV first, so
I'd know that, for example, I needed to save the data TLV to a file while that data TLV was being parsed a few hundred
bytes at a time. Using TLVs also helps when transferring data over TCP, because TCP is stream-based and in many
cases I need to know the size of the transfer ahead of time. MessagePack gives me the same benefit here because
a MessagePack object is of known size.
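
A minimal sketch of this kind of incremental TLV handling, for illustration only. The wire layout (1-byte type, 4-byte big-endian length, then the value) and every name in it (`tlv_parser`, `tlv_feed`, `tlv_stats_on_chunk`) are assumptions made up for this example, not the actual protocol. The point is that the parser keeps only a 5-byte header and a counter, and hands value bytes to a callback as they arrive:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: 1-byte type, 4-byte big-endian length, value. */
#define TLV_HEADER_SIZE 5

/* Called once per slice of value data; `remaining` is the number of value
 * bytes of the current TLV still to come (0 means the TLV is complete). */
typedef void (*tlv_chunk_fn)(void *ctx, uint8_t type, const uint8_t *data,
                             size_t len, uint32_t remaining);

typedef struct {
    uint8_t      header[TLV_HEADER_SIZE];
    size_t       header_have;  /* header bytes collected so far */
    uint32_t     value_left;   /* value bytes still expected    */
    tlv_chunk_fn on_chunk;
    void        *ctx;
} tlv_parser;

void tlv_init(tlv_parser *p, tlv_chunk_fn on_chunk, void *ctx)
{
    p->header_have = 0;
    p->value_left  = 0;
    p->on_chunk    = on_chunk;
    p->ctx         = ctx;
}

/* Feed any number of bytes; never blocks and never stores value data. */
void tlv_feed(tlv_parser *p, const uint8_t *data, size_t len)
{
    while (len > 0) {
        if (p->header_have < TLV_HEADER_SIZE) {
            /* Still collecting the header, one byte at a time. */
            p->header[p->header_have++] = *data++;
            len--;
            if (p->header_have == TLV_HEADER_SIZE) {
                p->value_left = ((uint32_t)p->header[1] << 24) |
                                ((uint32_t)p->header[2] << 16) |
                                ((uint32_t)p->header[3] << 8)  |
                                 (uint32_t)p->header[4];
                if (p->value_left == 0) {
                    /* Empty value: report completion immediately. */
                    p->on_chunk(p->ctx, p->header[0], data, 0, 0);
                    p->header_have = 0;
                }
            }
            continue;
        }
        /* Inside a value: pass through as much as this chunk contains. */
        size_t n = len < p->value_left ? len : (size_t)p->value_left;
        p->value_left -= (uint32_t)n;
        p->on_chunk(p->ctx, p->header[0], data, n, p->value_left);
        data += n;
        len  -= n;
        if (p->value_left == 0)
            p->header_have = 0;  /* next byte starts a new TLV */
    }
}

/* Example callback: tallies value bytes per type instead of writing a file. */
typedef struct { size_t bytes[256]; unsigned completed; } tlv_stats;

void tlv_stats_on_chunk(void *ctx, uint8_t type, const uint8_t *data,
                        size_t len, uint32_t remaining)
{
    tlv_stats *s = (tlv_stats *)ctx;
    (void)data;
    s->bytes[type] += len;
    if (remaining == 0)
        s->completed++;
}
```

In a real handler the callback would append each slice to the file or flash region selected by the earlier command TLV, so RAM use stays constant regardless of the upload size.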

I'm trying to redesign this TLV-based approach and have the entire request encoded inside a MessagePack message,
because this makes the protocol more "standard" rather than being defined by ad hoc proprietary TLVs. To do this,
I need to be able to "feed" the parser with arbitrary amounts of data as it becomes available while at the same
time handling whatever the parser has decoded so far. After digging through the very good manual and trying the
different APIs, I'm almost finding what I need, but not quite.

Node API: mpack_tree_init_stream() seems to be almost exactly what I need, because I can simulate the "feeding"
functionality through the read_fn() function and use mpack_tree_try_parse() to handle objects as they get decoded.
The read_fn() function can return zero if I don't have any new data, and life is good. Except that the Node API
expects to be able to buffer the whole message, and in my case the message may be a multi-megabyte file upload.

Reader API with fill and skip functions: This API gives me the freedom to parse the objects as they come in, which is
ideal, but the fill function is not allowed to return zero. The problem here is that my task cannot block. It has
other things to do. Periodically, it checks for new data from the socket and can "feed" that new data to the parser,
but I simply cannot block the task.
I thought maybe I could call mpack_read_tag() only once I have accumulated some data, but there is no way to know how
many bytes mpack_read_tag() will want to consume.
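
One possible direction, sketched under assumptions: the MessagePack format itself fixes how many bytes follow each type marker, so peeking at the first buffered byte is enough to tell how long the tag header is. The helpers below (`msgpack_header_size` and `msgpack_header_ready` are hypothetical names, not part of MPack) are derived from the MessagePack specification only. Whether mpack_read_tag() consumes exactly this many bytes for every type, in particular for ext types, is an assumption that would need to be checked against MPack itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by a MessagePack type marker plus its fixed-size fields
 * (integer/float value, length field, ext type byte). Payload bytes of
 * str/bin/ext and the children of arrays/maps are NOT included.
 * Returns 0 for the reserved marker 0xc1. */
size_t msgpack_header_size(uint8_t first)
{
    if (first <= 0xbf) return 1;  /* positive fixint, fixmap, fixarray, fixstr */
    if (first >= 0xe0) return 1;  /* negative fixint */

    switch (first) {
    case 0xc0: case 0xc2: case 0xc3:            return 1; /* nil, false, true   */
    case 0xc4: case 0xd9:                       return 2; /* bin8, str8         */
    case 0xc5: case 0xda:                       return 3; /* bin16, str16       */
    case 0xc6: case 0xdb:                       return 5; /* bin32, str32       */
    case 0xc7:                                  return 3; /* ext8: len + type   */
    case 0xc8:                                  return 4; /* ext16: len + type  */
    case 0xc9:                                  return 6; /* ext32: len + type  */
    case 0xca:                                  return 5; /* float32            */
    case 0xcb:                                  return 9; /* float64            */
    case 0xcc: case 0xd0:                       return 2; /* uint8, int8        */
    case 0xcd: case 0xd1:                       return 3; /* uint16, int16      */
    case 0xce: case 0xd2:                       return 5; /* uint32, int32      */
    case 0xcf: case 0xd3:                       return 9; /* uint64, int64      */
    case 0xd4: case 0xd5: case 0xd6:
    case 0xd7: case 0xd8:                       return 2; /* fixext: type byte  */
    case 0xdc: case 0xde:                       return 3; /* array16, map16     */
    case 0xdd: case 0xdf:                       return 5; /* array32, map32     */
    default:                                    return 0; /* 0xc1: never used   */
    }
}

/* True if `buf` (holding `have` buffered bytes) contains a complete header. */
bool msgpack_header_ready(const uint8_t *buf, size_t have)
{
    if (have == 0)
        return false;
    size_t need = msgpack_header_size(buf[0]);
    return need != 0 && have >= need;
}
```

A non-blocking task could accumulate socket data in a small buffer and only attempt to read a tag once `msgpack_header_ready()` returns true. The payload of a large bin or str would still have to be drained separately in whatever slices are available.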

Am I missing something? Is there a way to parse a stream and handle the data a few bytes at a time without
buffering the entire message?

Thanks in advance
