Add a streaming API #113

linkmauve · 2016-01-04T19:13:01Z

When a document is received in chunks (on XMPP for example), it makes sense to initialize the parser on the first chunk, and then feed it data as it comes.

xml::ParserConfig would get a new streaming boolean that would make it never emit xml::XmlEvent::EndDocument before the root tag is closed. xml::EventReader would get a feed method that takes a string and emits new xml::XmlEvents as they are parsed, plus a method to abort the stream.

netvl · 2016-01-04T20:14:01Z

Frankly, I'm not sure what you mean by streaming here. According to the XML standard, a document is either well-formed or not; well-formedness errors are hard errors, that is, they unconditionally stop the parsing process. If a document does not have a valid element structure (e.g. there is no closing tag for some opening tag), then the document is not well-formed. EventReader strives to follow the specification, so EndDocument should only be emitted if the root element is closed and there are no problems with the element structure. Otherwise it is a parsing error. If EventReader behaves differently, that is a bug.

Additionally, each EventReader has only one Read instance as a backing source; I'm not sure how additional methods which provide more data come into the picture here.

Could you please explain in detail how exactly your proposal would work and why the existing API cannot do the same thing?

linkmauve · 2016-01-04T20:25:15Z

I’m currently trying to write an XMPP client library, and this protocol starts with a prolog, then a <stream:stream …> root element with various attributes and namespace declarations, and then children of this stream:stream will be received on various events. When the user disconnects, they send </stream:stream> followed by EOF and wait for the server to do the same.

The XML “document” that makes this session is well-formed (any error will lead to the stream being closed), and has a few restrictions like no PI, no comment, no doctype/entity declaration, etc.

I am currently (ab?)using the xml::EventReader::from_str method to parse a single stanza (the name given to the direct children of stream:stream), but the issue is that it creates a new parser for every element received, and doesn’t play well with the namespaces declared earlier in the stream.

My first prototype created an xml::EventReader from the std::net::TcpStream corresponding to the session, but it waited for the stream to reach EOF instead of emitting xml::XmlEvents as they came.

I since moved to mio::tcp::TcpStream, which provides non-blocking IO but doesn’t implement std::io::Read, and also plan on adding TLS encryption to this stream, so pushing &str to the parser made sense but I’m open to any better suggestion.

netvl · 2016-01-04T20:46:25Z

Hmm, I see now, thanks for the explanation.

That EventReader waited for stream EOF before emitting the next event is likely a bug. Ideally it should read no more than absolutely needed from the stream to produce the next event, so if the next stanza is fully available in the stream, it should result in an event immediately. This is worth investigating, thanks.

And I do see the problem with byte sources not implementing Read. First, are you sure that mio::tcp::TcpStream does not implement Read? Its documentation hints otherwise. Second, I'm really unwilling to add any other byte sources aside from Read instances. I would suggest the following approach.

EventReader would provide a &mut R reference to its internal reader, and a configuration option would be added which would allow graceful passthrough of EOF errors. Then you can put a Cursor<Vec<u8>> inside the EventReader and push incoming data to it through &mut R reference, while reading events from it with conventional means.
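The buffer half of this suggestion can be tried with the standard library alone. The sketch below is not xml-rs code: it only shows that a Cursor<Vec<u8>> reports end of input once its buffer is exhausted and then yields more bytes after data is appended through get_mut(). The helper names push_chunk and drain_available are made up for illustration, and the proposed accessor on EventReader is deliberately left out because its final shape is not settled in this thread.

```rust
use std::io::{Cursor, Read};

/// Hypothetical helper: append a freshly received chunk to the cursor's
/// backing buffer. `get_mut` exposes the inner Vec and leaves the read
/// position untouched, so already-consumed bytes are not re-read.
fn push_chunk(source: &mut Cursor<Vec<u8>>, chunk: &[u8]) {
    source.get_mut().extend_from_slice(chunk);
}

/// Hypothetical helper: read everything that is currently available.
/// Returns an empty Vec when the cursor is at the (temporary) end of input.
fn drain_available(source: &mut Cursor<Vec<u8>>) -> Vec<u8> {
    let mut out = Vec::new();
    // `read_to_end` stops as soon as `read` returns Ok(0), which for a
    // Cursor means "position reached the current end of the Vec".
    source
        .read_to_end(&mut out)
        .expect("reading from an in-memory cursor does not fail");
    out
}

fn main() {
    let mut source = Cursor::new(Vec::new());

    push_chunk(&mut source, b"<stream:stream>");
    let first = drain_available(&mut source);
    println!("first read: {}", String::from_utf8_lossy(&first));

    // Nothing new has arrived: the cursor signals end of input.
    let empty = drain_available(&mut source);
    println!("bytes available with no new data: {}", empty.len());

    // After more data is pushed, the same cursor yields it.
    push_chunk(&mut source, b"<presence/>");
    let second = drain_available(&mut source);
    println!("second read: {}", String::from_utf8_lossy(&second));
}
```

This is why the EOF passthrough option matters in the proposal: a plain Cursor has no way to say "no data yet", so a parser reading from it sees end of input in the middle of the document and needs to treat that as recoverable.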

What do you think?

linkmauve · 2016-01-10T20:55:22Z

Hi, sorry for the late reply.

Making the parser emit XmlEvents before EOF (triggered every time some new data is pushed into the mio::tcp::TcpStream) would be perfect for my use case. It will require the ability to close the document at some point, like after authentication or after STARTTLS negotiation, so I’m not sure if that’s perfect yet.

I was indeed wrong, mio::tcp::TcpStream does implement Read, so my previous arguments are void. :)

netvl · 2016-01-12T15:01:08Z

@linkmauve note that since the parser is pull-based, events are not triggered in response to new data in the stream; rather, upon the user's request the parser attempts to read more data from the input stream, and thus its behavior depends on the behavior of the source (it may block or return an error).

That said, I don't know how mio::tcp::TcpStream behaves when there is no data in the socket yet. Does it block, return EOF, or do something else? If it blocks, then there is little xml-rs could do; if it returns some special value, probably an error, it will be propagated to the user, and we only need to allow the parser not to terminate upon certain errors.
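For illustration, the three cases a pull-based consumer has to tell apart can be modelled with the standard library only. The sketch assumes the source reports "no data yet" as io::ErrorKind::WouldBlock, which is the convention for non-blocking sockets in Rust. PendingSource, ReadOutcome and poll_once are hypothetical names invented for this example, not part of xml-rs or mio.

```rust
use std::collections::VecDeque;
use std::io::{self, ErrorKind, Read};

/// Hypothetical in-memory stand-in for a non-blocking socket.
struct PendingSource {
    buffered: VecDeque<u8>,
    closed: bool,
}

impl PendingSource {
    fn new() -> Self {
        PendingSource { buffered: VecDeque::new(), closed: false }
    }

    /// Simulates bytes arriving from the peer.
    fn push(&mut self, chunk: &[u8]) {
        self.buffered.extend(chunk.iter().copied());
    }

    /// Simulates the peer closing the connection.
    fn close(&mut self) {
        self.closed = true;
    }
}

impl Read for PendingSource {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.buffered.is_empty() {
            if self.closed {
                return Ok(0); // real end of stream
            }
            // Assumption: "nothing to read yet" is reported as WouldBlock.
            return Err(io::Error::new(ErrorKind::WouldBlock, "no data yet"));
        }
        let n = buf.len().min(self.buffered.len());
        for slot in buf.iter_mut().take(n) {
            *slot = self.buffered.pop_front().expect("length checked above");
        }
        Ok(n)
    }
}

/// What a caller learns from a single read attempt.
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    Data(Vec<u8>),
    NotReady,
    Eof,
}

/// One pull: maps WouldBlock to NotReady and propagates any other error.
fn poll_once(source: &mut PendingSource) -> io::Result<ReadOutcome> {
    let mut buf = [0u8; 64];
    match source.read(&mut buf) {
        Ok(0) => Ok(ReadOutcome::Eof),
        Ok(n) => Ok(ReadOutcome::Data(buf[..n].to_vec())),
        Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(ReadOutcome::NotReady),
        Err(e) => Err(e),
    }
}

fn main() -> io::Result<()> {
    let mut source = PendingSource::new();
    println!("{:?}", poll_once(&mut source)?); // nothing has arrived yet
    source.push(b"<presence/>");
    println!("{:?}", poll_once(&mut source)?); // the pushed bytes
    source.close();
    println!("{:?}", poll_once(&mut source)?); // peer closed the stream
    Ok(())
}
```

A parser that surfaces such an error without discarding its internal state would let the caller retry the same pull once more data has arrived, which is the "not to terminate upon certain errors" behavior described above.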

I'm also not sure what you mean by closing the document. The document will be "closed" automatically when the last closing tag arrives. This does not require special actions from the user.

linkmauve · 2017-05-23T19:31:12Z

With #146 being merged, this issue is now fixed, thanks!

linkmauve closed this as completed May 23, 2017
