-
-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Stream support reopen parser #146
Conversation
I'm not familiar with the XMPP enough; just to clarify, as far as I understand, this kind of "reopening" is necessary only if the stream contains multiple root elements. Is that your case? If yes, then it is probably better to make it a configuration option which would allow multiple root elements. It would make the usage much simpler. |
XMPP use a unique root element called When communicating, client and server write 'stanzas' into a stream. Here's an example of an XMPP stream when connecting: <?xml version="1.0" encoding="UTF-8"?>
<stream:stream xmlns:stream="http://etherx.jabber.org/streams" xmlns="jabber:client" to="example.com" version="1.0">
<message from="juliet@example.com" to="romeo@example.net" xml:lang="en">
<body>first message</body>
</message> after that, the client sends another message, the stream looks like this: <?xml version="1.0" encoding="UTF-8"?>
<stream:stream xmlns:stream="http://etherx.jabber.org/streams" xmlns="jabber:client" to="example.com" version="1.0">
<message from="juliet@example.com" to="romeo@example.net" xml:lang="en">
<body>first message</body>
</message>
<message from="juliet@example.com" to="romeo@example.net" xml:lang="en">
<body>second message</body>
</message> The client wants to close all communications with the server, he writes the closing tag: <?xml version="1.0" encoding="UTF-8"?>
<stream:stream xmlns:stream="http://etherx.jabber.org/streams" xmlns="jabber:client" to="example.com" version="1.0">
<message from="juliet@example.com" to="romeo@example.net" xml:lang="en">
<body>first message</body>
</message>
<message from="juliet@example.com" to="romeo@example.net" xml:lang="en">
<body>second message</body>
</message>
</stream:stream> |
Okay, I see. If it is like this, then why do you actually need to "reset" the parser? The parser should continue reading until the closing tag for the root element is reached. Take a look at this example: https://github.com/netvl/xml-rs/blob/streaming-test/tests/streaming.rs. You can clone that branch and run Granted, the streaming parsing is not supported entirely well (e.g. character data cannot be streamed, which can be seen from the comment in the test), since I hadn't kept this use case in mind when I was writing the parser, but it still should be usable for the XMPP use case. |
if you try this: write_and_reset_position(it.source_mut(), b"<child-2/>");
assert_match!(it.next(), Some(Ok(XmlEvent::StartElement { ref name, .. })) if name.local_name == "child-2");
assert_match!(it.next(), Some(Ok(XmlEvent::EndElement { ref name })) if name.local_name == "child-2");
// Add a blank call, this one return an Error
let _ = it.next();
write_and_reset_position(it.source_mut(), b"</root>");
// it.next() return None, instead of a EndElement as expected
assert_match!(it.next(), Some(Ok(XmlEvent::EndElement { ref name })) if name.local_name == "root");
assert_match!(it.next(), Some(Ok(XmlEvent::EndDocument)));
assert_match!(it.next(), None); It's because of this: https://github.com/netvl/xml-rs/blob/streaming-test/src/reader/parser/mod.rs#L243 We call |
First, I'm actually really surprised that it returns Second, why do you actually need to call On the second thought, I probably understand why - that's because multiple new elements are possible, and you don't know in advance how much, right? If so, then yes, I believe it is a valid case. I think that the proper solution would be to add a configuration option which would allow retrying the read at the end of stream, and which would make the parser always return the let mut it = parser.into_iter();
read_more_data: loop {
read_data(it.source_mut());
for e in &mut it {
match e {
Ok(...) => ..., // process data
Err(ref e) if e.is_unexpected_eof() => continue read_more_data,
Err(e) => return Err(e),
}
}
} where impl Error {
pub fn is_unexpected_eof(&self) -> bool {
match self.kind {
ErrorKind::Io(e) if e.kind() == io::ErrorKind::UnexpectedEof => true,
_ => false,
}
}
} This, of course, also requires removal of the A naive solution for returning The proper solution would require modifying the parser and the lexer to read as much as they can from the stream and then either output what they could read (a characters event, for example), or returning an Nevertheless, I think that implementing a partial solution first does bring some benefits, since it would allow certain use cases like yours. |
How do you want to proceed for a partial implementation? |
I think that it would be something like what you did in this PR, but not upon external request through a special method, but upon checking a special flag in the parser configuration. If would like to do it, I would be very glad, since I don't know right now when I will be able to - I'm quite busy at my main job right now. |
@netvl I think it's a huge refacto to do a stream based analysis. A stream can contains unclosed tag, it will be catch has an error at one time but next iteration will feed the end of the tag. I really need to be able to reopen the parser. Can we consider this method has unsafe and ignore from documentation? |
774498d
to
4a11aae
Compare
@netvl I found a solution, let me know what you're thinking about it. thank's ❤️ |
8ddf8d6
to
4365c2e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! There are minor changes which need to be done, but I like the general idea. 👍
src/reader/lexer.rs
Outdated
@@ -221,7 +221,7 @@ pub struct Lexer { | |||
skip_errors: bool, | |||
inside_comment: bool, | |||
inside_token: bool, | |||
eof_handled: bool | |||
pub eof_handled: bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer having an mutator method like fn reset_eof_handled(&mut self)
instead of making the field public, simply because other fields are not.
src/reader/config.rs
Outdated
/// By default this option is setted to false but if you want to deal with progressive inputs | ||
/// like a stream over tcp, you can set it to true to make the parser ignore end of stream | ||
/// error. | ||
pub ignore_end_of_stream: bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to nitpick, but I prefer the documentation to be as clean as possible :)
Whether or not the parser should ignore the end of stream. Default is false.
By default the parser will either error out when it encounters a premature end of stream or complete normally if the end of stream was expected. If you want to continue reading from a stream whose input is supplied progressively, you can set this option to true. In this case the parser will allow you to invoke the
next()
method even if a supposed end of stream has happened.Note that support for this functionality is incomplete; for example, the parser will fail if the premature end of stream happens inside PCDATA. Therefore, use this option at your own risk.
src/reader/parser/mod.rs
Outdated
@@ -120,6 +120,9 @@ impl PullParser { | |||
pop_namespace: false | |||
} | |||
} | |||
|
|||
/// Tells if this parser are ignoring end of stream errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checks if this parser ignores the end of stream errors.
tests/streaming.rs
Outdated
assert!(false); | ||
}, | ||
Some(Err(_)) => { | ||
assert!(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't actually need this.
tests/streaming.rs
Outdated
match reader.next() { | ||
None | | ||
Some(Ok(_)) => { | ||
assert!(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is better to write a panic!()
with some helpful error message.
Signed-off-by: Freyskeyd <simon.paitrault@gmail.com>
@netvl I fixed :) |
This is great, thanks! I'll release a new version as soon as I get to a computer with the Rust environment on it. |
Sorry for the delay! I just released version 0.5.0 with this and one other fix. |
It's actually 0.6.0 now, since another incompatible change has just got merged :) |
Hello,
I'm working on an xmpp library and I need to be able to reopen the parser to consume new bytes.
It's a first step to be able to reset some condition, I use it like that:
What do you think?
Signed-off-by: Freyskeyd simon.paitrault@gmail.com