-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework parsing: More strict and (hopefully) cleaner #12
Conversation
- Do not accept arbitrary amounts of opening trk tags, we require exactly one at the beginning and do not allow further ones - Require the closing tag to match the opening tag - Use reader.peek() instead of reader.next() to not consume the opening tag for deeper parser functions - Use new custom errors InvalidClosingTag and MissingClosingTag
- Do not accept arbitrary amounts of opening gpx tags, we require exactly one at the beginning and do not allow further ones - Do not accept any closing tag as tag that closes gpx, require name==gpx - Use new custom errors InvalidClosingTag and MissingClosingTag - Avoid indirection of ParseEvents, handle XmlEvents more directly
- Do not accept arbitrary amounts of opening metadata tags, we require exactly one at the beginning and do not allow further ones - Do not accept any closing tag as tag that closes metadata, require name==metadata - Use new custom errors InvalidClosingTag and MissingClosingTag - Avoid indirection of ParseEvents, handle XmlEvents more directly
- Use custom OwnPoint type encapsulating geo::Point, and implement Default for this. - Use OwnPoint instead of geo::Point in Waypoint struct
- Do not accept arbitrary amounts of opening trkpt or wpt tags, we require exactly one and consume is passed which one it is - Do not accept any closing tag as tag that closes the waypoint, require it to match the opening tag name - Use new custom errors InvalidClosingTag and MissingClosingTag - Use Waypoint struct to hold the intermediate values while parsing - Use reader.peek() instead of reader.next() to not consume the opening tag for deeper parser functions
- Do not accept arbitrary amounts of opening trkseg tags, we require exactly one - Do not accept any closing tag as tag that closes the tracksegment, require it to be trkseg - Use new custom errors InvalidClosingTag and MissingClosingTag - Avoid indirection of TrackSegmentEvents, handle XmlEvents more directly
- Do not accept arbitrary amounts of opening person or author tags, we require exactly one - Do not accept any closing tag as tag that closes the tracksegment, require the name to be the same as of the opening tag (author or person) - Use new custom errors InvalidClosingTag and MissingClosingTag - Avoid indirection of ParseEvents, handle XmlEvents more directly
- Do not accept arbitrary amounts of opening email tags, must be exactly one - Do not accept any closing tag as tag that closes the tracksegment, require the name to be the same as of the opening tag (author or person) - Use new custom errors InvalidClosingTag and MissingClosingTag - Remove id and domain temporary variables and two unneeded clone() calls - Handle end of XML without end of email tag with error instead of panic
- Do not accept arbitrary amounts of opening link tags, must be exactly one - Do not accept any closing tag as tag that closes the link, require the name to be the same as of the opening tag - Use new custom errors InvalidClosingTag and MissingClosingTag - Use reader.peek() instead of reader.next() to not consume the opening tag for deeper parser functions
- consume now takes an additional parameter specifying the tag name the string parser is supposed to expect, it will only accept this tag - Use InvalidChildElement, InvalidClosingTag and MissingClosingTag errors
- Accept only "bounds" as starting tag, and only "bounds" as closing tag - Use InvalidChildElement, InvalidClosingTag and MissingClosingTag errors - Eliminate mutability and initialisation of bounds variable
src/types.rs
Outdated
@@ -171,12 +171,26 @@ impl ToGeo<f64> for TrackSegment { | |||
} | |||
} | |||
|
|||
// A Version of geo::Point that has the Default trait implemented | |||
#[derive(Clone, Debug)] | |||
struct OurPoint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would using a one-field tuple here be preferable/cleaner? Also, please rename it to GpxPoint
and add a comment explaining why we added default in the first place.
struct GpxPoint(Point);
impl Default for GpxPoint {
fn default() -> GpxPoint {
GpxPoint(Point::new(0., 0.))
}
}
})) => { | ||
ensure!( | ||
name.local_name == local_name, | ||
ErrorKind::InvalidChildElement(name.local_name, local_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically speaking, this would give an error of "invalid child element 'wpt' in 'wpt'", right? Because we don't keep track of the parent element?
Or would this even happen because we check the name beforehand, so instead we should be dying here or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think it should be wpt in wpt, note the first is name.local_name, which is the actual starting tag we found, and the other is "local_name", which is the expected local_name (the starting tag we are looking for). So I think what would happen, when looking for the wpt starting tag and getting trk starting tag instead is "invalid child element trk in wpt".
I agree this is stretching the InvalidChildElement error a bit, we could add a InvalidStartingElement error that would read something like "invalid starting element trk found, expected wpt". Tell me if you think that is a good idea.
Note that the way the parser currently functions it shouldnt be possible to produce this error except in gpx - because all other parsers are sub-parsers, which would only be called if the parent parser already found their starting tag via peek().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, that makes sense. It shouldn't need changing then, at least for now. Thanks for the clarity.
Some(Ok(XmlEvent::Characters(chars))) => { | ||
bail!(ErrorKind::InvalidChildElement(chars, local_name)); | ||
} | ||
Some(_) => {} //ignore other elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of event would we be skipping here, shouldn't it always be a StartElement
at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you look at https://netvl.github.io/xml-rs/xml/reader/enum.XmlEvent.html you can see there are a lot of events possible. My "set parser config when spawning the XmlReader" commit that I added below will at least turn XmlEvent::Whitespace and XmlEvent::CData to become XmlEvent::Character. XmlEvent::Comment is disabled by default.
This leaves us with these possible events for the Some(_) line: StartDocument, EndDocument and ProcessingInstruction. Especially StartDocument will always occur at the beginning of any xml document (even if there is no explicit StartDocument tag), so it is always emitted in our test functions. EndDocument we could handle and bail! on it, but we can also just ignore it and get "None" in the next loop iteration. Finally ProcessingInstruction sounds weird, I am not sure how they work or if they exist for gpx, so for now I thought it safe to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, thanks for clearing that up!
src/parser/gpx.rs
Outdated
@@ -260,6 +190,7 @@ mod tests { | |||
GpxVersion::Unknown | |||
); | |||
|
|||
println!("########################### {:?}", gpx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep this?
@Lingepumpe wow, thank you for the fantastic work you've put together. I really like what you've done with the parser. Just a few questions/suggestions about your implementation. |
- Remove left over debugging println in gpx.rs - Rename OurPoint to GpxPoint, and use a tuple struct instead of a regular struct. - Longer comment on the value of having Default implemented for GpxPoint
I believe I addressed all review concerns, or is there something else you would like me to fix? Or will some other person review as well? Just checking the ball is not in my court ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brendanashworth the rewrite looks good to me, any other thoughts from you?
@Lingepumpe sorry about the delay, I've been unable to compile your changes due to some issues with (presumably) my Rust setup, and I've also been very busy. I should have time to do a final review and merge this within the next two days, thanks for being patient. This looks really nice though. |
- Reorder use alphabetically to please cargo fmt - Remove duplicate use - Remove unused use
It seems cargo fmt is now more restrictive with regards to "use": They are ordered in a certain way, and I also got more duplicate warnings. Updated the pull request to include those fixes. |
@Lingepumpe thanks for fixing the cargo fmt issue @brendanashworth still planning to look at this? otherwise, i'm thinking about merging this tomorrow |
bors r+ |
12: Rework parsing: More strict and (hopefully) cleaner r=frewsxcv a=Lingepumpe Reworked every parser file, with the exception of extensions.rs. Changes and cleanup were mostly: - Every parser is now responsible for consuming his own starting and closing tags (previously some parsers would be okay not having any starting tag, and would be used like this). To be able to achieve this any parser that calls further consume functions must use peek() to leave starting tags in the reader for the subparser to handle. - Many parsers were accepting any amount of their starting tag, and also consuming any closing tag as valid, ignoring the name. Now exactly one opening tag and matching closing tag is required. - Parsers that were already using peek() were doing so via a level of indirection through some form of "ParseEvent" enum, I think I have found a cleaner solution using .clone() on the event which allows us to remove this indirection and parse XmlEvents more directly even with peek(). - Added some more custom errors, and generally use custom errors more Co-authored-by: Peter Kirk <peter.kirk@student.uibk.ac.at> Co-authored-by: Corey Farwell <coreyf@rwell.org>
Build succeeded |
Reworked every parser file, with the exception of extensions.rs.
Changes and cleanup were mostly:
parsers would be okay not having any starting tag, and would be used like this). To be able to achieve
this any parser that calls further consume functions must use peek() to leave starting tags in the reader
for the subparser to handle.
valid, ignoring the name. Now exactly one opening tag and matching closing tag is required.
"ParseEvent" enum, I think I have found a cleaner solution using .clone() on the event which allows us
to remove this indirection and parse XmlEvents more directly even with peek().