Rework parsing: More strict and (hopefully) cleaner #12

Lingepumpe · 2018-03-26T13:27:16Z

Reworked every parser file except extensions.rs.

The main changes and cleanups:

- Every parser is now responsible for consuming its own starting and closing tags. Previously some parsers tolerated a missing starting tag and were called that way. To make this work, any parser that calls further consume functions must use peek() so that the starting tag stays in the reader for the subparser to handle.
- Many parsers accepted any number of their starting tag and treated any closing tag as valid, ignoring its name. Now exactly one opening tag and a matching closing tag are required.
- Parsers that already used peek() did so through a level of indirection via some form of "ParseEvent" enum. I think I have found a cleaner solution: calling .clone() on the event lets us remove this indirection and parse XmlEvents more directly, even with peek().
- Added some more custom errors, and custom errors are now used more widely.
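To illustrate the contract described above, here is a minimal sketch rather than the code from this PR. It assumes a simplified `Event` enum as a stand-in for xml-rs's `XmlEvent`, a hypothetical `Link` struct, and helper names (`expect_start`, `consume_string`, `consume_link`) chosen for illustration; `MissingOpeningTag` is likewise an invented variant. The parent parser peeks and clones the next event, so the child's starting tag stays in the reader for the subparser to consume.

```rust
use std::iter::Peekable;

/// Simplified stand-in for xml-rs's `XmlEvent` (assumption for this sketch).
#[derive(Clone, Debug, PartialEq)]
enum Event {
    Start(String),
    End(String),
    Text(String),
}

/// Error names follow the PR where possible; `MissingOpeningTag` is hypothetical.
#[derive(Debug, PartialEq)]
enum ParseError {
    InvalidChildElement(String, String),
    InvalidClosingTag(String, String),
    MissingClosingTag(String),
    MissingOpeningTag(String),
}

/// Consumes exactly one opening tag and checks that it has the expected name.
fn expect_start<I>(reader: &mut Peekable<I>, tag: &str) -> Result<(), ParseError>
where
    I: Iterator<Item = Event>,
{
    match reader.next() {
        Some(Event::Start(name)) if name == tag => Ok(()),
        Some(Event::Start(name)) | Some(Event::Text(name)) => {
            Err(ParseError::InvalidChildElement(name, tag.to_string()))
        }
        Some(Event::End(name)) => Err(ParseError::InvalidClosingTag(name, tag.to_string())),
        None => Err(ParseError::MissingOpeningTag(tag.to_string())),
    }
}

/// Leaf parser: consumes `<tag>text</tag>`, including both of its own tags.
fn consume_string<I>(reader: &mut Peekable<I>, tag: &str) -> Result<String, ParseError>
where
    I: Iterator<Item = Event>,
{
    expect_start(reader, tag)?;
    let mut content = String::new();
    loop {
        match reader.next() {
            Some(Event::Text(text)) => content.push_str(&text),
            Some(Event::End(name)) => {
                return if name == tag {
                    Ok(content)
                } else {
                    Err(ParseError::InvalidClosingTag(name, tag.to_string()))
                };
            }
            Some(Event::Start(name)) => {
                return Err(ParseError::InvalidChildElement(name, tag.to_string()))
            }
            None => return Err(ParseError::MissingClosingTag(tag.to_string())),
        }
    }
}

/// Hypothetical result type for the parent parser.
#[derive(Debug, Default, PartialEq)]
struct Link {
    text: Option<String>,
    kind: Option<String>,
}

/// Parent parser: consumes its own `<link>` tags and only peeks at child tags.
fn consume_link<I>(reader: &mut Peekable<I>) -> Result<Link, ParseError>
where
    I: Iterator<Item = Event>,
{
    expect_start(reader, "link")?;
    let mut link = Link::default();
    loop {
        // Cloning the peeked event ends the borrow of `reader`, so the
        // subparser can take the reader and consume the starting tag itself.
        match reader.peek().cloned() {
            Some(Event::Start(name)) => match name.as_str() {
                "text" => link.text = Some(consume_string(reader, "text")?),
                "type" => link.kind = Some(consume_string(reader, "type")?),
                other => {
                    return Err(ParseError::InvalidChildElement(
                        other.to_string(),
                        "link".to_string(),
                    ))
                }
            },
            Some(Event::End(name)) => {
                reader.next(); // consume our own closing tag
                return if name == "link" {
                    Ok(link)
                } else {
                    Err(ParseError::InvalidClosingTag(name, "link".to_string()))
                };
            }
            Some(Event::Text(text)) => {
                return Err(ParseError::InvalidChildElement(text, "link".to_string()))
            }
            None => return Err(ParseError::MissingClosingTag("link".to_string())),
        }
    }
}
```

In this sketch a second opening `link` tag or a mismatched closing tag is rejected instead of silently accepted, which is the stricter behaviour the description aims for.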

- Do not accept arbitrary amounts of opening trk tags, we require exactly one at the beginning and do not allow further ones
- Require the closing tag to match the opening tag
- Use reader.peek() instead of reader.next() to not consume the opening tag for deeper parser functions
- Use new custom errors InvalidClosingTag and MissingClosingTag

- Do not accept arbitrary amounts of opening gpx tags, we require exactly one at the beginning and do not allow further ones
- Do not accept any closing tag as tag that closes gpx, require name==gpx
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Avoid indirection of ParseEvents, handle XmlEvents more directly

- Do not accept arbitrary amounts of opening metadata tags, we require exactly one at the beginning and do not allow further ones
- Do not accept any closing tag as tag that closes metadata, require name==metadata
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Avoid indirection of ParseEvents, handle XmlEvents more directly

- Use custom OwnPoint type encapsulating geo::Point, and implement Default for this.
- Use OwnPoint instead of geo::Point in Waypoint struct

- Do not accept arbitrary amounts of opening trkpt or wpt tags, we require exactly one and consume is passed which one it is
- Do not accept any closing tag as tag that closes the waypoint, require it to match the opening tag name
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Use Waypoint struct to hold the intermediate values while parsing
- Use reader.peek() instead of reader.next() to not consume the opening tag for deeper parser functions

- Do not accept arbitrary amounts of opening trkseg tags, we require exactly one
- Do not accept any closing tag as tag that closes the tracksegment, require it to be trkseg
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Avoid indirection of TrackSegmentEvents, handle XmlEvents more directly

- Do not accept arbitrary amounts of opening person or author tags, we require exactly one
- Do not accept any closing tag as tag that closes the person or author element, require the name to be the same as that of the opening tag (author or person)
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Avoid indirection of ParseEvents, handle XmlEvents more directly

- Do not accept arbitrary amounts of opening email tags, must be exactly one
- Do not accept any closing tag as tag that closes the email element, require the name to be the same as that of the opening tag
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Remove id and domain temporary variables and two unneeded clone() calls
- Handle end of XML without end of email tag with error instead of panic

- Do not accept arbitrary amounts of opening link tags, must be exactly one
- Do not accept any closing tag as tag that closes the link, require the name to be the same as of the opening tag
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Use reader.peek() instead of reader.next() to not consume the opening tag for deeper parser functions

- consume now takes an additional parameter specifying the tag name the string parser is supposed to expect, it will only accept this tag
- Use InvalidChildElement, InvalidClosingTag and MissingClosingTag errors

- Accept only "bounds" as starting tag, and only "bounds" as closing tag
- Use InvalidChildElement, InvalidClosingTag and MissingClosingTag errors
- Eliminate mutability and initialisation of bounds variable

brendanashworth · 2018-03-29T05:37:40Z

src/types.rs

@@ -171,12 +171,26 @@ impl ToGeo<f64> for TrackSegment {
    }
 }

+// A Version of geo::Point that has the Default trait implemented
+#[derive(Clone, Debug)]
+struct OurPoint {


Would using a one-field tuple here be preferable/cleaner? Also, please rename it to GpxPoint and add a comment explaining why we added default in the first place.

struct GpxPoint(Point);

impl Default for GpxPoint {
    fn default() -> GpxPoint {
        GpxPoint(Point::new(0., 0.))
    }
}
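As a rough illustration of why the `Default` impl is useful, the sketch below assumes that the goal is to derive `Default` on a struct that holds the point while its other fields are filled in during parsing. The local `Point` type is a stand-in for `geo::Point` so that the example compiles without the geo crate, and the `Waypoint` fields shown here are illustrative only.

```rust
/// Stand-in for `geo::Point<f64>` (assumption for this sketch); like the real
/// type as described in this PR, it has no `Default` impl.
#[derive(Clone, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

impl Point {
    fn new(x: f64, y: f64) -> Point {
        Point { x, y }
    }
}

/// One-field tuple struct wrapping the point, as suggested in the review.
#[derive(Clone, Debug, PartialEq)]
struct GpxPoint(Point);

impl Default for GpxPoint {
    fn default() -> GpxPoint {
        GpxPoint(Point::new(0., 0.))
    }
}

/// With `GpxPoint: Default`, the containing struct can simply derive `Default`.
/// The field names are hypothetical.
#[derive(Clone, Debug, Default, PartialEq)]
struct Waypoint {
    point: GpxPoint,
    name: Option<String>,
    elevation: Option<f64>,
}
```

A parser could then start from `Waypoint::default()` and overwrite fields as the matching child elements are found.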

brendanashworth · 2018-03-29T07:25:16Z

src/parser/mod.rs

+            })) => {
+                ensure!(
+                    name.local_name == local_name,
+                    ErrorKind::InvalidChildElement(name.local_name, local_name)


Technically speaking, this would give an error of "invalid child element 'wpt' in 'wpt'", right? Because we don't keep track of the parent element?

Or would this even happen because we check the name beforehand, so instead we should be dying here or something?

Lingepumpe · 2018-03-29 (edited)

I don't think it would be "wpt in wpt". Note that the first argument is name.local_name, the starting tag we actually found, and the second is local_name, the starting tag we expected. So when we look for a wpt starting tag and get a trk starting tag instead, the error should read "invalid child element trk in wpt".

I agree this stretches the InvalidChildElement error a bit. We could add an InvalidStartingElement error that reads something like "invalid starting element trk found, expected wpt". Tell me if you think that is a good idea.

Note that, the way the parser currently works, it shouldn't be possible to produce this error anywhere except in gpx: all other parsers are sub-parsers, which are only called once the parent parser has already found their starting tag via peek().

brendanashworth · 2018-03-31

I see, that makes sense. It shouldn't need changing then, at least for now. Thanks for the clarity.
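The argument order discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the crate's code: the signature of `verify_starting_tag` and the exact message wording are guesses, and `InvalidStartingElement` is the variant that was only proposed above and not added.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
enum ParseError {
    /// (found, expected): the order used in the ensure! call quoted above.
    InvalidChildElement(String, String),
    /// Hypothetical variant from the discussion; not part of this PR.
    InvalidStartingElement(String, String),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::InvalidChildElement(child, parent) => {
                write!(f, "invalid child element '{}' in '{}'", child, parent)
            }
            ParseError::InvalidStartingElement(found, expected) => {
                write!(f, "invalid starting element '{}' found, expected '{}'", found, expected)
            }
        }
    }
}

/// Assumed shape of the helper: compare the tag that was found with the expected one.
fn verify_starting_tag(found: &str, expected: &str) -> Result<(), ParseError> {
    if found == expected {
        Ok(())
    } else {
        Err(ParseError::InvalidChildElement(
            found.to_string(),
            expected.to_string(),
        ))
    }
}
```

With this ordering, a trk tag found while expecting wpt reports trk as the child and wpt as the parent, so the "wpt in wpt" message cannot occur.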

brendanashworth · 2018-03-29T07:26:26Z

src/parser/mod.rs

+            Some(Ok(XmlEvent::Characters(chars))) => {
+                bail!(ErrorKind::InvalidChildElement(chars, local_name));
+            }
+            Some(_) => {} //ignore other elements


What kind of event would we be skipping here, shouldn't it always be a StartElement at this point?

Lingepumpe · 2018-03-29 (edited)

If you look at https://netvl.github.io/xml-rs/xml/reader/enum.XmlEvent.html you can see that many different events are possible. The "Set a parser config when spawning the XmlReader" commit that I added below will at least turn XmlEvent::Whitespace and XmlEvent::CData into XmlEvent::Characters. XmlEvent::Comment is disabled by default.

This leaves these possible events for the Some(_) line: StartDocument, EndDocument and ProcessingInstruction. StartDocument in particular occurs at the beginning of every XML document (even if there is no explicit StartDocument tag), so it is always emitted in our test functions. We could handle EndDocument and bail! on it, but we can also just ignore it and get None in the next loop iteration. Finally, ProcessingInstruction sounds odd; I am not sure how processing instructions work or whether they exist in GPX, so for now I thought it safe to ignore them.

brendanashworth · 2018-03-30

Interesting, thanks for clearing that up!
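The behaviour of the catch-all arm can be sketched with a small stand-in event type. This is an assumption-laden illustration: the `Event` enum only mirrors the names of a few xml-rs `XmlEvent` variants, `skip_to_start` is a hypothetical helper, and the error strings (including the one for running out of input) are invented for the example.

```rust
/// Stand-in for a subset of xml-rs's `XmlEvent` variants (assumption for this sketch).
#[derive(Clone, Debug, PartialEq)]
enum Event {
    StartDocument,
    ProcessingInstruction(String),
    Characters(String),
    StartElement(String),
    EndDocument,
}

/// Reads events until the expected starting tag is found.
/// Character data is an error, other non-element events are skipped.
fn skip_to_start<I>(reader: &mut I, expected: &str) -> Result<(), String>
where
    I: Iterator<Item = Event>,
{
    loop {
        match reader.next() {
            Some(Event::StartElement(name)) => {
                return if name == expected {
                    Ok(())
                } else {
                    Err(format!("invalid child element '{}' in '{}'", name, expected))
                };
            }
            Some(Event::Characters(chars)) => {
                return Err(format!("invalid child element '{}' in '{}'", chars, expected));
            }
            // StartDocument, EndDocument and ProcessingInstruction end up here.
            Some(_) => {}
            // After an ignored EndDocument the reader is exhausted.
            None => return Err(format!("no starting tag '{}' found", expected)),
        }
    }
}
```

Ignoring `EndDocument` is harmless in this sketch because the next iteration sees `None` and reports the error there.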

brendanashworth · 2018-03-29T07:28:05Z

src/parser/gpx.rs

@@ -260,6 +190,7 @@ mod tests {
            GpxVersion::Unknown
        );

+        println!("########################### {:?}", gpx);


Should we keep this?

brendanashworth · 2018-03-29T07:29:57Z

@Lingepumpe wow, thank you for the fantastic work you've put together. I really like what you've done with the parser. Just a few questions/suggestions about your implementation.

Lingepumpe added commit bbdbf43, "Address brendanashworths comments":

- Remove left over debugging println in gpx.rs
- Rename OurPoint to GpxPoint, and use a tuple struct instead of a regular struct.
- Longer comment on the value of having Default implemented for GpxPoint

Lingepumpe · 2018-04-01T07:42:24Z

I believe I addressed all review concerns, or is there something else you would like me to fix? Or will some other person review as well? Just checking the ball is not in my court ;)

frewsxcv

@brendanashworth the rewrite looks good to me, any other thoughts from you?

brendanashworth · 2018-04-16T07:40:54Z

@Lingepumpe sorry about the delay, I've been unable to compile your changes due to some issues with (presumably) my Rust setup, and I've also been very busy. I should have time to do a final review and merge this within the next two days, thanks for being patient. This looks really nice though.

Lingepumpe added commit f57a099, "Make cargo fmt happy again":

- Reorder use alphabetically to please cargo fmt
- Remove duplicate use
- Remove unused use

Lingepumpe · 2018-05-21T09:19:27Z

It seems cargo fmt has become stricter about "use" declarations: they now have to be ordered in a certain way, and I also got more warnings about duplicates. I updated the pull request to include those fixes.

frewsxcv · 2018-05-22T11:12:58Z

@Lingepumpe thanks for fixing the cargo fmt issue

@brendanashworth still planning to look at this? otherwise, i'm thinking about merging this tomorrow

frewsxcv · 2018-05-25T23:17:58Z

bors r+

12: Rework parsing: More strict and (hopefully) cleaner r=frewsxcv a=Lingepumpe

Reworked every parser file, with the exception of extensions.rs.

Changes and cleanup were mostly:

- Every parser is now responsible for consuming his own starting and closing tags (previously some parsers would be okay not having any starting tag, and would be used like this). To be able to achieve this any parser that calls further consume functions must use peek() to leave starting tags in the reader for the subparser to handle.
- Many parsers were accepting any amount of their starting tag, and also consuming any closing tag as valid, ignoring the name. Now exactly one opening tag and matching closing tag is required.
- Parsers that were already using peek() were doing so via a level of indirection through some form of "ParseEvent" enum, I think I have found a cleaner solution using .clone() on the event which allows us to remove this indirection and parse XmlEvents more directly even with peek().
- Added some more custom errors, and generally use custom errors more

Co-authored-by: Peter Kirk <peter.kirk@student.uibk.ac.at>
Co-authored-by: Corey Farwell <coreyf@rwell.org>

bors · 2018-05-25T23:30:41Z

Build succeeded

continuous-integration/travis-ci/push

Lingepumpe added 16 commits March 26, 2018 13:34

Remove unneeded String::from() in Errors

2fd8e88

Add custom errors InvalidClosingTag and MissingClosingTag

bc6e7ba

Give InvalidElementLacksAttribute a parent string

df5025e

Add helper function verify_starting_tag

a7db722

Work around geo::Point not having impl Default

538a382

Add helper function verify_starting_tag (unused for now)

6b32296

Rewrite string::consume cleaner and more strict:

6c5b910

Rewrite bounds::consume cleaner and more strict:

259707f

brendanashworth self-requested a review March 29, 2018 05:05

brendanashworth reviewed Mar 29, 2018

Lingepumpe added 2 commits March 29, 2018 14:40

Address brendanashworths comments

bbdbf43

Set a parser config when spawning the XmlReader

23ffe2f

frewsxcv approved these changes Apr 8, 2018

frewsxcv and others added 2 commits May 20, 2018 12:14

Merge branch 'master' into peek_tags

7fc5d7f

Make cargo fmt happy again

f57a099

bors bot merged commit f57a099 into georust:master May 25, 2018
