Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework parsing: More strict and (hopefully) cleaner #12

Merged
merged 20 commits into from
May 25, 2018

Conversation

Lingepumpe
Copy link
Contributor

Reworked every parser file, with the exception of extensions.rs.

Changes and cleanup were mostly:

  • Every parser is now responsible for consuming his own starting and closing tags (previously some
    parsers would be okay not having any starting tag, and would be used like this). To be able to achieve
    this any parser that calls further consume functions must use peek() to leave starting tags in the reader
    for the subparser to handle.
  • Many parsers were accepting any amount of their starting tag, and also consuming any closing tag as
    valid, ignoring the name. Now exactly one opening tag and matching closing tag is required.
  • Parsers that were already using peek() were doing so via a level of indirection through some form of
    "ParseEvent" enum, I think I have found a cleaner solution using .clone() on the event which allows us
    to remove this indirection and parse XmlEvents more directly even with peek().
  • Added some more custom errors, and generally use custom errors more

- Do not accept arbitrary amounts of opening trk tags, we require exactly
  one at the beginning and do not allow further ones
- Require the closing tag to match the opening tag
- Use reader.peek() instead of reader.next() to not consume the opening
  tag for deeper parser functions
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Do not accept arbitrary amounts of opening gpx tags, we require exactly
  one at the beginning and do not allow further ones
- Do not accept any closing tag as tag that closes gpx, require name==gpx
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Avoid indirection of ParseEvents, handle XmlEvents more directly
- Do not accept arbitrary amounts of opening metadata tags, we require
  exactly one at the beginning and do not allow further ones
- Do not accept any closing tag as tag that closes metadata, require
  name==metadata
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Avoid indirection of ParseEvents, handle XmlEvents more directly
- Use custom OwnPoint type encapsulating geo::Point, and implement Default
  for this.
- Use OwnPoint instead of geo::Point in Waypoint struct
- Do not accept arbitrary amounts of opening trkpt or wpt tags, we
  require exactly one and consume is passed which one it is
- Do not accept any closing tag as tag that closes the waypoint,
  require it to match the opening tag name
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Use Waypoint struct to hold the intermediate values while parsing
- Use reader.peek() instead of reader.next() to not consume the opening
  tag for deeper parser functions
- Do not accept arbitrary amounts of opening trkseg tags, we
  require exactly one
- Do not accept any closing tag as tag that closes the tracksegment,
  require it to be trkseg
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Avoid indirection of TrackSegmentEvents, handle XmlEvents more directly
- Do not accept arbitrary amounts of opening person or author tags, we
  require exactly one
- Do not accept any closing tag as tag that closes the tracksegment,
  require the name to be the same as of the opening tag (author or person)
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Avoid indirection of ParseEvents, handle XmlEvents more directly
- Do not accept arbitrary amounts of opening email tags, must be exactly
  one
- Do not accept any closing tag as tag that closes the tracksegment,
  require the name to be the same as of the opening tag (author or person)
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Remove id and domain temporary variables and two unneeded clone() calls
- Handle end of XML without end of email tag with error instead of panic
- Do not accept arbitrary amounts of opening link tags, must be exactly
  one
- Do not accept any closing tag as tag that closes the link,
  require the name to be the same as of the opening tag
- Use new custom errors InvalidClosingTag and MissingClosingTag
- Use reader.peek() instead of reader.next() to not consume the opening
  tag for deeper parser functions
- consume now takes an additional parameter specifying the tag name the
  string parser is supposed to expect, it will only accept this tag
- Use InvalidChildElement, InvalidClosingTag and MissingClosingTag errors
- Accept only "bounds" as starting tag, and only "bounds" as closing tag
- Use InvalidChildElement, InvalidClosingTag and MissingClosingTag errors
- Eliminate mutability and initialisation of bounds variable
@brendanashworth brendanashworth self-requested a review March 29, 2018 05:05
src/types.rs Outdated
@@ -171,12 +171,26 @@ impl ToGeo<f64> for TrackSegment {
}
}

// A Version of geo::Point that has the Default trait implemented
#[derive(Clone, Debug)]
struct OurPoint {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using a one-field tuple here be preferable/cleaner? Also, please rename it to GpxPoint and add a comment explaining why we added default in the first place.

struct GpxPoint(Point);

impl Default for GpxPoint {
    fn default() -> GpxPoint {
        GpxPoint(Point::new(0., 0.))
    }
}

})) => {
ensure!(
name.local_name == local_name,
ErrorKind::InvalidChildElement(name.local_name, local_name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically speaking, this would give an error of "invalid child element 'wpt' in 'wpt'", right? Because we don't keep track of the parent element?

Or would this even happen because we check the name beforehand, so instead we should be dying here or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think it should be wpt in wpt, note the first is name.local_name, which is the actual starting tag we found, and the other is "local_name", which is the expected local_name (the starting tag we are looking for). So I think what would happen, when looking for the wpt starting tag and getting trk starting tag instead is "invalid child element trk in wpt".

I agree this is stretching the InvalidChildElement error a bit, we could add a InvalidStartingElement error that would read something like "invalid starting element trk found, expected wpt". Tell me if you think that is a good idea.

Note that the way the parser currently functions it shouldnt be possible to produce this error except in gpx - because all other parsers are sub-parsers, which would only be called if the parent parser already found their starting tag via peek().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that makes sense. It shouldn't need changing then, at least for now. Thanks for the clarity.

Some(Ok(XmlEvent::Characters(chars))) => {
bail!(ErrorKind::InvalidChildElement(chars, local_name));
}
Some(_) => {} //ignore other elements
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of event would we be skipping here, shouldn't it always be a StartElement at this point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at https://netvl.github.io/xml-rs/xml/reader/enum.XmlEvent.html you can see there are a lot of events possible. My "set parser config when spawning the XmlReader" commit that I added below will at least turn XmlEvent::Whitespace and XmlEvent::CData to become XmlEvent::Character. XmlEvent::Comment is disabled by default.

This leaves us with these possible events for the Some(_) line: StartDocument, EndDocument and ProcessingInstruction. Especially StartDocument will always occur at the beginning of any xml document (even if there is no explicit StartDocument tag), so it is always emitted in our test functions. EndDocument we could handle and bail! on it, but we can also just ignore it and get "None" in the next loop iteration. Finally ProcessingInstruction sounds weird, I am not sure how they work or if they exist for gpx, so for now I thought it safe to ignore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, thanks for clearing that up!

@@ -260,6 +190,7 @@ mod tests {
GpxVersion::Unknown
);

println!("########################### {:?}", gpx);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep this?

@brendanashworth
Copy link
Member

@Lingepumpe wow, thank you for the fantastic work you've put together. I really like what you've done with the parser. Just a few questions/suggestions about your implementation.

- Remove left over debugging println in gpx.rs
- Rename OurPoint to GpxPoint, and use a tuple struct instead of a regular
  struct.
- Longer comment on the value of having Default implemented for GpxPoint
@Lingepumpe
Copy link
Contributor Author

I believe I addressed all review concerns, or is there something else you would like me to fix? Or will some other person review as well? Just checking the ball is not in my court ;)

Copy link
Member

@frewsxcv frewsxcv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brendanashworth the rewrite looks good to me, any other thoughts from you?

@brendanashworth
Copy link
Member

@Lingepumpe sorry about the delay, I've been unable to compile your changes due to some issues with (presumably) my Rust setup, and I've also been very busy. I should have time to do a final review and merge this within the next two days, thanks for being patient. This looks really nice though.

frewsxcv and others added 2 commits May 20, 2018 12:14
- Reorder use alphabetically to please cargo fmt
- Remove duplicate use
- Remove unused use
@Lingepumpe
Copy link
Contributor Author

It seems cargo fmt is now more restrictive with regards to "use": They are ordered in a certain way, and I also got more duplicate warnings. Updated the pull request to include those fixes.

@frewsxcv
Copy link
Member

@Lingepumpe thanks for fixing the cargo fmt issue

@brendanashworth still planning to look at this? otherwise, i'm thinking about merging this tomorrow

@frewsxcv
Copy link
Member

bors r+

bors bot added a commit that referenced this pull request May 25, 2018
12: Rework parsing: More strict and (hopefully) cleaner r=frewsxcv a=Lingepumpe

Reworked every parser file, with the exception of extensions.rs.

Changes and cleanup were mostly:
- Every parser is now responsible for consuming his own starting and closing tags (previously some
  parsers would be okay not having any starting tag, and would be used like this). To be able to achieve
  this any parser that calls further consume functions must use peek() to leave starting tags in the reader
  for the subparser to handle.
- Many parsers were accepting any amount of their starting tag, and also consuming any closing tag as
  valid, ignoring the name. Now exactly one opening tag and matching closing tag is required.
- Parsers that were already using peek() were doing so via a level of indirection through some form of
  "ParseEvent" enum, I think I have found a cleaner solution using .clone() on the event which allows us
  to remove this indirection and parse XmlEvents more directly even with peek().
- Added some more custom errors, and generally use custom errors more

Co-authored-by: Peter Kirk <peter.kirk@student.uibk.ac.at>
Co-authored-by: Corey Farwell <coreyf@rwell.org>
@bors
Copy link
Contributor

bors bot commented May 25, 2018

Build succeeded

@bors bors bot merged commit f57a099 into georust:master May 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants