-
-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WKT lexer #2360
WKT lexer #2360
Conversation
This looks like really nice work @GingerIK. I haven't given it a thorough review yet, but hope to find time to do so soon. I know the current tests are missing this, but it might be nice to see tests of the behavior when given invalid WKT. And while a lexer is definitely cooler than regex based parsing, we probably should at least give a nod to pragmatism and look to see if there are benefits in terms of parsing speed (especially given that this looks like it would bump up the build size). @GingerIK do you think it would be possible to put together some simple benchmarks (run against compiled code)? And it would be nice to report the build size before and after. At some point we should establish a convention for benchmarking (adding utilities as dev-dependencies as needed) and track changes to those benchmarks. It would be nice to be able to evaluate proposed changes based on changes to benchmarks (and build size). This should be ticketed separately though. |
'MultiLineString': ol.format.WKT.encodeMultiLineStringGeometry_, | ||
'MultiPolygon': ol.format.WKT.encodeMultiPolygonGeometry_, | ||
'GeometryCollection': ol.format.WKT.encodeGeometryCollectionGeometry_ | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is how the GeoJSON format (and others) are written, but it strikes me that we would get smaller compiled output if instead we used the ol.geom.GeometryType
enum. I imagine this means it couldn't be @const
, but I'm not sure we get much benefit from that. So something like:
/**
* @type {Object.<string, function(ol.geom.Geometry): string>}
*/
ol.format.WKT.GeometryEncoder_ = {};
ol.format.WKT.GeometryEncoder_[ol.geom.GeometryType.POINT] = ol.format.WKT.encodePointGeometry_;
// ... etc.
Ugly to write. But perhaps smaller in the end.
It would also be fine with me to handle this consistently (as you do) and change other formats at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally agree (both on the ol.geom.GeometryType
and the 'ugly' part). So I definitely would like to give it a try to compare both builds. However, I got a bit stuck on the upper camel case used by ol.geom.GeometryType
. Parsing the (typically uppercase) WKT's to camel case isn't very nice. Do you happen to know a simple workaround?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For encoding, CamelCase keys in this object are appropriate (since that's what geometry.getType()
will return). For decoding, a separate enum could be used with UPPERCASE keys (these would be used in assigning to ol.format.WKT.Parser.GeometryParser_
and ol.format.WKT.Parser.GeometryConstructor_
below). But I think both of these would be minor minification optimizations and should be handled separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll give it a try later on. I don't expect too much out of it either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, a short test on the ol.format.WKT.GeometryEncoder_
surprisingly shows an increase of 22 bytes...
Thanks for the feedback @tschaub. I have done a simple benchmarking earlier on which showed the lexer is about 2 times as fast. I'll post an updated version of this benchmark later on. Concerning the build size I ran the following script:
Which results in:
Moreover, I added some tests to validate empty and invalid geometries. That's it for now! |
OK, here is the speed test. I temporarily added the following code to
Giving these results: Regex parser
Lexer parser
So it looks like the more complex the WKT string becomes, the more the lexer seems to pay off (in terms of speed). Which kinda makes sense. |
* Class to tokenize a WKT string. | ||
* @param {string} wkt WKT string. | ||
* @constructor | ||
* @protected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be @private
I think (and named ol.format.WKT.Lexer_
). Also minor and not necessary to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, and the same holds for the parser. But somehow it feels more comfortable to use protected classes. I guess it looks better :-)
This is very nice work @GingerIK. Thanks for the great contribution - and the extra work providing benchmarks and build results! |
You're most welcome! |
I just took a look at ths patch. Great contribution! |
Thanks @elemoine |
As I found myself some spare time and enthusiasm, I tried to write a lexer to replace the regex-based parsing of WKT strings (see also #2172). Here is the result. Please let me know whether this is anywhere near the concepts that have been conceived.