A compiled version of hex2raw would be useful for large features? #10
Comments
Thanks very much for taking the time to share this. I haven't used the package with data that large, and I wasn't sure anyone would try to, so I didn't know if there was a good reason to use compiled code. But this provides a good case for it. What about the readWKB function? Was that also slow?
The primary cause of slow performance in hex2raw was a call to the substring function. I replaced this with calls to the strsplit and paste functions, and I'm seeing a speedup of about 100x with large input. See details at cd6a4bf.
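For reference, a minimal sketch of that pairing approach in R; the actual code in cd6a4bf may differ in details such as vectorised input and validation:

```r
# Sketch only: split the hex string into digits, paste adjacent digits into
# two-character pairs, and convert each pair to a byte. Assumes valid input.
hex_to_raw <- function(hex) {
  chars <- strsplit(hex, "")[[1]]                               # individual hex digits
  pairs <- paste0(chars[c(TRUE, FALSE)], chars[c(FALSE, TRUE)]) # two digits per byte
  as.raw(strtoi(pairs, base = 16L))
}

hex_to_raw("0101000000")
#> [1] 01 01 00 00 00
```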
Hi Ian, thanks. I'll take a look. That's a pretty nice improvement as far as a pure R solution goes.

Regarding your earlier question about whether readWKB is also slow: it's not slow, but faster would certainly be better. For one of our regional council areas with 218409 vertices (and a WKB representation of 7274632 characters), my compiled version of hex2raw took 3.794 seconds, and readWKB then took 3.306 seconds. By comparison, rgeos::readWKT took 0.897 seconds (the WKT representation is similar in size at 7121899 characters). So, naively, I'd expect a decent speed improvement if readWKB were compiled as well, maybe 3- or 4-fold.

As a bit of context, I am querying data from a SQL Server Spatial database. Ordinarily, I would use the readOGR function with the MSSQL driver, but I am querying from Linux, and the official MS driver seems to truncate the geometry fields, so it doesn't work. (Interestingly, the FreeTDS driver does seem to work, but we haven't looked at it properly since we'd need to work out whether it can handle trusted authentication via Kerberos, etc.) Instead, I've had to use the JDBC driver, which means I query the database and then convert the geometry in R. As far as I can see, I have two options: in SQL, convert the geometry to either WKT or WKB, then in R convert it to one of the sp types (both paths are sketched below). In SQL, casting to WKB is much faster than casting to WKT, but converting WKT in R is much faster than converting WKB with the tools I currently have.

So, I'm just trying to find the fastest solution, but I appreciate that my use case might not be all that common, so 'fixing' this is probably a low-value proposition from your end. Thanks for the reply.
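For concreteness, the two conversion paths look roughly like this in R; df, geom_wkt, and geom_wkb_hex are placeholder names for whatever the JDBC query returns.

```r
library(rgeos)  # readWKT()
library(wkb)    # hex2raw(), readWKB()

# 'df' stands in for the data frame returned by the JDBC query.

# Option 1: cast the geometry to WKT in SQL, then parse the text in R.
sp_from_wkt <- rgeos::readWKT(df$geom_wkt[1])

# Option 2: cast the geometry to WKB in SQL (arriving as a hex string via
# JDBC), then convert hex -> raw -> sp in R.
sp_from_wkb <- wkb::readWKB(wkb::hex2raw(df$geom_wkb_hex[1]))
```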
P.S. I have tested your updated .hex2raw function against the compiled version from the original post. It's a big improvement: before your changes, I never actually got the pure R function to finish for this particular polygon. Thanks again for looking at it.
Great, thanks. I'll call this closed for now, but I do plan in the future to use compiled code in this function and elsewhere in the package to speed things up.
This improvement to hex2raw is now in version 0.3-0 on CRAN.
@cmhh I reused your solution in http://github.com/edzer/sfr (file src/wkb.cpp) and would be happy to list you as a contributor.
Factoring out the string/istringstream/hex stuff with more bare-bones C code brought another factor-12 speed increase on a 500K-polygon coverage; see here.
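As an illustration of the kind of change described (a sketch, not the actual code in src/wkb.cpp), the per-byte string stream can be replaced with plain character arithmetic, here wrapped with Rcpp::cppFunction so it can be tried from R:

```r
library(Rcpp)

cppFunction("
RawVector hex2rawBare(std::string hex) {
  // Bare-bones variant: decode each hex digit arithmetically instead of
  // constructing an istringstream per byte. Assumes only 0-9, a-f, A-F.
  int n = hex.size() / 2;
  RawVector out(n);
  for (int i = 0; i < n; ++i) {
    unsigned char hi = hex[2 * i], lo = hex[2 * i + 1];
    hi = (hi <= '9') ? hi - '0' : (hi | 0x20) - 'a' + 10;
    lo = (lo <= '9') ? lo - '0' : (lo | 0x20) - 'a' + 10;
    out[i] = static_cast<Rbyte>((hi << 4) | lo);
  }
  return out;
}
")
```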
Hi
Sorry, I don't know how to comment besides putting things in here as an 'issue'...
Anyway, I found myself in a situation where I had character vectors containing WKB geometries, where some elements had several million characters. To use the readWKB function, I first had to use your hex2raw function, and this is very slow in pure R. Perhaps it is a rare use case to have such large character vectors, but I found that a compiled function doing something similar to hex2raw made things manageable. For example (without any checks for valid inputs, etc.):
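A sketch along those lines, wrapped with Rcpp::cppFunction and parsing each pair of hex characters with an istringstream (hex2rawCpp is an illustrative name; this is a reconstruction, not the exact original function):

```r
library(Rcpp)

cppFunction(includes = "#include <sstream>", code = "
RawVector hex2rawCpp(std::string hex) {
  // Convert a hex string such as '0101000000...' to a raw vector,
  // two hex characters per byte, with no input validation.
  int n = hex.size() / 2;
  RawVector out(n);
  for (int i = 0; i < n; ++i) {
    unsigned int byte = 0;
    std::istringstream iss(hex.substr(2 * i, 2));
    iss >> std::hex >> byte;
    out[i] = static_cast<Rbyte>(byte);
  }
  return out;
}
")
```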
To test, I converted a character string x with 185032 characters using both the pure R function and the compiled one.
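A comparison along these lines (hex2rawCpp is the compiled sketch above; x is the hex string just described) produces the timings reported below:

```r
system.time(wkb::hex2raw(x))  # pure R hex2raw from the wkb package
system.time(hex2rawCpp(x))    # compiled sketch
```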
The pure R hex2raw took 11.015 seconds, while the compiled version took 0.048 seconds.
Anyway, you may or may not find something like this useful.