Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A compiled version of hex2raw would be useful for large features? #10

Closed
cmhh opened this issue Oct 26, 2015 · 8 comments
Closed

A compiled version of hex2raw would be useful for large features? #10

cmhh opened this issue Oct 26, 2015 · 8 comments
Labels

Comments

@cmhh
Copy link

@cmhh cmhh commented Oct 26, 2015

Hi

Sorry, I don't know how to comment besides putting things in here as an 'issue'...

Anyway, I found myself in the situation where I had character vectors containing WKB geometries, where some elements had several millions of characters. To use the readWKB function, I first had to use your hex2raw function, and this is very slow in pure R. Perhaps it is a rare use case to have such large character vectors, but I did find a compiled function that did something similar to the hex2raw function made things manageable. For example (without any checks for valid inputs etc.):

library(Rcpp)
library(inline)

inc <- '
#include <fstream>
#include <iostream>
'

src <- '
   Rcpp::CharacterVector invec(cx);
   Rcpp::RawVector raw(invec[0].size() / 2);
   std::string s = Rcpp::as<std::string>(invec[0]);
   int x;
   for (int i=0; i<raw.size(); i++){
      std::istringstream iss(s.substr(i*2, 2));
      iss >> std::hex >> x;
      raw[i] = x;
   }
   return raw;
'

foo <- cxxfunction(signature(cx="character"), src, includes=inc, plugin="Rcpp")

To test, I converted a character vector with 185032 characters, x:

system.time(a <- wkb::hex2raw(x))/system.time(b <- foo(x))

with the following results:

    user   system  elapsed 
230.1667      NaN 229.4792 

(The actual timings were 11.015 and 0.048 seconds, respectively).

Anyway, you may or may not find something like this useful.

@ianmcook
Copy link
Owner

@ianmcook ianmcook commented Nov 2, 2015

Thanks very much for taking the time to share this. I haven't used the package with data that large, and I wasn't sure anyone would try to, so I didn't know if there was a good reason to use compiled code. But this provides a good case for it.

What about the readWKB function? Was that also slow?

@ianmcook ianmcook added the enhancement label Nov 2, 2015
@ianmcook
Copy link
Owner

@ianmcook ianmcook commented Nov 2, 2015

The primary cause of slow performance in hex2raw was a call to the substring function. I replaced this with calls to the strsplit and paste functions, and I'm seeing a speedup of about 100x with large input. See details at cd6a4bf
I expect to submit to CRAN soon, but in the meantime, use devtools::install_github to install the dev version.

@cmhh
Copy link
Author

@cmhh cmhh commented Nov 2, 2015

Hi Ian

Thanks. I'll take a look. That's a pretty nice improvement as far as a pure R solution goes.

Regarding your earlier question about whether readWKB is also slow... It's not slow, but faster would certainly better. For one of our regional council areas with 218409 vertices (and a WKB representation of 7274632 characters), my compiled version of hex2raw took 3.794 seconds. Then, readWKB took 3.306 seconds. By comparison rgeos::readWKT took 0.897 seconds (the WKT representation is similar in size at 7121899 characters). So, naively, I'd expect a decent speed improvement if readWKB was compiled also--I guess maybe 3 or 4-fold.

As a bit of context, I am querying data from a SQL Server Spatial database. Ordinarily, I would use the readOGR function with the MSSQL driver, but I am querying from Linux, and the official MS driver seems to truncate the geometry fields, so doesn't work (interestingly, the freeTDS driver does seem to work, but we haven't looked at that driver properly since we'd need to work out if it can handle trusted authentication via Kerberos, etc.). Instead, I've had to use the JDBC driver, which means I query the database then convert the geometry. As far as I can see, I have two options: in SQL, convert the geometry to either of WKT or WKB; then in R convert to one of the sp types. In SQL casting to WKB is much faster than WKT, but then converting WKT in R is much faster than converting WKB with the tools I currently have. So, I'm just trying to find the fastest solution, but I appreciate that my use case might not be all that common, and so 'fixing' is probably a low value proposition from your end.

Thanks for the reply.

@cmhh
Copy link
Author

@cmhh cmhh commented Nov 2, 2015

p.s. I have tested your updated .hex2raw function. Here's how it compares to the compiled version from the original post:

> benchmark(replications=5, wkb::hex2raw(x$WKB), foo(x$WKB))
                 test replications elapsed relative user.self sys.self user.child sys.child
2          foo(x$WKB)            5   8.648    1.000     8.648    0.000          0         0
1 wkb::hex2raw(x$WKB)            5  42.670    4.934    42.672    0.008          0         0

This is a big improvement considering I never actually got the function before your changes to actually finish for this particular polygon.

Thanks again for looking at it.

@ianmcook
Copy link
Owner

@ianmcook ianmcook commented Nov 2, 2015

Great, thanks. I'll call this closed for now, but I do plan in the future to use compiled code in this function and elsewhere in the package to speed things up.

@ianmcook ianmcook closed this Nov 2, 2015
@ianmcook
Copy link
Owner

@ianmcook ianmcook commented Mar 25, 2016

This improvement to hex2raw is now on version 0.3-0 on CRAN.

@edzer
Copy link

@edzer edzer commented Aug 31, 2016

@cmhh I reused your solution in http://github.com/edzer/sfr (file src/wkb.cpp) and would be happy to list you as contributor.

@edzer
Copy link

@edzer edzer commented Sep 13, 2016

Factoring out the string/istringstream/hex stuff with more bare bones C code brought another factor 12 speed increase on a 500K polygons coverage; see here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants
You can’t perform that action at this time.