Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement performance critical parts with Rcpp #41

Closed
ecoRoland2 opened this issue Jun 2, 2022 · 15 comments
Closed

Implement performance critical parts with Rcpp #41

ecoRoland2 opened this issue Jun 2, 2022 · 15 comments

Comments

@ecoRoland2
Copy link
Collaborator

Functions that are basically a for loop with some basic arithmetics (like calcEmis) would be very easy to translate to compiled code. Would you like me to start doing that? There would be some development overhead:

  1. You need the right C++ compiler setup. Should not be an issue if you use linux.
  2. You need the right development setup that handles the Rcpp integration. With RStudio and devtools that is as easy as creating a new package project with Rcpp support and copying the code into it.
@sashahafner
Copy link
Member

That would be great! Let me do some profiling first to get an idea of how much this might help.

@ecoRoland2
Copy link
Collaborator Author

I went ahead. See this commit to the fork (based on the master branch): https://github.com/ecoRoland2/ALFAM2/commit/d94d9653e3b22d45f92f25d2ab97b76c3cad1c96

For now, I've done no "structural changes". A good performance boost should be possible if .wrap_calcEmis and calcEmis were both C++ function. But I'd want to avoid data.frames then and would prefer a matrix as return value ...

@sashahafner
Copy link
Member

Great! I am just getting to profiling now (sorry) so can compare. I'll let you know what I see.

A matrix should be OK (I think all columns are or could be numeric). We can just go back to a data frame in R.

@sashahafner
Copy link
Member

I see a doubling in speed for 965 plots and 8133 observations. Here are system.time() results.

Original (v2.1.1, eb15e93 on dev) (although I see you forked master, but the differences shouldn't affect speed):

   user  system elapsed
  6.354   0.012   6.367

Your fork (d94d965):

   user  system elapsed
  3.001   0.020   3.021

Very nice!

@sashahafner
Copy link
Member

There are also some simple issues I can fix. I'll work on these before coming back to your C++ addition. Or, maybe it makes the most sense to merge in your work with dev first, and then get back to the inefficient R code.

Here is one that also doubles the speed.

lineprof function with ALFAM2 v2.1.1 showed a single split operation was taking a lot of time. Results below are for 965 plots and 8133 observations. Bottom row is for entire alfam2 call, top is for split in alfam2.

   user  system elapsed
  2.768   0.000   2.768
   user  system elapsed
  6.354   0.012   6.367

Dropping a bunch of columns before split (commit 7f53f0e in dev):

   user  system elapsed
  0.144   0.000   0.144
   user  system elapsed
  3.555   0.004   3.559

There seems to be an additional improvement from switching from a data frame to a matrix here. I'll take a closer look.

@ecoRoland2
Copy link
Collaborator Author

All operations on data.frames are slow. You only get good performance with them if you can use list operations (likle [[). If a data.frame is needed, I'd use package data.table.

Splitting a data.frame (by rows) is always slow and pretty much never necessary. I'd recommend using package data.table and it's group-by operation (looks like DT[, calcEmis(...), by = group]). calcEmis would need to return a list, data.frame or data.table then but I'd probably use a list. However, I haven't understood yet, what the purpose of drop.rows is.

A combination of package data.table and Rcpp should enable a dramatic performance improvement and package parallel wouldn't be needed anymore.

@ecoRoland2
Copy link
Collaborator Author

Of couse, implementing the "group-by" in Rcpp would be even better performance-wise because we would strongly reduce the number of expensive R-function calls. It wouldn't be diffcult either.

@sashahafner
Copy link
Member

I'd recommend using package data.table and it's group-by operation (looks like DT[, calcEmis(...), by = group]).

@ChHaeni has also encouraged me to use data.table. I'm really impressed with it but have a lot of code (including within functions and packages) that uses a symbolic variable (character vector) for column names within an indexing operation. This doesn't work with data.tables or course. I know I can add .. but the effort required for the changes and some concern about compatibility with data frames scared me a bit.

But I think you two are probably right.

@sashahafner
Copy link
Member

Of couse, implementing the "group-by" in Rcpp would be even better performance-wise because we would strongly reduce the number of expensive R-function calls. It wouldn't be diffcult either.

That would be nice. Let me see what I can do with data.table first. Ideally that will give us a cleaner and already more efficient function to work with. I think I will merge that with the Rcpp work you have already done and then reassess.

@ecoRoland2
Copy link
Collaborator Author

The development version of data.table has a new interface for programming on the language: https://rdatatable.gitlab.io/data.table/articles/datatable-programming.html

I expect release to CRAN soonish.

@sashahafner
Copy link
Member

Of couse, implementing the "group-by" in Rcpp would be even better performance-wise because we would strongly reduce the number of expensive R-function calls. It wouldn't be difficult either.

I've finally learned a bit of C++ and with some trial-and-error managed to move all the grouped stuff into C++.
Now rcpp_calcEmis() accepts vectors that have values for all field plots, along with new vectors that just have the position of the groups.
So now the slow split() call isn't needed, and there are not as many calls to R functions.
There is no calcEmis() anymore.
Seems much faster, but I haven't done a comparison yet.

All in d269b90 in Rcpp-dev branch.

Few issues to work on:

    //i = relative element index
    for (R_xlen_t i = 0; i < gl; i++) {
      ct[i] = cta[gs + i];
      r1[i] = r1a[gs + i];
      r2[i] = r2a[gs + i];
      r3[i] = r3a[gs + i];
      f4[i] = f4a[gs + i];
    }

    /*
    //Alt that might work
    //Is this any better?
    //Extract group i values
    NumericVector ct(cta.begin() + gs - 1, cta.begin() + ge)
    NumericVector r1(r1a.begin() + gs - 1, r1a.begin() + ge)
    NumericVector r2(r2a.begin() + gs - 1, r2a.begin() + ge)
    NumericVector r3(r3a.begin() + gs - 1, r3a.begin() + ge)
    NumericVector f4(f4a.begin() + gs - 1, f4a.begin() + ge)
    */

length() vs. size().

and there is still a lot of R code that is used to get things ready.
See incorporation setup stuff.
Might be able to replace some of that.

@sashahafner
Copy link
Member

Data processing for incorporation takes a lot of time.
Here are some results:

> system.time(out <- alfam2(d2, app.name = 'tan.app', time.name = 'ct', time.inc
orp = 'time.incorp', group = 'pmid', warn = FALSE, prep = TRUE))
   user  system elapsed
  0.924   0.000   0.923
> round(1000 * out$dclck, 1)
Time differences in secs
  t10   t11   t12   t20   t30   t40   t50   t60   t70
 62.9   0.2   8.6 805.3  28.8   7.2   1.5   8.3   0.2
> system.time(out <- alfam2(d2, app.name = 'tan.app', time.name = 'ct', group =
'pmid', warn = FALSE, prep = TRUE))
   user  system elapsed
  0.106   0.000   0.106
> round(1000 * out$dclck, 1)
Time differences in secs
 t10  t20  t30  t40  t50  t60  t70
63.1  0.1 25.7  7.2  1.6  8.3  0.2

The 800 ms used from t12 to t20 is for this chunk:

    if(!is.null(time.incorp)) {
      # Loop through groups with incorporation (incorp.time != NA)
      for(i in names(incorp.time)[!is.na(incorp.time) & n.incorp.vals.grp > 0]) {
    ...
    }
  }

@sashahafner
Copy link
Member

287f071 moved some operations out of the loop and includes some other improvements to the incorporation code.
Gives a >60% improvement.

> system.time(out <- alfam2(d2, app.name = 'tan.app', time.name = 'ct', time.inc
orp = 'time.incorp', group = 'pmid', warn = FALSE, prep = TRUE))
   user  system elapsed
  0.314   0.000   0.314
> round(1000 * out$dclck, 1)
Time differences in secs
  t10   t11   t12   t13   t20   t30   t40   t50   t60   t70
 64.0   0.2   9.2   7.2 179.4  36.1   7.3   1.6   8.5   0.2
> system.time(out <- alfam2(d2, app.name = 'tan.app', time.name = 'ct', group =
'pmid', warn = FALSE, prep = TRUE))
   user  system elapsed
  0.106   0.000   0.107

179 ms still for the loop, which could probably be eliminated, but only with a merge() call or similar.
Might be worth additional effort.

@sashahafner
Copy link
Member

So far I have not had any problems from Rcpp stuff.
Just tested on Windows today too.
Will finally move latest version to main/master and create a release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants