Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rcpp #19

Closed
marcosci opened this issue Jul 6, 2018 · 22 comments
Closed

Rcpp #19

marcosci opened this issue Jul 6, 2018 · 22 comments

Comments

@marcosci
Copy link
Member

marcosci commented Jul 6, 2018

Hey @Nowosad,

we were playing around with your rcpp functions - really handy stuff in there 馃憤

Just one weird thing with rcpp_get_coocurrence_matrix:

foo <- function(landscape){
    
    raster_padded <- pad_raster(landscape, pad_raster_value = -999,
                                pad_raster_cells = 1)
    
    adjacent_cells <- raster::adjacent(
        raster_padded,
        cells =  seq_len(raster::ncell(raster_padded)),
        directions = 4,
        pairs = TRUE
    )
    
    tb <- table(raster_padded[adjacent_cells[, 1]],
                raster_padded[adjacent_cells[, 2]])
    
    return(tb)
}


foo2 <- function(landscape){
    raster_padded <- pad_raster(landscape, pad_raster_value = -999,
                                pad_raster_cells = 1)
    
    tb <- rcpp_get_coocurrence_matrix(raster::as.matrix(raster_padded), directions = 4)
    
    return(tb)
}

x <- microbenchmark(
    r_stuff <- foo(podlasie_ccilc), 
    cpp <- foo2(podlasie_ccilc), 
    times = 100
)

x
Unit: seconds
                           expr      min       lq     mean   median       uq      max neval
 r_stuff <- foo(podlasie_ccilc) 2.066591 2.242272 2.429078 2.450698 2.558393 3.056486   100
    cpp <- foo2(podlasie_ccilc) 3.133793 3.462482 3.560858 3.565013 3.705634 3.951005   100

y <- microbenchmark(
    r_stuff <- foo(landscape), 
    cpp <- foo2(landscape), 
    times = 100
)

y
Unit: milliseconds
                      expr       min        lq      mean    median        uq      max neval
 r_stuff <- foo(landscape) 13.405927 14.166557 16.004563 14.811494 16.061616 49.77265   100
    cpp <- foo2(landscape)  2.200139  2.314256  2.839799  2.465439  2.690333 13.05235   100

.... for landscape it's way faster - but not for podlasie. Do you have anything in mind, why that could be?

@Nowosad
Copy link
Member

Nowosad commented Jul 6, 2018

Great question, Marco!
I will test it tomorrow.

@marcosci
Copy link
Member Author

marcosci commented Jul 9, 2018

Nice work @Nowosad :-)

New benchmark:

> x
Unit: milliseconds
                           expr        min         lq      mean     median        uq       max neval cld
 r_stuff <- foo(podlasie_ccilc) 3607.12295 3907.67668 4058.7611 4013.53287 4169.4011 4826.0062   100   b
    cpp <- foo2(podlasie_ccilc)   80.54347   83.02005  114.4029   94.30997  121.7541  361.9254   100  a 

.... with rcpp_get_coocurrence_matrix2. Will start implementing that soon :-)

@Nowosad
Copy link
Member

Nowosad commented Jul 9, 2018

Timings are good. The results are not (always). That is still a work in progress..

@Nowosad
Copy link
Member

Nowosad commented Jul 9, 2018

It should be fine now. @marcosci , please test it and let me know if we can merge it with master.

library(landscapemetrics)
library(raster)
#> Loading required package: sp
library(microbenchmark)
data("landscape")
data("podlasie_ccilc")
landscape <- pad_raster(landscape, pad_raster_value = -999,
                            pad_raster_cells = 1)
podlasie_ccilc <- pad_raster(podlasie_ccilc, pad_raster_value = -999,
                             pad_raster_cells = 1)

foo <- function(landscape){

    adjacent_cells <- raster::adjacent(
        landscape,
        cells =  seq_len(raster::ncell(landscape)),
        directions = 4,
        pairs = TRUE
    )

    tb <- table(landscape[adjacent_cells[, 1]],
                landscape[adjacent_cells[, 2]])

    tb <- unclass(tb)

    return(tb)
}


foo2 <- function(landscape){
    tb <- landscapemetrics:::rcpp_get_coocurrence_matrix(raster::as.matrix(landscape), directions = 4)

    return(tb)
}

x <- microbenchmark(
    r_stuff <- foo(podlasie_ccilc),
    cpp <- foo2(podlasie_ccilc),
    times = 10
)

x
#> Unit: milliseconds
#>                            expr        min         lq       mean
#>  r_stuff <- foo(podlasie_ccilc) 3930.70718 3998.36992 4183.68346
#>     cpp <- foo2(podlasie_ccilc)   84.92322   90.93474   99.53851
#>      median        uq       max neval
#>  4195.26915 4257.7961 4674.3710    10
#>    98.78565  108.2687  115.6758    10

all.equal(r_stuff, cpp, check.attributes = FALSE)
#> [1] TRUE

Created on 2018-07-09 by the reprex package (v0.2.0).

@Nowosad
Copy link
Member

Nowosad commented Jul 15, 2018

Hey @marcosci ,
quick question - do we need all of the files in src/? E.g. do we use rem-union-merge.cpp or borders.cpp?

@Nowosad Nowosad reopened this Jul 15, 2018
@marcosci
Copy link
Member Author

I thought so, for the connected labeling. But did not look at that for quite some while. We anyways have to fiddle with that, as one can only use the queen's case at the moment for the patch identification.

Are you asking because libs get that big while compiling the package?

@Nowosad
Copy link
Member

Nowosad commented Jul 15, 2018

Yes. Larger libs means also longer compilation times. I am just curious if we need all of it. If not - we should remove it.

@marcosci
Copy link
Member Author

marcosci commented Jul 15, 2018

Will dive into that tomorrow :) While doing that - what are we planing with your rcpp functions, internal or public?

@Nowosad
Copy link
Member

Nowosad commented Jul 16, 2018

I do not have a good answer to that question. It should depends on the users expectations - would they need any of the rcpp functions for a landscape ecology work?

@marcosci
Copy link
Member Author

There were for us - so I would assume for others too.
I think they have potential to be building blocks for other metrics, and something in the line of:

get_composition <- function(landscape){
    
    rcpp_get_composition_vector(raster::as.matrix(landscape))
    
}

... exported would be valuable. If you think there is a more appropriate place for them, tell me. :-)

@marcosci
Copy link
Member Author

So, Max and I played around with the connected labeling from SDMTools - turns out it is a bit faster and we could also tweak it to work 4 neighbors, instead of only 8.

They use it under a GPL license, I think if we put them as the author of the C function it should be fine to not have SDMTools as a dependency?

@Nowosad
Copy link
Member

Nowosad commented Jul 20, 2018

That is my understanding as well. See rmarkdown as an example - https://github.com/rstudio/rmarkdown/blob/master/DESCRIPTION (it uses outside code).

Edit: interesting disussion at ropensci/unconf17#32

@marcosci
Copy link
Member Author

According to the figure @cboettig is showing it would be allowed - nice.

Then, after now restructuring ~200 functions to make them understand the argument directions, we should write a proper acknowledgmenent in the C code 馃憤

@Nowosad
Copy link
Member

Nowosad commented Jul 21, 2018

@marcosci, going back to your previous question - I agree that this function could be useful for the users.
We just need to come with a better name...
Is there any other rcpp function that could be useful externally?

@Nowosad Nowosad reopened this Jul 21, 2018
@marcosci
Copy link
Member Author

marcosci commented Jul 22, 2018

Hm, I would export them all (with better names 馃構 ) except for maybe:

  • cclabel_x
  • entropy
  • circle

?

@marcosci
Copy link
Member Author

extract_xxx isn't any better, isn't?

  • calculate_xxx
  • compute_xxx
  • derive_xxx

馃

@mhesselbarth
Copy link
Member

I would prefer an identical prefix to group all of them. How about using get_xxx, so just remove the rcpp_ part for the name of the R wrapper.

Functions that may be interesting are:

  • rcpp_get_coocurrence_matrix
  • rcpp_get_coocurrence_vector
  • rcpp_get_offdiagonal_vector
  • rcpp_get_pairs
  • rcpp_get_composition_vector
  • rcpp_get_nearest_neighbor

@Nowosad
Copy link
Member

Nowosad commented Jul 31, 2018

Marco and Max, sorry for the delay with this. I'd been at a conference whole last week, and this week is also crazy. I will think about it and make it done in a week or so (or tomorrow).

marcosci added a commit that referenced this issue Jul 31, 2018
Related to Issue #19: #19

get_adjacencies is a wrapper around rcpp_get_coocurrence_matrix to have a fast alternative to the raster package for calculating the adjacency matrix
marcosci added a commit that referenced this issue Jul 31, 2018
marcosci added a commit that referenced this issue Jul 31, 2018
marcosci added a commit that referenced this issue Jul 31, 2018
@marcosci
Copy link
Member Author

Just started to implement the utility functions - nothing set in stone, if you want to change/add/remove/whatever something that is totally fine 馃憤

@Nowosad
Copy link
Member

Nowosad commented Aug 1, 2018

Awesome work!
One small idea - how about using our lsm_ prefix here as well?
E.g. lsm_get_xx()?

@mhesselbarth
Copy link
Member

We had the idea to use lsm_ as a prefix only for functions that actually calculate a metric and not for auxiliary or helper functions

@Nowosad
Copy link
Member

Nowosad commented Aug 1, 2018

Ok, that's fine.

@marcosci marcosci closed this as completed Aug 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants