Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy fastmap #24

Closed
stla opened this issue Dec 13, 2022 · 6 comments · Fixed by #26
Closed

Copy fastmap #24

stla opened this issue Dec 13, 2022 · 6 comments · Fixed by #26

Comments

@stla
Copy link

stla commented Dec 13, 2022

Hello,

To make a copy of a fastmap I use this function:

copy <- function(map) {
  fm <- fastmap()
  for(k in map$keys()) {
    fm$set(k, map$get(k))
  }
  fm
}

I'm afraid this is inefficient for a large fastmap. It would be nice to have a copy function which does the copy in C++, if possible.

@wch
Copy link
Member

wch commented Dec 19, 2022

I think that it's possible to improve performance of copying without writing and C++ code.

Basically, the strategy is to serialize() and then unserialize() the fastmap object. The first time that the copy is used (for example, by calling $get()), it will call C++ code to repopulate some internal data structures, and this takes a bit of time. Still, the overall time is faster than the manual copying in your method.

This example copies a fastmap object with 1 million items, and is 5x faster using the method I described.

library(fastmap)

rm(list=ls())
n <- 1e6
x <- seq_len(n)

m <- fastmap()
m$mset(.list = setNames(x, as.character(x)))


# Using mset(as_list())
invisible(gc())
system.time({
  m1 <- fastmap()
  m1$mset(.list = m$as_list())
})
#>    user  system elapsed 
#>   2.429   0.025   2.454 

# stla's copy function
invisible(gc())
copy <- function(map) {
  fm <- fastmap()
  for(k in map$keys()) {
    fm$set(k, map$get(k))
  }
  fm
}
system.time({
  m2 <- copy(m)
})
#>    user  system elapsed 
#>   2.955   0.029   2.984 


# Copy internal values using unserialize(serialize())
invisible(gc())
system.time({
  m3 <- unserialize(serialize(m, NULL))
  # Calling get() calls ensure_restore_map(), which causes the internal
  # key_idx_map to be repopulated. This takes a bit of time.
  m3$get('123')
})
#>    user  system elapsed 
#>   0.624   0.024   0.648 

It's possible performance could still be better by copying the internal data structure in C++, but still, this is a significant improvement.

@wch
Copy link
Member

wch commented Dec 21, 2022

I just took a quick look into this, and it should be possible to copy the C++ hopscotch_map object with the = operator.

Do you have a specific use case for this? If so, how large is the map you're using?

@wch wch mentioned this issue Dec 23, 2022
@wch wch closed this as completed in #26 Dec 23, 2022
@stla
Copy link
Author

stla commented Apr 12, 2023

Hello @wch

My program has a different behavior when I switch from my copy function given in the OP and the $clone() method. Would you know what can cause this difference?

@stla
Copy link
Author

stla commented Apr 12, 2023

Hmm IMHO this is just due to a different order of the elements. Could you please confirm that there should be no difference?

@wch
Copy link
Member

wch commented Apr 12, 2023

Can you provide a reproducible example?

@stla
Copy link
Author

stla commented Apr 13, 2023

I have but it is long and complex. Anyway I checked, I get the same results up to the order of the map.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants