Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to overline2 #307

Merged
merged 12 commits into from Apr 24, 2019

Conversation

Projects
None yet
2 participants
@mem48
Copy link
Collaborator

commented Apr 17, 2019

Overline2 function has the following improvements:

  • Improved Docs

  • Faster

  • Less Memory Usage

Profile for 500 lines from the pct schools dataset in london

profile <- bench::mark(check = FALSE,
            r1 = stplanr::overline(sl = rf, attrib = c("bicycle","govtarget_slc","dutch_slc")),
            r2 = stplanr::overline2(x = rf, attrib = c("bicycle","govtarget_slc","dutch_slc")),
            r3 = overline3(x = rf, attrib = c("bicycle","govtarget_slc","dutch_slc"))
)
profile[,c(1,7,10)]

r1 104.1MB 1.09m
r2 74.8MB 16.54s
r3 67.5MB 9.08s

Note that results for r2 and r3 are not identical the row order is different so:

# Check Identical
r2$geometry <- sf::st_as_text(r2$geometry)
r3$geometry <- sf::st_as_text(r3$geometry)
r2 <- as.data.frame(r2)
r3 <- as.data.frame(r3)
r2 <- r2[order(r2$geometry), ]
r3 <- r3[order(r3$geometry), ]
rownames(r2) <- 1:nrow(r2)
rownames(r3) <- 1:nrow(r3)
identical(r2, r3) # TRUE

mem48 added some commits May 24, 2018

Merge pull request #1 from ropensci/master
update to latest version
@Robinlovelace

This comment has been minimized.

Copy link
Collaborator

commented Apr 17, 2019

@mem48 mem48 referenced this pull request Apr 17, 2019

Closed

Document overline2 #281

@mem48

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 17, 2019

This is a problem with dplyr::group_by when you don't quote column names

c3 <- dplyr::group_by(c3, X1, Y1, X2, Y2)

I don't know how to fix this, any suggestions?

@Robinlovelace

This comment has been minimized.

Copy link
Collaborator

commented Apr 17, 2019

Yes I know. It's like this:

utils::globalVariables(c(".", "n", "matchingID"))

Replacing that with:

utils::globalVariables(c(".", "n", "matchingID", "X1", "Y1", "X2", "Y2"))

Should allow the tests to pass.

@mem48

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 17, 2019

Thanks, pushed fix waiting for Travis

mem48 added some commits Apr 18, 2019

@mem48

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 23, 2019

I made one more tweak, that reduces memory usage by another 4%

mem48 added some commits Apr 23, 2019

@mem48

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 23, 2019

speeds are now for 500 lines

1 overline 274MB 4.03m
2 overline2 194MB 11.65s
3 overline2_new 306MB 6.98s

I raised the regionalization limit to 1e6, as it uses more memory but is faster on small datasets, on a larger dataset of 37,000 lines.

1 overline2 537.32MB 22.7s
2 overline2_new 1.16GB 13.8s

@Robinlovelace

This comment has been minimized.

Copy link
Collaborator

commented Apr 24, 2019

Great to see the checks passing. Good to see the speed-up too. Many thanks!

@Robinlovelace Robinlovelace merged commit 24e2cd4 into ropensci:master Apr 24, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.