Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to overline2 #307

Merged
merged 12 commits into from Apr 24, 2019
Merged

Improvements to overline2 #307

merged 12 commits into from Apr 24, 2019

Conversation

@mem48
Copy link
Collaborator

@mem48 mem48 commented Apr 17, 2019

Overline2 function has the following improvements:

  • Improved Docs

  • Faster

  • Less Memory Usage

Profile for 500 lines from the pct schools dataset in london

profile <- bench::mark(check = FALSE,
            r1 = stplanr::overline(sl = rf, attrib = c("bicycle","govtarget_slc","dutch_slc")),
            r2 = stplanr::overline2(x = rf, attrib = c("bicycle","govtarget_slc","dutch_slc")),
            r3 = overline3(x = rf, attrib = c("bicycle","govtarget_slc","dutch_slc"))
)
profile[,c(1,7,10)]

r1 104.1MB 1.09m
r2 74.8MB 16.54s
r3 67.5MB 9.08s

Note that results for r2 and r3 are not identical the row order is different so:

# Check Identical
r2$geometry <- sf::st_as_text(r2$geometry)
r3$geometry <- sf::st_as_text(r3$geometry)
r2 <- as.data.frame(r2)
r3 <- as.data.frame(r3)
r2 <- r2[order(r2$geometry), ]
r3 <- r3[order(r3$geometry), ]
rownames(r2) <- 1:nrow(r2)
rownames(r3) <- 1:nrow(r3)
identical(r2, r3) # TRUE
@Robinlovelace
Copy link
Member

@Robinlovelace Robinlovelace commented Apr 17, 2019

@mem48 mem48 mentioned this pull request Apr 17, 2019
@mem48
Copy link
Collaborator Author

@mem48 mem48 commented Apr 17, 2019

This is a problem with dplyr::group_by when you don't quote column names

c3 <- dplyr::group_by(c3, X1, Y1, X2, Y2)

I don't know how to fix this, any suggestions?

@Robinlovelace
Copy link
Member

@Robinlovelace Robinlovelace commented Apr 17, 2019

Yes I know. It's like this:

utils::globalVariables(c(".", "n", "matchingID"))

Replacing that with:

utils::globalVariables(c(".", "n", "matchingID", "X1", "Y1", "X2", "Y2"))

Should allow the tests to pass.

@mem48
Copy link
Collaborator Author

@mem48 mem48 commented Apr 17, 2019

Thanks, pushed fix waiting for Travis

@mem48
Copy link
Collaborator Author

@mem48 mem48 commented Apr 23, 2019

I made one more tweak, that reduces memory usage by another 4%

@mem48
Copy link
Collaborator Author

@mem48 mem48 commented Apr 23, 2019

speeds are now for 500 lines

1 overline 274MB 4.03m
2 overline2 194MB 11.65s
3 overline2_new 306MB 6.98s

I raised the regionalization limit to 1e6, as it uses more memory but is faster on small datasets, on a larger dataset of 37,000 lines.

1 overline2 537.32MB 22.7s
2 overline2_new 1.16GB 13.8s

@Robinlovelace
Copy link
Member

@Robinlovelace Robinlovelace commented Apr 24, 2019

Great to see the checks passing. Good to see the speed-up too. Many thanks!

@Robinlovelace Robinlovelace merged commit 24e2cd4 into ropensci:master Apr 24, 2019
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants