Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only install from if a different sha from current #903

Merged
merged 16 commits into from
Jan 19, 2016

Conversation

jimhester
Copy link
Member

This modifies the behavior of install_github to only download and install a package if the sha has changed from the currently installed version.

It needs to do two requests from the GitHub API

  1. To get the sha of the 'ref' commit ('master' by default)
  2. To retrieve the package DESCRIPTION (so we can disambiguate if a repository name is different than the package name)

It then looks up the existing package DESCRIPTION (which will have a RemoteSha if it was previously installed using install_github()) and compares it to the GitHub version.

If the installed DESCRIPTION does not have a RemoteSha or they are different, install_github proceeds as normal, otherwise nothing happens.

This works well with #902, especially if a given package has a lot of heavy dependencies that take a while to compile.

A couple of places I wanted input

  1. I named the parameter skip_same, but this feels awkward, perhaps there is a better name?
  2. Personally I would prefer this was the default behavior so skip_same = TRUE, but maybe you would prefer it to default to FALSE.
  3. If we assume that all packages have the same repository name as the package name we can cut down the number of requests to just the sha. This is probably valid ~90% or more of the time, but not always.
  4. We can probably do analogous things with the other remote types, if you find this beneficial I can make pull requests for them as well.
  5. This has no tests yet, I wanted to settle on an implementation before doing so, but I can add them before it is accepted.

file.path("repos", username, repo, "contents", "DESCRIPTION"),
httr::write_memory())

content <- rawToChar(base64enc::base64decode(response$content))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to do the decoding yourself here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The github API returns a JSON object with a content field that is base64 encoded. I think httr will decode automatically if the entire response is base64 encoded, but only part of it is in this case.

If there is a httr function I can use to do this decoding that would be great, I just missed it!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it

@jimhester
Copy link
Member Author

So my main worry with this is having to query the GitHub API twice, when unauthenticated the rate limit is only 60 requests per hour, so if someone is doing testing using install_github() with 5 github packages in Remotes: they only have to run install_github() 5 times before they hit the rate limit. (6 * 2 * 5 = 60).

I think I can get the same information without querying the GitHub API using e.g.

git ls-remote git@github.com:jimhester/lintr.git
# aaf61208cf877fe3a077c5146126a0f3867205a2        HEAD
# 50ab86db4ebc7272d95990f8d15b3fabe0278936        refs/heads/chunk_parsing
# fc80f59b2051ed3043c572bd808e53368123a529        refs/heads/comment
# cc0e600b043eac2d3753ca8058c160c5d690adfd        refs/heads/linter_names
# aaf61208cf877fe3a077c5146126a0f3867205a2        refs/heads/master
# f88a4804df4c7cccb1585b1cf08a5601c41e490f        refs/heads/single_line
# a73b1762d795a67fda02475b9a02a4a4151850e9        refs/pull/109/head
# 5e4a4f295744bfb02b0f85a7da1a5f620c0e3b6f        refs/pull/110/head
# 7b3ac8f985251063ca25a3be21d803abfb6ff2ad        refs/pull/19/head
# b1a9d4bc5d501859769c498aceaef30b9a2ed23b        refs/pull/28/head
# 67b8681335482359bc0b6a6bd4ca01596e920be3        refs/pull/37/head
# 517a4df59f9dab8bfb736a9f73e2efe3021720a2        refs/pull/46/head
# 427e177926421819b917c4dd55cd3c42cb231525        refs/pull/49/head
# 7b07088df347352b0da75eebc82ecc2e7cc1ab23        refs/pull/51/head
# fc80f59b2051ed3043c572bd808e53368123a529        refs/pull/52/head
# 7b422f6bc02b6d014ee1260d0f7d8d59d185dba6        refs/pull/69/head
# 97f5e794e8c0b379ca9945c23200dbbca0e101af        refs/pull/70/head
# 7d27c930710696c91688c610a634efa3490699aa        refs/pull/70/merge
# b828e3333c2fc2e7172116d6f6de44436651c72e        refs/pull/71/head
# ab973a26e330677f91b39da9f9ca423f6a36eb90        refs/pull/74/head
# cc1ee872613fc9a527f5680f6ca491b29ce3fa41        refs/pull/78/head
# 71d6d2f58612f291838b74ad3992e8886cd1b928        refs/pull/80/head
# 6d10a4250cda301926e8462be7e67001d8f10f7d        refs/pull/83/head
# 6f17692c940ac2b81cd6648194a0ca2753abbb1a        refs/pull/90/head
# 4722335176ba6601f7a0178f821c42dbd23ba53b        refs/pull/93/head
# 207f0fd1f679ce04e85a703189e6dd7ffffca12c        refs/pull/93/merge
# b09b3ed1132983eccb61c4cf32987a28243f95ba        refs/pull/94/head
# 975a404bab2ea2aa3d094f0ffd4db1abef29b049        refs/pull/95/head
# 3335be532798f4bff0373873b37a81495b25a01e        refs/pull/95/merge
# bb932678932a1b3196b06194509019c459ea312d        refs/pull/97/head
# 3d9e70b15559c460bc60e8961dd94cdaca8dc094        refs/pull/99/head
# f7e6ba13dca58bb612f6bba0a6875a5ba53ca84f        refs/tags/0.1
# f8fcbe89c7da9cb491bc3c0cd9377ee8f68d1d0a        refs/tags/0.1^{}
# d7eaa02fb7f31c049279797eb6bcaa183d1a7898        refs/tags/0.1.0
# 7aa7d9943e57c992b3a702c69950125ddefe89f6        refs/tags/0.1.0^{}
# 00e8748ff27d631ebfb01b038c20695b2cba8138        refs/tags/0.2.0
# 2f3b60b9432f7a79cf907df24dd0dbf95a11df2d        refs/tags/0.2.0^{}
# 3ef07e7f38e7c2da73652e2a04ba82f5d654b143        refs/tags/0.2.0.9000
# b095569b69e445fccdb6e9865f10270f38bff4c0        refs/tags/0.2.0.9000^{}
# 88ad2aad380341286fce189be484e075bc60f21e        refs/tags/0.2.1.9000
# 96962306a005e84fe217a19b018bfd64ec433a28        refs/tags/0.2.1.9000^{}
# 7c46d350e1877d857b0586b74aa1274592147c4b        refs/tags/0.3.0.9000
# 2b9d30fcf7e74420b564889d08ee1af3f9518a3a        refs/tags/0.3.0.9000^{}
# a98a234dafbb76a1622ad2a22a8eebdbe5fbae7d        refs/tags/0.3.0.9001
# 0fb087149e45aeeb5f1f23d4a11cff980deb4feb        refs/tags/0.3.0.9001^{}
# 15ca21869654706f44c6d1659822b7d3760a3eec        refs/tags/0.3.0.9002
# 780be8ea5cc9ba9321f8d09f084b6efac55ad17f        refs/tags/0.3.0.9002^{}
# 3e6cb3d4f9d617533f4f6a107e2f05f028299286        refs/tags/0.3.1
# 1df8c3daf61a026565380d59b8ae7a3fe64434b9        refs/tags/v0.2.0

And the DESCRIPTION with
https://raw.githubusercontent.com/jimhester/covr/master/DESCRIPTION

This would let this work without any API requests.

It also gives us a avenue to use the same approach for the other git based install_* functions. We can use git archive --remote=git://github.com/jimhester/lintr.git HEAD:. DESCRIPTION to get just the description from a git repository without cloning the entire repository (GitHub does not support this method however).

Will try to incorporate your comments and work on this some more.

@jimhester
Copy link
Member Author

jimhester@98016e4 uses the alternative implementation I discussed above and adds support for git and bitbucket backends.

gitorious seems to have been shutdown / acquired by GitLab (https://gitorious.org/, https://en.wikipedia.org/wiki/Gitorious) so I don't think it is worth or possible to get it working. We may want to deprecate that backend...

I could add support for SVN repositories as well if we stored the revision in the DESCRIPTION metadata, however it looks like we do not ATM.

I think I addressed all of your comments that still applied however the code changed quite a bit so it could use another review.

It also needs some tests, although it seems to work via interactive testing.

@@ -69,3 +76,44 @@ remote_metadata.git_remote <- function(x, bundle = NULL, source = NULL) {
RemoteSha = sha
)
}

remote_sha.git_remote <- function(remote, ...) {
res <- system(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to do this with git2r? devtools traditionally hasn't assumed that you have git installed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ropensci/git2r#130 would seem to suggest no, I can try and hunt down the code in libgit2 and see how much work it would be to add to git2r

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ropensci/git2r#172 adds ls_remote() to git2r, which exposes this functionality to us.

This PR also uses git archive to download the file for plain git remotes. I imagine git2r doesn't have this either. I think it is lower priority though as plain git is used less frequently than github or bitbucket and people will likely already have a git client installed in that case. They can always just set force = TRUE to revert to the current behavior of always installing the package.

@jimhester
Copy link
Member Author

Ok 72edbbf should now be using git2r::ls_remote(). It is also the first use of Remotes: in the wild, hopefully it works!

@jimhester
Copy link
Member Author

2425180 adds tests , note to NEWS and re-based, however it is broken because Travis uses CRAN devtools to install the dependencies, so the Remotes field isn't picked up.

git2r::remotes_ls() which I added to git2r with ropensci/git2r#172 has been merged, but I don't know when @stewid is planning the next CRAN release of git2r, so this PR likely won't make it in the upcoming devtools release anyway.

@stewid
Copy link

stewid commented Sep 8, 2015

I have not set a date, but git2r v 0.11.0 was released only 27 days ago.

@hadley
Copy link
Member

hadley commented Nov 10, 2015

@jimhester any updates on this? Now that I'm using caching on travis this would be nice :)

@stewid
Copy link

stewid commented Nov 17, 2015

@jimhester and @hadley I'm working on a new release ropensci/git2r#191

@jimhester
Copy link
Member Author

This should now be ready to merge as well. jimhester@a00cd18 passes without any new NOTES being generated.

@@ -45,6 +45,8 @@
#' @param threads number of concurrent threads to use for installing
#' dependencies.
#' It defaults to the option \code{"Ncpus"} or \code{1} if unset.
#' @param force whether to force installation even if the reference hasn't
#' changed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be reworded to be a bit more clear

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think force might be too generic - does it all to specified packages or their deps or both? Maybe the description of the function also needs to change to emphasize that it never reinstalls packages if already present

@jimhester
Copy link
Member Author

@hadley PTAL at this when you can, particularly jimhester@65e4fe5. If you think that looks good this should be ok to merge.

if (!missing(args))
warning("`args` is deprecated", call. = FALSE)

remotes <- lapply(url, git_remote, subdir = subdir, branch = branch)

if (!isTRUE(force)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic here is round the wrong way - isFALSE(force) would be better

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately there is no isFALSE function in base R, but I can write one easily enough of course.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this begs the question what the definition of isFALSE() should be. isTRUE() is defined as identical(TRUE, x), so anything but TRUE is FALSE. So we do not want identical(FALSE, x) but !identical(TRUE, x), so isFALSE() is a bit of a misnomer.

It is a small difference so I am happy to define isFALSE as !identical(TRUE, x), but it a slightly inaccurate name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, thinking it through again, I think your original code is correct.

@jimhester
Copy link
Member Author

jimhester@ce7869d adds the early return. Let me know if you see anything else that needs to change. If not I think this can be merged.

#' @export
remote_sha.git_remote <- function(remote, ...) {
if (!is.null(remote$sha)) {
return(remote$sha)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of the early return was so you could eliminate the big else block ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I clearly was too tired when I looked at it this morning.

@hadley
Copy link
Member

hadley commented Jan 19, 2016

LGTM - feel free to merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants