-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enable ci and make stuff reproduce as per #2 #8
Conversation
No worries, will start fixing the remaining issues. |
@njahn82 the version of the website from this branch is now live at https://subugoe.github.io/openairegraph/refs/heads/ci/ |
|
||
Because hoad is not only a research project, but primarily a data product meant to be used by many, hoad is organised as an R package. | ||
|
||
Please adhere by the package development best practices laid out in: | ||
|
||
- Hadley Wickham's [R Packages](https://r-pkgs.org/index.html) | ||
- [rOpenSci Development Guide](https://devguide.ropensci.org/) | ||
|
||
|
||
### Documentation | ||
|
||
Individual functions are documented using [{roxygen2}](https://roxygen2.r-lib.org). | ||
Functions can be further organised using curated reference index in [{pkgdown}](https://pkgdown.r-lib.org/). | ||
|
||
Scientific articles, blog posts or any other long-form documentation is written as a [{pkgdown}](https://pkgdown.r-lib.org/) *article* inside the packages `/vignettes/` folder. | ||
These articles are written as `bookdown::html_document2()` documents *not* as proper R vignettes, to avoid duplicate publication and to allow more expensive computations. | ||
|
||
|
||
### Shiny Application | ||
|
||
tba. | ||
|
||
|
||
### Computing Environments | ||
|
||
There are two sets of computing environments on which this project is guaranteed to work as per the CI setup: | ||
|
||
1. The {hoad} package, including all its functions, is guaranteed to work (pass `R CMD check`) on a wide variety of computing environments as per the CI setup, a subset of the [tidyverse setup](https://github.com/r-lib/actions/tree/master/examples#quickstart-ci-workflow) and [r-hub linux builders](https://github.com/r-hub/rhub-linux-builders). | ||
Because these tests can be quite extensive, they are only run on commits to `master`, as well as pull requests to `master`. | ||
Other branches are thus not guaranteed to work. | ||
2. All other elements of this projects can be reproduced using a custom docker image in the `Dockerfile`: | ||
- the [{pkgdown}](http://pkgdown.r-lib.org) website for hoad (https://subugoe.github.io/hoad/), including its articles (~ vignettes) | ||
- the shiny web application | ||
- (batch) jobs to reproduce more expensive analyses | ||
|
||
This image is also published on every commit to [docker hub](https://hub.docker.com/repository/docker/subugoe/hoad), tagged by git sha and reference (branch, release). | ||
|
||
This image also includes the RStudio IDE, so you can also develop *inside* this image. | ||
|
||
To run this image, navigate your shell to the root of the repository on your machine and run: | ||
|
||
``` | ||
docker run --env DISABLE_AUTH=true \ | ||
--publish 8787:8787 \ | ||
--volume $(pwd):/home/rstudio/Documents | ||
subugoe/hoad:refactor-as-pkg | ||
``` | ||
|
||
This will automatically download the build image from Docker Hub. | ||
You can also rebuild it locally by running | ||
|
||
``` | ||
docker build --tag hoad . | ||
``` | ||
|
||
## Fixing typos | ||
|
||
You can fix typos, spelling mistakes, or grammatical errors in the documentation directly using the GitHub web interface, as long as the changes are made in the _source_ file. | ||
This generally means you'll need to edit [roxygen2 comments](https://roxygen2.r-lib.org/articles/roxygen2.html) in an `.R`, not a `.Rd` file. | ||
You can find the `.R` file that generates the `.Rd` by reading the comment in the first line. | ||
|
||
## Bigger changes | ||
|
||
If you want to make a bigger change, it's a good idea to first file an issue and make sure someone from the team agrees that it’s needed. | ||
If you’ve found a bug, please file an issue that illustrates the bug with a minimal | ||
[reprex](https://www.tidyverse.org/help/#reprex) (this will also help you write a unit test, if needed). | ||
|
||
|
||
### Pull request process | ||
|
||
This requires kno | ||
|
||
* Fork the package and clone onto your computer. If you haven't done this before, we recommend using `usethis::create_from_github("subugoe/hoad", fork = TRUE)`. | ||
|
||
* Install all development dependences with `devtools::install_dev_deps()`, and then make sure the package passes R CMD check by running `devtools::check()`. | ||
If R CMD check doesn't pass cleanly, it's a good idea to ask for help before continuing. | ||
* Create a Git branch for your pull request (PR). We recommend using `usethis::pr_init("brief-description-of-change")`. | ||
|
||
* Make your changes, commit to git, and then create a PR by running `usethis::pr_push()`, and following the prompts in your browser. | ||
The title of your PR should briefly describe the change. | ||
The body of your PR should contain `Fixes #issue-number`. | ||
|
||
* For user-facing changes, add a bullet to the top of `NEWS.md` (i.e. just below the first header). Follow the style described in <https://style.tidyverse.org/news.html>. | ||
|
||
### Code style | ||
|
||
* New code should follow the tidyverse [style guide](https://style.tidyverse.org). | ||
You can use the [styler](https://CRAN.R-project.org/package=styler) package to apply these styles, but please don't restyle code that has nothing to do with your PR. | ||
|
||
* We use [roxygen2](https://cran.r-project.org/package=roxygen2), with [Markdown syntax](https://cran.r-project.org/web/packages/roxygen2/vignettes/markdown.html), for documentation. | ||
|
||
* We use [testthat](https://cran.r-project.org/package=testthat) for unit tests. | ||
Contributions with test cases included are easier to accept. | ||
|
||
|
||
## Code of Conduct | ||
|
||
Please note that the hoad project is released with a | ||
[Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this | ||
project you agree to abide by its terms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the canonical place to store such meta files is now in .github/
-- {pkgdown} will automatically find them there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The contributing guidelines seems to refer to {hoad}
#' @importFrom progress progress_bar | ||
#' @importFrom purrr pmap | ||
#' @importFrom base64enc base64decode | ||
#' @importFrom tibble tibble |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are redundant; if we always call foo::bar()
, then we don't need to pollute our namespaces.
I think @importFrom
makes most sense for those packages (very few) where we as pkg devs can't be bothered always explicating foo::bar()
.
Whenever possible the tidyverse style guide advises being explicit.
@@ -1,253 +1,46 @@ | |||
|
|||
<!-- README.md is generated from README.Rmd. Please edit that file --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rendering a README.Rmd
adds a lot of extra complexity and leads to derivative files in the git repo.
Instead, I've now moved the general pkg introduction to the openairegraph.Rmd
vignette, which pkgdown will highlight on the website by default.
Vignettes always get rendered and published on every build, so that's one less thing to worry about and the repo stays clean.
@@ -1,149 +0,0 @@ | |||
<!-- Generated by pkgdown: do not edit by hand --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything in docs/
is deleted, because that pollutes the repo. Instead we know publish from the gh-pages
branch automatically on every commit.
pdf_document: | ||
fig_caption: yes | ||
keep_tex: yes | ||
template: lncs.sty | ||
link-citations: yes | ||
pkgdown: | ||
as_is: true | ||
vignette: > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are only necessary for actually vignettes, and since we're only doing articles (= never plan on publishing vignettes to cran), this complexity can happily go.
I think it makes more sense to only publish articles, not vignettes, because vignettes just add more headaches and then there's two URLs where users can find roughly the same content (confusing).
I think the tidyverse team has also stopped publishing actual "vignettes" though everyone still calls them vignettes.
(Articles is what pkgdown calls vignettes which are pkgdown-only).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, was not aware of it!
```r | ||
library(openairegraph) | ||
library(jsonlite) # tools to work with json files | ||
library(tidyverse) # tools for tidy data analysis | ||
# NB, you need to download the file from Zenodo first | ||
oaire <- jsonlite::stream_in(file("data/h2020_results.gz")) | ||
openairegraph::oarg_decode(oaire, records_path = "data/records/", | ||
limit = 500, verbose = FALSE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made this into a chunk which actually is evaluated in the CI to strengthen reproducibiity.
future::plan(multisession) # initalize parallel computing | ||
oaire_data <- future.apply::future_lapply(openaire_records, | ||
function(files) { | ||
# load xml file | ||
doc <- xml2::read_xml(files) | ||
# parser | ||
out <- oarg_publications_md(doc) | ||
out$linked_projects <- list(oarg_linked_projects(doc)) | ||
out$linked_ftxt <- list(oarg_linked_ftxt(doc)) | ||
# use file path as id | ||
out$id <- files | ||
out | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unfortunately, this currently doesn't work on the CI as per #12 so I had to build a conditional around it
library(openairegraph) | ||
library(jsonlite) | ||
library(tidyverse) | ||
library(scales) | ||
library(here) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only need this once for the entire vignette/article. library()
calls persist.
The danger here, though is, that the chunk in question is cached.
Caches, unfortunately, don't carry side effects, so we should place whatever has side effects (such as library()
inside non-cached chunks).
tibble::as_tibble() | ||
|
||
pubs_projects <- oaire_df %>% | ||
select(id, type, best_access_right, linked_projects) %>% | ||
unnest(linked_projects) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super slow performance with this, no idea why #16
@@ -247,7 +262,7 @@ oa_monitor_ec %>% | |||
labs(x = NULL, | |||
y = "Open Access Percentage", | |||
caption = "Data: OpenAIRE Research Graph") + | |||
theme_minimal_vgrid(font_family = "Roboto") + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must enable this later in ci #5
Sorry @njahn82, this is not ready for prime time yet as per #2.
But we do technically have CI running and in place (if failing), so it's correct to write that up in the vignette 😏.
It's not so much the CI, per se, but there are still a bunch of things that need to be resolved to make this reproducible and elegant, including #7 #6 #5 #4.