Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package grooming and performance improvements #230

Open
2 of 3 tasks
dieghernan opened this issue Sep 30, 2021 · 8 comments
Open
2 of 3 tasks

Package grooming and performance improvements #230

dieghernan opened this issue Sep 30, 2021 · 8 comments

Comments

@dieghernan
Copy link
Member

dieghernan commented Sep 30, 2021

Update

  • Documentation migrated to roxygen2 markdown
  • First clean up of unneeded dependencies
  • Refactor tests

Hi @antagomir @pitkant:

Now that the package moved to a new version, let me share with you some potential improvements I would like to suggest:

Documentation

reference:
- title: Political
  desc: These functions return `sf` objects with political boundaries
  contents: has_concept("political")
- title: Infrastructures
  desc: These functions return `sf` objects with man-made features
  contents: has_concept("infrastructure")
...
  • Accordingly, the reference page could be reestructured to facilitate the navigation of users.

Dependencies and tests

This is a sensitive issue. Currently the package has 19 Imports and 13 Suggests. Also, the package has more than 100 tests for get_eurostat_geospatial() and still the coverage is below 55%.

These two issues cause that the package is heavy when installing on a fresh system and also really slow to tests.

I think it is possible to reduce dependencies to 17+5 (get rid of 10 packages) with almost no costs. On the test side, more work is needed, but probably there are room for improvement.

As a suggestion, {giscoR} could help with the spatial part but {sf} would be still needed.

I see the Documentation part quite easy (I can even set an action for performing these tasks). Also I can work on reducing dependencies.

Let mw know your thoughs, regards

@pitkant
Copy link
Member

pitkant commented Sep 30, 2021

Regarding documentation:

  • I agree with everything you've mentioned, all these seem like good quality of life improvements for the package.

Regarding package dependencies and tests:

I think we could start a project board aiming for eurostat 4.0.0 release. The specifics on what that release would contain could be discussed under the project page.

@dieghernan
Copy link
Member Author

I like the idea of the project board 😄

It can start with the Documentation part, and regarding the packages I have testes:

Imports:
    broom,
    classInt,
    countrycode,
    curl,
    dplyr,
    httr,
    magrittr,
    jsonlite,
    lubridate,
    [del] RColorBrewer,
    rappdirs,
    readr,
    RefManageR,
    stringi,
    stringr,
    tibble,
    tidyr,
    regions,
    rlang
Suggests:
    [del] covr,
    [del] Cairo,
    [del] ggplot2,
    knitr,
    [del] markdown,
    rmarkdown,
    [del] roxygen2,
    [del] rvest,
    testthat (>= 3.0.0),
    [del]  tmap,
    [del] usethis,
    sf,
    [del] here
    [new] sp

Most of this packages are not even used on the vignettes, but on the articles. We can install that in the GitHub Action, as the articles are not part of the package.

Regarding Cairo, this is a bit tricky. I think it is used for plots. Now the default (at least on pkgdown) is ragg, that provides high-quality plots. This one makes me doubt a bit, but It could be removed amending also the vignette.

@antagomir
Copy link
Member

I agree with all, I very much like to idea of moving to md docs.

The suggested ways to remove dependencies seem good as well as work towards version 4.0.0 (what does R Extension say about version numbers, should this be 4.1.1 instead?)

One way to reduce dependencies is to move all spatial visualization stuff elsewhere, this has been under discussion for a longer time.

It also seems that there is now an emerging ecosystem of packages around eurostat, including also those from @antaldaniel - we could think if it would make sense to create an more comprehensive online resource on using and enrichning eurostat (and related?) data sources based on this set of interoperable packages. But this latter one would require a bit more planning and time.

@jhuovari
Copy link

jhuovari commented Oct 1, 2021

Good ideas.

I think there are also performance issues in the get_eurostat() and particulary in the tidy_eurostat(). I tried to speed the code in the branch speed and managed to improve the performance significantly. Unfortunately, I never got time to finish the job, and now the branch is way behind the master. But there is potential... Unless someone else have already improved code meanwhile.

@dieghernan
Copy link
Member Author

Hi @antagomir
Regarding versioninf, on a quick search I see on https://cran.r-project.org/doc/manuals/r-release/R-exts.html:

The mandatory ‘Version’ field gives the version of the package. This is a sequence of at least two (and usually three) non-negative integers separated by single ‘.’ or ‘-’ characters. The canonical form is as shown in the example, and a version such as ‘0.01’ or ‘0.01.0’ will be handled as if it were ‘0.1-0’. It is not a decimal number, so for example 0.9 < 0.75 since 9 < 75.

On https://r-pkgs.org/description.html#version SemVer and X.Org are referred. On SemVer:

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.

Common practice when these changes are applied are just to increase major with trailing 0.

@dieghernan dieghernan added this to In progress in eurostat 4.0.0 Oct 3, 2021
@pitkant
Copy link
Member

pitkant commented Jun 28, 2023

@dieghernan I think this issue is again timely and closely related to #263 since several geospatial packages are going to be removed from CRAN or changed. I think now the time would be ripe to "move" get_eurostat_geospatial functionalities out of eurostat package and rely on giscoR instead.

@dieghernan
Copy link
Member Author

dieghernan commented Jun 28, 2023

I can prepare a PR switching to giscoR and including it on the Suggest field.

I think it should be pretty straightforward since the parameters of get_eurostat_geospatial are almost the same than those of gisco_get_nuts. Is that ok?

And if so, over which branch? I see some development coming recently on v4-dev

@pitkant
Copy link
Member

pitkant commented Jun 29, 2023

I think v4-dev (or, if you like, another branch that would be ultimately merged into v4) would be good. These changes fall easily under the MAJOR changes label so it's justifiable to put it there instead of in a minor release.

I'm not certain what the schedule for the release of v4.0 should be, or how long we have to go through the open issues and solve them. Before October 2023 anyway, because of this geospatial package schedule and #243, which also happens to have its deadline in October.

@pitkant pitkant added this to In progress in eurostat 4.x.x Feb 7, 2024
@pitkant pitkant moved this from In progress to Postponed for future releases in eurostat 4.0.0 Feb 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
eurostat 4.x.x
In progress
eurostat 4.0.0
Postponed for future releases
Development

No branches or pull requests

4 participants