Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure non-intersecting sf-geometries in get_eurostat_geospatial #202

Merged
merged 5 commits into from
May 9, 2021

Conversation

retostauffer
Copy link
Contributor

Hy there.

Referring to #197 where we noticed that get_eurostat_geospatial() in some cases returns invalid geometries (Self-intersecting POLYGONS or MULTIPOLYGONS). With respect to #197 I have tried to implement a fix.

Unfortunately a pull request against the master branch, if it is preferred to have it against e.g., the development branch let me know, only few lines of changes to make.

Changes

  • get_eurostat_geospatial() has an additional argument make_valid = FALSE (current default).
    • If output_class == "sf" & !make_valid: A warning will be thrown; Causes warnings in several places (e.g., the examples in the function manual as, by default, make_valid = FALSE).
    • If output_class == "sf" & make_valid: Once applying st_buffer(shp, 0); removes the self-intersection problem and ensures that the resulting geometries are valid.
  • Further details given in the @details section; Small example added in the @example section.

Notes/remarks:

  • I originally suggested to (i) check if there are invalid (self-intersecting) geometries, if so apply st_buffer(). Creates unnecessary overhead and increases maintenance, thus suggesting to always apply st_buffer(). Microbenchmark: average time for the example used on my laptop ca. 20 milliseconds.
  • Self-intersecting geometries do not occur in all data sets! Using 1:60, nuts level 2 for year 2013 in the example (?get_eurostat_geospatial) as it is one of the situations where the problem occurs.

Feel free to come back to me if I can improve the merge request or if something is unclear or should be changed.

All best,
R

Copy link
Member

@antagomir antagomir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good overall, I requested another review from @muuankarski

Two comments:

  • Would it be feasible to add unit tests in order to guarantee correct function?
  • You could add yourself in contributor role (ctb) in DESCRIPTION file?

CI builds fail for Mac with:
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
there is no package called ‘crayon’

-> Is it possible to fix this by adding a library call in Github actions yaml file .github/workflows/R-CMD-check.yaml or updating dependencies in DESCRIPTION file?

R/get_eurostat_geospatial.R Show resolved Hide resolved
@antagomir
Copy link
Member

Btw can you confirm when the changes can be considered "ready", we will then have another look.

@retostauffer
Copy link
Contributor Author

retostauffer commented Feb 1, 2021

Work in progress! Worked on a series of tests for for get_eurostat_geospatial(output_format = "sf", ...) ("sf" only) and updated the sanity checks for this specific function.

Additional check if st_buffer() is required seems to be obsolete as it happens in most of the cases. Hope that I can extend the tests this week to ensure covering most of the function.

@retostauffer
Copy link
Contributor Author

Added a series of tests in tests/test_get_eurostat_geospatial.R covering 92 percent of the code.

  • Slightly changed the sanity checks of input arguments and added missing one (nuts_levels).
  • Adjusted conditional execution in few places.
  • Modified few commands to require less detailed tests (e.g., shp <- eval(...) for nuts_level != "all").
  • CI now succeeds for macOS-latest (not sure why; not changed anything in particular).

At first did not wanted to change the content of the function, but makes some things a bit easier/more streamlined and requires less tests. I would consider this as 'done', feel free to implement (or stash) the changes.

@antagomir
Copy link
Member

All checks pass and special mention on the nice coding style!

I would just like to see a confirmation from @muuankarski before we merge.

@retostauffer
Copy link
Contributor Author

Just noticed that this pull request is still open; anything I can help with or clarify?

Copy link
Contributor

@muuankarski muuankarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies for not reacting on this on time. Very elegant solution! The only change I would make is to add row breaks in example in get_eurostat_geospatial()-function on line 60.

@antagomir
Copy link
Member

Yes, sorry that this took time. I think this is ready for merging, can you add the small row breaks to PR before we merge?

Then I think we could at the same go make a new CRAN release. I can handle that.

@antagomir
Copy link
Member

I can add the row breaks, so we get this completed.

@antagomir antagomir merged commit e044e10 into rOpenGov:master May 9, 2021
antagomir added a commit that referenced this pull request May 9, 2021
@antagomir
Copy link
Member

I merged, did some other minor updates. Now testing with devtools/win_devel.
Also added @retostauffer to DESCRIPTION file with "ctb" role.
Submitting updated version to CRAN soon.
Thanks!

@retostauffer
Copy link
Contributor Author

My apologies not to react earlier, was busy coding other stuff over the weekend. Thanks a lot your fix and nice to hear it made it - my Master student will be happy :), very much appreciated.

@antagomir
Copy link
Member

Noprob, we were busy too but happy we did it. Now we still need to overcome the CRAN battle - if there are others who should be acknowledged just let us know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants