CoordinateCleaner #210

Closed
azizka opened this Issue Apr 9, 2018 · 66 comments

azizka commented Apr 9, 2018

Summary

  • What does this package do? (explain in 50 words or less):
    Identify problematic records in large databases of biological and palaeontological collections, to improve data quality for analyses in biogeography, ecology and conservation.

  • Paste the full DESCRIPTION file inside a code block below:

Package: CoordinateCleaner
Type: Package
Title: Automated Cleaning of Occurrence Records from Biological Collections
Version: 1.1-0
Date: 2018-04-08
Authors@R: c(person(given = "Alexander", family = "Zizka", email = "alexander.zizka@bioenv.gu.se",
                    role = c("aut", "cre")),
             person(given = "Daniele", family = "Silvestro", role = c("ctb")))
Description: Automated cleaning of geographic species occurrence records by automated flagging of problems common to biodiversity data from biological collections. Includes automated tests to easily flag (and exclude) records assigned to country or province centroid, the open ocean, the headquarters of the Global Biodiversity Information Facility, urban areas or the location of biodiversity institutions (museums, zoos, botanical gardens, universities). Furthermore identifies per species outlier coordinates, zero coordinates, identical latitude/longitude and invalid coordinates. Also implements an algorithm to identify data sets with a significant proportion of rounded coordinates. Especially suited for large data sets. See <https://github.com/azizka/CoordinateCleaner/wiki> for more details and tutorials.
License: GPL-3
Depends: R (>= 3.0.0), sp
Imports: geosphere, ggplot2, methods, raster, rgeos, rnaturalearth, stats
LazyData: true
RoxygenNote: 6.0.1
Suggests: testthat, covr

  • URL for the package (the development repository, not a stylized html page):
    https://github.com/azizka/CoordinateCleaner

  • Please indicate which category or categories from our package fit policies this package falls under and why (e.g., data retrieval, reproducibility). If you are unsure, we suggest you make a pre-submission inquiry:

  • geospatial data, because the package deals with improving data quality of occurrence records of biological specimens
  • reproducible research, because the package replaces potentially badly documented ad-hoc decisions from GUI GIS with clearly defined functions
  • data munging, because the package processes commonly used geospatial data (geographic coordinates)
  •   Who is the target audience and what are scientific applications of this package?
    Anybody using geographic coordinates from biological collections on a large scale, thus mostly researchers in biogeography, (macro-)ecology, evolutionary biology and conservation practitioners

  • Are there other R packages that accomplish the same thing? If so, how does
    yours differ or meet our criteria for best-in-category?
    Yes, scrubr, see this pre-submission enquiry: #199

  •   If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
    #199, @sckott

Requirements

Confirm each of the following by checking the box. This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions.
  • contains a vignette with examples of its essential functions and uses. The vignette is not part of the package. There are extensive tutorials (https://github.com/azizka/CoordinateCleaner/tree/master/Tutorials) and a wiki (https://github.com/azizka/CoordinateCleaner/wiki) on github
  • has a test suite.
  • has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
  • I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

  • Do you intend for this package to go on CRAN? it is on CRAN already https://cran.r-project.org/web/packages/CoordinateCleaner/index.html
  • Do you wish to automatically submit to the Journal of Open Source Software? If so:
    • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
    • The package is novel and will be of interest to the broad readership of the journal.
    • The manuscript describing the package is no longer than 3000 words.
    • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
    • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
    • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
    • (Please do not submit your package separately to Methods in Ecology and Evolution)
      The manuscript is already submitted to MEE. It had to be submitted before April 1st.

Detail

  • Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:

  • Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
    Exceptions:

  • NEWS file - there is one now, but only started with the latest version.
  • Package name: Unfortunately, the name contains capital letters (CoordinateCleaner), but it is on CRAN already.
  • Function naming: Functions for individual tests are snake_case and pipe compatible, wrapper function around all tests are CamelCase.
  • Documentation: not built with roxygen.
  • If this is a resubmission following rejection, please explain the change in circumstances:
    Nope

  • If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

SaraVarela, sckott

maelle commented Apr 10, 2018

Editor checks:

  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly
  • Archive (JOSS only, may be post-review): The repository DOI resolves correctly
  • Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @azizka! 😁 Initial checks revealed a few issues that you should solve before I assign reviewers. Do not hesitate to ask any question you might have!

  • Your package doesn't use roxygen2 yet for documentation. Our packaging guide says we strongly encourage it but we'd actually like to start enforcing it, partly to help future contributors. To convert your existing documentation to roxygen2 documentation you can use the Rd2roxygen package. When using roxygen2 you won't have to update NAMESPACE by hand.

  • What is the licence on extra_gazetteers? Could these files live in a separate data package? (question, not request for change)

  • Why have the tar.gz and zip of your package in the repo especially since you already make them available via the use of GitHub releases?

  • Here is goodpractice::gp() output with comments.

It is good practice to

  ? write unit tests for all functions, and all package
    code in general. 62% of code lines are covered by test cases.

    R/cc_cap.R:18:NA
    R/cc_cap.R:19:NA
    R/cc_cap.R:29:NA
    R/cc_cen.R:19:NA
    R/cc_cen.R:21:NA
    ... and 580 more lines
    

You do not need to reach 100% code coverage but you can increase it a bit.

  ? not use "Depends" in DESCRIPTION, as it can cause
    name clashes, and poor interaction with other packages. Use
    "Imports" instead.

Feel free to ignore this if you need the sp Depends. By the way, is there any reason why you don't use the more modern sf? (question, not request for change)

  ? omit "Date" in DESCRIPTION. It is not required and it
    gets invalid quite often. A build date will be added to the
    package when you perform `R CMD build` on it.

Please remove Date from there.

  ? add a "URL" field to DESCRIPTION. It helps users find
    information about your package online. If your package does not
    have a homepage, add an URL to GitHub, or the CRAN package
    package page.
  ? add a "BugReports" field to DESCRIPTION, and point it
    to a bug tracker. Many online code hosting services provide bug
    trackers for free, https://github.com, https://gitlab.com, etc.
    

Make it easy to find your GitHub repo by adding these links.

  ? avoid long code lines, it is bad for readability.
    Also, many people prefer editor windows that are about 80
    characters wide. Try make your lines shorter than 80 characters

    R\cc_cap.R:1:1
    R\cc_cap.R:24:1
    R\cc_cen.R:1:1
    R\cc_coun.R:1:1
    R\cc_coun.R:17:1
    ... and 210 more lines
    

Maybe styler can help with this.

  ? avoid sapply(), it is not type safe. It might return
    a vector, or a list, depending on the input data. Consider
    using vapply() instead.

    R\WritePyRate.R:88:68

Please change sapply in favor of vapply here.
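
To illustrate the type-safety point, here is a minimal sketch on made-up input (it is not the code from WritePyRate.R): sapply() changes its return type with the input, whereas vapply() enforces the type declared in FUN.VALUE.

```r
# Hypothetical input: a list of numeric vectors (not taken from the package).
ages <- list(a = c(1.5, 2.0), b = c(3.0, 4.5, 6.0))

# sapply() simplifies when it can, so the return type depends on the input.
res_filled <- sapply(ages, max)     # named numeric vector
res_empty  <- sapply(list(), max)   # an empty list, not numeric(0)

# vapply() checks every result against the template (one numeric value here).
safe_filled <- vapply(ages, max, FUN.VALUE = numeric(1))
safe_empty  <- vapply(list(), max, FUN.VALUE = numeric(1))  # numeric(0)
```

With vapply(), a result of the wrong type or length raises an error immediately instead of silently propagating a list.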

  ? not import packages as a whole, as this can cause
    name clashes between the imported packages. Instead, import
    only the specific functions you need.
    

See this part of Hadley Wickham's R package book

  ? not use exportPattern in NAMESPACE. It can lead to
    exporting functions unintendedly. Instead, export functions
    that constitute the external API of your package.

When using roxygen2 syntax you'll use the export tag before each function to be exported.
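
As a rough sketch of what this looks like in practice, the block below uses a hypothetical function (flag_equal_lonlat is not part of CoordinateCleaner). The @export tag marks the function for export, and @importFrom imports a single function rather than a whole package.

```r
#' Flag coordinates with identical latitude and longitude (illustrative only)
#'
#' @param lon numeric vector of longitudes.
#' @param lat numeric vector of latitudes.
#' @return A logical vector, TRUE where the record passes the check.
#' @importFrom stats complete.cases
#' @export
flag_equal_lonlat <- function(lon, lat) {
  # Records with a missing coordinate cannot pass the check.
  ok <- complete.cases(lon, lat)
  ok & (lon != lat)
}
```

When the documentation is rebuilt, roxygen2 turns these two tags into export(flag_equal_lonlat) and importFrom(stats,complete.cases) entries in NAMESPACE, so neither exportPattern nor whole-package imports are needed.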

  ? fix this R CMD check NOTE: Note: found 1 marked
    Latin-1 string

Not sure what this means actually.

  ? fix this R CMD check NOTE: Found the following hidden
    files and directories: .travis.yml These were most likely
    included in error. See section 'Package structure' in the
    'Writing R Extensions' manual.

Please put such files and folders in .Rbuildignore.
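
For reference, each line of .Rbuildignore is a Perl-like regular expression that is matched case-insensitively against file paths relative to the package root. The sketch below mimics that matching in base R; the patterns and the file listing are assumed examples, not the contents of the actual repository.

```r
# Patterns as they might be written in .Rbuildignore, one per line.
# In the file itself they use single backslashes: ^\.travis\.yml$
patterns <- c("^\\.travis\\.yml$", "^.*\\.tar\\.gz$")

# Hypothetical file listing relative to the package root.
files <- c(".travis.yml", "DESCRIPTION", "R/cc_cap.R",
           "CoordinateCleaner_1.1-0.tar.gz")

# A file is left out of the build if any pattern matches it.
ignored <- grepl(paste(patterns, collapse = "|"), files,
                 perl = TRUE, ignore.case = TRUE)
files[ignored]
```

Anchoring the pattern with ^ and $ and escaping the dots avoids accidentally excluding files whose names merely contain a similar string.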

Please update this thread once you've done the changes, or as soon as you have a question!


Reviewers: @isteves @Pakillo
Due date: 2018-06-07

maelle commented Apr 20, 2018

@azizka do you have any question or comment? ☺

azizka commented Apr 20, 2018

Hi @maelle, that all sounds good, thanks. I am working to include most of your suggestions. I'll defend my PhD next week, so I am a little slow working on the package at the moment, but I will tackle this in full afterwards.

maelle commented Apr 20, 2018

Oh good luck with your defence!

maelle commented May 13, 2018

@azizka I hope your defence went well! I'm taking the risk to say congrats! 🎊

Did you have time to work on the package a bit?

azizka commented May 14, 2018

@maelle yes, it worked out fine, thanks for congratulations and your patience! I am back to work this week, working on the package now.

maelle commented May 14, 2018

🎊 Grattis! 🎊

and cool!

azizka commented May 16, 2018

Hi @maelle,

Please find below replies to your comments. I hope you find everything covered, please let me know what else I can do. Thanks again for the patience :-).

Note: the most recent version is working, I’ll update the badges as soon as the problems with travis are solved (travis-ci/travis-build#1382)

Replies:

Your package doesn't use roxygen2 yet for documentation

Done - Migrated documentation and namespace generation to roxygen2

What is the licence on extra_gazetteers? Could these files live in a separate data package?

The extra gazetteers are open source, they could live in a separate data package, but I am not sure that this is necessary/useful since they have very specialized applicability.

Why have the tar.gz and zip of your package in the repo especially since you already make them available via the use of GitHub releases?

Done - Removed the .tar.gz and zip

You do not need to reach 100% code coverage but you can increase it a bit.

It is at 74% now

Feel free to ignore this if you need the sp Depends. By the way, is there any reason why you don't use the more modern sf? (question, not request for change)

Done - moved sp from ‘Depends’ to ‘Imports’. Yes, sf would be great since it's faster and tidy, but unfortunately some of the dependencies do not yet support sf formats (e.g. raster, geosphere).

Please remove Date from there

Done - Removed the date from the description file

Make it easy to find your GitHub repo by adding these links.

Done - Added the urls to the description file

Avoid long code lines, it is bad for readability.

Done – cut at 80

Please change sapply in favor of vapply here

Done

not import packages as a whole, instead, import only the specific functions you need

Done

When using roxygen2 syntax you'll use the export tag before each function to be exported.

Done – switched to roxygen2

fix this R CMD check NOTE: Note: found 1 marked, Latin-1 string

I could not reproduce this note, do you have additional information on the operating system and check settings, etc? I suspect it might be related to a non-ASCII character in one of the datasets in the package, but I am not entirely sure either.
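
One way to test the non-ASCII suspicion is to scan the character columns of the bundled datasets for strings that are marked as Latin-1 or cannot be converted to ASCII. The sketch below runs on a made-up vector (the values are assumptions, not package data).

```r
# Hypothetical character data; "\xe8" is the byte for e-grave in Latin-1.
x <- c("Uppsala", "Museum", "Gen\xe8ve")

# Declare the encoding; pure-ASCII strings are never marked.
Encoding(x) <- "latin1"
marks <- Encoding(x)

# iconv() returns NA for elements that cannot be represented in ASCII.
non_ascii <- is.na(iconv(x, from = "latin1", to = "ASCII"))
x[non_ascii]
```

Running the same check over each character column of a dataset would narrow down which entry triggers the NOTE, assuming the NOTE does come from the data.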

Please put such files and folders in .Rbuildignore.

Done, the .travis.yml is part of .Rbuildignore now

maelle commented May 17, 2018

@azizka thanks for all your work! Am going to assign reviewers in a bit. Two last comments before the reviews:

  • could you please add a code of conduct to your repo? cf our guidelines

  • could you please respond to the two issues opened in your repo? If these are bugs, please solve them before the reviews and update this thread. :-)

maelle commented May 17, 2018

👋 @isteves @Pakillo! Thanks for accepting to review this package! Your reviews are due 2018-06-07 but @Pakillo I've already noted that yours might be late (@azizka the reviewers of your package were lined up a while ago 😉)

Here is our reviewer template and our reviewer guide.

isteves commented May 29, 2018

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 7


Review Comments

Hi @azizka - great, comprehensive package! I'm impressed! Here are some comments related to the sections above:

  • Vignettes
    • Examples are currently included in the Tutorials folder & wiki; consider moving them to a vignettes folder so that users can access them locally using browseVignettes()
  • Examples
    • urbanareas: perhaps specify that file path is not the one shown: instead of load("extra_gazetteers/urbanareas.rda"), load("your/path/urbanareas.rda")
    • WritePyRate: got an error when I tried WritePyRate(x = exmpl,fname = "test", status = status): Error in [.data.frame(x, , taxon) : undefined columns selected
    • A plot call is not present in the example for plot.spatialvalid.
  • Community guidelines
    • No maintainer specified in DESCRIPTION, but maybe ok?
    • No contribution guidelines in the README or CONTRIBUTING
  • Tests
    • library(CoordinateCleaner) calls not needed within individual test scripts
    • Test_fossillevel_cleaning.R filename should be all lower case for consistency
    • I suggest using more descriptive names for context()
    • Using devtools::test(), I got the following warning - encoding is deprecated; all files now assumed to be UTF-8 - as well as these:

test_wrapper_functions.R:16: warning: (unknown)
running GBIF test, flagging records around Copenhagen

test_wrapper_functions.R:28: warning: CleanCoordinates countries argument produces correct output
running GBIF test, flagging records around Copenhagen

test_wrapper_functions.R:96: warning: fossil wrapper cleaning works
countries missing, countrycheck set to FALSE

test_wrapper_functions.R:96: warning: fossil wrapper cleaning works
running GBIF test, flagging records around Copenhagen

test_wrapper_functions.R:97: warning: fossil wrapper cleaning works
countries missing, countrycheck set to FALSE

test_wrapper_functions.R:97: warning: fossil wrapper cleaning works
running GBIF test, flagging records around Copenhagen

  • Packaging guidelines
    • according to the guidelines, badges in README.md should be below the package name (unsure how important this is)
    • include "how the package compares to other similar packages and/or how it relates to other packages" in README.md
    • add citation info to the end of README.md
    • "Avoid starting the description with the package name or This package ..." - not sure if this also applies to the README package description, but you may want to change it to match the description in the DESCRIPTION file

Here are some additional comments. Let me know if anything is unclear!

Spelling

All the text/documentation looks great on the whole! The devtools::spell_check() caught two typos that should be fixed:

Herbariorum institutions.Rd:13
THe CleanCoordinates.Rd:101

(The other "spelling errors" that it caught seem to be fine.)

Code duplication/style

For the most part, the package is both clean and organized. I especially like the cc_ functions, which each tackle a specific type of error and have consistent styles.

Among the other files, there was a mix of style/naming conventions. Is there a reason for that? (asking honestly here, since I may be missing something) I understand that for methods.spatialvalid you must use the . convention to work with classes, but I'm unsure why you switched to capitals for CleanCoordinates* and WritePyRate. These functions (as well as tc_* and the tutorials) also include a lot of variables with . rather than _. Changing the variable names so that they use a consistent style would enhance the readability/usability of the code.


In the CleanCoordinates* functions, it seems that you'd be able to greatly reduce the amount of code through some of these structural changes:

  • use ... to pass arguments specified in CleanCoordinates* to the cc_* functions called within it
  • rather than writing out TRUE/FALSE for each test, perhaps use a single argument (tests) that can take a vector of desired tests (c("sea", "gbif", "equ"))
  • set argument defaults to NULL directly in the argument list (capitals.ref = NULL), instead of in an extra step within the function
  • currently, a variable is returned for each test. If FALSE is chosen, then the returned variable contains only NA's; if TRUE is chosen, the specified test is run and the flagged output is saved. This results in code repetition that may be avoided by (a) specifying the NA output as the default in the beginning of the code and (b) using a data frame/list structure to store the results. To clarify:

Current code

if (capitals) {
    cap <- cc_cap(x,
                  lon = lon, lat = lat, buffer = capitals.rad, ref = capitals.ref,
                  value = "flags", verbose = verbose
    )
} else {
    cap <- rep(NA, dim(x)[1])
}

Proposed change

df <- data.frame(matrix(NA, nrow = nrow(x), ncol = 12))
colnames(df) <- c("val", "equ", "zer", "cap", "cen", "sea", "urb", "con",
                  "otl", "gbf", "inst", "dpl")

if (capitals) {
    df$cap <- cc_cap(x, value = "flags", ...)
}
# and so on for the other tests
# no `else {cap <- rep(NA, dim(x)[1])}` required

  • the CleanCoordinates* functions also have warnings/checks that perhaps are more appropriate within the cc_* functions (example)
  • in the following case (and also this one), perhaps it's better to stop the code, rather than continue running it with a warning. Also, if is.null(countries) is TRUE, then countries does not need to be set to NULL again

if (is.null(countries)) {
    countries <- NULL
    if (countrycheck) {
        countrycheck <- FALSE
        warning("countries missing, countrycheck set to FALSE")
    }
}

  • as far as I can tell, cc_dupl can be used in place of this code

In WritePyRate, since fname and status are required, consider moving them earlier in the argument list (positions 2, 3) and not setting a default value (currently NULL)


In cc_val (starting here), there is redundancy in as.numeric/as.character/suppressWarnings that could be reduced. For example, like this:

x[[lon]] <- suppressWarnings(as.numeric(as.character(x[[lon]])))
x[[lat]] <- suppressWarnings(as.numeric(as.character(x[[lat]])))

out <- list(is.na(x[[lon]]),
            is.na(x[[lat]]),
            abs(x[[lon]]) > 180, 
            abs(x[[lat]]) > 90)
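
To show how the four checks could be combined into one flag per record, here is a self-contained sketch built around the snippet above. The function name flag_invalid_coords and the column names are assumptions for illustration; this is not the cc_val implementation.

```r
# Returns TRUE for records whose coordinates are numeric and within range.
flag_invalid_coords <- function(x, lon = "decimallongitude",
                                lat = "decimallatitude") {
  # Coerce once; non-numeric entries become NA without a warning.
  x[[lon]] <- suppressWarnings(as.numeric(as.character(x[[lon]])))
  x[[lat]] <- suppressWarnings(as.numeric(as.character(x[[lat]])))

  out <- list(is.na(x[[lon]]),
              is.na(x[[lat]]),
              abs(x[[lon]]) > 180,
              abs(x[[lat]]) > 90)

  # A record fails if any single check is TRUE.
  failed <- Reduce(`|`, out)
  !failed
}

# Hypothetical records: valid, longitude out of range, non-numeric longitude,
# latitude out of range.
occ <- data.frame(decimallongitude = c("12.5", "200", "abc", "-45"),
                  decimallatitude  = c("55.7", "10", "5", "-95"),
                  stringsAsFactors = FALSE)
flags <- flag_invalid_coords(occ)
```

Because a missing coordinate always sets one of the is.na() checks to TRUE, the combined result never contains NA.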

Miscellaneous

Currently, the map of the different tests distinguishes clean and flagged points by color, and distinguishes tests by shape. Are the different tests important? If not, perhaps they can be grouped into a "flagged" category. If yes, then perhaps color could be used to better distinguish them.


The description of value in the Rd files (for example, here) can be clarified by moving ("flags") up in the sentence:

Depending on the ‘value’ argument, either a data.frame containing the records considered correct by the test (“clean”) or a logical vector (“flags”), with TRUE = test passed and FALSE = test failed/potentially problematic. Default = “clean”

Also, it may help to stick with options for value that are both verbs or both adjectives (but this is just a minor suggestion). For example: clean/flag, clean/flagged, cleaned/flagged.


??CoordinateCleaner has strange output: c("\Sexpr[results=rd,stage=build]tools:::Rd_package_title(\"#1\")", "CoordinateCleaner")Automated Cleaning of Occurrence Records from Biological Collections I'm not sure where this error comes from.

maelle commented May 29, 2018

Many thanks for your review @isteves, great points! 😸

No maintainer specified in DESCRIPTION, but maybe ok?
It is ok because it's generated automatically from Authors@R when building the package, it's actually better not having it.

@azizka You can wait for @Pakillo's review before responding, or respond before that, since @Pakillo's review was going to be a bit late, so as you prefer.

Thanks again @isteves!

maelle commented May 29, 2018

@isteves I forgot to ask you to fill the "Estimated hours spent reviewing:" field in the review template, thanks in advance 🙏

maelle commented Jun 7, 2018

@Pakillo 👋 will you soon have time to review the package? 😺

azizka commented Jun 12, 2018

Hej @maelle and @isteves,

thanks for the comprehensive, helpful and very constructive review. I managed to address most of your issues, and for a few of them I have argued back or asked @maelle for help. Please find a point-by-point reply below, replies in italics. Please let me know if you have any comments/suggestions.

Thanks again!

Vignettes

Examples are currently included in the Tutorials folder & wiki; consider moving them to a vignettes folder so that users can access them locally using browseVignettes()

Done, moved tutorials to vignettes

Examples

urbanareas: perhaps specify that file path is not the one shown: instead of load("extra_gazetteers/urbanareas.rda"), load("your/path/urbanareas.rda")

Done, changed as suggested

WritePyRate: got an error when I tried WritePyRate(x = exmpl,fname = "test", status = status): Error in [.data.frame(x, , taxon) : undefined columns selected

Done, fixed the code of the example (replaced "identified_name" by "accepted_name")

A plot call is not present in the example for plot.spatialvalid.

Done added plot call to the example.

Community guidelines

No maintainer specified in DESCRIPTION, but maybe ok?

No changes, see @maelle ‘s comment

No contribution guidelines in the README or CONTRIBUTING

Done added a CONTRIBUTING.md

Tests

library(CoordinateCleaner) calls not needed within individual test scripts

Done removed the library() calls

Test_fossillevel_cleaning.R filename should be all lower case for consistency

Done, changed to lowercase

I suggest using more descriptive names for context()

Done

Using devtools::test(), I got the following warning.

Done, switch from warning to message for cc_gbif
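
The practical difference can be sketched with a stand-in function (run_check is hypothetical, not the cc_gbif code): a message() informs the user without being recorded as a warning, and callers can silence it with suppressMessages().

```r
# Hypothetical stand-in for a check that reports what it is doing.
run_check <- function(x, verbose = TRUE) {
  if (verbose) {
    message("running GBIF test, flagging records around Copenhagen")
  }
  rep(TRUE, length(x))
}

# Record any warnings raised during the call; a message does not count.
caught <- NULL
flags <- withCallingHandlers(
  run_check(1:3),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  },
  message = function(m) invokeRestart("muffleMessage")
)

# Callers who want silence can suppress messages explicitly.
quiet_flags <- suppressMessages(run_check(1:3))
```

This keeps genuine warnings available for conditions the user should act on, while progress information no longer shows up as warnings in the test output.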

Packaging guidelines

according to the guidelines, badges in README.md should be below the package name (unsure how important this is)

Done, moved badges downwards

include "how the package compares to other similar packages and/or how it relates to other packages" in README.md

I am not sure what exactly is needed here. I could add the table comparing CoordinateCleaner with scrubr, from the presubmission query (#199), but this seems excessive. Is this really necessary @maelle?

add citation info to the end of README.md

Done, added a suggested citation, will change to final once the corresponding manuscript is published

"Avoid starting the description with the package name or This package ..." - not sure if this also applies to the README package description, but you may want to change it to match the description in the DESCRIPTION file.

Done, synchronized the first sentences among DESCRIPTION and Readme.md

Spelling

All the text/documentation looks great on the whole! The devtools::spell_check() caught two typos that should be fixed:
Herbariorum institutions.Rd:13
THe CleanCoordinates.Rd:101
(The other "spelling errors" that it caught seem to be fine.)

Done, fixed number 2; number 1 is not a typo but the correct Latin spelling of the name of the institution “Index Herbariorum”

Code duplication/style

For the most part, the package is both clean and organized. I especially like the cc_ functions, which each tackle a specific type of error and have consistent styles.
Among the other files, there was a mix of style/naming conventions. Is there a reason for that? (asking honestly here, since I may be missing something) I understand that for methods.spatialvalid you must use the . convention to work with classes, but I'm unsure why you switched to capitals for CleanCoordinates* and WritePyRate. These functions (as well as tc_* and the tutorials) also include a lot of variables with . rather than _. Changing the variable names so that they use a consistent style would enhance the readability/usability of the code.

Yes, that is a good point. The wrapper functions originally used CamelCase to show that they differed from the individual tests. I agree that this was inconsistent, and switched to a consistent underscore_case naming scheme. Since this might cause some compatibility problems, I increased the version number to 2.x with the latest version.

use ... to pass arguments specified in CleanCoordinates* to the cc_* functions called within it

I’d prefer not to do this because (i) it would mean changing the argument names in all the cc functions and would somewhat break their consistency, as for instance the “buffer” argument in cc_cap would need to become cap_buffer and in cc_cen cen_buffer, and (ii) while it is true that the list of arguments is long, it is currently obvious even to inexperienced users of CleanCoordinates that the buffers can be changed and should be thought about.

rather than writing out TRUE/FALSE for each test, perhaps use a single argument (tests) that can take a vector of desired tests (c("sea", "gbif", "equ"))

Done, switched to a character vector for all wrapper functions

set argument defaults to NULL directly in the argument list (capitals.ref = NULL), instead of in an extra step within the function

Done for all wrapper functions

currently, a variable is returned for each test. If FALSE is chosen, then the returned variable contains only NA's; if TRUE is chosen, the specified test is run and the flagged output is saved. This results in code repetition that may be avoided by (a) specifying the NA output as the default in the beginning of the code and (b) using a data frame/list structure to store the results. To clarify:

Done for all wrapper functions
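
A minimal sketch of the combined pattern (a tests character vector plus an NA-filled result frame) might look like the following. The two check functions and the name clean_sketch are hypothetical stand-ins, not the actual wrapper or cc_* functions.

```r
# Hypothetical per-test functions; each returns TRUE where a record passes.
checks <- list(
  zer = function(x) !(x$lon == 0 & x$lat == 0),
  equ = function(x) x$lon != x$lat
)

clean_sketch <- function(x, tests = c("zer", "equ")) {
  # Reject unknown test names early.
  tests <- match.arg(tests, choices = names(checks), several.ok = TRUE)

  # Start with NA for every known test, then fill in only the requested ones.
  out <- data.frame(matrix(NA, nrow = nrow(x), ncol = length(checks)))
  colnames(out) <- names(checks)
  for (tst in tests) {
    out[[tst]] <- checks[[tst]](x)
  }
  out
}

# Hypothetical records; only the zero-coordinate test is requested.
occ <- data.frame(lon = c(0, 12.5, 7), lat = c(0, 55.7, 7))
res <- clean_sketch(occ, tests = "zer")
```

Tests that were not requested simply stay NA, so no else branch is needed per test.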

the CleanCoordinates* functions also have warnings/checks that perhaps are more appropriate within the cc_* functions (example)

I’d prefer to keep them where they are, since I think this way the error message can be more informative for the user of the CleanCoordinates wrapper function, i.e. it can directly specify how to fix the problem and suggest dropping certain tests if they are causing the error. For instance, if no country information is available, just omitting the countries test from clean_coordinates might be a good solution, whereas cc_coun does absolutely require the country information.

as far as I can tell, cc_dupl can be used in place of this code

Done, replaced by cc_dupl

In WritePyRate, since fname and status are required, consider moving them earlier in the argument list (positions 2, 3) and not setting a default value (currently NULL)

Done, moved the elements up in the list and removed the default.

In cc_val (starting here), there is redundancy in as.numeric/as.character/suppressWarnings that could be reduced.

Done, changed code as proposed
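One way to reduce such redundancy is to do the conversion once in a small helper. The sketch below is an assumption about the general shape, not the code that was proposed in the review or that cc_val uses; `valid_coords()` is a hypothetical name:

```r
# Hypothetical validity check: TRUE = valid coordinate pair.
valid_coords <- function(lon, lat) {
  # Convert once; non-numeric entries become NA without a warning
  to_num <- function(v) suppressWarnings(as.numeric(as.character(v)))
  lon <- to_num(lon)
  lat <- to_num(lat)
  # Valid means: both numeric and within the geographic range
  !is.na(lon) & !is.na(lat) & abs(lon) <= 180 & abs(lat) <= 90
}

ok <- valid_coords(lon = c("12.5", "abc", "200"), lat = c("45", "10", "0"))
```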

Miscellaneous

Currently, the map of the different tests distinguishes clean and flagged points by color, and distinguishes tests by shape. Are the different tests important? If not, perhaps they can be grouped into a "flagged" category. If yes, then perhaps color could be used to better distinguish them.

Done. The marking of the individual tests can be switched on or off with the details option of plot.spatialvalid. I think it can be interesting to visualize which tests failed, for instance to see if there are spatial patterns in data quality, but I agree that it might also be confusing. I switched the default for details to FALSE, so that by default there is now only a colour difference between clean and flagged.

The description of value in the Rd files (for example, here) can be clarified by moving ("flags") up in the sentence: Depending on the ‘value’ argument, either a data.frame containing the records considered correct by the test (“clean”) or a logical vector (“flags”), with TRUE = test passed and FALSE = test failed/potentially problematic. Default = “clean”

Done, moved the part in the brackets as suggested

Also, it may help to stick with options for value that are both verbs or both adjectives (but this is just a minor suggestion). For example: clean/flag, clean/flagged, cleaned/flagged.

Done, changed to clean/flagged
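The `value` convention discussed here (a cleaned data.frame versus a logical vector with TRUE = test passed) could look roughly like this. `cc_example()` and its zero-coordinate test are hypothetical, used only to illustrate the two return types:

```r
# Hypothetical test illustrating the `value` convention.
cc_example <- function(x, value = c("clean", "flagged")) {
  value <- match.arg(value)               # default is "clean"
  passed <- !(x$lon == 0 & x$lat == 0)    # TRUE = test passed
  switch(value,
         clean = x[passed, , drop = FALSE],  # records considered correct
         flagged = passed)                   # one logical per record
}

pts <- data.frame(lon = c(0, 10), lat = c(0, 20))
cleaned <- cc_example(pts)
flags <- cc_example(pts, value = "flagged")
```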

??CoordinateCleaner has strange output: c("\Sexpr[results=rd,stage=build]tools:::Rd_package_title("#1")", "CoordinateCleaner")Automated Cleaning of Occurrence Records from Biological Collections I'm not sure where this error comes from.

I have no clue what is going on here. Any ideas @maelle?

@azizka

Author

azizka commented Jun 12, 2018

An additional question:

The latest build passes locally without problems, but fails on Travis (see below). I assume the vignettes are too RAM-heavy. I tried to solve this, but couldn't. Do you have any experience with this problem? Thanks!

Warning in system(command) : system call failed: Cannot allocate memory
Warning in system(command) : error in running command
Error: processing vignette 'Cleaning_PBDB_fossils_with_CoordinateCleaner.Rmd' failed with diagnostics:
pandoc document conversion failed with error 127

@maelle

Member

maelle commented Jun 12, 2018

Thanks @azizka!

  • The table comparing CoordinateCleaner with scrubr might be excessive but adding a few lines about how CoordinateCleaner fits into the occurrence data R tooling, and where scrubr might be used in combination, would be good.

  • Did you generate the package level documentation using usethis::use_package_doc()?

  • Could you ask the Travis question in Slack to see whether someone had the same issue? I'm afraid I can't help directly.

@azizka

Author

azizka commented Jun 13, 2018

  • Alright, I added the following text to the Readme.md (including links): CoordinateCleaner can be particularly useful to ensure geographic data quality when using data from GBIF (e.g. obtained with rgbif) for historical biogeography (e.g. with BioGeoBEARS or phytools), automated conservation assessment (e.g. with speciesgeocodeR or conR) or species distribution modelling (e.g. with dismo or sdm). See scrubr and taxize for complementary taxonomic cleaning.

  • No, I migrated the documentation to Roxygen2 with Rd2roxygen as you suggested earlier.

  • Great, I'll ask on slack.

Thanks :-)

@maelle

Member

maelle commented Jun 30, 2018

👋 @Pakillo could you please get your review in soon? Thank you! 😺

@Pakillo


Pakillo commented Jul 1, 2018

I'm very sorry for the delay -- the end of term workload is being rather crazy. I expect to have it ready this week. Apologies

@maelle

Member

maelle commented Jul 2, 2018

Thanks for the update @Pakillo!

@Pakillo


Pakillo commented Jul 6, 2018

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally (Vignettes are great, but one of them failed to build on my computer)
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8


Review Comments

Hi all,

Many thanks @azizka for developing CoordinateCleaner, @maelle for the invitation to review (and the patience!), and @isteves for sharing the review and breaking the ice :)

I love CoordinateCleaner. I have already used it for my own research and it really fills a gap in terms of cleaning occurrence data (or any coordinate data, really). It brings many new functionalities, it is well thought out, relatively easy to use, and it works. So thanks a lot.

I agree with all comments by @isteves above. I think they have greatly contributed to make the package even nicer and more consistent. Some remaining issues or comments:

Package size

Installation of current version from GitHub takes a 93 MB download, which seems maybe too much? Apparently ~70 MB of it is git history and files (.git folder). I wonder if there is some big file hidden in history, which could perhaps be removed (e.g. using https://rtyley.github.io/bfg-repo-cleaner/). That package size was not a problem for me but could be for some users with slow connections. I think rOpenSci experts can advise more on this, or confirm this is actually OK.

Apart from git history, there are some big files in the current version of the repo. First, two largish PDF files of rendered vignettes in Tutorials folder which can probably be removed? I am not sure about rOpenSci conventions here, but maybe you can store just HTML rendered versions of the vignettes (which weigh much less, ~110 KB for the GBIF vignette) in a gh-pages branch so they can be seen nicely. Or, even better, use pkgdown to create a website for your package, including not only rendered vignettes but also rendered examples, etc.

The second set of large files are data files (particularly GIS datasets like landmass.rda, >2 MB, or urbanareas.rda, >10 MB). As most of these datasets actually come from Natural Earth, I think they could perhaps be downloaded on purpose when needed, probably using the rnaturalearth package. Also, some of these datasets are directly available in the rnaturalearthdata and rnaturalearthhires packages (all already part of rOpenSci), so you could maybe just use them so as not to duplicate files.

README

The Readme is good, just a few typos and minor comments. I have also made a pull request fixing some typos I found along the way.

  • Typo in badge linking to Travis (fixed in pull-request). Also, Travis is giving build error, but I see you are already on track.

  • Provide hyperlinks to NEWS, vignettes, contributing files, so the reader just has to click from README to see them?

  • When discussing related packages, I would also mention biogeo (https://github.com/cran/biogeo).

  • Use README.Rmd? So the reader can see the output of the example code in README without having to run it by themselves. Maybe do not show all output, but some of it.

  • Current example code gives errors. I think it needs updating after the API changes:

    dsl <- clean_coordinatesDS(exmpl) gives error (Changed to dsl <- clean_dataset(exmpl) in pull request)

    clean_coordinatesFOS must be changed to clean_fossils

    dc_ddmm does not exist. Change to cd_ddmm?

    tc_range does not exist. Has it changed to cf_range?

data

  • As explained above, consider removing large data files from the package? Use rnaturalearth and associated packages instead (as some functions, e.g. cc_coun, already do). Maybe only buffland_1deg (currently in extra_gazetteers) would need to be included.

  • Looks like data in capitals.rda (and parts of centroids.rda) are already contained in countryref.rda. If possible, avoid duplication.

  • institutions: If there is a script used to build this data frame, it might be good to include it (e.g. in a data-raw folder) for the sake of reproducibility and possibility of making further changes in the future.

extra-gazetteers

  • urbanareas.rda is >10 MB big, and countryborders >2 MB. As all these seem to come from Natural Earth, consider using rnaturalearth to access them when needed?

  • institutions_utf8 seems duplicated with institutions.rda in data folder? (only having 1 row more). Keep only one?

  • I think the documentation (Rd) files in this folder are not available to the user. Move all the data to the data folder, and make sure all are documented?

Documentation (man folder)

  • Rd file for CoordinateCleaner-package.Rd needs to be revised. Seems the conversion to ROxygen hasn't worked well.

  • I think the help files for CleanCoordinates, CleanCoordinatesDS, CleanCoordinatesFOS, should mark them as Deprecated, see https://ropensci.github.io/dev_guide/evolution.html#functions-deprecate-defunct.

  • Default longitude and latitude arguments for fossils functions (cf_outl, cf_range...) are "lng" and "lat", respectively, while the params explanation states the defaults are "decimallongitude" and "decimallatitude" (as in the rest of the package). Use the latter throughout?
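For the deprecation point in the list above, a minimal sketch using base R's `.Deprecated()` might look like this. The function bodies are placeholders, and the rOpenSci guide linked above describes the documentation side in more detail:

```r
# New-style function; placeholder body for this sketch
clean_coordinates <- function(x, ...) {
  x
}

# Old name kept as a thin wrapper that warns and forwards the call
CleanCoordinates <- function(x, ...) {
  .Deprecated("clean_coordinates")
  clean_coordinates(x, ...)
}

# Calling the old name still works, but signals a deprecation warning
warned <- FALSE
res <- withCallingHandlers(
  CleanCoordinates(data.frame(id = 1)),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
```

This keeps existing user code running while pointing users to the new name.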

Functions

  • Several functions using Spatial* objects assume they already have a geographical (latlon) projection (e.g. cc_cap, L67-68). As a suggestion, given they are Spatial objects it might be better to read their projection (proj4string) directly and ensure it is indeed correct. Otherwise, give warning and optionally make reprojection on the fly? (e.g. using spTransform). Also, as sf will become more and more popular consider enabling them also as arguments, not only sp objects (e.g. doing an internal conversion as(sf, "Spatial")).

  • cc_coun: Although there exists cc_sea, I think it would be good if this function flagged every coordinate not within the supposed country, even if those coordinates fall on sea.

  • cc_sea gives TRUE if coordinates fall on land, and FALSE if coordinates fall on sea. I think this is confusing and may induce errors in future users (expecting TRUE if falling on sea, given the function name). To respect the convention across the whole package about TRUE/FALSE, maybe then rename the function as cc_land? Which would give TRUE when a location really falls on land.

  • cc_urb: download urbanareas from Natural Earth on demand?

  • plot.spatialvalid seems to use landmass.rda for plotting, but it could also use borders from ggplot2 (probably lighter and faster, and in case landmass is finally taken out of the package).

  • clean_dataset: the example flags <- clean_dataset(test) gives warning: In FUN(X[[i]], ...) : Geographic spann to small, check 'min_span'

Dependencies

The tidyverse package is listed in Suggests and effectively loaded in vignettes, but I think only a small subset of tidyverse packages are needed? Maybe better to use only those?

Vignettes

  • Is the Tutorials folder with rendered PDF vignettes necessary? Could you use lighter HTML vignettes instead of PDF? Use pkgdown?

  • The PBDB vignette doesn't run for me. It actually seems to be a problem with the paleobioDB package?
    dat <- pbdb_occurrences(base_name = "Magnoliopsida", vocab = "pbdb", limit = 5000,
    show = c("coords", "phylo", "attr", "loc", "time", "rem"))
    Error in !sapply(data, function(l) is.null(l) | (ncol(l) == 0) | (nrow(l) == :
    invalid argument type

  • The vignettes need a further revision to update links, pkg versions, typos, etc.

@maelle

Member

maelle commented Aug 26, 2018

@azizka Any news?

@azizka

Author

azizka commented Aug 26, 2018

Hej,

yes it does, I fixed it as suggested. Thanks @Pakillo.

@maelle

Member

maelle commented Aug 27, 2018

@Pakillo @isteves are you both now happy with the package? If so can you check "The author has responded to my review and made changes to my satisfaction. I recommend approving this package." in your review? Thanks!

@Pakillo


Pakillo commented Aug 27, 2018

Hi,

Yes, I'm very happy with the package. Many thanks @azizka for the hard work, @isteves for your comments, and @maelle for your excellent editorial work.

A few final comments:

Hope this helps.

Cheers!

@maelle

Member

maelle commented Aug 28, 2018

I wonder if there is a tool to check for broken links in R packages

At least CRAN must have one... will ask in Slack.

@azizka

Author

azizka commented Sep 10, 2018

Hi,

I went through @Pakillo's latest comments (thanks!):

Travis is still giving error due to undocumented arguments in cc_inst (see Travis log). Please fix.

Done. Travis build is succeeding now.

There are a number of broken URL links across function help files, vignettes... (e.g. many pointing to the now deleted wiki, or to the former extra-gazetteers folder). Please revise. (I wonder if there is a tool to check for broken links in R packages @maelle ?)

Done. Replaced the links to the wiki with links to the documentation page and removed the link to the extra gazetteers.

Not required, but I agree it would be very helpful to group functions by theme (e.g. cleaning records, datasets, fossils...) in the pkgdown website as @maelle recommended (https://ropensci.github.io/dev_guide/building.html#function-grouping)

Done. Grouped the functions according to the cleaning target.

The help file for deprecated functions in the pkgdown website appears missing (https://azizka.github.io/CoordinateCleaner/reference/CoordinateCleaner-deprecated.html). But it seems to be due to the same typo in the package name we saw before (missing capital C), because https://azizka.github.io/CoordinateCleaner/reference/Coordinatecleaner-deprecated.html does work for me. Please ensure that Coordinatecleaner-deprecated.Rd gets changed to CoordinateCleaner-deprecated after Roxygenising.

Done. CoordinateCleaner-deprecated is working now.

At the Documentation section in the GitHub repo Readme, maybe point to rendered vignettes (i.e. https://azizka.github.io/CoordinateCleaner/articles/, at website built with pkgdown) rather than Rmd sources on github (https://github.com/azizka/CoordinateCleaner/tree/master/vignettes)?

Done. Changed as suggested

Also at the Readme, when talking about other related packages, maybe point to the vignette comparing their functionality? (i.e. https://azizka.github.io/CoordinateCleaner/articles/comparison_other_software.html). I think this will particularly help people approaching the package for the first time and having to decide which one to use, or how CoordinateCleaner differs from the others.

Done. Added the links.

How will things proceed now?

Cheers,

Alex

@maelle

Member

maelle commented Sep 11, 2018

We'll wait for @isteves to chime in when she has time, and then we should be close to the end of this process. 😺

@isteves


isteves commented Sep 14, 2018

Hey all,

I took another brief look through the package. Rather than getting deep into the code, I checked it out the way I would any new package--I looked through the pkgdown docs, tested out the examples, and checked out bits and pieces of the documentation.

First, some praise 👏 🥇 🎆

I'm a fan of the pkgdown website. I think it's a great addition, and it made it easy to re-orient myself to the package. Specifically, I liked:

  • the organization of the functions on the reference page
  • the organization of the Articles -- they're well-ordered and divided into meaningful categories
  • the function comparison table! I don't remember it from before, and I think it would be great even as a stand-alone resource. It provides a really comprehensive overview of CoordinateCleaner and other packages in this realm

Typos/small fixes

(note--mostly just checked the pkgdown version of these...)

In the R-environment the scrubr an biogeo offer cleaning approaches complementary to CoordinateCleaner.

Sidenote: My personal preference would be to avoid starting a sentence with a lower-case package name, but it’s up to you:
scrubr combines basic geographic cleaning… -> The scrubr package combines…

Identifying erroneous coordinates using

We based the choice of tests on common problems observed in biological collection databases (see for example (???))

poison

💀

General comments/food for thought

  • I like the output of clean_coordinates, but I found it strange that some columns from the initial data were dropped. I was expecting behavior more similar to broom::augment.
  • The documentation of clean_dataset confused me. Given the description ("Identifies potentially problematic coordinates..."), I was expecting to get flagged coordinates as an output, rather than the output of a test. Something like this may help: "Tests for problems associated with coordinate conversions and rounding...." Perhaps documenting the output with @return could also help
  • for the cc_* functions, the default behavior is to filter out "bad" records, but the message says "Flagged XX records." For me, "flagged" ~ "marked." Changing the wording to "removed"/"omitted"/"filtered out" would more clearly indicate to me that the function removes records.
  • cc_sea downloads something each time I run it. Perhaps you can download the relevant files just the first time the user runs the function with something along the lines of:
path <- file.path(system.file(package = "CoordinateCleaner"), "sea.EXT")

if(!file.exists(path)) {
    download_sea(path)
}

sea <- load_sea(path)
  • I like your example of the cc_* functions + the pipe. An example where flags are added as extra columns may also be worth including:
exmpl %>%
    as_tibble() %>% 
    mutate(val = cc_val(., value = "flagged"),
                sea = cc_sea(., value = "flagged"))
  • I recently learned about the @inheritParams/@inherit tags, which was such a game-changer! You can define your parameter(s) once and use @inheritParams fxn_name for any other function that uses the parameter(s) (and same for sections). Definitely give it a try at some point if you haven't already 😄

I think it's almost good to go!

-Irene

@maelle

Member

maelle commented Sep 14, 2018

Many thanks @isteves !

@azizka

Author

azizka commented Sep 24, 2018

Thanks @isteves!

I have now included almost all suggestions; please see a point-by-point reply below. I am very sorry for the typos; the documentation has become so large now that some always slip through.

Thanks especially for the hint with @inherit. I am also continuously working to increase the test coverage.
Cheers,

Alex

Typos/small fixes

In the R-environment the scrubr an biogeo offer cleaning approaches complementary to CoordinateCleaner.

Fixed. Changed to ‘and’.

Sidenote: My personal preference would be to avoid starting a sentence with a lower-case package name, but it’s up to you: scrubr combines basic geographic cleaning… -> The scrubr package combines…

Fixed. Changed as suggested.

https://azizka.github.io/CoordinateCleaner/articles/Background_the_institutions_database.html#data-compilation

Image not showing. It's trying to link to this

Fixed. Removed figure and replaced by a link.

https://azizka.github.io/CoordinateCleaner/articles/Tutorial_Cleaning_GBIF_data_with_CoordinateCleaner.html#identifying-erroneous-coordinates-using

Identifying erroneous coordinates using

Fixed. Changed to ‘with’.

We based the choice of tests on common problems observed in biological collection databases (see for example (???))

Fixed. Fixed citation.

https://github.com/azizka/CoordinateCleaner/blob/69e1848fb9ab7654bdf806c790f56e4dd354d8b0/R/clean_dataset.R#L25

poison 💀

Fixed. Changed to ‘Poisson’. 🐟

https://azizka.github.io/CoordinateCleaner/articles/Tutorial_Cleaning_GBIF_data_with_CoordinateCleaner_files/figure-html/unnamed-chunk-15-1.png

Increase point size? I thought I was looking at a blank plot at first...

Not fixed. I completely agree that the points are hard to see, which is annoying. However, this plot shows the analysis matrix as used during the test. In this case it was a 1000 x 1000 matrix, so the resulting diagnostic plot has the same resolution and individual cells are plotted very small. I think it is preferable to live with this visualization to keep the link between the test and the output plot, also in the vignette.

General comments/food for thought

I like the output of clean_coordinates, but I found it strange that some columns from the initial data were dropped. I was expecting behavior more similar to broom::augment.

Fixed. Changed as suggested. The “spatialvalid” output of clean_coordinates and clean_fossils now add the test columns to x, similar to broom::augment.
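The augment-like behaviour, where all input columns are kept and one column per test is appended, could be sketched as below. `augment_flags()` and the leading-dot column prefix are assumptions made for this illustration (the prefix mirrors broom's naming style) and may differ from what clean_coordinates actually returns:

```r
# Hypothetical helper: append one logical column per test to the input.
augment_flags <- function(x, flags) {
  stopifnot(nrow(x) == nrow(flags))
  # Prefix flag columns so they cannot clash with input column names
  names(flags) <- paste0(".", names(flags))
  cbind(x, flags)
}

occ <- data.frame(species = c("a", "b"),
                  lon = c(0, 10), lat = c(0, 20),
                  collector = c("X", "Y"))
flags <- data.frame(zeros = c(FALSE, TRUE))
out <- augment_flags(occ, flags)
```

No input column (such as `collector` here) is dropped, which is the behaviour the reviewer expected.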

The documentation of clean_dataset confused me. Given the description ("Identifies potentially problematic coordinates..."), I was expecting to get flagged coordinates as an output, rather than the output of a test. Something like this may help: "Tests for problems associated with coordinate conversions and rounding...." Perhaps documenting the output with @return could also help

Fixed. Changed the description as suggested, and improved the documentation of the output with @return.

for the cc_* functions, the default behavior is to filter out "bad" records, but the message says "Flagged XX records." For me, "flagged" ~ "marked." Changing the wording to "removed"/"omitted"/"filtered out" would more clearly indicate to me that the function removes records.

Fixed. Changed the wording in the function title to “Identify” and in the description to “Removes or flags”.

cc_sea downloads something each time I run it. Perhaps you can download the relevant files just the first time the user runs the function with something along the lines of:
path <- file.path(system.file(package = "CoordinateCleaner"), "sea.EXT")

if(!file.exists(path)) {
download_sea(path)
}

sea <- load_sea(path)

Fixed. Implemented as suggested for cc_sea and cc_urb
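A self-contained sketch of the download-once pattern described above. The helper `load_cached()` and the injected `fetch` function are hypothetical; the sketch caches under `tempdir()` on the assumption that the installed package folder may not be writable on every system, and it does not show how cc_sea or cc_urb are actually implemented:

```r
# Hypothetical helper: fetch a reference file only if it is not cached yet.
# `fetch` is injected so the sketch can be tried without network access.
load_cached <- function(path, fetch, read = readRDS) {
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    fetch(path)   # only reached on the first call
  }
  read(path)
}

# Stand-in for a real download; counts how often it is called
calls <- 0
fake_fetch <- function(path) {
  calls <<- calls + 1
  saveRDS(data.frame(id = 1:3), path)
}

cache_file <- file.path(tempdir(), "cc_cache", "sea.rds")
unlink(cache_file)                              # start from an empty cache
first <- load_cached(cache_file, fake_fetch)    # triggers the fetch
second <- load_cached(cache_file, fake_fetch)   # served from the cache
```

In a real function, `fetch` would wrap the actual download step, and a persistent cache directory would avoid repeating the download across R sessions.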

I like your example of the cc_* functions + the pipe. An example where flags are added as extra columns may also be worth including:
exmpl %>%
as_tibble() %>%
mutate(val = cc_val(., value = "flagged"),
sea = cc_sea(., value = "flagged"))

Fixed. Added the suggested example to the “Quickstart_Flagging_problematic_coordinates_in_a_nutshell” and Tutorial_Cleaning_GBIF_data_with_CoordinateCleaner vignettes.

I recently learned about the @inheritParams/@inherit tags, which was such a game-changer! You can define your parameter(s) once and use @inheritParams fxn_name for any other function that uses the parameter(s) (and same for sections). Definitely give it a try at some point if you haven't already

Fixed. Used @inheritParams and @inherit throughout the documentation. Very nice, thanks!
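For readers who have not used these tags, a small roxygen2 sketch follows. The two functions are hypothetical examples, not taken from the package; the parameters are documented once on `flag_zeros()` and reused by `flag_equal()`:

```r
#' Flag zero coordinates
#'
#' @param x A data.frame of occurrence records.
#' @param lon Name of the longitude column.
#' @param lat Name of the latitude column.
#' @return A logical vector, TRUE = test passed.
flag_zeros <- function(x, lon = "decimallongitude", lat = "decimallatitude") {
  !(x[[lon]] == 0 & x[[lat]] == 0)
}

#' Flag identical longitude and latitude
#'
#' @inheritParams flag_zeros
#' @inherit flag_zeros return
flag_equal <- function(x, lon = "decimallongitude", lat = "decimallatitude") {
  x[[lon]] != x[[lat]]
}

pts <- data.frame(decimallongitude = c(0, 10, 5),
                  decimallatitude = c(0, 10, 7))
zeros_ok <- flag_zeros(pts)
equal_ok <- flag_equal(pts)
```

When the documentation is generated, `flag_equal()` picks up the `x`, `lon` and `lat` descriptions and the return value section from `flag_zeros()`.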

@isteves


isteves commented Sep 25, 2018

Hey @azizka, it looks great! @maelle -- anything else we should do?

@maelle

Member

maelle commented Sep 27, 2018

Thanks @isteves! No, I just need to run the last checks which I'll do ASAP!

@maelle

Member

maelle commented Sep 27, 2018

Approved! Thanks @azizka for submitting and @isteves @Pakillo for your reviews! 😺

To-dos:

  • When mentioning the NEWS file in the README, add a link to the NEWS file so that it might be easier to find it.
  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Add the rOpenSci footer to the bottom of your README
    " [![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)"
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname).
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
    Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.

We've started putting together a gitbook with our best practices and tips; this chapter starts the third section, which is about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

@azizka

Author

azizka commented Sep 28, 2018

Hej,

great, thanks. Very exciting. I am teaching abroad with no opportunity to work on this for the next two weeks, but will do this as soon as I come back!

Thanks again to all of you!

@azizka

Author

azizka commented Oct 12, 2018

Hi,
@isteves, @Pakillo, @maelle do you agree to be acknowledged as reviewers for the package?
Please let me know here.

@azizka

Author

azizka commented Oct 12, 2018

@maelle, after transferring ownership, how can I edit the short package description on the GitHub page, since it links to the old pkgdown address?

@maelle

Member

maelle commented Oct 12, 2018

@azizka I've now made you admin of the repo again! I couldn't do that before transfer. Thanks for transferring! 😸

@stefaniebutland


stefaniebutland commented Oct 12, 2018

Hello @azizka. Congratulations on acceptance of CoordinateCleaner!
Are you interested in writing a post for the rOpenSci blog, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development (https://ropensci.org/blog/)?

This link will give you many examples of blog posts by authors of onboarded packages so you can get an idea of the style and length you prefer: https://ropensci.org/tags/onboarding/.

Here are some technical and editorial guidelines for contributing a post: https://github.com/ropensci/roweb2#contributing-a-blog-post.

Please let me know what you think.

@azizka

Author

azizka commented Oct 13, 2018

Hi @stefaniebutland,
yes, I am interested in writing a blog post, but there is a manuscript for the package currently in review at MEE, so I'd prefer to wait for a decision on that first.
Cheers,

Alex

@stefaniebutland


stefaniebutland commented Oct 13, 2018

Sounds good @azizka. Would be nice to be able to note the publication in the post.

@azizka

Author

azizka commented Oct 16, 2018

Hej @maelle,

almost done, the last thing is to migrate the badges. Where can I find help to migrate the travis-ci build and codecov badges properly? They are not working at the moment.

Thanks!

@maelle

Member

maelle commented Oct 16, 2018

@azizka weird, I have now activated the repo at https://travis-ci.org/ropensci/CoordinateCleaner (your badge points to travis-ci.com at the moment), let's see if that works.

@azizka

Author

azizka commented Oct 16, 2018

OK, thanks. I switched to travis-ci.org; let's see if it works once the first run is done. I fixed the codecov badge.

@azizka

Author

azizka commented Oct 19, 2018

Alright, I think everything is working now. The latest version (2.0-2) is also on CRAN now. Is there anything else to do?

@maelle

Member

maelle commented Nov 9, 2018

No, sorry, closing the issue now!

@maelle maelle closed this Nov 9, 2018

@stefaniebutland


stefaniebutland commented Feb 8, 2019

@azizka I see your manuscript is submitted to MEE. Wishing you a smooth process

@stefaniebutland


stefaniebutland commented Feb 11, 2019

@azizka, Scott Chamberlain who is the rOpenSci point person for the MEE collaboration told me that the coordinatecleaner paper will be published on 21 Feb - and MEE asked if we can coordinate blog post timing.

This would be great to bring more attention to your work if we can coordinate. Do you think you could have a draft post submitted by Thurs Feb 14 or Fri 15th this week? That would give me time to review and you to address any feedback.

(edited). Update - I understand that you're also working on a post for MEE. Our audiences are different, but we can discuss options for cross-posting if you don't want to write two posts.
