-
-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CoordinateCleaner #210
Comments
Editor checks:
Editor commentsThanks for your submission @azizka! 😁 Initial checks revealed a few issues that you should solve before I assign reviewers. Do not hesitate to ask any question you might have!
It is good practice to
? write unit tests for all functions, and all package
code in general. 62% of code lines are covered by test cases.
R/cc_cap.R:18:NA
R/cc_cap.R:19:NA
R/cc_cap.R:29:NA
R/cc_cen.R:19:NA
R/cc_cen.R:21:NA
... and 580 more lines
You do not need to reach 100% code coverage but you can increase it a bit. ? not use "Depends" in DESCRIPTION, as it can cause
name clashes, and poor interaction with other packages. Use
"Imports" instead. Feel free to ignore this if you need the ? omit "Date" in DESCRIPTION. It is not required and it
gets invalid quite often. A build date will be added to the
package when you perform `R CMD build` on it. Please remove Date from there. ? add a "URL" field to DESCRIPTION. It helps users find
information about your package online. If your package does not
have a homepage, add an URL to GitHub, or the CRAN package
package page.
? add a "BugReports" field to DESCRIPTION, and point it
to a bug tracker. Many online code hosting services provide bug
trackers for free, https://github.com, https://gitlab.com, etc.
Make it easy to find your GitHub repo by adding these links. ? avoid long code lines, it is bad for readability.
Also, many people prefer editor windows that are about 80
characters wide. Try make your lines shorter than 80 characters
R\cc_cap.R:1:1
R\cc_cap.R:24:1
R\cc_cen.R:1:1
R\cc_coun.R:1:1
R\cc_coun.R:17:1
... and 210 more lines
Maybe ? avoid sapply(), it is not type safe. It might return
a vector, or a list, depending on the input data. Consider
using vapply() instead.
R\WritePyRate.R:88:68
Please change ? not import packages as a whole, as this can cause
name clashes between the imported packages. Instead, import
only the specific functions you need.
See this part of Hadley Wickham's R package book ? not use exportPattern in NAMESPACE. It can lead to
exporting functions unintendedly. Instead, export functions
that constitute the external API of your package. When using ? fix this R CMD check NOTE: Note: found 1 marked
Latin-1 string Not sure what this means actually. ? fix this R CMD check NOTE: Found the following hidden
files and directories: .travis.yml These were most likely
included in error. See section 'Package structure' in the
'Writing R Extensions' manual. Please put such files and folders in .Rbuildignore. Please update this thread once you've done the changes, or as soon as you have a question! |
@azizka do you have any question or comment? ☺ |
Hi @maelle sounds all good, thanks. I am working to include most of your suggestions. I'll defend my PhD next week, so little slow working on the package at the moment, but will tackle this full on afterwards. |
Oh good luck with your defence! |
@azizka I hope your defence went well! I'm taking the risk to say congrats! 🎊 Did you have time to work on the package a bit? |
@maelle yes, it worked out fine, thanks for congratulations and your patience! I am back to work this week, working on the package now. |
🎊 Grattis! 🎊 and cool! |
Hi @maelle, Please find below replies to your comments. I hope you find everything covered, please let me know what else I can do. Thanks again for the patience :-). Note: the most recent version is working, I’ll updated the badges as soon as the problems with travis are solved (travis-ci/travis-build#1382) Replies:
Done - Migrated documentation and namespace generation to roxygen2
The extra gazetteers are open source, they could live in a separate data package, but I am not sure that this is necessary/useful since they have very specialized applicability.
Done - Removed the .tar.gz and zip
It is at 74% now
Done - moved sp from ‘Depends’ to ‘Imports’. Yes, sf would be great since its faster and tidy, but unfortunately some of the dependencies do not yet support sf formats (e.g. raster, geosphere).
Done - Removed the date from the description file
Done - Added the urls to the description file
Done – cut at 80
Done
Done
Done – switched to roxygen2
I could not reproduce this note, do you have additional information on the operating system and check settings, etc? I suspect it might be related to a non-ASCII character in one of the datasets in the package, but I am not entirely sure either.
Done, the .travis.yml is part of .Rbuildignore now |
@azizka thanks for all your work! Am going to assign reviewers in a bit. Two last comments before the reviews:
|
👋 @isteves @Pakillo! Thanks for accepting to review this package! Your reviews are due 2018-06-07 but @Pakillo I've already noted that yours might be late (@azizka the reviewers of your package were lined up a while ago 😉) Here is our reviewer template and our reviewer guide. |
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 7 Review CommentsHi @azizka - great, comprehensive package! I'm impressed! Here are some comments related to the sections above:
Here are some additional comments. Let me know if anything is unclear! SpellingAll the text/documentation looks great on the whole! The Herbariorum institutions.Rd:13 (The other "spelling errors" that it caught seem to be fine.) Code duplication/styleFor the most part, the package is both clean and organized. I especially like the Among the other files, there was a mix of style/naming conventions. Is there a reason for that? (asking honestly here, since I may be missing something) I understand that for In the
Current code
Proposed change
In WritePyRate, since In
MiscellaneousCurrently, the map of the different tests distinguishes clean and flagged points by color, and distinguishes tests by shape. Are the different tests important? If not, perhaps they can grouped into a "flagged" category. If yes, then perhaps color could be used to better distinguish them. The description of
Also, it may help to stick with options for
|
Many thanks for your review @isteves, great points! 😸
@azizka You can wait for @Pakillo's review before responding, or respond before that, since @Pakillo's review was going to be a bit late, so as you prefer. Thanks again @isteves! |
@isteves I forgot to ask you to fill the "Estimated hours spent reviewing:" field in the review template, thanks in advance 🙏 |
@Pakillo 👋 will you soon have time to review the package? 😺 |
thanks for the comprehensive, helpful and very constructive review. I managed to address most of your issues, and argued back/put questions @maelle for help on some of them. Please find a point-by-point reply below, replies in italics. Please let me know if you have any comments/suggestions. Thanks again! Vignettes
Done, moved tutorials to vignettes Examples
Done, changed as suggested
Done, fixed the code of the example (replaced "identified_name" by "accepted_name")
Done added plot call to the example. Community guidelines
No changes, see @maelle ‘s comment
Done added a CONTRIBUTING.md Tests
Done removed the library() calls
Done, changed to lowercase
Done
Done, switch from warning to message for cc_gbif Packaging guidelines
Done, moved badges downwards
I am not sure what exactly is needed here. I could add the table comparing CoordinateCleaner with scrubr, from the presubmission query (#199), but this seems excessive. Is this really necessary @maelle?
Done, added a suggested citation, will change to final once the corresponding manuscript is published
Done, synchronized the first sentences among DESCRIPTION and Readme.md Spelling
Done, fixed number 2, number 1 is not a typo but the correctly latin spelling of the name of the institution “Index Herbariorum” Code duplication/style
Yes, that is a good point. The wrapper functions originally had CamelCase to show that they differed from the individual tests. I agree that this was inconsistent, and switched to a consistent underscore_case naming scheme. Since this might causes some compatibility problems I increase the version number to 2.x with the latest version.
I’d prefer not to do this because (i) this would mean to change the argument names in all the cc functions, and somewhat break their consistency, as for instance the “buffer” argument in cc_cap would need to become cap_buffer and in cc_cen cen_buffer and (ii) while it is true that the list of arguments is long, currently it is obvious also to unexperienced users of CleanCoordinates, that the buffers can be changed and should be thought about.
Done, switched to a character vector for all wrapper functions
Done for all wrapper functions
Done for all wrapper functions
I’d prefer to keep them where they are, since I think this way the error message can be more informative for the user of the CleanCoordinates wrapper function, i.e. it can directly be specified how to fix the problem, and drop certain tests if they are causing the error. For instance, if no country information is available, just omitting the countries test from clean_coordinates might be a good solution, whereas cc_count does absolutely require the country information.
Done, replaced by cc_dupl
Done, moved the elements up in the list and removed the default.
Done, changed code as proposed Miscellaneous
Done. The marking of the individual tests can be switched on or off with the details option of plot.spatialvalid. I think it can be interesting to visualize which tests failed, for instance to see if there are spatial patterns in data quality, but I agree that it might also be confusing. I switched the default for details to FALSE, so that by defaults now there is only a colour difference between clean and flagged.
Done, moved the part in the brackets as suggested
Done, changed to clean/flagged
I have no clue what is going on here. Any ideas @maelle? |
An additional question: The latest build passes without problems, but fails on travis (see below). I assume the vignettes are to RAM heavy. I tried to solve this, but couldn't. Do you have any experience with this problem? Thanks!
|
Thanks @azizka!
|
Thanks :-) |
👋 @Pakillo could you please get your review in soon? Thank you! 😺 |
I'm very sorry for the delay -- the end of term workload is being rather crazy. I expect to have it ready this week. Apologies |
Thanks for the update @Pakillo! |
Package ReviewPlease check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
DocumentationThe package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 8 Review CommentsHi all, Many thanks @azizka for developing CoordinateCleaner, @maelle for the invitation to review (and the patience!), and @isteves for sharing the review and breaking the ice :) I love CoordinateCleaner. I have already used for my own research and it really fills a gap in terms of cleaning occurrence data (or any coordinate data, really). It brings many new functionalities, it is well thought, relatively easy to use, and it works. So thanks a lot. I agree with all comments by @isteves above. I think they have greatly contributed to make the package even nicer and more consistent. Some remaining issues or comments: Package sizeInstallation of current version from GitHub takes a 93 MB download, which seems maybe too much? Apparently ~70 MB of it is git history and files (.git folder). I wonder if there is some big file hidden in history, which could perhaps be removed (e.g. using https://rtyley.github.io/bfg-repo-cleaner/). That package size was not a problem for me but could be for some users with slow connections. I think rOpenSci experts can advise more on this, or confirm this is actually OK. Apart from git history, there are some big files in the current version of the repo. First, two largish PDF files of rendered vignettes in The other second set of large files are data files (particularly GIS datasets like landmass.rda, >2 MB, or urbanareas.rda, >10MB). As most of these datasets actually come from Natural Earth, I think they could perhaps be downloaded on purpose when needed, probably using READMEThe Readme is good, just a few typos and minor comments. I have also made a pull request fixing some typos I found along the way.
data
extra-gazetteers
Documentation (
|
@azizka Any news? |
Hej, yes it does, I fixed it as suggested. Thanks @Pakillo. |
Hi, Yes, I'm very happy with the package. Many thanks @azizka for the hard work, @isteves for your comments, and @maelle for your excellent editorial work. A few final comments:
Hope this helps. Cheers! |
At least CRAN must have one... will ask in Slack. |
Hi, I went through @Pakillo latest changes (thanks!):
Done. Travis build is succeeding now.
Done. Replaced the links to the wiki with links to the documentation page and removed the link to the extra gazetteers.
Done. Grouped the functions according to the cleaning target.
Done. CoordinateCleaner-deprecated is working now.
Done. Changed as suggested
Done. Added the links. How will things proceed now? Cheers, Alex |
We'll wait for @isteves to chime in when she has time, and then we should be close to the end of this process. 😺 |
Hey all, I took another brief look through the package. Rather than getting deep into the code, I checked it out the way I would any new package--I looked through the pkgdown docs, tested out the examples, and checked out bits and pieces of the documentation. First, some praise 👏 🥇 🎆I'm a fan of the pkgdown website. I think it's a great addition, and it made it easy to re-orient myself to the package. Specifically, I liked:
Typos/small fixes(note--mostly just checked the pkgdown version of these...)
Sidenote: My personal preference would be to avoid starting a sentence with a lower-case package name, but it’s up to you:
💀
General comments/food for thought
path <- file.path(system.file(package = "CoordinateCleaner"), "sea.EXT")
if(!file.exists(path)) {
download_sea(path)
}
sea <- load_sea(path)
I think it's almost good to go! -Irene |
Many thanks @isteves ! |
Thanks @isteves! I have now included almost all suggestions; please see a point-by-point reply below. I am very sorry for the typos; the documentation has become so large now that some always slip through. Thanks especially for the hint with @inherit. I am also continuously working to increase the test coverage. Alex Typos/small fixes
Fixed. Changed to ‘and’.
Fixed. Changed as suggested.
Fixed. Removed figure and replaced by a link.
Fixed. Changed to ‘with’.
Fixed. Fixed citation.
Fixed. Changed to ‘Poisson’. 🐟
Not fixed. I completely agree, that the points are hard to see which is anoying. However, this plot shows the analyses matrix as used during the test. So, in this case it was a 1000 x 1000 matrix, hence the resulting diagnostic plot has the same resolution and individual cells are plotted very small. I think it is preferably to live with this visualization to keep the link between the test and the output plot, also in the vignette. General comments/food for thought
Fixed. Changed as suggested. The “spatialvalid” output of clean_coordinates and clean_fossils now add the test columns to x, similar to broom::augment.
Fixed. Changed the description as suggested, and improved the documentation of the output with @return.
Fixed. Changed the wording in the function title to “Identify” and in the description to “Removes or flags”.
Fixed. Implemented as suggested for cc_sea and cc_urb
Fixed. Added the suggested example to the “Quickstart_Flagging_problematic_coordinates_in_a_nutshell” and Tutorial_Cleaning_GBIF_data_with_CoordinateCleaner vignettes.
Fixed. Used @inheritParams and @inherit throughout the documentation. Very nice, thanks! |
Thanks @isteves! No, I just need to run the last checks which I'll do ASAP! |
Approved! Thanks @azizka for submitting and @isteves @Pakillo for your reviews! 😺 To-dos:
Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing. We've started putting together a gitbook with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here. |
Hej, great, thanks. Very exciting. I am teaching abroad with no opportunity to work on this the next two weeks, but will do this as soon as I come back! Thanks again to all of you! |
@maelle, after transferring ownership, how can I edit the short package description on the github page, since it links to the old pagedown, adress? |
@azizka I've now made you admin of the repo again! I couldn't do that before transfer. Thanks for transferring! 😸 |
Hello @azizka. Congratulations on acceptance of CoordinateCleaner! This link will give you many examples of blog posts by authors of onboarded packages so you can get an idea of the style and length you prefer: https://ropensci.org/tags/onboarding/. Here are some technical and editorial guidelines for contributing a post: https://github.com/ropensci/roweb2#contributing-a-blog-post. Please let me know what you think. |
Hi @stefaniebutland, Alex |
Sounds good @azizka. Would be nice to be able to note the publication in the post. |
Hej @maelle, almost done, the last thing is to migrate the badges. Where can I find help to migrate the travis-ci build and codecov badges properly, since they are not working at the moment. Thanks! |
@azizka weird, I have now activated the repo at https://travis-ci.org/ropensci/CoordinateCleaner (your badge points to travis-ci.com at the moment), let's see if that works. |
OK, thanks. I switch to travis.org; let's see if it works once the first run is done. I fixed the codecov badge. |
Alright, I think everything is working now. The latest version (2.0-2) is also on CRAN now. Is there anything else to do? |
No, sorry, closing the issue now! |
@azizka I see your manuscript is submitted to MEE. Wishing you a smooth process |
@azizka, Scott Chamberlain who is the rOpenSci point person for the MEE collaboration told me that the coordinatecleaner paper will be published on 21 Feb - and MEE asked if we can coordinate blog post timing. This would be great to bring more attention to your work if we can coordinate. Do you think you could have a draft post submitted by Thurs Feb 14 or Fri 15th this week? That would give me time to review and you to address any feedback. (edited). Update - I understand that you're also working on a post for MEE. Our audiences are different, but we can discuss options for cross-posting if you don't want to write two posts. |
Summary
What does this package do? (explain in 50 words or less):
Identify problematic records in large databases of biological and palaeontological collections, to improve data quality for analyses in biogeography, ecology and conservation.
Paste the full DESCRIPTION file inside a code block below:
URL for the package (the development repository, not a stylized html page):
https://github.com/azizka/CoordinateCleaner
Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
Who is the target audience and what are scientific applications of this package?
Anybody using geographic coordinates from biological collections on a large scale, thus mostly researchers in biogeography, (maco-)ecology, evolutionary biology and conservation practitioners
Are there other R packages that accomplish the same thing? If so, how does
yours differ or meet our criteria for best-in-category?
Yes, scrubr, see this pre-submission enquiry: Pre-submission inquiry: CoordinateCleaner #199
If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Pre-submission inquiry: CoordinateCleaner #199, @sckott
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.The manuscript is already submitted to MEE. It had to be submitted before April 1st.
Detail
Does
R CMD check
(ordevtools::check()
) succeed? Paste and describe any errors or warnings:Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
Exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
Nope
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
SaraVarela, sckott
The text was updated successfully, but these errors were encountered: