antanym #198
Comments
👋 @raymondben Sorry for the slow response on our end. We will respond with next steps shortly. |
Editor checks:
Editor comments
Thanks for your submission @raymondben !!
It is good practice to
✖ write unit tests for all functions, and all package code
in general. 95% of code lines are covered by test cases.
R/load.R:71:NA
R/load.R:72:NA
R/load.R:73:NA
R/load.R:74:NA
R/load.R:97:NA
... and 6 more lines
✖ avoid long code lines, it is bad for readability. Also,
many people prefer editor windows that are about 80 characters
wide. Try make your lines shorter than 80 characters
R/load.R:59:1
R/load.R:64:1
R/load.R:67:1
R/load.R:71:1
R/load.R:79:1
... and 71 more lines
Seeking reviewers now 🕐
Reviewers: @johnbaums @lbusett |
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
The package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 5
Review Comments
The The package is generally already in good shape. Documentation is good (though the
Comments on Functions
an_read()
would it be possible to have the user download a "zipped" version of the dataset?
an_filter()
an_preferred()
minor Also, I'd suggest lengthening a bit the Description in the documentation to detail how
an_countries(), an_feature_types(), an_cga_sources(), an_gazetters()
minor The documentation for these functions "points" to the documentation of
an_url()
minor I'd suggest changing the function name to
an_suggest()
I really like this function in association with
an_thin()
Comments on Vignettes
Comments on Examples
Comments on Inline documentation
Inline documentation is good. I'd just suggest however to introduce carriage returns
Other Comments
Personally, I think that having at least all "major" functions of a package in separate .R
That's all. Thanks for sharing this interesting package (and thanks to rOpensci for |
thanks for your review @lbusett ! @johnbaums - can you get your review in soon? |
Thanks @lbusett - that's very helpful! |
@johnbaums - can you get your review in soon? |
@sckott @raymondben apologies for the slow turnaround and missed notifications. Aiming to submit tonight, tomorrow at the latest. |
The download url for the gaz data is currently dead. Any backup url I can use to grab the data, @raymondben? Just had a couple more things I wanted to look at. Not a big deal if not.
|
Hi @johnbaums - sorry about that, AAD had a complete IT outage last night and looks like our geoserver is a lingering casualty. Have asked for it to be checked, will let you know when it's back on its feet. |
@johnbaums - IT'S ALIIIIIVE! |
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
Documentation
The package includes all the following forms of documentation:
Functionality
Final approval (post-review)
Estimated hours spent reviewing: 6
Review Comments
Thanks for the opportunity to review I appreciate the effort the authors have made in ensuring the code is clear, concise, readable, and well documented. The package is well structured, with individual functions that perform clear, separate tasks, and internal helpers that prevent clutter and repetition. Data caching is a great convenience - the authors raise an important point about the relevance of this for Antarctic field workers. The package is very sound from a technical perspective, and I could not find any glaring omissions regarding functionality. I provide a few minor comments below, primarily relating to the docs.
README.Rmd
Antanym.Rmd
DESCRIPTION
utils.R
an_preferred()
an_thin()
an_suggest()
## scale >= 10e6: we have full coverage (nearly so for 12mill) of all
## scar_common_ids, so use per-feature predictions for scale < 10e6: use
## predictions by feature properties (except maybe if area of interest lies
## within a catalogued map) stations as special case?
an_filter()
an_read()
General issues
Vignette
Thanks again for the opportunity to review what I think is a valuable and neatly coded contribution. I'm looking forward to seeing it on CRAN! |
thanks for your review @johnbaums ! @raymondben continue discussion here ... |
Thanks @johnbaums, thanks again @lbusett. Will get on to revisions in due course. |
any progress on changes @raymondben ? |
Ah, sorry, this had slipped off our attention radar. Will get back to it soon. |
Thanks for your patience, and thanks again for your reviews. While we think it's reasonable to say that the package was in pretty good shape on submission, it is without question better now - your input has definitely improved it. Short notes against each of your review points are below. Please let us know if further details are needed on any of these. The revised code is currently sitting in the ropenscirev branch of SCAR/antanym.
Lorenzo
Fixed.
The caching interface has been simplified: calling
The geoserver installation from which the data are delivered doesn't support compressed CSVs, but all traffic goes through an additional reverse proxy that supports compression of the http traffic. As far as we can tell (looking at the returned server headers and the request size using wireshark) this is indeed being applied to our requests. We suspect that the erratic download times that you are seeing are due to the server being located at the bottom end of the world in Tasmania, which is not ideal for northern hemisphere users. We'll ask the group that hosts the database to look at options, perhaps it might be possible for the data to be mirrored elsewhere (e.g. a SCAR institution in Europe) - but that's beyond the remit of the R package itself.
Added.
Done. In fact the extent can now be passed as an sp object, its bbox, a raster object, an Extent, or a numeric vector.
We've changed the query behaviour:
And while we don't want to attempt to provide a regexp tutorial, some regexp examples have also been added.
We agree that it's useful, but also on reflection it's probably also good to be able to turn this off and search for the full phrase. Rather than force the user to delve into regular expressions to achieve this,
Done, see above.
The original behaviour (that you reviewed) was to choose unmatched names by row order (i.e. pick the first row with the feature in question), and yes that was a bit opaque! There are now two options, controlled by the new
Use a filter after the preferencing:
They are basically equivalent, except that entries from the GEBCO undersea feature gazetteer do not have an associated country_name. Also, the cga_source_gazetteer entries are ISO 3-letter country codes rather than full names as in country_name. To simplify things for the user, we have added a new "origin" column to the data, which is either the full country name, or the organisation name for non-national (i.e. GEBCO) names. This will also help us in the future when additional gazetteers (e.g. subantarctic ones) are added, because those won't have cga_source_gazetteer values (they are not part of the CGA). We can add an appropriate "origin" value for those rows. The "country_name" and "cga_source_gazetteer" columns are also now NOT included in the data by default (but the user can get them with
Added.
Done.
Description expanded: "Features are given a suitability score based on maps prepared by expert cartographers. Data were tabulated from a collection of such maps, indicating for each feature whether it was named on a given map, along with details (such as scale) of the map. These data are used as the basis of a recommendation algorithm, which suggests the best features to name on a map given its properties (extent and scale). This is an experimental function and currently only implemented for map_scale values of 10 million or larger."
Ah yes, the function doc wasn't correct there. Fixed.
Updated as you suggest.
Done.
Huh, interesting. We can't reproduce a memory leak/crash (win32/win64/linux64). We just get an error with "cannot allocate memory" message. Could you please send the output of Nevertheless, to avoid the memory allocation issue (or crash) we've added a
Actually the thinning will work even if the
Scores given by
The README has been cut down to give the intro/context, a few examples, and highlight the particular functionality of antanym (suggesting names, resolving multiple names). The vignette has been expanded.
The vignette has now been broken into a number of sections, generally giving increasingly-complex examples in each.
Done.
Done.
The vignette now mentions the
All examples now use
Added.
Style is to some extent personal preference. Personally (BR) I'm not a particular fan of enforced maximum line lengths in general. I find that typically it's wasteful of screen real estate, and a decent text editor can be configured to soft wrap if that's what the user wants. The long lines have been largely left as is.
Mostly done.
John
Done.
Done. And did a recursive grep to make sure there weren't others.
Actually we've just removed "SCAR" here, it's probably not helpful to have that level of detail in the Description field. In fact "SCAR" has mostly been removed throughout: it's enough to refer to the "Composite Gazetteer of Antarctica" without prepending it by "SCAR" each time.
Got it. And another one elsewhere too!
See response to the same question above. Basically, now random by default.
Changed "in_coi" to "preferred_gaz_rows" and "out_coi" to "not_preferred_gaz_rows"
Yep, done, and similarly in
Done. And yes, a score_weighting of 0 achieves the same as score_col=NULL (or all scores equal). Tests added to confirm this.
We tried great-circle distances here (using geosphere's implementation) but we are calculating N^2 distances and great-circle becomes prohibitively slow, even for modestly-sized data frames. You are right about the downside, it will tend to over-select names at high latitudes. However, the bias doesn't seem too bad (perhaps because there are many more names at lower latitudes (coastal Antarctica) than high latitudes?). But we'll look at improvements as time permits. The new hypertidy/geodist package might offer a way forward. (Minor update: a bit of experimentation suggests that hypertidy/geodist is fast enough to use here. But we'll wait until it's had some time to stabilize before incorporating it - it's still WIP).
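The trade-off described above can be illustrated with a small, self-contained sketch. It is not antanym's code: it assumes, purely for illustration, that the cheaper alternative to great-circle distance is a plain Euclidean distance on longitude/latitude degrees (the thread does not say exactly what is used). The function names `haversine_km`, `planar_degrees` and `pairwise` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_degrees(lon1, lat1, lon2, lat2):
    """Naive Euclidean distance in degrees, treating lon/lat as flat x/y (assumed alternative)."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def pairwise(points, dist):
    """All N*(N-1)/2 pairwise distances; this is the quadratic cost discussed above."""
    return [dist(*points[i], *points[j])
            for i in range(len(points))
            for j in range(i + 1, len(points))]


# Two pairs of points, each separated by 10 degrees of longitude
coastal = [(60.0, -66.0), (70.0, -66.0)]  # near the Antarctic coast
inland = [(60.0, -85.0), (70.0, -85.0)]   # close to the pole

for label, pts in (("66S", coastal), ("85S", inland)):
    print(label,
          round(pairwise(pts, planar_degrees)[0], 1), "deg planar,",
          round(pairwise(pts, haversine_km)[0]), "km great-circle")
```

Under that assumption, the planar measure reports the same separation for both pairs, while the true separation shrinks towards the pole. A thinning step that relies on the planar measure would therefore treat high-latitude names as further apart than they are, and keep more of them.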
Done.
Comment broken up into the two if-else parts, and wording clarified.
The
The
Done. Well actually done slightly differently, because
Wording changed (note also that the caching parameter has changed, see above, but your suggestion still applies when the user provides a specific caching directory - e.g.
Whoops, SCAR's site got updated. Fixed.
Now superseded by
Done - though the first argument (usually the gazetteer object) hasn't been named if it's also the first argument in the function parameter list, since this seems to be idiomatic R usage.
All functions now have their own man page. Vignette expanded, as noted above, including a description of the columns in the gaz data (these are also documented in the |
@lbusett @johnbaums are you happy with revisions/comments from the maintainer? feel free to make any suggestions/comments here if you have anything further. if nothing, that's okay too, either way let me know |
(Comments particularly welcome on the revised |
Hi all, I am a bit busy at the moment, but should be able to have a look at the implemented changes by the beginning of next week. |
sorry for being late on this, but I had very limited time in the last two weeks. I had a look at the changes: great work! All points raised in my review were nicely addressed. I particularly like the changes introduced in the
Lorenzo |
thanks @lbusett ! @johnbaums are you happy with changes/responses? |
Thanks for looping me in on the changes, and sorry for my abysmally slow handling of GitHub notifications. I'm very satisfied - great package! |
Approved! Thanks again for your submission @raymondben
|
Super, thanks @lbusett and @johnbaums once again for your time and attention to detail. |
@sckott repo transferred, will finish changing links and so on once I'm admin |
okay, try again |
All done, I think? |
See comment above about a blog post. thoughts on that? |
Yup, is a good idea. We're thinking of a more comprehensive post (this package plus others relevant to Antarctic stuff), but will be aiming to do something, anyway. We'll chat with @stefaniebutland in due course. |
Okay, thanks! |
Summary
Provides access to Antarctic geographic place name information, and tools for working with those names.
https://github.com/SCAR/antanym
Data retrieval (it retrieves the SCAR Composite Gazetteer data from its host server), and geospatial data (geographic place names are obviously geospatial in nature).
Antarctic researchers, particularly those wanting to produce maps or figures of Antarctic regions. Such figures often need spatial features to be labelled (ice shelves, glaciers, stations, etc). The gazetteer data can also be useful in a quantitative sense, for example for calculating distance to the nearest station (which could perhaps be used as a proxy for human disturbance in an ecological modelling scenario).
yours differ or meet our criteria for best-in-category?
The geonames package provides access to the global geonames.org database of place names, via their API. The geonames.org database includes place name information from the SCAR Composite Gazetteer of Antarctica (CGA). However, there seems to be a good case for a separate package that deals specifically with Antarctic place names:
the SCAR CGA is the authoritative source for Antarctic place name information, and we (SCAR) are the custodians of it. geonames.org has ingested CGA information into their global database and is thus a secondary source of this information. At the time of writing the geonames.org copy of the CGA is out of date (e.g. it does not contain Lassesen Island, Ginger Reef, or Pavlova Island, all added in 2017).
antanym provides more information about features than does geonames, including the narrative (description of how a feature came to be given its name and who or what it was named after), the date on which it was named, information about the source of the name, and more.
geonames requires a login because it consumes the geonames.org API; antanym does not.
much of the geonames package functionality is irrelevant in an Antarctic context (e.g. GNcities(), GNcountry*(), and GNfindNearbyPostalCodes() - there are no cities, countries, or post codes in Antarctica).
conversely, antanym has functionality specific to the CGA that geonames does not. In particular, the CGA is a composite gazetteer, which means that a single feature can have multiple names (given by different naming authorities from different countries). Antanym has functions to help resolve these, and to discover all of the names associated with a feature. Using the geonames package, it does not appear to be directly possible to find all of the names associated with a given feature. (As an aside, the geonames.org database must hold this multiple-names information in some form, because using the geonames.org get API method (http://api.geonames.org/get?geonameId=6627943&username=demo) shows the alternate names for Booth Island. However, unless I've missed it, the geonames package doesn't wrap the get API method.) Antanym also has functions for suggesting names to add to a map and thinning a list of names to get a more visually pleasing spatial coverage for plotting. Because of the differences in functionality, we have not made any particular attempt to align antanym's interface with that of the geonames package. Antanym's interface is instead loosely modelled on a dplyr-style approach to filtering and subsetting.
antanym deals only with the CGA, so it's small enough to cache locally and use offline, whereas geonames is a set of wrappers around the geonames.org API and can't be used offline (though you could memoise particular calls and cache those, but then you couldn't issue new queries while offline). This might sound like a trivial issue but is a genuine consideration for Antarctic workers with limited internet access in the field, on station, or at sea.
as time permits, we will add other Antarctic-related gazetteers to antanym, which are not available through geonames. These include subantarctic gazetteers and informal gazetteers managed by SCAR and the Australian Antarctic Data Centre.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
Detail
Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
Clean bill of health!
Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Barry Rowlingson (github barryrowlingson) - author of the geonames package and all around spatial wizard.