40 historical shapefiles (#82)
* link to raw RUIAN for admin data / resolves #77

* download links for zip codes & KFME grid / relates #77

* pkgdown - prepare for future deployment

* cleaner silnice

* sidestep the pesky mbcsToSbcs failure

* that pesky unicode!

* a more colorful plot of kraje

* align ozymandias and dockerfile

* + přfuk confirmation

* internals of the world, unite!

* helper functions created

* update docs for new internals

* historie in principle working / relates #40

* resolve CI fails on R-devel

* yet another shot at a clean workflow run

* unicode gets no respect

* shorter title for clarity

* update docs of history()

* historie in tests / relates #40

* resolve vignette warning

* date for release 1.12.0 set to 2023-10-29
jlacko committed Oct 29, 2023
1 parent 836e1b3 commit a9a87c2
Showing 47 changed files with 524 additions and 114 deletions.
3 changes: 3 additions & 0 deletions .Rbuildignore
@@ -16,3 +16,6 @@ vignette\.Rmd\.orig
^LICENSE\.md$
^CONTRIBUTING\.md$
^CITATION\.cff$
^_pkgdown\.yml$
^docs$
^pkgdown$
1 change: 1 addition & 0 deletions .gitignore
@@ -11,3 +11,4 @@ inst/doc
/data-raw/eu_dem*
/joss-paper/*.pdf
*.log
docs
12 changes: 7 additions & 5 deletions DESCRIPTION
@@ -1,16 +1,18 @@
Package: RCzechia
Type: Package
Title: Spatial Objects of the Czech Republic
Version: 1.11.2.999999999
Date: 2023-XX-XX
Version: 1.12.0
Date: 2023-10-29
Authors@R: c(
person("Jindra", "Lacko", , "jindra.lacko@gmail.com", role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-0375-5156")),
comment = c(ORCID = "0000-0002-0375-5156")),
person("Nick", "Bearman", role = "rev",
comment = "Nick reviewed the package for JOSS, providing helpful comments leading to significant improvement of the package."))
comment = c(ORCID = "0000-0002-8396-4061",
"Nick reviewed the package for JOSS, providing helpful comments leading to significant improvement of the package."))
)
Maintainer: Jindra Lacko <jindra.lacko@gmail.com>
Description: Administrative regions and other spatial objects of the Czech Republic.
URL: https://github.com/jlacko/RCzechia
URL: https://rczechia.jla-data.net
BugReports: https://github.com/jlacko/RCzechia/issues
License: MIT + file LICENSE
Encoding: UTF-8
1 change: 1 addition & 0 deletions NAMESPACE
@@ -5,6 +5,7 @@ export(casti)
export(chr_uzemi)
export(geocode)
export(geomorfo)
export(historie)
export(kraje)
export(lesy)
export(obce_body)
10 changes: 8 additions & 2 deletions NEWS.md
@@ -1,6 +1,12 @@
## version 1.11.2 (2023-XX-XX)
## version 1.12.0 (2023-10-29)

- [!] introduced an option for persistent local caching via setting the `RCZECHIA_HOME` environment variable; this has to be set manually - either directly or via a set_home() function call

- added historie() function providing historical admin areas, together with census data

- updated documentation for the geomorfo function
- updated documentation for the geomorfo() function

- introduced pkgdown documentation on https://rczechia.jla-data.net

## version 1.11.1 (2023-03-04)

4 changes: 3 additions & 1 deletion R/KFME_grid.R
@@ -2,7 +2,9 @@
#'
#' Function returning grid covering the Czech Republic according to the Kartierung der Flora Mitteleuropas methodology.
#'
#' The function returns a {sf} data frame of grid cells. Depending on the value of parameter `resolution` either low resolution (26×42 cells - default) with labels in 4 digit format (e.g. Hrčava = 6479) or high resolution (104×168 cells) with labels in 4 digit + 1 letter format (e.g Hrčava = 6479c).
#' The function returns a `sf` data frame of grid cells. Depending on the value of parameter `resolution` the grid is either low resolution (26×42 cells - default) with labels in 4 digit format (e.g. Hrčava = 6479) or high resolution (104×168 cells) with labels in 4 digit + 1 letter format (e.g. Hrčava = 6479c).
#'
#' The raw dataset is available for download, for use outside of R, at <https://rczechia.jla-data.net/kfme_czechia.gpkg>.
#'
#' @param resolution Should the function return high or low resolution shapefile? Allowed values are "low" and "high". Default is "low".
#'
4 changes: 3 additions & 1 deletion R/casti.R
@@ -1,9 +1,11 @@
#' City Parts
#' City Districts
#'
#' Function taking no parameters and returning data frame of districts of Prague and other major cities as `sf` polygons.
#'
#' Due to package size constraints the data are stored externally (and a working internet connection is required to use the package).
#'
#' The dataset is based on RUIAN data by the Czech cadastral office. If necessary you can download the most up to date raw dataset in VFR format (a special case of XML which is understood by GDAL) on <https://vdp.cuzk.cz/vdp/ruian/vymennyformat> (in Czech only).
#'
#' The data is current to June 2021. Downloaded size is 1.5 MB.
#'
#'
40 changes: 0 additions & 40 deletions R/downloader.R

This file was deleted.

11 changes: 11 additions & 0 deletions R/geomorfo.R
@@ -23,6 +23,17 @@
#'
#' @source CENIA / INSPIRE, via Mgr. Vojtěch Blažek, Ph.D. <https://www.arcgis.com/home/item.html?id=25813686a8564b0bbcdc951a5573cfa4>
#'
#' @examples
#'
#' \donttest{
#' library(sf)
#'
#' soustavy <- RCzechia::geomorfo("subprovincie")
#'
#' plot(soustavy["kod"])
#'
#' }
#'
#' @export

geomorfo <- function(level) {
36 changes: 36 additions & 0 deletions R/helpers.R
@@ -0,0 +1,36 @@
#' Set the local cache directory
#'
#' The function sets the environment variable RCZECHIA_HOME to be used as a local cache for RCzechia remote files; if unset tempdir() is used instead, with persistence for current session only.
#'
#' Note that when set (it is unset by default) the remote files will be cached to local file system and persist between R sessions, for good or bad.
#'
#' Also note that you can set the value of RCZECHIA_HOME environment variable directly, either via a \code{Sys.setenv()} call or via your \code{.Renviron} file.
#'
#' @param path path to a writable local directory to be used as the cache
#'
#' @return TRUE for success and FALSE for failure; returned silently

set_home <- function(path) {

if(file.access(path, mode = 2) == 0) {
Sys.setenv("RCZECHIA_HOME" = path)
invisible(TRUE)
} else {
warning("'path' not found or not writeable; default will be used instead")
invisible(FALSE)
}

}

#' Unset the local cache directory
#'
#' The function unsets the environment variable RCZECHIA_HOME, meaning tempdir() will be used in future function calls, and no persistent data will be stored locally.
#'
#' @return TRUE, returned silently

unset_home <- function() {

Sys.unsetenv("RCZECHIA_HOME")
invisible(TRUE)

}
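
The cache directory resolution described in the roxygen comments above can be sketched in base R as follows. This is a minimal illustration, not the package code: `cache_dir()` and `demo_home` are hypothetical names introduced here, and the writability check is copied from `set_home()` in the diff.

```r
# Hypothetical helper: resolve the cache directory the same way .downloader() does,
# i.e. RCZECHIA_HOME if set, tempdir() otherwise.
cache_dir <- function() {
  Sys.getenv("RCZECHIA_HOME", unset = tempdir())
}

# Hypothetical writable directory, for illustration only.
demo_home <- file.path(tempdir(), "rczechia_cache_demo")
dir.create(demo_home, showWarnings = FALSE)

# Remember the previous state so the sketch leaves the session unchanged.
old <- Sys.getenv("RCZECHIA_HOME", unset = NA)

Sys.unsetenv("RCZECHIA_HOME")
unset_result <- cache_dir()   # falls back to tempdir()

# Same check as set_home(): file.access(mode = 2) returns 0 for a writable path.
if (file.access(demo_home, mode = 2) == 0) {
  Sys.setenv("RCZECHIA_HOME" = demo_home)
}
set_result <- cache_dir()     # now points to the demo directory

# Restore the previous state.
if (is.na(old)) Sys.unsetenv("RCZECHIA_HOME") else Sys.setenv("RCZECHIA_HOME" = old)
```

For persistence across sessions the same variable would be set in `.Renviron`, as the documentation above notes.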
67 changes: 67 additions & 0 deletions R/historie.R
@@ -0,0 +1,67 @@
#' Historical censuses of the Czech Republic
#'
#' Function returning historical admin areas of the Czech Republic, together with relevant census data as specified by parameter **era**.
#'
#' The census data structure is too complex to fully list here; most of the fields are self documenting (for Czech speakers) - and when in doubt please consult the original metadata at <https://cuni.maps.arcgis.com/home/item.html?id=c2f19cd1146747a9a8daf5b900e7747b>, or the original journal article at <https://doi.org/10.14712/23361980.2015.93>.
#'
#' Of notable interest is the 1930 census, which was the last before WWII - and thus the last one to include Czechoslovak citizens of German ethnicity.
#'
#' Due to package size constraints the data are stored externally (and a working internet connection is required to use the package).
#'
#'
#' @param era a historical era of interest.
#'
#' @return `sf` data frame with historical admin area names & census data + geometry; namely:
#'
#' \describe{
#' \item{okresy_1921}{soudní okresy + census 1921; 328 rows / 92 columns + geometry}
#' \item{okresy_1930}{soudní okresy + census 1930; 330 rows / 90 columns + geometry}
#' \item{okresy_1947}{politické okresy + census 1947; 162 rows / 16 columns + geometry}
#' \item{okresy_1950}{správní okresy + census 1950; 193 rows / 57 columns + geometry}
#' \item{okresy_1961}{správní okresy + census 1961; 76 rows / 105 columns + geometry}
#' \item{okresy_1970}{správní okresy + census 1970; 76 rows / 144 columns + geometry}
#' \item{okresy_1980}{správní okresy + census 1980; 76 rows / 148 columns + geometry}
#' \item{okresy_1991}{správní okresy + census 1991; 76 rows / 155 columns + geometry}
#' \item{okresy_2001}{správní okresy + census 2001; 77 rows / 174 columns + geometry}
#' \item{okresy_2011}{správní okresy + census 2011; 77 rows / 176 columns + geometry}
#' \item{kraje_1950}{kraje + census 1950; 13 rows / 55 columns + geometry}
#' \item{kraje_1961}{kraje + census 1961; 8 rows / 103 columns + geometry}
#' \item{kraje_1970}{kraje + census 1970; 8 rows / 144 columns + geometry}
#' \item{kraje_1980}{kraje + census 1980; 8 rows / 146 columns + geometry}
#' \item{kraje_1991}{kraje + census 1991; 8 rows / 153 columns + geometry}
#' \item{kraje_2001}{kraje + census 2001; 14 rows / 172 columns + geometry}
#' \item{kraje_2011}{kraje + census 2011; 14 rows / 174 columns + geometry}
#' }
#'
#' Credits:
#' 1) „Tento výstup vznikl v rámci řešení projektu číslo DF12P01OVV033 Zpřístupnění historických prostorových a statistických dat v prostředí GIS řešeného v rámci programu Aplikovaného výzkumu a vývoje národní a kulturní identity (NAKI), jehož poskytovatel je Ministerstvo kultury České republiky.“
#' 2) „JÍCHOVÁ, J., SOUKUP, M., NEMEŠKAL, J., OUŘEDNÍČEK, M., POSPÍŠILOVÁ, L., SVOBODA, P., ŠPAČKOVÁ, P. a kol. (2014): Geodatabáze historických statistických a prostorových dat Česka ze Sčítání lidu, domů a bytů 1921−2011. Urbánní a regionální laboratoř, Přírodovědecká fakulta Univerzity Karlovy v Praze, Praha.“
#'
#' @source Urbánní a regionální laboratoř (UrRlab) působící na katedře sociální geografie a regionálního rozvoje Přírodovědecké fakulty Univerzity Karlovy v Praze <https://www.historickygis.cz/>
#'
#' @examples
#'
#' \donttest{
#' library(sf)
#'
#' pre_war <- RCzechia::historie("okresy_1930")
#'
#' plot(pre_war[, 47], main = "Residents of German ethnicity")
#'
#' }
#'
#' @export

historie <- function(era) {

if(missing(era)) {
stop("historical era is an obligatory parameter!")
}

if (!is.element(era, c("okresy_1921", "okresy_1930", "okresy_1947", "okresy_1950", "okresy_1961", "okresy_1970", "okresy_1980", "okresy_1991", "okresy_2001", "okresy_2011", "kraje_1950", "kraje_1961", "kraje_1970", "kraje_1980", "kraje_1991", "kraje_2001", "kraje_2011"))) {
stop(paste(era, "is not a valid historical era!"))
}

result <- .downloader(paste0("history_", era, ".rds"))
result
}
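
The argument validation and file naming in `historie()` can be tried without a network connection using a small stand-in. This is a sketch only: `valid_eras` and `era_to_file()` are hypothetical names that do not exist in the package; the era labels, error messages and the `history_<era>.rds` naming are taken from the diff.

```r
# The seventeen eras accepted by historie(), built from the years listed in the docs.
valid_eras <- c(
  paste0("okresy_", c(1921, 1930, 1947, 1950, 1961, 1970, 1980, 1991, 2001, 2011)),
  paste0("kraje_", c(1950, 1961, 1970, 1980, 1991, 2001, 2011))
)

# Hypothetical helper: validate the era and return the remote file name
# that historie() would hand over to .downloader().
era_to_file <- function(era) {
  if (missing(era)) {
    stop("historical era is an obligatory parameter!")
  }
  if (!is.element(era, valid_eras)) {
    stop(paste(era, "is not a valid historical era!"))
  }
  paste0("history_", era, ".rds")
}

pre_war_file <- era_to_file("okresy_1930")

# An unknown era raises an error instead of attempting a download.
bad_era <- tryCatch(era_to_file("okresy_1900"), error = function(e) e)
```

Rejecting unknown eras before any download keeps a typo from turning into a confusing HTTP failure.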
95 changes: 95 additions & 0 deletions R/internals.R
@@ -0,0 +1,95 @@
#' Internal function - generic downloader, used to serve the rds files from S3
#'
#' The function utilizes environment variable RCZECHIA_MIRROR as a mirror location of remote files; to configure an alternative (possibly local) repository use `Sys.setenv("RCZECHIA_MIRROR" = "file:///someplace/local")`
#'
#' @param file file to be downloaded (or not...) from S3
#' @keywords internal

.downloader <- function(file) {
network <- as.logical(Sys.getenv("NETWORK_UP", unset = TRUE)) # dummy variable to allow testing of network
remote_path <- Sys.getenv("RCZECHIA_MIRROR", unset = "https://rczechia.jla-data.net/") # remote archive
local_dir <- Sys.getenv("RCZECHIA_HOME", unset = tempdir()) # local cache directory - or tempdir if unset

remote_file <- paste0(remote_path, file) # path to AWS S3
local_file <- file.path(local_dir, file) # local file - in tempdir, or local cache if set

if (file.exists(local_file) & network) {
message(paste("RCzechia: using dataset stored locally in", local_dir))
} else {
if (!.ok_to_proceed(remote_file) | !network) { # network is down
# message("No internet connection.")
return(NULL)
}

# proceed to download via curl
message("RCzechia: downloading remote dataset.")
curl::curl_download(url = remote_file,
destfile = local_file,
quiet = FALSE)
} # /if - local file exists

# everything except rasters
if(tools::file_ext(local_file) == "rds") local_df <- readRDS(local_file)

# rasters, and rasters only
if(tools::file_ext(local_file) == "tif") local_df <- terra::rast(local_file)

# serve the result back
local_df

} # /function

#' Internal function - tests availability of internet resources
#'
#' @param remote_file resource to be tested
#' @keywords internal

.ok_to_proceed <- function(remote_file) {

# local files are OK to proceed by definition
if (grepl("file:///", remote_file)) return(TRUE)

# remote files require testing
try_head <- function(x, ...) {
tryCatch(
httr::HEAD(url = x, httr::timeout(10), ...),
error = function(e) conditionMessage(e),
warning = function(w) conditionMessage(w)
)
}

is_response <- function(x) {
inherits(x, "response")
}

network <- as.logical(Sys.getenv("NETWORK_UP", unset = TRUE)) # dummy variable to allow testing of network

# First check internet connection
if (!curl::has_internet() | !network) {
message("No internet connection.")
return(FALSE)
}
# Then try for timeout problems
resp <- try_head(remote_file)
if (!is_response(resp)) {
message("Timeout reached; external data source likely broken.")
return(FALSE)
}
# Then stop if status >= 400
if (httr::http_error(resp)) {
message("Data source broken.")
return(FALSE)
}

# safe to proceed
TRUE
}
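
The `try_head()` / `is_response()` pair above relies on one idea: a failed request is turned into a plain character message, so a class check is enough to tell success from failure. A network-free sketch of that pattern follows; `try_request()`, `ok_request()` and `failed_request()` are hypothetical stand-ins, and no httr call is made.

```r
# Return the result on success, or the condition message (a character string) on failure.
try_request <- function(request) {
  tryCatch(
    request(),
    error = function(e) conditionMessage(e),
    warning = function(w) conditionMessage(w)
  )
}

# Hypothetical stand-ins for httr::HEAD(); neither touches the network.
ok_request <- function() structure(list(status_code = 200L), class = "response")
failed_request <- function() stop("Timeout was reached")

# inherits() stays a single TRUE/FALSE even for objects with several classes.
is_response <- function(x) inherits(x, "response")

ok <- try_request(ok_request)
failed <- try_request(failed_request)
```

Here `ok` keeps its "response" class, while `failed` is just the error text, which is what lets the caller print a friendly message and return `FALSE` instead of stopping.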

# check the environment variable & report back

.onAttach <- function(libname, pkgname) {

home <- Sys.getenv("RCZECHIA_HOME")

if(home != "") packageStartupMessage("Using local RCzechia cache at ", home, appendLF = TRUE)
}
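
The cache-then-download flow of `.downloader()` can be illustrated with a simplified sketch that ignores the `NETWORK_UP` switch and the availability check. Assumptions: `cached_read()` and `fake_fetch()` are hypothetical, and the fetch step writes a small local data frame instead of calling `curl::curl_download()`.

```r
# Hypothetical simplification: read from the cache, fetching the file first if it is missing.
cached_read <- function(file, cache_dir, fetch) {
  local_file <- file.path(cache_dir, file)
  if (!file.exists(local_file)) {
    fetch(local_file)   # stands in for the download step
  }
  readRDS(local_file)
}

# Stand-in for the download; counts how often it is called.
fetch_calls <- 0
fake_fetch <- function(destfile) {
  fetch_calls <<- fetch_calls + 1
  saveRDS(data.frame(id = 1:3), destfile)
}

demo_cache <- file.path(tempdir(), "cache_flow_demo")
dir.create(demo_cache, showWarnings = FALSE)
unlink(file.path(demo_cache, "demo.rds"))   # start from an empty cache

first <- cached_read("demo.rds", demo_cache, fake_fetch)    # triggers the fetch
second <- cached_read("demo.rds", demo_cache, fake_fetch)   # served from the cache
```

With `RCZECHIA_HOME` unset the cache lives in `tempdir()` and disappears with the session; with it set, the second call in a later session would also skip the download.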
17 changes: 15 additions & 2 deletions R/kraje.R
@@ -4,6 +4,8 @@
#'
#' Due to package size constraints the data are stored externally (and a working internet connection is required to use the package).
#'
#' The dataset is based on RUIAN data by the Czech cadastral office. If necessary you can download the most up to date raw dataset in VFR format (a special case of XML which is understood by GDAL) on <https://vdp.cuzk.cz/vdp/ruian/vymennyformat> (in Czech only).
#'
#' The data is current to June 2021 (i.e. changes introduced by act 51/2020 Sb. are reflected). Downloaded size of high resolution shapefile is <1 MB.
#'
#' @param resolution Should the function return high or low resolution shapefile? Allowed values are "high" (default) and "low". This parameter affects only the geometry column, all other fields remain the same.
@@ -21,8 +23,19 @@
#' @examples
#' library(sf)
#'
#' hranice <- kraje("low")
#' plot(hranice, col = "white", max.plot = 1)
#' colors <- rainbow(14) # legend colors
#'
#' hranice <- RCzechia::kraje("low")
#'
#' plot(hranice["KOD_CZNUTS3"],
#' col = colors,
#' main = "Czech Regions",
#' xlim = st_bbox(hranice)[c(1, 3)] * c(1, 1.1))
#'
#' legend("right",
#' hranice$KOD_CZNUTS3,
#' fill = colors,
#' bty = "n")
#'
#' @export

2 changes: 2 additions & 0 deletions R/obce_body.R
@@ -4,6 +4,8 @@
#'
#' Due to package size constraints the data are stored externally (and a working internet connection is required to use the package).
#'
#' The dataset is based on RUIAN data by the Czech cadastral office. If necessary you can download the most up to date raw dataset in VFR format (a special case of XML which is understood by GDAL) on <https://vdp.cuzk.cz/vdp/ruian/vymennyformat> (in Czech only).
#'
#' The data is current to June 2021 (i.e. changes introduced by act 51/2020 Sb. are reflected). Downloaded size is <1 MB.
#'
#' @return `sf` data frame with 6.258 rows of 14 variables + geometry
2 changes: 2 additions & 0 deletions R/obce_polygony.R
@@ -4,6 +4,8 @@
#'
#' Due to package size constraints the data are stored externally (and a working internet connection is required to use the package).
#'
#' The dataset is based on RUIAN data by the Czech cadastral office. If necessary you can download the most up to date raw dataset in VFR format (a special case of XML which is understood by GDAL) on <https://vdp.cuzk.cz/vdp/ruian/vymennyformat> (in Czech only).
#'
#' The data is current to June 2021 (i.e. changes introduced by act 51/2020 Sb. are reflected). Downloaded size is 13.3 MB (so use with caution, and patience).
#'
#' @return `sf` data frame with 6.258 rows of 14 variables + geometry