Merge pull request #283 from rOpenGov/v4
eurostat 4.0.0
antagomir committed Dec 19, 2023
2 parents dc987e7 + 31869b7 commit 6425c8b
Showing 280 changed files with 6,597 additions and 21,002 deletions.
1 change: 0 additions & 1 deletion .Rbuildignore
@@ -27,7 +27,6 @@ docs
^\.Rproj\.user$
^packrat/
^\.Rprofile$
^NEWS.md$
^cran-comments\.md$
^data-raw$
^revdep$
12 changes: 12 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,12 @@
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://docs.github.com/github/administering-a-repository/configuration-options-for-dependency-updates

version: 2
updates:
- package-ecosystem: "github-actions" # See documentation for possible values
directory: "/" # Location of package manifests
schedule:
interval: "weekly"

10 changes: 3 additions & 7 deletions .github/workflows/rogtemplate-gh-pages.yaml
@@ -19,7 +19,7 @@ jobs:
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3

- uses: r-lib/actions/setup-pandoc@v2

@@ -34,11 +34,7 @@ jobs:
local::.
any::pkgdown
ropengov/rogtemplate
any::magick
any::tmap
any::Cairo
any::rcmdcheck
any::XML

- name: Build logo if not present and prepare template
run: |
@@ -57,6 +53,6 @@ jobs:

- name: Deploy package
run: |
git config --local user.name "$GITHUB_ACTOR"
git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com"
git config --local user.name "github-actions[bot]"
git config --local user.email "41898282+github-actions[bot]@users.noreply.github.com"
Rscript -e 'pkgdown::deploy_to_branch(new_process = FALSE)'
2 changes: 1 addition & 1 deletion .github/workflows/test-coverage.yaml
@@ -16,7 +16,7 @@ jobs:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3

- uses: r-lib/actions/setup-r@v2
with:
35 changes: 17 additions & 18 deletions .github/workflows/tidy_code.yaml
@@ -16,23 +16,24 @@ jobs:
env:
GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3

- uses: r-lib/actions/setup-pandoc@master
- uses: r-lib/actions/setup-pandoc@v2

- uses: r-lib/actions/setup-r@master
- uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true

- uses: r-lib/actions/setup-r-dependencies@master
- uses: r-lib/actions/setup-r-dependencies@v2
with:
extra-packages: |
usethis
styler
urlchecker
devtools
roxygen2
local::.
any::usethis
any::styler
any::urlchecker
any::devtools
any::roxygen2
ropengov/rogtemplate
needs: website
- name: Tidy code
run: |
@@ -48,17 +49,15 @@ jobs:

- name: Commit results
run: |
git config --local user.name "$GITHUB_ACTOR"
git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com"
git config --local user.name "github-actions[bot]"
git config --local user.email "41898282+github-actions[bot]@users.noreply.github.com"
git add -A
git commit -m 'Tidy code' || echo "No changes to commit"
git push origin || echo "No changes to commit"
- name: Trigger pkgdown workflow
if: success()
uses: peter-evans/repository-dispatch@v1
with:
token: ${{ secrets.REPO_GHA_PAT }}
repository: ${{ github.repository }}
event-type: trigger-pkgdown-workflow
client-payload: '{"ref": "${{ github.ref }}", "sha": "${{ github.sha }}"}'
run: |
git config --local user.name "github-actions[bot]"
git config --local user.email "41898282+github-actions[bot]@users.noreply.github.com"
Rscript -e 'pkgdown::deploy_to_branch(new_process = FALSE)'
24 changes: 13 additions & 11 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Type: Package
Package: eurostat
Title: Tools for Eurostat Open Data
Version: 3.8.3
Date: 2023-03-07
Version: 4.0.0
Date: 2023-12-19
Authors@R: c(
person("Leo", "Lahti", , "leo.lahti@iki.fi", role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-5537-637X")),
@@ -31,36 +31,38 @@ URL: https://ropengov.github.io/eurostat/,
https://github.com/rOpenGov/eurostat
BugReports: https://github.com/rOpenGov/eurostat/issues
Depends:
methods,
R (>= 3.5.0)
R (>= 3.6.0)
Imports:
broom,
classInt,
countrycode,
curl,
digest,
dplyr,
httr,
httr2 (>= 0.2.3),
ISOweek,
jsonlite,
lubridate,
rappdirs,
readr,
RefManageR,
regions,
rlang,
stringi,
stringr,
tibble,
tidyr (>= 1.0.0),
ISOweek
xml2,
data.table (>= 1.14.8)
Suggests:
RColorBrewer,
giscoR,
knitr,
rmarkdown,
sf,
sp,
testthat (>= 3.0.0),
remotes
testthat (>= 3.0.0)
VignetteBuilder:
knitr
Config/Needs/website: ggplot2, tmap, styler, sessioninfo,
ropengov/rogtemplate, ragg
Config/testthat/edition: 3
Config/testthat/parallel: false
Encoding: UTF-8
51 changes: 37 additions & 14 deletions NAMESPACE
@@ -6,22 +6,22 @@ export(clean_eurostat_cache)
export(cut_to_classes)
export(dic_order)
export(eurotime2date)
export(eurotime2date2)
export(eurotime2num)
export(eurotime2num2)
export(get_bibentry)
export(get_eurostat)
export(get_eurostat_dic)
export(get_eurostat_folder)
export(get_eurostat_geospatial)
export(get_eurostat_interactive)
export(get_eurostat_json)
export(get_eurostat_toc)
export(grepEurostatTOC)
export(harmonize_country_code)
export(harmonize_geo_code)
export(label_eurostat)
export(label_eurostat2)
export(label_eurostat_tables)
export(label_eurostat_vars)
export(list_eurostat_cache_items)
export(recode_nuts)
export(recode_to_nuts_2013)
export(recode_to_nuts_2016)
@@ -32,42 +32,65 @@ export(validate_nuts_regions)
importFrom(ISOweek,ISOweek2date)
importFrom(RefManageR,BibEntry)
importFrom(RefManageR,toBiblatex)
importFrom(broom,tidy)
importFrom(classInt,classIntervals)
importFrom(countrycode,countrycode)
importFrom(curl,curl_download)
importFrom(data.table,":=")
importFrom(data.table,.SD)
importFrom(data.table,fread)
importFrom(data.table,melt)
importFrom(data.table,setDT)
importFrom(digest,digest)
importFrom(dplyr,"%>%")
importFrom(dplyr,case_when)
importFrom(dplyr,coalesce)
importFrom(dplyr,filter)
importFrom(dplyr,left_join)
importFrom(dplyr,inner_join)
importFrom(dplyr,mutate)
importFrom(httr,RETRY)
importFrom(httr,build_url)
importFrom(httr,content)
importFrom(httr,http_error)
importFrom(httr,parse_url)
importFrom(httr,status_code)
importFrom(httr2,"%>%")
importFrom(httr2,req_error)
importFrom(httr2,req_perform)
importFrom(httr2,req_proxy)
importFrom(httr2,req_retry)
importFrom(httr2,req_user_agent)
importFrom(httr2,request)
importFrom(httr2,resp_body_json)
importFrom(httr2,resp_content_type)
importFrom(httr2,resp_is_error)
importFrom(httr2,url_build)
importFrom(httr2,url_parse)
importFrom(jsonlite,fromJSON)
importFrom(jsonlite,toJSON)
importFrom(lubridate,day)
importFrom(lubridate,dmy)
importFrom(lubridate,month)
importFrom(lubridate,year)
importFrom(lubridate,ymd)
importFrom(methods,as)
importFrom(readr,col_character)
importFrom(readr,cols)
importFrom(readr,read_tsv)
importFrom(regions,recode_nuts)
importFrom(regions,validate_geo_code)
importFrom(regions,validate_nuts_regions)
importFrom(rlang,"!!")
importFrom(rlang,sym)
importFrom(stats,na.omit)
importFrom(stringi,stri_extract_first_regex)
importFrom(stringi,stri_replace_all_fixed)
importFrom(stringi,stri_replace_all_regex)
importFrom(stringr,str_extract)
importFrom(stringr,str_glue)
importFrom(stringr,str_replace_all)
importFrom(tibble,as_tibble)
importFrom(tibble,is_tibble)
importFrom(tidyr,gather)
importFrom(tidyr,pivot_longer)
importFrom(tidyr,separate)
importFrom(utils,data)
importFrom(utils,capture.output)
importFrom(utils,download.file)
importFrom(utils,hasName)
importFrom(utils,menu)
importFrom(utils,person)
importFrom(utils,toBibtex)
importFrom(xml2,read_xml)
importFrom(xml2,xml_find_all)
importFrom(xml2,xml_text)
58 changes: 57 additions & 1 deletion NEWS.md
@@ -1,3 +1,59 @@
# eurostat 4.0.0

## Major updates

* Add data.table to package Imports and make the use of data.table functions optional via the `use.data.table` argument of `get_eurostat()`. This is especially useful with big datasets that would otherwise take a long time to pass through the data cleaning functions or crash R with their large memory footprint. (issue #277, PR #278)
* Switch from the `httr` package to `httr2` (issue #273, PR #276)
* Rewritten caching functionality, making it possible to cache filtered queries and to rely on local caches if the user attempts to filter a complete dataset that has already been cached. A list of queries and cached item hashes is stored in a `cache_list.json` file in the cache folder. This list can be viewed with a new function: `list_eurostat_cache_items()`. (Affects issues mentioned in #144, #257, #258, fixed in PR #267)
* Column names in the `.eurostatTOC` object (returned by `get_eurostat_toc()`) now use dots instead of spaces in the style of `base::make.names()`, e.g. `last update of data` becomes `last.update.of.data` (PR #271)
* `.eurostatTOC` object includes a new hierarchy column that represents the position of each folder, dataset and table in the folder structure.
* `search_eurostat()` can now search Table of Contents items by dataset code in addition to title. This makes it possible to find related datasets (e.g. "nama_10_gdp", "nama_10r_2gdp", "nama_10r_3popgdp") that may have different titles.
* `label_eurostat_tables()` has been rewritten to use the new SDMX API instead of `table_dic.dic` file in Eurostat Bulk Download Listing (PR #271)
* Remove legacy code related to downloading data from old bulk download facilities and temporary functions added in package version 3.7.14.
* `get_eurostat_geospatial()` now relies on `giscoR::gisco_get_nuts()` for downloading geospatial data (PR #264, thanks to @dieghernan):
  * The `"spdf"` output class is soft-deprecated; requesting it returns an `sf` object with a message.
  * The `make_valid` parameter is soft-deprecated.
  * Added `...` to the function so additional parameters can be passed to `giscoR::gisco_get_nuts()`.
  * Dataset `eurostat_geodata_60_2016` updated.
* `get_eurostat_geospatial()` now requires the sf package (PR #280, thanks to @dieghernan)
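
The TOC column renaming follows `base::make.names()`, which turns characters that are not valid in syntactic R names (such as spaces) into dots. A minimal illustration; only `last update of data` is quoted in the note above, and the other column names are assumed for the example:

```r
# Old-style TOC column names containing spaces (illustrative; only
# "last update of data" is taken from the release note)
old_names <- c("title", "code", "last update of data", "data start")

# base::make.names() replaces each space with a dot
new_names <- make.names(old_names)
new_names
# returns c("title", "code", "last.update.of.data", "data.start")
```

Code that referenced the old names, e.g. `toc[["last update of data"]]`, would need to switch to the dotted form.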

## Minor updates

* Added `suppressWarnings()` to some tests that use TOCs directly or indirectly, as those tests are not directly related to TOC files.
* Use more parameter inheritance in package function documentation to reduce discrepancies between functions (DRY principle) (PR #270)
* Documentation more explicitly explains how to use filter parameters in the `get_eurostat()` and `get_eurostat_json()` functions. It now warns users about potential problems caused by `time` / `TIME_PERIOD` parameters when querying datasets that contain quarterly data (issue #260)
* Continuing the update made in 3.7.14, `get_eurostat_dic()` and `label_eurostat()` now also use the new URL for dictionary files.
* `get_bibentry()` now outputs "Accessed YYYY-MM-DD" and "dataset last updated YYYY-MM-DD" in the note field, because the `urldate` field was printed only sporadically or not at all.
* Print more informative API error messages. (issue #261, PR #262, thanks to @ake123)
* Removed `sp`, `methods` and `broom` packages from dependencies.
* Added `giscoR` to Suggests. (PR #264)

## New features

* Added new function: `get_eurostat_interactive()` for interactively searching and downloading data from Eurostat SDMX API. The function aims to make good data citation practices more prominently visible and also make it easier to explore what different arguments in `get_eurostat()` function do.
* There is also a new internal function `eurostat:::fixity_checksum()` to easily calculate a fixity checksum for datasets downloaded from Eurostat. The fixity checksum can, for example, be saved in research notes and reported as part of data appendices. Printing the fixity checksum is encouraged by an option to print it in every `get_eurostat_interactive()` query.
* Added a new internal function `clean_eurostat_toc()` for easy removal of TOC objects from the .EurostatEnv environment. (PR #278)
* New internal function `check_lang()` (PR #270)
* `get_eurostat()` now explicitly accepts a `lang` argument, which is passed on to `get_eurostat_json()` and `label_eurostat()` (PR #270)
* New user-facing function: `get_eurostat_folder()` for downloading all datasets in a folder. The function is limited to folders that contain at most 20 datasets. It relies on new internal helper functions: `toc_count_whitespace()`, `toc_determine_hierarchy()`, `toc_count_children()` and `toc_list_children()`. (PR #270)
* EXPERIMENTAL: `get_eurostat_toc()` and `set_eurostat_toc()` now have experimental features that support downloading TOCs in French and German as well. This support, in turn, is leveraged in `get_bibentry()` which now has a language parameter: `lang` (PR #270)
* Related to the updates to `get_eurostat_toc()`, `search_eurostat()` now supports searching French and German TOC files as well (PR #270)
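
A fixity checksum is a hash of the data that changes whenever any value changes, so a reader can check that their copy matches the one reported in a paper. The sketch below illustrates the idea only. It assumes an MD5 hash over a CSV serialization, and `fixity_checksum_sketch()` is a hypothetical helper: the internal `eurostat:::fixity_checksum()` may use a different algorithm and serialization.

```r
# Hypothetical helper, not the package's implementation: MD5 over a CSV
# serialization of a data frame (assumed algorithm and format)
fixity_checksum_sketch <- function(df) {
  # Write the data to a temporary CSV so the hash covers the values as text
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path))  # remove the temporary file when the function exits
  utils::write.csv(df, path, row.names = FALSE)
  # tools::md5sum() returns a vector named by file path; drop the name
  unname(tools::md5sum(path))
}

dat <- data.frame(geo = c("FI", "SE"), values = c(1.5, 2.0))
a <- fixity_checksum_sketch(dat)
b <- fixity_checksum_sketch(dat)
dat$values[2] <- 2.1
d <- fixity_checksum_sketch(dat)

identical(a, b)  # TRUE: the same data gives the same checksum
identical(a, d)  # FALSE: changing one value changes the checksum
```

Hashing a text serialization rather than R's in-memory object keeps the checksum independent of R's internal representation, at the cost of depending on the CSV formatting.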

## Deprecated and defunct

* `grepEurostatTOC()` is now defunct and en route to removal from the package, as `search_eurostat()` is the only way to fetch Eurostat TOC items and search (grep) them (PR #270)
* During the development of the 4.0.0 version there was a temporary function called `label_eurostat_vars2` that has been removed in the final version, as promised earlier: "The old function will be completely removed after October 2023 when Eurostat Bulk Download Listing website is retired and `label_eurostat_vars2` will be renamed to `label_eurostat_vars()`". The new `label_eurostat_vars()` function uses the new SDMX API to retrieve names for dataset columns. Function evolution is subject to ongoing Eurostat API developments. (PR #270)

## Bug fixes

* Added a more informative warning message for cases where TOC datasets downloaded from Eurostat might lack proper titles. For some reason this was limited to the German and French TOCs, while the English TOC had proper titles for all items. (PR #278)
* `get_bibentry()` returns correct codes for titles and warns the user if some / all of the requested codes were not found in the TOC (PR #270)
* `get_bibentry()` uses the date field with the internal BibEntry format that can be easily translated to other formats: bibtex, bibentry (PR #270)
* `get_bibentry()` now outputs dataset codes in titles correctly so that `bibtex` and `biblatex` entries can be copy-pasted into bibliographies without manually adding escape characters (PR #270)
* Fix issue related to downloading quarterly data (issue #260, PR #271)
* Reduce RAM usage in `eurotime2date()` when handling big datasets containing weekly data and tens of millions of rows (dataset used for testing mentioned in issue #200).
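
Quarterly time labels need to be mapped to calendar dates before they can be compared with ordinary dates. A minimal sketch, assuming labels of the form `YYYY-Qn` (e.g. `2023-Q1`). `quarter_to_date()` is a hypothetical helper for illustration and is not how `eurotime2date()` is implemented:

```r
# Hypothetical helper (not eurotime2date()): map "YYYY-Qn" labels to the
# first day of the quarter. Assumes every label follows that exact format.
quarter_to_date <- function(x) {
  stopifnot(all(grepl("^[0-9]{4}-Q[1-4]$", x)))
  year <- as.integer(substr(x, 1, 4))
  quarter <- as.integer(substr(x, 7, 7))
  # Quarter n starts in month 3 * (n - 1) + 1
  month <- 3L * (quarter - 1L) + 1L
  as.Date(sprintf("%04d-%02d-01", year, month))
}

quarter_to_date(c("2023-Q1", "2023-Q4"))
# returns as.Date(c("2023-01-01", "2023-10-01"))
```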

# eurostat 3.8.3 (2023-03-07)

## Bug fixes
@@ -220,4 +276,4 @@

# eurostat 0.9.1 (2014-04-24)

* Package collected from statfi and smarterpoland
* Package collected from statfi and smarterpoland