| title | author | date |
|---|---|---|
| sorvi tutorial | rOpenGov core team | 2017-04-19 |
Finnish open government data toolkit for R
This R package provides miscellaneous tools for Finnish open government data. Your contributions, bug reports and other feedback are welcome!
Available data sources and tools
Installation (Asennus)
Finnish provinces (Maakuntatason informaatio)
- Basic province information (Area, Population, Population Density)
- Finnish-English province name translations
Finnish municipalities (Kuntatason informaatio)
- Land Survey Finland (Maanmittauslaitos / MML)
- Municipality-Postal code conversions (Kunnat vs. postinumerot)
- Municipality name-ID conversions (Kunnat vs. kuntakoodit)
- Municipality-province conversions (Kunnat vs. maakunnat)
- Generic synonym converter (Synonyymit)
Finnish personal identification number (HETU) (Henkilotunnuksen kasittely)
See also other rOpenGov packages, in particular:
- gisfin: Visualization of Finnish geographic information
- helsinki: Helsinki open data tools
- sotkanet: THL Sotkanet database on health and demography
- pxweb: PX-Web interface to access data from Statistics Finland and other PX-Web compliant sources
- finpar: Finnish parliament data
Installation
We assume you have installed R. If you use RStudio, change the default encoding to UTF-8. Linux users should also install CURL.
Install the stable release version in R:
install.packages("sorvi")
Development version for developers:
library(devtools)
install_github("ropengov/sorvi")
Test the installation by loading the library:
library(sorvi)
We recommend setting the UTF-8 encoding:
Sys.setlocale(locale="UTF-8")
## [1] ""
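The empty string in the output above is what `Sys.setlocale()` returns when the requested locale cannot be set, so the call as shown probably left the locale unchanged on that machine. A minimal sketch of a more defensive setup follows. The locale name `en_US.UTF-8` is an assumption; valid names are platform-dependent.

```r
# Try to switch the character-type locale. "en_US.UTF-8" is an assumed,
# platform-dependent locale name; adjust it for your system.
res <- suppressWarnings(Sys.setlocale("LC_CTYPE", "en_US.UTF-8"))

if (identical(res, "")) {
  # The request was not honoured; report the locale that is still in use
  message("Locale not changed; still using: ", Sys.getlocale("LC_CTYPE"))
}

# TRUE when the current session runs in a UTF-8 locale
utf8_active <- isTRUE(l10n_info()[["UTF-8"]])
print(utf8_active)
```

Checking `l10n_info()` afterwards confirms the result regardless of which locale name was accepted.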
Brief examples of the package tools are provided below. Further examples are available in the Louhos blog and in our R Markdown blog.
Province information (Maakunnat)
Finnish-English translations
Finnish-English translations for province names:
translations <- load_sorvi_data("translation_provinces")
print(head(translations))
## English Finnish
## 1 Åland Islands Ahvenanmaa
## 2 South Karelia Etelä-Karjala
## 3 Southern Ostrobothnia Etelä-Pohjanmaa
## 4 Southern Savonia Etelä-Savo
## 5 Kainuu Kainuu
## 6 Tavastia Proper Kanta-Häme
Convert the given terms (for now, using tools from the bibliographica R package):
# install_github("ropengov/bibliographica")
library(bibliographica) # Get some synonym mapping tools
## Error: package or namespace load failed for 'bibliographica'
translated <- bibliographica::map(c("Varsinais-Suomi", "Lappi"), translations, from = "Finnish", to = "English", keep.names = TRUE)
## Error: object 'rbind_all' is not exported by 'namespace:dplyr'
head(translated)
## Error in head(translated): object 'translated' not found
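The bibliographica calls above failed when this vignette was built. As a rough sketch only, the same kind of lookup can be done in base R with `match()`. The helper `map_terms()` is hypothetical and not part of sorvi or bibliographica, and the small table repeats three rows from the output shown earlier instead of loading the full data.

```r
# Toy translation table: three rows copied from the output shown above
translations <- data.frame(
  English = c("South Karelia", "Kainuu", "Tavastia Proper"),
  Finnish = c("Etelä-Karjala", "Kainuu", "Kanta-Häme"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up each term in column `from` and return the
# entry of column `to` on the same row; unknown terms give NA.
map_terms <- function(x, table, from, to) {
  out <- table[[to]][match(x, table[[from]])]
  names(out) <- x  # keep the input terms as names
  out
}

translated <- map_terms(c("Kainuu", "Kanta-Häme", "Lappi"),
                        translations, from = "Finnish", to = "English")
print(translated)
```

"Lappi" is not in the toy table, so it maps to `NA`. With the full table from `load_sorvi_data("translation_provinces")` it would presumably be resolved.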
Municipality information
Finnish municipality information is available through Statistics Finland (Tilastokeskus; see pxweb package) and Land Survey Finland (Maanmittauslaitos). The row names for each data set are harmonized and can be used to match data sets from different sources, as different data sets may carry different versions of certain municipality names.
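To illustrate how harmonized row names can join data from different sources, here is a small base R sketch. The two data frames are toy stand-ins built from municipality names, codes and provinces that appear elsewhere in this tutorial. They are not actual return values of any sorvi function.

```r
# Toy data set 1: municipality codes, keyed by harmonized row names
ids <- data.frame(
  id = c("020", "005", "989"),
  row.names = c("Akaa", "Alajärvi", "Ähtäri"),
  stringsAsFactors = FALSE
)

# Toy data set 2: provinces, in a different order and with one row missing
provinces <- data.frame(
  province = c("Etelä-Pohjanmaa", "Pirkanmaa"),
  row.names = c("Ähtäri", "Akaa"),
  stringsAsFactors = FALSE
)

# Keep only municipalities present in both sources, then align by row name
common <- intersect(rownames(ids), rownames(provinces))
combined <- cbind(ids[common, , drop = FALSE],
                  provinces[common, , drop = FALSE])
print(combined)
```

Indexing both tables with the same vector of row names guarantees that the rows line up even when the sources list municipalities in different orders.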
Land Survey Finland (municipality information)
Source: Maanmittauslaitos, MML. See also the gisfin package for further Finnish GIS data sets.
municipality.info.mml <- get_municipality_info_mml()
library(knitr)
kable(municipality.info.mml[1:2,])
| | Kohderyhma | Kohdeluokk | AVI | Maakunta | Kunta | AVI_ni1 | AVI_ni2 | Maaku_ni1 | Maaku_ni2 | Kunta_ni1 | Kunta_ni2 | Kieli_ni1 | Kieli_ni2 | AVI.FI | Kieli.FI | Maakunta.FI | Kunta.FI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 71 | 84200 | 2 | 02 | 284 | Lounais-Suomen aluehallintovirasto | Regionförvaltningsverket i Sydvästra Finland | Varsinais-Suomi | Egentliga Finland | Koski Tl | N_A | Suomi | N_A | Lounais-Suomen aluehallintovirasto | Suomi | Varsinais-Suomi | Koski.Tl |
| 5 | 71 | 84200 | 4 | 06 | 508 | Länsi- ja Sisä-Suomen aluehallintovirasto | Regionförvaltningsverket i Västra och Inre Finland | Pirkanmaa | Birkaland | Mänttä-Vilppula | N_A | Suomi | N_A | Länsi- ja Sisä-Suomen aluehallintovirasto | Suomi | Pirkanmaa | Mänttä-Vilppula |
Conversions
Municipality-Province mapping
Map all municipalities to the corresponding provinces:
m2p <- municipality_to_province()
head(m2p) # Just show the first ones
## Koski.Tl Mänttä-Vilppula Äänekoski
## "Varsinais-Suomi" "Pirkanmaa" "Keski-Suomi"
## Ähtäri Akaa Alajärvi
## "Etelä-Pohjanmaa" "Pirkanmaa" "Etelä-Pohjanmaa"
Map selected municipalities to the corresponding provinces:
municipality_to_province(c("Helsinki", "Tampere", "Turku"))
## Helsinki Tampere Turku
## "Uusimaa" "Pirkanmaa" "Varsinais-Suomi"
Speed up conversion with predefined info table:
m2p <- municipality_to_province(c("Helsinki", "Tampere", "Turku"), municipality.info.mml)
head(m2p)
## Helsinki Tampere Turku
## "Uusimaa" "Pirkanmaa" "Varsinais-Suomi"
Municipality name-ID conversion
Municipality name to code
convert_municipality_codes(municipalities = c("Turku", "Tampere"))
## Turku Tampere
## "853" "837"
Municipality codes to names
convert_municipality_codes(ids = c(853, 837))
## 853 837
## "Turku" "Tampere"
Complete conversion table
municipality_ids <- convert_municipality_codes()
kable(head(municipality_ids)) # just show the first entries
| | id | name |
|---|---|---|
| 2 | 284 | Koski.Tl |
| 5 | 508 | Mänttä-Vilppula |
| Äänekoski | 992 | Äänekoski |
| Ähtäri | 989 | Ähtäri |
| Akaa | 020 | Akaa |
| Alajärvi | 005 | Alajärvi |
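The table stores codes as three-character strings with leading zeros (for example `020` and `005`), whereas the earlier example passes plain numbers such as `853`. Whether `convert_municipality_codes()` pads numeric input itself is not stated here. If you need to compare numeric codes with the `id` column yourself, a base R sketch of the padding is:

```r
# Numeric municipality codes as they might arrive from another data source
ids <- c(5L, 20L, 853L)

# Pad to three characters with leading zeros to match the id column format
codes <- sprintf("%03d", ids)
print(codes)

# Converting back to integers drops the padding again
print(as.integer(codes))
```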
Synonym conversions
Generic conversion of synonyms into harmonized terms.
First, get a synonym-name mapping table. In this example we harmonize Finnish municipality names that have multiple versions, but the synonym list can be arbitrary.
f <- system.file("extdata/municipality_synonymes.csv", package = "sorvi")
synonymes <- read.csv(f, sep = "\t")
Validate the synonym list and add lowercase versions of the terms:
synonymes <- bibliographica::check_synonymes(synonymes, include.lowercase = TRUE)
## Error: object 'rbind_all' is not exported by 'namespace:dplyr'
Convert the given terms from synonyms to the harmonized names:
harmonized <- bibliographica::map(c("Mantta", "Koski.Tl"), synonymes)
## Error: object 'rbind_all' is not exported by 'namespace:dplyr'
head(harmonized)
## Error in head(harmonized): object 'harmonized' not found
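Since the bibliographica calls above did not run, the following base R sketch shows one way the validation step could work: add lowercase variants, drop duplicate rows, and stop if a synonym points to more than one harmonized name. The function `check_synonymes_sketch()`, the column names `synonyme` and `name`, and the toy entries are all assumptions made for illustration. They are not taken from the package file or from bibliographica.

```r
# Toy synonym table (illustrative entries, including one duplicate row)
synonymes <- data.frame(
  synonyme = c("Mantta", "Koski.Tl", "Mantta"),
  name     = c("Mänttä-Vilppula", "Koski.Tl", "Mänttä-Vilppula"),
  stringsAsFactors = FALSE
)

# Hypothetical validator: adds lowercase variants, removes duplicates and
# fails if any synonym maps to several different harmonized names.
check_synonymes_sketch <- function(tab) {
  lower <- tab
  lower$synonyme <- tolower(lower$synonyme)
  tab <- unique(rbind(tab, lower))
  n_targets <- tapply(tab$name, tab$synonyme, function(v) length(unique(v)))
  ambiguous <- names(n_targets)[n_targets > 1]
  if (length(ambiguous) > 0) {
    stop("Ambiguous synonyms: ", paste(ambiguous, collapse = ", "))
  }
  rownames(tab) <- NULL
  tab
}

checked <- check_synonymes_sketch(synonymes)

# A named vector then serves as the lookup from synonym to harmonized name
lookup <- setNames(checked$name, checked$synonyme)
print(unname(lookup[c("mantta", "Koski.Tl")]))
```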
Personal identification number (HETU)
Extracting information from a Finnish personal identification number
library(sorvi)
hetu("111111-111C")
## hetu gender personal.number checksum date day month year
## 1 111111-111C Male 111 C 1911-11-11 11 11 1911
## century.char
## 1 -
The function also accepts vectors as input and returns a data frame:
library(knitr)
kable(hetu(c("010101-0101", "111111-111C")))
| hetu | gender | personal.number | checksum | date | day | month | year | century.char |
|---|---|---|---|---|---|---|---|---|
| 010101-0101 | Female | 10 | 1 | 1901-01-01 | 1 | 1 | 1901 | - |
| 111111-111C | Male | 111 | C | 1911-11-11 | 11 | 11 | 1911 | - |
Extracting a specific field:
hetu(c("010101-0101", "111111-111C"), extract = "gender")
## [1] "Female" "Male"
Validate Finnish personal identification number:
valid_hetu("010101-0101") # TRUE/FALSE
## [1] TRUE
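For readers curious how the fields relate to the code itself, here is a base R sketch of the usual HETU layout: `DDMMYY`, a century character, a three-digit individual number (odd for male, even for female), and a check character taken from the remainder of the nine digits divided by 31. The function `parse_hetu_sketch()` is hypothetical and is not sorvi's implementation. It handles only the century characters `+`, `-` and `A` and performs no date validation.

```r
# Hypothetical helper, not part of sorvi: parse a single HETU string
parse_hetu_sketch <- function(x) {
  stopifnot(is.character(x), length(x) == 1, nchar(x) == 11)

  # Century character: "+" = 1800s, "-" = 1900s, "A" = 2000s
  century <- c("+" = 1800, "-" = 1900, "A" = 2000)[[substr(x, 7, 7)]]
  year <- as.integer(century + as.integer(substr(x, 5, 6)))
  date <- as.Date(sprintf("%04d-%s-%s", year, substr(x, 3, 4), substr(x, 1, 2)))

  # Individual number: odd means male, even means female
  individual <- as.integer(substr(x, 8, 10))
  gender <- if (individual %% 2 == 1) "Male" else "Female"

  # Check character: remainder of the nine digits modulo 31 indexes this set
  check_chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]
  digits <- as.numeric(paste0(substr(x, 1, 6), substr(x, 8, 10)))
  expected <- check_chars[digits %% 31 + 1]

  list(date = date, gender = gender,
       checksum_ok = identical(expected, substr(x, 11, 11)))
}

result <- parse_hetu_sketch("111111-111C")
print(result)
```

For the two codes used above, the sketch agrees with the `hetu()` output on date and gender, and the check characters `C` and `1` both pass.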
TODO
TODO list of further data sources
Licensing and Citations
This work can be freely used, modified and distributed under the Two-clause BSD license.
citation("sorvi")
##
## Kindly cite the sorvi R package as follows:
##
## (C) Leo Lahti, Juuso Parkkinen, Joona Lehtomaki, Juuso Haapanen,
## Einari Happonen and Jussi Paananen (rOpenGov 2010-2017). sorvi:
## Finnish open data toolkit for R. URL:
## http://github.com/rOpenGov/sorvi
##
## A BibTeX entry for LaTeX users is
##
## @Misc{,
## title = {sorvi: Finnish open government data toolkit for R},
## author = {Leo Lahti and Juuso Parkkinen and Joona Lehtomaki and Juuso Haapanen and Einari Happonen and Jussi Paananen},
## doi = {10.5281/zenodo.10280},
## year = {2011},
## }
##
## Many thanks for all contributors!
Session info
This vignette was created with
sessionInfo()
## R version 3.3.3 (2017-03-06)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.2 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_BE.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_BE.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_BE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_BE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] sorvi_0.8.13 tibble_1.3.0 knitr_1.15.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.10.1 dplyr_0.5.0.9004 assertthat_0.2.0
## [4] R6_2.2.0 magrittr_1.5 evaluate_0.10
## [7] highr_0.6 rlang_0.0.0.9017 stringi_1.1.5
## [10] data.table_1.10.4 genderdata_0.5.0 babynames_0.2.1
## [13] tools_3.3.3 stringr_1.2.0 glue_1.0.0
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher