title: sorvi tutorial
author: rOpenGov core team
date: 2015-07-18
output: html_document (theme: flatly)

Finnish open government data toolkit for R

This R package provides miscellaneous tools for Finnish open government data. Your contributions, bug reports and other feedback are welcome!

Available data sources and tools

Installation (Asennus)

Finnish provinces (Maakuntatason informaatio)

Finnish municipalities (Kuntatason informaatio)

ID conversion tools

Finnish personal identification number (HETU) (Henkilotunnuksen kasittely)

Visualization tools (Visualisointirutiineja)

See also other rOpenGov packages, in particular:

  • gisfin Visualization of Finnish geographic information
  • helsinki Helsinki open data tools
  • sotkanet THL Sotkanet database on health and demography
  • pxweb PX-Web interface to access data from Statistics Finland and other PX-Web compliant sources
  • finpar Finnish parliament data

Installation

We assume you have installed R. If you use RStudio, change the default encoding to UTF-8. Linux users should also install CURL.

Install the stable release version in R:

install.packages("sorvi")

Install the development version (for developers):

library(devtools)
install_github("ropengov/sorvi")

Test the installation by loading the library:

library(sorvi)

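We also recommend setting the UTF-8 encoding. Locale names are platform-dependent; on the system used for this vignette the bare name "UTF-8" was rejected, as the warning below shows: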

Sys.setlocale(locale="UTF-8") 
## Warning in Sys.setlocale(locale = "UTF-8"): OS reports request to set
## locale to "UTF-8" cannot be honored
## [1] ""
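Because `Sys.setlocale()` returns an empty string when a request cannot be honored, one option is to try a few candidate locale names in turn. The following is a minimal sketch in base R; the helper name `set_utf8_locale` and the candidate list are assumptions for illustration and are not part of sorvi.

```r
# Hypothetical helper (not part of sorvi): try several UTF-8 locale names
# and return the first one the operating system accepts.
set_utf8_locale <- function(candidates = c("en_US.UTF-8", "C.UTF-8", "UTF-8")) {
  for (loc in candidates) {
    # Sys.setlocale() warns and returns "" when the request is not honored
    res <- suppressWarnings(Sys.setlocale("LC_CTYPE", loc))
    if (nzchar(res)) {
      return(res)
    }
  }
  NA_character_  # none of the candidates was accepted
}

accepted <- set_utf8_locale()
print(accepted)
```

Which names are accepted depends on the locales installed on the machine, so the candidate list may need adjusting.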

Brief examples of the package tools are provided below. Further examples are available in the Louhos blog and in our Rmarkdown blog.

Province information (Maakunnat)

Finnish-English translations

Finnish-English translations for province names (we have not been able to solve all encoding problems yet; solutions welcome!):

library(knitr) # provides kable()
translations <- load_sorvi_data("translations")
kable(as.matrix(translations))
## Error in kable_markdown(x = structure(c("Ã\u0085land Islands", "South Karelia", : the table must have a header (column names)
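The error message says that the table needs column names. A likely cause is that `as.matrix()` applied to a named vector yields a one-column matrix with row names but no column names. The sketch below illustrates this with a small stand-in vector; it assumes that `translations` is a named character vector (Finnish name to English name), which the source does not confirm, and the object names are hypothetical.

```r
# Stand-in for load_sorvi_data("translations").
# ASSUMPTION: a named character vector, Finnish name -> English name.
translations_demo <- c("Lappi" = "Lapland", "Pohjanmaa" = "Ostrobothnia")

# as.matrix() keeps the names as row names but adds no column names
m <- as.matrix(translations_demo)
print(is.null(colnames(m)))

# Building a data frame with explicit column names gives the table a header
translation_table <- data.frame(
  Finnish = names(translations_demo),
  English = unname(translations_demo),
  stringsAsFactors = FALSE
)
print(translation_table)
# With knitr installed, knitr::kable(translation_table) can then render it.
```

The same pattern may apply to the other `kable()` calls in this tutorial that fail with the same message.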

Municipality information

Finnish municipality information is available through Statistics Finland (Tilastokeskus; see stafi package) and Land Survey Finland (Maanmittauslaitos). The row names for each data set are harmonized and can be used to match data sets from different sources, as different data sets may carry different versions of certain municipality names.
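As an illustration of matching data sets by harmonized row names, here is a minimal base R sketch. The two data frames and their numbers are made up for the example and do not come from sorvi or any real data source.

```r
# Made-up numbers, for illustration only (not real statistics).
population <- data.frame(
  population = c(100, 200, 300),
  row.names = c("Helsinki", "Turku", "Tampere")
)
area <- data.frame(
  area = c(7, 9),
  row.names = c("Tampere", "Helsinki")
)

# Keep only the municipalities present in both data sets,
# then align both tables on the shared row names.
common <- intersect(rownames(population), rownames(area))
combined <- cbind(
  population[common, , drop = FALSE],
  area[common, , drop = FALSE]
)
print(combined)
```

Indexing both tables with the same vector of row names keeps the rows aligned even when the sources list municipalities in different orders.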

Land Survey Finland (municipality information)

Source: Maanmittauslaitos, MML.

municipality.info.mml <- get_municipality_info_mml()
library(knitr)
kable(municipality.info.mml[1:2,])
|   | Kohderyhma | Kohdeluokk | AVI | Maakunta | Kunta | AVI_ni1 | AVI_ni2 | Maaku_ni1 | Maaku_ni2 | Kunta_ni1 | Kunta_ni2 | Kieli_ni1 | Kieli_ni2 | AVI.FI | Kieli.FI | Maakunta.FI | Kunta.FI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 71 | 84200 | 2 | 02 | 284 | Lounais-Suomen aluehallintovirasto | Regionförvaltningsverket i Sydvästra Finland | Varsinais-Suomi | Egentliga Finland | Koski Tl | N_A | Suomi | N_A | Lounais-Suomen aluehallintovirasto | Suomi | Varsinais-Suomi | Koski.Tl |
| 6 | 71 | 84200 | 4 | 06 | 508 | Länsi- ja Sisä-Suomen aluehallintovirasto | Regionförvaltningsverket i Västra och Inre Finland | Pirkanmaa | Birkaland | Mänttä-Vilppula | N_A | Suomi | N_A | Länsi- ja Sisä-Suomen aluehallintovirasto | Suomi | Pirkanmaa | Mänttä-Vilppula |

Conversions

Municipality-Province mapping

Map all municipalities to their corresponding provinces:

m2p <- municipality_to_province() 
kable(head(m2p)) # Just show the first ones
## Error in kable_markdown(x = structure(c("Koski.Tl", "Mänttä-Vilppula", : the table must have a header (column names)

Map selected municipalities to their corresponding provinces:

municipality_to_province(c("Helsinki", "Tampere", "Turku")) 
##          Helsinki           Tampere             Turku 
##         "Uusimaa"       "Pirkanmaa" "Varsinais-Suomi"

Speed up conversion with predefined info table:

m2p <- municipality_to_province(c("Helsinki", "Tampere", "Turku"), municipality.info.mml)
kable(head(m2p))
## Error in kable_markdown(x = structure(c("Helsinki", "Tampere", "Turku", : the table must have a header (column names)
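Conceptually, passing a predefined info table turns the conversion into a plain table lookup. The sketch below shows that idea in base R with a small stand-in table. The column names `Kunta.FI` and `Maakunta.FI` are taken from the table shown earlier, but the helper `municipality_to_province_demo` is hypothetical and is not sorvi's implementation.

```r
# Stand-in for municipality.info.mml with only the two columns needed here.
info_demo <- data.frame(
  Kunta.FI = c("Helsinki", "Tampere", "Turku", "Koski.Tl"),
  Maakunta.FI = c("Uusimaa", "Pirkanmaa", "Varsinais-Suomi", "Varsinais-Suomi"),
  stringsAsFactors = FALSE
)

# Named vector: municipality name -> province name
province_lookup <- setNames(info_demo$Maakunta.FI, info_demo$Kunta.FI)

# Hypothetical helper: look up provinces by municipality name.
# Names missing from the table yield NA.
municipality_to_province_demo <- function(municipalities, lookup = province_lookup) {
  lookup[municipalities]
}

print(municipality_to_province_demo(c("Turku", "Helsinki")))
```

Building the lookup vector once and reusing it avoids reloading the info table on every call.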

Municipality name-ID conversion

Municipality name to code

convert_municipality_codes(municipalities = c("Turku", "Tampere"))
##   Turku Tampere 
##   "853"   "837"

Municipality codes to names

convert_municipality_codes(ids = c(853, 837))
##       853       837 
##   "Turku" "Tampere"

Complete conversion table

municipality_ids <- convert_municipality_codes()
kable(head(municipality_ids)) # just show the first entries
|   | id | name |
|---|---|---|
| 3 | 284 | Koski.Tl |
| 6 | 508 | Mänttä-Vilppula |
| Äänekoski | 992 | Äänekoski |
| Ähtäri | 989 | Ähtäri |
| Akaa | 020 | Akaa |
| Alajärvi | 005 | Alajärvi |
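The table lists the codes as three-character strings with leading zeros (for example "020" and "005"). If your own data stores the codes as numbers, zero-padding them first makes them comparable with this table. A minimal base R sketch follows; the helper name `pad_municipality_code` is hypothetical and not part of sorvi.

```r
# Hypothetical helper: format numeric municipality codes as
# three-character strings with leading zeros.
pad_municipality_code <- function(ids) {
  sprintf("%03d", as.integer(ids))
}

print(pad_municipality_code(c(20, 5, 853)))
```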

Synonym conversions

Generic conversion of synonyms into harmonized terms.

First, get a synonym-name mapping table. In this example we harmonize Finnish municipality names that have multiple versions, but the synonym list can be arbitrary.

f <- system.file("extdata/municipality_synonymes.csv", package = "sorvi")
synonymes <- read.csv(f, sep = "\t")         

Validate the synonym list and add lowercase versions of the terms:

synonymes <- check_synonymes(synonymes, include.lowercase = TRUE)

Convert the given terms from synonyms to the harmonized names:

harmonized <- harmonize_names(c("Mantta", "Koski.Tl"), synonymes)
kable(harmonized)
| name | original |
|---|---|
| Mäntta | Mantta |
| Koski Tl | Koski.Tl |
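The general idea behind these steps (a mapping table, optional lowercase variants, then a lookup) can be sketched in base R as follows. This is not sorvi's implementation: the column names `synonyme` and `name`, the helper `harmonize_demo`, and the example entries are assumptions made for illustration.

```r
# Hypothetical mapping table: each synonym points to one harmonized name.
synonym_table <- data.frame(
  synonyme = c("Koski.Tl", "Helsingfors"),
  name = c("Koski Tl", "Helsinki"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: add lowercase variants of the synonyms, then map
# each input term to its harmonized name (NA when no match is found).
harmonize_demo <- function(x, syn) {
  lower <- data.frame(
    synonyme = tolower(syn$synonyme),
    name = syn$name,
    stringsAsFactors = FALSE
  )
  syn_all <- unique(rbind(syn, lower))
  data.frame(
    name = syn_all$name[match(x, syn_all$synonyme)],
    original = x,
    stringsAsFactors = FALSE
  )
}

print(harmonize_demo(c("koski.tl", "Helsingfors", "Unknown"), synonym_table))
```

Returning the original term next to the harmonized one, as the package output above does, makes unmatched entries easy to spot.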

Personal identification number (HETU)

Extracting information from a Finnish personal identification number

library(sorvi)
hetu("111111-111C")
##          hetu gender personal.number checksum       date day month year
## 1 111111-111C   Male             111        C 1911-11-11  11    11 1911
##   century.char
## 1            -

The function also accepts vectors as input, returning a data frame:

library(knitr)
kable(hetu(c("010101-0101", "111111-111C")))
| hetu | gender | personal.number | checksum | date | day | month | year | century.char |
|---|---|---|---|---|---|---|---|---|
| 010101-0101 | Female | 10 | 1 | 1901-01-01 | 1 | 1 | 1901 | - |
| 111111-111C | Male | 111 | C | 1911-11-11 | 11 | 11 | 1911 | - |

Extracting a specific field:

hetu(c("010101-0101", "111111-111C"), extract = "gender")
## [1] "Female" "Male"

Validate a Finnish personal identification number:

valid_hetu("010101-0101") # TRUE/FALSE
## [1] TRUE
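For readers curious about what such a check involves, the sketch below parses a HETU in base R using the publicly documented structure: DDMMYY, a century character, a three-digit individual number whose parity encodes gender, and a check character equal to the nine-digit number modulo 31 indexed into a fixed alphabet. It is a simplified illustration, not sorvi's `hetu()` or `valid_hetu()`; the helper name `hetu_demo` is hypothetical, only the century characters "+", "-" and "A" are handled, and the date itself is not validated.

```r
# Check characters indexed by (nine-digit number mod 31)
check_chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]

# Hypothetical helper (not sorvi's implementation).
# Does not verify that DDMMYY is a real calendar date.
hetu_demo <- function(x) {
  if (!grepl("^[0-9]{6}[-+A][0-9]{3}[0-9A-Y]$", x)) {
    return(list(valid = FALSE))
  }
  # Nine digits: birth date (DDMMYY) followed by the individual number
  number <- as.numeric(paste0(substr(x, 1, 6), substr(x, 8, 10)))
  expected <- check_chars[number %% 31 + 1]
  century <- c("+" = 1800, "-" = 1900, "A" = 2000)[[substr(x, 7, 7)]]
  individual <- as.integer(substr(x, 8, 10))
  list(
    valid = identical(expected, substr(x, 11, 11)),
    gender = if (individual %% 2 == 1) "Male" else "Female",
    year = century + as.integer(substr(x, 5, 6))
  )
}

str(hetu_demo("111111-111C"))
```

Applied to the two example numbers used in this section, the sketch agrees with the package output shown above for gender, year, and checksum.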

Visualization tools

Draw a regression curve with smoothed error bars, based on the Visually-Weighted Regression by Solomon M. Hsiang. The sorvi implementation extends Felix Schönbrodt's original code.

library(sorvi) 
data(iris)
p <- regression_plot(Sepal.Length ~ Sepal.Width, iris) 
print(p)

[Figure: regression plot produced by the code above (chunk "regressionline")]
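To give a rough feel for how an uncertainty band around a regression curve can be obtained, the sketch below bootstraps a linear fit on the same iris variables in base R and draws a 95% band. It is a deliberately simplified stand-in: it uses `lm` and a flat shaded band, not the visually weighted shading of Hsiang's method or the code behind `regression_plot`, and the number of resamples is an arbitrary choice.

```r
data(iris)
set.seed(42)

# Evaluation grid over the observed range of the predictor
grid <- data.frame(
  Sepal.Width = seq(min(iris$Sepal.Width), max(iris$Sepal.Width), length.out = 50)
)

# Refit the model on bootstrap resamples and predict on the grid;
# the result is a 50 x 200 matrix (grid points x resamples).
boot_fits <- replicate(200, {
  rows <- sample(nrow(iris), replace = TRUE)
  fit <- lm(Sepal.Length ~ Sepal.Width, data = iris[rows, ])
  predict(fit, newdata = grid)
})

# Pointwise median and 95% band across the resamples
center <- apply(boot_fits, 1, median)
lower <- apply(boot_fits, 1, quantile, probs = 0.025)
upper <- apply(boot_fits, 1, quantile, probs = 0.975)

plot(iris$Sepal.Width, iris$Sepal.Length,
     xlab = "Sepal.Width", ylab = "Sepal.Length", pch = 16, col = "grey40")
polygon(c(grid$Sepal.Width, rev(grid$Sepal.Width)), c(lower, rev(upper)),
        col = grDevices::adjustcolor("steelblue", alpha.f = 0.3), border = NA)
lines(grid$Sepal.Width, center, col = "steelblue", lwd = 2)
```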

TODO

TODO list of further data sources

Licensing and Citations

This work can be freely used, modified and distributed under the Two-clause BSD license.

citation("sorvi")
## 
## Kindly cite the sorvi R package as follows:
## 
##   (C) Leo Lahti, Juuso Parkkinen, Joona Lehtomaki, Juuso Haapanen,
##   Einari Happonen and Jussi Paananen (rOpenGov 2010-2015).  sorvi:
##   Finnish open data toolkit for R.  URL:
##   http://ropengov.github.com/sorvi
## 
## A BibTeX entry for LaTeX users is
## 
##   @Misc{,
##     title = {sorvi: Finnish open government data toolkit for R},
##     author = {Leo Lahti and Juuso Parkkinen and Joona Lehtomaki and Juuso Haapanen and Einari Happonen and Jussi Paananen},
##     doi = {10.5281/zenodo.10280},
##     year = {2011},
##   }
## 
## Many thanks for all contributors! See: http://ropengov.github.com

Session info

This vignette was created with

sessionInfo()
## R version 3.2.1 (2015-06-18)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## Running under: Ubuntu 15.04
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] sorvi_0.7.30       knitr_1.10.5       scimapClient_0.2.1
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.11.6        magrittr_1.5       MASS_7.3-41       
##  [4] munsell_0.4.2      colorspace_1.2-6   R6_2.0.1          
##  [7] highr_0.5          stringr_1.0.0      plyr_1.8.3        
## [10] dplyr_0.4.2        tools_3.2.1        parallel_3.2.1    
## [13] grid_3.2.1         gtable_0.1.2       DBI_0.3.1         
## [16] lazyeval_0.1.10    assertthat_0.1     digest_0.6.8      
## [19] RJSONIO_1.3-0      RColorBrewer_1.1-2 reshape2_1.4.1    
## [22] ggplot2_1.0.1      formatR_1.2        evaluate_0.7      
## [25] labeling_0.3       stringi_0.5-5      scales_0.2.5      
## [28] proto_0.3-10