---
title: "nuts: Convert European Regional Data in R"
author:
  - name: "Moritz Hennicke"
    url: https://hennicke.science/
    orcid_id: 0000-0001-6811-1821
  - name: "Werner Krause"
    url: https://krausewe.github.io/
    orcid_id: 0000-0002-5069-7964
vignette: >
  %\VignetteIndexEntry{nuts: Convert European Regional Data in R}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
output:
  distill::distill_article:
    theme: theme.css
    toc: true
    number_sections: true
    toc_depth: 4
    toc_float:
      collapsed: false
      smooth_scroll: true
---
```{r, include = FALSE}
knitr::opts_chunk$set(
dpi=300,
fig.width = 8,
fig.height = 5,
out.width = "100%",
R.options = list(width = 70)
)
# Needed to fix issues with phantom.js
Sys.setenv(OPENSSL_CONF="/dev/null")
```
```{css, echo=FALSE}
pre code {
white-space: pre-wrap;
}
```
<!-- # Overview -->
<!-- The `nuts` package simplifies the analysis of European regional data by providing an efficient offline solution for converting data between NUTS versions and levels. Leveraging the methodology and conversion matrices of the European Commission's [Joint Research Center (JRC)](https://urban.jrc.ec.europa.eu/nutsconverter/#/), this package is designed to address the challenges posed by time-varying boundaries of European regions. -->
# Key Features
:::float-image
```{r out.width='150px', out.extra='style="float:left; padding:30px"', echo=FALSE, fig.alt ="Package logo of a squirrel holding a walnut colored with the flag of Europe"}
knitr::include_graphics("logo.png")
```
- Efficient **offline conversion** of European regional data.
- Conversion between five NUTS **versions**: 2006, 2010, 2013, 2016, 2021.
- Conversion between three regional **levels**: NUTS-1, NUTS-2, NUTS-3.
- Ability to convert **multiple** NUTS versions at once when e.g. NUTS versions differ across countries and years. This scenario is common when working with data sourced from EUROSTAT.
- (Dasymetric) **Spatial interpolation** based on five **weights** (regional area size, 2011 and 2018 population size, 2012 and 2018 built-up area) built from granular [100m x 100m] geodata by the European Commission's [Joint Research Center (JRC)](https://urban.jrc.ec.europa.eu/tools/nuts-converter).
:::
# NUTS Codes
The Nomenclature of Territorial Units for Statistics (NUTS) is a geocode standard for referencing the administrative divisions of European countries. A NUTS code starts with a two-letter combination indicating the country.[^1] The administrative subdivisions, or **levels**, are referred to with an additional number or a capital letter (NUTS-1). A second (NUTS-2) or third (NUTS-3) subdivision level is referred to with another digit each.
For example, the German district *Northern Saxony* (*Nordsachsen*) is located within the region *Leipzig* and the federal state *Saxony*.
- NUTS-1: States
    - DED: Saxony
- NUTS-2: States/Government Regions
    - DED5: Leipzig
- NUTS-3: Districts
    - DED53: Northern Saxony
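The structure of a code can be read off mechanically. The following base R sketch is only an illustration of the naming scheme described above; `parse_nuts()` is a hypothetical helper and not part of the `nuts` package.

```r
# Hypothetical helper (not part of the nuts package): split a NUTS code
# into its country prefix and its level, following the scheme above.
parse_nuts <- function(code) {
  data.frame(
    code = code,
    country = substr(code, 1, 2),  # two-letter country prefix
    level = nchar(code) - 2,       # one extra character per NUTS level
    stringsAsFactors = FALSE
  )
}

parse_nuts(c("DED", "DED5", "DED53"))
```

The level is simply the number of characters after the country prefix, which is why the examples later in this vignette filter NUTS-2 data with `nchar(geo) == 4`.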
Since administrative boundaries in Europe change for demographic, economic, political or other reasons, there are five different **versions** of the NUTS Nomenclature (2006, 2010, 2013, 2016, and 2021). The current version, effective from 1 January 2021, lists 104 regions at NUTS-1, 283 regions at NUTS-2, and 1 345 regions at NUTS-3 level.[^2]
[^1]: [European Interinstitutional Style Guide](https://publications.europa.eu/code/en/en-5000600.htm) In the case of Greece for instance this code was changed from GR to EL in 2011.
[^2]: [2022 report of the European Union](https://ec.europa.eu/eurostat/documents/3859598/15193590/KS-GQ-22-010-EN-N.pdf)
## Spatial interpolation in a nutshell
When administrative units are restructured, regional data measured within old boundaries can be converted to the new boundaries under reasonable assumptions. The main task of this package is to use (**dasymetric**) **spatial interpolation** to accomplish this.
Let's take the example of the German state Saxony in the figures below. Here, the NUTS-2 regions *Leipzig* (`DED3` → `DED5`) and *Chemnitz* (`DED1` → `DED4`) were reorganized. We are interested in the number of manure storage facilities in 2003 provided by [EUROSTAT](https://ec.europa.eu/eurostat/databrowser/view/PAT_EP_RTOT/default/table) based on the 2006 NUTS version. A part of *Leipzig* was reassigned to *Chemnitz* (center plot), prompting us to recalculate the number of storage facilities in the 2010 version (right plot).
A simple approach is to redistribute manure storage facilities in proportion to the transferred area, assuming that facilities are evenly distributed across space. A dasymetric approach can instead use built-up area, assuming that manure storage facilities are more likely to be found close to residential areas and economic sites. In our example, *Leipzig* lost about 7.7% (\(\frac{5574}{72360}\)) of its built-up area. We therefore reassign 7.7% of *Leipzig's* manure storage facilities, \(\frac{5574}{72360} \times 700 \approx 54\), subtracting them from *Leipzig* and adding them to *Chemnitz*.
See the Section [*Spatial interpolation in detail*](#method) for an in-depth description of the weighting procedure.
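The arithmetic of the example can be reproduced in a few lines of base R. This is a sketch of the calculation only, not of the package internals; the built-up areas and the count of 700 facilities come from the text, and the Chemnitz count of 2600 is taken from the figure.

```r
# Built-up area (BU) of Leipzig in the 2006 version and the part moved to Chemnitz
bu_leipzig_2006 <- 72360
bu_transferred <- 5574

# Facility counts in the 2006 version
facilities_leipzig <- 700
facilities_chemnitz <- 2600

# Share of Leipzig's built-up area that changes region (about 7.7%)
share_moved <- bu_transferred / bu_leipzig_2006

# Facilities reassigned under the dasymetric assumption, rounded as in the text
moved <- round(share_moved * facilities_leipzig)

leipzig_2010 <- facilities_leipzig - moved
chemnitz_2010 <- facilities_chemnitz + moved

c(moved = moved, DED5 = leipzig_2010, DED4 = chemnitz_2010)
```

The total number of facilities is unchanged by the reallocation; only its split between the two regions differs.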
```{r echo = FALSE, message=F, warning=F}
library(nuts)
library(stringr)
library(ggplot2)
library(ggrepel)
library(sf)
library(terra)
library(raster)
library(kableExtra)
library(ggpubr)
library(formatR)
library(ggalluvial)
library(dplyr)
```
```{r echo = FALSE , warning=F , message=F,results='hide'}
data(manure, package = "nuts")
manure_indic <- manure %>%
filter(nchar(geo) == 4) %>%
filter(indic_ag == "I07A_EQ_Y") %>%
dplyr::select(-indic_ag ) %>%
filter( str_detect(geo, "^DE")) %>%
filter( time == 2003 )
class <- manure_indic %>%
distinct(geo, .keep_all = T) %>%
nuts_classify(
data = .,
nuts_code = "geo"
)
transf <- class %>%
nuts_convert_version(
to_version = "2010",
variables = c('values'='absolute'),
weight = 'artif_surf12'
)
small <- manure_indic %>%
filter( geo %in% c( 'DED1' , 'DED3' ) )
small <- small %>%
mutate( artif_surf12 = case_when( geo == "DED1" ~ 98648
, geo == "DED3" ~ 66786 + 5574
))
shape_06_n3 <- read_sf("shapefiles/NUTS_RG_20M_2006_3857_DE.shp") %>%
filter(LEVL_CODE == 3) %>%
full_join( manure_indic , by = c("NUTS_ID" = "geo")) %>%
filter( str_detect( NUTS_ID , '^DED1|^DED3' ))
shape_06_n2 <- read_sf("shapefiles/NUTS_RG_20M_2006_3857_DE.shp") %>%
filter(LEVL_CODE == 2) %>%
full_join( manure_indic , by = c("NUTS_ID" = "geo")) %>%
filter( str_detect( NUTS_ID , '^DED1|^DED3' ))
shape_06_n2_centr <- shape_06_n2 %>%
st_centroid( ) %>%
left_join( small , by = c("NUTS_ID" = "geo"))
p_initial = ggplot() +
geom_sf(data = shape_06_n2 , linewidth = 1 , aes( fill = NUTS_ID )) +
geom_sf(data = shape_06_n3 , color = 'grey' , linewidth = .5 , fill = NA ) +
geom_sf_label( data = shape_06_n2_centr
, aes( label = paste0( NUTS_ID , ': ' , NUTS_NAME , '\n' , values.x , ' facilities' , '\n' , 'BU: ' , artif_surf12 ))
, size = 3
) +
scale_fill_manual( values = c( "#177e89" , "#ffc857" )) +
theme_minimal( ) +
facet_wrap(~"2003 data\n\n(NUTS VERSION 2006)") +
theme(legend.position = "none"
, axis.text = element_blank( )
, axis.ticks = element_blank( )
, axis.title = element_blank( )
)
DED33_centr <- shape_06_n3 %>%
filter( NUTS_ID == "DED33" ) %>%
st_centroid( ) %>%
mutate( artif_surf12 = 5574 )
shape_06_n2_step <- shape_06_n2 %>%
mutate( artif_surf12 = case_when( NUTS_ID == 'DED3' ~ 66786
, NUTS_ID == 'DED1' ~ 98648 )
, NUTS_ID = case_when( NUTS_ID == 'DED3' ~ "DED5"
, NUTS_ID == 'DED1' ~ "DED4" )
)
p_step = ggplot() +
geom_sf(data = shape_06_n2 , linewidth = 1 , aes( fill = NUTS_ID )) +
geom_sf(data = shape_06_n3 , color = 'grey' , fill = NA , linewidth = .5 ) +
geom_sf_label( data = shape_06_n2_step , aes( label = paste0( NUTS_ID , ': ' , NUTS_NAME , '\n' , '??? facilities' , '\n' , 'BU: ' , artif_surf12 ))
, size = 3) +
geom_sf(data = shape_06_n3 %>% filter( NUTS_ID %in% c("DED33")), color = 'red' , fill = NA , linewidth = 1 ) +
geom_sf_text( data = DED33_centr , aes( label = paste0( "BU: \n" , artif_surf12 )), size = 3 , lineheight = .75 ) +
scale_fill_manual( values = c( "#177e89" , "#ffc857" )) +
theme_minimal( ) +
facet_wrap(~"2003 data\n\n(NUTS VERSION 2006 → 2010)") +
theme(legend.position = "none"
, axis.text = element_blank( )
, axis.ticks = element_blank( )
, axis.title = element_blank( ))
moved <- 700 * (5574 / 72360)
DED5 <- 700 - 54
DED4 <- 2600 + 54
shape_06_n2_final <- shape_06_n2_step %>%
mutate( artif_surf12 = case_when( NUTS_ID == 'DED5' ~ 66786
, NUTS_ID == 'DED4' ~ 98648 + 5574 )
, values.x = case_when( NUTS_ID == 'DED5' ~ DED5
, NUTS_ID == 'DED4' ~ DED4 )
)
p_final = ggplot() +
geom_sf(data = shape_06_n2 , linewidth = 1 , aes( fill = NUTS_ID )) +
geom_sf(data = shape_06_n3 , color = 'grey' , fill = NA , linewidth = .5 ) +
geom_sf_label( data = shape_06_n2_final
, aes( label = paste0( NUTS_ID , ': ' , NUTS_NAME , '\n' , round( values.x , 1 ) , ' facilities' , '\n' , 'BU: ' , artif_surf12 ))
, size = 3 ) +
geom_sf(data = shape_06_n3 %>% filter( NUTS_ID %in% c("DED33")), color = 'red' , fill = "#177e89" , linewidth = 1 ) +
scale_fill_manual( values = c( "#177e89" , "#ffc857" )) +
theme_minimal( ) +
facet_wrap(~"2003 data\n\n(NUTS VERSION 2010)") +
theme(legend.position = "none"
, axis.text = element_blank( )
, axis.ticks = element_blank( )
, axis.title = element_blank( ))
```
```{r, echo=FALSE, fig.cap="Holdings with Manure Storage Facilities; BU = Built-up area in square meters; Sources: [Shapefiles](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts) and [data](https://ec.europa.eu/eurostat/databrowser/view/PAT_EP_RTOT/default/table) are from EUROSTAT; Created using the [sf](https://r-spatial.github.io/sf/) package.", fig.alt ="Maps of the NUTS-2 regions Chemnitz and Leipzig in NUTS version 2006, during the change from version 2006 to 2010, and in version 2010. They visualize the example in the text in which Leipzig cedes a part of its area to Chemnitz."}
gridExtra::grid.arrange( p_initial, p_step , p_final , nrow = 1 )
```
# Usage
The package comes with three main functions:
- `nuts_classify()` detects the NUTS version(s) and level(s) of a data set. Its output can be directly fed into the two other functions.
- `nuts_convert_version()` converts your data to a desired NUTS version (2006, 2010, 2013, 2016, 2021). This transformation works in any direction.
- `nuts_aggregate()` aggregates data to some upper-level NUTS code, i.e., it transforms NUTS-3 data to the NUTS-2 or NUTS-1 level (but not vice versa).
## Workflow
The conversion can only be conducted after classifying the NUTS version(s) and level(s) of your data using the function `nuts_classify()`. This step ensures the validity and completeness of your NUTS codes before proceeding with the conversion.
```{r , echo=FALSE, out.width='60%', fig.align="center", fig.cap="Sequential workflow to convert regional NUTS data", fig.alt ="Flow diagram that shows that conversion functions are run after classification."}
knitr::include_graphics("flow.png")
```
## Identifying NUTS version and level
The `nuts_classify()` function's main purpose is to find the most suitable NUTS **version** and to identify the **level** of the data set. Below, you see an example using patent application data (per one million inhabitants) for Norway in 2012 at the NUTS-2 level. This data is again provided by [EUROSTAT](https://ec.europa.eu/eurostat/databrowser/view/PAT_EP_RTOT/default/table?lang=en).
```{r}
# Load packages
library(nuts)
library(dplyr)
library(stringr)
# Loading and subsetting Eurostat data
data(patents, package = "nuts")
pat_n2 <- patents %>%
filter(nchar(geo) == 4) # NUTS-2 values
pat_n2_mhab_12_no <- pat_n2 %>%
filter(unit == "P_MHAB") %>% # Patents per one million inhabitants
filter(time == 2012) %>% # 2012
filter(str_detect(geo, "^NO")) %>% # Norway
dplyr::select(-unit)
# Classifying the Data
pat_classified <- nuts_classify(
data = pat_n2_mhab_12_no,
nuts_code = "geo"
)
```
The function returns a list with three items. These items can be called directly from the output object (`data$...`) or retrieved using the three helper functions `nuts_get_data()`, `nuts_get_version()`, and `nuts_get_missing()`.
1. The first item gives the **original data set** augmented with the columns `from_version`, `from_level`, and `country`, indicating the NUTS version that best suits the data. All functions of the package group NUTS codes by **country**; the country names are generated automatically from the provided NUTS codes.
Below, you see that all data entries correspond to the 2016 NUTS version.
```{r}
# pat_classified$data # Call list item directly or...
nuts_get_data(pat_classified) # ...use helper function
```
2. The second item provides an overview of the share of matching NUTS codes for each of the five existing NUTS versions. The **overlap** is computed within country and possibly additional groups (if provided via the `group_vars` argument).
```{r}
# pat_classified$versions_data # Call list item directly or...
nuts_get_version(pat_classified) # ...use helper function
```
3. The third item gives all NUTS codes that are **missing** across groups. Such missing codes might lead to conversion errors and are, by default, omitted from all conversion procedures. In our example, no NUTS codes are missing.
<!-- We recommend to check whether missing values for these NUTS codes can be replaced, perhaps with 0. -->
```{r}
# pat_classified$missing_data # Call list item directly or...
nuts_get_missing(pat_classified) # ...use helper function
```
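The idea behind the overlap shares in the second item can be sketched in base R. The code lists below are abbreviated and illustrative (they are not the package's lookup tables), and the real function additionally computes the shares within countries and groups.

```r
# Illustrative, abbreviated code lists per NUTS version (an assumption for
# this sketch, not the tables shipped with the nuts package)
version_codes <- list(
  "2016" = c("NO01", "NO02", "NO03", "NO04", "NO05", "NO06", "NO07"),
  "2021" = c("NO02", "NO06", "NO07", "NO08", "NO09", "NO0A")
)

# Codes observed in the data set
observed <- c("NO01", "NO02", "NO03", "NO04", "NO05", "NO06", "NO07")

# Share of observed codes that exist in each version
overlap <- sapply(version_codes, function(codes) mean(observed %in% codes))

# Version with the highest share of matching codes
best_version <- names(overlap)[which.max(overlap)]
overlap
best_version
```

A version with a share of 1 contains every observed code; lower shares indicate codes that only exist in other versions.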
## Converting data between NUTS versions
Once the NUTS version and level of the original data are identified, you can easily **convert** the data to any other **NUTS version**. Here is an example of transforming the 2012 Norwegian data, classified above as NUTS version 2016, to the 2021 NUTS version. Between 2016 and 2021, the number of NUTS-2 regions in Norway decreased by one as the borders of six regions were transformed. The maps below show the affected regions.
We provide the classified NUTS data, specify the target NUTS version for data transformation, and supply the variable containing the values to be interpolated. It is important to indicate the **variable type** in the named input-vector since the interpolation approaches differ for [absolute and relative values](https://urban.jrc.ec.europa.eu/nutsconverter/docs/2022_08_04_NUTS_converter.pdf).
```{r}
# Converting Data to 2021 NUTS version
pat_converted <- nuts_convert_version(
data = pat_classified,
to_version = "2021",
variables = c("values" = "relative")
)
```
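Why the variable type matters can be seen in a toy example. The sketch below is a simplified reading of the two approaches, assuming that absolute values are split in proportion to the weights and that relative values are combined as weighted averages; the regions, weights, and values are hypothetical, and the exact procedure is the one documented in the linked JRC report.

```r
# Hypothetical flows: source region A goes fully to target X,
# source region B is split between targets X and Y.
flows <- data.frame(
  from = c("A", "B", "B"),
  to = c("X", "X", "Y"),
  w = c(100, 30, 70),          # weight (e.g. population) of each flow
  abs_val = c(50, 200, 200),   # a count measured in the source region
  rel_val = c(10, 20, 20),     # a rate measured in the source region
  stringsAsFactors = FALSE
)

# Absolute variable: each source value is split by its weight shares,
# then the pieces are summed within each target region.
flows$share <- flows$w / ave(flows$w, flows$from, FUN = sum)
abs_out <- tapply(flows$abs_val * flows$share, flows$to, sum)

# Relative variable: weighted average of the source rates within each target.
rel_out <- tapply(flows$rel_val * flows$w, flows$to, sum) /
  tapply(flows$w, flows$to, sum)

abs_out
rel_out
```

In this sketch the absolute totals are preserved (250 before and after), whereas the relative values stay within the range of the source rates. Declaring a rate as `"absolute"` would therefore inflate or deflate it whenever regions merge or split.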
```{r, echo=FALSE,include=FALSE}
no_2006 <-
read_sf("shapefiles/NUTS_RG_20M_2016_3857_NO.shp") %>%
filter(LEVL_CODE == 2) %>%
full_join(pat_n2_mhab_12_no , by = c("NUTS_ID" = "geo"))
no_changes <- cross_walks %>%
filter(
nchar(from_code) == 4,
from_version == 2016,
to_version == 2021,
grepl("^NO", from_code),
from_code != to_code
)
no_changes <- unique(c(no_changes$from_code, no_changes$to_code))
gg_2006 = ggplot() +
geom_sf(
data = no_2006,
aes(fill = values) ,
color = 'grey' ,
linewidth = .5
) +
geom_sf(
data = filter(no_2006, NUTS_ID %in% no_changes),
color = 'red' ,
fill = "#00000000"
) +
scale_fill_continuous(high = "#132B43",
low = "#56B1F7",
name = "Patents per 1 M habitants") +
theme_minimal() +
facet_wrap( ~ "Original 2012 data\n\n(NUTS VERSION 2016)")
no_2021 <-
read_sf("shapefiles/NUTS_RG_20M_2021_3857_NO.shp") %>%
filter(LEVL_CODE == 2) %>%
full_join(pat_converted , by = c("NUTS_ID" = "to_code")) %>%
filter(NUTS_ID != "NO0B")
gg_2021 = ggplot() +
geom_sf(
data = no_2021,
aes(fill = values) ,
color = 'grey' ,
linewidth = .5
) +
geom_sf(
data = filter(no_2021, NUTS_ID %in% no_changes),
color = 'red' ,
fill = "#00000000"
) +
scale_fill_continuous(high = "#132B43",
low = "#56B1F7",
name = "Patents per 1 M habitants") +
theme_minimal() +
facet_wrap( ~ "Transformed data\n\n(NUTS VERSION 2021)")
```
```{r, echo=FALSE, fig.cap="Converting patent data between versions; Sources: [Shapefiles](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts) and [data](https://ec.europa.eu/eurostat/databrowser/view/PAT_EP_RTOT/default/table?lang=en) are from EUROSTAT; Created using the [sf](https://r-spatial.github.io/sf/) package.", fig.alt ="Two maps of patents per 1M habitants in Norwegian NUTS 2 regions in NUTS version 2016 and converted to NUTS version 2021"}
ggarrange( gg_2006 , gg_2021 , nrow = 1 , common.legend = T, legend = "bottom")
```
The output below displays the corresponding data frames based on the original and converted NUTS codes. The original data set comprises seven observations, whereas the converted data set contains six. The regions `NO01`, `NO03`, `NO04`, and `NO05` are lost, while `NO08`, `NO09`, and `NO0A` are now listed.
```{r}
pat_n2_mhab_12_no
```
```{r}
pat_converted
```
### Converting multiple variables simultaneously
You can also convert **multiple variables** at once. Below, we add the number of patent applications per 1000 inhabitants as a second variable:
```{r}
# Converting Multiple Variables
pat_n2_mhab_12_no %>%
mutate(values_per_thous = values / 1000) %>% # per million -> per thousand inhabitants
nuts_classify(
data = .,
nuts_code = "geo"
) %>%
nuts_convert_version(
data = .,
to_version = "2021",
variables = c("values" = "relative",
"values_per_thous" = "relative")
)
```
### Converting grouped data
Longitudinal regional data, as commonly supplied by EUROSTAT, often comes with varying NUTS versions across countries and years (and other dimensions). It is possible to harmonize data across such **groups** with the `group_vars` argument in `nuts_classify()`. Below, we transform data within country and year groups for Sweden, Slovenia, and Croatia to the 2021 NUTS version.
```{r, tidy=TRUE, tidy.opts=list(width.cutoff=60)}
# Classifying grouped data (time)
pat_n2_mhab_sesihr <- pat_n2 %>%
filter(unit == "P_MHAB") %>%
filter(str_detect(geo, "^SE|^SI|^HR"))
pat_classified <- nuts_classify(
nuts_code = "geo",
data = pat_n2_mhab_sesihr,
group_vars = "time"
)
```
Note that the detected best-fitting NUTS versions differ across countries:
```{r, tidy=TRUE, tidy.opts=list(width.cutoff=60)}
nuts_get_data(pat_classified) %>%
group_by(country, from_version) %>%
tally()
```
The grouping is stored and passed on to the conversion function:
```{r}
# Converting grouped data (Time)
pat_converted <- nuts_convert_version(
data = pat_classified,
to_version = "2021",
variables = c("values" = "relative")
)
```
Conveniently, the group argument can also be used to transform higher dimensional data. Below, we include two indicators for patent applications to convert data that varies at the indicator-year-country-NUTS code level.
```{r}
# Classifying and converting multi-group data
pat_n2_mhabmact_12_sesihr <- pat_n2 %>%
filter(unit %in% c("P_MHAB", "P_MACT")) %>%
filter(str_detect(geo, "^SE|^SI|^HR"))
pat_converted <- pat_n2_mhabmact_12_sesihr %>%
nuts_classify(
data = .,
nuts_code = "geo",
group_vars = c("time", "unit")
) %>%
nuts_convert_version(
data = .,
to_version = "2021",
variables = c("values" = "relative")
)
```
## Converting data between NUTS levels
The `nuts_aggregate()` function facilitates the **aggregation** of data from lower NUTS **levels** to higher ones using spatial weights. This enables users to summarize variables upward from the NUTS-3 level to NUTS-2 or NUTS-1 levels. It is important to note that this function does not support disaggregation since this comes with strong assumptions about the spatial distribution of a variable's values.
In the following example, we illustrate how to aggregate the total number of patent applications in Sweden from NUTS-3 to higher levels. The functions below return a warning concerning non-identifiable NUTS codes. See [*Non-identified NUTS codes*](#nuts_not_identified) for further information.
```{r}
data("patents", package = "nuts")
# Aggregating data from NUTS-3 to NUTS-2 and NUTS-1
pat_n3 <- patents %>%
filter(nchar(geo) == 5)
pat_n3_nr_12_se <- pat_n3 %>%
filter(unit %in% c("NR")) %>%
filter(time == 2012) %>%
filter(str_detect(geo, "^SE"))
pat_classified <- nuts_classify(
data = pat_n3_nr_12_se,
nuts_code = "geo"
)
pat_level2 <- nuts_aggregate(
data = pat_classified,
to_level = 2,
variables = c("values" = "absolute")
)
pat_level1 <- nuts_aggregate(
data = pat_classified,
to_level = 1,
variables = c("values" = "absolute")
)
```
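Because NUTS codes are nested, the parent of a region is given by the leading characters of its code. The base R sketch below illustrates this for an absolute variable with hypothetical values; it simply sums the child regions and does not reproduce the weighting that `nuts_aggregate()` applies.

```r
# Hypothetical NUTS-3 counts (values are made up for illustration)
n3 <- data.frame(
  geo = c("SE110", "SE121", "SE122", "SE123"),
  values = c(10, 4, 6, 5),
  stringsAsFactors = FALSE
)

# The NUTS-2 parent is the first four characters, the NUTS-1 parent the first three
n3$nuts2 <- substr(n3$geo, 1, 4)
n3$nuts1 <- substr(n3$geo, 1, 3)

# Absolute values add up across the child regions
n2 <- aggregate(values ~ nuts2, data = n3, FUN = sum)
n1 <- aggregate(values ~ nuts1, data = n3, FUN = sum)

n2
n1
```

The reverse direction has no such unique answer, which is why disaggregation is not supported.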
```{r, echo=FALSE, fig.cap="Aggregating patents from NUTS 3 to NUTS 2 and NUTS 1; Sources: [Shapefiles](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts) and [data](https://ec.europa.eu/eurostat/databrowser/view/PAT_EP_RTOT/default/table?lang=en) are from EUROSTAT; Created using the [sf](https://r-spatial.github.io/sf/) package.", fig.alt = "Three maps of Sweden with patent applications at the NUTS 3 level and aggregated to NUTS level 2 and 1."}
eu_nuts3 <-
read_sf("shapefiles/NUTS_RG_20M_2016_3857_SE.shp") %>%
filter(LEVL_CODE == 3) %>%
full_join(pat_n3_nr_12_se , by = c("NUTS_ID" = "geo"))
eu_nuts2 <-
read_sf("shapefiles/NUTS_RG_20M_2016_3857_SE.shp") %>%
filter(LEVL_CODE == 2) %>%
full_join(pat_level2 , by = c("NUTS_ID" = "to_code"))
eu_nuts1 <-
read_sf("shapefiles/NUTS_RG_20M_2016_3857_SE.shp") %>%
filter(LEVL_CODE == 1) %>%
full_join(pat_level1 , by = c("NUTS_ID" = "to_code"))
gg_nuts3 = ggplot() +
geom_sf(
data = eu_nuts3,
aes(fill = values) ,
color = 'grey' ,
linewidth = .25
) +
scale_fill_continuous(high = "#132B43", low = "#56B1F7") +
theme_minimal() + theme(legend.position = "bottom") +
facet_wrap( ~ "NUTS-3 data")
gg_nuts2 = ggplot() +
geom_sf(
data = eu_nuts2,
aes(fill = values) ,
color = 'grey' ,
linewidth = .25
) +
scale_fill_continuous(high = "#132B43", low = "#56B1F7") +
theme_minimal() + theme(legend.position = "bottom") +
facet_wrap( ~ "NUTS-2 data")
gg_nuts1 = ggplot() +
geom_sf(
data = eu_nuts1,
aes(fill = values) ,
color = 'grey' ,
linewidth = .25
) +
scale_fill_continuous(high = "#132B43", low = "#56B1F7") +
theme_minimal() + theme(legend.position = "bottom") +
facet_wrap( ~ "NUTS-1 data")
gg = ggpubr::ggarrange(gg_nuts3 , gg_nuts2 , gg_nuts1 , nrow = 1)
annotate_figure(gg, top = text_grob("Patent applications across Swedish NUTS regions"))
```
## Inconsistent versions and levels
### Non-identified NUTS codes {#nuts_not_identified}
If the input data contains NUTS codes that cannot be identified in any NUTS version, the output of `nuts_classify()` lists all of these codes. All conversion procedures (`nuts_convert_version()` and `nuts_aggregate()`) will work as expected while ignoring values for these regions.
The example below classifies 2012 patent data from Denmark. The original EUROSTAT data contains the codes `DKZZZ` and `DKXXX`, which are not part of the conversion matrices. Codes ending with the letter Z refer to "[Extra-Regio](https://stat.gov.pl/en/regional-statistics/classification-of-territorial-units/classification-of-territorial-units-for-statistics-nuts/principles-for-creation-and-development-of-nuts-units/)" territories. These codes collect statistics for territories that cannot be attached to a certain region.[^3] Codes ending with the letter X refer to observations with unknown regions.
[^3]: Such as air-space, territorial waters and the continental shelf, embassies, consulates, military bases and deposits of oil, natural gas, etc. in international waters.
```{r}
pat_n3.nr.12.dk <- pat_n3 %>%
filter(unit %in% c("NR")) %>%
filter(time == 2012) %>%
filter(str_detect(geo, "^DK"))
pat_classified <- nuts_classify(
data = pat_n3.nr.12.dk,
nuts_code = "geo"
)
```
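If you prefer to inspect or separate such codes before classification, a pattern match is usually enough. The sketch below is a hypothetical pre-check in base R; the labels are made up for illustration, and `nuts_classify()` does not require this step.

```r
geo <- c("DK011", "DK012", "DKZZZ", "DKXXX")

# Label codes whose regional part consists only of Z (Extra-Regio)
# or only of X (unknown region); everything else is a regular code.
code_type <- ifelse(
  grepl("^[A-Z]{2}Z+$", geo), "extra_regio",
  ifelse(grepl("^[A-Z]{2}X+$", geo), "unknown_region", "regular")
)

data.frame(geo, code_type)
```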
### Missing NUTS codes
`nuts_classify()` also checks whether the NUTS codes provided are complete (or values of a variable that the user wants to convert are missing for a region). Missing values in the input data will, by default, result in missing values for all affected transformed regions in the output data.
The example with Slovenia below illustrates this case.
```{r}
pat_n3_nr_12_si <- pat_n3 %>%
filter(unit %in% c("NR")) %>%
filter(time == 2012) %>%
filter(str_detect(geo, "^SI"))
pat_classified <- nuts_classify(
data = pat_n3_nr_12_si,
nuts_code = "geo"
)
```
`nuts_classify()` returns a warning that NUTS codes are missing in the input data. These codes can be inspected by calling `nuts_get_missing(pat_classified)`.
```{r}
nuts_get_missing(pat_classified)
```
The resulting conversion returns three missing values because the source region `SI011` became `SI031` and the source region `SI016` was split into `SI036` and `SI037`.
```{r}
nuts_convert_version(
data = pat_classified,
to_version = "2021",
variables = c("values" = "absolute")
) %>%
filter(is.na(values))
```
Users have the option `missing_weights_pct` to investigate the consequences of missing values in the converted data. Setting the argument to `TRUE` returns a variable that indicates the percentage of missing weights due to missing NUTS codes (or missing values in the variable). The data frame below shows three regions that could not be computed due to missing data. Values in region `SI036` could not be computed since 97.9% of the weights are missing. Values for region `SI037` are missing as well even though only 0.8% of its population-weighted area is missing.
```{r}
nuts_convert_version(
data = pat_classified,
to_version = "2021",
weight = "pop18",
variables = c("values" = "absolute"),
missing_weights_pct = TRUE
) %>%
arrange(desc(values_na_w))
```
Using the share of missing weights in combination with the option `missing_rm`, the `nuts` package allows users to recover some of the missing regions approximately. We can achieve this by setting `missing_rm` to `TRUE`, effectively assuming 0 for missing values. In the next step, we again remove regions with a high share of missing weights from the output data. The data frame below shows that values for `SI037` could still be used, assuming 0 patents for the 0.8% of the population-weighted area that is missing when constructing the region.
```{r}
nuts_convert_version(
data = pat_classified,
to_version = "2021",
weight = "pop18",
variables = c("values" = "absolute"),
missing_weights_pct = TRUE,
missing_rm = TRUE
) %>%
filter(to_code %in% c("SI031", "SI036", "SI037")) %>%
mutate(values_imp = ifelse(values_na_w < 1, values, NA))
```
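The logic of the two options can be mimicked with a toy table of flows. Everything in this sketch is hypothetical (target names, weights, and values), and it only illustrates the idea of a missing-weight share combined with a threshold, not the package's implementation.

```r
# Hypothetical contributions of source regions to two target regions.
# `value` is the part of the source value already apportioned to the target;
# NA marks a source region with missing data.
flows <- data.frame(
  to = c("T1", "T1", "T2", "T2"),
  w = c(979, 21, 8, 992),      # weights of the contributing pieces
  value = c(NA, 5, NA, 40),
  stringsAsFactors = FALSE
)

by_target <- do.call(rbind, lapply(split(flows, flows$to), function(d) {
  data.frame(
    to = d$to[1],
    na_w_pct = 100 * sum(d$w[is.na(d$value)]) / sum(d$w),  # % of weights missing
    strict = sum(d$value),                  # NA as soon as one piece is missing
    lenient = sum(d$value, na.rm = TRUE),   # missing pieces treated as 0
    stringsAsFactors = FALSE
  )
}))

# Keep the lenient value only where less than 1% of the weights are missing
by_target$values_imp <- ifelse(by_target$na_w_pct < 1, by_target$lenient, NA)
by_target
```

The threshold is a judgment call: a higher cut-off recovers more regions at the price of a larger approximation error.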
### Multiple NUTS levels within groups
The package does not allow for the conversion of **multiple NUTS levels** at once. The classification function will throw an error in this case. The conversion needs to be conducted for every level separately.
```{r, error=TRUE}
patents %>%
filter(nchar(geo) %in% c(4, 5), grepl("^EL", geo)) %>%
distinct(geo, .keep_all = T) %>%
nuts_classify(nuts_code = "geo", data = .)
```
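One way to prepare such data is to split it by level first and then run the workflow on each piece. The sketch below uses base R and illustrative codes and values; the subsequent call to `nuts_classify()` is only indicated in a comment.

```r
# Illustrative data mixing NUTS-2 and NUTS-3 codes (values are made up)
mixed <- data.frame(
  geo = c("EL30", "EL41", "EL301", "EL411"),
  values = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# The level follows from the code length: country prefix plus one character per level
by_level <- split(mixed, nchar(mixed$geo) - 2)
names(by_level)

# Each element now holds a single level and could be passed on separately, e.g.
# lapply(by_level, function(d) nuts_classify(data = d, nuts_code = "geo"))
```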
### Multiple NUTS versions within groups
Converting **multiple NUTS versions** within groups might lead to erroneous spatial interpolations since overlaps between regions of different versions are possible.
The example below illustrates this problem. We classify German and Italian manure storage facility data from EUROSTAT without specifying `group_vars`. Instead, we keep all unique NUTS codes to artificially create a data set containing different NUTS versions. `nuts_classify()` returns a warning and by inspecting the identified versions, we see that there are mixed versions within groups (the countries).
```{r}
man_deit <- manure %>%
filter(grepl("^DE|^IT", geo)) %>%
filter(nchar(geo) == 4) %>%
distinct(geo, .keep_all = T) %>%
nuts_classify(nuts_code = "geo", data = .)
nuts_get_data(man_deit) %>%
group_by(country, from_version) %>%
tally()
```
When proceeding to the conversion with either `nuts_convert_version()` or `nuts_aggregate()`, both functions will throw an error. For convenience, we added the option `multiple_versions` that subsets the supplied data to the dominant version within groups when specified with `most_frequent`. Hence, all codes from other, non-dominant versions are discarded.
Once we convert this data set, all NUTS regions unrecognized according to the 2006 (Germany) and 2021 (Italy) version are dropped automatically.
```{r}
man_deit_converted <- nuts_convert_version(
data = man_deit,
to_version = 2021,
variables = c("values" = "relative"),
multiple_versions = "most_frequent"
)
man_deit_converted %>%
group_by(country, to_version) %>%
tally()
```
# Spatial interpolation in detail {#method}
This section describes the spatial interpolation procedure. We first cover the logic of conversion tables and then explain the methods used in the package for converting versions and levels.
## Changes in administrative boundaries
The maps below show the Norwegian NUTS-2 regions in versions 2016 and 2021. All regions apart from Norway's northernmost region were reorganized between the two versions.
```{r, echo=FALSE, message = FALSE, warning = FALSE, fig.cap="Norwegian NUTS-2 regions with boundary changes; Sources: [Shapefiles](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts) from EUROSTAT; Created using the [sf](https://r-spatial.github.io/sf/) package.", fig.alt ="Two maps of Norwegian NUTS-2 regions in version 2016 and 2021. The most Eastern and Southern regions have been affected most by administrative redistricting."}
no_2016 <- read_sf("shapefiles/NUTS_RG_20M_2016_3857_NO.shp") %>%
filter(nchar(NUTS_ID) == 4) %>%
mutate(nuts = paste0(NUTS_ID, "\n", NUTS_NAME))
no_2021 <- read_sf("shapefiles/NUTS_RG_20M_2021_3857_NO.shp") %>%
filter(nchar(NUTS_ID) == 4, NUTS_ID != "NO0B") %>%
mutate(nuts = paste0(NUTS_ID, "\n", NUTS_NAME))
no_codes <- unique(c(no_2016$NUTS_ID, no_2021$NUTS_ID))
colorz = RColorBrewer::brewer.pal(length(no_codes), "Set3")
names(colorz) <- no_codes
b = 1700000
gg_2016 = ggplot() +
geom_sf(data = no_2016, aes(fill = NUTS_ID ) , color = 'grey' , linewidth = .5 ) +
scale_fill_manual(values = colorz) +
geom_sf_text(data = no_2016, aes(label = NUTS_ID)) +
theme_minimal( ) +
labs(subtitle = "2016 version") +
xlab("") + ylab("") +
coord_sf(xlim = c(504756.9,3441975-b), ylim = c(7965649,11442790-b))
gg_2021 = ggplot() +
geom_sf(data = no_2021, aes(fill = NUTS_ID ) , color = 'grey' , linewidth = .5 ) +
scale_fill_manual(values = colorz) +
geom_sf_text(data = no_2021, aes(label = NUTS_ID)) +
theme_minimal( ) +
labs(subtitle = "2021 version") +
xlab("") + ylab("") +
coord_sf(xlim = c(504756.9,3441975-b), ylim = c(7965649,11442790-b))
gg = ggarrange( gg_2016 , gg_2021 , nrow = 1 , legend = "none")
annotate_figure(gg) #, top = text_grob("Norwegian NUTS-2 regions with boundary changes"))
```
The changes between the two versions can be summarized as follows:
1. Boundary changes of regions with **continued** NUTS codes
- `NO02` cedes a small area to the new `NO08`
- `NO06` makes small area gains from `NO05`
2. Changes to regions with **discontinued** NUTS codes
- `NO01` is absorbed by `NO08`
- `NO03` is split up between `NO08` and `NO09`
- `NO04` divides into `NO0A` and `NO09`
- `NO05` largely becomes the new `NO0A`, and gives a small area to `NO06`
## Spatial interpolation and conversion tables
To keep track of these changes, the `nuts` package uses two data sets:
1. Stocks: `all_nuts_codes` contains **all historical NUTS codes** by NUTS version and country
2. Flows: `cross_walks` contains the **conversion tables** between NUTS versions
They are based on [data provided by the JRC](https://urban.jrc.ec.europa.eu/nutsconverter/#/). Users can also query both data sets directly to explore specific conversion patterns more closely.
For Norway going from version 2016 to 2021 at NUTS level 2, `cross_walks` can be subset as follows:
```{r}
no_walks <- cross_walks %>%
filter(nchar(from_code) == 4,
from_version == 2016,
to_version == 2021,
grepl("^NO", from_code))
```
This results in the following conversion table:
```{r, echo=FALSE, message = FALSE, warning = FALSE}
kable(no_walks, "html") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
column_spec(1, background = 'azure') %>%
column_spec(2, background = 'aquamarine') %>%
column_spec(3, background = 'azure') %>%
column_spec(4, background = 'aquamarine') %>%
scroll_box(width = "100%")
```
In addition to tracing the evolution of NUTS codes, the table contains **flows** of area, population and artificial surfaces between regions and versions. These flows were computed by the JRC from granular (100m x 100m) geographic data. The `ggalluvial` plot below visualizes the flows of area size between the NUTS-2 regions mapped above.
```{r, echo=FALSE, message = FALSE, warning = FALSE, fig.cap= "Alluvial plot illustrating area size flows; Created using the [ggalluvial](https://corybrunson.github.io/ggalluvial/) package.", fig.alt ="The alluvial plot shows area size flows from NUTS version 2016 to 2021."}
# Add names
no_2016_names <- read_sf("shapefiles/NUTS_RG_20M_2016_3857_NO.shp") %>%
dplyr::select(from_code = NUTS_ID, from_name = NUTS_NAME) %>%
st_set_geometry(NULL)
no_2021_names <- read_sf("shapefiles/NUTS_RG_20M_2021_3857_NO.shp") %>%
dplyr::select(to_code = NUTS_ID, to_name = NUTS_NAME) %>%
st_set_geometry(NULL)
no_walks <- no_walks %>%
inner_join(no_2016_names) %>%
inner_join(no_2021_names)
gg_pop_flows <- no_walks %>%
mutate(from = paste0(from_code, "\n", from_name),
to = paste0(to_code, "\n", to_name)) %>%
arrange(desc(to_code)) %>%
ggplot(data = .,
aes(axis1 = from, axis2 = to,
y = areaKm ^ 0.3)) +
geom_alluvium(aes(fill = from)) +
geom_stratum() +
scale_x_discrete(limits = c("v2016", "v2021")) +
ggfittext::geom_fit_text(stat = "stratum", width = 1/4, min.size = 3, aes(label = after_stat(stratum))) +
theme_minimal() +
theme(axis.text.y=element_blank(),
axis.ticks.y=element_blank())+
ylab("Area (sqkm)") +
theme(legend.position = "none")+
labs(title = "NUTS-2 Area Flows in Norway from versions 2016 to 2021",
caption = "Flow size is scaled for improved readability")
gg_pop_flows
```
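The area flows drawn in the alluvial plot can also be read off as shares of each departing region. The following is a minimal sketch that relies only on `cross_walks` columns already used in this vignette (`from_code`, `to_code`, `from_version`, `to_version`, `areaKm`). The names `no_area_shares` and `area_share` are illustrative.

```{r}
library(dplyr)
library(nuts)

no_area_shares <- cross_walks %>%
  filter(nchar(from_code) == 4,
         from_version == 2016,
         to_version == 2021,
         grepl("^NO", from_code)) %>%
  group_by(from_code) %>%
  # Share of the departing region's area that flows to each target region
  mutate(area_share = round(areaKm / sum(areaKm), 3)) %>%
  ungroup() %>%
  dplyr::select(from_code, to_code, areaKm, area_share) %>%
  arrange(from_code, desc(area_share))

no_area_shares
```

Regions with a single row and a share of one keep their territory intact; regions with several rows are split across target regions.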
To illustrate the main idea, the map below showcases **population densities** across NUTS-2 regions. As population is not uniformly distributed across space, weighting regions by their area size rests on strong assumptions. For instance, region `NO01` in version 2016, which contains the city of Oslo, makes a relatively modest geographical contribution to the new region `NO08`, but significantly bolsters the population of the latter. Assuming that the variable to be converted is correlated with population across space, the conversion can thus be refined using population weights to account for flows between different versions.
```{r, echo=FALSE, message = FALSE, warning = FALSE, out.width = "100%", fig.width = 7, fig.cap= "Spatial distribution of population and boundary changes; Sources: [Shapefiles](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units/nuts) and [population raster](https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/population-distribution-demography/geostat) from EUROSTAT; Created using the [sf](https://r-spatial.github.io/sf/) and the [terra](https://rspatial.github.io/terra/reference/terra-package.html) packages.", fig.alt ="Two maps of Southern Norway with very granular population density and administrative boundaries of the 2016 and 2021 NUTS version. The region containing the capital Oslo and its adjacent region are highlighted in version 2016; both contribute to a single larger region in version 2021."}
# pop <- raster("JRC_1K_POP_2018.tif")
# no_2016_1 <- no_2016 %>% st_transform(crs(pop))
# saveRDS( no_2016_1 , 'JRC_1K_POP_2018_2016_transformed_NO.rds' )
# no_2021_1 <- no_2021 %>% st_transform(crs(pop))
# saveRDS( no_2021_1 , 'JRC_1K_POP_2018_2021_transformed_NO.rds' )
# no_pop <- crop(x = pop, y = as_Spatial(no_2016_1))
# no_pop <- mask(no_pop, as_Spatial(no_2016_1))
# no_pop_df <- as.data.frame(no_pop, xy = TRUE) %>%
# filter(!is.na(JRC_1K_POP_2018))
no_pop_df <- readRDS( 'JRC_1K_POP_2018_NO.rds' )
no_2016_1 <- readRDS( 'JRC_1K_POP_2018_2016_transformed_NO.rds' )
no_2021_1 <- readRDS( 'JRC_1K_POP_2018_2021_transformed_NO.rds' )
c=500000
d=500000
no_2016_no01 <- filter(no_2016_1, NUTS_ID %in% c( 'NO01' , 'NO03' ))
no_2021_no08 <- filter(no_2021_1, NUTS_ID %in% c( 'NO08' ))
gg_pop = ggplot() +
geom_raster(data = no_pop_df, aes(x = x, y = y, fill = JRC_1K_POP_2018)) +
geom_sf(data = no_2016_1, color = "#636363", fill = NA, lwd = .5 ) +
geom_sf(data = no_2016_no01 , aes( color = NUTS_ID ) , fill = NA, lwd = .5 ) +
scale_fill_gradientn(name = "2018 Population", colors = terrain.colors(10),
na.value = NA) +
geom_label_repel(data = no_2016_no01, aes(label = NUTS_ID, geometry = geometry , color = NUTS_ID )
, stat = "sf_coordinates", size = 3.5
, point.padding = 15
, fill = alpha(c("white"),0.9)) +
scale_color_manual( values = c( 'orange' , 'red' )) +
coord_sf(xlim = c(4023000,5130000-c), ylim = c(3879000,5411000-d))+
theme_minimal()+
theme(legend.position = "none"
, axis.text = element_blank( )
, axis.ticks = element_blank( )
, axis.title = element_blank( ))+
xlab("") + ylab("")+
labs(title = "2016 NUTS version")
gg_pop2 = ggplot() +
geom_raster(data = no_pop_df, aes(x = x, y = y, fill = JRC_1K_POP_2018)) +
geom_sf(data = no_2021_1, color = "#636363", fill = NA, lwd = .5 ) +
geom_sf(data = no_2021_1 %>% filter( NUTS_ID %in% c( 'NO08' )), color = "blueviolet", fill = NA, lwd = .5 ) +
scale_fill_gradientn(name = "2018 Population", colors = terrain.colors(10),
na.value = NA) +
geom_label_repel(data = no_2021_no08, aes(label = NUTS_ID, geometry = geometry )
, stat = "sf_coordinates", size = 3.5
, point.padding = 15
, color = "blueviolet"
, fill = alpha(c("white"),0.9)) +
coord_sf(xlim = c(4023000,5130000-c), ylim = c(3879000,5411000-d))+
theme_minimal()+
theme(legend.position = "none"
, axis.text = element_blank( )
, axis.ticks = element_blank( )
, axis.title = element_blank( ))+
xlab("") + ylab("")+
labs(title = "2021 NUTS version")
ggpubr::ggarrange( gg_pop , gg_pop2 )
```
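The difference between area and population weights can be made concrete with a small numeric sketch. All figures below are invented for illustration and do not come from `cross_walks`: a hypothetical source region `A` is split between target regions `X` and `Y`, with most of its area going to `Y` but most of its population going to `X`.

```{r}
library(dplyr)

# Hypothetical flows from one source region to two target regions
toy_flows <- data.frame(
  from_code = c("A", "A"),
  to_code   = c("X", "Y"),
  area_km   = c(200, 800),  # invented area flows
  pop       = c(900, 100)   # invented population flows
)

toy_value <- 1000  # invented absolute value observed for region A

toy_weights <- toy_flows %>%
  group_by(from_code) %>%
  mutate(w_area = area_km / sum(area_km),
         w_pop  = pop / sum(pop)) %>%
  ungroup() %>%
  # Allocate the value of A to the targets under each weighting scheme
  mutate(value_area = toy_value * w_area,
         value_pop  = toy_value * w_pop)

toy_weights
```

Under area weights, region `X` receives 20% of the value of `A`; under population weights, it receives 90%. Which allocation is more plausible depends on whether the variable follows land or people.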
## Conversion methods
The following subsections describe the method used to convert absolute and relative values between versions and levels.
### Conversion of absolute values between versions
In this example, we transform **absolute** values, the number of patent applications (`NR`) in Norway, from **version** 2016 to 2021, utilizing spatial interpolation based on the population distribution in 2018.
The conversion employs the `cross_walks` table, which includes population flow data (expressed in thousands) between two NUTS-2 regions from the source version to the target version. The function joins the variable of interest, `NR`, which varies across the departing NUTS-2 codes (`from_code`), to this table. It then calculates a **weight** (`w`) equal to the population flow's share of the total population in the departing region in version 2016 (`from_code`):
```{r, include = F}
pat_n2_nrmhab_12_no <- patents %>%
filter(nchar(geo) == 4) %>% # NUTS-2 values
filter(unit %in% c("NR", "P_MHAB")) %>%
filter(time == 2012) %>% # 2012
filter(str_detect(geo, "^NO")) %>% # Norway
dplyr::select(-"time") %>%
tidyr::pivot_wider(id_cols = c("geo"), names_from = "unit",
values_from = "values")
classification <- pat_n2_nrmhab_12_no %>%
nuts_classify(nuts_code = "geo")
conversion_m_long <- nuts_get_data(classification) %>%
filter(!is.na(from_version)) %>%
inner_join(filter(cross_walks, to_version == 2021),
by = c("from_code", "from_version")) %>%
mutate(pop18 = round(pop18 / 1000, 2)) %>%
mutate_at(vars(NR, P_MHAB, pop18), list(~as.integer(.))) %>%
dplyr::select(from_code, to_code, from_version , to_version, NR, P_MHAB, pop18)
convert_abs <- conversion_m_long %>%
group_by(from_code, from_version) %>%
mutate(w = round(pop18 / sum(pop18), 2)) %>%
ungroup()
# Illustrate calculation
from_code_vec = unique(convert_abs$from_code)
calcs = list()
for(i in seq_along(from_code_vec)){
convert_abs_sub = convert_abs %>%
filter(from_code %in% from_code_vec[i])
calcs[[i]] = paste(convert_abs_sub$pop18, collapse = " + ")
}
calcs <- data.frame(from_code = from_code_vec, denom = unlist(calcs))
convert_abs_calc <- convert_abs %>%
inner_join(calcs) %>%
mutate(w = paste0(pop18, "/(", denom, ") = ", w)) %>%
dplyr::select(-denom, -P_MHAB)
```
```{r, echo=FALSE, message = FALSE, warning = FALSE}
kable(convert_abs_calc, "html") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
column_spec(1, background = 'azure') %>%
column_spec(2, background = 'aquamarine') %>%
column_spec(3, background = 'azure') %>%
column_spec(4, background = 'aquamarine') %>%
column_spec(7, background = '#FFBBFF') %>%
scroll_box(width = "100%")
```
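The weight in the last column can also be written compactly. The notation is introduced here only for illustration: let $p_{ij}$ denote the population flow from departing region $i$ (`from_code`) to target region $j$ (`to_code`). Then

$$
w_{ij} = \frac{p_{ij}}{\sum_{k} p_{ik}}, \qquad \sum_{j} w_{ij} = 1 .
$$

The weights of each departing region sum to one, so its value is redistributed across target regions without being inflated or deflated. The weights printed in the table are rounded to two decimals, so their sum may deviate slightly from one.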
To obtain the number of patent applications in the desired 2021 version, the function summarizes the data for the new NUTS regions in version 2021 (`to_code`) by taking the **population-weighted sum** of all flows.
```{r, echo=FALSE, message = FALSE, warning = FALSE}
to_code_vec = unique(convert_abs$to_code)
calcs = list()
for (i in seq_along(to_code_vec)) {
convert_abs_sub = convert_abs %>%
filter(to_code %in% to_code_vec[i])
calcs[[i]] = paste(paste0(convert_abs_sub$NR, " x ", convert_abs_sub$w),
collapse = " + ")
}
calcs <- data.frame(to_code = to_code_vec, NR = unlist(calcs))
converted_abs <- convert_abs %>%
group_by(to_code, to_version) %>%
summarise(NR_res = sum(NR * w)) %>%
ungroup()
converted_abs_calc <- inner_join(converted_abs, calcs) %>%
mutate(NR = paste0(NR, " = " , NR_res)) %>%
dplyr::select(-NR_res)
kable(converted_abs_calc, "html") %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
column_spec(1, background = 'aquamarine') %>%
column_spec(2, background = 'aquamarine') %>%
column_spec(3, background = '#FFF0F5') %>%
scroll_box(width = "100%")
```
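The two steps, weighting and summing, can be reproduced on a toy data set. The sketch below uses invented numbers and illustrative column names (`value`, `pop`); it mirrors the logic described above rather than the internal code of `nuts_convert_version()`.

```{r}
library(dplyr)

# Hypothetical flows: region A is split between X and Y, region B moves entirely to Y
toy_abs <- data.frame(
  from_code = c("A", "A", "B"),
  to_code   = c("X", "Y", "Y"),
  value     = c(100, 100, 50),  # absolute value of the departing region, repeated per flow
  pop       = c(30, 10, 20)     # invented population flow from from_code to to_code
)

toy_abs_converted <- toy_abs %>%
  group_by(from_code) %>%
  # Step 1: share of the departing region's population carried by each flow
  mutate(w = pop / sum(pop)) %>%
  group_by(to_code) %>%
  # Step 2: population-weighted sum over all flows arriving in a target region
  summarise(value_conv = sum(value * w))

toy_abs_converted
```

The converted values add up to the same total as the original values (150 in this toy case), because the weights of each departing region sum to one.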
### Conversion of relative values between versions
To convert **relative** values, such as the number of patent applications per million inhabitants, `nuts_convert_version()` again starts from the conversion table seen above. We focus on the variable `P_MHAB`, patent applications per one million inhabitants. The function summarizes these relative values by computing the **weighted average** with respect to 2018 population flows.
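A weighted average of this kind can be sketched on a toy data set. The numbers and the column names `rate` and `pop` are invented for illustration; the calculation follows the description above and is not taken from the package internals.

```{r}
library(dplyr)

# Hypothetical flows: region A is split between X and Y, region B moves entirely to Y
toy_rel <- data.frame(
  from_code = c("A", "A", "B"),
  to_code   = c("X", "Y", "Y"),
  rate      = c(40, 40, 10),  # relative value of the departing region, repeated per flow
  pop       = c(30, 10, 20)   # invented population flow from from_code to to_code
)

toy_rel_converted <- toy_rel %>%
  group_by(to_code) %>%
  # Average of the incoming rates, weighted by the size of each population flow
  summarise(rate_conv = sum(rate * pop) / sum(pop))

toy_rel_converted
```

Region `X` only receives population from `A` and therefore inherits its rate of 40. Region `Y` mixes a small flow from `A` with a larger flow from `B`, which gives (40 x 10 + 10 x 20) / 30 = 20.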
```{r, include = F}
convert_rel <- conversion_m_long %>%
group_by(to_code, to_version) %>%
summarise(P_MHAB_conv = round(sum(P_MHAB * pop18) / sum(pop18), 0)) %>%
ungroup()
# Illustrate calculation
to_code_vec = unique(convert_abs$to_code)
nums = list()