# Combining variables using case distinctions

In [None]:
options(jupyter.rich_display=FALSE) # Create output as usual in R

The following makes use of the *memisc* package. You may need to install it
from [CRAN](https://cran.r-project.org/package=memisc) using the code
`install.packages("memisc")` if you want to run this on your computer. (The package is already installed on
the notebook container, however.)

In [1]:
library(memisc)

Loading required package: lattice

Loading required package: MASS


Attaching package: ‘memisc’


The following objects are masked from ‘package:stats’:

    contr.sum, contr.treatment, contrasts


The following object is masked from ‘package:base’:

    as.array




The following code works with example data from the 2017 German Longitudinal
Election Study (GLES). It combines pre- and post-election variables into a single
party-preference variable each for the first (candidate) vote and the second (list) vote.

In order to run this notebook successfully, you have to download the data file from [GESIS](https://doi.org/10.4232/1.13236) and upload it to the virtual machine on which this notebook runs. To do this:

1. Pull down the "File" menu and select "Open".
2. An overview of the folder that contains the notebook opens.
3. The folder view has a button labelled "Upload". Use it to upload the file that you downloaded from the GESIS website. Its name should be `ZA6802_en_v3-0-1.sav`.

Note that the uploaded data will disappear once you "Quit" the notebook (and the Jupyter instance).
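A quick way to confirm that the upload worked before loading the data is to look for the file in the working directory. This is only a sketch: `data_file` is a name introduced here for illustration, and the check assumes that the notebook's working directory is the folder the file was uploaded to.

```r
# File name as given in the upload instructions above
data_file <- "ZA6802_en_v3-0-1.sav"

# file.exists() is base R and resolves the name relative to the
# current working directory
if (file.exists(data_file)) {
  message("Found ", data_file, " in ", getwd())
} else {
  message(data_file, " not found in ", getwd(), ". Upload it before continuing.")
}
```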

In [2]:
gles2017.sav <- spss.system.file("ZA6802_en_v3-0-1.sav")
description(gles2017.sav[1:30])

File character set is 'UTF-8'.

Converting character set to the local 'utf-8'.

“10 variables have duplicated labels:
  vn2d, v10, vn141_i88, vn141_i08, vn150_i88, vn150_i08, vn158_i88,
  vn158_i08, vn163_i88, vn163_i08”



 study        'Study number (ZA-No.)'                                               
 version      'GESIS Archive Version'                                               
 doi          'Digital Object Identifier'                                           
 year         'Survey year'                                                         
 field        'Field period'                                                        
 glescomp     'GLES component'                                                      
 survey       'Survey/wave'                                                         
 survey1      'Survey/wave (dummy)'                                                 
 lfdn         'Serial number (Cumulation)'                                          
 vlfdn        'Serial number (Pre-election Cross Section)'                          
 nlfdn        'Serial number (Post-election Cross Section)'                         
 intnum       'Number of interviewer'                           

In [3]:
# Pick the turnout and vote variables and rename them (new.name = old.name)
gles2017.vote <- subset(gles2017.sav,
                           select=c(
                               survey = survey1,
                               pre.turnout.int = v10,
                               post.turnout = n10,
                               pre.voteint.first = v11ab,
                               pre.voteint.second = v11bb,
                               post.vote.first = n11ab,
                               post.vote.second = n11bb,
                               pre.postvote.first = v12ab,
                               pre.postvote.second = v12bb
                           ))
codebook(gles2017.vote)


   survey 'Survey/wave (dummy)'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Missing values: -Inf - -1

   Values and labels         N Percent
                                      
   0   'Pre-election'     2179    50.8
   1   'Post-election'    2112    49.2


   pre.turnout.int 'Vote intention'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Missing values: -Inf - -1

   Values and labels                           N Valid Total
                                                            
   -98 M 'Don't know'                          7         0.2
   -97 M 'Not applicable'                     62         1.4
   -94 M 'Not in sampling frame'            2112        49.2
     1   'Certain to vote'                  1671  79.2  38.9
     2   'Likely to vote'                    151   7.2   3.5
     3   'Might vote'   

In [4]:
gles2017.vote <- within(gles2017.vote,{
  # cases() checks the conditions in order; where several hold, the first one applies.
  vote.first <- cases(
              survey == 0 & pre.turnout.int == 6 -> pre.postvote.first,
              survey == 0 & pre.turnout.int %in% 4:5 -> -85, # labelled 'No turn out'
              survey == 0 & pre.turnout.int %in% 1:3 -> pre.voteint.first,
              survey == 1 & post.turnout == 1 -> post.vote.first,
              survey == 1 & post.turnout == 2 -> -85,
              TRUE -> -97 # catch-all, labelled 'Not applicable'
            )
  # Same logic for the second (list) vote
  vote.second <- cases(
              survey == 0 & pre.turnout.int == 6 -> pre.postvote.second,
              survey == 0 & pre.turnout.int %in% 4:5 -> -85,
              survey == 0 & pre.turnout.int %in% 1:3 -> pre.voteint.second,
              survey == 1 & post.turnout == 1 -> post.vote.second,
              survey == 1 & post.turnout == 2 -> -85,
              TRUE -> -97
            )
  # Turn the results into survey items that reuse the existing value labels
  vote.first <- as.item(vote.first, labels = labels(pre.postvote.first))
  vote.second <- as.item(vote.second, labels = labels(pre.postvote.second))
  # Only codes 1 to 900 are valid; the negative codes count as missing
  valid.range(vote.first) <- valid.range(vote.second) <- c(1,900)
})

“conditions are not mutually exclusive”
“conditions are not mutually exclusive”
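The two warnings come from the catch-all `TRUE` condition, which overlaps with every condition listed before it. The overlap is harmless here because the first matching condition determines the result. The following sketch reproduces the pattern on made-up vectors. The names `wave`, `intention`, `reported`, `combined` and `vote`, and the value labels, are hypothetical and not taken from the GLES file. It also assumes a *memisc* version in which `check.xor` accepts `"ignore"`, which switches off the exclusivity check.

```r
library(memisc)

# Hypothetical toy data: wave 0 = pre-election, wave 1 = post-election,
# wave 2 = a stray code that matches neither of the first two conditions
wave      <- c(0, 0, 1, 1, 2)
intention <- c(4, 5, NA, NA, NA)
reported  <- c(NA, NA, 1, 6, NA)

# Each row takes the value of the first condition that holds for it.
# TRUE overlaps with both earlier conditions, so the default setting
# would warn that the conditions are not mutually exclusive.
combined <- cases(
  wave == 0 -> intention,
  wave == 1 -> reported,
  TRUE      -> -97,
  check.xor = "ignore"
)

# Attach (hypothetical) labels and declare 1 to 900 as the valid range,
# so that the negative code is treated as missing
vote <- as.item(combined,
                labels = c("Not applicable" = -97,
                           "Party A" = 1,
                           "Party B" = 4))
valid.range(vote) <- c(1, 900)

print(combined)
print(is.missing(vote))
```

Only the last element falls through to the catch-all, so it alone receives `-97` and is flagged as missing.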


In [5]:
codebook(gles2017.vote[c("vote.first","vote.second")])


   vote.first

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Valid range: 1 - 900

   Values and labels                    N Valid Total
                                                     
   -99 M 'No answer'                  187         4.4
   -98 M 'Don't know'                 263         6.1
   -97 M 'Not applicable'             150         3.5
   -85 M 'No turn out'                323         7.5
   -83 M 'Invalid vote'                25         0.6
     1   'CDU/CSU'                   1318  39.4  30.7
     2   'CDU'                          0   0.0   0.0
     3   'CSU'                          0   0.0   0.0
     4   'SPD'                        826  24.7  19.2
     5   'FDP'                        230   6.9   5.4
     6   'GRUENE'                     309   9.2   7.2
     7   'DIE LINKE'                  320   9.6   7.5
   126   'BP'                           3   0.1   0.1
   149   'GRAUE'   