Skip to content

The second example

Manuel Reif edited this page Oct 26, 2013 · 5 revisions

As a second example we use again the austrian PIAAC public use file but we are considering just persons who are labeled as employed or unemployed and skip those who are out of labour force. We are doing this because we want to estimate a logistic regression to get an idea which variables increase the risk of being unemployed and which don't. So we have to build a subsample of the data set in a first step as shown in the code below. We are using this data.frame now to build another survey design with the svrepdesign() function.

d3 <- subset(d2, subset=C_D05 != "Out of the labour force")

d3$emp <- droplevels(d3$C_D05)
sd3 <- svrepdesign(variables=d3, repweights=d3[,nr.SPFWT_rep], weights=d3[,nr.SPFWT0], type="JK1")

After we built this object, we have already everything we need to start our estimation procedure. We use the svyPVglm() function to fit a logistic regression with two variables. That's is not a very sophisticated model (the purpose is here to demonstrate the routine), but feel free to estimate any model you like - we try to keep the output as short as possible.

Note that the notation for this function is slightly different to the notation we have seen before. We have to provide a formula statement. In this example the variable emp should be explained by GENDER_R and PVLIT... The notation PVLIT.. means that the variable starts with PVLIT and for the ellipses the numbers of the vector placeholder is imputed. So this function successively uses PVLIT1, PVLIT2, ... , PVLIT10 as predictor variables. This results in 10 different estimations of the model. Fortunately the svyPVglm() function neatly summarizes this model into one output.

res2 <- svyPVglm(formula= emp  ~ GENDER_R +  PVLIT.., design=sd3, placeholder=1:10, family=binomial())

res2

$coef
                     mean        se   t.value       Pr.t
(Intercept)    -1.1739159 0.5220458 -2.248684 0.02731835
GENDER_RFemale  0.1801614 0.1796719  1.002724 0.31905533
PVLIT1         -0.0074055 0.0019518 -3.794150 0.00028875

$mod.fit
 Working.2.logLR          p
        16.55812 0.00062525

Keep in mind which was the reference category for the employment variable (you can check this by typing levels(d3$emp) - R uses the first category as reference category as default).

To pick out one result from this logistic regression model: When our skills in Literacy are increasing, our chance of being unemployed is decreasing. This sounds reasonable. For example, if we compare two guys who differ about 30 points on the Literacy scale, the risk for the 'smarter' guy being unemployed is smaller by a factor of ~ 0.80 compared to the less 'smarter' guy.

exp(-0.0074055*30)
[1] 0.8007832

Here ends the second tutorial - keep exploring the other functions which are estimating for instance:

  • proficiency levels
  • t-Test
  • ...

Clone this wiki locally