This repository has been archived by the owner on Apr 24, 2023. It is now read-only.
/
prepare.Rmd
160 lines (113 loc) · 5.78 KB
/
prepare.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
---
title: "Preparing data (reporttoolDT)"
author: "Kristian D. Olsen"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Preparing data}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
## Foreword
Before we can generate a report, we need to "prepare" the data with the necessary information. This vignette gets into some more detail on the hidden fields in the `Survey` class. In this example, we'll use the same dataset as was shown in the "introduction" vignette and complete a Survey for PLS-PM modelling.
#### 1. Read the data (seamless)
```{r}
require(reporttoolDT)
sav <- read_data(system.file("extdata", "raw_data.sav", package = "reporttoolDT"))
class(sav)
```
#### 2. Convert to `Survey`.
You can convert any `data.frame` to a `Survey` using `survey()`, or you can use one of the following functions which also coerces the input to a specific type of `data.frame`:
- `survey_df()`: A regular R `data.frame`.
- `survey_dt()` A `data.table`. Read more [here](https://github.com/Rdatatable/data.table/wiki).
- `survey_tbl()`, requires [dplyr](https://github.com/hadley/dplyr).
For this example, we can use a regular `data.frame`:
```{r}
srv <- survey_df(sav)
class(srv)
```
#### 3. Labels
When the data is read from SPSS, the labels are collected from the file. We can check the labels in several ways, but `model()` works best to get an overview:
```{r}
model(srv)
```
The output from model shows us the column number, name, type in parantheses, and their labels. Let's say we wanted to change the labels to say "Loyalty" instead of just "loyal". We can do the following:
```{r}
srv <- set_label(srv, q10 = "Loyalty 1", q15b = "Loyalty 2")
# To see that the labels have changed:
get_label(srv, c("q10", "q15b"))
```
If we wanted to change the label for several variables, we could also supply a list:
```{r}
new_labels <- list(q10 = "Loyal 1", q15b = "Loyal 2")
srv <- set_label(srv, .list = new_labels)
# Same result as above.
```
#### 4. Association
We also need to specify which variables are associated with which latent construct, we can specify them as follows:
```{r}
srv <- set_association(srv, image = c("q4a", "q4b"))
model(srv)
```
As you can see from the output, **q4a** and **q4b** have a star next to them which indicates that the variables have an association. `set_association()` can also look for common latent associations, based on the name of the variables:
```{r}
srv <- set_association(srv, .common = TRUE)
# Run model(srv) to see the result
```
Since all of our variables follow the naming convention, all associations have been identified.
#### 5. Config
We also need to set the config for the survey. Most importantly, we need to set the cutoff to use for valid observations:
```{r}
srv <- set_config(srv, name = "Example", segment = "B2C", cutoff = .3)
```
#### 6. Marketshares
In order to be able to weight variables, we also need to specify the marketshare for each company in our study. After the **mainentity** association is set, we can do this using `set_marketshare()`:
```{r}
srv <- set_marketshare(srv, CompanyA = .3, CompanyB = .5, CompanyC = .2)
entities(srv)
```
Above, I have used the `entities()` function which gives you an overview of the entities in the data, and their marketshare if it is set. For the column "Valid", the values are NA because we have not calculated the percentage of missing values on variables associated with latents.
#### 7. Latents (mean)
At this point we have two choices, calculate latent scores as a mean to do a "topline", or prepare it for PLS modelling using the PLS wizard, using the two functions `latents_pls()` and `latents_mean()`. The former does the following:
- Adds `EM` variables to the data (based on associations).
- Calculates the missing percentage (`percent_missing`).
- It adds a `coderesp` variable to the data (1 to N).
- Also, it calculates latents (mean) for each respondent.
```{r}
srv_mean <- latents_mean(srv)
# model(srv_mean)
```
After running `latents_mean()` we see that the labels are empty. To fix this we could have `set_translation()` before adding the latents, or `set_translation()` and use the `.auto` argument for `set_label()`:
```{r}
srv_mean <- set_translation(srv_mean, .language = "english")
srv_mean <- set_label(srv_mean, .auto = TRUE)
tail(model(srv_mean))
```
With missing percentage calculated, we can check `entities()` again to see the number of valid observations:
```{r}
entities(srv_mean)
```
#### 8. PLS-wizard
Let's instead run `latents_pls()` to generate input for the PLS-wizard:
```{r}
srv <- latents_pls(srv)
tail(model(srv))
```
Here, the EM variables and latents are not included - only `percent_missing` and `coderesp`. This (in addition to the associations we have set) is enough to create the input files for the PLS-wizard. To create the files, simply run:
```{r}
# write_survey(srv, file = getwd())
```
The second argument `file` is the path to where you would like to store the `Survey`. `write_survey()` will always write a separate `...(Survey).Rdata` file which contains all the hidden information for the Survey, which you can get back again by using `read_survey()`. If `file` is a directory, it will create a new directory **Data** and store the SPSS file, as well as **Input** which contains all the input files for the PLS-wizard.
## More information
[Introduction](https://github.com/itsdalmo/reporttoolDT/blob/master/vignettes/introduction.Rmd):
```{r, eval = FALSE}
vignette("introduction", package = "reporttoolDT")
```
[Survey-class](https://github.com/itsdalmo/reporttoolDT/blob/master/vignettes/survey.Rmd):
```{r, eval = FALSE}
vignette("survey", package = "reporttoolDT")
```
[Other functions](https://github.com/itsdalmo/reporttoolDT/blob/master/vignettes/other.Rmd):
```{r, eval = FALSE}
vignette("other", package = "reporttoolDT")
```