Error when the amount of factor levels are unequal #5

Closed
Taarnborg opened this issue Jul 25, 2013 · 8 comments

Comments

@Taarnborg

Hi, I get an error when creating a plot item with the likert function. The error occurs because the factor levels differ between variables. This is not a mistake in the data, just a consequence of a small survey sample, and in theory it could happen with larger data sets as well (see the example below).

data  <- data.frame(
  q1 = ordered(sample(c("Strongly Agree", "Agree", "Disagree", "Strongly disagree"),500, replace=T)),
  q2 = ordered(sample(c("Strongly Agree", "Disagree", "Strongly disagree"),500, replace=T)),
  q3 = ordered(sample(c("Strongly Agree", "Agree", "Disagree", "Strongly disagree"),500, replace=T)),
  q4 = ordered(sample(c("Strongly Agree", "Agree", "Strongly disagree"),500, replace=T)))

# Doesn't work
item1          <- likert(data)

I would like to suggest adding a way to deal with this to your function. Below is some code that fills in the "missing levels". It is not ready to be integrated into your code yet, but maybe you will have a better take on it than I do.

# Append missing levels
n.levels   <- sapply(data,nlevels)
max.levels <- levels(data[,which.max(n.levels)])

for (i in seq_along(data)) {
  mis.lev = which(!max.levels %in% levels(data[,i]))
  levels(data[,i]) = append(levels(data[,i]),max.levels[mis.lev])
}

# Now it plots, but the values for q4 and q2 are wrong.
item1          <- likert(data)
plot(item1)
@jbryer
Owner

jbryer commented Jul 26, 2013

This is not a bug, but we did add a check that verifies the number of levels is the same for all items and provides a better error message.

@jbryer jbryer closed this as completed Jul 26, 2013
@Taarnborg
Author

I understand it's not a bug, so this is just a suggestion that you can take or leave as you please.

I also apologize if it was wrong to raise this as an issue when it is not a bug. I'm new to GitHub, so I'm not aware of the "meta gaming" in here.

However, if I am to use the functions in the package, I have to add something like the code suggested above. Below is the code I use as a supplement to yours, along with a picture of the plot I get from it.

data  <- data.frame(
  q1 = sample(c("Strongly Agree", "Agree", "Disagree", "Strongly Disagree"),500, replace=T),
  q2 = sample(c("Agree", "Disagree", "Strongly Disagree"),500, replace=T),
  q3 = sample(c("Strongly Agree", "Agree", "Disagree", "Strongly Disagree"),500, replace=T),
  q4 = sample(c("Strongly Agree", "Agree", "Strongly Disagree"),500, replace=T),
  stringsAsFactors = TRUE)   # explicit, because R >= 4.0 no longer converts strings to factors by default

desired.order <- c("Strongly Disagree","Disagree", "Agree", "Strongly Agree")

# Append missing levels
n.levels   <- sapply(data,nlevels)
max.levels <- levels(data[,which.max(n.levels)])

for (i in seq_along(data)) {
  mis.lev = which(!max.levels %in% levels(data[,i]))
  levels(data[,i]) = append(levels(data[,i]),max.levels[mis.lev])
}

# Order levels
require(Epi)

for (i in seq_along(data)) {
  data[,i] = Relevel(data[,i],desired.order)   # desired.order must be specified beforehand
}

# Now it plots. 
item1          <- likert(data)
plot(item1)

[image: plot produced by the code above]

@jbryer
Owner

jbryer commented Jul 30, 2013

No worries about using the bug system; it seems like a fine way to share information. You are not the first to have this issue. I have checked in a demo that shows how to handle columns with unused levels. You can see it at https://github.com/jbryer/likert/blob/master/demo/UnusedLevels.R or type demo('UnusedLevels') to run it from within R.

With your specific example, the following would simplify what you have:

desired.order <- c("Strongly Disagree","Disagree", "Agree", "Strongly Agree")
data  <- data.frame(
      q1 = factor(sample(c("Strongly Agree", "Agree", "Disagree", "Strongly Disagree"),500, replace=T), levels=desired.order),
      q2 = factor(sample(c("Agree", "Disagree", "Strongly Disagree"),500, replace=T), levels=desired.order),
      q3 = factor(sample(c("Strongly Agree", "Agree", "Disagree", "Strongly Disagree"),500, replace=T), levels=desired.order),
      q4 = factor(sample(c("Strongly Agree", "Agree", "Strongly Disagree"),500, replace=T), levels=desired.order))

This will work now:

plot(likert(data))

Side note, you can also add ordered=TRUE to the factor call to be even more correct, but internally we consider the factor ordered even if it isn't by using levels(myitem).
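
If the data frame already exists (for instance, because it was read from a file), the same idea can be applied after the fact. The following is a minimal base-R sketch and not part of the likert package: `harmonize_levels` is a hypothetical helper name, and it assumes that every observed response appears in `desired.order`. Re-creating each factor with explicit levels keeps the level order identical across columns, which appending levels one column at a time does not guarantee.

```r
# Hypothetical helper: give every column the same levels, in the same order.
harmonize_levels <- function(df, lvls) {
  df[] <- lapply(df, function(x) factor(as.character(x), levels = lvls))
  df
}

desired.order <- c("Strongly Disagree", "Disagree", "Agree", "Strongly Agree")

set.seed(1)  # only to make the made-up sample reproducible
data <- data.frame(
  q1 = sample(desired.order, 50, replace = TRUE),
  q2 = sample(desired.order[-4], 50, replace = TRUE),  # "Strongly Agree" is never observed
  stringsAsFactors = TRUE
)

fixed <- harmonize_levels(data, desired.order)

sapply(fixed, nlevels)   # every column now reports 4 levels
table(fixed$q2)          # the unobserved level is kept with a count of 0
```

The resulting `fixed` data frame could then be passed to `likert()` in the same way as the example above.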

@jbryer
Owner

jbryer commented Jul 30, 2013

One more thing... have a look at the recode and reverse.levels functions in the package. They may help too.

@Taarnborg
Author

Thanks a lot, demo('UnusedLevels') worked really well for me. I have two very small enhancement suggestions regarding the plot:

  1. Make it possible to sort by positive, negative, or item (i.e. not sort).
  2. If the axis labels are long, insert line breaks.

Below is a chunk of code I have used earlier to insert line breaks into levels before plotting. It beats strwrap because it doesn't mess up the encoding. It works as follows: after 20 characters, find the first whitespace and insert a line break; repeat until the end.

levels(df$v1)  <- gsub('(.{20})\\s+', '\\1\n', levels(df$v1))   # the pattern has one capture group, so only \\1 is needed
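
To see what this substitution does, here is a small base-R sketch on a made-up label (the label text is an assumption for illustration only). The pattern has a single capture group, so the replacement uses only `\\1`.

```r
# A made-up label; it contains single spaces only, so the round-trip check below holds.
label <- "Strongly disagree with the statement as it is written here"

# Break at the first run of whitespace that follows 20 arbitrary characters,
# then repeat on the remainder of the string.
wrapped <- gsub("(.{20})\\s+", "\\1\n", label)

cat(wrapped, "\n")

# Turning the inserted line breaks back into spaces recovers the original label.
identical(gsub("\n", " ", wrapped, fixed = TRUE), label)
```

Because the whitespace is replaced rather than kept, no trailing spaces are left at the end of each wrapped line.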

@jbryer
Owner

jbryer commented Aug 1, 2013

I'm glad the demo helped. Regarding your suggestions, both features are already there. There are a lot of parameters on the likert.bar.plot function (?plot.likert should point to ?likert.bar.plot), so I understand they are easy to miss. I believe ordered=FALSE will achieve your first point. The wrap and wrap.grouping parameters wrap long labels for items and group names, respectively. The default is 50 or 100, so it may be too long for your figure.


@jcpsantiago

I've been trying to wrap my head around this for quite a few hours now, so I give up.

My data.frame:

head(a)

# A tibble: 6 × 21
                        `1`                    `2`                       `3`                       `4`                    `5`                       `6`
                     <fctr>                 <fctr>                    <fctr>                    <fctr>                 <fctr>                    <fctr>
1       Stimme ein wenig zu     Stimme ziemlich zu    Stimme einigermaßen zu    Stimme einigermaßen zu     Stimme ziemlich zu        Stimme ziemlich zu
2       Stimme ein wenig zu    Stimme ein wenig zu Stimme überhaupt nicht zu Stimme überhaupt nicht zu Stimme einigermaßen zu       Stimme ein wenig zu
3 Stimme überhaupt nicht zu    Stimme ein wenig zu Stimme überhaupt nicht zu Stimme überhaupt nicht zu Stimme einigermaßen zu Stimme überhaupt nicht zu
4 Stimme überhaupt nicht zu    Stimme ein wenig zu Stimme überhaupt nicht zu Stimme überhaupt nicht zu    Stimme ein wenig zu Stimme überhaupt nicht zu
5   Stimme voll und ganz zu     Stimme ziemlich zu       Stimme ein wenig zu Stimme überhaupt nicht zu     Stimme ziemlich zu Stimme überhaupt nicht zu
6        Stimme ziemlich zu Stimme einigermaßen zu       Stimme ein wenig zu Stimme überhaupt nicht zu    Stimme ein wenig zu Stimme überhaupt nicht zu
# ... with 15 more variables: `7` <fctr>, `8` <fctr>, `9` <fctr>, `10` <fctr>, `11` <fctr>, `12` <fctr>, `13` <fctr>, `14` <fctr>, `15` <fctr>, `16` <fctr>,
#   `17` <fctr>, `18` <fctr>, `19` <fctr>, `20` <fctr>, `21` <fctr>

str(a)

Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	44 obs. of  21 variables:
 $ 1 : Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 2 2 1 1 5 4 2 3 2 3 ...
 $ 2 : Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 4 2 2 2 4 3 4 3 3 3 ...
 $ 3 : Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 3 1 1 1 2 2 1 2 1 2 ...
 $ 4 : Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 3 1 1 1 1 1 1 2 1 1 ...
 $ 5 : Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 4 3 3 2 4 2 3 4 3 4 ...
 $ 6 : Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 4 2 1 1 1 1 3 4 2 4 ...
 $ 7 : Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 3 2 2 1 4 3 5 4 3 3 ...
 $ 8 : Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 2 1 1 1 2 1 1 2 1 1 ...
 $ 9 : Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 3 3 2 2 5 5 5 5 4 4 ...
 $ 10: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 5 2 1 1 5 5 4 3 2 3 ...
 $ 11: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 2 2 1 1 1 1 4 4 2 3 ...
 $ 12: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 2 1 2 1 2 4 3 1 3 3 ...
 $ 13: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 2 2 1 1 1 1 1 2 2 2 ...
 $ 14: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 3 2 1 1 3 2 3 5 4 4 ...
 $ 15: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 3 3 2 1 4 4 5 5 3 4 ...
 $ 16: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 2 1 1 1 3 4 2 2 1 2 ...
 $ 17: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 2 1 1 1 1 1 4 3 3 3 ...
 $ 18: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 2 1 1 1 3 3 4 4 2 3 ...
 $ 19: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 2 1 1 1 4 3 3 3 1 2 ...
 $ 20: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 1 1 2 1 4 5 5 5 3 3 ...
 $ 21: Factor w/ 5 levels "Stimme überhaupt nicht zu",..: 2 2 1 1 5 3 2 3 3 3 ...

Going through the examples in the demo, nothing suggests that my columns have different numbers of levels, so I don't understand why the likert command always complains about different levels.

Previously this dataframe looked like this:


# A tibble: 6 × 9
  subjectNumber expDay      bmi treatment tones    hour          realHour   item                     value
          <dbl>  <chr>    <dbl>    <fctr> <dbl>   <dbl>             <chr> <fctr>                    <fctr>
1             1     N2 22.53086   Control     0 0.34375 0.338194444444444      1       Stimme ein wenig zu
2             2     N1 22.53086   Control     0 0.34375             10:59      1 Stimme überhaupt nicht zu
3             3     N1 21.06674   Control     0 0.34375 0.343055555555555      1   Stimme voll und ganz zu
4             4     N1 21.53491   Control     0 0.34375             08:04      1       Stimme ein wenig zu
5             5     N2 19.16735   Control     0 0.34375              8:00      1       Stimme ein wenig zu
6             6     N1 24.85837   Control     0 0.34375              7:59      1       Stimme ein wenig zu

and I used dplyr (plus spread, which comes from tidyr) to reshape it into the likert-friendly format above:

a <- PFS %>%
  arrange(subjectNumber) %>%
  select(c(subjectNumber, treatment, item, value)) %>%
  spread(item, value) %>%
  select(-c(subjectNumber, treatment))

As soon as the data was imported into R, the "value" variable was turned into a factor with 5 levels.
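
One way to check whether the columns really share a single factor structure is to compare their levels directly rather than only counting them. This is a base-R sketch on made-up data: `a_demo` is a stand-in for the data frame `a` above, and nothing here is taken from likert's own checks.

```r
# Stand-in data: both columns have 2 levels, but one label differs in case.
a_demo <- data.frame(
  `1` = factor(c("Agree", "Disagree", "Agree")),
  `2` = factor(c("Agree", "disagree", "Agree")),
  check.names = FALSE
)

sapply(a_demo, nlevels)   # the counts match even though the labels do not

# Compare each column's levels (labels and order) with those of the first column.
same_as_first <- sapply(a_demo, function(x) identical(levels(x), levels(a_demo[[1]])))
same_as_first

# List the distinct level sets; more than one entry means the columns disagree.
unique(lapply(a_demo, levels))
```

Running the same three calls on `a` would show whether the level labels and their order are identical in all 21 columns.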

@jbryer
Owner

jbryer commented Jan 4, 2017

Couple of notes:

  • Yes, each likert call should only contain factors of the same type, in both number of levels and level labels. Regardless of whether you use this package or not, it does not make sense to create a bar chart (or any other plot type, in fact) from differing factor structures.

  • The likert function does some data quality checks, not sure why you are complaining about this. There is a demo that demonstrates this functionality: https://github.com/jbryer/likert/blob/master/demo/UnusedLevels.R

  • The tibble package is relatively new and I have not tested it with this package. My advice is to use a base data.frame.

  • Not sure how you read in your data, but look for the stringsAsFactors parameter. My advice: set it to FALSE and convert your qualitative variables to factors manually.
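
The last two points can be sketched in base R as follows. The CSV text, the column names, and the two-level response scale are all made up for illustration; in practice the input would be a file path and the scale would be the five German response labels.

```r
# Made-up survey export; in practice this would be a file path.
csv_text <- "subject,item1,item2
1,Agree,Disagree
2,Disagree,Disagree
3,Agree,Agree"

# Keep strings as character on import (explicit, whatever the default of the R version is).
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

lvls  <- c("Disagree", "Agree")            # assumed response scale, lowest to highest
items <- raw[, c("item1", "item2")]
items[] <- lapply(items, factor, levels = lvls)

# If the data arrives as a tibble, as.data.frame() returns a plain data.frame.
items <- as.data.frame(items)

str(items)
```

Converting manually this way means every item gets the same levels in the same order, regardless of which responses happen to occur in each column.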
