Mismatches between significance test result of R and SPSS #100

khanhhtt · 2022-11-24T06:46:02Z

Thank you for the great package. This thread is for asking a question rather than reporting an issue.
I have a couple of questions about mismatches between the significance test output of R and SPSS, and I would appreciate your help:

1. Rounding: it appears that when a cell percentage ends in exactly .5 (e.g. 12.5), it is rounded down to 12 instead of up to 13 in the output of the significance test. Rounding works as expected when the test is not performed.
This is the R script I used:

# ==============================================================================
# Required packages
# ==============================================================================
library(haven)
library(expss)

# ==============================================================================
# Required functions
# ==============================================================================
# Source: https://github.com/gdemin/expss/issues/28
empty_to_zero = function(tbl, value_to_add = 0, digits = get_expss_digits()){
  # for numerics
  if_na(tbl) = value_to_add
  # for characters after significance testing
  for(i in seq_along(tbl)[-1]){
    if(is.character(tbl[[i]])){
      empty = grepl("^\\s*$", tbl[[i]])
      max_padding = max(nchar(gsub("^(.+?)(\\s*)$", "\\2", tbl[[i]], perl = TRUE)))
      replacement = paste(c(format(value_to_add, nsmall = digits), rep(" ", max_padding)), collapse = "")
      tbl[[i]][empty] = sub("\\s?", replacement, tbl[[i]][empty], perl = TRUE)
    }
  }
  tbl
}

add_percent = function(x, digits = get_expss_digits(), excluded_rows = "count", ...){
  nas = is.na(x)
  x[nas] = ""
  
  cols_idx = 2:dim(x)[2]
  
  for (col in cols_idx) {
    for (row in 1:dim(x)[1]){
      if (!grepl(excluded_rows, x[row, 1], perl = TRUE)){
        if (suppressWarnings(is.na(as.numeric(as.character(x[row,col]))))) {
          x[row,col] = sub(" ", "% ", trimws(x[row,col]))
        }
        else {
          x[row,col] = paste0(trimws(x[row,col]), "%")
        }
      }  
    }
  }
  x <- x[!grepl("Std. dev.", x$row_labels),]
  x <- x[!grepl("Unw. valid N", x$row_labels),]
  x
}

# ==============================================================================
# Report
# ==============================================================================
### Get data
df <- haven::read_sav("01. Data - Cleaned.sav") 

### Results sigtest rounded to 0 decimal

tbl_1 <- df %>% tab_cols(total(), q3_BG) %>%
  tab_cells(q4) %>%
  tab_stat_cpct(total_label = c("Total"), 
                total_statistic = c("u_cases"),
                total_row_position = "above") %>%
  tab_last_sig_cpct(digits = 0,
                    subtable_marks = "greater",
                    sig_labels = LETTERS) %>%
  tab_pivot() %>%
  empty_to_zero(digits = 0) %>%
  add_percent(excluded_rows = "#")
tbl_1

### Results without sigtest

tbl_1 <- df %>% tab_cols(total(), q3_BG) %>%
  tab_cells(q4) %>%
  tab_stat_cpct(total_label = c("Total"), 
                total_statistic = c("u_cases"),
                total_row_position = "above") %>%
  # tab_last_sig_cpct(digits = 10,
  #                   subtable_marks = "greater",
  #                   sig_labels = LETTERS) %>%
  tab_pivot() %>%
  empty_to_zero(digits = 0) %>%
  add_percent(excluded_rows = "#")
tbl_1

And here is the SPSS syntax:

GET FILE '01. Data - Cleaned.sav'.

COMPUTE Totalt = 1.
EXECUTE.

* Sigtest.
CTABLES
    /VLABELS VARIABLES=q4 DISPLAY=LABEL
    /VLABELS VARIABLES=q3_BG DISPLAY=LABEL
    /TABLE q4 [C][COLPCT.COUNT PCT40.0, TOTALS [COUNT F40.0]] by (Totalt + q3_BG) [C]
    /SLABELS POSITION=ROW VISIBLE = NO
    /CATEGORIES VARIABLES=q4 EMPTY=INCLUDE TOTAL=YES POSITION=BEFORE
    /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=NONE ORIGIN=COLUMN INCLUDEMRSETS=YES 
    CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO.

Here is the comparison of the significance test results:

The results without the significance test still match:

2. Some pairwise comparisons are marked as significant in R but not in SPSS.
In particular, when a proportion is 1 (100%), R still performs the significance test.

However, the SPSS Statistics 22 Algorithms document (page 264) states that the test is not performed in this case.

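To make the SPSS rule concrete, here is a minimal R sketch of a pairwise comparison that skips the test when either column proportion is exactly 0 or 1. The helper spss_like_prop_p is hypothetical and not part of expss; it assumes the skip rule described above and otherwise simply calls base R's prop.test(), which is not necessarily the exact test SPSS applies.

```r
# Hypothetical helper (not part of expss): returns NA instead of a p-value
# when either proportion is exactly 0 or 1, mimicking the SPSS rule above;
# otherwise it falls back to base R's prop.test().
spss_like_prop_p = function(x, n, correct = TRUE) {
  p = x / n
  if (any(p == 0 | p == 1)) {
    return(NA_real_)  # comparison skipped
  }
  prop.test(x = x, n = n, correct = correct)$p.value
}

# 19 out of 19 is a 100% proportion, so the comparison is skipped (NA)
spss_like_prop_p(c(9, 19), c(85, 19))
# no edge-case proportion here, so a p-value is returned
spss_like_prop_p(c(20, 30), c(100, 60))
```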
I suspect I may be using the R functions incorrectly, which leads to the mismatches.
Could you please take a look and give me some advice on this matter?
The attachment contains the data file, the R script, the SPSS syntax, and the comparison of results between R and SPSS for your reference.
The first sheet of the Excel file has the significance results and the second sheet has the results without the test.

Thank you in advance!

Reproducible examples.zip


gdemin · 2022-11-25T13:13:52Z

Hi!
Thank you for the detailed report.

expss uses the built-in R round function. Its behaviour is documented:

Note that for rounding off a 5, the IEC 60559 standard (see also ‘IEEE 754’) is expected to be used, ‘go to the even digit’. Therefore round(0.5) is 0 and round(-1.5) is -2.
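A quick base R illustration of this documented behaviour, next to a naive half-up rounding with floor(x + 0.5). The floor version is shown for comparison only, is valid only for non-negative values, and is not how expss rounds.

```r
# Base R: exact halves go to the even digit (IEC 60559)
round(c(0.5, 1.5, 2.5, 12.5, 13.5, -1.5))
# [1]  0  2  2 12 14 -2

# Naive "round half up", non-negative values only
floor(c(0.5, 1.5, 2.5, 12.5, 13.5) + 0.5)
# [1]  1  2  3 13 14
```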

Personally, I would prefer to leave it as is, to be consistent with other parts of R. But if you consider it a serious issue, I can change this behaviour.

As for the second question, I will investigate it; perhaps it is a bug. However, it is rather strange that SPSS ignores 0% and 100% values. For sufficiently large bases, the R function which I use internally calculates significance for this edge case without any issues:

prop.test(x = c(9, 19), n = c(85, 19)) # x is number of successes, n - number of trials

#	2-sample test for equality of proportions with continuity correction
#
# data:  c(9, 19) out of c(85, 19)
# X-squared = 58.636, df = 1, p-value = 1.897e-14
# alternative hypothesis: two.sided
# 95 percent confidence interval:
#  -0.9917263 -0.7965090
# sample estimates:
#    prop 1    prop 2 
# 0.1058824 1.0000000
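The call above uses prop.test()'s default continuity correction. As a sketch, without the correction the chi-squared statistic equals the square of the pooled two-proportion z statistic, and that formula stays well defined when one proportion is 1, as long as the pooled proportion is strictly between 0 and 1. two_prop_z is a hypothetical helper; which variant SPSS uses is not assumed here.

```r
# Hypothetical helper: pooled two-proportion z statistic
two_prop_z = function(x, n) {
  p = x / n
  p_pool = sum(x) / sum(n)  # pooled proportion under H0: p1 == p2
  se = sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))
  (p[1] - p[2]) / se
}

x = c(9, 19)
n = c(85, 19)
z = two_prop_z(x, n)
# Pearson chi-squared without continuity correction
chisq = unname(prop.test(x = x, n = n, correct = FALSE)$statistic)
all.equal(z^2, chisq)
# [1] TRUE
```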

khanhhtt · 2022-11-26T07:51:52Z

Hi @gdemin,

Thank you for the quick and informative response.

As I get used to R, it was also a surprise to me that the built-in round function behaves differently from what I usually practice. I understand that consistency across R packages is valuable, and that this rounding behaviour may be useful for people in other fields. Still, it would be great to have an option to choose the rounding behaviour that suits my purpose. I think it would also ease concerns when people compare results with SPSS.
Looking forward to hearing from you on this.

khanhhtt · 2022-12-21T06:42:13Z

Hi @gdemin,

I hope you are doing well 😊
I would just like to know if there is any news on your side.
Do you plan to adjust the rounding in expss? If not, could you please suggest a workaround that would address my concern?

Many thanks!

gdemin · 2022-12-25T19:22:21Z

Hi @khanhhtt

I will add an option for rounding in the next version. But I can't promise when it will be ready.

As a workaround, you can set expss_digits(3) and then round the numbers with the code below:

library(expss)
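round2 = function(x, digits = 0) {
    # round half away from zero ("math rounding") instead of R's half-to-even
    posneg = sign(x)
    z = abs(x)*10^digits
    # the tiny tolerance compensates for binary representation error,
    # e.g. 2.675 is stored as slightly less than 2.675
    z = z + 0.5 + sqrt(.Machine$double.eps)
    z = trunc(z)
    z = z/10^digits
    z*posneg
}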

round_table_values = function(tbl, digits = 1){
    col_index = seq_along(tbl)[-1]
    cell_pattern = "^(.*?)([-0-9.]+)(.*?)$"
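    for(i in col_index){
        curr = tbl[[i]]
        if(is.character(curr)){
            # cells after significance testing look like "12.5 B": pull out the number
            numeric_values = suppressWarnings(as.numeric(gsub(cell_pattern, "\\2", curr)))
            numeric_values = round2(numeric_values, digits = digits)
            not_na_index = which(!is.na(numeric_values))
            # put the rounded number back, keeping the surrounding text (e.g. significance letters);
            # vapply always returns a character vector, even when no cell holds a number
            curr[not_na_index] = vapply(not_na_index,
                                        function(cell_index)
                                            gsub(cell_pattern,
                                                 paste0("\\1", numeric_values[cell_index], "\\3"),
                                                 curr[cell_index]),
                                        character(1))
        } else {
            # numeric columns are rounded directly
            curr = round2(curr, digits = digits)
        }
        tbl[[i]] = curr
    }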
    tbl
}

data(mtcars)
expss_digits(3)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (lb/1000)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

mtcars_table = cross_cpct(mtcars, 
                          list(cyl, gear),
                          list(total(), vs, am)
)

res = significance_cpct(mtcars_table)

round_table_values(res)
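A vectorised sketch of the same per-cell idea, assuming each character cell holds at most one number plus optional significance letters. The helpers half_up and round_cells are hypothetical (not part of expss); they reuse the cell_pattern regex and the round-half-away-from-zero logic of round2 above.

```r
# Hypothetical: round half away from zero, with a small floating-point tolerance
half_up = function(x, digits = 0) {
  sign(x) * trunc(abs(x) * 10^digits + 0.5 + sqrt(.Machine$double.eps)) / 10^digits
}

# Hypothetical: round the number inside each cell, keeping text around it
round_cells = function(cells, digits = 0) {
  pattern = "^(.*?)([-0-9.]+)(.*?)$"
  numbers = suppressWarnings(as.numeric(sub(pattern, "\\2", cells, perl = TRUE)))
  ok = !is.na(numbers)  # cells with no parsable number are left untouched
  prefix = sub(pattern, "\\1", cells[ok], perl = TRUE)
  suffix = sub(pattern, "\\3", cells[ok], perl = TRUE)
  cells[ok] = paste0(prefix, half_up(numbers[ok], digits), suffix)
  cells
}

round_cells(c("12.5 B", "33.3", "", "87.5 AC"))
# [1] "13 B"  "33"    ""      "88 AC"
```

Processing whole columns with sub() avoids the per-cell sapply()/vapply() loop; the trade-off is that it relies on every matching cell containing a single number.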

khanhhtt · 2022-12-27T07:08:30Z

Hi @gdemin,

That's great! Thank you so much for spending time on Christmas Day to give me the workaround.
This is all I need for now 😊

gdemin · 2023-07-16T21:51:17Z

Fixed in version 0.11.6.
Rounding is set with expss_round_half_to_even(FALSE).
For SPSS-style significance testing, there is an argument as_spss in significance_cpct and the other significance functions.

gdemin added a commit that referenced this issue Jun 24, 2023

Add option for rounding type Issue #100

480a18f

gdemin added a commit that referenced this issue Jun 24, 2023

Add function for rounding half to largest ("math rounding"). Issue #100

d884661

gdemin added a commit that referenced this issue Jun 25, 2023

Update documenation for expss.options. #100

69efacf

gdemin added a commit that referenced this issue Jul 2, 2023

Add argument 'as_spss' to 'significance_cpct' (issue #100)

9af112b

gdemin closed this as completed Jul 16, 2023
