
significance testing results different than SPSS #32

Closed
tracyyuqichen opened this issue Mar 22, 2021 · 19 comments

@tracyyuqichen

In comparing results from SPSS and Quantipy, we discovered that the dataset.crosstab() function gives different significance results than SPSS when comparing categorical distributions. I see in sandbox.py that a chi-square is used for calculating significance, which should give us the same results. Could this be because Quantipy does not recognize multivariate ordinal variables (Likert scales)?

@geirfreysson
Collaborator

Quantipy should most certainly work with Likert scale-type variables. The significance code has been tested quite a bit against Unicom/Dimensions, and the default settings replicate the Dimensions results. The parameters can be tweaked quite a bit, which can affect the results. Could you share the exact parameters you use when you run the sig-diff in SPSS?

@tracyyuqichen
Author

Thanks for responding so quickly! Someone else actually created the reference tables in SPSS, as I'm not very familiar with SPSS myself. What parameters would they be looking for?

@geirfreysson
Collaborator

geirfreysson commented Mar 22, 2021

I'm not familiar with sig-diff testing in SPSS. All I can do is get the chi-square results from the Analyze > Crosstabs menu, and that doesn't test each category, it just tests the overall distribution (I think).

I've added an SPSS file to the repository to make testing/comparison with SPSS easier: tests/Example Data (A).sav

This is the exact same data we use to test Quantipy.

Ask your colleague to recreate what they are seeing with that file and post it here. Maybe we can figure out what parameters SPSS uses for its tests and then see whether the SPSS results can be replicated with Quantipy.

@tracyyuqichen
Author

tracyyuqichen commented Mar 25, 2021

OK, so I think I have some idea of what might not be working. I recoded locality in both Python and SPSS, and I think the issue lies in the significance calculation after recoding. Here's the recode in SPSS for locality -> Region:

[screenshot: SPSS recode of locality into Region]

and then I run the syntax

CTABLES
  /VLABELS VARIABLES=locality ethnicity gender DISPLAY=LABEL
  /TABLE q2b [C][COUNT] BY Region [C]
  /CATEGORIES VARIABLES= q2b Region ORDER=A KEY=VALUE EMPTY=INCLUDE
  /CRITERIA CILEVEL=95
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN INCLUDEMRSETS=YES 
    CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO.

and get:

[screenshot: SPSS column-proportion comparison output]

Replicating the process in Python:

# define the meta for the new (un-duped) variable
meta['columns']['Region'] = {
    'type': 'delimited set',
    'text': {'en-GB': 'Locality Un-duped'},
    'values': [
        {'value': 1, 'text': {'en-GB': '1'}},
        {'value': 2, 'text': {'en-GB': '2'}},
        {'value': 3, 'text': {'en-GB': '3'}}
    ]
}

# recode locality into Region (frange expands the code ranges)
data['Region'] = recode(
    meta, data,
    target='Region',
    mapper={
        1: {'locality': 1},
        2: {'locality': frange('2-3')},
        3: {'locality': frange('4-5')}
    },
    append=False
)

# run the crosstab with column-proportion sig-tests
ds.crosstab('q2b', 'Region', sig_level=0.05)

and I get:

[screenshot: Quantipy crosstab output]

Although the difference is slight here, when I apply this to multiple un-duped variables crossed with 20+ x variables, I get a lot more significant Test-IDs than I do in SPSS. It seems that you do have a Bonferroni correction in place somewhere in sandbox.py, so I'm not sure why the discrepancies are happening.

@geirfreysson
Collaborator

geirfreysson commented Mar 25, 2021

Thanks for the very detailed report. I don't think this has anything to do with the recode itself, but with the Bonferroni correction, which isn't implemented in Quantipy.

If I run your SPSS script without the Bonferroni correction, the result matches Quantipy.

CTABLES
  /VLABELS VARIABLES=locality ethnicity gender DISPLAY=LABEL
  /TABLE q2b [C][COUNT] BY Region [C]
  /CRITERIA CILEVEL=95
  /CATEGORIES VARIABLES= q2b Region ORDER=A KEY=VALUE EMPTY=INCLUDE
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=NONE ORIGIN=COLUMN INCLUDEMRSETS=YES 
    CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO.

[screenshot: SPSS output with ADJUST=NONE, matching the Quantipy results]

I didn't write the sig-testing code myself, but I imagine the Bonferroni correction would happen somewhere here:
https://github.com/Quantipy/quantipy3/blob/master/quantipy/core/quantify/engine.py#L1977

and could be done with statsmodels:
https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html
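For reference, a minimal sketch of how the correction could be applied there (the p-values below are placeholders, not actual engine output):

import numpy as np
from statsmodels.stats.multitest import multipletests

# uncorrected p-values from the pairwise column tests (placeholder values)
pvals = np.array([0.004, 0.03, 0.19, 0.045])

# Bonferroni: reject[i] is True where the test is still significant
# at the family-wise alpha after adjustment
reject, pvals_corrected, _, _ = multipletests(pvals, alpha=0.05, method='bonferroni')
print(reject)           # [ True False False False]
print(pvals_corrected)  # [0.016 0.12  0.76  0.18 ]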

I'll probably close this ticket for now but create a new one called "implement bonferroni corrections in sig-tests" or something like that.

@tracyyuqichen
Author

Great, that sounds great! Thank you so much for pointing me to the engine.py file.

@tracyyuqichen
Author

Hi Geir, two new issues regarding the sig results:

  1. In the following screenshot you'll see that group A has a count of zero, but the sig test returns group A as one of the groups with a significant difference compared to the other groups, which should not be the case.
    [screenshot: crosstab where column A has a zero count but is still flagged as significantly different]
  2. I tried applying a Bonferroni correction via the sig_level argument of the crosstab() function, since what the Bonferroni correction does is lower the alpha at which the null hypothesis is rejected; the adjusted alpha is 0.00625, which should work as far as I can see (see the sketch below). But crosstab() just returns the crosstab without the sig view, and I'm genuinely stumped. Do you know whom I can consult on this specific issue? Thanks!
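For illustration, here's a minimal sketch of the adjustment I'm applying (the 8 comparisons and the variables are placeholders, chosen to match 0.05 / 8 = 0.00625):

# Bonferroni: divide the family-wise alpha by the number of pairwise comparisons
n_comparisons = 8
alpha = 0.05 / n_comparisons  # 0.00625
ds.crosstab('q2b', 'Region', sig_level=alpha)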

@geirfreysson
Collaborator

  1. The A means that columns B, D and E are significantly higher than column A. This doesn't have to be incorrect: if you run a political poll in which 0 people say they're voting for a fringe party and 100 people are voting for a mainstream party, the mainstream party has a significantly higher following than the party with 0 counts.

  2. Can you send me a code example of what exactly you are doing?

@tracyyuqichen
Author

tracyyuqichen commented Apr 8, 2021

# adjusted alpha: 0.05 divided by the number of pairwise comparisons
sig_level = 0.05 / 28
ds.crosstab('Q7', 'Region', sig_level=sig_level)

@geirfreysson
Collaborator

Thanks for that.

I can see now that there is a bug in the crosstab method's output styling that makes tests with alpha < 0.01 not show up in the results.

The result doesn't look as nice, but you can use the following to get the sig-test results:

import quantipy as qp

x = 'q5_3'
y = 'gender'

# build a stack holding the dataset's meta and data
stack = qp.Stack(name='sig',
                 add_data={'sig': {'meta': dataset.meta(),
                                   'data': dataset.data()}})

# add the x/y link with percentage and count views
stack.add_link(data_keys=['sig'],
               x=x,
               y=y,
               views=['c%', 'counts'])

# run the sig-test directly on the link, bypassing crosstab's styling
link = stack['sig']['no_filter'][x][y]
test = qp.Test(link, 'x|f|:|||counts')
test = test.set_params(level=sig_level)
df = test.run()
Question         gender
Values                1    2
Question Values
q5_3     1          NaN  NaN
         2          NaN  NaN
         3          NaN  NaN
         4          NaN  NaN
         5          NaN  NaN
         97         NaN  NaN
         98         NaN  NaN

@tracyyuqichen
Author

Fantastic, thank you for your help! Should I open up a new issue for this bug?

@geirfreysson
Collaborator

Yes please, that would be great!

@tracyyuqichen
Author

tracyyuqichen commented Apr 20, 2021

Hi @geirfreysson, sorry to bother you again, but is there a book or reference I can check out for the methodology behind the sig test? Or at least a name for the methodology? Also, is there a way to specify that I want to perform a chi-square test instead of the default t-test? Thank you!

@geirfreysson
Collaborator

Hi @tc423, no bother at all. The default sig-tests mimic SPSS, Dimensions and Askia, and are pairwise comparisons like the SPSS command COMPARETEST. Chi-square tests are also available, but I don't have any code examples at hand.

The SPSS documentation is here: COMPARETEST.

You should be able to use the "sandbox" in Quantipy to do a chi-square test; there is a method there called chi_sq (here).
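In the meantime, an overall chi-square test of independence can be run directly on the raw data with scipy; a minimal sketch, independent of the sandbox API (it assumes data is the DataFrame used earlier in this thread and that both variables are single-coded):

import pandas as pd
from scipy.stats import chi2_contingency

# observed contingency table (delimited sets would need expanding
# into dummy columns first)
observed = pd.crosstab(data['q2b'], data['locality'])

# tests independence for the whole table, not per category
chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p, dof)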

@tracyyuqichen
Author

tracyyuqichen commented Apr 27, 2021 via email

@tracyyuqichen
Author

tracyyuqichen commented Apr 27, 2021

I see that there's a weight engine, and I assume this is to help adjust for the data-overlap issue in the delimited-set data type, giving the weight of each individual response? On a somewhat related note, is there a way I can see the p-values between each group?

@geirfreysson
Collaborator

  1. The sig-tests deal correctly with multiple-response variables.
  2. The weight engine is the library that runs the RIM weighting algorithm. The sig-tests can correct for data overlap.
  3. You can't display the p-values easily in the output; you'd have to go into the engine library itself and output the values there. (A pairwise p-value can also be approximated outside the engine, see the sketch below.)
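A minimal sketch of such an approximation with statsmodels, using placeholder counts and bases (note that the engine applies additional corrections, e.g. for overlap between delimited-set columns, so results won't match exactly):

from statsmodels.stats.proportion import proportions_ztest

# two-proportion z-test for one pair of columns:
# counts = respondents choosing the category in each column, nobs = column bases
counts = [45, 30]
nobs = [200, 180]
zstat, pval = proportions_ztest(counts, nobs)
print(zstat, pval)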

Hope that helps!

@tracyyuqichen
Author

tracyyuqichen commented Apr 29, 2021

I see. I also just realized that all of this is documented in the engine.py file that I'm now trying to edit, so my apologies. Looks like we're back to the problem of not wanting to compare A to B if B is zero. I tried tweaking the source code, but it doesn't seem to be working:

def set_params(self, ...):
    ...
    if self.metric == 'proportions':
        ...
        # only keep a difference where both proportions are non-zero;
        # np.where keeps the mask element-wise, which also works when
        # p1 and p2 are arrays of column proportions
        self.valdiffs = np.array(
            [np.where((p1 != 0) & (p2 != 0), p1 - p2, 0)
             for p1, p2 in combinations(props, 2)]).T

The only difference is that I added a condition when calculating p1 - p2, but now when I run crosstab() with sig_level=0.05 it simply doesn't show any sig view. I don't think it's a math error, because plenty of my data has counts of 0 (and hence differences of zero) and that was never a problem before.

@geirfreysson
Collaborator

I'm glad you're making progress with this! If you manage to make it work, you can add tests for it and open a pull request, and the additions will then be available to everyone.

I would first check whether your change works using the code I posted in a previous comment in this thread.

The crosstab method itself uses the "paint" mechanism that makes results pretty, so there are a few more steps where things can go wrong with crosstab than with the code mentioned above. Try that first; if you get it to work, the next step is to see why the crosstab mechanism isn't showing the results.
