
significance testing results different than SPSS #32

Closed
tracyyuqichen opened this issue Mar 22, 2021 · 19 comments

@tracyyuqichen

In comparing results from SPSS and Quantipy, we discovered that the dataset.crosstab() function gives different significance results than SPSS when comparing categorical distributions. I see in sandbox.py that a chi-square is used for calculating significance, which should give us the same results. Could this be because Quantipy does not recognize multivariate ordinal variables (Likert scales)?

@geirfreysson
Collaborator

Quantipy should most certainly work with Likert scale-type variables. The significance code has been tested quite a bit against Unicom/Dimensions, and the default settings replicate the Dimensions results. The parameters can be tweaked quite a bit, which can affect the results. Could you share the exact parameters you use when you run the sig-diff in SPSS?

@tracyyuqichen
Author

Thanks for responding so quickly! Someone else actually created the reference tables in SPSS, as I'm not very familiar with SPSS myself. What parameters would they be looking for?

@geirfreysson
Collaborator

geirfreysson commented Mar 22, 2021

I'm not familiar with sig-diff testing in SPSS. All I can do is get the chi-square results from the Analyze > Crosstabs menu, and that doesn't test each category, it just tests the overall distribution (I think).

I've added an SPSS file to the repository to make testing/comparison with SPSS easier: tests/Example Data (A).sav

This is the exact same data we use to test Quantipy.

Ask your colleague to recreate what they are seeing with that file and post it here. Maybe we can figure out what parameters SPSS uses for its tests and then see whether the SPSS results can be replicated with Quantipy.

@tracyyuqichen
Author

tracyyuqichen commented Mar 25, 2021

OK, so I think I have some idea of what might not be working. I recoded locality in both Python and SPSS, and I think the issue lies in the significance calculation after recoding. Here's the recode in SPSS for locality -> Region:

[screenshot: SPSS recode of locality into Region]

and then I run the syntax

CTABLES
  /VLABELS VARIABLES=locality ethnicity gender DISPLAY=LABEL
  /TABLE q2b [C][COUNT] BY Region [C]
  /CATEGORIES VARIABLES= q2b Region ORDER=A KEY=VALUE EMPTY=INCLUDE
  /CRITERIA CILEVEL=95
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=BONFERRONI ORIGIN=COLUMN INCLUDEMRSETS=YES 
    CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO.

and get:

[screenshot: SPSS column-proportion comparison output]

Replicating the process in Python:

# define the meta for the new (un-duped) variable
meta['columns']['Region'] = {
    'type': 'delimited set',
    'text': {'en-GB': 'Locality Un-duped'},
    'values': [
        {'value': 1, 'text': {'en-GB': '1'}},
        {'value': 2, 'text': {'en-GB': '2'}},
        {'value': 3, 'text': {'en-GB': '3'}}
    ]
}

# recode locality into Region (frange expands the code ranges)
data['Region'] = recode(
    meta, data,
    target='Region',
    mapper={
        1: {'locality': 1},
        2: {'locality': frange('2-3')},
        3: {'locality': frange('4-5')}
    },
    append=False
)

# run the crosstab with column-proportion sig-tests
ds.crosstab('q2b', 'Region', sig_level=0.05)

and I get:

[screenshot: Quantipy crosstab output]

Although the difference is slight here, when I apply this to multiple un-duped variables crossed with 20+ x variables, I get a lot more significant Test-IDs than I do in SPSS. It seems that you do have a Bonferroni correction in place somewhere in sandbox.py, so I'm not sure why the discrepancies are happening.

@geirfreysson
Collaborator

geirfreysson commented Mar 25, 2021

Thanks for the very detailed report. I don't think this has anything to do with the recode itself, but with the Bonferroni correction, which isn't implemented in Quantipy.

If I run your SPSS script without the Bonferroni correction, the result matches Quantipy.

CTABLES
  /VLABELS VARIABLES=locality ethnicity gender DISPLAY=LABEL
  /TABLE q2b [C][COUNT] BY Region [C]
  /CRITERIA CILEVEL=95
  /CATEGORIES VARIABLES= q2b Region ORDER=A KEY=VALUE EMPTY=INCLUDE
  /COMPARETEST TYPE=PROP ALPHA=0.05 ADJUST=NONE ORIGIN=COLUMN INCLUDEMRSETS=YES 
    CATEGORIES=ALLVISIBLE MERGE=YES STYLE=SIMPLE SHOWSIG=NO.

[screenshot: SPSS output with ADJUST=NONE, matching the Quantipy results]

I didn't write the sig-testing code myself, but I imagine the Bonferroni correction would happen somewhere here:
https://github.com/Quantipy/quantipy3/blob/master/quantipy/core/quantify/engine.py#L1977

and could be done with statsmodels:
https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html
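For reference, a minimal sketch of how the correction could be applied there (the p-values below are placeholders, not actual engine output):

import numpy as np
from statsmodels.stats.multitest import multipletests

# uncorrected p-values from the pairwise column tests (placeholder values)
pvals = np.array([0.004, 0.03, 0.19, 0.045])

# Bonferroni: reject[i] is True where the test is still significant
# at the family-wise alpha after adjustment
reject, pvals_corrected, _, _ = multipletests(pvals, alpha=0.05, method='bonferroni')
print(reject)           # [ True False False False]
print(pvals_corrected)  # [0.016 0.12  0.76  0.18 ]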

I'll probably close this ticket for now but create a new one called "implement bonferroni corrections in sig-tests" or something like that.

@tracyyuqichen
Author

Great, that sounds great! Thank you so much for pointing me to the engine.py file.

@tracyyuqichen
Author

Hi Geir, two new issues regarding the sig results:

  1. In the following screenshot you'll see that group A has a count of zero, but the sig test returns group A as one of the groups with a significant difference compared to the other groups, which should not be the case.
    [screenshot: crosstab where column A has a zero count but is still flagged as significantly different]
  2. I tried applying a Bonferroni correction via the sig_level argument of the crosstab() function, since what the Bonferroni correction does is lower the alpha at which the null hypothesis is rejected; the adjusted alpha is 0.00625, which should work as far as I can see (see the sketch below). But crosstab() just returns the crosstab without the sig view, and I'm genuinely stumped. Do you know whom I can consult on this specific issue? Thanks!
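For illustration, here's a minimal sketch of the adjustment I'm applying (the 8 comparisons and the variables are placeholders, chosen to match 0.05 / 8 = 0.00625):

# Bonferroni: divide the family-wise alpha by the number of pairwise comparisons
n_comparisons = 8
alpha = 0.05 / n_comparisons  # 0.00625
ds.crosstab('q2b', 'Region', sig_level=alpha)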

@geirfreysson
Collaborator

  1. The A means that columns B, D and E are significantly higher than column A. This doesn't have to be incorrect: if you run a political poll in which 0 people say they're voting for a fringe party and 100 people are voting for a mainstream party, the mainstream party has a significantly higher following than the party with 0 counts.

  2. Can you send me a code example of what exactly you are doing?

@tracyyuqichen
Author

tracyyuqichen commented Apr 8, 2021

# adjusted alpha: 0.05 divided by the number of pairwise comparisons
sig_level = 0.05 / 28
ds.crosstab('Q7', 'Region', sig_level=sig_level)

@geirfreysson
Collaborator

Thanks for that.

I can see now that there is a bug in the crosstab method's output styling that makes tests with alpha < 0.01 not show up in the results.

The result doesn't look as nice, but you can use the following to get the sig-test results:

import quantipy as qp

x = 'q5_3'
y = 'gender'

# build a stack holding the dataset's meta and data
stack = qp.Stack(name='sig',
                 add_data={'sig': {'meta': dataset.meta(),
                                   'data': dataset.data()}})

# add the x/y link with percentage and count views
stack.add_link(data_keys=['sig'],
               x=x,
               y=y,
               views=['c%', 'counts'])

# run the sig-test directly on the link, bypassing crosstab's styling
link = stack['sig']['no_filter'][x][y]
test = qp.Test(link, 'x|f|:|||counts')
test = test.set_params(level=sig_level)
df = test.run()
Question         gender
Values                1    2
Question Values
q5_3     1          NaN  NaN
         2          NaN  NaN
         3          NaN  NaN
         4          NaN  NaN
         5          NaN  NaN
         97         NaN  NaN
         98         NaN  NaN

@tracyyuqichen
Author

Fantastic, thank you for your help! Should I open up a new issue for this bug?

@geirfreysson
Collaborator

Yes please, that would be great!

@tracyyuqichen
Author

tracyyuqichen commented Apr 20, 2021

Hi @geirfreysson, sorry to bother you again, but is there a book or reference I can check out for the methodology behind the sig test? Or at least a name for the methodology? Also, is there a way to specify that I want to perform a chi-square test instead of the default t-test? Thank you!

@geirfreysson
Collaborator

Hi @tc423, no bother at all. The default sig-tests mimic SPSS, Dimensions and Askia, and are pairwise comparisons like the SPSS command COMPARETEST. Chi-square tests are also available, but I don't have any code examples at hand.

The SPSS documentation is here: COMPARETEST.

You should be able to use the "sandbox" in Quantipy to do a chi-square test; there is a method there called chi_sq (here).
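In the meantime, an overall chi-square test of independence can be run directly on the raw data with scipy; a minimal sketch, independent of the sandbox API (it assumes data is the DataFrame used earlier in this thread and that both variables are single-coded):

import pandas as pd
from scipy.stats import chi2_contingency

# observed contingency table (delimited sets would need expanding
# into dummy columns first)
observed = pd.crosstab(data['q2b'], data['locality'])

# tests independence for the whole table, not per category
chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p, dof)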

@tracyyuqichen
Author

tracyyuqichen commented Apr 27, 2021 via email

@tracyyuqichen
Author

tracyyuqichen commented Apr 27, 2021

I see that there's a weight engine, and I assume this is to help adjust for the data-overlap issue in the delimited-set data type, giving the weight of each individual response? On a somewhat related note, is there a way I can see the p-values between each group?

@geirfreysson
Collaborator

  1. The sig-tests deal correctly with multiple-response variables.
  2. The weight engine is the library that runs the RIM weighting algorithm. The sig-tests can correct for data overlap.
  3. You can't display the p-values easily in the output; you'd have to go into the engine library itself and output the values there. (A pairwise p-value can also be approximated outside the engine, see the sketch below.)
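A minimal sketch of such an approximation with statsmodels, using placeholder counts and bases (note that the engine applies additional corrections, e.g. for overlap between delimited-set columns, so results won't match exactly):

from statsmodels.stats.proportion import proportions_ztest

# two-proportion z-test for one pair of columns:
# counts = respondents choosing the category in each column, nobs = column bases
counts = [45, 30]
nobs = [200, 180]
zstat, pval = proportions_ztest(counts, nobs)
print(zstat, pval)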

Hope that helps!

@tracyyuqichen
Author

tracyyuqichen commented Apr 29, 2021

I see. I also just realized that all of this is documented in the engine.py file that I'm now trying to edit, so my apologies. Looks like we're back to the problem of not wanting to compare A to B if B is zero. I tried tweaking the source code, but it doesn't seem to be working:

def set_params(self, ...):
    ...
    if self.metric == 'proportions':
        ...
        # only keep a difference where both proportions are non-zero;
        # np.where keeps the mask element-wise, which also works when
        # p1 and p2 are arrays of column proportions
        self.valdiffs = np.array(
            [np.where((p1 != 0) & (p2 != 0), p1 - p2, 0)
             for p1, p2 in combinations(props, 2)]).T

The only difference is that I added a condition when calculating p1 - p2, but now when I run crosstab() with sig_level=0.05 it simply doesn't show any sig view. I don't think it's a math error, because plenty of my data has counts of 0 (and hence differences of zero) and that was never a problem before.

@geirfreysson
Collaborator

I'm glad you're making progress with this! If you manage to make it work, you can add tests for it and open a pull request, and the additions will then be available to everyone.

I would first check whether your change works using the code I posted in a previous comment in this thread.

The crosstab method itself uses the "paint" mechanism that makes results pretty, so there are a few more steps where things can go wrong with crosstab than with the code mentioned above. Try that first; if you get it to work, the next step is to see why the crosstab mechanism isn't showing the results.
