Pandas 1.5 issue for chisquare test. #1587

Closed
robertness opened this issue Dec 31, 2022 · 3 comments

Comments

@robertness

Subject of the issue

Running chi_square on DataFrames fails locally with pandas 1.5. It works again when using pandas 1.4.

This was found by one of my students on a long-standing coding assignment. The student suspects the problem is in how pgmpy's power_divergence method iterates through a pandas groupby. It assumes a tuple is returned when doing so, but pandas 1.5 changed the output to have length 1, so this line fails.
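The student's hypothesis concerns the shape of the group key. As we understand pandas' behavior, iterating over `df.groupby(["Z"])` (a one-element list of groupers) yields scalar keys in pandas 1.4, scalar keys plus a FutureWarning in pandas 1.5, and length-1 tuples such as `(0,)` from pandas 2.0 on. Code that unpacks the key assuming one shape can therefore break on another version. The sketch below is not pgmpy's code; the helper `groups_as_tuples` is hypothetical and simply normalizes every key to a tuple so downstream code sees a single shape on any version.

```python
import warnings

import pandas as pd


def groups_as_tuples(df, by):
    """Hypothetical helper: return [(key_tuple, group), ...] for df.groupby(by),
    normalizing scalar keys to 1-tuples so callers see one key shape."""
    with warnings.catch_warnings():
        # pandas 1.5 warns that one-element-list groupers will yield 1-tuples
        # in a future version; we normalize the key ourselves, so ignore it.
        warnings.simplefilter("ignore", FutureWarning)
        groups = list(df.groupby(by))
    return [(key if isinstance(key, tuple) else (key,), group) for key, group in groups]


df = pd.DataFrame({"X": [1.5, 1.2, 1.4], "Y": [3.1, 2.3, 1.4], "Z": [1, 1, 0]})
result = groups_as_tuples(df, ["Z"])
for key, group in result:
    print(key, len(group))
```

On any of these pandas versions the keys compare equal to `(0,)` and `(1,)` (they may print with numpy scalar types). When there is only one grouper, passing the plain column name (`"Z"` rather than `["Z"]`) also avoids the warning, because the key is then a scalar on every version.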

Your environment

The student got the error locally, using pandas>=1.5 and pgmpy==0.1.21. I'll ping them to update this issue with their local Python version and OS.

Steps to reproduce

test_result = chi_square(X=X, Y=Y, Z=Z, data=data, boolean=True, significance_level=significance)

Expected behaviour

Get a Boolean output.

Actual behaviour

Error.

@ankurankan
Member

@robertness Thanks for reporting the issue, but I can't reproduce the error on pandas 1.5.2. Could you ask your student to share some code that reproduces it? Here's my test script, where it seems to work fine:

In [16]: import pandas as pd

In [17]: from pgmpy.utils import get_example_model

In [18]: from pgmpy.estimators.CITests import chi_square

In [19]: pd.__version__
Out[19]: '1.5.2'

In [20]: model = get_example_model('asia')

In [21]: data = model.simulate()
Generating for node: dysp: 100%|█████████████████████████████████████████| 8/8 [00:00<00:00, 1676.05it/s]

In [22]: chi_square('asia', 'smoke', ['tub', 'either', 'dysp'], data, significance_level=0.05)
Out[22]: False

@kylejcaron
Hey @ankurankan, after reinstalling pandas I can no longer reproduce the error. I'm not sure what changed on my end: I thought I had originally tested on 1.5, 1.5.1, and 1.5.2, but all of them work for me now (just with a FutureWarning). Here's the code I used to test:

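from pgmpy.estimators.CITests import chi_square
import pandas as pd

# Small toy dataset: X and Y hold floats, so each distinct value is treated
# as its own category; Z is the conditioning variable.
data = pd.DataFrame({
    "X": [1.5, 1.2, 1.4],
    "Y": [3.1, 2.3, 1.4],
    "Z": [1, 1, 0],
})

# Test whether X is independent of Y given Z at the 5% significance level;
# returns a bool.
result = chi_square(X="X", Y="Y", Z=["Z"], data=data, significance_level=0.05)
print(result)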

Apologies for the noise on this one; it looks like everything is working fine!
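For context on where the groupby iteration fits, here is a rough sketch of how a conditional chi-square statistic can be assembled by iterating over the strata of Z. Within each stratum it builds an X-by-Y contingency table, computes Pearson's statistic against the counts expected under independence, and sums the statistics and degrees of freedom across strata. This is an illustrative re-implementation under our own assumptions, not pgmpy's power_divergence code. The function name `conditional_chi2_statistic` is hypothetical, and converting the summed statistic to a p-value (for example with `scipy.stats.chi2.sf`) is left out to keep it dependency-free.

```python
import numpy as np
import pandas as pd


def conditional_chi2_statistic(X, Y, Z, data):
    """Hypothetical sketch: Pearson chi-square statistic for X independent
    of Y given Z, summed over the strata defined by the columns in Z
    (assumes Z lists at least one column). Returns (statistic, dof)."""
    # A single grouping column is passed as a plain string, so the group key
    # is a scalar on every pandas version (and pandas 1.5 does not warn).
    by = Z[0] if len(Z) == 1 else list(Z)
    total_stat, total_dof = 0.0, 0
    for _, stratum in data.groupby(by):
        # Observed X-by-Y counts within this stratum; only categories that
        # actually occur appear, so every row and column total is positive.
        observed = pd.crosstab(stratum[X], stratum[Y]).to_numpy(dtype=float)
        n = observed.sum()
        # Expected counts under independence: row_total * col_total / n.
        expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
        total_stat += ((observed - expected) ** 2 / expected).sum()
        rows, cols = observed.shape
        total_dof += (rows - 1) * (cols - 1)
    return total_stat, total_dof


data = pd.DataFrame({"X": [1.5, 1.2, 1.4], "Y": [3.1, 2.3, 1.4], "Z": [1, 1, 0]})
stat, dof = conditional_chi2_statistic("X", "Y", ["Z"], data)
print(stat, dof)
```

On this toy data the Z=1 stratum forms a 2x2 diagonal table (statistic 2.0, one degree of freedom) and the Z=0 stratum is a single cell that contributes nothing. Because the loop never inspects the group key, it is unaffected by the scalar-versus-tuple change in any case.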

@ankurankan
Member

@kylejcaron Thanks for confirming. Closing for now.
