tukey.plot_simultaneous doesn't work properly with the option comparison_name #3584

Closed
ranophoenix opened this issue Mar 28, 2017 · 10 comments

@ranophoenix

This code:

#Inspired by: http://hamelg.blogspot.com.br/2015/11/python-for-data-analysis-part-16_23.html
import numpy as np
import scipy.stats as stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd
%matplotlib inline

np.random.seed(12)

races = ["asian", "black", "hispanic", "other", "white"]

# Generate random data
voter_race = np.random.choice(a=races,
                              p=[0.05, 0.15, 0.25, 0.05, 0.5],
                              size=1000)

# Use a different distribution for white ages
white_ages = stats.poisson.rvs(loc=18, 
                              mu=32,
                              size=1000)

voter_age = stats.poisson.rvs(loc=18,
                              mu=30,
                              size=1000)

voter_age = np.where(voter_race=="white", white_ages, voter_age)




#tukey = pairwise_tukeyhsd(endog=voter_frame['age'],     # Data
#                          groups=voter_frame['race'],   # Groups
#                          alpha=0.05)          # Significance level

tukey = pairwise_tukeyhsd(endog=voter_age,     # Data
                          groups=voter_race,   # Groups
                          alpha=0.05)          # Significance level

tukey.plot_simultaneous(comparison_name = 'white')    # Plot group confidence intervals
tukey.summary()

With numpy 1.11.x, this gives the following warnings:

c:\Anaconda3\lib\site-packages\statsmodels\sandbox\stats\multicomp.py:735: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  if (min(maxrange[i], maxrange[midx]) -
c:\Anaconda3\lib\site-packages\statsmodels\sandbox\stats\multicomp.py:736: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  max(minrange[i], minrange[midx]) < 0):
c:\Anaconda3\lib\site-packages\statsmodels\sandbox\stats\multicomp.py:743: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  ax1.plot([minrange[midx]]*2, [-1, self._multicomp.ngroups],
c:\Anaconda3\lib\site-packages\statsmodels\sandbox\stats\multicomp.py:745: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
  ax1.plot([maxrange[midx]]*2, [-1, self._multicomp.ngroups],

With numpy 1.12.x, it gives this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-35117e389add> in <module>()
     35                           alpha=0.05)          # Significance level
     36 
---> 37 tukey.plot_simultaneous(comparison_name = 'white')    # Plot group confidence intervals
     38 tukey.summary()
     39 print(np.version.version)

c:\Anaconda3\lib\site-packages\statsmodels\sandbox\stats\multicomp.py in plot_simultaneous(self, comparison_name, ax, figsize, xlabel, ylabel)
    733                 if self.groupsunique[i] == comparison_name:
    734                     continue
--> 735                 if (min(maxrange[i], maxrange[midx]) -
    736                                          max(minrange[i], minrange[midx]) < 0):
    737                     sigidx.append(i)

TypeError: only integer scalar arrays can be converted to a scalar index

In c:\Anaconda3\lib\site-packages\statsmodels\sandbox\stats\multicomp.py, line 731, I have changed this:

midx = np.where(self.groupsunique==comparison_name)[0]

to this:

midx = np.where(self.groupsunique==comparison_name)[0][0]

And now it's working without errors or warnings.
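
To illustrate why the extra `[0]` matters, here is a minimal sketch that uses only numpy. The group names mirror the example above, but the interval bounds in `minrange` and `maxrange` are made-up numbers, not output from `pairwise_tukeyhsd`. They are plain Python lists here on the assumption that the library indexes lists at that point, which is what the "converting an array ... to an index" message suggests.

```python
import numpy as np

groupsunique = np.array(["asian", "black", "hispanic", "other", "white"])
comparison_name = "white"

# np.where returns a tuple of index arrays; [0] picks the (only) array,
# which still has shape (1,) even when there is a single match.
midx_array = np.where(groupsunique == comparison_name)[0]

# A second [0] extracts the integer itself.
midx = np.where(groupsunique == comparison_name)[0][0]

# Made-up interval bounds, one per group (illustrative values only).
minrange = [38.0, 46.5, 47.0, 44.0, 49.0]
maxrange = [52.0, 49.5, 48.5, 54.0, 50.5]

# Same overlap check as in the traceback, now with an integer index:
# a group is flagged when its interval does not overlap the reference one.
sigidx = []
for i in range(len(groupsunique)):
    if groupsunique[i] == comparison_name:
        continue
    if min(maxrange[i], maxrange[midx]) - max(minrange[i], minrange[midx]) < 0:
        sigidx.append(i)

print(midx_array.shape, midx, sigidx)
```

With these illustrative bounds, only the "hispanic" interval fails to overlap the "white" interval, so it is the only index collected in `sigidx`.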

@josef-pkt
Member

Thanks for reporting, and especially for providing a full example. I will look into it tomorrow.

Is this with statsmodels 0.8.0 or 0.6.1?

Numpy deprecation problems were supposed to be fixed in 0.8.0, but plot functions have insufficient test coverage, so we might have missed some.

@ranophoenix
Author

I've tested in both versions with the same results.

@josef-pkt
Member

there were no labels added, so it got lost

temporary prio-high to check

@josef-pkt josef-pkt added this to the 0.9 milestone Oct 4, 2017
@josef-pkt josef-pkt added this to bugs in 0.9 Oct 6, 2017
@josef-pkt
Member

With numpy '1.11.2' I get a VisibleDeprecationWarning, but it still works.

Needs to be fixed as compatibility fix for 0.9

@omarkhursheed

omarkhursheed commented Nov 10, 2017

Hi @josef-pkt. I'd like to work on this. I'm just getting used to the codebase, and this seems like a good place to start.

@josef-pkt
Member

Based on the traceback: midx is currently a float, but it should be an int.
So I guess we need to

  • create midx as an int
  • make sure we don't have integer division problems on python 2
  • add unit test, for plots smoketests are enough
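
A smoke test along these lines might look like the sketch below. It is not the test from any actual PR: the test function name, the group labels, and the synthetic data are all hypothetical. It assumes a statsmodels version in which `midx` is already an integer, and a matplotlib install where the non-interactive "Agg" backend is available.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd


def test_plot_simultaneous_comparison_name():
    # Hypothetical synthetic data: three groups of 20 observations each.
    rng = np.random.RandomState(12)
    groups = np.repeat(["a", "b", "c"], 20)
    values = rng.normal(loc=np.repeat([0.0, 0.5, 3.0], 20), scale=1.0)

    tukey = pairwise_tukeyhsd(endog=values, groups=groups, alpha=0.05)

    # Smoke test: only check that the call completes and returns a figure.
    fig = tukey.plot_simultaneous(comparison_name="c")
    plt.close(fig)
    return fig


fig = test_plot_simultaneous_comparison_name()
```

A smoke test like this does not inspect the drawn intervals; it only guards against the call raising, which is the failure reported here.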

@abhijeetpanda12

Hi @josef-pkt, I'm interested in solving this issue. From what I have gone through, this issue is not a bug; it appears to be caused by a change in the latest version of Numpy, which made it an error to treat a single-element array as a scalar for the purposes of indexing.
So, as suggested in the description, if we change
midx = np.where(self.groupsunique==comparison_name)[0]
to
midx = np.where(self.groupsunique==comparison_name)[0][0]
in statsmodels/sandbox/stats/multicomp.py, line 731,
the issue is resolved.
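
The numpy behavior described here can be reproduced in isolation. The following is a small sketch, assuming numpy 1.12 or later; the list `values` and the variable names are made up for illustration and do not come from statsmodels.

```python
import numpy as np

values = ["a", "b", "c", "d", "e"]  # any plain Python list

scalar_idx = np.int64(4)       # numpy integer scalar
zero_dim_idx = np.array(4)     # 0-d integer array
one_elem_idx = np.array([4])   # 1-d array holding a single element

# Integer scalars and 0-d integer arrays can still act as an index.
ok_scalar = values[scalar_idx]
ok_zero_dim = values[zero_dim_idx]

# A 1-d array, even with one element, is rejected as a scalar index.
try:
    values[one_elem_idx]
    raised = False
except TypeError:
    raised = True

print(ok_scalar, ok_zero_dim, raised)
```

Under numpy 1.11.x the last lookup is expected to emit the VisibleDeprecationWarning quoted at the top of this issue instead of raising.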

@josef-pkt
Member

@abhijeetpanda12 I guess I misread the issue.

Can you prepare a PR with the fix and a smoke test for the plot?

@abhijeetpanda12

@josef-pkt I have created the PR for the fix. Can you help me out with the smoke test for the plot?

@josef-pkt
Member

PR for fixing this is #4290
