ENH: using human-readable group names instead of integer ids in MultiComparison #1472

Merged
Merged 6 commits into statsmodels:master on Apr 4, 2014


@mbatchkarov
Contributor

In the current pip version, the output of line 145 in statsmodels.examples.try_tukey_hsd.py looks like this:

Multiple Comparison of Means - Tukey HSD,FWER=0.05
============================================
group1 group2 meandiff  lower  upper  reject
--------------------------------------------
  0      1      1.5     0.3217 2.6783  True 
  0      2      1.0    -0.1783 2.1783 False 
  1      2      -0.5   -1.6783 0.6783 False 
--------------------------------------------

Note that the first and second columns contain integer indices even though tukeyhsd is passed a string array. This makes the table hard to read. With my changes, the output looks like this:

Multiple Comparison of Means - Tukey HSD,FWER=0.05
===============================================
 group1  group2  meandiff  lower  upper  reject
-----------------------------------------------
medical  mental    1.5     0.3217 2.6783  True 
medical physical   1.0    -0.1783 2.1783 False 
 mental physical   -0.5   -1.6783 0.6783 False 
-----------------------------------------------

I've applied the same change to MultiComparison.allpairtest.
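For reference, a minimal sketch of the kind of call that produces the table above (the data below is made up for illustration, not the values from try_tukey_hsd.py):

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# illustrative scores for three labelled groups
scores = np.array([24., 43., 58., 71., 43., 49., 61., 44., 67., 49.,
                   53., 56., 59., 52., 62.])
groups = np.array(['medical'] * 5 + ['mental'] * 5 + ['physical'] * 5)

res = pairwise_tukeyhsd(scores, groups, alpha=0.05)
print(res)  # the group1/group2 columns now show the string labels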

@josef-pkt
Member

Thanks, that would be a good improvement.

Did you forget to push some commits for this PR? There is only one import change here.

@mbatchkarov
Contributor

I did indeed. That's what happens when one commits late at night.

@josef-pkt
Member

I think this looks good, as far as I can see from browsing the code.

Can you add a unit test for this? Just a simple example, as in the description of this issue.
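Something along these lines, perhaps (a sketch only; the data and test name are illustrative, not the PR's actual test):

import numpy as np
from statsmodels.stats.multicomp import MultiComparison

def test_tukeyhsd_group_names():
    endog = np.array([1.0, 1.2, 0.9, 2.6, 2.4, 2.7, 1.9, 2.1, 2.0])
    groups = np.array(['medical'] * 3 + ['mental'] * 3 + ['physical'] * 3)
    res = MultiComparison(endog, groups).tukeyhsd()
    # _results_table is the attribute mentioned later in this thread
    text = str(res._results_table)
    # the table should show the labels rather than integer indices
    assert 'medical' in text and 'physical' in text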

@josef-pkt
Member

Possible extra unit test: MultiComparison has a group_order keyword that might not have unit tests.

Define group_order in a non-alphabetical order, e.g. ['physical', 'medical', 'mental'], and see whether the results get rearranged correctly.
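A sketch of what that extra test could look like (made-up data; checking groupsunique is my assumption about where the ordering shows up):

import numpy as np
from numpy.testing import assert_equal
from statsmodels.stats.multicomp import MultiComparison

def test_group_order():
    endog = np.array([1.0, 1.2, 0.9, 2.6, 2.4, 2.7, 1.9, 2.1, 2.0])
    groups = np.array(['medical'] * 3 + ['mental'] * 3 + ['physical'] * 3)
    order = ['physical', 'medical', 'mental']  # deliberately non-alphabetical
    mc = MultiComparison(endog, groups, group_order=order)
    # assumed behaviour: the unique groups follow the requested order
    assert_equal(list(mc.groupsunique), order)
    mc.tukeyhsd()  # smoke test: results are computed in the rearranged order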

@mbatchkarov
Contributor

I was just going to ask whether there is a particular module the tests should go in. I'll add a unit test for both my changes and group_order in the coming days (a little busy right now).

@mbatchkarov
Contributor

Done.

@mbatchkarov
Contributor

Not sure why the build is failing. All tests in that particular file pass on my machine under both Python 2.7 and 3.3.

@josef-pkt
Member

@mbatchkarov Thanks for this

The test failure on Python 3 is because we are mixing bytes and strings.
My guess is that the underlying data is read as bytes, while in the test we use strings.

My guess is that we have to put an asbytes from statsmodels.compatnp in the test, or an asstring in the print, so that the code works on Python 2 and 3 at the same time. I can look into that.
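For illustration, one way that fix could look in the test; I'm using numpy's asbytes here and assuming it behaves like the statsmodels.compatnp helper:

# encode the expected labels as bytes so the comparison works on both
# Python 2 (where asbytes is a no-op) and Python 3 (str -> bytes)
from numpy.compat import asbytes

expected = [asbytes(s) for s in ['medical', 'mental', 'physical']]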

@josef-pkt
Member

Also, we need a smoke test for the __str__ of t or res._results_table.
My guess is that that method also mixes bytes and strings on Python 3 and might not work correctly, i.e.
print res might fail on Python 3 (or maybe it doesn't, if everything is ASCII).

Aside: Skipper was working on unicode support in another pull request that might affect how we handle this, but we can finish this PR without it.
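A smoke test of roughly this shape could cover it (data illustrative; res._results_table as mentioned above):

import numpy as np
from statsmodels.stats.multicomp import MultiComparison

def test_str_smoke():
    endog = np.array([1.0, 1.2, 0.9, 2.6, 2.4, 2.7, 1.9, 2.1, 2.0])
    groups = np.array(['medical'] * 3 + ['mental'] * 3 + ['physical'] * 3)
    res = MultiComparison(endog, groups).tukeyhsd()
    # neither call should raise, on Python 2 or Python 3
    str(res)
    str(res._results_table)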

@mbatchkarov
Contributor

It might be worth considering six to avoid such issues in the future. scikit-learn has been using it quite happily for a while now. The obvious problem is that there is a lot of existing code in statsmodels that might have to be changed.

@josef-pkt
Member

The plan is to move to a common codebase for Python 2 and Python 3 soon; bashtage actually started working on it two days ago. (My impression is that we will not be using six: scipy dropped most of the code from their copy of six, and I prefer the alternative where we write idiomatic Python 3 as much as possible and get it to work on Python 2, instead of the other way around.)

@josef-pkt
Member

"It might be worth considering six to avoid such issues in the future"

It doesn't avoid the issue; we just get additional helper functions. We still have to decide in each case whether we want to use unicode or str on Python 2, and str or bytes on Python 3.

@mbatchkarov
Contributor

I've altered the test slightly in the hope it will pass under Python 3.

On a related note, I don't understand how any of the tests pass under Python 3. I'm unable to reproduce the environment that Travis uses, but when I manually run nosetests under Python 3 I get a whole host of issues. There are a lot of print statements, uses of xrange, etc., that do not work in Python 3. I'm guessing you were referring to these changes earlier.

@josef-pkt
Member

For running Python 3: did you install it with setup.py install?

The source is translated by 2to3 at install time so that it runs correctly on Python 3.

However, the example folders are not included in the 2to3 translation. (One of the example folders is included in the source distribution and causes many warnings, but that doesn't affect the non-example code, and the example folders are not used by the test suite.)

@josef-pkt
Member

With the latest changes, both Python 2.7 and Python 3.2 are green.

@mbatchkarov
Contributor

Yes, it looks like the PR is ready to merge now (assuming the community is happy with the test, of course)

@jseabold
Member
jseabold commented Apr 4, 2014

Can you rebase this on master? We've upgraded Travis, and it will give us a better picture of the testing.

@josef-pkt
Member

Everything should be fine with this.

The only change we might want to make: usually we don't use TestCase, and there might be a nose or numpy equivalent for the test. I don't remember the details well enough to suggest an alternative; numpy.testing.assert_equal might work (or not).
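For instance, the same kind of check written nose-style as a plain test function with numpy.testing asserts instead of a TestCase (a sketch, not the merged test):

import numpy as np
from numpy.testing import assert_equal
from statsmodels.stats.multicomp import MultiComparison

def test_group_names_plain():
    endog = np.array([1.0, 1.2, 0.9, 2.6, 2.4, 2.7, 1.9, 2.1, 2.0])
    groups = np.array(['medical'] * 3 + ['mental'] * 3 + ['physical'] * 3)
    mc = MultiComparison(endog, groups)
    assert_equal(sorted(mc.groupsunique), ['medical', 'mental', 'physical'])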

@jseabold
Member
jseabold commented Apr 4, 2014

Ok, then I'm going to merge. We can fix any unanticipated failures on the fuller test suite.

@jseabold merged commit b2cefd2 into statsmodels:master on Apr 4, 2014

1 check passed: default (The Travis CI build passed)
@jseabold
Member
jseabold commented Apr 4, 2014

Merged. Thanks.

@jseabold
Member
jseabold commented Apr 4, 2014

Suspected there might be some 2.6 problems here...

@josef-pkt referenced this pull request on Apr 5, 2014: Multiple Comparisons #852 (open)

@josef-pkt added the PR label on Apr 14, 2014