
ENH: using human-readable group names instead of integer ids in MultiComparison #1472

Merged

jseabold merged 6 commits into statsmodels:master from mbatchkarov/readable_tables on Apr 4, 2014

@mbatchkarov (Contributor) commented Mar 13, 2014

With the current pip release, the output of line 145 in statsmodels.examples.try_tukey_hsd.py looks like this:

Multiple Comparison of Means - Tukey HSD,FWER=0.05
============================================
group1 group2 meandiff  lower  upper  reject
--------------------------------------------
  0      1      1.5     0.3217 2.6783  True 
  0      2      1.0    -0.1783 2.1783 False 
  1      2      -0.5   -1.6783 0.6783 False 
--------------------------------------------

Note that the first and second columns contain integer indices, even though tukeyhsd is passed a string array. This makes the table hard to read. With my changes, the output looks like this:

Multiple Comparison of Means - Tukey HSD,FWER=0.05
===============================================
 group1  group2  meandiff  lower  upper  reject
-----------------------------------------------
medical  mental    1.5     0.3217 2.6783  True 
medical physical   1.0    -0.1783 2.1783 False 
 mental physical   -0.5   -1.6783 0.6783 False 
-----------------------------------------------

I've applied the same change to MultiComparison.allpairtest.
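For reference, the change can be exercised with a small script like the following (a sketch: the score values are invented for illustration; only the group labels come from the example above):

```python
import numpy as np
from statsmodels.stats.multicomp import MultiComparison

# Hypothetical data: labels as in the example, scores made up.
scores = np.array([66, 72, 74, 68, 71, 77, 79, 73, 70, 75, 78, 72],
                  dtype=float)
groups = np.array(['medical', 'mental', 'physical'] * 4)

mc = MultiComparison(scores, groups)
res = mc.tukeyhsd()

# After this change, the first two columns of the summary show the
# string labels ('medical', 'mental', 'physical') rather than 0, 1, 2.
print(res)
```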

@josef-pkt (Member) commented Mar 13, 2014

Thanks, that would be a good improvement.

Did you forget to include some commits in this PR? There is only one import change here.

@mbatchkarov (Contributor, Author) commented Mar 13, 2014

I did indeed. That's what happens when one commits late at night.

@josef-pkt (Member) commented Mar 13, 2014

I think this looks good, as far as I can see from browsing the code.

Can you add a unit test for this? Just a simple example, as in the description of this issue, would do.

@josef-pkt (Member) commented Mar 13, 2014

A possible extra unit test: MultiComparison has a group_order keyword that might not have unit tests.

Define group_order in non-alphabetical order, e.g. ['physical', 'medical', 'mental'], and check that the results are rearranged correctly.
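A sketch of that suggested test (the data values are invented; it assumes the groupsunique attribute reflects group_order, which is exactly what the test is meant to verify):

```python
import numpy as np
from statsmodels.stats.multicomp import MultiComparison

# Invented data; only the label names come from the suggestion above.
scores = np.array([60, 65, 62, 70, 68, 72, 55, 58, 57, 63, 66, 61],
                  dtype=float)
groups = np.array(['medical'] * 4 + ['mental'] * 4 + ['physical'] * 4)

# Deliberately non-alphabetical order, as suggested.
order = ['physical', 'medical', 'mental']
mc = MultiComparison(scores, groups, group_order=order)

# If group_order is respected, the unique groups should follow it
# rather than the sorted (alphabetical) order.
assert list(mc.groupsunique) == order
print(mc.tukeyhsd())
```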

@josef-pkt (Member) commented Mar 14, 2014

@mbatchkarov (Contributor, Author) commented Mar 14, 2014

I was just going to ask whether there is a particular module the tests should go in. I'll add unit tests both for my changes and for group_order in the coming days (a little busy right now).

mbatchkarov added some commits Mar 25, 2014:

Added unit test for the group_order parameter of MultiComparison. It also tests that the human-readable labels in TukeyHSDResults are in the right order.
@mbatchkarov (Contributor, Author) commented Mar 25, 2014

Done.

@mbatchkarov (Contributor, Author) commented Mar 26, 2014

Not sure why the build is failing. All tests in that particular file pass on my machine under both Python 2.7 and 3.3.

@josef-pkt (Member) commented Mar 26, 2014

@mbatchkarov Thanks for this.

The test failure on Python 3 is because we are mixing bytes and strings. My guess is that the underlying data is read as bytes, while the test uses strings.

My guess is that we have to put an asbytes from statsmodels.compatnp in the test, or an asstring in the print, so the code works on Python 2 and 3 at the same time. I can look into that.
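The failure mode can be illustrated without statsmodels at all; a minimal sketch of the mismatch and one possible normalization:

```python
# On Python 3, bytes and str never compare equal, so a group label read
# as bytes will not match the str label a test compares against.
label_from_data = b'medical'   # e.g. what a bytes-oriented reader yields
label_in_test = 'medical'

# The comparison is silently False rather than raising an error.
assert label_from_data != label_in_test

# Normalizing one side fixes it; decoding the bytes to str is one option.
assert label_from_data.decode('ascii') == label_in_test
```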

@josef-pkt (Member) commented Mar 26, 2014

We also need a smoke test for the __str__ of t or res._results_table. My guess is that that method also mixes bytes and strings on Python 3 and might not work correctly, i.e. print res might fail on Python 3 (or maybe it doesn't, if everything is ASCII).

Aside: Skipper was working on unicode support in another pull request that might affect how we handle this, but we can finish this PR without it.

@mbatchkarov (Contributor, Author) commented Mar 26, 2014

It might be worth considering six to avoid such issues in the future. scikit-learn has been using it quite happily for a while now. The obvious problem is that there is a lot of existing code in statsmodels that might have to be changed.

@josef-pkt (Member) commented Mar 26, 2014

The plan is to move to a common codebase for py2 and py3 soon; bashtage actually started working on it two days ago. (My impression is that we will not be using six: scipy dropped most of the code from their copy of six, and I prefer the alternative where we write idiomatic py3 as much as possible and get it to work on py2, rather than the other way around.)

@josef-pkt (Member) commented Mar 26, 2014

> It might be worth considering six to avoid such issues in the future

It doesn't avoid the issue; we just get additional helper functions. We still have to decide in each case whether we want unicode or str on Python 2, and str or bytes on Python 3.

@mbatchkarov (Contributor, Author) commented Mar 27, 2014

I've altered the test slightly in the hope that it will pass under Python 3.

On a related note, I don't understand how any of the tests pass under Python 3. I'm unable to reproduce the environment that Travis uses, and when I manually run nosetests under Python 3 I get a whole host of issues: lots of print statements, uses of xrange, etc., that do not work in Python 3. I'm guessing you were referring to these changes earlier.

@josef-pkt (Member) commented Mar 27, 2014

For running under Python 3: did you install it with setup.py install?

The source is correctly translated by 2to3 to get it to run on Python 3.

However, the example folders are not included in the 2to3 translation. (One of the example folders is included in the source distribution and causes many warnings, but that doesn't affect the non-examples code, and the example folders are not used by the test suite.)

@josef-pkt (Member) commented Mar 27, 2014

With the latest changes, both py 2.7 and py 3.2 are green.

@mbatchkarov (Contributor, Author) commented Mar 27, 2014

Yes, it looks like the PR is ready to merge now (assuming the community is happy with the test, of course)

@jseabold (Member) commented Apr 4, 2014

Can you rebase this on master? We've upgraded Travis, which will give us a better picture of the test results.

@josef-pkt (Member) commented Apr 4, 2014

Everything should be fine with this.

The only change we might want to make: usually we don't use TestCase, and there might be a nose or numpy equivalent for the test. I don't remember the details well enough to suggest an alternative; numpy.testing.assert_equal might work (or not).
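For illustration, a plain-function test in the numpy.testing style, which nose collects without a TestCase subclass, might look roughly like this (the test name and data are invented):

```python
import numpy as np
from numpy.testing import assert_array_equal

def test_group_labels():
    # A plain module-level test_* function: nose collects it without
    # any unittest.TestCase subclass.
    groups = np.array(['mental', 'medical', 'physical', 'medical'])
    uniques = np.unique(groups)  # unique labels, sorted alphabetically
    assert_array_equal(uniques, ['medical', 'mental', 'physical'])

test_group_labels()
```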

@jseabold (Member) commented Apr 4, 2014

OK, then I'm going to merge. We can fix any unanticipated failures on the fuller test suite.

jseabold added a commit that referenced this pull request Apr 4, 2014:

Merge pull request #1472 from mbatchkarov/readable_tables
ENH: Human-readable group names instead of integers in MultiComparison

@jseabold merged commit b2cefd2 into statsmodels:master Apr 4, 2014

1 check passed: the Travis CI build passed.
@jseabold (Member) commented Apr 4, 2014

Merged. Thanks.

@jseabold (Member) commented Apr 4, 2014

Suspected there might be some 2.6 problems here...

@josef-pkt referenced this pull request Apr 5, 2014: Multiple Comparisons #852 (open)

@josef-pkt added the PR label Apr 14, 2014

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014:

Merge pull request statsmodels#1472 from mbatchkarov/readable_tables
ENH: Human-readable group names instead of integers in MultiComparison