Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up stats._support, close statistics review issues #2778

Merged
merged 2 commits into from
Sep 15, 2013

Conversation

rgommers
Copy link
Member

Summary: all junk in that file. Two functions are used in glm/itemfreq, the former is deprecated and the latter rewritten in this PR. See gh-599 on discussion about keeping itemfreq.

Issues closed:
gh-599
gh-706
gh-707
gh-708
gh-710
gh-711
gh-712

@josef-pkt
Copy link
Member

_chk_asarray was move to from stats to _support during the python 2 to 3 conversion to avoid some circular imports IIRC, needs checking I guess

@josef-pkt
Copy link
Member

Some of the functions would theoretically be interesting, some of them are obsolete because pandas is doing it better.

unique is essentially a unique_rows that doesn't directly exist in numpy.
collapse has something of a function that creates a pivot table, (pandas does much more)
colex, adm and linexand are completely obsolete and were most likely left over from when stats was also using lists of lists instead of numpy arrays

We made a try several years ago to clean up the _support functions, but there was not really enough interest.

@rgommers
Copy link
Member Author

I don't think we want a collapse for something that people really should be using pandas for, right?

unique_rows might make sense as a name and as a utility function, but why would we want to add it to stats (it's not a public function now)?

…pport

Closes scipygh-599 (stats.itemfreq statistics review)
Closes scipygh-706 (stats.abut statistics review - removed)
Closes scipygh-707 (stats.unique statistics review - removed)
@rgommers
Copy link
Member Author

OK updated, stats._support is now completely removed and itemfreq reimplemented based on @WarrenWeckesser's suggestion.

@WarrenWeckesser
Copy link
Member

The rewrite of itemfreq is fine, but I'm now strongly in favor of also deprecating it. :) The last step in which the unique items and the counts are stuck into a 2-D array is just wrong. I can see no reason for it, and look what happens with a string array:

In [2]: a = array(['foo', 'bar', 'foo', 'one', 'two', 'two'])

In [3]: itemfreq(a)
Out[3]: 
array([['bar', '1'],
       ['foo', '2'],
       ['one', '1'],
       ['two', '2']], 
      dtype='|S3')

It really should just return the two arrays separately, but that's an API change. So how about also giving it a deprecation warning that states that the return value will change to the tuple of 1D array (items, freqs) in version 0.XX? (Or create a new function now, and warn that itemfreq will be removed, or...?)

@rgommers
Copy link
Member Author

I don't think changing the number of return values is desirable; a deprecation and adding a new function (count_unique?) would be better. That's an enhancement though, so can be done in a separate PR I think.

@josef-pkt
Copy link
Member

Question ( I haven't looked at the details)
Why don't you return a structured dtype with first column in type of uniques, and second in int?
likely problem doesn't allow column indexing, but structured dtype could be returned only if uniques are not numeric (int or float or complex)

+1 on deprecation and a good replacement, if that can count rows.

@josef-pkt
Copy link
Member

another thing I just remembered:
We added some functions to scipy that are based on numpy.histogram (including 2d and nd, binned stats), AFAIR it also does count.
It's a possible alternative recommendation or see also, but I think it can only handle numeric values and does binning (which can be fragile with integers on the bin limits.)

@WarrenWeckesser
Copy link
Member

This looks good and works for me, so I'll merge it.

The deprecation of itemfreq can be handled as a separate issue and/or PR.

WarrenWeckesser added a commit that referenced this pull request Sep 15, 2013
Clean up stats._support, close statistics review issues
@WarrenWeckesser WarrenWeckesser merged commit 640b182 into scipy:master Sep 15, 2013
@rgommers
Copy link
Member Author

Thanks.

@rgommers rgommers deleted the stats-cleanup branch September 16, 2013 06:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants