value_counts() can now compute relative frequencies. #2710

Closed

Contributor

lexual commented Jan 19, 2013

No description provided.

Owner

wesm commented Jan 19, 2013

Can you please add a test case?

Contributor

lexual commented Jan 20, 2013

Test case now provided ;)

Owner

wesm commented Jan 22, 2013

Deferring til next release. Need to think about whether relative or normalize is the right argument name (should consult other sources for guidance)

Contributor

hayd commented Feb 10, 2013

I think normalize is more appropriate here. An individual frequency may be relative (hence the phrase "Relative frequency histogram"), but the overall is normalized. [citation needed]

Contributor

lexual commented Feb 10, 2013

Here's what matplotlib & numpy do:

http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist
has a "normed" argument.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html
has a "normed" argument.
Apparently this is being deprecated for numpy 2.0, which will use the "density" argument instead.

Both of the above links are worth reading for further clarification of how they handle this.
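
For comparison, here is a rough sketch of how numpy's "density" argument differs from plain relative frequencies. The sample data and bin edges are made up for illustration, and it assumes a numpy version where `numpy.histogram` accepts `density=True`:

```python
import numpy as np

# Hypothetical sample data and deliberately unequal bin edges.
data = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 7.0, 8.0])
edges = [0.0, 2.5, 5.0, 10.0]

# Raw counts per bin.
counts, edges = np.histogram(data, bins=edges)

# density=True divides each count by (total count * bin width),
# so the histogram integrates to 1 over the bins.
density, _ = np.histogram(data, bins=edges, density=True)
widths = np.diff(edges)

# Relative frequencies: each count divided by the total count.
relative = counts / counts.sum()

print(counts)                    # counts per bin
print(relative)                  # these sum to 1
print((density * widths).sum())  # density integrates to 1
```

With unequal bin widths the density values are not the same as the relative frequencies, which is the distinction that matters when picking a name for a count-based method.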

Contributor

hayd commented Feb 11, 2013

It seems a bit weird to talk about "density" with the discrete-sounding value_counts, but I can see how it could make other things consistent... density would certainly be a good addition when bucketing.

normed is better than normalize (and has the same meaning)

Contributor

y-p commented Mar 14, 2013

Barring loud outcry, I plan to merge this using "normalize" as the keyword very soon.
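
As a minimal sketch of what the keyword does, assuming a pandas release that includes this change (the example Series is made up for illustration):

```python
import pandas as pd

# Hypothetical example data.
s = pd.Series(["a", "b", "a", "c", "a", "b"])

# Default behaviour: absolute counts per distinct value.
counts = s.value_counts()

# With normalize=True: each count divided by the total,
# so the resulting values sum to 1.
freqs = s.value_counts(normalize=True)

print(counts)
print(freqs)
```

Here "a" appears 3 times out of 6, so its relative frequency comes out as 0.5.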

Owner

wesm commented Mar 14, 2013

+1

Contributor

y-p commented Mar 14, 2013

Thanks @lexual, merged as b143a21. Apologies for the delay.

Saw your django repo - good feature. Suggest you see if there's common ground with #2717.

y-p closed this Mar 14, 2013

Contributor

lexual commented Mar 14, 2013

Thanks @y-p .

Do you have an opinion on whether something like https://github.com/lexual/pandas-love-ponies should live outside Pandas or inside Pandas? I never got any responses to my post to the mailing list a month ago.

Contributor

y-p commented Mar 14, 2013

My opinion is that pandas is in constant feature-creep jeopardy, and this
is not close to home. However, playing nice with SQL, especially under
a unified interface through SQLAlchemy, would be a win, so if django
fits in there as a "flavor", 👍 , else, probably not core material.

others may disagree.

Contributor

lexual commented Mar 14, 2013

My code is completely Django-specific, so I imagine it would be separate from the SQLAlchemy work (although I haven't had a close look at the SQLAlchemy stuff). So you're probably right that it should live outside of Pandas unless it starts getting widespread use.
