ENH: Add normalize to crosstab #12569

nickeubank · 2016-03-09T06:25:36Z

It'd be great to have a simple normalization option for crosstab to get shares rather than frequencies.

Something that behaves like:

def normalize(x):
    return len(x)/len(w_mobile.language)

pd.crosstab(w_mobile.language,w_mobile.carrier, values=w_mobile.language, aggfunc=normalize)

as just an option.
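For comparison, the same overall shares can be obtained without a custom aggfunc by dividing the count table by its grand total. This is only a sketch: the `w_mobile` frame below is a hypothetical stand-in, since the real data is not shown in this thread.

```python
import pandas as pd

# Hypothetical stand-in for the w_mobile frame referenced above.
w_mobile = pd.DataFrame({
    "language": ["english", "spanish", "english", "spanish", "spanish"],
    "carrier": ["a", "a", "b", "b", "b"],
})

# Plain frequency table: rows are languages, columns are carriers.
counts = pd.crosstab(w_mobile.language, w_mobile.carrier)

# Divide every cell by the grand total so the whole table sums to 1.
shares = counts / counts.to_numpy().sum()
print(shares)
```

Dividing after the fact keeps the counting and the normalization separate, which is roughly what a built-in option would do internally.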

(The ability to do row normalizations and column normalizations would also be great, so that all entries in a row add to 1 or all entries in a column add to 1.) For row normalizations, the behavior would be similar to:

l = list()
df = pd.DataFrame({'carrier':['a','a','b','b','b'], 'language':['english','spanish', 'english','spanish','spanish']})

for i in df.carrier.unique():
    temp = df.query('carrier=="{}"'.format(i)).language.value_counts(normalize=True)
    temp.name = i
    l.append(temp)

ctab = pd.concat(l, axis=1)


Out[1]:
           a         b
english  0.5  0.333333
spanish  0.5  0.666667

But with a command like: pd.crosstab(df.language, df.carrier, normalization='row')
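For readers on pandas 0.18.1 or later: the keyword that eventually shipped is spelled `normalize`, not the `normalization='row'` proposed above, and it takes `'index'`, `'columns'`, or `'all'`. The sketch below reuses the small frame from this comment. Note that the loop above produces a table in which each carrier column sums to 1, which corresponds to `normalize='columns'`.

```python
import pandas as pd

df = pd.DataFrame({
    "carrier": ["a", "a", "b", "b", "b"],
    "language": ["english", "spanish", "english", "spanish", "spanish"],
})

# Each carrier column sums to 1 (matches the loop-based table above).
by_column = pd.crosstab(df.language, df.carrier, normalize="columns")

# Each language row sums to 1.
by_row = pd.crosstab(df.language, df.carrier, normalize="index")

print(by_column)
print(by_row)
```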


sinhrks · 2016-03-09T10:59:39Z

I also use this type of operation in pivot_table. Could it be part of aggfunc, maybe as pct, pct_row and pct_col?

nickeubank · 2016-03-09T20:06:29Z

I like it. I'll try a few things and submit a PR.

nickeubank · 2016-03-09T21:33:18Z

Relatedly, crosstab also has a bug -- it counts np.nan in margin totals even when dropna=True.

df = pd.DataFrame({'a':[1,2,2,2,2,np.nan],'b':[3,3,4,4,4,4]})
pd.crosstab(df.a,df.b, margins=True)
Out[233]: 
b    3  4  All
a             
1.0  1  0    1
2.0  1  3    4
All  2  4    6

I don't think this is related to #12558.
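Until the margins are fixed, one possible workaround (a sketch, not something proposed in this thread) is to drop the rows containing np.nan before calling crosstab, so the margin totals are computed from the same rows as the cells.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 2, 2, np.nan], "b": [3, 3, 4, 4, 4, 4]})

# Remove rows with a missing value in either key before tabulating.
clean = df.dropna(subset=["a", "b"])

# Margins are now derived from the five complete rows only.
table = pd.crosstab(clean.a, clean.b, margins=True)
print(table)
```

With the incomplete row removed, the grand total is 5 rather than the 6 shown above.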

jreback · 2016-03-09T21:35:31Z

might be #4003

nickeubank · 2016-03-09T21:43:41Z

@jreback don't think so -- that's double counting. This only happens on columns with np.nan, and the total increments by the number of np.nans.

jreback · 2016-03-09T21:48:16Z

ok if you can't find a related one, then pls open a new issue

jreback · 2016-03-09T21:49:27Z

in fact, if you can, pls open a new issue (i'll tag it as a master issue) that lists a checkbox for each of the crosstab issues (each individual one has its own issue and we just reference them), like #11485

nickeubank · 2016-03-09T22:00:53Z

Posted a PR at #12578. Input welcome, @sinhrks

sinhrks added API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Mar 9, 2016

jreback added Enhancement Difficulty Novice labels Mar 9, 2016

jreback added this to the Next Major Release milestone Mar 9, 2016

This was referenced Mar 9, 2016

BUG: Crosstab margins ignoring dropna #12577

Closed

Add normalization to crosstab #12578

Closed

jreback modified the milestones: 0.18.1, Next Major Release Apr 4, 2016

jreback closed this as completed in bb494b7 Apr 25, 2016

This issue was closed.
