ENH: Add normalize to crosstab #12569

Closed
nickeubank opened this Issue Mar 9, 2016 · 8 comments

Contributor

nickeubank commented Mar 9, 2016

It'd be great to have a simple normalization option for crosstab to get shares rather than frequencies.

Something that behaves like:

def normalize(x):
    # Share of all rows that fall in this cell; float() avoids integer division on Python 2
    return len(x) / float(len(w_mobile.language))

pd.crosstab(w_mobile.language, w_mobile.carrier, values=w_mobile.language, aggfunc=normalize)

as just an option.

(The ability to do row normalizations and column normalizations would also be great, so that all entries in a row add to 1 or all entries in a column add to 1.) For row normalizations, the behavior would be similar to:

import pandas as pd

df = pd.DataFrame({'carrier':['a','a','b','b','b'], 'language':['english','spanish', 'english','spanish','spanish']})

# Collect the normalized language shares for each carrier
l = list()
for i in df.carrier.unique():
    temp = df.query('carrier=="{}"'.format(i)).language.value_counts(normalize=True)
    temp.name = i
    l.append(temp)

# One column per carrier
ctab = pd.concat(l, axis=1)


Out[1]: 
           a         b
english  0.5  0.333333
spanish  0.5  0.666667

But with a command like: pd.crosstab(df.language, df.carrier, normalization='row')
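
Until such an option exists, the same shares can be obtained by dividing a plain crosstab by its own totals. The following is a minimal sketch, not the implementation proposed in this issue; the variable names (`counts`, `row_shares`, `col_shares`, `overall_shares`) are made up for illustration. Note that with `language` as the index and `carrier` as the columns, the table shown above is the one in which each carrier column adds to 1.

```python
import pandas as pd

df = pd.DataFrame({
    "carrier": ["a", "a", "b", "b", "b"],
    "language": ["english", "spanish", "english", "spanish", "spanish"],
})

# Plain frequency table: languages as rows, carriers as columns.
counts = pd.crosstab(df.language, df.carrier)

# Row normalization: every language row adds to 1.
row_shares = counts.div(counts.sum(axis=1), axis=0)

# Column normalization: every carrier column adds to 1
# (this reproduces the table built with the loop above).
col_shares = counts.div(counts.sum(axis=0), axis=1)

# Overall normalization: all cells together add to 1.
overall_shares = counts / counts.values.sum()

print(col_shares)
```

Later pandas releases expose this directly through a `normalize` argument on `pd.crosstab` (for example `normalize='index'` or `normalize='columns'`), which should be preferred when available.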

Member

sinhrks commented Mar 9, 2016

I also use this type of operation in pivot_table. Could it be part of aggfunc, maybe as pct, pct_row and pct_col?

@jreback jreback added this to the Next Major Release milestone Mar 9, 2016

Contributor

nickeubank commented Mar 9, 2016

I like it. I'll try a few things and submit a PR.

Contributor

nickeubank commented Mar 9, 2016

Relatedly, crosstab also has a bug -- it counts np.nan in margin totals even when dropna=True.

df = pd.DataFrame({'a':[1,2,2,2,2,np.nan],'b':[3,3,4,4,4,4]})
pd.crosstab(df.a,df.b, margins=True)
Out[233]: 
b    3  4  All
a             
1.0  1  0    1
2.0  1  3    4
All  2  4    6

I don't think this is related to #12558.
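
One way to sidestep the inconsistent margins, assuming it is acceptable to drop the rows with missing values up front, is to filter them out before calling crosstab. This is only a workaround sketch, not a fix for the underlying behavior; `clean`, `table` and `body` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 2, 2, np.nan], "b": [3, 3, 4, 4, 4, 4]})

# Drop rows where either crosstab key is missing, so that the cells
# and the margins are computed from exactly the same rows.
clean = df.dropna(subset=["a", "b"])
table = pd.crosstab(clean.a, clean.b, margins=True)

# With margins=True the "All" totals are the last row and last column.
body = table.iloc[:-1, :-1]

# The margins should now agree with the sums of the cells.
assert (body.sum(axis=0).values == table.iloc[-1, :-1].values).all()
assert (body.sum(axis=1).values == table.iloc[:-1, -1].values).all()

print(table)
```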

Contributor

jreback commented Mar 9, 2016

might be #4003

Contributor

nickeubank commented Mar 9, 2016

@jreback don't think so -- that's double counting. This only happens on columns with np.nan, and the total is incremented by the number of np.nan values.

Contributor

jreback commented Mar 9, 2016

OK, if you can't find a related one, then please open a new issue.

Contributor

jreback commented Mar 9, 2016

In fact, if you can, please open a new issue (we'll call it and I'll tag it master) that lists a checkbox for each of the crosstab issues (each individual one has its own issue and we just reference them), like #11485.

Contributor

nickeubank commented Mar 9, 2016

Posted PR to #12578. Input welcome, @sinhrks.

@jreback jreback modified the milestones: 0.18.1, Next Major Release Apr 4, 2016

@jreback jreback closed this in bb494b7 Apr 25, 2016

nps added a commit to nps/pandas that referenced this issue May 17, 2016
