
Implement frequency table function a la table in R #170

Closed
wesm opened this issue Sep 25, 2011 · 7 comments

Comments

@wesm
Member

commented Sep 25, 2011

No description provided.

@wesm

Member Author

commented Sep 25, 2011

A cut function would also be nice.
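A minimal sketch of the `cut` idea in current pandas, using hypothetical values and bin edges: `pd.cut` bins numeric data into intervals much like R's `cut`, and counting the bins then gives a one-way frequency table in the spirit of R's `table(cut(x, ...))`.

```python
import pandas as pd

# Hypothetical numeric data (not from this thread).
values = pd.Series([1, 2, 3, 4, 6, 7, 7])

# Assign each value to a right-closed interval: (0, 3], (3, 5], (5, 7].
binned = pd.cut(values, bins=[0, 3, 5, 7])

# Count values per interval and keep the intervals in bin order.
counts = binned.value_counts().sort_index()
print(counts)
```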

@gregglind

Contributor

commented Jan 13, 2012

What is the right way to do a simple crosstab of counts with marginals? For bonus points, all variables vs. all variables?
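A minimal sketch of the "all variables vs. all variables" case, assuming the variables are columns of a pandas DataFrame (the data and column names below are hypothetical): loop over every pair of columns and build one two-way count table with margins per pair using current pandas' `pd.crosstab`.

```python
from itertools import combinations

import pandas as pd

# Hypothetical categorical data; column names are illustrative only.
df = pd.DataFrame({
    "x": ["a", "a", "b", "b", "b"],
    "y": ["u", "v", "u", "u", "v"],
    "z": [1, 1, 2, 2, 1],
})

# One count table per pair of columns; margins=True adds "All" totals.
tables = {
    (left, right): pd.crosstab(df[left], df[right], margins=True)
    for left, right in combinations(df.columns, 2)
}

for pair, table in tables.items():
    print(pair)
    print(table, end="\n\n")
```

With k columns this produces k(k-1)/2 tables; each unordered pair appears once, since swapping the two variables only transposes the table.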

@wesm

Member Author

commented Jan 13, 2012

Use pivot_table (it has margins, too). I'm pretty sure this issue can be closed; I just need to look at the functionality R provides and verify that it's addressed by an analogous pivot_table call.
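A minimal sketch of a counts crosstab with margins via `pivot_table`, written against current pandas, where the row and column keywords are `index=` and `columns=`. The data is hypothetical; the `id` column exists only so that `aggfunc="count"` has a values column to count.

```python
import pandas as pd

# Hypothetical data; "id" is a placeholder column for "count" to tally.
df = pd.DataFrame({
    "a": ["x", "x", "y", "y", "y"],
    "b": ["p", "q", "p", "p", "q"],
    "id": range(5),
})

# Count rows per (a, b) cell; margins=True appends an "All" row and column.
table = df.pivot_table(values="id", index="a", columns="b",
                       aggfunc="count", margins=True)
print(table)
```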

@gregglind

Contributor

commented Jan 13, 2012

So, supposing I have columns 'a' and 'b', what is the simplest call to get the crosstab table? For some reason, pivot_table is tough for me!


@wesm

Member Author

commented Jan 13, 2012

Example:


In [10]: wp
Out[10]: 
    breaks wool tension
1   26     A    L      
2   30     A    L      
3   54     A    L      
4   25     A    L      
5   70     A    L      
6   52     A    L      
7   51     A    L      
8   26     A    L      
9   67     A    L      
10  18     A    M      
11  21     A    M      
12  29     A    M      
13  17     A    M      
14  12     A    M      
15  18     A    M      
16  35     A    M      
17  30     A    M      
18  36     A    M      
19  36     A    H      
20  21     A    H      
21  24     A    H      
22  18     A    H      
23  10     A    H      
24  43     A    H      
25  28     A    H      
26  15     A    H      
27  26     A    H      
28  27     B    L      
29  14     B    L      
30  29     B    L      
31  19     B    L      
32  29     B    L      
33  31     B    L      
34  41     B    L      
35  20     B    L      
36  44     B    L      
37  42     B    M      
38  26     B    M      
39  19     B    M      
40  16     B    M      
41  39     B    M      
42  28     B    M      
43  21     B    M      
44  39     B    M      
45  29     B    M      
46  20     B    H      
47  21     B    H      
48  24     B    H      
49  17     B    H      
50  13     B    H      
51  15     B    H      
52  15     B    H      
53  16     B    H      
54  28     B    H      

In [11]: wp.pivot_table('breaks', rows='wool', cols='tension', aggfunc='count')
Out[11]: 
tension  H  L  M
wool            
A        9  9  9
B        9  9  9

I'll have a look at R's table function and add a simple crosstab function or something similar.
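A sketch reproducing the table above in current pandas, assuming the `wp` data shown in this comment (rebuilt here from the printed rows). Current pandas spells the `rows=`/`cols=` keywords as `index=`/`columns=`, and `pd.crosstab` gives the same counts without needing a values column.

```python
import pandas as pd

# Breaks per (wool, tension) group, copied from the wp table above.
breaks = {
    ("A", "L"): [26, 30, 54, 25, 70, 52, 51, 26, 67],
    ("A", "M"): [18, 21, 29, 17, 12, 18, 35, 30, 36],
    ("A", "H"): [36, 21, 24, 18, 10, 43, 28, 15, 26],
    ("B", "L"): [27, 14, 29, 19, 29, 31, 41, 20, 44],
    ("B", "M"): [42, 26, 19, 16, 39, 28, 21, 39, 29],
    ("B", "H"): [20, 21, 24, 17, 13, 15, 15, 16, 28],
}
wp = pd.DataFrame(
    [(b, wool, tension) for (wool, tension), vals in breaks.items() for b in vals],
    columns=["breaks", "wool", "tension"],
)

# Same call as above with current keyword names (rows= -> index=, cols= -> columns=).
counts = wp.pivot_table(values="breaks", index="wool", columns="tension",
                        aggfunc="count")

# crosstab counts co-occurrences of the two factors directly.
xtab = pd.crosstab(wp["wool"], wp["tension"])
print(counts)
print(xtab)
```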

@wesm

Member Author

commented Jan 14, 2012

I just wrote a blog post about this here: http://wesmckinney.com/blog/?p=443. I don't think it's necessary to add any more functions.

@wesm

Member Author

commented Jan 16, 2012

OK Gregg, I'll bite:

In [7]: a
Out[7]: 
array([1, 2, 6, 6, 4, 0, 2, 0, 4, 3, 5, 1, 1, 2, 6, 3, 4, 4, 5, 4, 4, 5, 5,
       2, 1, 1, 6, 3, 5, 2, 5, 6, 2, 2, 5, 1, 1, 3, 1, 4, 1, 6, 0, 1, 3, 3,
       1, 4, 2, 1, 0, 5, 0, 5, 1, 1, 5, 0, 2, 4, 2, 4, 2, 2, 2, 6, 2, 0, 1,
       4, 6, 1, 4, 0, 5, 5, 3, 5, 5, 6, 0, 6, 6, 5, 0, 2, 4, 2, 2, 0, 5, 0,
       5, 6, 5, 6, 4, 5, 0, 4])

In [8]: b
Out[8]: 
array([0, 0, 0, 2, 0, 0, 2, 1, 1, 1, 2, 2, 0, 1, 0, 0, 2, 2, 1, 0, 0, 2, 1,
       1, 0, 2, 2, 1, 2, 1, 1, 1, 2, 1, 2, 0, 2, 1, 1, 0, 0, 0, 0, 2, 1, 1,
       2, 0, 0, 1, 1, 1, 2, 2, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 2, 0, 2, 0,
       0, 1, 1, 2, 0, 1, 2, 1, 1, 2, 0, 1, 0, 1, 1, 1, 2, 1, 2, 2, 0, 2, 1,
       2, 0, 1, 1, 2, 0, 0, 0])

In [9]: c
Out[9]: 
array([3, 3, 4, 1, 1, 3, 4, 4, 1, 0, 2, 2, 4, 2, 3, 0, 1, 0, 2, 0, 4, 1, 3,
       1, 0, 1, 1, 0, 1, 4, 1, 4, 2, 3, 3, 0, 3, 3, 1, 3, 0, 1, 4, 4, 3, 1,
       3, 1, 1, 4, 1, 0, 0, 3, 1, 3, 3, 3, 2, 2, 1, 2, 3, 4, 0, 3, 1, 3, 3,
       0, 4, 3, 0, 3, 0, 2, 4, 3, 1, 0, 4, 1, 3, 0, 1, 1, 4, 0, 0, 3, 2, 1,
       4, 2, 3, 2, 2, 1, 2, 0])

In [10]: result = crosstab(a, [b, c], rownames=['a'], colnames=('b', 'c'),
                          margins=True)

In [11]: result
Out[11]: 
b    0               1               2              All
c    0  1  2  3   4  0  1  2  3   4  0  1  2  3  4     
0    0  0  1  4   1  0  3  0  0   2  1  0  0  1  0  13 
1    3  0  0  3   1  0  2  0  1   1  0  1  1  2  1  16 
2    0  3  1  1   0  1  1  1  2   2  2  1  1  0  1  17 
3    1  0  0  0   0  2  1  0  2   1  0  0  0  0  0  7  
4    3  2  2  1   1  0  1  0  0   1  2  1  1  0  0  15 
5    0  1  0  0   0  3  1  1  4   0  0  3  3  2  1  19 
6    1  2  1  1   1  0  0  1  1   2  0  2  0  1  0  13 
All  8  8  5  10  4  6  9  3  10  9  5  8  6  6  3  100

I think that's pretty slick.
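For comparison, a sketch of the same kind of call against current pandas, using seeded random arrays with the same value ranges as `a`, `b` and `c` above. The draws differ from the ones shown, so the counts will not match the printed output; only the shape of the call and of the margins is the same.

```python
import numpy as np
import pandas as pd

# Random integer codes with the same ranges as a, b and c above (different draws).
rng = np.random.default_rng(0)
a = rng.integers(0, 7, size=100)
b = rng.integers(0, 3, size=100)
c = rng.integers(0, 5, size=100)

# Rows are the levels of a; columns are the observed (b, c) combinations,
# plus an "All" column and row holding the marginal totals.
result = pd.crosstab(a, [b, c], rownames=["a"], colnames=["b", "c"],
                     margins=True)
print(result)
```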

@wesm wesm closed this Jan 16, 2012

yarikoptic added a commit to neurodebian/pandas that referenced this issue Jan 19, 2012
Merge branch 'master' into debian
* master: (313 commits)
  TST: more Python 2.5 sadness
  TST: Python 2.5 float formatting changed
  TST: cast to i8 when checking margins
  BUG: DataFrame.join on keys produce wrong result, does not preserve order
  DOC: release notes
  ENH: xs level can take multiple levels, pass multiple levels to MultiIndex.droplevel, GH pandas-dev#371
  BUG: fix bugs related to comments in pandas-dev#371
  BUG: fix TextParser with list buglet, enable parsing of DataFrame output with index names
  BUG: convert tuples in concat to MultiIndex
  BUG: don't lose index names when adding row margin
  ENH: add margins to crosstab
  ENH: add crosstab function and test
  ENH: crosstab prototype function, API needs fleshing out, GH pandas-dev#170
  BUG: fix buglet with xs with level, GH pandas-dev#371
  TST: add test_sql.py module
  TST: testing, cleanup of io.sql module
  TST: indexing testing with minor Series.__getitem__ refactoring
  ENH: hack toward pandas-dev#629
  BUG: check for non-contiguous memory in SeriesGrouper, causing segfault
  ENH: add ability to pass list of dicts to DataFrame.append (GH pandas-dev#464)
  ...
2 participants