passing a function to DataFrameGroupBy.agg - confusing documentation/behaviour

#### Problem description

For clarity, I'll split this into 3 sub-issues related to DataFrameGroupBy.agg and its documentation.

##### 1. Not clear what kind of custom function should be provided

The docstring on [DataFrameGroupBy.agg](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.DataFrameGroupBy.agg.html#pandas.core.groupby.DataFrameGroupBy.agg) says that you can pass in a function, but it's unclear what input that function should expect, and how its return value relates to the return value of agg.

After some poking around, it seems to me that the typical pattern is to pass a function that takes a Series and returns a scalar. The DataFrame returned by agg then has the property that `aggregated.loc[group_foo, col_bar]` is the result of calling the function on the Series formed by column `col_bar` restricted to the rows belonging to `group_foo`.

If that is the expected behaviour, it should be explained in the docs.
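To make the pattern described above concrete, here is a minimal sketch of what I believe the contract to be. The data and the `value_range` helper are made up for illustration, and the check only reflects my reading of the behaviour, not anything the docs confirm.

```python
import pandas as pd

# Hypothetical data; the column names mirror the example used in this issue.
df = pd.DataFrame({
    "a": [1, 0, 0, 0],
    "b": [1.0, 2.0, 4.0, 8.0],
    "c": [10.0, 20.0, 40.0, 80.0],
})


def value_range(series):
    """Take one column of one group (a Series) and return a scalar."""
    return series.max() - series.min()


aggregated = df.groupby("a").agg(value_range)

# Each cell should equal the function applied by hand to that group's column.
for group in aggregated.index:
    for col in aggregated.columns:
        by_hand = value_range(df.loc[df["a"] == group, col])
        assert aggregated.loc[group, col] == by_hand
```

If the assertions pass, the function really was applied once per group and column, which is the behaviour I would like to see spelled out in the docstring.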

##### 2. Passing an arbitrary kwarg changes the function behaviour

I stumbled on this weirdness while trying to figure out how agg worked:

```python
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.rand(4,2), columns=['b', 'c'])
>>> df['a'] = [1, 0, 0, 0]
>>> g = df.groupby('a')
>>> g.agg(lambda x: np.prod(x.shape))
     b    c
a          
0  3.0  3.0
1  1.0  1.0
>>> g.agg(lambda x, foo=0: np.prod(x.shape), foo=0)
   b  c
a      
0  9  9
1  3  3
```

In other words, by passing in any meaningless kwarg, my function is now called `ngroups` times with a dataframe for each group, rather than being called with a series `ngroups * ncolumns` times.

Also, when called in this way, it seems my function can return a list or series of length `ncolumns` which gets expanded (this doesn't work in the non-kwarg version):

```python
>>> g.agg(lambda x, foo=0: [20, 17], foo=0)
     b   c
a        
0  20  17
1  20  17
```
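For anyone who wants to check which object their function receives on their own pandas version, a small probe along these lines may help. It is only a diagnostic sketch: `make_probe` and `probe` are hypothetical helpers, and the types it reports may differ from what I saw on 0.19.2.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 0, 0, 0],
    "b": [1.0, 2.0, 4.0, 8.0],
    "c": [10.0, 20.0, 40.0, 80.0],
})
g = df.groupby("a")


def make_probe(seen):
    """Build an aggregation function that records the type of its input."""
    def probe(x, foo=0):
        seen.append(type(x).__name__)
        # len() is the row count for both a Series and a DataFrame,
        # so the returned scalar is the same whichever one is passed in.
        return len(x)
    return probe


plain_seen, kwarg_seen = [], []
plain = g.agg(make_probe(plain_seen))
with_kwarg = g.agg(make_probe(kwarg_seen), foo=0)

print("no kwarg:  ", sorted(set(plain_seen)))
print("with kwarg:", sorted(set(kwarg_seen)))
```

Comparing the two printed lines shows directly whether the extra kwarg switches the input from a Series to a DataFrame.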

##### 3. Note on numpy special casing is confusing

The docstring has this note:

> Numpy functions mean/median/prod/sum/std/var are special cased so the default behavior is applying the function along axis=0 (e.g., np.mean(arr_2d, axis=0)) as opposed to mimicking the default Numpy behavior (e.g., np.mean(arr_2d)).

This is confusing because passing in `lambda x: np.mean(x)` does seem to give the 'right' answer (the same one as passing `lambda x: np.mean(x, axis=0)`, or `np.mean`, or `'mean'`). This is true whether or not I throw in a kwarg.
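A possible explanation for the no-kwarg case, sketched below with made-up values: the axis distinction in the note only matters for a 2-D array, whereas a function that receives a single column gets 1-D data, where both calls agree. This does not cover the case where the function receives a DataFrame.

```python
import numpy as np
import pandas as pd

arr_2d = np.array([[1.0, 2.0],
                   [3.0, 6.0]])

# Default NumPy behaviour: one scalar over every element of the array.
overall = np.mean(arr_2d)

# What the docstring note describes: one value per column.
per_column = np.mean(arr_2d, axis=0)

# A single column is 1-D, so the axis argument makes no difference.
series = pd.Series([1.0, 3.0])
same = np.mean(series) == np.mean(series, axis=0)

print(overall, per_column, same)
```

If that is what the note is getting at, it would help for the docstring to say when the function actually sees 2-D data.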

#### Output of ``pd.show_versions()``

<details>
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 32
OS: Linux
OS-release: 3.13.0-107-generic
machine: i686
processor: i686
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 25.1.0
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 1.5.7
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.1.3
html5lib: 0.999
httplib2: 0.8
apiclient: None
sqlalchemy: 0.8.0
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: None
pandas_datareader: None
</details>

