[It's difficult to predict what DataFrame.groupby().apply() will return: #9867](https://github.com/pydata/pandas/issues/9867)

To summarize, ticket 9867 lists the following code examples. Each one is the function passed to `apply()`:

1) `lambda x:x`

2) `lambda x:x[:]`

3) `lambda x:x.b + x.c`

4) `lambda x:(x.b + x.c).reset_index(drop=True)`

5) `lambda x:(x.b + x.c).to_frame()`

6) `lambda x:(x.b + x.c).to_frame()[:]`

7) `lambda x:x[["b", "c"]]`

###  Create a basic dataframe

In [67]:
##  setting up, and creating a dataset

import pandas as pd
import numpy as np

data = ({"a":[0,1,2,4], "b":[5,8,9,10], "c":[15,16,17,18], "d": [20,25,21,30]})

In [68]:
df = pd.DataFrame(data)
df

   a   b   c   d
0  0   5  15  20
1  1   8  16  25
2  2   9  17  21
3  4  10  18  30


###  Example 1: 

`lambda x:x`

This is an identity example: the function returns each group's `DataFrame` unchanged, and the result equals the original `df`.

In [16]:
df_example1 = df.groupby('a').apply(lambda x:x)
df_example1

   a   b   c   d
0  0   5  15  20
1  1   8  16  25
2  2   9  17  21
3  4  10  18  30


In [17]:
df == df_example1

      a     b     c     d
0  True  True  True  True
1  True  True  True  True
2  True  True  True  True
3  True  True  True  True


###  Example 2:
`apply(lambda x:x[:])`

The result's index changes compared to Example 1: the group key `a` is added as an extra index level.

In [15]:
df_example2 = df.groupby("a").apply(lambda x:x[:])
df_example2

     a   b   c   d
a
0 0  0   5  15  20
1 1  1   8  16  25
2 2  2   9  17  21
4 3  4  10  18  30


In [12]:
##  Getting the indices of df and df_example2

df_index = df.index
df_example2_index = df_example2.index

In [73]:
##  The indices are of different types

print(type(df_index))
print(type(df_example2_index))

<class 'pandas.core.index.Int64Index'>
<class 'pandas.core.index.MultiIndex'>


###  Example 3:  

`lambda x:x.b + x.c`

Summing two columns within the `apply()`.

The resulting index seems to me to be the same as in Example 2.

In [69]:
df_example3 = df.groupby('a').apply(lambda x:x.b + x.c)
df_example3

a   
0  0    20
1  1    24
2  2    26
4  3    28
dtype: int64

In [74]:
df_example3_index = df_example3.index

In [76]:
df_example2_index

MultiIndex(levels=[[0, 1, 2, 4], [0, 1, 2, 3]],
           labels=[[0, 1, 2, 3], [0, 1, 2, 3]],
           names=[u'a', None])

In [77]:
df_example3_index

MultiIndex(levels=[[0, 1, 2, 4], [0, 1, 2, 3]],
           labels=[[0, 1, 2, 3], [0, 1, 2, 3]],
           names=[u'a', None])

In [75]:
df_example2_index == df_example3_index

AttributeError: 'NoneType' object has no attribute 'view'

In [71]:
##  The indexes could not be compared directly with ==, because of the AttributeError above.  So I converted them to tuples.

ex3 = tuple(df_example3_index)
ex2 = tuple(df_example2_index)

In [72]:
##  The tuples of the indexes are equal to one another

ex2 == ex3

True
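The tuple conversion works, but pandas also provides `Index.equals`, which returns a single boolean for the whole index. A minimal sketch, assuming the two indexes are rebuilt by hand with `pd.MultiIndex.from_arrays` so that it does not depend on how a given pandas version assembles the `apply()` result; `idx_a` and `idx_b` are illustrative names.

```python
import pandas as pd

# Hand-built copies of the index printed above (levels 'a' and an unnamed level).
idx_a = pd.MultiIndex.from_arrays([[0, 1, 2, 4], [0, 1, 2, 3]], names=["a", None])
idx_b = pd.MultiIndex.from_arrays([[0, 1, 2, 4], [0, 1, 2, 3]], names=["a", None])

# equals() compares the labels of the two indexes as a whole.
same_values = idx_a.equals(idx_b)
# The level names are compared separately here.
same_names = list(idx_a.names) == list(idx_b.names)

print(same_values, same_names)
```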

###  Example 4: 
`apply(lambda x:(x.b + x.c).reset_index(drop=True))`

In this example, each row of the result is a Series. I checked, and that is also true for Example 1, so I am not sure this really demonstrates new behavior. I am discussing this one with Phil.

Example 4 looks like Example 3 followed by a call to `reset_index`.

In [31]:
df_example4 = df.groupby("a").apply(lambda x:(x.b + x.c).reset_index(drop=True))
df_example4

    0
a
0  20
1  24
2  26
4  28


In [42]:
type(df_example4)

pandas.core.frame.DataFrame

In [54]:
##  .loc selects by label; row label 2 exists in both frames
print(type(df.loc[2]))
print(type(df_example4.loc[2]))
df_example4.loc[2]

<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>


0    26
Name: 2, dtype: int64

###  Example 5: 

`lambda x:(x.b + x.c).to_frame()`

This example shows that the group index can be made to disappear by calling `to_frame()` inside the `apply()` function.

Should `to_frame()` be called inside `apply()`, or afterwards?

This looks like Example 3 again, followed by a call to `to_frame()`.

In [63]:
##  First, example three again

df_example3

a   
0  0    20
1  1    24
2  2    26
4  3    28
dtype: int64

In [64]:
df_example5 = df.groupby("a").apply(lambda x:(x.b + x.c).to_frame())
df_example5

    0
0  20
1  24
2  26
3  28


In [65]:
df_my_example = df.groupby("a").apply(lambda x:(x.b + x.c)).to_frame()
df_my_example

      0
a
0 0  20
1 1  24
2 2  26
4 3  28


### Example 6: selecting columns inside the `lambda`, and key 'a' disappears

`lambda x:x[["b", "c"]]`

So you can do this and get the `'a'` key to drop.

I am not sure what the behaviour should be here. The code asks for only columns `b` and `c`, so dropping the `'a'` key is arguably what it says to do.

Should the preference be to not allow the key to be dropped? I talked to Phil, and he said he needs to discuss this one with the other pandas devs.

In [78]:
df_example6 = df.groupby("a").apply(lambda x:x[["b", "c"]])
df_example6

    b   c
0   5  15
1   8  16
2   9  17
3  10  18
