By "group by" we are referring to a process involving one or more of the following steps:
- Splitting the data into groups based on some criteria
- Applying a function to each group independently
- Combining the results into a data structure

Of these, the split step is the most straightforward. In fact, in many situations you may wish to split the data set into groups and do something with those groups yourself. In the apply step, we might wish to do one of the following:

- Aggregation: computing a summary statistic (or statistics) about each group. Some examples:
     - Compute group sums or means
     - Compute group sizes / counts

- Transformation: perform some group-specific computations and return a like-indexed object. Some examples:
    - Standardizing data (z-score) within a group
    - Filling NAs within groups with a value derived from each group

- Filtration: discard some groups, according to a group-wise computation that evaluates to True or False. Some examples:
    - Discarding data that belongs to groups with only a few members
    - Filtering out data based on the group sum or mean
    

- Some combination of the above: GroupBy will examine the results of the apply step and try to return a sensibly combined result if it doesn't fit into any of the above categories.
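
The three kinds of apply step listed above can be sketched on a small, made-up DataFrame. The column names `key` and `val` and the data are assumptions chosen only for illustration:

```python
import pandas as pd

# Hypothetical toy data: three groups of different sizes.
df = pd.DataFrame({"key": ["a", "a", "b", "b", "b", "c"],
                   "val": [1.0, 3.0, 2.0, 4.0, 6.0, 10.0]})

# Aggregation: one summary value per group.
sums = df.groupby("key")["val"].sum()

# Transformation: a like-indexed result (same index as df),
# here each value minus its group mean.
demeaned = df.groupby("key")["val"].transform(lambda x: x - x.mean())

# Filtration: keep only the rows of groups with at least two members.
kept = df.groupby("key").filter(lambda g: len(g) >= 2)

print(sums)
print(demeaned)
print(kept)
```

Note that the aggregation is indexed by group name, while the transformation and the filtration keep the row labels of the original frame.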

Since the set of object instance methods on pandas data structures is generally rich and expressive, we often simply want to invoke, say, a DataFrame function on each group. The name GroupBy should be quite familiar to those who have used a SQL-based tool (or itertools), in which you can write code like:


'''
SELECT column1, column2
FROM someTable
GROUP BY column1, column2
'''

We aim to make operations like this natural and easy to express using pandas. We'll address each area of GroupBy functionality, then provide some non-trivial examples / use cases.
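
As a rough sketch of how a SQL-style GROUP BY maps onto pandas, the example below assumes a hypothetical `some_table` with an extra numeric `column3` to aggregate. That column and the data are not part of the query above; they are added only so the grouping has something to summarize:

```python
import pandas as pd

# Hypothetical table; column3 is an illustrative numeric column.
some_table = pd.DataFrame({
    "column1": ["a", "a", "b", "b"],
    "column2": ["x", "x", "y", "z"],
    "column3": [1.0, 2.0, 3.0, 4.0],
})

# Roughly: SELECT column1, column2, AVG(column3)
#          FROM some_table GROUP BY column1, column2
result = some_table.groupby(["column1", "column2"], as_index=False)["column3"].mean()
print(result)
```

Passing `as_index=False` keeps the group keys as ordinary columns, which is closer to the shape of a SQL result set.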

### Splitting an object into groups

pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names. To create a GroupBy object (more on what the GroupBy object is later), you do the following:

> grouped = obj.groupby(key)

> grouped = obj.groupby(key, axis=1)

> grouped = obj.groupby([key1, key2])

The mapping can be specified many different ways:
- A Python function, to be called on each of the axis labels
- A list or NumPy array of the same length as the selected axis
- A dict or Series, providing a "label" -> "group name" mapping
- For DataFrame objects, a string indicating a column to be used to group. Of course df.groupby('A') is just syntactic sugar for df.groupby(df['A']), but it makes life simpler
- A list of any of the above things
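
Each kind of key listed above can be tried on a small frame. This is a minimal sketch; the index labels `w` to `z`, the group names `left`/`right`, and the data are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"], "C": [1, 2, 3, 4]},
                  index=["w", "x", "y", "z"])

# A function, called on each index label.
by_func = df.groupby(lambda label: label in ("w", "x"))["C"].sum()

# A NumPy array with one entry per row.
by_array = df.groupby(np.array([0, 0, 1, 1]))["C"].sum()

# A dict mapping index label -> group name.
by_dict = df.groupby({"w": "left", "x": "left", "y": "right", "z": "right"})["C"].sum()

# A column name, and the equivalent Series.
by_col = df.groupby("A")["C"].sum()
by_series = df.groupby(df["A"])["C"].sum()

# A list mixing two of the key types above.
by_list = df.groupby(["A", np.array([0, 0, 1, 1])])["C"].sum()

print(by_dict)
print(by_col)
print(by_list)
```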
   
Collectively, we refer to the grouping objects as the keys. For example, consider the following DataFrame:

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'tow', 'two', 'one', 'three'],
                    'C': np.random.randn(8),
                    'D': np.random.randn(8),
                  })
df

     A      B         C         D
0  foo    one -1.124636  0.162121
1  bar    one  1.302839  0.989354
2  foo    two  0.173484 -0.652413
3  bar  three -0.092287 -2.535168
4  foo    tow  1.695130 -0.070428
5  bar    two -0.728722  1.378912
6  foo    one  1.475332  0.770038
7  foo  three  0.474175 -0.751786


We could naturally group by either the A or B columns, or both:

In [3]:
grouped = df.groupby('A')
grouped = df.groupby(['A', 'B'])

These will split the DataFrame on its index (rows). We could also split by the columns:

In [10]:
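def get_letter_type(letter):
    # Echo each label so we can see what groupby passes to the function.
    print(letter)
    if letter.lower() in 'aeiou':
        return 'vowel'
    else:
        return 'consonant'

print(get_letter_type('a'))
print(get_letter_type('b'))
# print(get_letter_type(1))  # would fail: an int has no .lower() method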

a
vowel
b
consonant


In [13]:
grouped = df.groupby(get_letter_type, axis=1)
# grouped = df.groupby(get_letter_type, axis=0)  # axis=0 would pass the index labels, not the column names

A
B
C
D
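
Recent pandas releases deprecate the `axis=1` argument to `groupby` (as far as we are aware, starting with 2.1). A minimal sketch of the same column-wise grouping without it, assuming that transposing the frame first is acceptable for your data; `letter_type` is a hypothetical print-free copy of `get_letter_type`, and the small frame is made up:

```python
import pandas as pd

def letter_type(letter):
    # Same rule as get_letter_type above, without the debugging print.
    return "vowel" if letter.lower() in "aeiou" else "consonant"

df = pd.DataFrame({"A": ["foo", "bar"], "B": ["one", "two"],
                   "C": [1.0, 2.0], "D": [3.0, 4.0]})

# Transposing turns the column names into index labels, so the default
# row-wise grouping calls letter_type on 'A', 'B', 'C' and 'D'.
grouped = df.T.groupby(letter_type)
column_groups = {name: list(labels) for name, labels in grouped.groups.items()}
print(column_groups)
```

Results computed on the transposed groups are themselves transposed, so they may need a final `.T` to get back to the original orientation.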


Starting with 0.8, pandas Index objects support duplicate values. If a non-unique index is used as the group key in a groupby operation, all values for the same index value will be considered to be in one group, and thus the output of aggregation functions will only contain unique index values:

In [32]:
lst = [1, 2, 3, 1, 2, 3]
s = pd.Series([4,20,6,10,5,30], lst)
grouped = s.groupby(level=0)  # group by index level 0 (the first level of a MultiIndex)
print(grouped.first())  # first entry in each group
print(grouped.last())   # last entry in each group
print(grouped.sum())

1     4
2    20
3     6
dtype: int64
1    10
2     5
3    30
dtype: int64
1    14
2    25
3    36
dtype: int64


Note that no splitting occurs until it's needed. Creating the GroupBy object only verifies that you've passed a valid mapping.
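
The validation step can be observed directly. In this small sketch, the frame and the column name `no_such_column` are made up; grouping by a column that does not exist fails as soon as the GroupBy object is created, before any aggregation is requested:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo"], "C": [1, 2, 3]})

# A valid key: the GroupBy object is created without computing any result.
grouped = df.groupby("A")

# An invalid key is rejected at creation time.
try:
    df.groupby("no_such_column")  # hypothetical bad key
    invalid_key_rejected = False
except KeyError:
    invalid_key_rejected = True
print(invalid_key_rejected)

# The per-group work happens only when a result is requested.
totals = grouped["C"].sum()
print(totals)
```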

### GroupBy sorting

By default the group keys are sorted during the groupby operation. You may however pass sort=False for potential speedups:

In [37]:
df2 = pd.DataFrame({'X' : ['B','A', 'B', 'A', 'A'], 'Y' : [1, 10, 2, 3, 4]})
print(df2)

print(df2.groupby('X').sum())
print(df2.groupby('X', sort=False).sum())

   X   Y
0  B   1
1  A  10
2  B   2
3  A   3
4  A   4
    Y
X    
A  17
B   3
    Y
X    
B   3
A  17


Note that groupby will preserve the order in which observations are sorted within each group. For example, the groups created by groupby() below are in the order they appeared in the original DataFrame:

In [42]:
df3 = pd.DataFrame({'X' : ['A', 'B', 'A', 'B'], 'Y' : [1, 4, 3, 2]})
print(df3.groupby('X').get_group('A'))
print(df3.groupby('X').get_group('B'))


   X  Y
0  A  1
2  A  3
   X  Y
1  B  4
3  B  2


### GroupBy object attributes

The groups attribute is a dict whose keys are the computed unique groups and whose corresponding values are the axis labels belonging to each group. In the above example we have:

In [48]:
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'tow', 'two', 'one', 'three'],
                    'C': np.random.randn(8),
                    'D': np.random.randn(8)
                   })
print(df)
print(df.groupby('A').groups)
print(df.groupby(get_letter_type, axis=1).groups)

     A      B         C         D
0  foo    one  0.681416 -0.503083
1  bar    one  0.939797  0.530915
2  foo    two  0.134388 -1.002735
3  bar  three -1.987432 -0.006334
4  foo    tow  1.516278 -0.526926
5  bar    two -0.973147 -0.081288
6  foo    one -0.462026 -0.167095
7  foo  three -1.694132  0.255509
{'foo': [0, 2, 4, 6, 7], 'bar': [1, 3, 5]}
A
B
C
D
{'consonant': ['B', 'C', 'D'], 'vowel': ['A']}


Calling the standard Python len function on the GroupBy object just returns the length of the groups dict, so it is largely just a convenience:

In [52]:
grouped = df.groupby(['A', 'B'])
print(grouped.groups)
print(len(grouped))

{('foo', 'three'): [7], ('bar', 'two'): [5], ('foo', 'one'): [0, 6], ('bar', 'one'): [1], ('foo', 'tow'): [4], ('bar', 'three'): [3], ('foo', 'two'): [2]}
7
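
The relationship between len and the groups dict can be checked on a smaller, made-up frame. The sketch also uses the `ngroups` attribute of the GroupBy object, which is not mentioned above and reports the same count:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"],
                   "B": ["one", "one", "two", "one"]})
grouped = df.groupby(["A", "B"])

n_len = len(grouped)          # length of the GroupBy object
n_dict = len(grouped.groups)  # number of keys in the groups dict
n_attr = grouped.ngroups      # same count, exposed as an attribute
print(n_len, n_dict, n_attr)
```

Rows 1 and 3 share the key ('bar', 'one'), so four rows fall into three groups.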


GroupBy will tab complete column names (and other attributes).
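
Tab completion reflects the fact that columns are reachable as attributes of the GroupBy object. A small sketch on made-up data, comparing attribute access with the bracket form:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo"], "C": [1.0, 2.0, 3.0]})
grouped = df.groupby("A")

via_attr = grouped.C.sum()     # column selected by attribute access
via_item = grouped["C"].sum()  # the same column selected with brackets
print(via_attr.equals(via_item))
```

Attribute access only works for column names that are valid Python identifiers and do not clash with existing GroupBy methods; the bracket form works for any column.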