# Pandas GroupBy Operations

Goals of this note
- Understand how GroupBy works behind the scenes
- Understand what a GroupBy object actually is

## Understanding GroupBy Objects

In [1]:
import pandas as pd

In [2]:
titanic = pd.read_csv("../data/titanic_ver01.csv")

In [3]:
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked,Cabin
0,1,0,3,male,22.0,1,0,7.25,S,
1,2,1,1,female,38.0,1,0,71.2833,C,C
2,3,1,3,female,26.0,0,0,7.925,S,
3,4,1,1,female,35.0,1,0,53.1,S,C
4,5,0,3,male,35.0,0,0,8.05,S,


In [4]:
titanic.tail()

Unnamed: 0,PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked,Cabin
886,887,0,2,male,27.0,0,0,13.0,S,
887,888,1,1,female,19.0,0,0,30.0,S,B
888,889,0,3,female,,1,2,23.45,S,
889,890,1,1,male,26.0,0,0,30.0,C,C
890,891,0,3,male,32.0,0,0,7.75,Q,


In [5]:
titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 10 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Fare           891 non-null float64
Embarked       889 non-null object
Cabin          204 non-null object
dtypes: float64(2), int64(5), object(3)
memory usage: 69.7+ KB


In [6]:
# titanic.columns  # shows the column order
titanic_slice = titanic.iloc[:10, [3, 4]]
# titanic_slice = titanic.loc[:9, ["Sex", "Age"]]  # label-based equivalent; .loc includes the end label, so :9 gives the same 10 rows

In [7]:
titanic_slice

Unnamed: 0,Sex,Age
0,male,22.0
1,female,38.0
2,female,26.0
3,female,35.0
4,male,35.0
5,male,
6,male,54.0
7,male,2.0
8,female,27.0
9,female,14.0


In [8]:
titanic_slice.groupby(by="Sex")  # this returns a GroupBy object

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000021F76410E80>

In [9]:
gbo = titanic_slice.groupby("Sex")

In [10]:
type(gbo)

pandas.core.groupby.generic.DataFrameGroupBy

In [11]:
gbo.groups

{'female': Int64Index([1, 2, 3, 8, 9], dtype='int64'),
 'male': Int64Index([0, 4, 5, 6, 7], dtype='int64')}
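
A single group can also be pulled out directly by its key. The sketch below is not part of the original notebook: it rebuilds the ten-row `Sex`/`Age` slice by hand (an assumption, so that it runs without the CSV file) and uses `get_group`, which returns the rows for one key with their original index labels.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the ten-row slice shown above (illustrative stand-in
# for titanic_slice so the example runs without the CSV file).
demo_slice = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male",
            "male", "male", "male", "female", "female"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, np.nan, 54.0, 2.0, 27.0, 14.0],
})

demo_gbo = demo_slice.groupby("Sex")

# get_group returns the sub-DataFrame that belongs to one key
female_rows = demo_gbo.get_group("female")
print(female_rows)

# The row labels match what .groups reports for the same key
female_labels = list(demo_gbo.groups["female"])
print(female_labels)
```

The labels printed at the end are the same ones listed under `'female'` in the `gbo.groups` output.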

---
We can convert the GroupBy object to a list.

In [12]:
l = list(gbo)

In [13]:
l

[('female',       Sex   Age
  1  female  38.0
  2  female  26.0
  3  female  35.0
  8  female  27.0
  9  female  14.0), ('male',     Sex   Age
  0  male  22.0
  4  male  35.0
  5  male   NaN
  6  male  54.0
  7  male   2.0)]

The list has two elements; we can confirm this with len().

In [14]:
len(l)

2

Get the first element.

In [15]:
l[0]

('female',       Sex   Age
 1  female  38.0
 2  female  26.0
 3  female  35.0
 8  female  27.0
 9  female  14.0)

The parentheses suggest that it is a tuple.

In [16]:
type(l[0])

tuple

Select the first element of the tuple.  
The second element looks like a DataFrame.

In [17]:
l[0][0]

'female'

In [18]:
l[0][1]

Unnamed: 0,Sex,Age
1,female,38.0
2,female,26.0
3,female,35.0
8,female,27.0
9,female,14.0


Note that the second element is a DataFrame, and that it keeps the index labels of the original DataFrame.

In [19]:
type(l[0][1])

pandas.core.frame.DataFrame

In [20]:
l[1]

('male',     Sex   Age
 0  male  22.0
 4  male  35.0
 5  male   NaN
 6  male  54.0
 7  male   2.0)

---
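### Quick summary
What GroupBy actually does here is split our DataFrame into two DataFrames, one per value of the key.  
We can of course also slice the DataFrame ourselves to pull out just the part we are interested in.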

In [21]:
titanic_slice.loc[titanic_slice.Sex == "female"]

Unnamed: 0,Sex,Age
1,female,38.0
2,female,26.0
3,female,35.0
8,female,27.0
9,female,14.0


In [22]:
titanic_slice_f = titanic_slice.loc[titanic_slice.Sex == "female"]
titanic_slice_f

Unnamed: 0,Sex,Age
1,female,38.0
2,female,26.0
3,female,35.0
8,female,27.0
9,female,14.0


In [23]:
titanic_slice_m = titanic_slice.loc[titanic_slice.Sex == "male"]
titanic_slice_m

Unnamed: 0,Sex,Age
0,male,22.0
4,male,35.0
5,male,
6,male,54.0
7,male,2.0


---
### Check that the GroupBy split matches the manual slice
Let's confirm that what GroupBy split off really is the part we wanted.

In [24]:
titanic_slice_f.equals(l[0][1])

True
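
The same check can be run for every group at once. This is a minimal sketch rather than part of the original notebook; it uses a hand-built copy of the ten-row slice (an assumption, so that it runs standalone) and compares each group from the GroupBy object with the boolean-mask selection for the same key.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the ten-row slice (illustrative stand-in for titanic_slice)
demo_slice = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male",
            "male", "male", "male", "female", "female"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, np.nan, 54.0, 2.0, 27.0, 14.0],
})

results = {}
for key, group in demo_slice.groupby("Sex"):
    # Manual selection of the rows whose Sex equals this group's key
    manual = demo_slice.loc[demo_slice.Sex == key]
    # DataFrame.equals treats NaN values in the same position as equal
    results[key] = manual.equals(group)

print(results)
```

Because `equals` treats missing values in matching positions as equal, the male group still compares equal despite its missing `Age`.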

---
### 把GroupBy的每一組都列印出來

In [25]:
# Print the first item of each group's tuple (index 0): the group key
# for element in gbo:
#     print(element[0])
#
# Print the second item of each group's tuple (index 1): the group's DataFrame
for element in gbo:
    print(element[1])

      Sex   Age
1  female  38.0
2  female  26.0
3  female  35.0
8  female  27.0
9  female  14.0
    Sex   Age
0  male  22.0
4  male  35.0
5  male   NaN
6  male  54.0
7  male   2.0
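
Since each element is a `(key, DataFrame)` tuple, the loop can also unpack it directly instead of indexing with `[0]` and `[1]`. A minimal sketch, assuming a hand-built copy of the ten-row slice so that it runs without the CSV file; the names `demo_slice` and `pieces` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the ten-row slice (illustrative stand-in for titanic_slice)
demo_slice = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male",
            "male", "male", "male", "female", "female"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, np.nan, 54.0, 2.0, 27.0, 14.0],
})

pieces = {}
for key, group in demo_slice.groupby("Sex"):
    # key is the group label, group is the sub-DataFrame for that label
    pieces[key] = group
    print(key, "->", list(group.index))
```

Collecting the groups in a dictionary keeps each piece reachable by its key afterwards, for example `pieces["male"]`.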


![Creating_a_GroupBy_object_the_split](../pic/Creating_a_GroupBy_object_the_split.jpg "Creating a GroupBy object_the split")

The DataFrame is split by the key "Sex".

---
## Splitting with many Keys

A DataFrame can also be split by several columns at once.  
That is covered in the next section.
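
As a brief preview, here is a minimal sketch of a two-key split. The five rows are copied from the `titanic.head()` output above; the choice of `Sex` and `Pclass` as keys and the name `demo_head` are assumptions made for illustration.

```python
import pandas as pd

# First five rows from titanic.head(), restricted to three columns
demo_head = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Pclass": [3, 1, 3, 1, 3],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
})

# Passing a list of column names splits on every combination that occurs
multi_gbo = demo_head.groupby(["Sex", "Pclass"])

group_keys = []
for key, group in multi_gbo:
    # With several keys, each group key is a tuple such as ('female', 1)
    group_keys.append(key)
    print(key, "->", list(group.index))
```

Only combinations that actually appear in the data become groups, so these five rows yield three groups rather than four.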