# pandas

pandas is a package widely used for data analysis. It provides functionality for manipulating, analyzing, and visualizing data.

In [1]:
import pandas as pd

## pandas data structures

pandas has two main object types: ```Series``` and ```DataFrame```.

### Series

A ```Series``` is a one-dimensional array-like object. It holds a sequence of values together with an associated array of data labels called the index.

In [27]:
obj=pd.Series([1,2,3,4,5])
obj

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [28]:
obj.values

array([1, 2, 3, 4, 5])

In [29]:
obj.index

RangeIndex(start=0, stop=5, step=1)

To reference one or more values in a ```Series```, specify them by their index labels.

In [30]:
obj[0]

1

In [31]:
obj[[0,3]]

0    1
3    4
dtype: int64
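
To make the role of index labels more concrete, here is a minimal sketch using a Series with string labels. The name `scores` and its values are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical example data: three values with explicit string labels
scores = pd.Series([70, 85, 90], index=["a", "b", "c"])

single = scores["b"]            # one value, selected by its label
several = scores[["a", "c"]]    # several values, selected by a list of labels

print(single)
print(several)
```

The same bracket syntax works whether the labels are the default integers or strings, because the lookup goes through the index.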

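Operations similar to ```NumPy``` functions are available, such as filtering by a condition, multiplying by a scalar, and applying mathematical functions.

The link between each index label and its value is preserved throughout these operations.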

In [32]:
obj[obj>3]

3    4
4    5
dtype: int64

In [33]:
obj*2

0     2
1     4
2     6
3     8
4    10
dtype: int64
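
As a further illustration of NumPy-style operations, the sketch below applies `np.sqrt` to the same values and then filters the result. The threshold `1.9` is an arbitrary choice for the example.

```python
import numpy as np
import pandas as pd

obj = pd.Series([1, 2, 3, 4, 5])

# Apply a NumPy universal function; the result is still a Series
roots = np.sqrt(obj)

# Filter the result; the surviving values keep their original labels
large = roots[roots > 1.9]

print(roots)
print(large)
```

The filtered Series keeps labels 3 and 4, so each value can still be traced back to its original position.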

A ```Series``` can also be created from dictionary data.

In [34]:
population_dict = {
    '東京': 13929286,
    '横浜': 3723392,
    '大阪': 2691004,
    '名古屋': 2390411,
    '札幌': 1952356,
    '神戸': 1538316,
    '京都': 1474570,
    '福岡': 1532591,
    '広島': 1196564,
    '仙台': 1098330
}

# Create a Series from the dictionary
population_series = pd.Series(population_dict)

population_series

東京     13929286
横浜      3723392
大阪      2691004
名古屋     2390411
札幌      1952356
神戸      1538316
京都      1474570
福岡      1532591
広島      1196564
仙台      1098330
dtype: int64

In [35]:
population_series[["東京","仙台"]]

東京    13929286
仙台     1098330
dtype: int64

### DataFrame

A DataFrame is a tabular data structure that has both a row index and a column index.

![](./Figure/dataframe.svg)


There are many ways to create a DataFrame. The most common is to build it from a dictionary whose values are lists of equal length.

In [36]:
data_dict = {
    'City': ['東京', '横浜', '大阪', '名古屋', '札幌', '神戸', '京都', '福岡', '広島', '仙台'],
    'Population': [13929286, 3723392, 2691004, 2390411, 1952356, 1538316, 1474570, 1532591, 1196564, 1098330],
    'Income': [754, 602, 615, 530, 535, 535, 490, 477, 457, 444]
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data_dict)

Like a ```Series```, the resulting DataFrame is assigned an index automatically.

In [37]:
df

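| | City | Population | Income |
|---|---|---|---|
| 0 | 東京 | 13929286 | 754 |
| 1 | 横浜 | 3723392 | 602 |
| 2 | 大阪 | 2691004 | 615 |
| 3 | 名古屋 | 2390411 | 530 |
| 4 | 札幌 | 1952356 | 535 |
| 5 | 神戸 | 1538316 | 535 |
| 6 | 京都 | 1474570 | 490 |
| 7 | 福岡 | 1532591 | 477 |
| 8 | 広島 | 1196564 | 457 |
| 9 | 仙台 | 1098330 | 444 |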
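
The row and column indexes of a DataFrame can be inspected directly. The sketch below uses a three-row subset of the data above; the variable name `small` is only for illustration.

```python
import pandas as pd

# A three-row subset of the city data
small = pd.DataFrame({
    "City": ["東京", "横浜", "大阪"],
    "Population": [13929286, 3723392, 2691004],
    "Income": [754, 602, 615],
})

print(small.shape)          # (number of rows, number of columns)
print(list(small.columns))  # column labels
print(small.index)          # row labels, assigned automatically
print(small.dtypes)         # one dtype per column
```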


## データの読み書き

A notable feature of pandas is its many [functions](https://pandas.pydata.org/docs/reference/io.html) for reading tabular data into DataFrame objects.

For example, the ```pd.read_csv()``` function reads CSV files. We use it here to load a CSV file.

In [168]:
df=pd.read_csv("./Data/titanic.csv")

In [169]:
df

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived
0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0000,0.0,0.0,24160,211.3375,B5,S,2,,"St Louis, MO",1
1,1.0,"Allison, Master. Hudson Trevor",male,0.9167,1.0,2.0,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON",1
2,1.0,"Allison, Miss. Helen Loraine",female,2.0000,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0
3,1.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0000,1.0,2.0,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON",0
4,1.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0000,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3.0,"Zabour, Miss. Hileni",female,14.5000,1.0,0.0,2665,14.4542,,C,,328.0,,0
1305,3.0,"Zabour, Miss. Thamine",female,,1.0,0.0,2665,14.4542,,C,,,,0
1306,3.0,"Zakarian, Mr. Mapriededer",male,26.5000,0.0,0.0,2656,7.2250,,C,,304.0,,0
1307,3.0,"Zakarian, Mr. Ortin",male,27.0000,0.0,0.0,2670,7.2250,,C,,,,0


A DataFrame can also be written out to a file.

In [127]:
#df.to_csv("./Data/titanic.csv")
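
To try writing and reading without touching a file on disk, the following sketch uses an in-memory text buffer. The frame `small` and its contents are made up for illustration. By default `to_csv` also writes the row index as an extra unnamed first column; passing `index=False` leaves it out.

```python
import io
import pandas as pd

# Hypothetical two-row frame
small = pd.DataFrame({"name": ["A", "B"], "age": [29.0, 2.0]})

# Write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
small.to_csv(buffer, index=False)   # index=False omits the row labels
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```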

## 統計量の算出

DataFrames also provide [methods](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mean.html) for computing statistics on the data they hold.

In [170]:
df["age"].mean()

29.8811345124283

In [171]:
df["age"].var()

207.74897359969756

In [172]:
df["age"].sum()

31255.6667

In [173]:
# Frequency of each value
df["age"].value_counts()

24.0000    47
22.0000    43
21.0000    41
30.0000    40
18.0000    39
           ..
0.3333      1
22.5000     1
70.5000     1
0.6667      1
26.5000     1
Name: age, Length: 98, dtype: int64
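
These statistics skip missing values by default, and `var()` computes the sample variance (`ddof=1`). The sketch below illustrates this on a small made-up Series; the values are not taken from the Titanic data.

```python
import pandas as pd

# Hypothetical ages, including one missing value
ages = pd.Series([29.0, 2.0, 30.0, 25.0, None], name="age")

print(ages.count())     # number of non-missing values
print(ages.mean())      # missing values are skipped by default
print(ages.median())
print(ages.describe())  # count, mean, std, min, quartiles, max in one call
```

`describe()` is a convenient first look at a numeric column before computing individual statistics.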

## Indexing, selection, and filtering

In [174]:
df["sex"]

0       female
1         male
2       female
3         male
4       female
         ...  
1304    female
1305    female
1306      male
1307      male
1308      male
Name: sex, Length: 1309, dtype: object

In [175]:
df[["sex","age"]]

| | sex | age |
|---|---|---|
| 0 | female | 29.0000 |
| 1 | male | 0.9167 |
| 2 | female | 2.0000 |
| 3 | male | 30.0000 |
| 4 | female | 25.0000 |
| ... | ... | ... |
| 1304 | female | 14.5000 |
| 1305 | female | NaN |
| 1306 | male | 26.5000 |
| 1307 | male | 27.0000 |


In [176]:
df[:5]

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived
0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0,0.0,0.0,24160,211.3375,B5,S,2.0,,"St Louis, MO",1
1,1.0,"Allison, Master. Hudson Trevor",male,0.9167,1.0,2.0,113781,151.55,C22 C26,S,11.0,,"Montreal, PQ / Chesterville, ON",1
2,1.0,"Allison, Miss. Helen Loraine",female,2.0,1.0,2.0,113781,151.55,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0
3,1.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1.0,2.0,113781,151.55,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON",0
4,1.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1.0,2.0,113781,151.55,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0


In [177]:
df[df["age"]>70]

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived
9,1.0,"Artagaveytia, Mr. Ramon",male,71.0,0.0,0.0,PC 17609,49.5042,,C,,22.0,"Montevideo, Uruguay",0
14,1.0,"Barkworth, Mr. Algernon Henry Wilson",male,80.0,0.0,0.0,27042,30.0,A23,S,B,,"Hessle, Yorks",1
61,1.0,"Cavendish, Mrs. Tyrell William (Julia Florence...",female,76.0,1.0,0.0,19877,78.85,C46,S,6,,"Little Onn Hall, Staffs",1
135,1.0,"Goldschmidt, Mr. George B",male,71.0,0.0,0.0,PC 17754,34.6542,A5,C,,,"New York, NY",0
727,3.0,"Connors, Mr. Patrick",male,70.5,0.0,0.0,370369,7.75,,Q,,171.0,,0
1235,3.0,"Svensson, Mr. Johan",male,74.0,0.0,0.0,347060,7.775,,S,,,,0


With ```loc``` and ```iloc``` you can select a subset of the rows and columns of a DataFrame.
- Use ```loc``` to select by axis labels
- Use ```iloc``` to select by integer position

In [178]:
df.loc[96,['name','age']]

name    Douglas, Mr. Walter Donald
age                           50.0
Name: 96, dtype: object

In [181]:
df.iloc[96,[1,3]]

name    Douglas, Mr. Walter Donald
age                           50.0
Name: 96, dtype: object
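
In the Titanic data the row labels happen to equal the row positions, so `loc` and `iloc` return the same row. The difference is easier to see when the labels are not 0, 1, 2, and so on. The frame `people` below is a made-up example.

```python
import pandas as pd

# Hypothetical frame whose row labels are not 0, 1, 2, ...
people = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cid"], "age": [31, 45, 27]},
    index=[10, 20, 30],
)

by_label = people.loc[20, "age"]    # row labelled 20, column "age"
by_position = people.iloc[1, 1]     # second row, second column
label_slice = people.loc[10:20]     # label slices include the end point
position_slice = people.iloc[0:1]   # position slices exclude it

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

Note that slicing with `loc` includes the end label, whereas slicing with `iloc` follows the usual Python convention and excludes the end position.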

You can also overwrite the values of elements selected by a condition.

In [182]:
df.loc[df["age"]<1, ['age']] = 1
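
The same pattern can be split into two steps, which makes it easy to check how many rows will change before assigning. This is a sketch on a made-up frame named `toy`.

```python
import pandas as pd

# Hypothetical data; "age" holds two values below 1
toy = pd.DataFrame({"name": ["A", "B", "C"], "age": [0.5, 30.0, 0.9]})

mask = toy["age"] < 1       # boolean Series, True where the condition holds
print(mask.sum())           # number of rows that will change

toy.loc[mask, "age"] = 1    # assign only to the selected rows
print(toy)
```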

````{tab-set}
```{tab-item} Exercise
Extract the rows for male passengers aged 30 to 50 and display the name, age, and sex columns.
```
````

## Sorting

pandas provides functionality for sorting data by a given criterion.

To sort by the row or column index, use the ```sort_index()``` method.

In [139]:
df.sort_index(ascending=False)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.7500,,Q
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C


In [140]:
df.sort_index(axis=1)

Unnamed: 0,Age,Cabin,Embarked,Fare,Name,Parch,PassengerId,Pclass,Sex,SibSp,Survived,Ticket
0,22.0,,S,7.2500,"Braund, Mr. Owen Harris",0,1,3,male,1,0,A/5 21171
1,38.0,C85,C,71.2833,"Cumings, Mrs. John Bradley (Florence Briggs Th...",0,2,1,female,1,1,PC 17599
2,26.0,,S,7.9250,"Heikkinen, Miss. Laina",0,3,3,female,0,1,STON/O2. 3101282
3,35.0,C123,S,53.1000,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",0,4,1,female,1,1,113803
4,35.0,,S,8.0500,"Allen, Mr. William Henry",0,5,3,male,0,0,373450
...,...,...,...,...,...,...,...,...,...,...,...,...
886,27.0,,S,13.0000,"Montvila, Rev. Juozas",0,887,2,male,0,0,211536
887,19.0,B42,S,30.0000,"Graham, Miss. Margaret Edith",0,888,1,female,0,1,112053
888,,,S,23.4500,"Johnston, Miss. Catherine Helen ""Carrie""",2,889,3,female,1,0,W./C. 6607
889,26.0,C148,C,30.0000,"Behr, Mr. Karl Howell",0,890,1,male,0,1,111369


To sort by values, use the ```sort_values()``` method.

By default, missing values are placed at the end.

In [183]:
df.sort_values(by="age")

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived
427,2.0,"Hamalainen, Master. Viljo",male,1.0,1.0,1.0,250649,14.5000,,S,4,,"Detroit, MI",1
895,3.0,"Johnson, Miss. Eleanor Ileen",female,1.0,1.0,1.0,347742,11.1333,,S,15,,,1
826,3.0,"Goodwin, Master. Sidney Leonard",male,1.0,5.0,2.0,CA 2144,46.9000,,S,,,"Wiltshire, England Niagara Falls, NY",0
1111,3.0,"Peacock, Master. Alfred Edward",male,1.0,1.0,1.0,SOTON/O.Q. 3101315,13.7750,,S,,,,0
478,2.0,"Laroche, Miss. Louise",female,1.0,1.0,2.0,SC/Paris 2123,41.5792,,C,14,,Paris / Haiti,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1293,3.0,"Williams, Mr. Howard Hugh 'Harry'",male,,0.0,0.0,A/5 2466,8.0500,,S,,,,0
1297,3.0,"Wiseman, Mr. Phillippe",male,,0.0,0.0,A/4. 34244,7.2500,,S,,,,0
1302,3.0,"Yousif, Mr. Wazli",male,,0.0,0.0,2647,7.2250,,C,,,,0
1303,3.0,"Yousseff, Mr. Gerious",male,,0.0,0.0,2627,14.4583,,C,,,,0
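
The placement of missing values and the sort direction can both be controlled through parameters of `sort_values()`. The sketch below uses a short made-up Series to keep the output readable.

```python
import pandas as pd

# Hypothetical ages with one missing value
ages = pd.Series([30.0, None, 2.0, 25.0], name="age")

descending = ages.sort_values(ascending=False)     # NaN still placed last
nan_first = ages.sort_values(na_position="first")  # NaN moved to the top

print(descending)
print(nan_first)
```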


You can also pass multiple sort keys to ```sort_values()```.

In [184]:
df.sort_values(by=["age","embarked"])

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived
478,2.0,"Laroche, Miss. Louise",female,1.0,1.0,2.0,SC/Paris 2123,41.5792,,C,14,,Paris / Haiti,1
492,2.0,"Mallet, Master. Andre",male,1.0,0.0,2.0,S.C./PARIS 2079,37.0042,,C,10,,"Paris / Montreal, PQ",1
657,3.0,"Baclini, Miss. Eugenie",female,1.0,2.0,1.0,2666,19.2583,,C,C,,"Syria New York, NY",1
658,3.0,"Baclini, Miss. Helene Barbara",female,1.0,2.0,1.0,2666,19.2583,,C,C,,"Syria New York, NY",1
1048,3.0,"Nakid, Miss. Maria ('Mary')",female,1.0,0.0,2.0,2653,15.7417,,C,C,,,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1284,3.0,"Webber, Mr. James",male,,0.0,0.0,SOTON/OQ 3101316,8.0500,,S,,,,0
1291,3.0,"Willer, Mr. Aaron ('Abi Weller')",male,,0.0,0.0,3410,8.7125,,S,,,,0
1292,3.0,"Willey, Mr. Edward",male,,0.0,0.0,S.O./P.P. 751,7.5500,,S,,,,0
1293,3.0,"Williams, Mr. Howard Hugh 'Harry'",male,,0.0,0.0,A/5 2466,8.0500,,S,,,,0
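
With several keys, `ascending` may also be a list that gives one direction per key. The following sketch sorts a made-up frame by age in descending order and then by port of embarkation in ascending order.

```python
import pandas as pd

# Hypothetical data with ties in "age"
toy = pd.DataFrame({
    "age": [1.0, 1.0, 30.0, 30.0],
    "embarked": ["S", "C", "C", "S"],
})

# age descending, then embarked ascending within equal ages
ordered = toy.sort_values(by=["age", "embarked"], ascending=[False, True])
print(ordered)
```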


## Mapping

- The ```map()``` method applies a dictionary or a function to each element of a Series and returns the resulting values.

In [185]:
# Dummy variable: 1 for male, 0 for female
male_map = {"male": 1,
            "female": 0}
df["male"] = df["sex"].map(male_map)
df

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived,male
0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0,0.0,0.0,24160,211.3375,B5,S,2,,"St Louis, MO",1,0
1,1.0,"Allison, Master. Hudson Trevor",male,1.0,1.0,2.0,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON",1,1
2,1.0,"Allison, Miss. Helen Loraine",female,2.0,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0,0
3,1.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1.0,2.0,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON",0,1
4,1.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3.0,"Zabour, Miss. Hileni",female,14.5,1.0,0.0,2665,14.4542,,C,,328.0,,0,0
1305,3.0,"Zabour, Miss. Thamine",female,,1.0,0.0,2665,14.4542,,C,,,,0,0
1306,3.0,"Zakarian, Mr. Mapriededer",male,26.5,0.0,0.0,2656,7.2250,,C,,304.0,,0,1
1307,3.0,"Zakarian, Mr. Ortin",male,27.0,0.0,0.0,2670,7.2250,,C,,,,0,1


In [186]:
def male_dummy(sex):
    # Return 1 for male and 0 for female; pass any other value (e.g. NaN) through unchanged
    if sex == "male":
        return 1
    elif sex == "female":
        return 0
    else:
        return sex

df["male"] = df["sex"].map(male_dummy)
df

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived,male
0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0,0.0,0.0,24160,211.3375,B5,S,2,,"St Louis, MO",1,0
1,1.0,"Allison, Master. Hudson Trevor",male,1.0,1.0,2.0,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON",1,1
2,1.0,"Allison, Miss. Helen Loraine",female,2.0,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0,0
3,1.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1.0,2.0,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON",0,1
4,1.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3.0,"Zabour, Miss. Hileni",female,14.5,1.0,0.0,2665,14.4542,,C,,328.0,,0,0
1305,3.0,"Zabour, Miss. Thamine",female,,1.0,0.0,2665,14.4542,,C,,,,0,0
1306,3.0,"Zakarian, Mr. Mapriededer",male,26.5,0.0,0.0,2656,7.2250,,C,,304.0,,0,1
1307,3.0,"Zakarian, Mr. Ortin",male,27.0,0.0,0.0,2670,7.2250,,C,,,,0,1
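
The dictionary form and the function form of `map()` differ in how they treat values without a match. With a dictionary, any value that is not a key becomes `NaN`; with a function, the function itself decides what to return. The sketch below uses a made-up Series containing an unexpected value `"unknown"`.

```python
import pandas as pd

# Hypothetical column with one unexpected value
sex = pd.Series(["male", "female", "unknown"])

# Dictionary form: values without a matching key become NaN
by_dict = sex.map({"male": 1, "female": 0})

# Function form: the function is called on every element
by_func = sex.map(len)

print(by_dict)
print(by_func)
```

Because of the `NaN`, the dictionary result is stored as floating-point numbers rather than integers.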


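- The ```apply()``` method applies a function to the columns of a DataFrame. Called on a single column (a Series), as below, it applies the function to each element.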

In [187]:
def male_dummy(sex):
    # Return 1 for male and 0 for female; pass any other value (e.g. NaN) through unchanged
    if sex == "male":
        return 1
    elif sex == "female":
        return 0
    else:
        return sex

df["male"] = df["sex"].apply(male_dummy)
df

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived,male
0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0,0.0,0.0,24160,211.3375,B5,S,2,,"St Louis, MO",1,0
1,1.0,"Allison, Master. Hudson Trevor",male,1.0,1.0,2.0,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON",1,1
2,1.0,"Allison, Miss. Helen Loraine",female,2.0,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0,0
3,1.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1.0,2.0,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON",0,1
4,1.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3.0,"Zabour, Miss. Hileni",female,14.5,1.0,0.0,2665,14.4542,,C,,328.0,,0,0
1305,3.0,"Zabour, Miss. Thamine",female,,1.0,0.0,2665,14.4542,,C,,,,0,0
1306,3.0,"Zakarian, Mr. Mapriededer",male,26.5,0.0,0.0,2656,7.2250,,C,,304.0,,0,1
1307,3.0,"Zakarian, Mr. Ortin",male,27.0,0.0,0.0,2670,7.2250,,C,,,,0,1
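
When `apply()` is called on a whole DataFrame, the function receives one column at a time by default, or one row at a time with `axis=1`. The sketch below shows both on a made-up frame; the derived quantity `family_size` is a hypothetical example and does not appear in the original data.

```python
import pandas as pd

# Hypothetical counts of siblings/spouses and parents/children
toy = pd.DataFrame({"sibsp": [1, 0, 2], "parch": [0, 0, 1]})

# Default axis=0: the function receives each column as a Series
column_max = toy.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series
family_size = toy.apply(lambda row: row["sibsp"] + row["parch"] + 1, axis=1)

print(column_max)
print(family_size)
```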


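````{tab-set}
```{tab-item} Exercise 1
Create the male dummy variable by applying a lambda function.
```

```{tab-item} Exercise 2
Create a new categorical variable from the first letter of the cabin variable. For example, C105 becomes C.
```

````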

## Handling missing values

When data contain missing values, the columns in which some rows are missing may hold ```NaN``` (Not a Number), ```None```, or ```NaT``` (Not a Time).

### Dropping missing values

In [188]:
# df[df.notnull()]
df.dropna()

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived,male
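
The empty result above suggests that every row has a missing value in at least one column, since `dropna()` by default removes any row containing a missing value. A quick way to check is to count missing values per column with `isna().sum()`. The sketch below uses a small made-up frame rather than the Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which every row has at least one missing value
toy = pd.DataFrame({
    "age": [29.0, np.nan, 2.0],
    "cabin": ["B5", "C22", None],
    "body": [np.nan, np.nan, 135.0],
})

missing_per_column = toy.isna().sum()   # count of missing values in each column
print(missing_per_column)

print(len(toy.dropna()))            # rows with no missing value at all
print(len(toy.dropna(how="all")))   # drop only rows that are entirely missing
print(len(toy.dropna(thresh=2)))    # keep rows with at least 2 non-missing values
```

The `how` and `thresh` parameters give finer control when dropping every incomplete row would discard too much data.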


To drop columns instead of rows, specify ```axis=1```.

In [189]:
df.dropna(axis=1)

Unnamed: 0,pclass,name,sex,sibsp,parch,ticket,survived,male
0,1.0,"Allen, Miss. Elisabeth Walton",female,0.0,0.0,24160,1,0
1,1.0,"Allison, Master. Hudson Trevor",male,1.0,2.0,113781,1,1
2,1.0,"Allison, Miss. Helen Loraine",female,1.0,2.0,113781,0,0
3,1.0,"Allison, Mr. Hudson Joshua Creighton",male,1.0,2.0,113781,0,1
4,1.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,1.0,2.0,113781,0,0
...,...,...,...,...,...,...,...,...
1304,3.0,"Zabour, Miss. Hileni",female,1.0,0.0,2665,0,0
1305,3.0,"Zabour, Miss. Thamine",female,1.0,0.0,2665,0,0
1306,3.0,"Zakarian, Mr. Mapriededer",male,0.0,0.0,2656,0,1
1307,3.0,"Zakarian, Mr. Ortin",male,0.0,0.0,2670,0,1


You can also drop only the rows that have missing values in specific columns.

In [191]:
df.dropna(subset=["age"])

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived,male
0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0,0.0,0.0,24160,211.3375,B5,S,2,,"St Louis, MO",1,0
1,1.0,"Allison, Master. Hudson Trevor",male,1.0,1.0,2.0,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON",1,1
2,1.0,"Allison, Miss. Helen Loraine",female,2.0,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0,0
3,1.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1.0,2.0,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON",0,1
4,1.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1301,3.0,"Youseff, Mr. Gerious",male,45.5,0.0,0.0,2628,7.2250,,C,,312.0,,0,1
1304,3.0,"Zabour, Miss. Hileni",female,14.5,1.0,0.0,2665,14.4542,,C,,328.0,,0,0
1306,3.0,"Zakarian, Mr. Mapriededer",male,26.5,0.0,0.0,2656,7.2250,,C,,304.0,,0,1
1307,3.0,"Zakarian, Mr. Ortin",male,27.0,0.0,0.0,2670,7.2250,,C,,,,0,1


### Filling missing values

One way to deal with missing values is to fill them with a specific value.

Calling the ```fillna``` method with a value as its argument replaces the missing values with that value.

In [192]:
df.fillna(0)

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived,male
0,1.0,"Allen, Miss. Elisabeth Walton",female,29.0,0.0,0.0,24160,211.3375,B5,S,2,0.0,"St Louis, MO",1,0
1,1.0,"Allison, Master. Hudson Trevor",male,1.0,1.0,2.0,113781,151.5500,C22 C26,S,11,0.0,"Montreal, PQ / Chesterville, ON",1,1
2,1.0,"Allison, Miss. Helen Loraine",female,2.0,1.0,2.0,113781,151.5500,C22 C26,S,0,0.0,"Montreal, PQ / Chesterville, ON",0,0
3,1.0,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1.0,2.0,113781,151.5500,C22 C26,S,0,135.0,"Montreal, PQ / Chesterville, ON",0,1
4,1.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1.0,2.0,113781,151.5500,C22 C26,S,0,0.0,"Montreal, PQ / Chesterville, ON",0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3.0,"Zabour, Miss. Hileni",female,14.5,1.0,0.0,2665,14.4542,0,C,0,328.0,0,0,0
1305,3.0,"Zabour, Miss. Thamine",female,0.0,1.0,0.0,2665,14.4542,0,C,0,0.0,0,0,0
1306,3.0,"Zakarian, Mr. Mapriededer",male,26.5,0.0,0.0,2656,7.2250,0,C,0,304.0,0,0,1
1307,3.0,"Zakarian, Mr. Ortin",male,27.0,0.0,0.0,2670,7.2250,0,C,0,0.0,0,0,1
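
Filling every column with `0` also puts a numeric zero into text columns such as `cabin`, as the output above shows. Passing a dictionary to `fillna` sets a separate fill value per column. In the sketch below, the frame `toy` and the placeholder `"Unknown"` are illustrative choices.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one text column
toy = pd.DataFrame({
    "age": [29.0, np.nan],
    "cabin": ["B5", None],
})

# Choose a fill value per column; "Unknown" is an illustrative placeholder
filled = toy.fillna({"age": 0, "cabin": "Unknown"})
print(filled)
```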


Filling missing values with the mean is a commonly used approach.

In [193]:
mean_age = df["age"].mean()  # First compute the mean used for imputation
df["age"] = df["age"].fillna(mean_age)  # Assign the result back to update the original column
df

Unnamed: 0,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,survived,male
0,1.0,"Allen, Miss. Elisabeth Walton",female,29.000000,0.0,0.0,24160,211.3375,B5,S,2,,"St Louis, MO",1,0
1,1.0,"Allison, Master. Hudson Trevor",male,1.000000,1.0,2.0,113781,151.5500,C22 C26,S,11,,"Montreal, PQ / Chesterville, ON",1,1
2,1.0,"Allison, Miss. Helen Loraine",female,2.000000,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0,0
3,1.0,"Allison, Mr. Hudson Joshua Creighton",male,30.000000,1.0,2.0,113781,151.5500,C22 C26,S,,135.0,"Montreal, PQ / Chesterville, ON",0,1
4,1.0,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.000000,1.0,2.0,113781,151.5500,C22 C26,S,,,"Montreal, PQ / Chesterville, ON",0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1304,3.0,"Zabour, Miss. Hileni",female,14.500000,1.0,0.0,2665,14.4542,,C,,328.0,,0,0
1305,3.0,"Zabour, Miss. Thamine",female,29.884799,1.0,0.0,2665,14.4542,,C,,,,0,0
1306,3.0,"Zakarian, Mr. Mapriededer",male,26.500000,0.0,0.0,2656,7.2250,,C,,304.0,,0,1
1307,3.0,"Zakarian, Mr. Ortin",male,27.000000,0.0,0.0,2670,7.2250,,C,,,,0,1


## Grouping data