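## STEP 1: Problem definition and goals

**Goal:**
- Analyze which factors affected survival, based on the Titanic passenger data.

**Main tasks:**
- Basic EDA (Exploratory Data Analysis)
- Data preprocessing (removing unnecessary data, adding and modifying columns)
- Finding insights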

## STEP 2: Import modules

In [2]:
from IPython.display import Image
import numpy as np
import pandas as pd
import seaborn as sns

## STEP 3: Load the dataset

In [3]:
df = sns.load_dataset('titanic')
df.head()

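|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |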


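**Column descriptions**

- survived: survival (1: survived, 0: died)
- pclass: passenger class (1st, 2nd, 3rd)
- sex: sex
- age: age
- sibsp: number of siblings + spouses aboard
- parch: number of parents + children aboard
- fare: ticket fare
- embarked: port of embarkation (S, C, Q)
- class: same information as pclass (First, Second, Third)
- who: man, woman, or child (like sex, but with children as a separate group)
- adult_male: whether the passenger is an adult male
- deck: deck letter (A to G; missing for most passengers)
- embark_town: name of the port of embarkation
- alive: survival (yes, no)
- alone: whether the passenger traveled alone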

## STEP 4: Basic data inspection

### Display the first 5 rows

In [4]:
df.head(5)




<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>survived</th>
      <th>pclass</th>
      <th>sex</th>
      <th>age</th>
      <th>sibsp</th>
      <th>parch</th>
      <th>fare</th>
      <th>embarked</th>
      <th>class</th>
      <th>who</th>
      <th>adult_male</th>
      <th>deck</th>
      <th>embark_town</th>
      <th>alive</th>
      <th>alone</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0</td>
      <td>3</td>
      <td>male</td>
      <td>22.0</td>
      <td>1</td>
      <td>0</td>
      <td>7.2500</td>
      <td>S</td>
      <td>Third</td>
      <td>man</td>
      <td>True</td>
      <td>NaN</td>
      <td>Southampton</td>
      <td>no</td>
      <td>False</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
      <td>1</td>
      <td>female</td>
      <td>38.0</td>
      <td>1</td>
      <td>0</td>
      <td>71.2833</td>
      <td>C</td>
      <td>First</td>
      <td>woman</td>
      <td>False</td>
      <td>C</td>
      <td>Cherbourg</td>
      <td>yes</td>
      <td>False</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>3</td>
      <td>female</td>
      <td>26.0</td>
      <td>0</td>
      <td>0</td>
      <td>7.9250</td>
      <td>S</td>
      <td>Third</td>
      <td>woman</td>
      <td>False</td>
      <td>NaN</td>
      <td>Southampton</td>
      <td>yes</td>
      <td>True</td>
    </tr>
    <tr>
      <th>3</th>
      <td>1</td>
      <td>1</td>
      <td>female</td>
      <td>35.0</td>
      <td>1</td>
      <td>0</td>
      <td>53.1000</td>
      <td>S</td>
      <td>First</td>
      <td>woman</td>
      <td>False</td>
      <td>C</td>
      <td>Southampton</td>
      <td>yes</td>
      <td>False</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0</td>
      <td>3</td>
      <td>male</td>
      <td>35.0</td>
      <td>0</td>
      <td>0</td>
      <td>8.0500</td>
      <td>S</td>
      <td>Third</td>
      <td>man</td>
      <td>True</td>
      <td>NaN</td>
      <td>Southampton</td>
      <td>no</td>
      <td>True</td>
    </tr>
  </tbody>
</table>
</div>

### Display the last 5 rows

In [5]:
# Enter your code here
df.tail(5)


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>survived</th>
      <th>pclass</th>
      <th>sex</th>
      <th>age</th>
      <th>sibsp</th>
      <th>parch</th>
      <th>fare</th>
      <th>embarked</th>
      <th>class</th>
      <th>who</th>
      <th>adult_male</th>
      <th>deck</th>
      <th>embark_town</th>
      <th>alive</th>
      <th>alone</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>886</th>
      <td>0</td>
      <td>2</td>
      <td>male</td>
      <td>27.0</td>
      <td>0</td>
      <td>0</td>
      <td>13.00</td>
      <td>S</td>
      <td>Second</td>
      <td>man</td>
      <td>True</td>
      <td>NaN</td>
      <td>Southampton</td>
      <td>no</td>
      <td>True</td>
    </tr>
    <tr>
      <th>887</th>
      <td>1</td>
      <td>1</td>
      <td>female</td>
      <td>19.0</td>
      <td>0</td>
      <td>0</td>
      <td>30.00</td>
      <td>S</td>
      <td>First</td>
      <td>woman</td>
      <td>False</td>
      <td>B</td>
      <td>Southampton</td>
      <td>yes</td>
      <td>True</td>
    </tr>
    <tr>
      <th>888</th>
      <td>0</td>
      <td>3</td>
      <td>female</td>
      <td>NaN</td>
      <td>1</td>
      <td>2</td>
      <td>23.45</td>
      <td>S</td>
      <td>Third</td>
      <td>woman</td>
      <td>False</td>
      <td>NaN</td>
      <td>Southampton</td>
      <td>no</td>
      <td>False</td>
    </tr>
    <tr>
      <th>889</th>
      <td>1</td>
      <td>1</td>
      <td>male</td>
      <td>26.0</td>
      <td>0</td>
      <td>0</td>
      <td>30.00</td>
      <td>C</td>
      <td>First</td>
      <td>man</td>
      <td>True</td>
      <td>C</td>
      <td>Cherbourg</td>
      <td>yes</td>
      <td>True</td>
    </tr>
    <tr>
      <th>890</th>
      <td>0</td>
      <td>3</td>
      <td>male</td>
      <td>32.0</td>
      <td>0</td>
      <td>0</td>
      <td>7.75</td>
      <td>Q</td>
      <td>Third</td>
      <td>man</td>
      <td>True</td>
      <td>NaN</td>
      <td>Queenstown</td>
      <td>no</td>
      <td>True</td>
    </tr>
  </tbody>
</table>
</div>

### Check how many rows and columns the data has

In [6]:
# Enter your code here
df.shape


<p><strong>[Output]</strong></p><pre>(891, 15)</pre>

### Check the dtype and non-null count of each column

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dtype   
---  ------       --------------  -----   
 0   survived     891 non-null    int64   
 1   pclass       891 non-null    int64   
 2   sex          891 non-null    object  
 3   age          714 non-null    float64 
 4   sibsp        891 non-null    int64   
 5   parch        891 non-null    int64   
 6   fare         891 non-null    float64 
 7   embarked     889 non-null    object  
 8   class        891 non-null    category
 9   who          891 non-null    object  
 10  adult_male   891 non-null    bool    
 11  deck         203 non-null    category
 12  embark_town  889 non-null    object  
 13  alive        891 non-null    object  
 14  alone        891 non-null    bool    
dtypes: bool(2), category(2), float64(2), int64(4), object(5)
memory usage: 80.6+ KB
</pre>

### Check the number of missing values in each column

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64</pre>
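
Below is a minimal sketch of the pandas calls that typically produce the three outputs above: `df.shape`, `df.info()`, and `df.isnull().sum()`. It runs on a tiny hypothetical frame rather than the real titanic data, so it works offline.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the titanic data (not the real rows)
df = pd.DataFrame({
    "survived": [0, 1, 1, 0],
    "age": [22.0, np.nan, 26.0, np.nan],
    "embarked": ["S", "C", None, "S"],
})

print(df.shape)              # (number of rows, number of columns)
df.info()                    # dtype and non-null count per column
missing = df.isnull().sum()  # count of NaN/None values per column
print(missing)
```

With the real `df` from STEP 3, the same three calls should apply unchanged.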

### Check the distribution of survivors and non-survivors

0: died, 1: survived

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>0    549
1    342
Name: survived, dtype: int64</pre>
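
`value_counts()` counts how often each value occurs in a column, with the most frequent value first. Here is a minimal sketch on made-up survival flags.

```python
import pandas as pd

# Made-up survival flags (0 = died, 1 = survived), not the real data
df = pd.DataFrame({"survived": [0, 1, 0, 0, 1]})

counts = df["survived"].value_counts()  # sorted by frequency, most common first
print(counts)
```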

## STEP 5: Exploratory data analysis (EDA)

### Survivors by port of embarkation

[Hint]: Use `groupby()`.

#### Sum

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>embarked
C     93
Q     30
S    217
Name: survived, dtype: int64</pre>

#### Survival rate

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>embarked
C    0.553571
Q    0.389610
S    0.336957
Name: survived, dtype: float64</pre>

#### Sum and survival rate together

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>sum</th>
      <th>mean</th>
    </tr>
    <tr>
      <th>embarked</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>C</th>
      <td>93</td>
      <td>0.553571</td>
    </tr>
    <tr>
      <th>Q</th>
      <td>30</td>
      <td>0.389610</td>
    </tr>
    <tr>
      <th>S</th>
      <td>217</td>
      <td>0.336957</td>
    </tr>
  </tbody>
</table>
</div>
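
One way to get the sum, the rate, or both is `groupby()` followed by `sum()`, `mean()`, or `agg(["sum", "mean"])`. Because `survived` holds 0/1, its sum is the number of survivors and its mean is the survival rate. This is a sketch on hypothetical rows.

```python
import pandas as pd

# Hypothetical rows; the real df has 891 passengers
df = pd.DataFrame({
    "embarked": ["S", "C", "S", "Q", "C", "S"],
    "survived": [0, 1, 1, 0, 1, 0],
})

grouped = df.groupby("embarked")["survived"]
survivors = grouped.sum()            # number of survivors per port
rate = grouped.mean()                # survival rate per port
both = grouped.agg(["sum", "mean"])  # both statistics as two columns
print(both)
```

Replacing `"embarked"` with `"sex"`, `"alone"`, or `"pclass"` gives the next three exercises.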

### Sum and survival rate by sex

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>sum</th>
      <th>mean</th>
    </tr>
    <tr>
      <th>sex</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>female</th>
      <td>233</td>
      <td>0.742038</td>
    </tr>
    <tr>
      <th>male</th>
      <td>109</td>
      <td>0.188908</td>
    </tr>
  </tbody>
</table>
</div>

### Sum and survival rate for passengers traveling alone vs. with others

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>sum</th>
      <th>mean</th>
    </tr>
    <tr>
      <th>alone</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>False</th>
      <td>179</td>
      <td>0.505650</td>
    </tr>
    <tr>
      <th>True</th>
      <td>163</td>
      <td>0.303538</td>
    </tr>
  </tbody>
</table>
</div>

### Survivor sum and survival rate by class (pclass)

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>sum</th>
      <th>mean</th>
    </tr>
    <tr>
      <th>pclass</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>1</th>
      <td>136</td>
      <td>0.629630</td>
    </tr>
    <tr>
      <th>2</th>
      <td>87</td>
      <td>0.472826</td>
    </tr>
    <tr>
      <th>3</th>
      <td>119</td>
      <td>0.242363</td>
    </tr>
  </tbody>
</table>
</div>

### Survivor sum and survival rate by sex and class

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>sum</th>
      <th>mean</th>
    </tr>
    <tr>
      <th>sex</th>
      <th>pclass</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="3" valign="top">female</th>
      <th>1</th>
      <td>91</td>
      <td>0.968085</td>
    </tr>
    <tr>
      <th>2</th>
      <td>70</td>
      <td>0.921053</td>
    </tr>
    <tr>
      <th>3</th>
      <td>72</td>
      <td>0.500000</td>
    </tr>
    <tr>
      <th rowspan="3" valign="top">male</th>
      <th>1</th>
      <td>45</td>
      <td>0.368852</td>
    </tr>
    <tr>
      <th>2</th>
      <td>17</td>
      <td>0.157407</td>
    </tr>
    <tr>
      <th>3</th>
      <td>47</td>
      <td>0.135447</td>
    </tr>
  </tbody>
</table>
</div>
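
Passing a list of columns to `groupby()` produces a result indexed by both keys (a MultiIndex), like the table above. This is a minimal sketch on hypothetical rows.

```python
import pandas as pd

# Hypothetical rows, not the real passenger data
df = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "pclass": [1, 3, 1, 3, 3, 1],
    "survived": [1, 0, 1, 0, 0, 1],
})

# Two grouping keys -> two index levels (sex, pclass)
result = df.groupby(["sex", "pclass"])["survived"].agg(["sum", "mean"])
print(result)
```

The same pattern with `["alone", "sex"]` or `["who", "pclass"]` gives the later tables in this step.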

### Show the survivor sum and survival rate by sex and pclass

- Use `pivot_table()`.

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead tr th {
        text-align: left;
    }

    .dataframe thead tr:last-of-type th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr>
      <th></th>
      <th colspan="3" halign="left">sum</th>
      <th colspan="3" halign="left">mean</th>
    </tr>
    <tr>
      <th>pclass</th>
      <th>1</th>
      <th>2</th>
      <th>3</th>
      <th>1</th>
      <th>2</th>
      <th>3</th>
    </tr>
    <tr>
      <th>sex</th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>female</th>
      <td>91</td>
      <td>70</td>
      <td>72</td>
      <td>0.968085</td>
      <td>0.921053</td>
      <td>0.500000</td>
    </tr>
    <tr>
      <th>male</th>
      <td>45</td>
      <td>17</td>
      <td>47</td>
      <td>0.368852</td>
      <td>0.157407</td>
      <td>0.135447</td>
    </tr>
  </tbody>
</table>
</div>
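
In `pivot_table()`, `index` sets the rows and `columns` spreads `pclass` across the columns. Passing a list to `aggfunc` adds one column block per function, here `sum` and `mean`. This is a minimal sketch on hypothetical rows.

```python
import pandas as pd

# Hypothetical rows, not the real passenger data
df = pd.DataFrame({
    "sex": ["female", "female", "male", "male", "male", "female"],
    "pclass": [1, 3, 1, 3, 3, 1],
    "survived": [1, 0, 1, 0, 0, 1],
})

table = pd.pivot_table(
    df,
    index="sex",              # one row per sex
    columns="pclass",         # one column per class
    values="survived",
    aggfunc=["sum", "mean"],  # top column level: sum, mean
)
print(table)
```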

### Sum and survival rate by alone status and sex

- Use `groupby()`.

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>sum</th>
      <th>mean</th>
    </tr>
    <tr>
      <th>alone</th>
      <th>sex</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="2" valign="top">False</th>
      <th>female</th>
      <td>134</td>
      <td>0.712766</td>
    </tr>
    <tr>
      <th>male</th>
      <td>45</td>
      <td>0.271084</td>
    </tr>
    <tr>
      <th rowspan="2" valign="top">True</th>
      <th>female</th>
      <td>99</td>
      <td>0.785714</td>
    </tr>
    <tr>
      <th>male</th>
      <td>64</td>
      <td>0.155718</td>
    </tr>
  </tbody>
</table>
</div>

### Survivor sum and survival rate by who and class

- Use `groupby()`.

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>sum</th>
      <th>mean</th>
    </tr>
    <tr>
      <th>who</th>
      <th>pclass</th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="3" valign="top">child</th>
      <th>1</th>
      <td>5</td>
      <td>0.833333</td>
    </tr>
    <tr>
      <th>2</th>
      <td>19</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>3</th>
      <td>25</td>
      <td>0.431034</td>
    </tr>
    <tr>
      <th rowspan="3" valign="top">man</th>
      <th>1</th>
      <td>42</td>
      <td>0.352941</td>
    </tr>
    <tr>
      <th>2</th>
      <td>8</td>
      <td>0.080808</td>
    </tr>
    <tr>
      <th>3</th>
      <td>38</td>
      <td>0.119122</td>
    </tr>
    <tr>
      <th rowspan="3" valign="top">woman</th>
      <th>1</th>
      <td>89</td>
      <td>0.978022</td>
    </tr>
    <tr>
      <th>2</th>
      <td>60</td>
      <td>0.909091</td>
    </tr>
    <tr>
      <th>3</th>
      <td>56</td>
      <td>0.491228</td>
    </tr>
  </tbody>
</table>
</div>

### Do the following

Based on the result above:
1. Store it in a separate DataFrame
2. Reset the index with `reset_index()`
3. Sort by survival rate in descending order

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>who</th>
      <th>pclass</th>
      <th>sum</th>
      <th>mean</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>child</td>
      <td>2</td>
      <td>19</td>
      <td>1.000000</td>
    </tr>
    <tr>
      <th>1</th>
      <td>woman</td>
      <td>1</td>
      <td>89</td>
      <td>0.978022</td>
    </tr>
    <tr>
      <th>2</th>
      <td>woman</td>
      <td>2</td>
      <td>60</td>
      <td>0.909091</td>
    </tr>
    <tr>
      <th>3</th>
      <td>child</td>
      <td>1</td>
      <td>5</td>
      <td>0.833333</td>
    </tr>
    <tr>
      <th>4</th>
      <td>woman</td>
      <td>3</td>
      <td>56</td>
      <td>0.491228</td>
    </tr>
    <tr>
      <th>5</th>
      <td>child</td>
      <td>3</td>
      <td>25</td>
      <td>0.431034</td>
    </tr>
    <tr>
      <th>6</th>
      <td>man</td>
      <td>1</td>
      <td>42</td>
      <td>0.352941</td>
    </tr>
    <tr>
      <th>7</th>
      <td>man</td>
      <td>3</td>
      <td>38</td>
      <td>0.119122</td>
    </tr>
    <tr>
      <th>8</th>
      <td>man</td>
      <td>2</td>
      <td>8</td>
      <td>0.080808</td>
    </tr>
  </tbody>
</table>
</div>
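
Here is a sketch of steps 1 to 3 on hypothetical rows. It passes `ignore_index=True` to `sort_values()` (available since pandas 1.0), so the sorted rows are renumbered from 0 as in the output above. Calling `reset_index(drop=True)` after sorting would have the same effect.

```python
import pandas as pd

# Hypothetical rows, not the real passenger data
df = pd.DataFrame({
    "who": ["child", "man", "woman", "man", "woman", "child", "woman"],
    "pclass": [2, 1, 1, 3, 3, 3, 3],
    "survived": [1, 1, 1, 0, 1, 0, 0],
})

# 1. Keep the grouped result as its own DataFrame
summary = df.groupby(["who", "pclass"])["survived"].agg(["sum", "mean"])
# 2. Move the who/pclass index levels back into ordinary columns
summary = summary.reset_index()
# 3. Sort by survival rate, highest first, renumbering rows from 0
summary = summary.sort_values("mean", ascending=False, ignore_index=True)
print(summary)
```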

### Check the age range that defines `child`

Hint: use `.loc` and `agg`.

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>min     0.42
max    15.00
Name: age, dtype: float64</pre>
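
Following the hint, `.loc` with a boolean mask keeps only the `age` values of `child` rows, and `agg(["min", "max"])` returns both bounds. This is a sketch on made-up ages.

```python
import pandas as pd

# Made-up rows, not the real passenger data
df = pd.DataFrame({
    "who": ["child", "man", "child", "woman"],
    "age": [0.42, 40.0, 15.0, 30.0],
})

# Rows where who == "child", column "age"; then min and max of those ages
child_age_range = df.loc[df["who"] == "child", "age"].agg(["min", "max"])
print(child_age_range)
```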

### Compare the mean fare by class (pclass) and age group (who)

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>fare</th>
    </tr>
    <tr>
      <th>pclass</th>
      <th>who</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="3" valign="top">1</th>
      <th>child</th>
      <td>139.382633</td>
    </tr>
    <tr>
      <th>man</th>
      <td>65.951086</td>
    </tr>
    <tr>
      <th>woman</th>
      <td>104.317995</td>
    </tr>
    <tr>
      <th rowspan="3" valign="top">2</th>
      <th>child</th>
      <td>28.323905</td>
    </tr>
    <tr>
      <th>man</th>
      <td>19.054124</td>
    </tr>
    <tr>
      <th>woman</th>
      <td>20.868624</td>
    </tr>
    <tr>
      <th rowspan="3" valign="top">3</th>
      <th>child</th>
      <td>23.220190</td>
    </tr>
    <tr>
      <th>man</th>
      <td>11.340213</td>
    </tr>
    <tr>
      <th>woman</th>
      <td>15.354351</td>
    </tr>
  </tbody>
</table>
</div>

### Did the rich survive? (Survival rate of the top 10% by fare)

- [Note] A **rich** passenger is defined as one who paid a `fare` in the top 10%.

Find the fare threshold for the top 10%

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>77.9583</pre>

Check the number of rich passengers and their survival rate

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>count    90.000000
mean      0.766667
Name: survived, dtype: float64</pre>
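
This sketch runs both steps on hypothetical fares. `quantile(0.9)` gives the cutoff for the top 10%, and a boolean mask selects the passengers at or above it. It uses `>=` to follow the "top 10% or above" definition, which is an assumption about how the threshold is applied.

```python
import pandas as pd

# Hypothetical fares and outcomes, not the real passenger data
df = pd.DataFrame({
    "fare": [7.25, 8.05, 13.0, 30.0, 71.28, 512.33, 26.55, 7.92, 53.1, 263.0],
    "survived": [0, 0, 0, 1, 1, 1, 0, 1, 1, 0],
})

threshold = df["fare"].quantile(0.9)  # 90th percentile = top-10% cutoff
rich = df.loc[df["fare"] >= threshold, "survived"]
stats = rich.agg(["count", "mean"])   # how many rich passengers, and their survival rate
print(threshold)
print(stats)
```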

### Compare the mean age of survivors and non-survivors

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>survived
0    30.626179
1    28.343690
Name: age, dtype: float64</pre>
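
Grouping by `survived` and averaging `age` gives one mean per outcome. `mean()` skips missing ages by default. This is a sketch on hypothetical rows.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, not the real passenger data
df = pd.DataFrame({
    "survived": [0, 1, 0, 1, 1],
    "age": [40.0, 20.0, np.nan, 30.0, 10.0],
})

mean_age = df.groupby("survived")["age"].mean()  # NaN ages are ignored
print(mean_age)
```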

### Compare survival rates when `deck` is NaN vs. filled in

Survival rate when `deck` is missing

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>0.29941860465116277</pre>

Survival rate when `deck` is not missing

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>0.6699507389162561</pre>

## STEP 6: Preprocessing

### Handling missing values

a) Check for missing values

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64</pre>

b) Fill the missing values in the `embarked` column with the **mode** (most frequent value)

In [None]:
# Enter your code here


In [None]:
# Validation code
# This cell should run without errors
assert 0 == df['embarked'].isnull().sum()
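
`mode()` returns a Series because there can be ties, so this sketch takes its first element before passing it to `fillna()`. It assigns the result back to the column instead of using `inplace=True`, which can trigger chained-assignment warnings in recent pandas versions. The values are hypothetical.

```python
import pandas as pd

# Hypothetical port values with two missing entries
df = pd.DataFrame({"embarked": ["S", "C", None, "S", "Q", None]})

most_common = df["embarked"].mode()[0]  # mode() returns a Series; take the first value
df["embarked"] = df["embarked"].fillna(most_common)

assert 0 == df["embarked"].isnull().sum()  # same check as the validation cell
```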

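c) Fill the missing values in the `age` column:
- Male passengers: fill with the mean age of male passengers.
- Female passengers: fill with the mean age of female passengers.

In [None]:
# Enter your code here


In [None]:
# Validation code
# This cell should run without errors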
assert df.groupby('sex')['age'].mean()['female'].round(4) == 27.9157
assert df.groupby('sex')['age'].mean()['male'].round(4) == 30.7266
assert df['age'].isnull().sum() == 0
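
One way to fill each missing age with the mean for that passenger's sex is `groupby(...).transform("mean")`, which returns a Series aligned with the original rows. Each missing value is replaced by its own group's mean, so every group's mean stays unchanged after filling. That is why the validation cell can compare the means against fixed values. This is a sketch on hypothetical rows.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one male and one female age are missing
df = pd.DataFrame({
    "sex": ["male", "female", "male", "female", "male"],
    "age": [20.0, 24.0, np.nan, np.nan, 40.0],
})

# Per-row mean age of that row's sex group (NaN ages are ignored in the mean)
group_mean = df.groupby("sex")["age"].transform("mean")
df["age"] = df["age"].fillna(group_mean)
print(df)
```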

d) Fill the missing values in the `deck` column with **No Data**.

Keep in mind that `deck` has a `CategoricalDtype`.

In [None]:
df['deck'].dtype

First **add** `No Data` to the categories

In [None]:
# Enter your code here


Fill the missing values with `No Data`

In [None]:
# Enter your code here


In [None]:
# Validation code
df['deck'].value_counts()

<p><strong>[Output]</strong></p><pre>No Data    688
C           59
B           47
D           33
E           32
A           15
F           13
G            4
Name: deck, dtype: int64</pre>
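
Because `deck` is categorical, pandas raises an error on `fillna("No Data")` until `No Data` has been registered with `.cat.add_categories()`. This is a sketch on a hypothetical categorical column.

```python
import pandas as pd

# Hypothetical deck values; None becomes NaN in a Categorical
df = pd.DataFrame({"deck": pd.Categorical(["C", None, "B", None])})

df["deck"] = df["deck"].cat.add_categories("No Data")  # register the new category first
df["deck"] = df["deck"].fillna("No Data")              # now the fill is allowed
counts = df["deck"].value_counts()
print(counts)
```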

### Remove redundant columns

Columns to remove:
- `class`, `embark_town`, `alive`

In [None]:
df.head()

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>survived</th>
      <th>pclass</th>
      <th>sex</th>
      <th>age</th>
      <th>sibsp</th>
      <th>parch</th>
      <th>fare</th>
      <th>embarked</th>
      <th>who</th>
      <th>adult_male</th>
      <th>deck</th>
      <th>alone</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0</td>
      <td>3</td>
      <td>male</td>
      <td>22.0</td>
      <td>1</td>
      <td>0</td>
      <td>7.2500</td>
      <td>S</td>
      <td>man</td>
      <td>True</td>
      <td>No Data</td>
      <td>False</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
      <td>1</td>
      <td>female</td>
      <td>38.0</td>
      <td>1</td>
      <td>0</td>
      <td>71.2833</td>
      <td>C</td>
      <td>woman</td>
      <td>False</td>
      <td>C</td>
      <td>False</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>3</td>
      <td>female</td>
      <td>26.0</td>
      <td>0</td>
      <td>0</td>
      <td>7.9250</td>
      <td>S</td>
      <td>woman</td>
      <td>False</td>
      <td>No Data</td>
      <td>True</td>
    </tr>
    <tr>
      <th>3</th>
      <td>1</td>
      <td>1</td>
      <td>female</td>
      <td>35.0</td>
      <td>1</td>
      <td>0</td>
      <td>53.1000</td>
      <td>S</td>
      <td>woman</td>
      <td>False</td>
      <td>C</td>
      <td>False</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0</td>
      <td>3</td>
      <td>male</td>
      <td>35.0</td>
      <td>0</td>
      <td>0</td>
      <td>8.0500</td>
      <td>S</td>
      <td>man</td>
      <td>True</td>
      <td>No Data</td>
      <td>True</td>
    </tr>
  </tbody>
</table>
</div>
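
`drop(columns=[...])` returns a new DataFrame without the listed columns, and this sketch reassigns it to `df`. The two-row frame is hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame with the redundant columns present
df = pd.DataFrame({
    "survived": [0, 1],
    "pclass": [3, 1],
    "class": ["Third", "First"],
    "embarked": ["S", "C"],
    "embark_town": ["Southampton", "Cherbourg"],
    "alive": ["no", "yes"],
})

# class duplicates pclass, embark_town duplicates embarked, alive duplicates survived
df = df.drop(columns=["class", "embark_town", "alive"])
print(df.columns.tolist())
```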

### Feature engineering

a) The number of family members aboard is `sibsp` + `parch`. Create a `family` column and **fill it with the sum of sibsp and parch**.

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>survived</th>
      <th>pclass</th>
      <th>sex</th>
      <th>age</th>
      <th>sibsp</th>
      <th>parch</th>
      <th>fare</th>
      <th>embarked</th>
      <th>who</th>
      <th>adult_male</th>
      <th>deck</th>
      <th>alone</th>
      <th>family</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0</td>
      <td>3</td>
      <td>male</td>
      <td>22.0</td>
      <td>1</td>
      <td>0</td>
      <td>7.2500</td>
      <td>S</td>
      <td>man</td>
      <td>True</td>
      <td>No Data</td>
      <td>False</td>
      <td>1</td>
    </tr>
    <tr>
      <th>1</th>
      <td>1</td>
      <td>1</td>
      <td>female</td>
      <td>38.0</td>
      <td>1</td>
      <td>0</td>
      <td>71.2833</td>
      <td>C</td>
      <td>woman</td>
      <td>False</td>
      <td>C</td>
      <td>False</td>
      <td>1</td>
    </tr>
    <tr>
      <th>2</th>
      <td>1</td>
      <td>3</td>
      <td>female</td>
      <td>26.0</td>
      <td>0</td>
      <td>0</td>
      <td>7.9250</td>
      <td>S</td>
      <td>woman</td>
      <td>False</td>
      <td>No Data</td>
      <td>True</td>
      <td>0</td>
    </tr>
    <tr>
      <th>3</th>
      <td>1</td>
      <td>1</td>
      <td>female</td>
      <td>35.0</td>
      <td>1</td>
      <td>0</td>
      <td>53.1000</td>
      <td>S</td>
      <td>woman</td>
      <td>False</td>
      <td>C</td>
      <td>False</td>
      <td>1</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0</td>
      <td>3</td>
      <td>male</td>
      <td>35.0</td>
      <td>0</td>
      <td>0</td>
      <td>8.0500</td>
      <td>S</td>
      <td>man</td>
      <td>True</td>
      <td>No Data</td>
      <td>True</td>
      <td>0</td>
    </tr>
  </tbody>
</table>
</div>
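
Arithmetic between pandas columns is element-wise, so adding the two Series row by row produces the new column. The values below are the `sibsp` and `parch` of the first five rows shown earlier.

```python
import pandas as pd

# sibsp / parch of rows 0-4 from the head() output above
df = pd.DataFrame({"sibsp": [1, 1, 0, 1, 0], "parch": [0, 0, 0, 0, 0]})

df["family"] = df["sibsp"] + df["parch"]  # siblings/spouses + parents/children
print(df)
```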

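Do the following:

- Check the survival rate by sex (`sex`) and family size (`family`)
- After `reset_index()`, sort by survival rate in descending order

b) Survival rate by sex and family size

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>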
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th>survived</th>
    </tr>
    <tr>
      <th>sex</th>
      <th>family</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th rowspan="9" valign="top">female</th>
      <th>0</th>
      <td>0.785714</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0.816092</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0.775510</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.842105</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.250000</td>
    </tr>
    <tr>
      <th>5</th>
      <td>0.375000</td>
    </tr>
    <tr>
      <th>6</th>
      <td>0.375000</td>
    </tr>
    <tr>
      <th>7</th>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>10</th>
      <td>0.000000</td>
    </tr>
    <tr>
      <th rowspan="9" valign="top">male</th>
      <th>0</th>
      <td>0.155718</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0.243243</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0.396226</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.500000</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>5</th>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>6</th>
      <td>0.250000</td>
    </tr>
    <tr>
      <th>7</th>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>10</th>
      <td>0.000000</td>
    </tr>
  </tbody>
</table>
</div>

c) Show the top 5 by survival rate

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>sex</th>
      <th>family</th>
      <th>survived</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>female</td>
      <td>3</td>
      <td>0.842105</td>
    </tr>
    <tr>
      <th>1</th>
      <td>female</td>
      <td>1</td>
      <td>0.816092</td>
    </tr>
    <tr>
      <th>2</th>
      <td>female</td>
      <td>0</td>
      <td>0.785714</td>
    </tr>
    <tr>
      <th>3</th>
      <td>female</td>
      <td>2</td>
      <td>0.775510</td>
    </tr>
    <tr>
      <th>4</th>
      <td>male</td>
      <td>3</td>
      <td>0.500000</td>
    </tr>
  </tbody>
</table>
</div>

d) Show the bottom 10 by survival rate

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>sex</th>
      <th>family</th>
      <th>survived</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>female</td>
      <td>4</td>
      <td>0.250000</td>
    </tr>
    <tr>
      <th>1</th>
      <td>male</td>
      <td>6</td>
      <td>0.250000</td>
    </tr>
    <tr>
      <th>2</th>
      <td>male</td>
      <td>1</td>
      <td>0.243243</td>
    </tr>
    <tr>
      <th>3</th>
      <td>male</td>
      <td>0</td>
      <td>0.155718</td>
    </tr>
    <tr>
      <th>4</th>
      <td>female</td>
      <td>10</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>5</th>
      <td>male</td>
      <td>4</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>6</th>
      <td>male</td>
      <td>5</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>7</th>
      <td>female</td>
      <td>7</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>8</th>
      <td>male</td>
      <td>7</td>
      <td>0.000000</td>
    </tr>
    <tr>
      <th>9</th>
      <td>male</td>
      <td>10</td>
      <td>0.000000</td>
    </tr>
  </tbody>
</table>
</div>
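
Here is a sketch of b) to d) on hypothetical rows:
1. Group by `sex` and `family` and take the mean of `survived`.
2. Flatten the result with `reset_index()`.
3. Sort it in descending order.
4. Slice with `head()` / `tail()`.

The toy data has only four groups, so it takes 2 rows from each end; with the real data you would use `head(5)` and `tail(10)`. The bottom slice is renumbered with `reset_index(drop=True)`, matching the 0 to 9 index in output d).

```python
import pandas as pd

# Hypothetical rows, not the real passenger data
df = pd.DataFrame({
    "sex": ["female", "female", "female", "male", "male", "male", "male"],
    "family": [0, 1, 1, 0, 0, 0, 2],
    "survived": [1, 1, 0, 1, 0, 0, 0],
})

rates = df.groupby(["sex", "family"])["survived"].mean()
ranked = rates.reset_index().sort_values("survived", ascending=False, ignore_index=True)

top = ranked.head(2)                          # highest survival rates
bottom = ranked.tail(2).reset_index(drop=True)  # lowest, renumbered from 0
print(top)
print(bottom)
```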

e) Using `apply`, create a new `gender` column that holds 1 for male and 0 for female

In [None]:
# Enter your code here


In [None]:
# Validation code
df['gender'].value_counts()

<p><strong>[Output]</strong></p><pre>1    577
0    314
Name: gender, dtype: int64</pre>
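
`Series.apply()` calls a function on each value and collects the results. The helper `encode_sex` below is a hypothetical name chosen for this sketch, and the rows are made up.

```python
import pandas as pd

# Made-up rows, not the real passenger data
df = pd.DataFrame({"sex": ["male", "female", "female", "male"]})

def encode_sex(value):
    """Hypothetical helper: 'male' -> 1, anything else ('female') -> 0."""
    return 1 if value == "male" else 0

df["gender"] = df["sex"].apply(encode_sex)
print(df)
```

`df["sex"].map({"male": 1, "female": 0})` is a common alternative that needs no helper function.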

f) Split the fare into 5 bins and store them in a new `fare_bin` column (use `pd.qcut()` so that each bin holds roughly the same number of passengers)

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>(7.854, 10.5]        184
(21.679, 39.688]     180
(-0.001, 7.854]      179
(39.688, 512.329]    176
(10.5, 21.679]       172
Name: fare_bin, dtype: int64</pre>

g) Split age into 10 bins and store them in a new `age_bin` column (use `pd.cut()` so that all bins have the same width)

In [None]:
# Enter your code here


<p><strong>[Output]</strong></p><pre>(24.294, 32.252]    346
(16.336, 24.294]    177
(32.252, 40.21]     118
(40.21, 48.168]      70
(0.34, 8.378]        54
(8.378, 16.336]      46
(48.168, 56.126]     45
(56.126, 64.084]     24
(64.084, 72.042]      9
(72.042, 80.0]        2
Name: age_bin, dtype: int64</pre>
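
`pd.qcut()` places bin edges at quantiles, so each bin gets about the same number of rows. `pd.cut()` with an integer splits the value range into that many equal-width bins, so the counts per bin differ. Both return a categorical Series of intervals. This is a sketch on hypothetical fares and ages.

```python
import pandas as pd

# Hypothetical fares and ages, not the real passenger data
df = pd.DataFrame({
    "fare": [7.25, 71.28, 7.93, 53.1, 8.05, 8.46, 51.86, 21.08, 11.13, 30.07],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0, 30.0, 54.0, 2.0, 27.0, 14.0],
})

df["fare_bin"] = pd.qcut(df["fare"], 5)  # 5 quantile bins: equal counts
df["age_bin"] = pd.cut(df["age"], 10)    # 10 bins of equal width over the age range
print(df["fare_bin"].value_counts())
print(df["age_bin"].value_counts())
```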