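## Simplifying conditions by handling missing values
- Let's practice cleaning up the data before the conditions get complicated.
- The `isin`, `isnull`, and `notnull` methods are available on both Series and DataFrame.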

**isin**
- https://pandas.pydata.org/docs/reference/api/pandas.Series.isin.html
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isin.html

**isnull**
- https://pandas.pydata.org/docs/reference/api/pandas.Series.isnull.html
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isnull.html

**notnull**
- https://pandas.pydata.org/docs/reference/api/pandas.Series.notnull.html
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.notnull.html
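
As a quick illustration of the Series and DataFrame versions side by side, here is a minimal sketch. The `toy` frame and its values are hypothetical stand-ins, not rows from the Titanic file used below.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few passenger rows
toy = pd.DataFrame(
    {"pclass": [1, 2, 3], "age": [29.0, np.nan, 26.5]},
    index=["A", "B", "C"],
)

# Series versions: one boolean per row
series_isin = toy["pclass"].isin([1, 2])
series_null = toy["age"].isnull()

# DataFrame versions: one boolean per cell, same shape as the frame
frame_isin = toy.isin([1, 2])
frame_null = toy.isnull()
frame_notnull = toy.notnull()
```

The Series calls return a boolean Series, while the DataFrame calls return a boolean DataFrame with the same index and columns as the input.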

In [1]:
# Import pandas
import pandas as pd

In [3]:
# Create the DataFrame, using the passenger name as the index
cols = ['name', 'survived', 'pclass', 'fare', 'sex', 'age']
tt = pd.read_excel('titanic3.xls', usecols=cols, index_col='name')
tt

name,pclass,survived,sex,age,fare
"Allen, Miss. Elisabeth Walton",1,1,female,29.0000,211.3375
"Allison, Master. Hudson Trevor",1,1,male,0.9167,151.5500
"Allison, Miss. Helen Loraine",1,0,female,2.0000,151.5500
"Allison, Mr. Hudson Joshua Creighton",1,0,male,30.0000,151.5500
"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",1,0,female,25.0000,151.5500
...,...,...,...,...,...
"Zabour, Miss. Hileni",3,0,female,14.5000,14.4542
"Zabour, Miss. Thamine",3,0,female,,14.4542
"Zakarian, Mr. Mapriededer",3,0,male,26.5000,7.2250
"Zakarian, Mr. Ortin",3,0,male,27.0000,7.2250


## Building a condition (mask) that matches both upper-class and middle-class passengers
- Using the approach we have learned so far:

In [15]:
# One mask per class, combined with | (element-wise OR)
pclass_1_mask = tt['pclass'] == 1
pclass_2_mask = tt['pclass'] == 2
pclass_1_mask | pclass_2_mask

name
Allen, Miss. Elisabeth Walton                       True
Allison, Master. Hudson Trevor                      True
Allison, Miss. Helen Loraine                        True
Allison, Mr. Hudson Joshua Creighton                True
Allison, Mrs. Hudson J C (Bessie Waldo Daniels)     True
                                                   ...  
Zabour, Miss. Hileni                               False
Zabour, Miss. Thamine                              False
Zakarian, Mr. Mapriededer                          False
Zakarian, Mr. Ortin                                False
Zimmerman, Mr. Leo                                 False
Name: pclass, Length: 1309, dtype: bool

In [21]:
# True counts as 1, so sum() gives the number of matching rows
(pclass_1_mask | pclass_2_mask).sum()

600
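
Storing each comparison in its own variable, as above, also sidesteps an operator-precedence trap. The sketch below uses a hypothetical toy Series rather than the Titanic data to show what happens when the two comparisons are written on one line without parentheses.

```python
import pandas as pd

pclass = pd.Series([1, 2, 3, 1], name="pclass")  # hypothetical toy values

# With parentheses, each comparison is evaluated first and then OR-ed
mask = (pclass == 1) | (pclass == 2)

# Without parentheses, | binds tighter than ==, so Python parses this as the
# chained comparison  pclass == (1 | pclass) == 2, which needs a single
# True/False from a Series and therefore raises ValueError
try:
    pclass == 1 | pclass == 2
    raised = False
except ValueError:
    raised = True
```

When writing the condition inline, wrap every comparison in parentheses before combining with `|` or `&`.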

## Implementing the same logic with `.isin()`

> ### `Series.isin(values)`
**Parameters**  
values : set or list-like  
**Returns**  
Series : Series of booleans indicating if each element is in values.  
**Raises**  
TypeError : If values is a string

---
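- Think of it as similar to Python's `in` operator.
    - `1 in [1, 2]` and `2 in [1, 2]` both evaluate to True.
- The `isin` method checks whether each element belongs to a given collection of values.
- It returns a boolean Series indicating whether each element of the Series is contained in `values`.
- `values` must be a set or a list-like collection of values.
- Passing a single string as `values` raises a TypeError.
    - Wrap the string in a list so that it becomes a one-element list.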
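
The string case can be checked with a minimal sketch. The `sex` Series here is hypothetical toy data, not the column loaded from the Titanic file.

```python
import pandas as pd

sex = pd.Series(["female", "male", "female"], name="sex")  # hypothetical toy values

# A bare string is rejected with TypeError
try:
    sex.isin("female")
    error = None
except TypeError as exc:
    error = exc

# Wrapping the string in a one-element list works
female_mask = sex.isin(["female"])
```

For a single value, `sex == "female"` gives the same mask; `isin` pays off once there are two or more values to match.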

In [23]:
tt['pclass'].isin([1,2])
# The result can also be stored in a variable and used as a mask

name
Allen, Miss. Elisabeth Walton                       True
Allison, Master. Hudson Trevor                      True
Allison, Miss. Helen Loraine                        True
Allison, Mr. Hudson Joshua Creighton                True
Allison, Mrs. Hudson J C (Bessie Waldo Daniels)     True
                                                   ...  
Zabour, Miss. Hileni                               False
Zabour, Miss. Thamine                              False
Zakarian, Mr. Mapriededer                          False
Zakarian, Mr. Ortin                                False
Zimmerman, Mr. Leo                                 False
Name: pclass, Length: 1309, dtype: bool

In [25]:
tt['pclass'].isin([1,2]).sum()

600
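
Both approaches count 600 passengers, which suggests the two masks are identical. A small sketch on hypothetical toy values checks that directly and also shows how `~` inverts an `isin` mask.

```python
import pandas as pd

pclass = pd.Series([1, 1, 2, 3, 3, 2], name="pclass")  # hypothetical toy values

or_mask = (pclass == 1) | (pclass == 2)
isin_mask = pclass.isin([1, 2])

# Both approaches produce the same boolean Series
same = or_mask.equals(isin_mask)

# ~ flips the mask: every row whose value is NOT 1 or 2
not_first_or_second = ~isin_mask
```

The `isin` version stays one line long no matter how many values are added to the list, whereas the OR version needs one more comparison per value.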

In [27]:
# Checking against all of 1, 2, and 3: every row matches
tt['pclass'].isin([1,2,3])

name
Allen, Miss. Elisabeth Walton                      True
Allison, Master. Hudson Trevor                     True
Allison, Miss. Helen Loraine                       True
Allison, Mr. Hudson Joshua Creighton               True
Allison, Mrs. Hudson J C (Bessie Waldo Daniels)    True
                                                   ... 
Zabour, Miss. Hileni                               True
Zabour, Miss. Thamine                              True
Zakarian, Mr. Mapriededer                          True
Zakarian, Mr. Ortin                                True
Zimmerman, Mr. Leo                                 True
Name: pclass, Length: 1309, dtype: bool

## `.isnull()`
- A method that checks whether each element is null (missing).
- It returns True wherever it finds an NA value.
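
A minimal sketch of what counts as NA, using a hypothetical toy Series: both `NaN` and `None` are treated as missing, while an ordinary value such as `0.0` is not. In pandas, `isnull` is an alias of `isna`, so the two give the same result.

```python
import numpy as np
import pandas as pd

age = pd.Series([29.0, np.nan, None, 0.0], name="age")  # hypothetical toy values

# NaN and None both count as missing; 0.0 does not
null_mask = age.isnull()

# isnull is an alias of isna, so the masks are identical
same_as_isna = null_mask.equals(age.isna())
```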

In [29]:
# Check which columns contain null values
tt.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1309 entries, Allen, Miss. Elisabeth Walton to Zimmerman, Mr. Leo
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   pclass    1309 non-null   int64  
 1   survived  1309 non-null   int64  
 2   sex       1309 non-null   object 
 3   age       1046 non-null   float64
 4   fare      1308 non-null   float64
dtypes: float64(2), int64(2), object(1)
memory usage: 93.6+ KB
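
`info()` reports non-null counts, so the number of missing values has to be worked out by subtraction (for example, 1309 - 1046 for `age`). As an alternative sketch, `DataFrame.isnull().sum()` counts the missing values per column directly. The `toy` frame below is hypothetical data, not the Titanic file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values
toy = pd.DataFrame(
    {
        "pclass": [1, 2, 3, 3],
        "age": [29.0, np.nan, np.nan, 27.0],
        "fare": [211.3375, 14.4542, np.nan, 7.225],
    }
)

# Number of missing values in each column
null_counts = toy.isnull().sum()

# Keep only the columns that actually have missing values
columns_with_nulls = null_counts[null_counts > 0]
```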


In [31]:
# isnull() on the age column
tt['age'].isnull()

name
Allen, Miss. Elisabeth Walton                      False
Allison, Master. Hudson Trevor                     False
Allison, Miss. Helen Loraine                       False
Allison, Mr. Hudson Joshua Creighton               False
Allison, Mrs. Hudson J C (Bessie Waldo Daniels)    False
                                                   ...  
Zabour, Miss. Hileni                               False
Zabour, Miss. Thamine                               True
Zakarian, Mr. Mapriededer                          False
Zakarian, Mr. Ortin                                False
Zimmerman, Mr. Leo                                 False
Name: age, Length: 1309, dtype: bool

In [33]:
# Count the missing values
tt['age'].isnull().sum()

263

## `.notnull()`
- The opposite of `isnull`: it returns True wherever the value is not NA.

In [35]:
tt['age'].notnull()

name
Allen, Miss. Elisabeth Walton                       True
Allison, Master. Hudson Trevor                      True
Allison, Miss. Helen Loraine                        True
Allison, Mr. Hudson Joshua Creighton                True
Allison, Mrs. Hudson J C (Bessie Waldo Daniels)     True
                                                   ...  
Zabour, Miss. Hileni                                True
Zabour, Miss. Thamine                              False
Zakarian, Mr. Mapriededer                           True
Zakarian, Mr. Ortin                                 True
Zimmerman, Mr. Leo                                  True
Name: age, Length: 1309, dtype: bool

In [37]:
# Count the non-missing values
tt['age'].notnull().sum()

1046
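
The two counts are complementary: 263 missing plus 1046 present equals the 1309 rows of the frame. The sketch below checks the same relationship on a hypothetical toy Series.

```python
import numpy as np
import pandas as pd

age = pd.Series([29.0, np.nan, 2.0, np.nan, 30.0], name="age")  # hypothetical toy values

n_missing = age.isnull().sum()
n_present = age.notnull().sum()

# Every element is either missing or present, never both
total_matches = (n_missing + n_present) == len(age)

# notnull() is exactly the inverse of isnull()
complement = age.notnull().equals(~age.isnull())
```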

## Building conditions (masks) with these methods

In [39]:
unknown_age_mask = tt['age'].isnull()
known_age_mask = tt['age'].notnull()
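
Named masks like these can also be combined with other conditions using `&`. The following is a sketch under assumed toy data: the `toy` frame and the `upper_or_middle_mask` name are hypothetical and only illustrate the pattern.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few passenger rows
toy = pd.DataFrame(
    {
        "pclass": [1, 2, 3, 1],
        "age": [29.0, np.nan, 26.5, np.nan],
    },
    index=["A", "B", "C", "D"],
)

known_age_mask = toy["age"].notnull()
upper_or_middle_mask = toy["pclass"].isin([1, 2])

# Rows that satisfy both conditions: age is known AND pclass is 1 or 2
selected = toy[known_age_mask & upper_or_middle_mask]
```

Because each mask has a descriptive name, the final filter reads almost like a sentence, which is the simplification this section is aiming for.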

In [41]:
# Passengers whose age is unknown
tt[unknown_age_mask]

name,pclass,survived,sex,age,fare
"Baumann, Mr. John D",1,0,male,,25.9250
"Bradley, Mr. George (""George Arthur Brayton"")",1,1,male,,26.5500
"Brewe, Dr. Arthur Jackson",1,0,male,,39.6000
"Cairns, Mr. Alexander",1,0,male,,31.0000
"Cassebeer, Mrs. Henry Arthur Jr (Eleanor Genevieve Fosdick)",1,1,female,,27.7208
...,...,...,...,...,...
"Williams, Mr. Howard Hugh ""Harry""",3,0,male,,8.0500
"Wiseman, Mr. Phillippe",3,0,male,,7.2500
"Yousif, Mr. Wazli",3,0,male,,7.2250
"Yousseff, Mr. Gerious",3,0,male,,14.4583


In [43]:
# Passengers whose age is known
tt[known_age_mask]

name,pclass,survived,sex,age,fare
"Allen, Miss. Elisabeth Walton",1,1,female,29.0000,211.3375
"Allison, Master. Hudson Trevor",1,1,male,0.9167,151.5500
"Allison, Miss. Helen Loraine",1,0,female,2.0000,151.5500
"Allison, Mr. Hudson Joshua Creighton",1,0,male,30.0000,151.5500
"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",1,0,female,25.0000,151.5500
...,...,...,...,...,...
"Youseff, Mr. Gerious",3,0,male,45.5000,7.2250
"Zabour, Miss. Hileni",3,0,female,14.5000,14.4542
"Zakarian, Mr. Mapriededer",3,0,male,26.5000,7.2250
"Zakarian, Mr. Ortin",3,0,male,27.0000,7.2250
