# Pandas
- https://pandas.pydata.org/docs/index.html
- Textbook p. 78 onward
- Mainly used to process and analyze tabular data
- Conventionally imported with the alias pd

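### Features
- Developed by Wes McKinney, an analyst at a Wall Street financial firm, because he was dissatisfied with the data-handling tools his company used for analysis.
- Provides a wide range of functions for efficiently processing two-dimensional data made up of rows and columns.
- Much of it is built on top of NumPy, but it handles data far more flexibly and conveniently than NumPy does.
- Python lists, collections (such as dicts), NumPy arrays, and files such as CSV can easily be converted into a DataFrame for processing and analysis.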
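As a minimal sketch of the last point (the column names and values are made up for illustration), the same small table can be built from a list of lists, a dict, or a 2-D NumPy array:

```python
import numpy as np
import pandas as pd

# From a list of lists: each inner list becomes one row
rows = [[0, 1, 2], [3, 4, 5]]
df_list = pd.DataFrame(rows, columns=['a', 'b', 'c'])

# From a dict: each key becomes a column name, each value a column
data = {'a': [0, 3], 'b': [1, 4], 'c': [2, 5]}
df_dict = pd.DataFrame(data)

# From a 2-D NumPy array (dtype fixed to int64 so all three match exactly)
arr = np.arange(6, dtype='int64').reshape(2, 3)
df_arr = pd.DataFrame(arr, columns=['a', 'b', 'c'])

print(df_list.equals(df_dict), df_dict.equals(df_arr))   # True True
```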

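#### Pandas data structures
- Series: a one-dimensional array; a data structure with only a single column. A DataFrame is made up of multiple Series.
- DataFrame: a two-dimensional array; a data structure that holds 2-D data consisting of multiple rows and columns.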
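A quick sketch of how the two structures relate, using a made-up two-column DataFrame: selecting one column yields a 1-D Series, and every column shares the DataFrame's index.

```python
import pandas as pd

df = pd.DataFrame({'day1': [0, 0, 0], 'day2': [0, 1, 1]})

# Selecting a single column returns a Series (1-D)
col = df['day2']
print(type(col).__name__, col.ndim)             # Series 1

# The DataFrame itself is 2-D: rows x columns
print(type(df).__name__, df.ndim, df.shape)     # DataFrame 2 (3, 2)

# Each column is a Series that shares the DataFrame's index
for name, series in df.items():
    print(name, list(series.index))
```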


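#### Index
- Index: a key that identifies individual data, much like a primary key (PK). Both Series and DataFrame use an Index as their key. Unlike a database PK, however, pandas does not require index labels to be unique.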
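A minimal sketch with hypothetical student IDs as index labels: the Index lets you look up values by label rather than by position, and `is_unique` reports whether all labels are distinct.

```python
import pandas as pd

# Hypothetical example: student IDs used as the index (row labels)
s = pd.Series([90, 85, 77], index=['s01', 's02', 's03'])
print(s['s02'])                   # 85: look up a value by its index label

df = pd.DataFrame({'score': [90, 85, 77]}, index=['s01', 's02', 's03'])
print(df.index)                   # the Index object holding the row labels
print(df.loc['s03', 'score'])     # 77
print(df.index.is_unique)         # True: no duplicate labels here
```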


#### DataFrame
- The core object of pandas
- A two-dimensional array structure made up of rows and columns


#### read_csv()
- read_csv(filepath_or_buffer, sep=',')
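- filepath: the name of the data file to load, including its path.
- If only a file name is given, the file is looked up in the current working directory (for a Jupyter notebook, usually the folder that contains the notebook).
- By passing a field delimiter as sep, files in any delimiter-based format (e.g. tab-separated) can be converted into a DataFrame.
- If sep is omitted, it defaults to a comma.
- Loads the given file and returns it as a DataFrame object.
- Unless told otherwise, it treats the first row of the file as the column names.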
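A small sketch of the header and sep behavior, using `io.StringIO` in place of a file on disk (the in-memory CSV is a header-less sample in the same format as inflammation_sm.csv):

```python
import io
import pandas as pd

# In-memory stand-in for a header-less CSV file
csv_text = "0,0,1,3,1\n0,1,2,1,2\n0,1,1,3,3\n"

# Default (header='infer'): the first line is consumed as column names
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.shape)       # (2, 5): one data row was lost to the header
print(list(df_default.columns))

# header=None: every line is data; columns are numbered 0..4
df_nohdr = pd.read_csv(io.StringIO(csv_text), header=None)
print(df_nohdr.shape)         # (3, 5)

# sep selects the field delimiter, e.g. a tab-separated file
tsv_text = "a\tb\n1\t2\n"
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep='\t')
print(list(df_tsv.columns))   # ['a', 'b']
```

Duplicate values in the first line become mangled column names such as '0.1' and '1.1', which is exactly the symptom seen in the first load below.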

# 1. Loading and Inspecting the Data

In [1]:
import pandas as pd
df = pd.read_csv('inflammation_sm.csv')

In [2]:
df

,0,0.1,1,3,1.1
0,0,1,2,1,2
1,0,1,1,3,3
2,0,0,2,0,4
3,0,1,1,3,3
4,0,0,1,2,2
5,0,0,2,2,4
6,0,0,1,2,3
7,0,0,0,3,1
8,0,1,1,2,1


In [3]:
df[:5]    # check the result: what's the problem?

,0,0.1,1,3,1.1
0,0,1,2,1,2
1,0,1,1,3,3
2,0,0,2,0,4
3,0,1,1,3,3
4,0,0,1,2,2


In [4]:
df.head()  # first 5 rows: what's the problem?

,0,0.1,1,3,1.1
0,0,1,2,1,2
1,0,1,1,3,3
2,0,0,2,0,4
3,0,1,1,3,3
4,0,0,1,2,2


In [5]:
df.shape   # what's the problem?

(9, 5)

In [6]:
df.columns   # what's the problem?

Index(['0', '0.1', '1', '3', '1.1'], dtype='object')

In [7]:
import pandas as pd
df = pd.read_csv('inflammation_sm.csv', header=None)   # header=None: use when the file has no header row
print(df.shape)
print(df)

(10, 5)
   0  1  2  3  4
0  0  0  1  3  1
1  0  1  2  1  2
2  0  1  1  3  3
3  0  0  2  0  4
4  0  1  1  3  3
5  0  0  1  2  2
6  0  0  2  2  4
7  0  0  1  2  3
8  0  0  0  3  1
9  0  1  1  2  1


In [8]:
df.head()  

,0,1,2,3,4
0,0,0,1,3,1
1,0,1,2,1,2
2,0,1,1,3,3
3,0,0,2,0,4
4,0,1,1,3,3


In [9]:
df[:5]

,0,1,2,3,4
0,0,0,1,3,1
1,0,1,2,1,2
2,0,1,1,3,3
3,0,0,2,0,4
4,0,1,1,3,3


In [None]:
# Rename the columns

In [10]:
t = 'day'
col = []
for i in range(1,6) :
    col.append(t+str(i))
print(col)

['day1', 'day2', 'day3', 'day4', 'day5']
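The loop above can also be written as a list comprehension, and read_csv's `names` parameter can assign the column names while loading (passing `names` makes read_csv treat every line as data, as with header=None). A sketch, with an in-memory CSV holding the first two rows shown in In [7] in place of inflammation_sm.csv:

```python
import io
import pandas as pd

# Same naming scheme as the loop, as a list comprehension
col = [f'day{i}' for i in range(1, 6)]

# In-memory stand-in for the header-less file (first two rows of In [7])
csv_text = "0,0,1,3,1\n0,1,2,1,2\n"

# names= assigns column names at load time; no line is used as a header
df = pd.read_csv(io.StringIO(csv_text), names=col)
print(df.columns.tolist())   # ['day1', 'day2', 'day3', 'day4', 'day5']
print(df.shape)              # (2, 5)
```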


In [11]:
df.columns = col  # rename the columns
df.head()

,day1,day2,day3,day4,day5
0,0,0,1,3,1
1,0,1,2,1,2
2,0,1,1,3,3
3,0,0,2,0,4
4,0,1,1,3,3


In [None]:
#df.columns = ['day1', 'day2', 'day3', 'day4', 'day5']
#df

In [12]:
df.columns

Index(['day1', 'day2', 'day3', 'day4', 'day5'], dtype='object')

In [13]:
df.head(3)

,day1,day2,day3,day4,day5
0,0,0,1,3,1
1,0,1,2,1,2
2,0,1,1,3,3


In [14]:
df.tail()

,day1,day2,day3,day4,day5
5,0,0,1,2,2
6,0,0,2,2,4
7,0,0,1,2,3
8,0,0,0,3,1
9,0,1,1,2,1


In [15]:
df.info()  # total number of rows, column dtypes, non-null counts, etc.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
day1    10 non-null int64
day2    10 non-null int64
day3    10 non-null int64
day4    10 non-null int64
day5    10 non-null int64
dtypes: int64(5)
memory usage: 480.0 bytes


In [16]:
df.describe()  # basic statistics for each column

,day1,day2,day3,day4,day5
count,10.0,10.0,10.0,10.0,10.0
mean,0.0,0.4,1.2,2.1,2.4
std,0.0,0.516398,0.632456,0.994429,1.173788
min,0.0,0.0,0.0,0.0,1.0
25%,0.0,0.0,1.0,2.0,1.25
50%,0.0,0.0,1.0,2.0,2.5
75%,0.0,1.0,1.75,3.0,3.0
max,0.0,1.0,2.0,3.0,4.0


# 2. Indexing/Slicing

In [17]:
df[1]  # error: df[...] selects columns by label, and there is no column named 1

KeyError: 1

In [18]:
# Select a row: DataFrame.iloc[#]
df.iloc[1]   # iloc[#]: integer-position indexing; selects the row at position 1

day1    0
day2    1
day3    2
day4    1
day5    2
Name: 1, dtype: int64

In [19]:
df.loc[1]   # loc[row label]: here the row labels happen to equal the positions

day1    0
day2    1
day3    2
day4    1
day5    2
Name: 1, dtype: int64
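Here loc and iloc return the same row only because the labels 0 to 9 coincide with the positions. A minimal sketch, on a made-up three-row DataFrame, where sorting breaks that correspondence:

```python
import pandas as pd

df = pd.DataFrame({'day1': [0, 0, 0], 'day5': [1, 2, 3]})

# After sorting, the row labels no longer match the row positions
df_sorted = df.sort_values('day5', ascending=False)
print(df_sorted.index.tolist())     # [2, 1, 0]

print(df_sorted.iloc[0]['day5'])    # position 0 is the row labeled 2 -> 3
print(df_sorted.loc[0]['day5'])     # the row labeled 0 -> 1
```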

In [20]:
# Selecting one row returns a Series
t = df.iloc[1]
type(t)

pandas.core.series.Series

In [21]:
df.head()

,day1,day2,day3,day4,day5
0,0,0,1,3,1
1,0,1,2,1,2
2,0,1,1,3,3
3,0,0,2,0,4
4,0,1,1,3,3


In [22]:
# Select a column: dataframe[column name]
df['day5']

0    1
1    2
2    3
3    4
4    3
5    2
6    4
7    3
8    1
9    1
Name: day5, dtype: int64

In [23]:
# Selecting one column returns a Series
t = df['day5']
type(t)

pandas.core.series.Series

In [24]:
# Selecting n rows returns a DataFrame
idx = [1,3]
df.iloc[idx]  

,day1,day2,day3,day4,day5
1,0,1,2,1,2
3,0,0,2,0,4


In [25]:
print(type(df.iloc[idx]))

<class 'pandas.core.frame.DataFrame'>


In [26]:
# Selecting n columns returns a DataFrame
idx = ['day2', 'day5']
df[idx]

,day2,day5
0,0,1
1,1,2
2,1,3
3,0,4
4,1,3
5,0,2
6,0,4
7,0,3
8,0,1
9,1,1


In [27]:
idx = ['day2', 'day5']
t=df[idx]
type(t)

pandas.core.frame.DataFrame
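Whether the result is a Series or a DataFrame depends on passing a single label/position versus a list. A short sketch on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'day2': [0, 1, 1], 'day5': [1, 2, 3]})

# A single label or position gives a Series ...
print(type(df['day5']).__name__)      # Series
print(type(df.iloc[1]).__name__)      # Series

# ... while a list (even of length 1) keeps the result a DataFrame
print(type(df[['day5']]).__name__)    # DataFrame
print(type(df.iloc[[1]]).__name__)    # DataFrame
```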

In [28]:
# Select rows and columns together
df.iloc[2, 1:]  # row at position 2, columns from position 1 onward

day2    1
day3    1
day4    3
day5    3
Name: 2, dtype: int64

In [29]:
df.iloc[2, 3]  # row at position 2, column at position 3

3
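The same selections can be written with labels via loc. One difference worth knowing: loc slices include the end label, while iloc slices exclude the end position. A sketch using the first three rows of the data above:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 1, 3, 1], [0, 1, 2, 1, 2], [0, 1, 1, 3, 3]],
    columns=['day1', 'day2', 'day3', 'day4', 'day5'],
)

# iloc: integer positions, end excluded (columns at positions 1 and 2)
print(df.iloc[2, 1:3].tolist())            # [1, 1]

# loc: labels, end INCLUDED (day2 through day4)
print(df.loc[2, 'day2':'day4'].tolist())   # [1, 1, 3]

# A single cell by label, equivalent to df.iloc[2, 3]
print(df.loc[2, 'day4'])                   # 3
```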

In [30]:
df['day3'] < 2  # comparison operation

0     True
1    False
2     True
3    False
4     True
5     True
6    False
7     True
8     True
9     True
Name: day3, dtype: bool

In [31]:
df

,day1,day2,day3,day4,day5
0,0,0,1,3,1
1,0,1,2,1,2
2,0,1,1,3,3
3,0,0,2,0,4
4,0,1,1,3,3
5,0,0,1,2,2
6,0,0,2,2,4
7,0,0,1,2,3
8,0,0,0,3,1
9,0,1,1,2,1


In [32]:
df[df['day3'] < 2]  #Boolean Indexing

,day1,day2,day3,day4,day5
0,0,0,1,3,1
2,0,1,1,3,3
4,0,1,1,3,3
5,0,0,1,2,2
7,0,0,1,2,3
8,0,0,0,3,1
9,0,1,1,2,1


In [33]:
df['day5'] > 3

0    False
1    False
2    False
3     True
4    False
5    False
6     True
7    False
8    False
9    False
Name: day5, dtype: bool

In [34]:
df[df['day5'] > 3]

,day1,day2,day3,day4,day5
3,0,0,2,0,4
6,0,0,2,2,4


In [35]:
df['day2'] == 0

0     True
1    False
2    False
3     True
4    False
5     True
6     True
7     True
8     True
9    False
Name: day2, dtype: bool

In [36]:
df[(df['day2']==0) & (df['day5'] > 3)]  # logical operators: & is and, | is or
# Always wrap each condition in parentheses () when combining them

,day1,day2,day3,day4,day5
3,0,0,2,0,4
6,0,0,2,2,4
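A sketch of combining conditions on a made-up four-row DataFrame, including `|` (or) and `~` (not):

```python
import pandas as pd

df = pd.DataFrame({'day2': [0, 1, 0, 1], 'day5': [4, 1, 2, 4]})

# & = and, | = or, ~ = not; each comparison is wrapped in parentheses
both = df[(df['day2'] == 0) & (df['day5'] > 3)]
either = df[(df['day2'] == 0) | (df['day5'] > 3)]
not_zero = df[~(df['day2'] == 0)]

print(both.index.tolist())      # [0]
print(either.index.tolist())    # [0, 2, 3]
print(not_zero.index.tolist())  # [1, 3]
```

The parentheses matter because `&` binds more tightly than `==` and `>` in Python: `df['day2'] == 0 & df['day5'] > 3` is parsed as a chained comparison around `0 & df['day5']`, which ends up asking for the truth value of a whole Series and raises a ValueError.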


# Adding, Modifying, and Deleting Data

In [37]:
df['day_0'] = 99  # add a column
df.head(3)

,day1,day2,day3,day4,day5,day_0
0,0,0,1,3,1,99
1,0,1,2,1,2,99
2,0,1,1,3,3,99


In [38]:
df.iloc[10] = 99  # error: iloc only addresses existing positions, so it cannot add row 10
df

IndexError: single positional indexer is out-of-bounds

In [39]:
df.loc[10] = 99  # to add a row, use loc[row label]
df

,day1,day2,day3,day4,day5,day_0
0,0,0,1,3,1,99
1,0,1,2,1,2,99
2,0,1,1,3,3,99
3,0,0,2,0,4,99
4,0,1,1,3,3,99
5,0,0,1,2,2,99
6,0,0,2,2,4,99
7,0,0,1,2,3,99
8,0,0,0,3,1,99
9,0,1,1,2,1,99
10,99,99,99,99,99,99


In [40]:
df['day1'] = 99   # modify a column
df

,day1,day2,day3,day4,day5,day_0
0,99,0,1,3,1,99
1,99,1,2,1,2,99
2,99,1,1,3,3,99
3,99,0,2,0,4,99
4,99,1,1,3,3,99
5,99,0,1,2,2,99
6,99,0,2,2,4,99
7,99,0,1,2,3,99
8,99,0,0,3,1,99
9,99,1,1,2,1,99
10,99,99,99,99,99,99


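- Data is deleted from a DataFrame with the drop() method.
- Signature: DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

- Depending on the axis value, drop() removes specific columns (axis=1) or specific rows (axis=0, the default).

- With inplace=False, the original DataFrame is left unchanged and a new DataFrame with the data removed is returned.
- With inplace=True, the data is removed from the original DataFrame itself (and the method returns None).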
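drop() also accepts `columns=` and `index=` keywords instead of `labels` plus `axis`. A minimal sketch on a made-up DataFrame, which also shows that without inplace=True the result has to be reassigned to keep it:

```python
import pandas as pd

df = pd.DataFrame({'day1': [0, 0, 0], 'day_0': [99, 99, 99]})

# columns=/index= are equivalent to labels + axis=1 / axis=0
no_col = df.drop(columns='day_0')   # same as df.drop('day_0', axis=1)
no_row = df.drop(index=1)           # same as df.drop(1, axis=0)

# inplace=False (default): df itself is unchanged
print(df.shape, no_col.shape, no_row.shape)   # (3, 2) (3, 1) (2, 2)

# Keep the result by reassigning instead of using inplace=True
df = df.drop(columns='day_0')
print(df.columns.tolist())                    # ['day1']
```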

In [41]:
df.drop('day_0', axis=1)  # axis=1: drop a column

,day1,day2,day3,day4,day5
0,99,0,1,3,1
1,99,1,2,1,2
2,99,1,1,3,3
3,99,0,2,0,4
4,99,1,1,3,3
5,99,0,1,2,2
6,99,0,2,2,4
7,99,0,1,2,3
8,99,0,0,3,1
9,99,1,1,2,1
10,99,99,99,99,99


In [42]:
df

,day1,day2,day3,day4,day5,day_0
0,99,0,1,3,1,99
1,99,1,2,1,2,99
2,99,1,1,3,3,99
3,99,0,2,0,4,99
4,99,1,1,3,3,99
5,99,0,1,2,2,99
6,99,0,2,2,4,99
7,99,0,1,2,3,99
8,99,0,0,3,1,99
9,99,1,1,2,1,99
10,99,99,99,99,99,99


In [43]:
df.drop(1, axis=0)  # axis=0: drop a row

,day1,day2,day3,day4,day5,day_0
0,99,0,1,3,1,99
2,99,1,1,3,3,99
3,99,0,2,0,4,99
4,99,1,1,3,3,99
5,99,0,1,2,2,99
6,99,0,2,2,4,99
7,99,0,1,2,3,99
8,99,0,0,3,1,99
9,99,1,1,2,1,99
10,99,99,99,99,99,99


In [44]:
df

,day1,day2,day3,day4,day5,day_0
0,99,0,1,3,1,99
1,99,1,2,1,2,99
2,99,1,1,3,3,99
3,99,0,2,0,4,99
4,99,1,1,3,3,99
5,99,0,1,2,2,99
6,99,0,2,2,4,99
7,99,0,1,2,3,99
8,99,0,0,3,1,99
9,99,1,1,2,1,99
10,99,99,99,99,99,99


In [45]:
df.drop(1, axis=0, inplace=True)  # inplace=True: delete the row from df itself

In [46]:
df

,day1,day2,day3,day4,day5,day_0
0,99,0,1,3,1,99
2,99,1,1,3,3,99
3,99,0,2,0,4,99
4,99,1,1,3,3,99
5,99,0,1,2,2,99
6,99,0,2,2,4,99
7,99,0,1,2,3,99
8,99,0,0,3,1,99
9,99,1,1,2,1,99
10,99,99,99,99,99,99


# titanic data load

In [1]:
import pandas as pd
df = pd.read_csv('titanic_train.csv')

In [48]:
# Inspect the DataFrame
df

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.0750,,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,11.1333,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,30.0708,,C


In [49]:
# View the first 5 rows
df.head()

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


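- PassengerId: passenger ID
- Survived: whether the passenger survived. 0 = died, 1 = survived
- Pclass: ticket class. 1 = 1st (upper), 2 = 2nd (middle), 3 = 3rd (lower)
- SibSp: number of siblings/spouses aboard the Titanic
- Parch: number of parents/children aboard the Titanic
- Embarked: port of embarkation. C = Cherbourg, Q = Queenstown, S = Southampton
- https://gooopy.tistory.com/79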

In [50]:
# View the last three rows
df.tail(3)

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


In [51]:
# Check the shape of the data set
df.shape

(891, 12)

In [52]:
df.info()  # check the summary info

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB


In [53]:
df.describe()  # basic statistics for each column

,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,891.0,891.0,891.0,714.0,891.0,891.0,891.0
mean,446.0,0.383838,2.308642,29.699118,0.523008,0.381594,32.204208
std,257.353842,0.486592,0.836071,14.526497,1.102743,0.806057,49.693429
min,1.0,0.0,1.0,0.42,0.0,0.0,0.0
25%,223.5,0.0,2.0,20.125,0.0,0.0,7.9104
50%,446.0,0.0,3.0,28.0,0.0,0.0,14.4542
75%,668.5,1.0,3.0,38.0,1.0,0.0,31.0
max,891.0,1.0,3.0,80.0,8.0,6.0,512.3292


In [54]:
# Select the Name column
df['Name']

0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
5                                       Moran, Mr. James
6                                McCarthy, Mr. Timothy J
7                         Palsson, Master. Gosta Leonard
8      Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)
9                    Nasser, Mrs. Nicholas (Adele Achem)
10                       Sandstrom, Miss. Marguerite Rut
11                              Bonnell, Miss. Elizabeth
12                        Saundercock, Mr. William Henry
13                           Andersson, Mr. Anders Johan
14                  Vestrom, Miss. Hulda Amanda Adolfina
15                      Hewlett, Mrs. (Mary D Kingcome) 
16                                  Rice, Master. Eugene
17                          Wil

In [55]:
# Select the Survived, Pclass, Sex, and Age columns
df[['Survived','Pclass','Sex','Age']]

,Survived,Pclass,Sex,Age
0,0,3,male,22.0
1,1,1,female,38.0
2,1,3,female,26.0
3,1,1,female,35.0
4,0,3,male,35.0
5,0,3,male,
6,0,1,male,54.0
7,0,3,male,2.0
8,1,3,female,27.0
9,1,2,female,14.0


In [56]:
# Select the Survived, Pclass, Sex, and Age columns, store them in new_df, and check
new_df=df[['Survived','Pclass','Sex','Age']]
new_df

,Survived,Pclass,Sex,Age
0,0,3,male,22.0
1,1,1,female,38.0
2,1,3,female,26.0
3,1,1,female,35.0
4,0,3,male,35.0
5,0,3,male,
6,0,1,male,54.0
7,0,3,male,2.0
8,1,3,female,27.0
9,1,2,female,14.0


In [57]:
# Passengers younger than 20
df[df['Age']<20]

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.00,3,1,349909,21.0750,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.00,1,0,237736,30.0708,,C
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.00,1,1,PP 9549,16.7000,G6,S
14,15,0,3,"Vestrom, Miss. Hulda Amanda Adolfina",female,14.00,0,0,350406,7.8542,,S
16,17,0,3,"Rice, Master. Eugene",male,2.00,4,1,382652,29.1250,,Q
22,23,1,3,"McGowan, Miss. Anna ""Annie""",female,15.00,0,0,330923,8.0292,,Q
24,25,0,3,"Palsson, Miss. Torborg Danira",female,8.00,3,1,349909,21.0750,,S
27,28,0,1,"Fortune, Mr. Charles Alexander",male,19.00,3,2,19950,263.0000,C23 C25 C27,S
38,39,0,3,"Vander Planke, Miss. Augusta Maria",female,18.00,2,0,345764,18.0000,,S
39,40,1,3,"Nicola-Yarred, Miss. Jamila",female,14.00,1,0,2651,11.2417,,C


In [58]:
df[df['Age']<20].shape

(164, 12)

In [59]:
# Passengers who survived
df[df['Survived']==1]

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.00,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.00,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.00,1,0,113803,53.1000,C123,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.00,0,2,347742,11.1333,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.00,1,0,237736,30.0708,,C
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.00,1,1,PP 9549,16.7000,G6,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.00,0,0,113783,26.5500,C103,S
15,16,1,2,"Hewlett, Mrs. (Mary D Kingcome)",female,55.00,0,0,248706,16.0000,,S
17,18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13.0000,,S
19,20,1,3,"Masselmani, Mrs. Fatima",female,,0,0,2649,7.2250,,C


In [60]:
# Basic statistics for each column, surviving passengers
df[df['Survived']==1].describe()

,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,342.0,342.0,342.0,290.0,342.0,342.0,342.0
mean,444.368421,1.0,1.950292,28.34369,0.473684,0.464912,48.395408
std,252.35884,0.0,0.863321,14.950952,0.708688,0.771712,66.596998
min,2.0,1.0,1.0,0.42,0.0,0.0,0.0
25%,250.75,1.0,1.0,19.0,0.0,0.0,12.475
50%,439.5,1.0,2.0,28.0,0.0,0.0,26.0
75%,651.5,1.0,3.0,36.0,1.0,1.0,57.0
max,890.0,1.0,3.0,80.0,4.0,5.0,512.3292


In [61]:
# Basic statistics for each column, passengers who died
df[df['Survived']==0].describe()

,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,549.0,549.0,549.0,424.0,549.0,549.0,549.0
mean,447.016393,0.0,2.531876,30.626179,0.553734,0.32969,22.117887
std,260.640469,0.0,0.735805,14.17211,1.288399,0.823166,31.388207
min,1.0,0.0,1.0,1.0,0.0,0.0,0.0
25%,211.0,0.0,2.0,21.0,0.0,0.0,7.8542
50%,455.0,0.0,3.0,28.0,0.0,0.0,10.5
75%,675.0,0.0,3.0,39.0,1.0,0.0,26.0
max,891.0,0.0,3.0,74.0,8.0,6.0,263.0


In [62]:
# Passengers younger than 20 who survived
df[(df['Age']<20) & (df['Survived']==1)]  

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.00,1,0,237736,30.0708,,C
10,11,1,3,"Sandstrom, Miss. Marguerite Rut",female,4.00,1,1,PP 9549,16.7000,G6,S
22,23,1,3,"McGowan, Miss. Anna ""Annie""",female,15.00,0,0,330923,8.0292,,Q
39,40,1,3,"Nicola-Yarred, Miss. Jamila",female,14.00,1,0,2651,11.2417,,C
43,44,1,2,"Laroche, Miss. Simonne Marie Anne Andree",female,3.00,1,2,SC/Paris 2123,41.5792,,C
44,45,1,3,"Devaney, Miss. Margaret Delia",female,19.00,0,0,330958,7.8792,,Q
58,59,1,2,"West, Miss. Constance Mirium",female,5.00,1,2,C.A. 34651,27.7500,,S
68,69,1,3,"Andersson, Miss. Erna Alexandra",female,17.00,4,2,3101281,7.9250,,S
78,79,1,2,"Caldwell, Master. Alden Gates",male,0.83,0,2,248738,29.0000,,S
84,85,1,2,"Ilett, Miss. Bertha",female,17.00,0,0,SO/C 14885,10.5000,,S


In [63]:
df[(df['Age']<20) & (df['Survived']==1)].shape

(79, 12)

In [64]:
# Basic statistics for female passengers who survived
df[(df['Sex']=='female') & (df['Survived']==1)].describe()

,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,233.0,233.0,233.0,197.0,233.0,233.0,233.0
mean,429.699571,1.0,1.918455,28.847716,0.515021,0.515021,51.938573
std,255.048296,0.0,0.834211,14.175073,0.737533,0.820527,64.102256
min,2.0,1.0,1.0,0.75,0.0,0.0,7.225
25%,238.0,1.0,1.0,19.0,0.0,0.0,13.0
50%,400.0,1.0,2.0,28.0,0.0,0.0,26.0
75%,636.0,1.0,3.0,38.0,1.0,1.0,76.2917
max,888.0,1.0,3.0,63.0,4.0,5.0,512.3292


In [65]:
# Basic statistics for male passengers who survived
df[(df['Sex']=='male') & (df['Survived']==1)].describe()

,PassengerId,Survived,Pclass,Age,SibSp,Parch,Fare
count,109.0,109.0,109.0,93.0,109.0,109.0,109.0
mean,475.724771,1.0,2.018349,27.276022,0.385321,0.357798,40.821484
std,244.717482,0.0,0.922774,16.504803,0.636952,0.645826,71.355967
min,18.0,1.0,1.0,0.42,0.0,0.0,0.0
25%,272.0,1.0,1.0,18.0,0.0,0.0,9.5
50%,508.0,1.0,2.0,28.0,0.0,0.0,26.2875
75%,680.0,1.0,3.0,36.0,1.0,1.0,39.0
max,890.0,1.0,3.0,80.0,4.0,2.0,512.3292


In [66]:
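# Caution: this averages the 8 rows of describe() (count, mean, std, min, 25%, 50%, 75%, max),
# not the ages themselves. The mean age is the 'mean' row above (27.276022).
df[(df['Sex']=='male') & (df['Survived']==1)].describe()['Age'].mean()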

37.400103063074354
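In [66] takes the mean of describe()'s eight summary values rather than of the ages. To get the mean age, take the mean of the filtered Age column directly; it matches the 'mean' row of describe() (27.276022 for surviving male passengers in In [65]). A sketch on a hypothetical four-row stand-in for the Titanic data:

```python
import pandas as pd

# Hypothetical miniature stand-in for the Titanic data
df = pd.DataFrame({
    'Sex': ['male', 'female', 'male', 'male'],
    'Survived': [1, 1, 0, 1],
    'Age': [20.0, 30.0, 40.0, None],
})

mask = (df['Sex'] == 'male') & (df['Survived'] == 1)

# Mean age of surviving males (missing ages are skipped by default)
mean_age = df.loc[mask, 'Age'].mean()
print(mean_age)                                       # 20.0

# The same number appears in the 'mean' row of describe()
print(df.loc[mask].describe().loc['mean', 'Age'])     # 20.0
```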

In [67]:
# Name of the passenger whose PassengerId is 631
df[df['PassengerId']==631]['Name']

630    Barkworth, Mr. Algernon Henry Wilson
Name: Name, dtype: object
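The chained form df[mask]['Name'] works for reading, but a single `.loc[mask, 'Name']` call is the usual idiom and avoids chained-indexing problems if you later assign values; `.iloc[0]` then pulls out the plain string when exactly one row matches. A sketch on a hypothetical two-row stand-in (the first passenger's name is made up):

```python
import pandas as pd

# Hypothetical two-row stand-in for the Titanic data
df = pd.DataFrame({
    'PassengerId': [630, 631],
    'Name': ['Hypothetical Passenger', 'Barkworth, Mr. Algernon Henry Wilson'],
})

# One .loc call: row mask and column label together
names = df.loc[df['PassengerId'] == 631, 'Name']
print(names)          # still a Series (index label 1)

# .iloc[0] extracts the string itself
print(names.iloc[0])  # Barkworth, Mr. Algernon Henry Wilson
```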

In [2]:
df['Age'].mean()

29.69911764705882