**crosstab**
- Useful for comparing categorical variables against each other.
- `pd.crosstab(index=rows, columns=cols, margins=True/False, normalize=False/True/'all'/'index'/'columns')`
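
As a minimal sketch of the call signature above, the snippet below builds a small hypothetical DataFrame (`toy` is made-up illustration data, not the Titanic file used in this notebook) and counts how often each combination of the two columns occurs.

```python
import pandas as pd

# Hypothetical toy data, for illustration only (not the Titanic file)
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Survived": [1, 0, 0, 0, 1],
})

# Row labels come from `index`, column labels from `columns`;
# each cell holds the number of rows with that (Sex, Survived) pair
counts = pd.crosstab(index=toy["Sex"], columns=toy["Survived"])
print(counts)
```

The cells of `counts` add up to the number of rows in `toy`, which is a quick sanity check on any count table.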

---



In [1]:
import pandas as pd

df = pd.read_csv("/content/titanic_train.csv")

In [2]:
df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


## **Counting by category**
- `pd.crosstab(rows, columns)`

In [3]:
pd.crosstab(df['Sex'], df['Survived'])

| Sex | Survived = 0 | Survived = 1 |
|---|---|---|
| female | 81 | 233 |
| male | 468 | 109 |


In [4]:
pd.crosstab(df['Pclass'], df['Survived'])

| Pclass | Survived = 0 | Survived = 1 |
|---|---|---|
| 1 | 80 | 136 |
| 2 | 97 | 87 |
| 3 | 372 | 119 |
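
To make explicit what the count tables above contain, here is a hedged sketch comparing `pd.crosstab` with a `groupby(...).size().unstack()` pipeline on hypothetical toy data. The `toy` frame and both variable names are assumptions for illustration. The two approaches should yield the same counts, which shows that each cell is simply the number of rows in that group.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3],
    "Survived": [1, 0, 1, 0, 0, 1],
})

via_crosstab = pd.crosstab(toy["Pclass"], toy["Survived"])

# Same table built by hand: group on both columns, count rows,
# then pivot Survived into columns, filling absent pairs with 0
via_groupby = toy.groupby(["Pclass", "Survived"]).size().unstack(fill_value=0)

print(via_crosstab)
print((via_crosstab.to_numpy() == via_groupby.to_numpy()).all())
```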


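## **Computing proportions by category**
- `normalize = 'all'`: each cell is divided by the grand total, so the whole table sums to 1 (100%)
- `normalize = 'index'`: each cell is divided by its row total, so every row sums to 1 (100%)
- `normalize = 'columns'`: each cell is divided by its column total, so every column sums to 1 (100%)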
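
The three options can be checked by summing the result along the matching axis. The sketch below does this on a hypothetical toy DataFrame (`toy` and the `by_*` names are illustrative assumptions, not part of the notebook).

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Survived": [1, 0, 0, 0, 1],
})

by_all = pd.crosstab(toy["Sex"], toy["Survived"], normalize="all")
by_row = pd.crosstab(toy["Sex"], toy["Survived"], normalize="index")
by_col = pd.crosstab(toy["Sex"], toy["Survived"], normalize="columns")

print(by_all.to_numpy().sum())  # sum over the whole table
print(by_row.sum(axis=1))       # one sum per row
print(by_col.sum(axis=0))       # one sum per column
```

Each printed sum should come out as 1 up to floating-point rounding, which is a quick way to confirm which axis a given table was normalized over.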

In [5]:
pd.crosstab(df['Sex'], df['Survived'], normalize='all')

| Sex | Survived = 0 | Survived = 1 |
|---|---|---|
| female | 0.090909 | 0.261504 |
| male | 0.525253 | 0.122334 |


In [6]:
pd.crosstab(df['Sex'], df['Survived'], normalize='index')

| Sex | Survived = 0 | Survived = 1 |
|---|---|---|
| female | 0.257962 | 0.742038 |
| male | 0.811092 | 0.188908 |


In [7]:
pd.crosstab(df['Sex'], df['Survived'], normalize='columns')

| Sex | Survived = 0 | Survived = 1 |
|---|---|---|
| female | 0.147541 | 0.681287 |
| male | 0.852459 | 0.318713 |


In [8]:
pd.crosstab(df['Sex'], df['Survived'], normalize='all', margins=True)

| Sex | Survived = 0 | Survived = 1 | All |
|---|---|---|---|
| female | 0.090909 | 0.261504 | 0.352413 |
| male | 0.525253 | 0.122334 | 0.647587 |
| All | 0.616162 | 0.383838 | 1.0 |


In [9]:
pd.crosstab(df['Sex'], df['Survived'], normalize='index', margins=True)

| Sex | Survived = 0 | Survived = 1 |
|---|---|---|
| female | 0.257962 | 0.742038 |
| male | 0.811092 | 0.188908 |
| All | 0.616162 | 0.383838 |


In [10]:
pd.crosstab(df['Sex'], df['Survived'], normalize='columns', margins=True)

| Sex | Survived = 0 | Survived = 1 | All |
|---|---|---|---|
| female | 0.147541 | 0.681287 | 0.352413 |
| male | 0.852459 | 0.318713 | 0.647587 |
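
The three `margins=True` outputs above differ in shape: `normalize='all'` adds both an `All` row and an `All` column, `normalize='index'` adds only the `All` row, and `normalize='columns'` adds only the `All` column. The sketch below reproduces that pattern on hypothetical toy data (`toy` and the `m_*` names are illustrative assumptions).

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male", "male"],
    "Survived": [1, 0, 0, 0, 1],
})

m_all = pd.crosstab(toy["Sex"], toy["Survived"], normalize="all", margins=True)
m_row = pd.crosstab(toy["Sex"], toy["Survived"], normalize="index", margins=True)
m_col = pd.crosstab(toy["Sex"], toy["Survived"], normalize="columns", margins=True)

# Compare which margin labels each table carries
for name, table in [("all", m_all), ("index", m_row), ("columns", m_col)]:
    print(name, list(table.index), list(table.columns))
```

With row normalization, the `All` row holds the overall distribution of `Survived`. With column normalization, the `All` column holds the overall distribution of `Sex`.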


## **Crosstabs with a multi-level index and multi-level columns**

In [11]:
pd.crosstab(index=[df['Sex'], df['Pclass']], columns=df['Survived'])

| Sex | Pclass | Survived = 0 | Survived = 1 |
|---|---|---|---|
| female | 1 | 3 | 91 |
| female | 2 | 6 | 70 |
| female | 3 | 72 | 72 |
| male | 1 | 77 | 45 |
| male | 2 | 91 | 17 |
| male | 3 | 300 | 47 |
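
Passing a list of Series as `index` yields a table whose rows are labelled by a `MultiIndex`. As a hedged sketch on hypothetical toy data (`toy` and `table` are illustrative names), the snippet below shows how individual cells and sub-tables of such a result can be looked up with `.loc`.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "Sex":      ["female", "female", "male", "male", "male", "female"],
    "Pclass":   [1, 3, 3, 3, 1, 1],
    "Survived": [1, 0, 0, 1, 0, 1],
})

table = pd.crosstab(index=[toy["Sex"], toy["Pclass"]], columns=toy["Survived"])

# The row index has two levels, named after the input Series
print(table.index.names)

# A single cell: row (Sex, Pclass) tuple, then the Survived column label
print(table.loc[("female", 1), 1])

# Selecting only the outer level returns the sub-table for that group
print(table.loc["male"])
```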


In [12]:
pd.crosstab(index=[df['Sex'], df['Pclass']], columns=df['Survived'], normalize='all')

| Sex | Pclass | Survived = 0 | Survived = 1 |
|---|---|---|---|
| female | 1 | 0.003367 | 0.102132 |
| female | 2 | 0.006734 | 0.078563 |
| female | 3 | 0.080808 | 0.080808 |
| male | 1 | 0.08642 | 0.050505 |
| male | 2 | 0.102132 | 0.01908 |
| male | 3 | 0.3367 | 0.05275 |


In [13]:
pd.crosstab(index=[df['Sex'], df['Pclass']], columns=df['Survived'], normalize='columns')

| Sex | Pclass | Survived = 0 | Survived = 1 |
|---|---|---|---|
| female | 1 | 0.005464 | 0.266082 |
| female | 2 | 0.010929 | 0.204678 |
| female | 3 | 0.131148 | 0.210526 |
| male | 1 | 0.140255 | 0.131579 |
| male | 2 | 0.165756 | 0.049708 |
| male | 3 | 0.546448 | 0.137427 |


In [14]:
pd.crosstab(index=[df['Sex'], df['Pclass']], columns=[df['Survived'], df['Embarked']])

| Sex | Pclass | Survived = 0, Embarked = C | Survived = 0, Embarked = Q | Survived = 0, Embarked = S | Survived = 1, Embarked = C | Survived = 1, Embarked = Q | Survived = 1, Embarked = S |
|---|---|---|---|---|---|---|---|
| female | 1 | 1 | 0 | 2 | 42 | 1 | 46 |
| female | 2 | 0 | 0 | 6 | 7 | 2 | 61 |
| female | 3 | 8 | 9 | 55 | 15 | 24 | 33 |
| male | 1 | 25 | 1 | 51 | 17 | 0 | 28 |
| male | 2 | 8 | 1 | 82 | 2 | 0 | 15 |
| male | 3 | 33 | 36 | 231 | 10 | 3 | 34 |


In [15]:
pd.crosstab(index=[df['Sex'], df['Pclass']], columns=[df['Survived'], df['Embarked']], normalize='all')

| Sex | Pclass | Survived = 0, Embarked = C | Survived = 0, Embarked = Q | Survived = 0, Embarked = S | Survived = 1, Embarked = C | Survived = 1, Embarked = Q | Survived = 1, Embarked = S |
|---|---|---|---|---|---|---|---|
| female | 1 | 0.001125 | 0.0 | 0.00225 | 0.047244 | 0.001125 | 0.051744 |
| female | 2 | 0.0 | 0.0 | 0.006749 | 0.007874 | 0.00225 | 0.068616 |
| female | 3 | 0.008999 | 0.010124 | 0.061867 | 0.016873 | 0.026997 | 0.03712 |
| male | 1 | 0.028121 | 0.001125 | 0.057368 | 0.019123 | 0.0 | 0.031496 |
| male | 2 | 0.008999 | 0.001125 | 0.092238 | 0.00225 | 0.0 | 0.016873 |
| male | 3 | 0.03712 | 0.040495 | 0.259843 | 0.011249 | 0.003375 | 0.038245 |
