## 12. Review


## References
* [Python 완전정복 시리즈] Part 2: Pandas DataFrame 완전정복 (Mastering the Pandas DataFrame): https://wikidocs.net/book/7188

In [1]:
import pandas as pd
import numpy as np

## Swapping rows and columns (swapaxes)

In [2]:
idx = ['row1','row2']
col = ['col1','col2']
data= [['A','B'],[1,2]]
df = pd.DataFrame(data, idx, col)
df

Unnamed: 0,col1,col2
row1,A,B
row2,1,2


In [4]:
df.swapaxes(axis1=0, axis2=1)

Unnamed: 0,row1,row2
col1,A,1
col2,B,2


In [5]:
# axis1 and axis2 name the same axis here, so the DataFrame comes back unchanged
df.swapaxes(axis1=1, axis2=1)

Unnamed: 0,col1,col2
row1,A,B
row2,1,2
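
A minimal sketch, assuming the same two-by-two frame as above: for a DataFrame, swapping axis 0 with axis 1 is a transpose, so `df.T` (or `df.transpose()`) produces the same layout as `swapaxes(axis1=0, axis2=1)`. pandas 2.1 deprecated `DataFrame.swapaxes` in favor of `transpose`, which is one more reason to reach for the transpose. The variable name `transposed` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([['A', 'B'], [1, 2]],
                  index=['row1', 'row2'],
                  columns=['col1', 'col2'])

# Transposing swaps the row labels and the column labels.
transposed = df.T
print(transposed)
```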


## Renaming labels (rename)

In [6]:
data= [['-','-'],['-','-']]
df1 = pd.DataFrame(data)
df1

Unnamed: 0,0,1
0,-,-
1,-,-


In [7]:
# Use mapper to rename 0 to col1 and 1 to col2; axis=1 (columns) selects the column axis
df1.rename(mapper={0:'col1',1:'col2'}, axis=1)

Unnamed: 0,col1,col2
0,-,-
1,-,-


In [9]:
# Pass the new labels through the index argument
df1.rename(index= {0:'row1', 1:'row2'})

Unnamed: 0,0,1
row1,-,-
row2,-,-


In [11]:
# Modify the original DataFrame with inplace=True
df1.rename(index={0:'row1',1:'row2'}, columns={0:'col1',1:'col2'},inplace=True)
df1

Unnamed: 0,col1,col2
row1,-,-
row2,-,-


In [12]:
idx = [['row1','row1','row2','row2'],[1,2,3,4]]
col = ['col1','col2']
data = [['-','-'],['-','-'],['-','-'],['-','-']]
df2 = pd.DataFrame(data, idx, col)
df2

Unnamed: 0,Unnamed: 1,col1,col2
row1,1,-,-
row1,2,-,-
row2,3,-,-
row2,4,-,-


In [13]:
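# level=1 limits the renaming to the second index level; key 5 matches no label and is ignored by default
df2.rename(level=1, index={1:'val1',2:'val2',3:'val3',4:'val4',5:'val5'})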

Unnamed: 0,Unnamed: 1,col1,col2
row1,val1,-,-
row1,val2,-,-
row2,val3,-,-
row2,val4,-,-


In [14]:
# errors defaults to 'ignore', so dict keys that match no index label do not raise
# With errors='raise', a key that matches no index label raises KeyError
df2.rename(errors='raise',level=1, index={1:'val1',2:'val2',3:'val3',4:'val4',5:'val5'})

KeyError: '[5] not found in axis'
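
As a hedged illustration of the two behaviors above, the sketch below uses a small stand-in frame (not the `df2` from this section). It shows that a callable can replace the dict mapper, and that `errors='raise'` turns an unmatched key into a `KeyError` that can be caught. The label `'row9'` and the flag `raised` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame([['-', '-'], ['-', '-']],
                  index=['row1', 'row2'],
                  columns=['col1', 'col2'])

# A callable is applied to every label on the chosen axis.
upper_cols = df.rename(columns=str.upper)

# With errors='raise', a key that matches no label raises KeyError.
try:
    df.rename(index={'row9': 'missing'}, errors='raise')
    raised = False
except KeyError:
    raised = True

print(list(upper_cols.columns), raised)
```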

## Renaming axis names (rename_axis)

In [15]:
df = pd.DataFrame(data=[['-','-'],['-','-']],index=['row1','row2'],columns=['col1','col2'])
df

Unnamed: 0,col1,col2
row1,-,-
row2,-,-


In [16]:
# With mapper, axis selects which axis is renamed (axis=0 is the index)
df = df.rename_axis(mapper='index',axis=0)
df

Unnamed: 0_level_0,col1,col2
index,Unnamed: 1_level_1,Unnamed: 2_level_1
row1,-,-
row2,-,-


In [17]:
df = df.rename_axis(columns='columns')
df

columns,col1,col2
index,Unnamed: 1_level_1,Unnamed: 2_level_1
row1,-,-
row2,-,-


In [18]:
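# A function is applied to the existing axis names: 'index' becomes 'INDEX', 'columns' becomes 'COLUMNS'
df.rename_axis(index=str.upper, columns=str.upper, inplace=True)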
df

COLUMNS,col1,col2
INDEX,Unnamed: 1_level_1,Unnamed: 2_level_1
row1,-,-
row2,-,-


## Setting columns as the index (set_index)

In [19]:
data={'col1':['A','A','A','B','B'],
      'col2':['[1]','[2]','[3]','[1]','[2]'],
      'col3':[2,5,3,4,1],
      'col4':['X','X','Y','Z','Z']}
idx=['row1','row2','row3','row4','row5']
df = pd.DataFrame(data=data,index=idx)
df

Unnamed: 0,col1,col2,col3,col4
row1,A,[1],2,X
row2,A,[2],5,X
row3,A,[3],3,Y
row4,B,[1],4,Z
row5,B,[2],1,Z


In [20]:
df.set_index(keys='col1')

Unnamed: 0_level_0,col2,col3,col4
col1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A,[1],2,X
A,[2],5,X
A,[3],3,Y
B,[1],4,Z
B,[2],1,Z


In [21]:
df.set_index(keys=['col1', 'col3'])

Unnamed: 0_level_0,Unnamed: 1_level_0,col2,col4
col1,col3,Unnamed: 2_level_1,Unnamed: 3_level_1
A,2,[1],X
A,5,[2],X
A,3,[3],Y
B,4,[1],Z
B,1,[2],Z


In [22]:
# append=True keeps the existing index and adds col1 as an additional index level
df.set_index(keys='col1', append=True)

Unnamed: 0_level_0,Unnamed: 1_level_0,col2,col3,col4
Unnamed: 0_level_1,col1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
row1,A,[1],2,X
row2,A,[2],5,X
row3,A,[3],3,Y
row4,B,[1],4,Z
row5,B,[2],1,Z


In [23]:
# drop defaults to True, which removes the column from the data once it becomes the index; drop=False keeps it

df.set_index(keys='col1', drop=False)

Unnamed: 0_level_0,col1,col2,col3,col4
col1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
A,A,[1],2,X
A,A,[2],5,X
A,A,[3],3,Y
B,B,[1],4,Z
B,B,[2],1,Z


In [24]:
# verify_integrity defaults to False, so duplicate index values are allowed

df.set_index(keys='col4')

Unnamed: 0_level_0,col1,col2,col3
col4,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
X,A,[1],2
X,A,[2],5
Y,A,[3],3
Z,B,[1],4
Z,B,[2],1


In [25]:
# col4 contains duplicate values, so verify_integrity=True raises ValueError
df.set_index(keys='col4',verify_integrity=True)

ValueError: Index has duplicate keys: Index(['X', 'Z'], dtype='object', name='col4')
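
If raising is not desirable, the duplicates can be checked before calling `set_index`. The sketch below is one possible approach, not part of the original notebook: `safe_set_index` is a hypothetical helper that relies on `Series.is_unique` and `Series.duplicated`, and the frame is a trimmed copy of the example data.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'A', 'B', 'B'],
                   'col4': ['X', 'X', 'Y', 'Z', 'Z']})

def safe_set_index(frame, key):
    """Hypothetical helper: set `key` as the index only if its values are unique."""
    if not frame[key].is_unique:
        # Collect each value that occurs more than once.
        dupes = frame.loc[frame[key].duplicated(), key].unique().tolist()
        return None, dupes
    return frame.set_index(key), []

result, duplicates = safe_set_index(df, 'col4')
print(result, duplicates)
```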

## Replacing labels (set_axis)

In [32]:
df = pd.DataFrame(data=[[1,2],[3,4]])
df

Unnamed: 0,0,1
0,1,2
1,3,4


In [33]:
df.set_axis(labels=['row1', 'row2'], axis=0)

Unnamed: 0,0,1
row1,1,2
row2,3,4


In [34]:
df

Unnamed: 0,0,1
0,1,2
1,3,4


In [29]:
df.set_axis(labels=['col1', 'col2'], axis=1)

Unnamed: 0,col1,col2
0,1,2
1,3,4


In [31]:
# set_axis returns a new DataFrame; its inplace argument was deprecated in pandas 1.5 and removed in 2.0, so reassign instead
df = df.set_axis(labels=['idx1', 'idx2'], axis=0)
df

Unnamed: 0,0,1
idx1,1,2
idx2,3,4
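
Two properties of `set_axis` are worth a quick sketch: because it returns a new DataFrame, calls can be chained, and the number of labels must match the length of the axis. The example assumes the same two-by-two frame; the flag `mismatch_raised` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])

# Each call returns a new DataFrame, so the two axes can be relabeled in a chain.
labeled = df.set_axis(['idx1', 'idx2'], axis=0).set_axis(['col1', 'col2'], axis=1)

# Supplying three labels for a two-column axis raises ValueError.
try:
    df.set_axis(['a', 'b', 'c'], axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(labeled)
```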


## Suffix / prefix (add_suffix / add_prefix)

In [35]:
df = pd.DataFrame(data=[[1,2],[3,4]])
df

Unnamed: 0,0,1
0,1,2
1,3,4


In [37]:
df.add_suffix('_열')

Unnamed: 0,0_열,1_열
0,1,2
1,3,4


In [38]:
df.add_prefix('열_')

Unnamed: 0,열_0,열_1
0,1,2
1,3,4


In [None]:
# For a DataFrame, add_suffix and add_prefix change the column labels only, as the outputs above show
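
When the row labels need a prefix as well, one option that does not depend on the pandas version is `rename` with a callable. This is a sketch under that assumption; the prefixes `'col_'` and `'row_'` are arbitrary.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])

# add_prefix touches the column labels; rename with a callable does the same for rows.
prefixed = df.add_prefix('col_').rename(index=lambda label: f'row_{label}')
print(prefixed)
```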

## Changing the index (reindex)

In [39]:
idx = [3,6,11]
col = ['col1','col2','col3','col4']
data = [[1,2,3,4],[2,4,6,8],[3,6,9,12]]
df = pd.DataFrame(data,idx,col)
df

Unnamed: 0,col1,col2,col3,col4
3,1,2,3,4
6,2,4,6,8
11,3,6,9,12


In [40]:
col2 = ['col1','idx2','idx3','col4']

# With labels, use axis to choose the target axis (here axis=1, the columns)

df.reindex(labels=col2, axis=1)

Unnamed: 0,col1,idx2,idx3,col4
3,1,,,4
6,2,,,8
11,3,,,12


In [41]:
# The index and columns keywords target their axis directly, so axis is not needed

df.reindex(columns=col2)

Unnamed: 0,col1,idx2,idx3,col4
3,1,,,4
6,2,,,8
11,3,,,12


In [42]:
df.reindex(columns=col2, fill_value='-')

Unnamed: 0,col1,idx2,idx3,col4
3,1,-,-,4
6,2,-,-,8
11,3,-,-,12


In [43]:
col3 = ['col0','col1','col2','col3','col4','col5','col6','col7']
df.reindex(columns=col3)

Unnamed: 0,col0,col1,col2,col3,col4,col5,col6,col7
3,,1,2,3,4,,,
6,,2,4,6,8,,,
11,,3,6,9,12,,,


In [44]:
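# bfill fills each new column from the next existing label; col5 to col7 have none after them, so they stay NaN
df.reindex(columns=col3, method='bfill')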

Unnamed: 0,col0,col1,col2,col3,col4,col5,col6,col7
3,1,1,2,3,4,,,
6,2,2,4,6,8,,,
11,3,3,6,9,12,,,


In [45]:
df.reindex(columns=col3, method='ffill')

Unnamed: 0,col0,col1,col2,col3,col4,col5,col6,col7
3,,1,2,3,4,4,4,4
6,,2,4,6,8,8,8,8
11,,3,6,9,12,12,12,12


In [46]:
# limit=2 caps forward filling at two consecutive new labels, so col7 stays NaN
df.reindex(columns=col3, method='ffill', limit=2)

Unnamed: 0,col0,col1,col2,col3,col4,col5,col6,col7
3,,1,2,3,4,4,4,
6,,2,4,6,8,8,8,
11,,3,6,9,12,12,12,


In [48]:
# method fills a new label only where |original label - new label| <= tolerance
idx2 = [4, 8, 14]
df.reindex(index=idx2,method='ffill',tolerance=1)

Unnamed: 0,col1,col2,col3,col4
4,1.0,2.0,3.0,4.0
8,,,,
14,,,,


In [49]:
df.reindex(index=idx2,method='ffill',tolerance=2)

Unnamed: 0,col1,col2,col3,col4
4,1.0,2.0,3.0,4.0
8,2.0,4.0,6.0,8.0
14,,,,


In [50]:
df.reindex(index=idx2,method='ffill',tolerance=3)

Unnamed: 0,col1,col2,col3,col4
4,1,2,3,4
8,2,4,6,8
14,3,6,9,12
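
The arithmetic behind the three outputs above can be spelled out in a short sketch. It assumes a one-column version of the same frame: with `ffill`, each new label looks back to the closest earlier original label, and the fill is accepted only when that gap is within `tolerance`.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3]}, index=[3, 6, 11])
new_index = [4, 8, 14]

# Gaps to the preceding original label: 4-3=1, 8-6=2, 14-11=3.
# tolerance=2 therefore fills labels 4 and 8 but leaves 14 as NaN.
filled = df.reindex(index=new_index, method='ffill', tolerance=2)
print(filled)
```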


## Matching another object's index (reindex_like)

In [51]:
col1  = ['col1','col3','col6']
idx1  = ['row0','row2','row3']
data1 = [['A','X','+'],['B','Y','-'],['C','Z','=']]
self = pd.DataFrame(data1, idx1, col1)
self

Unnamed: 0,col1,col3,col6
row0,A,X,+
row2,B,Y,-
row3,C,Z,=


In [53]:
col2  = ['col1','col2','col3','col4','col5','col6']
idx2  = ['row1','row2','row3']
data2 = [[1,2,3,4,5,6],[2,3,6,8,10,12],[3,6,9,12,15,18]]
other = pd.DataFrame(data2, idx2, col2)
other

Unnamed: 0,col1,col2,col3,col4,col5,col6
row1,1,2,3,4,5,6
row2,2,3,6,8,10,12
row3,3,6,9,12,15,18


In [54]:
# Reindex self to match other's index and columns; labels missing from self become NaN

self.reindex_like(other=other)  # labels come from other, values come from self where the labels match

Unnamed: 0,col1,col2,col3,col4,col5,col6
row1,,,,,,
row2,B,,Y,,,-
row3,C,,Z,,,=


In [55]:
self.reindex_like(other=other,method='ffill')

Unnamed: 0,col1,col2,col3,col4,col5,col6
row1,A,A,X,X,X,+
row2,B,B,Y,Y,Y,-
row3,C,C,Z,Z,Z,=


In [57]:
self.reindex_like(other=other,method='bfill')

Unnamed: 0,col1,col2,col3,col4,col5,col6
row1,B,Y,Y,-,-,-
row2,B,Y,Y,-,-,-
row3,C,Z,Z,=,=,=


In [58]:
self.reindex_like(other=other,method='bfill',limit=1)

Unnamed: 0,col1,col2,col3,col4,col5,col6
row1,B,Y,Y,,-,-
row2,B,Y,Y,,-,-
row3,C,Z,Z,,=,=
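
One way to read `reindex_like` is as shorthand for calling `reindex` with the other object's index and columns. The sketch below checks that equivalence on two small stand-in frames; the names `left` and `template` are illustrative and do not appear in the notebook.

```python
import pandas as pd

left = pd.DataFrame([['A', 'X'], ['B', 'Y']],
                    index=['row0', 'row2'],
                    columns=['col1', 'col3'])
template = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                        index=['row1', 'row2'],
                        columns=['col1', 'col2', 'col3'])

# Both calls take their labels from `template` and their values from `left`.
via_like = left.reindex_like(template)
via_reindex = left.reindex(index=template.index, columns=template.columns)
print(via_like)
```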


## Converting the index to columns (reset_index)

In [59]:
df = pd.DataFrame([[1,2],[3,4],[5,6]],['row1','row2','row3'],['col1','col2'])
df

Unnamed: 0,col1,col2
row1,1,2
row2,3,4
row3,5,6


In [60]:
df.reset_index() # the old index moves into a new column named 'index' and a default integer index replaces it

Unnamed: 0,index,col1,col2
0,row1,1,2
1,row2,3,4
2,row3,5,6


In [61]:
df.reset_index(drop=True) # drop=True discards the old index instead of keeping it as a column

Unnamed: 0,col1,col2
0,1,2
1,3,4
2,5,6


In [64]:
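# The output below reflects this cell being run twice: the first run creates the 'index' column,
# and the second run moves the resulting integer index into a 'level_0' column
df.reset_index(inplace=True)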
df

Unnamed: 0,level_0,index,col1,col2
0,0,row1,1,2
1,1,row2,3,4
2,2,row3,5,6
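
Because `reset_index` always moves the current index into a column, calling it repeatedly keeps adding columns. The sketch below, on a smaller stand-in frame, shows the column names produced by one and two calls, and how `set_index('index')` undoes the first call.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]],
                  index=['row1', 'row2'],
                  columns=['col1', 'col2'])

once = df.reset_index()             # old index becomes the 'index' column
twice = once.reset_index()          # 'index' is taken, so the integer index lands in 'level_0'
restored = once.set_index('index')  # undo the first reset

print(list(once.columns))
print(list(twice.columns))
```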


## Converting MultiIndex levels to columns (reset_index)

In [65]:
idx = [['IDX1','IDX1','IDX2','IDX2'],['row1','row2','row3','row4']]
col = [['COL1','COL1','COL2','COL2'],['val1','val2','val3','val4']]
data = [[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]]
df2 = pd.DataFrame(data,idx,col)

In [66]:
df2

Unnamed: 0_level_0,Unnamed: 1_level_0,COL1,COL1,COL2,COL2
Unnamed: 0_level_1,Unnamed: 1_level_1,val1,val2,val3,val4
IDX1,row1,1,2,3,4
IDX1,row2,5,6,7,8
IDX2,row3,9,10,11,12
IDX2,row4,13,14,15,16


In [67]:
df2.reset_index()

Unnamed: 0_level_0,level_0,level_1,COL1,COL1,COL2,COL2
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,val1,val2,val3,val4
0,IDX1,row1,1,2,3,4
1,IDX1,row2,5,6,7,8
2,IDX2,row3,9,10,11,12
3,IDX2,row4,13,14,15,16


In [68]:
df2.reset_index(level=0)

Unnamed: 0_level_0,level_0,COL1,COL1,COL2,COL2
Unnamed: 0_level_1,Unnamed: 1_level_1,val1,val2,val3,val4
row1,IDX1,1,2,3,4
row2,IDX1,5,6,7,8
row3,IDX2,9,10,11,12
row4,IDX2,13,14,15,16


In [70]:
df2.reset_index(level=1)

Unnamed: 0_level_0,level_1,COL1,COL1,COL2,COL2
Unnamed: 0_level_1,Unnamed: 1_level_1,val1,val2,val3,val4
IDX1,row1,1,2,3,4
IDX1,row2,5,6,7,8
IDX2,row3,9,10,11,12
IDX2,row4,13,14,15,16


In [71]:
df2.reset_index(level=1,col_fill='COL0')

Unnamed: 0_level_0,level_1,COL1,COL1,COL2,COL2
Unnamed: 0_level_1,COL0,val1,val2,val3,val4
IDX1,row1,1,2,3,4
IDX1,row2,5,6,7,8
IDX2,row3,9,10,11,12
IDX2,row4,13,14,15,16


In [72]:
df2.reset_index(level=1,col_fill='COL0',col_level=1)

Unnamed: 0_level_0,COL0,COL1,COL1,COL2,COL2
Unnamed: 0_level_1,level_1,val1,val2,val3,val4
IDX1,row1,1,2,3,4
IDX1,row2,5,6,7,8
IDX2,row3,9,10,11,12
IDX2,row4,13,14,15,16
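
The generated column names `level_0` and `level_1` appear because the index levels in this example are unnamed. As a sketch under the assumption that the levels are given names (here the made-up names `'group'` and `'row'`), a level can be selected by name and the new column inherits that name. Single-level columns are used to keep the example short.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['IDX1', 'IDX1', 'IDX2', 'IDX2'], ['row1', 'row2', 'row3', 'row4']],
    names=['group', 'row'],
)
df = pd.DataFrame({'val1': [1, 5, 9, 13], 'val2': [2, 6, 10, 14]}, index=index)

# Levels can be selected by name instead of by position.
flat = df.reset_index(level='group')
print(flat)
```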


## Reordering MultiIndex levels (reorder_levels)

In [73]:
idx = [['IDX1','IDX1','IDX2','IDX2'],['row1','row2','row3','row4']]
col = [['COL1','COL1','COL2','COL2'],['val1','val2','val3','val4']]
data = [[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]]
df = pd.DataFrame(data,idx,col)
df

Unnamed: 0_level_0,Unnamed: 1_level_0,COL1,COL1,COL2,COL2
Unnamed: 0_level_1,Unnamed: 1_level_1,val1,val2,val3,val4
IDX1,row1,1,2,3,4
IDX1,row2,5,6,7,8
IDX2,row3,9,10,11,12
IDX2,row4,13,14,15,16


In [74]:
df.reorder_levels([1,0])

Unnamed: 0_level_0,Unnamed: 1_level_0,COL1,COL1,COL2,COL2
Unnamed: 0_level_1,Unnamed: 1_level_1,val1,val2,val3,val4
row1,IDX1,1,2,3,4
row2,IDX1,5,6,7,8
row3,IDX2,9,10,11,12
row4,IDX2,13,14,15,16


In [75]:
df.reorder_levels([1,0], axis=1)

Unnamed: 0_level_0,Unnamed: 1_level_0,val1,val2,val3,val4
Unnamed: 0_level_1,Unnamed: 1_level_1,COL1,COL1,COL2,COL2
IDX1,row1,1,2,3,4
IDX1,row2,5,6,7,8
IDX2,row3,9,10,11,12
IDX2,row4,13,14,15,16
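
With only two levels, `reorder_levels([1, 0])` does the same job as `swaplevel(0, 1)`. The sketch below checks this on a small frame with named levels; the names `'group'` and `'row'` are assumptions added for readability, and `reorder_levels` accepts either positions or names.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['IDX1', 'IDX1', 'IDX2', 'IDX2'], ['row1', 'row2', 'row3', 'row4']],
    names=['group', 'row'],
)
df = pd.DataFrame({'val': [1, 2, 3, 4]}, index=index)

reordered = df.reorder_levels(['row', 'group'])  # same as reorder_levels([1, 0])
swapped = df.swaplevel(0, 1)
print(reordered)
```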
