
In [2]:
# index 

In [3]:
import numpy as np
import pandas as pd

# Series: single-level index

In [4]:
s = pd.Series(data=np.arange(1, 6))
s

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [5]:
# If no index is given when a Series is created, a RangeIndex is created automatically.
s.index

RangeIndex(start=0, stop=5, step=1)

In [6]:
s = pd.Series(data=np.random.rand(5),
              index=['a', 'b', 'c', 'd', 'e'])

In [7]:
s.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [8]:
# Index objects have an nlevels attribute (the number of index levels).
s.index.nlevels

1

In [9]:
# In a Series, the index is used to look up values.
print(s['a'])          # look up a single value
print(s.loc['a':'c'])  # slicing returns a subset (label slices include the end label)

0.5723058002611442
a    0.572306
b    0.571029
c    0.188472
dtype: float64
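A small sketch of the point above, with fixed made-up values instead of `np.random.rand` so the result is reproducible. It contrasts label slicing with `.loc` (end label included) against positional slicing with `.iloc` (stop position excluded); `.iloc` is brought in here only for comparison.

```python
import pandas as pd

# Hypothetical, deterministic values so the output does not change between runs
s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

single = s['a']                # one value looked up by its label

# Label-based slicing with .loc includes BOTH endpoints
by_label = s.loc['a':'c']      # labels a, b, c

# Position-based slicing with .iloc excludes the stop position, like Python lists
by_position = s.iloc[0:2]      # positions 0, 1 -> labels a, b

print(single)                  # 10
print(by_label.tolist())       # [10, 20, 30]
print(by_position.tolist())    # [10, 20]
```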


# Series hierarchical index

* Multi-level index

In [10]:
s = pd.Series(data=np.random.rand(6),
              index=[['a', 'a', 'b', 'b', 'c', 'c'],
                     [1, 2, 3, 1, 2, 3]])
s

a  1    0.570932
   2    0.139144
b  3    0.574360
   1    0.300460
c  2    0.357622
   3    0.155497
dtype: float64

In [11]:
s.index

MultiIndex([('a', 1),
            ('a', 2),
            ('b', 3),
            ('b', 1),
            ('c', 2),
            ('c', 3)],
           )

In [12]:
s.index.nlevels

2

Using loc with a MultiIndex whose nlevels is 2 or more:

* You can index and slice using only the first-level labels.
* You cannot index using only the second-level labels.
* You can index with a tuple of labels (one label per level).
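The three rules above can be checked with a sketch like the following. It reuses the notebook's index layout but with fixed values (an assumption for reproducibility); the `try`/`except` shows that a second-level label on its own is looked up in the first level and raises a `KeyError`.

```python
import pandas as pd

# Same index layout as in the notebook, fixed values instead of np.random.rand
s = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
              index=[['a', 'a', 'b', 'b', 'c', 'c'],
                     [1, 2, 3, 1, 2, 3]])

first_level = s.loc['a']            # inner labels 1, 2 remain as the index
first_level_slice = s.loc['a':'b']  # every row whose outer label is 'a' or 'b'
one_value = s.loc[('a', 1)]         # a full tuple selects a single scalar

# A second-level label alone is searched for in the FIRST level and is not found
try:
    s.loc[1]
    second_level_ok = True
except KeyError:
    second_level_ok = False

print(first_level.tolist())      # [0.1, 0.2]
print(len(first_level_slice))    # 4
print(one_value)                 # 0.1
print(second_level_ok)           # False
```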

In [13]:
s.loc['a']

1    0.570932
2    0.139144
dtype: float64

In [14]:
s.loc['a':'b']

a  1    0.570932
   2    0.139144
b  3    0.574360
   1    0.300460
dtype: float64

In [15]:
# s.loc[1]  #> raises a KeyError

In [16]:
s.loc[('a',1)]  # a tuple of labels selects a single value

0.5709321356257764

In [17]:
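# s.loc[('a',1):('b',3)]  #> a tuple works for indexing, but this tuple slice raises an error here because the index is not sorted (level 1 goes 3, 1 within 'b')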

`pd.Series.swaplevel()`: swaps the levels of the index.

In [18]:
s.swaplevel()

1  a    0.570932
2  a    0.139144
3  b    0.574360
1  b    0.300460
2  c    0.357622
3  c    0.155497
dtype: float64

You cannot index or slice by the second-level labels alone, but you can swap the first and second levels and then index or slice by the new first level.

In [19]:
s.swaplevel().loc[1]

a    0.570932
b    0.300460
dtype: float64
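The swap-then-select pattern can be sketched as below, again with fixed values in place of random ones (an assumption for reproducibility). After `swaplevel()`, the numbers become the outer level, so `.loc[1]` picks every row whose original second label was 1.

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
              index=[['a', 'a', 'b', 'b', 'c', 'c'],
                     [1, 2, 3, 1, 2, 3]])

swapped = s.swaplevel()      # numbers are now the outer (first) level
inner_1 = swapped.loc[1]     # rows whose original second-level label is 1

print(list(inner_1.index))   # ['a', 'b']
print(inner_1.tolist())      # [0.1, 0.4]
```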

* `pd.Series.sort_index()`: sorts a Series by its index.
* `pd.Series.sort_values()`: sorts a Series by its values.
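A minimal comparison of the two methods on a small made-up Series. Both return a new Series and leave the original untouched.

```python
import pandas as pd

s = pd.Series([1, 3, 2], index=['c', 'a', 'b'])

by_index = s.sort_index()    # ordered by label: a, b, c
by_values = s.sort_values()  # ordered by value: 1, 2, 3

print(by_index.index.tolist(), by_index.tolist())    # ['a', 'b', 'c'] [3, 2, 1]
print(by_values.index.tolist(), by_values.tolist())  # ['c', 'b', 'a'] [1, 2, 3]
print(s.index.tolist())                              # ['c', 'a', 'b'] (unchanged)
```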

In [20]:
s.swaplevel().sort_index().loc[1:2]
#> An unsorted index cannot be sliced, so sort the index first and then slice.

1  a    0.570932
   b    0.300460
2  a    0.139144
   c    0.357622
dtype: float64
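One way to see why sorting is needed is to check `Index.is_monotonic_increasing` before and after `sort_index()`. This sketch uses fixed values (an assumption) and only exercises the sorted path, where slicing by a range of outer labels succeeds.

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
              index=[['a', 'a', 'b', 'b', 'c', 'c'],
                     [1, 2, 3, 1, 2, 3]])

swapped = s.swaplevel()
print(swapped.index.is_monotonic_increasing)  # False: outer labels go 1, 2, 3, 1, 2, 3

ordered = swapped.sort_index()
print(ordered.index.is_monotonic_increasing)  # True

# On the sorted index, a range of outer labels can be sliced
subset = ordered.loc[1:2]
print(subset.tolist())                        # [0.1, 0.4, 0.2, 0.5]
```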

# Hierarchical index in a DataFrame

In [21]:
df = pd.DataFrame(data=np.random.randn(6, 3),
                  columns=['a', 'b', 'c'],
                  index=[['Fri', 'Fri', 'Sat', 'Sat', 'Sun', 'Sun'], 
                         ['Lunch', 'Dinner'] * 3])
df

                   a         b         c
Fri Lunch  -0.590384  1.090389 -0.175586
    Dinner  1.013418  0.001523 -0.895698
Sat Lunch   0.126552 -1.015755 -0.719130
    Dinner -0.073930  0.740643  0.290385
Sun Lunch  -0.272735  0.098712 -1.498233
    Dinner  1.872857 -0.731246 -1.927990


In [22]:
df.loc['Fri']  # indexing by a first-level label

               a         b         c
Lunch  -0.590384  1.090389 -0.175586
Dinner  1.013418  0.001523 -0.895698


In [23]:
df.loc['Sat':'Sun']  # slicing by first-level labels

                   a         b         c
Sat Lunch   0.126552 -1.015755 -0.719130
    Dinner -0.073930  0.740643  0.290385
Sun Lunch  -0.272735  0.098712 -1.498233
    Dinner  1.872857 -0.731246 -1.927990


In [24]:
df.loc[('Fri','Lunch')]

a   -0.590384
b    1.090389
c   -0.175586
Name: (Fri, Lunch), dtype: float64

In [25]:
# select only 'Lunch' (a second-level label)
df.swaplevel().loc['Lunch']

            a         b         c
Fri -0.590384  1.090389 -0.175586
Sat  0.126552 -1.015755 -0.719130
Sun -0.272735  0.098712 -1.498233
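A sketch of combining a row tuple with a column label in a single `.loc` call, on a small DataFrame with fixed values (hypothetical data shaped like `df` above). The first element of the key selects the row by `(day, meal)`, the second selects the column.

```python
import pandas as pd

df = pd.DataFrame(data=[[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0],
                        [7.0, 8.0, 9.0],
                        [10.0, 11.0, 12.0]],
                  columns=['a', 'b', 'c'],
                  index=[['Fri', 'Fri', 'Sat', 'Sat'],
                         ['Lunch', 'Dinner'] * 2])

row = df.loc[('Fri', 'Lunch')]        # one row, returned as a Series
cell = df.loc[('Fri', 'Lunch'), 'b']  # row tuple + column label -> a single value
sat_a = df.loc['Sat']['a']            # column 'a' for every 'Sat' row

print(row.tolist())    # [1.0, 2.0, 3.0]
print(cell)            # 2.0
print(sat_a.tolist())  # [7.0, 10.0]
```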


# DataFrame column <--> row index

* pd.DataFrame.set_index
  * Returns a DataFrame in which the given column(s) have been moved into the row index.
* pd.DataFrame.reset_index
  * Returns a DataFrame in which the row index level(s) have been moved back into columns.
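The two methods are inverses of each other, which a short round trip illustrates. The data below is a made-up subset shaped like the `exam` table used later. `set_index` does not modify `exam`; `reset_index` puts `class` back as the first column and creates a fresh RangeIndex.

```python
import pandas as pd

exam = pd.DataFrame({'class': [1, 1, 2, 2],
                     'id': [1, 2, 3, 4],
                     'math': [45, 64, 17, 57]})

indexed = exam.set_index('class')  # 'class' moves from the columns to the row index
restored = indexed.reset_index()   # 'class' moves back to a column, new 0..n-1 index

print(indexed.columns.tolist())    # ['id', 'math']
print(indexed.index.name)          # class
print(restored.equals(exam))       # True
```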

In [26]:
df

                   a         b         c
Fri Lunch  -0.590384  1.090389 -0.175586
    Dinner  1.013418  0.001523 -0.895698
Sat Lunch   0.126552 -1.015755 -0.719130
    Dinner -0.073930  0.740643  0.290385
Sun Lunch  -0.272735  0.098712 -1.498233
    Dinner  1.872857 -0.731246 -1.927990


In [27]:
df.reset_index()
# level=None is the default and can be omitted: every index level is converted to a column

  level_0 level_1         a         b         c
0     Fri   Lunch -0.590384  1.090389 -0.175586
1     Fri  Dinner  1.013418  0.001523 -0.895698
2     Sat   Lunch  0.126552 -1.015755 -0.719130
3     Sat  Dinner -0.073930  0.740643  0.290385
4     Sun   Lunch -0.272735  0.098712 -1.498233
5     Sun  Dinner  1.872857 -0.731246 -1.927990


In [29]:
df.reset_index(level=1)  # convert only index level 1 to a column

    level_1         a         b         c
Fri   Lunch -0.590384  1.090389 -0.175586
Fri  Dinner  1.013418  0.001523 -0.895698
Sat   Lunch  0.126552 -1.015755 -0.719130
Sat  Dinner -0.073930  0.740643  0.290385
Sun   Lunch -0.272735  0.098712 -1.498233
Sun  Dinner  1.872857 -0.731246 -1.927990


In [32]:
df.reset_index(level=0)

       level_0         a         b         c
Lunch      Fri -0.590384  1.090389 -0.175586
Dinner     Fri  1.013418  0.001523 -0.895698
Lunch      Sat  0.126552 -1.015755 -0.719130
Dinner     Sat -0.073930  0.740643  0.290385
Lunch      Sun -0.272735  0.098712 -1.498233
Dinner     Sun  1.872857 -0.731246 -1.927990
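The `level_0` / `level_1` column names appear because the index levels of `df` have no names. A sketch, assuming the hypothetical level names `'day'` and `'meal'`, of naming the levels with `rename_axis` first so that `reset_index` produces meaningful columns and a level can be chosen by name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.zeros((4, 2)),
                  columns=['a', 'b'],
                  index=[['Fri', 'Fri', 'Sat', 'Sat'],
                         ['Lunch', 'Dinner'] * 2])

default_cols = df.reset_index().columns.tolist()
print(default_cols)       # ['level_0', 'level_1', 'a', 'b']

# Name the two index levels ('day' and 'meal' are illustrative names)
named = df.rename_axis(['day', 'meal'])
named_cols = named.reset_index().columns.tolist()
print(named_cols)         # ['day', 'meal', 'a', 'b']

# With named levels, a single level can be moved by name
meal_only = named.reset_index(level='meal').columns.tolist()
print(meal_only)          # ['meal', 'a', 'b']
```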


In [34]:
exam=pd.DataFrame({'class': [1]*5+[2]*5,
                  'id':np.arange(1,11),
                  'math':np.random.randint(0,101,size=10),
                  'science':np.random.randint(0,101,size=10),
                  'history':np.random.randint(0,101,size=10)})

exam

   class  id  math  science  history
0      1   1    45       30       42
1      1   2    64      100       45
2      1   3    70       83       32
3      1   4    24       91       51
4      1   5    23       89       33
5      2   6    17        6       25
6      2   7    57       95       97
7      2   8    68        5       15
8      2   9     5       24       30
9      2  10    57       26       51


In [38]:
df_1=exam.set_index(keys='class')

In [37]:
df_1

       id  math  science  history
class
1       1    45       30       42
1       2    64      100       45
1       3    70       83       32
1       4    24       91       51
1       5    23       89       33
2       6    17        6       25
2       7    57       95       97
2       8    68        5       15
2       9     5       24       30
2      10    57       26       51


In [39]:
df_1.reset_index()

   class  id  math  science  history
0      1   1    45       30       42
1      1   2    64      100       45
2      1   3    70       83       32
3      1   4    24       91       51
4      1   5    23       89       33
5      2   6    17        6       25
6      2   7    57       95       97
7      2   8    68        5       15
8      2   9     5       24       30
9      2  10    57       26       51


In [40]:
exam[exam['class']==1] # boolean indexing

   class  id  math  science  history
0      1   1    45       30       42
1      1   2    64      100       45
2      1   3    70       83       32
3      1   4    24       91       51
4      1   5    23       89       33


In [42]:
df_1.loc[1]  # selection with the loc attribute

       id  math  science  history
class
1       1    45       30       42
1       2    64      100       45
1       3    70       83       32
1       4    24       91       51
1       5    23       89       33
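Boolean indexing on the column and `.loc` on the index select the same rows; the difference is where `class` ends up. A sketch on a made-up subset of `exam`: after moving `class` back into a column and resetting the row labels, the two results match.

```python
import pandas as pd

exam = pd.DataFrame({'class': [1, 1, 2, 2],
                     'id': [1, 2, 3, 4],
                     'math': [45, 64, 17, 57]})

by_mask = exam[exam['class'] == 1]         # 'class' stays a column, original row labels kept
by_label = exam.set_index('class').loc[1]  # 'class' is the index, not a column

print(by_mask['math'].tolist())   # [45, 64]
print(by_label['math'].tolist())  # [45, 64]

# Same rows once 'class' is moved back to a column and the row labels are reset
same = by_label.reset_index().equals(by_mask.reset_index(drop=True))
print(same)                       # True
```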


In [45]:
df2=exam.set_index(keys=['class','id'])
df2

          math  science  history
class id
1     1     45       30       42
      2     64      100       45
      3     70       83       32
      4     24       91       51
      5     23       89       33
2     6     17        6       25
      7     57       95       97
      8     68        5       15
      9      5       24       30
      10    57       26       51


In [46]:
df2.reset_index(level='class')
# When the index levels have names, you can pass a level's name to reset_index.

    class  math  science  history
id
1       1    45       30       42
2       1    64      100       45
3       1    70       83       32
4       1    24       91       51
5       1    23       89       33
6       2    17        6       25
7       2    57       95       97
8       2    68        5       15
9       2     5       24       30
10      2    57       26       51
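A short check that resetting a level by name and by position give the same result, on a made-up subset shaped like `exam`. Because `set_index` uses the column names as level names, `'class'` and `0` refer to the same level, and the remaining single level becomes a plain index named `id`.

```python
import pandas as pd

exam = pd.DataFrame({'class': [1, 1, 2, 2],
                     'id': [1, 2, 3, 4],
                     'math': [45, 64, 17, 57]})

df2 = exam.set_index(['class', 'id'])
print(list(df2.index.names))        # ['class', 'id']

by_name = df2.reset_index(level='class')
by_position = df2.reset_index(level=0)  # level 0 is 'class'

print(by_name.index.name)           # id
print(by_name.columns.tolist())     # ['class', 'math']
print(by_name.equals(by_position))  # True
```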
