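- Series (one-dimensional), DataFrame (two-dimensional)
  - Number of dimensions: the number of reference points needed to extract a value from the data structure
- For data with three or more dimensions: MultiIndex, an index object with multiple levels (each level stores a value for the row)
  - Use it when a row must be identified by a combination of values
  - Well suited to hierarchical data, where one column's values are subcategories of another column's values
  - Multiple index levels support a variety of ways to slice and dice the dataset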
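
As a rough illustration of "reference points", the sketch below pulls one value out of a Series, a DataFrame, and a DataFrame with a two-level row index. The grades and the variable names are made up for the example.

```python
import pandas as pd

# One reference point: the index label.
grades = pd.Series({"Franklin": "C", "Toddside": "A"})
one_point = grades.loc["Toddside"]

# Two reference points: a row label and a column name.
table = pd.DataFrame({"School": ["C", "A"]}, index=["Franklin", "Toddside"])
two_points = table.loc["Franklin", "School"]

# Three reference points: two row-index levels plus a column name.
rows = pd.MultiIndex.from_tuples(
    [("IL", "Franklin"), ("IL", "Toddside")], names=["State", "City"]
)
nested = pd.DataFrame({"School": ["C", "A"]}, index=rows)
three_points = nested.loc[("IL", "Franklin"), "School"]

print(one_point, two_points, three_points)
```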

# The MultiIndex object

In [1]:
import pandas as pd

address = ("8809 Flair Square", "Toddside", "IL", "37206")
address

('8809 Flair Square', 'Toddside', 'IL', '37206')

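- A Series or DataFrame index stores only one value (label) per position
  - The labels can be of various data types (strings, numbers, datetimes)
- An index label can itself be a container: a tuple can hold any number of elements, and a tuple can serve as a DataFrame index label
- The MultiIndex class is available as a top-level attribute of the pandas library (pd.MultiIndex)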

In [2]:
addresses = [
    ("8809 Flair Square", "Toddside", "IL", "37206"),
    ("9901 Austin Street", "Toddside", "IL", "37206"),
    ("905 Hogan Quarter", "Franklin", "IL", "37206")
]

In [3]:
# MultiIndex provides the from_tuples class method
pd.MultiIndex.from_tuples(addresses)
# Equivalent to pd.MultiIndex.from_tuples(tuples = addresses)

MultiIndex([( '8809 Flair Square', 'Toddside', 'IL', '37206'),
            ('9901 Austin Street', 'Toddside', 'IL', '37206'),
            ( '905 Hogan Quarter', 'Franklin', 'IL', '37206')],
           )

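- The elements of every tuple follow the same layout (here: street, city, state, zip code)
- The collection of tuple values at the same position forms one level of the MultiIndex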
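
To make the idea of a level concrete, the sketch below rebuilds the index from the same three addresses and inspects it with nlevels and get_level_values. The variable names are chosen for the example only.

```python
import pandas as pd

addresses = [
    ("8809 Flair Square", "Toddside", "IL", "37206"),
    ("9901 Austin Street", "Toddside", "IL", "37206"),
    ("905 Hogan Quarter", "Franklin", "IL", "37206"),
]
index = pd.MultiIndex.from_tuples(addresses)

# Four values per tuple, so the index has four levels.
level_count = index.nlevels

# Level 1 is the second value of every tuple, in row order.
cities = list(index.get_level_values(1))

print(level_count, cities)
```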

In [4]:
# Pass a list to the names parameter of from_tuples to give each MultiIndex level a name
row_index = pd.MultiIndex.from_tuples(
    tuples = addresses,
    names = ["Street", "City", "State", "Zip"]
)
row_index

MultiIndex([( '8809 Flair Square', 'Toddside', 'IL', '37206'),
            ('9901 Austin Street', 'Toddside', 'IL', '37206'),
            ( '905 Hogan Quarter', 'Franklin', 'IL', '37206')],
           names=['Street', 'City', 'State', 'Zip'])

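- MultiIndex: a container in which each label holds multiple values
  - Level: made up of the values at the same position across the labels
- Attach a MultiIndex to a DataFrame through the constructor's index parameter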

In [5]:
data = [
    ["A", "B+"],
    ["C+", "C"],
    ["D-", "A"],
]

columns = ["School", "Cost of Living"]

area_grades = pd.DataFrame(
    data = data, index = row_index, columns = columns
)

area_grades

                                        School Cost of Living
Street             City     State Zip
8809 Flair Square  Toddside IL    37206      A             B+
9901 Austin Street Toddside IL    37206     C+              C
905 Hogan Quarter  Franklin IL    37206     D-              A


In [6]:
# pandas stores the two column names in a single-level Index object
area_grades.columns

Index(['School', 'Cost of Living'], dtype='object')

In [7]:
column_index = pd.MultiIndex.from_tuples([
    ("Culture", "Restaurants"),
    ("Culture", "Museums"),
    ("Services", "Police"),
    ("Services", "Schools"),
])
column_index

MultiIndex([( 'Culture', 'Restaurants'),
            ( 'Culture',     'Museums'),
            ('Services',      'Police'),
            ('Services',     'Schools')],
           )

In [8]:
# The data needs one row per row-index label and one column per column-index label (here 3 x 4)
data = [
    ["C-", "B+", "B-", "A"],
    ["D+", "C", "A", "C+"],
    ["A-", "A", "D+","F"],
]

pd.DataFrame(
    data = data, index = row_index, columns = column_index
)

                                            Culture         Services
                                        Restaurants Museums   Police Schools
Street             City     State Zip
8809 Flair Square  Toddside IL    37206          C-      B+       B-       A
9901 Austin Street Toddside IL    37206          D+       C        A      C+
905 Hogan Quarter  Franklin IL    37206          A-       A       D+       F
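
The shape requirement can be checked directly. The sketch below uses a small stand-in row index and a NumPy array filled with a placeholder grade (both assumptions for the example) to show that a 3 x 4 block fits three row labels and four column labels, while a 3 x 3 block is rejected.

```python
import numpy as np
import pandas as pd

row_index = pd.MultiIndex.from_tuples(
    [("IL", "Toddside"), ("IL", "Franklin"), ("MO", "Fisherborough")]
)
column_index = pd.MultiIndex.from_tuples(
    [("Culture", "Restaurants"), ("Culture", "Museums"),
     ("Services", "Police"), ("Services", "Schools")]
)

# 3 row labels x 4 column labels, so the data must be 3 x 4.
good = np.full((len(row_index), len(column_index)), "A")
frame = pd.DataFrame(good, index=row_index, columns=column_index)

# A 3 x 3 block does not fit the 4 column labels.
try:
    pd.DataFrame(good[:, :3], index=row_index, columns=column_index)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(frame.shape, mismatch_rejected)
```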


# MultiIndex DataFrame

In [9]:
neighborhoods = pd.read_csv("neighborhoods.csv")
neighborhoods.head()

   Unnamed: 0        Unnamed: 1         Unnamed: 2      Culture  Culture.1  Services  Services.1
0         NaN               NaN                NaN  Restaurants    Museums    Police     Schools
1       State              City             Street          NaN        NaN       NaN         NaN
2          MO     Fisherborough     244 Tracy View           C+          F        D-          A+
3          SD  Port Curtisville  446 Cynthia Inlet           C-          B         B          D+
4          WV       Jimenezview    432 John Common            A         A+         F           B


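- When importing a CSV file, pandas assumes the first row of the file holds the column names (the header)
  - If a header slot has no value, pandas names that column 'Unnamed: N', where N is the column position
  - If a header value is duplicated, pandas appends a numeric suffix (for example 'Culture.1')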
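
Both naming rules can be reproduced without the file. The sketch below feeds read_csv a three-line stand-in string (invented for the example) whose first header slot is empty and whose second header value repeats.

```python
import io
import pandas as pd

# A tiny stand-in for the file: the first header slot is empty and
# "Culture" appears twice.
csv_text = ",Culture,Culture\nMO,C+,F\nSD,C-,B\n"

frame = pd.read_csv(io.StringIO(csv_text))
column_names = list(frame.columns)
print(column_names)
```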

In [10]:
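# To load a dataset that is meant to have a multi-level row index and
# a multi-level column index, read_csv needs two extra parameters
neighborhoods = pd.read_csv(
    "neighborhoods.csv",
    index_col = [0, 1, 2], # positions of the columns that form the row index
    header = [0, 1] # positions of the rows that form the column header
)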
neighborhoods.head()

                                                 Culture         Services
                                             Restaurants Museums   Police Schools
State City             Street
MO    Fisherborough    244 Tracy View                 C+       F       D-      A+
SD    Port Curtisville 446 Cynthia Inlet              C-       B        B      D+
WV    Jimenezview      432 John Common                 A      A+        F       B
AK    Stevenshire      238 Andrew Rue                 D-       A       A-      A-
ND    New Joshuaport   877 Walter Neck                D+      C-        B       B
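
The effect of index_col and header can be tried on an in-memory string. The layout below is an assumption inferred from the raw read_csv output earlier in this section: two header rows, then a row carrying the index level names, then the data.

```python
import io
import pandas as pd

# Assumed file layout, cut down to two data rows.
csv_text = (
    ",,,Culture,Culture,Services,Services\n"
    ",,,Restaurants,Museums,Police,Schools\n"
    "State,City,Street,,,,\n"
    "MO,Fisherborough,244 Tracy View,C+,F,D-,A+\n"
    "SD,Port Curtisville,446 Cynthia Inlet,C-,B,B,D+\n"
)

sample = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1, 2], header=[0, 1])

row_level_names = list(sample.index.names)
column_level_count = sample.columns.nlevels

# A full row tuple plus a full column tuple addresses a single cell.
museum_grade = sample.loc[
    ("MO", "Fisherborough", "244 Tracy View"), ("Culture", "Museums")
]
print(row_level_names, column_level_count, museum_grade)
```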


In [12]:
neighborhoods.info() # prints the column names and the first and last row labels as tuples

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 251 entries, ('MO', 'Fisherborough', '244 Tracy View') to ('NE', 'South Kennethmouth', '346 Wallace Pass')
Data columns (total 4 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   (Culture, Restaurants)  251 non-null    object
 1   (Culture, Museums)      251 non-null    object
 2   (Services, Police)      251 non-null    object
 3   (Services, Schools)     251 non-null    object
dtypes: object(4)
memory usage: 27.2+ KB


In [13]:
neighborhoods.index

MultiIndex([('MO',      'Fisherborough',        '244 Tracy View'),
            ('SD',   'Port Curtisville',     '446 Cynthia Inlet'),
            ('WV',        'Jimenezview',       '432 John Common'),
            ('AK',        'Stevenshire',        '238 Andrew Rue'),
            ('ND',     'New Joshuaport',       '877 Walter Neck'),
            ('ID',         'Wellsville',   '696 Weber Stravenue'),
            ('TN',          'Jodiburgh',    '285 Justin Corners'),
            ('DC',   'Lake Christopher',   '607 Montoya Harbors'),
            ('OH',          'Port Mike',      '041 Michael Neck'),
            ('ND',         'Hardyburgh', '550 Gilmore Mountains'),
            ...
            ('AK',          'Scottstad',      '114 Jones Garden'),
            ('IA',    'Port Willieport',  '320 Jennifer Mission'),
            ('ME',         'Port Linda',        '692 Hill Glens'),
            ('KS',         'Kaylamouth',       '483 Freeman Via'),
            ('WA',     'Port Shawnfort',    '6

In [14]:
neighborhoods.columns

MultiIndex([( 'Culture', 'Restaurants'),
            ( 'Culture',     'Museums'),
            ('Services',      'Police'),
            ('Services',     'Schools')],
           )

In [15]:
# pandas assigns each nested level of a MultiIndex a position; names lists the levels in that order
neighborhoods.index.names

FrozenList(['State', 'City', 'Street'])

In [16]:
# get_level_values extracts an Index object from the given MultiIndex level
neighborhoods.index.get_level_values(1) # same as get_level_values("City")

Index(['Fisherborough', 'Port Curtisville', 'Jimenezview', 'Stevenshire',
       'New Joshuaport', 'Wellsville', 'Jodiburgh', 'Lake Christopher',
       'Port Mike', 'Hardyburgh',
       ...
       'Scottstad', 'Port Willieport', 'Port Linda', 'Kaylamouth',
       'Port Shawnfort', 'North Matthew', 'Chadton', 'Diazmouth', 'Laurentown',
       'South Kennethmouth'],
      dtype='object', name='City', length=251)

In [17]:
# The CSV did not supply names, so the levels of the column MultiIndex are unnamed
neighborhoods.columns.names

FrozenList([None, None])

In [18]:
# Access the column MultiIndex through the columns attribute, then assign the level names to its names attribute
neighborhoods.columns.names = ["Category", "Subcategory"]
neighborhoods.columns.names

FrozenList(['Category', 'Subcategory'])
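
Assigning to names changes the existing DataFrame in place. If a copy is preferred, one alternative (not used in these notes) is rename_axis, sketched here on a one-row stand-in frame.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("Culture", "Restaurants"), ("Culture", "Museums"),
     ("Services", "Police"), ("Services", "Schools")]
)
frame = pd.DataFrame([["C+", "F", "D-", "A+"]], columns=columns)

# rename_axis returns a new DataFrame and leaves the original untouched.
named = frame.rename_axis(columns=["Category", "Subcategory"])

original_names = list(frame.columns.names)
new_names = list(named.columns.names)
print(original_names, new_names)
```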

In [20]:
# The level names appear to the left of the column headers in the output
neighborhoods.head(3)

Category                                         Culture         Services
Subcategory                                  Restaurants Museums   Police Schools
State City             Street
MO    Fisherborough    244 Tracy View                 C+       F       D-      A+
SD    Port Curtisville 446 Cynthia Inlet              C-       B        B      D+
WV    Jimenezview      432 John Common                 A      A+        F       B


In [21]:
neighborhoods.columns.get_level_values(0) # same as get_level_values("Category")

Index(['Culture', 'Culture', 'Services', 'Services'], dtype='object', name='Category')

In [22]:
# A MultiIndex carries over to new objects derived from the dataset
# An index can switch axes depending on the operation
neighborhoods.head(1)

Category                                         Culture         Services
Subcategory                                  Restaurants Museums   Police Schools
State City             Street
MO    Fisherborough    244 Tracy View                 C+       F       D-      A+


In [23]:
neighborhoods.nunique()

Category  Subcategory
Culture   Restaurants    13
          Museums        13
Services  Police         13
          Schools        13
dtype: int64
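
The nunique output above is a Series whose row index is the column MultiIndex of the DataFrame. The sketch below confirms this on a two-row stand-in frame with invented grades.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("Culture", "Restaurants"), ("Culture", "Museums")],
    names=["Category", "Subcategory"],
)
frame = pd.DataFrame([["A", "B"], ["A", "C"]], columns=columns)

counts = frame.nunique()

# The column MultiIndex of the DataFrame is now the row index of the Series.
same_labels = counts.index.equals(frame.columns)
print(counts)
```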

# Sorting a MultiIndex
- pandas can look up a value faster in a sorted collection than in an unsorted one
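
One way to check whether an index is already in sorted order is the is_monotonic_increasing attribute. This check is an addition to these notes; the three rows are taken from the neighborhoods data.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("MO", "Fisherborough"), ("AK", "Stevenshire"), ("AK", "Scottstad")],
    names=["State", "City"],
)
frame = pd.DataFrame({"Schools": ["A+", "A-", "D"]}, index=index)

# The labels are in file order here, not in sorted order.
sorted_before = frame.index.is_monotonic_increasing

# sort_index returns a new DataFrame whose index is in sorted order.
sorted_after = frame.sort_index().index.is_monotonic_increasing
print(sorted_before, sorted_after)
```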

In [25]:
# Called on a MultiIndex DataFrame, sort_index sorts every level in ascending order
# It proceeds from the outermost level to the innermost
neighborhoods.sort_index()

Category                                         Culture         Services
Subcategory                                  Restaurants Museums   Police Schools
State City             Street
AK    Rowlandchester   386 Rebecca Cove               C-      A-       A+       C
AK    Scottstad        082 Leblanc Freeway             D      C-        D      B+
AK    Scottstad        114 Jones Garden               D-      D-        D       D
AK    Stevenshire      238 Andrew Rue                 D-       A       A-      A-
AL    Clarkland        430 Douglas Mission             A       F       C+      B+
...   ...              ...                           ...     ...      ...     ...
WY    Lake Nicole      754 Weaver Turnpike             B      D-        B       D
WY    Lake Nicole      933 Jennifer Burg               C      A+       A-       C
WY    Martintown       013 Bell Mills                 C-       D       A-      B-
WY    Port Jason       624 Faulkner Orchard           A-       F       C+      C+


In [26]:
# sort_index accepts an ascending parameter; pass False to sort every level in descending order
neighborhoods.sort_index(ascending = False).head()

Category                                         Culture         Services
Subcategory                                  Restaurants Museums   Police Schools
State City             Street
WY    Reneeshire       717 Patel Square                B      B+        D       A
WY    Port Jason       624 Faulkner Orchard           A-       F       C+      C+
WY    Martintown       013 Bell Mills                 C-       D       A-      B-
WY    Lake Nicole      933 Jennifer Burg               C      A+       A-       C
WY    Lake Nicole      754 Weaver Turnpike             B      D-        B       D


In [27]:
# To use a different sort order for each level, pass a list of Booleans to ascending
neighborhoods.sort_index(ascending = [True, False, False]).head()

Category                                         Culture         Services
Subcategory                                  Restaurants Museums   Police Schools
State City             Street
AK    Stevenshire      238 Andrew Rue                 D-       A       A-      A-
AK    Scottstad        114 Jones Garden               D-      D-        D       D
AK    Scottstad        082 Leblanc Freeway             D      C-        D      B+
AK    Rowlandchester   386 Rebecca Cove               C-      A-       A+       C
AL    Vegaside         191 Mindy Meadows              B+      A-       A+      D+


In [28]:
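# The level parameter sorts by one MultiIndex level, given by position or name
# By default (sort_remaining = True) the remaining levels break ties, as the State values in the output show
neighborhoods.sort_index(level = 1) # same as level = "City"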

Category                                         Culture         Services
Subcategory                                  Restaurants Museums   Police Schools
State City             Street
AR    Allisonland      124 Diaz Brooks                C-      A+        F      C+
GA    Amyburgh         941 Brian Expressway            B       B       D-      C+
IA    Amyburgh         163 Heather Neck                F       D       A+      A-
ID    Andrewshire      952 Ellis Drive                C+      A-       C+       A
UT    Baileyfort       919 Stewart Hills              D+      C+        A       C
...   ...              ...                           ...     ...      ...     ...
NC    West Scott       348 Jack Branch                A-      D-       A-       A
SD    West Scott       139 Hardy Vista                C+      A-       D+      B-
IN    Wilsonborough    066 Carr Road                  A+      C-        B       F
NC    Wilsonshire      871 Christopher Vista          B+       B       D+       F
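
In the output above, rows that share a city (Amyburgh, West Scott) come out ordered by state. That matches the documented default of sort_index, sort_remaining=True, which uses the other levels to break ties. A minimal sketch with three rows taken from the data:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("IA", "Amyburgh", "163 Heather Neck"),
        ("GA", "Amyburgh", "941 Brian Expressway"),
        ("AR", "Allisonland", "124 Diaz Brooks"),
    ],
    names=["State", "City", "Street"],
)
frame = pd.DataFrame({"Schools": ["A-", "C+", "C+"]}, index=index)

by_city = frame.sort_index(level="City")

# City is sorted first; the two Amyburgh rows are then ordered by State.
city_order = list(by_city.index.get_level_values("City"))
state_order = list(by_city.index.get_level_values("State"))
print(city_order, state_order)
```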


In [30]:
# The level parameter also accepts a list of levels, applied in the order given
neighborhoods.sort_index(level = [1, 2]).head() # same as level = ["City", "Street"]

Category                                         Culture         Services
Subcategory                                  Restaurants Museums   Police Schools
State City             Street
AR    Allisonland      124 Diaz Brooks                C-      A+        F      C+
IA    Amyburgh         163 Heather Neck                F       D       A+      A-
GA    Amyburgh         941 Brian Expressway            B       B       D-      C+
ID    Andrewshire      952 Ellis Drive                C+      A-       C+       A
VT    Baileyfort       831 Norma Cove                  B      D+       A+      D+


In [31]:
# Combine the ascending and level parameters
neighborhoods.sort_index(
    level = ["City", "Street"], ascending = [True, False]
).head()

Category                                         Culture         Services
Subcategory                                  Restaurants Museums   Police Schools
State City             Street
AR    Allisonland      124 Diaz Brooks                C-      A+        F      C+
GA    Amyburgh         941 Brian Expressway            B       B       D-      C+
IA    Amyburgh         163 Heather Neck                F       D       A+      A-
ID    Andrewshire      952 Ellis Drive                C+      A-       C+       A
UT    Baileyfort       919 Stewart Hills              D+      C+        A       C


In [32]:
# Pass axis = 1 to sort the column MultiIndex instead of the row index
# pandas sorts the Category level first, then the Subcategory level
neighborhoods.sort_index(axis = 1).head() # same as axis = "columns"

Category                                     Culture             Services
Subcategory                                  Museums Restaurants   Police Schools
State City             Street
MO    Fisherborough    244 Tracy View              F          C+       D-      A+
SD    Port Curtisville 446 Cynthia Inlet           B          C-        B      D+
WV    Jimenezview      432 John Common            A+           A        F       B
AK    Stevenshire      238 Andrew Rue              A          D-       A-      A-
ND    New Joshuaport   877 Walter Neck            C-          D+        B       B


In [None]:
# Combine level and ascending with axis to customize the sort order of the columns
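
The cell above has no code, so here is a minimal sketch of that combination on a one-row stand-in frame (the frame and variable names are assumptions for the example). It sorts Category in ascending order and Subcategory in descending order within each category.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("Culture", "Restaurants"), ("Culture", "Museums"),
     ("Services", "Police"), ("Services", "Schools")],
    names=["Category", "Subcategory"],
)
sample = pd.DataFrame([["C+", "F", "D-", "A+"]], columns=columns)

# Category ascending, Subcategory descending within each category.
reordered = sample.sort_index(
    axis=1, level=["Category", "Subcategory"], ascending=[True, False]
)
column_order = list(reordered.columns)
print(column_order)
```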
