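- Series (one-dimensional), DataFrame (two-dimensional)
  - Number of dimensions: the number of reference points needed to extract a value from the data structure
- Data with three or more dimensions: MultiIndex (an index object with multiple levels, where each level stores a value for a row)
  - Use it when a row of data has to be identified by a combination of values
  - A good fit for hierarchical data in which one column's values are subcategories of another column's values
  - Multiple index levels support a variety of ways to slice and dice the dataset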
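
The idea of "number of reference points" can be illustrated with a minimal sketch. The values and labels below are made up for illustration only.

```python
import pandas as pd

# A Series is one-dimensional: one label is enough to reach a value.
scores = pd.Series([10, 20], index=["a", "b"])
one_ref = scores.loc["a"]

# A DataFrame is two-dimensional: a row label and a column label are needed.
table = pd.DataFrame([[1, 2], [3, 4]], index=["a", "b"], columns=["x", "y"])
two_refs = table.loc["b", "y"]

print(one_ref, two_refs)
```

With a MultiIndex, a single axis can carry several such reference points at once.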

# The MultiIndex object

In [1]:
import pandas as pd

address = ("8809 Flair Square", "Toddside", "IL", "37206")
address

('8809 Flair Square', 'Toddside', 'IL', '37206')

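- A Series or DataFrame index can store only one value (label) per position
  - That label can be of various data types (string, number, date/time)
- When the index label is a container: a tuple can hold any number of elements, so a tuple can act as a DataFrame index label
- The MultiIndex class is available as a top-level attribute of the pandas library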

In [2]:
addresses = [
    ("8809 Flair Square", "Toddside", "IL", "37206"),
    ("9901 Austin Street", "Toddside", "IL", "37206"),
    ("905 Hogan Quarter", "Franklin", "IL", "37206")
]

In [3]:
#MultiIndex provides the from_tuples class method
pd.MultiIndex.from_tuples(addresses)
#equivalent to pd.MultiIndex.from_tuples(tuples = addresses)

MultiIndex([( '8809 Flair Square', 'Toddside', 'IL', '37206'),
            ('9901 Austin Street', 'Toddside', 'IL', '37206'),
            ( '905 Hogan Quarter', 'Franklin', 'IL', '37206')],
           )

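- The elements of each tuple follow a consistent pattern (street, city, state, zip code).
- The collection of tuple values at the same position = a level of the MultiIndex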
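
As a small sketch of what a level is, the snippet below rebuilds the same three address tuples and inspects them with `nlevels` and `get_level_values`. The variable names are chosen here for illustration.

```python
import pandas as pd

addresses = [
    ("8809 Flair Square", "Toddside", "IL", "37206"),
    ("9901 Austin Street", "Toddside", "IL", "37206"),
    ("905 Hogan Quarter", "Franklin", "IL", "37206"),
]
index = pd.MultiIndex.from_tuples(addresses)

# Four values per tuple -> four levels.
level_count = index.nlevels

# Level 1 gathers the second value of every tuple (the city).
cities = index.get_level_values(1).tolist()
print(level_count, cities)
```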

In [4]:
#Pass a list to the names parameter of from_tuples to assign a name to each MultiIndex level
row_index = pd.MultiIndex.from_tuples(
    tuples = addresses,
    names = ["Street", "City", "State", "Zip"]
)
row_index

MultiIndex([( '8809 Flair Square', 'Toddside', 'IL', '37206'),
            ('9901 Austin Street', 'Toddside', 'IL', '37206'),
            ( '905 Hogan Quarter', 'Franklin', 'IL', '37206')],
           names=['Street', 'City', 'State', 'Zip'])

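- MultiIndex: a container in which each label holds multiple values
  - Level: made up of the values at the same position across the labels
- Attach a MultiIndex to a DataFrame through the index parameter of the DataFrame constructor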

In [5]:
data = [
    ["A", "B+"],
    ["C+", "C"],
    ["D-", "A"],
]

columns = ["School", "Cost of Living"]

area_grades = pd.DataFrame(
    data = data, index = row_index, columns = columns
)

area_grades

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,School,Cost of Living
Street,City,State,Zip,Unnamed: 4_level_1,Unnamed: 5_level_1
8809 Flair Square,Toddside,IL,37206,A,B+
9901 Austin Street,Toddside,IL,37206,C+,C
905 Hogan Quarter,Franklin,IL,37206,D-,A


In [6]:
#pandas stores the two column names in a single-level Index object
area_grades.columns

Index(['School', 'Cost of Living'], dtype='object')

In [7]:
column_index = pd.MultiIndex.from_tuples([
    ("Culture", "Restaurants"),
    ("Culture", "Museums"),
    ("Services", "Police"),
    ("Services", "Schools"),
])
column_index

MultiIndex([( 'Culture', 'Restaurants'),
            ( 'Culture',     'Museums'),
            ('Services',      'Police'),
            ('Services',     'Schools')],
           )

In [8]:
#The data must match both MultiIndexes: one row per row label (3) and one value per column label (4)
data = [
    ["C-", "B+", "B-", "A"],
    ["D+", "C", "A", "C+"],
    ["A-", "A", "D+","F"],
]

pd.DataFrame(
    data = data, index = row_index, columns = column_index
)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Restaurants,Museums,Police,Schools
Street,City,State,Zip,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
8809 Flair Square,Toddside,IL,37206,C-,B+,B-,A
9901 Austin Street,Toddside,IL,37206,D+,C,A,C+
905 Hogan Quarter,Franklin,IL,37206,A-,A,D+,F


# MultiIndex DataFrame

In [9]:
neighborhoods = pd.read_csv("neighborhoods.csv")
neighborhoods.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Culture,Culture.1,Services,Services.1
0,,,,Restaurants,Museums,Police,Schools
1,State,City,Street,,,,
2,MO,Fisherborough,244 Tracy View,C+,F,D-,A+
3,SD,Port Curtisville,446 Cynthia Inlet,C-,B,B,D+
4,WV,Jimenezview,432 John Common,A,A+,F,B


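- When importing a CSV file, pandas assumes that the first row holds the column names (the header)
  - If a header slot has no value, pandas assigns the column a name beginning with 'Unnamed'
  - If header values are duplicated, pandas appends a number to the repeats (e.g. Culture.1)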
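
Both header rules can be reproduced on a tiny in-memory CSV. The sketch below is an assumption-laden stand-in: `csv_text` is a hypothetical two-line file, not the real neighborhoods.csv.

```python
import io
import pandas as pd

# Hypothetical CSV: the first header slot is empty and "Culture" appears twice.
csv_text = ",Culture,Culture\nMO,C+,F\n"
frame = pd.read_csv(io.StringIO(csv_text))

column_names = list(frame.columns)
print(column_names)
```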

In [10]:
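#A dataset meant to have a multi-level row index and a multi-level column index
#needs extra parameters to be read correctly
neighborhoods = pd.read_csv(
    "neighborhoods.csv",
    index_col = [0, 1, 2], #positions of the columns that make up the row index
    header = [0, 1] #positions of the rows to use as column headers
)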
neighborhoods.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
MO,Fisherborough,244 Tracy View,C+,F,D-,A+
SD,Port Curtisville,446 Cynthia Inlet,C-,B,B,D+
WV,Jimenezview,432 John Common,A,A+,F,B
AK,Stevenshire,238 Andrew Rue,D-,A,A-,A-
ND,New Joshuaport,877 Walter Neck,D+,C-,B,B
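
To try `index_col` and `header` without the file, here is a minimal sketch that feeds `read_csv` an in-memory string. The layout of `csv_text` (two header rows, then a row holding the index names) is an assumption inferred from the raw import shown earlier, and only two data rows are included.

```python
import io
import pandas as pd

# Assumed layout: two header rows, one row of index names, then the data.
csv_text = (
    ",,,Culture,Culture,Services,Services\n"
    ",,,Restaurants,Museums,Police,Schools\n"
    "State,City,Street,,,,\n"
    "MO,Fisherborough,244 Tracy View,C+,F,D-,A+\n"
    "SD,Port Curtisville,446 Cynthia Inlet,C-,B,B,D+\n"
)
sample = pd.read_csv(io.StringIO(csv_text), index_col=[0, 1, 2], header=[0, 1])

index_names = list(sample.index.names)
column_keys = sample.columns.tolist()
print(index_names)
print(column_keys)
```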


In [11]:
neighborhoods.info() #prints the row labels and column names as tuples

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 251 entries, ('MO', 'Fisherborough', '244 Tracy View') to ('NE', 'South Kennethmouth', '346 Wallace Pass')
Data columns (total 4 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   (Culture, Restaurants)  251 non-null    object
 1   (Culture, Museums)      251 non-null    object
 2   (Services, Police)      251 non-null    object
 3   (Services, Schools)     251 non-null    object
dtypes: object(4)
memory usage: 27.2+ KB


In [12]:
neighborhoods.index

MultiIndex([('MO',      'Fisherborough',        '244 Tracy View'),
            ('SD',   'Port Curtisville',     '446 Cynthia Inlet'),
            ('WV',        'Jimenezview',       '432 John Common'),
            ('AK',        'Stevenshire',        '238 Andrew Rue'),
            ('ND',     'New Joshuaport',       '877 Walter Neck'),
            ('ID',         'Wellsville',   '696 Weber Stravenue'),
            ('TN',          'Jodiburgh',    '285 Justin Corners'),
            ('DC',   'Lake Christopher',   '607 Montoya Harbors'),
            ('OH',          'Port Mike',      '041 Michael Neck'),
            ('ND',         'Hardyburgh', '550 Gilmore Mountains'),
            ...
            ('AK',          'Scottstad',      '114 Jones Garden'),
            ('IA',    'Port Willieport',  '320 Jennifer Mission'),
            ('ME',         'Port Linda',        '692 Hill Glens'),
            ('KS',         'Kaylamouth',       '483 Freeman Via'),
            ('WA',     'Port Shawnfort',    '6

In [13]:
neighborhoods.columns

MultiIndex([( 'Culture', 'Restaurants'),
            ( 'Culture',     'Museums'),
            ('Services',      'Police'),
            ('Services',     'Schools')],
           )

In [14]:
#pandas assigns an order (position) to each nested level in a MultiIndex
neighborhoods.index.names

FrozenList(['State', 'City', 'Street'])

In [15]:
#get_level_values extracts the Index object for a given MultiIndex level
neighborhoods.index.get_level_values(1) #("City")

Index(['Fisherborough', 'Port Curtisville', 'Jimenezview', 'Stevenshire',
       'New Joshuaport', 'Wellsville', 'Jodiburgh', 'Lake Christopher',
       'Port Mike', 'Hardyburgh',
       ...
       'Scottstad', 'Port Willieport', 'Port Linda', 'Kaylamouth',
       'Port Shawnfort', 'North Matthew', 'Chadton', 'Diazmouth', 'Laurentown',
       'South Kennethmouth'],
      dtype='object', name='City', length=251)

In [16]:
#The CSV provides no names -> the levels of the column MultiIndex have no names
neighborhoods.columns.names

FrozenList([None, None])

In [17]:
#Access the column MultiIndex through the columns attribute, then assign the level names to its names attribute
neighborhoods.columns.names = ["Category", "Subcategory"]
neighborhoods.columns.names

FrozenList(['Category', 'Subcategory'])

In [18]:
#The level names appear to the left of the column headers in the output
neighborhoods.head(3)

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
MO,Fisherborough,244 Tracy View,C+,F,D-,A+
SD,Port Curtisville,446 Cynthia Inlet,C-,B,B,D+
WV,Jimenezview,432 John Common,A,A+,F,B


In [19]:
neighborhoods.columns.get_level_values(0) #("Category")

Index(['Culture', 'Culture', 'Services', 'Services'], dtype='object', name='Category')

In [20]:
#A MultiIndex carries over to new objects derived from the dataset
#An index can switch axes depending on the operation
neighborhoods.head(1)

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
MO,Fisherborough,244 Tracy View,C+,F,D-,A+


In [21]:
neighborhoods.nunique()

Category  Subcategory
Culture   Restaurants    13
          Museums        13
Services  Police         13
          Schools        13
dtype: int64

# Sorting a MultiIndex
- pandas can locate a value faster in a sorted collection than in an unsorted one

In [22]:
#Calling sort_index on a MultiIndex DataFrame sorts all levels in ascending order
#Sorting proceeds from the outermost level inward
neighborhoods.sort_index()

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
AK,Rowlandchester,386 Rebecca Cove,C-,A-,A+,C
AK,Scottstad,082 Leblanc Freeway,D,C-,D,B+
AK,Scottstad,114 Jones Garden,D-,D-,D,D
AK,Stevenshire,238 Andrew Rue,D-,A,A-,A-
AL,Clarkland,430 Douglas Mission,A,F,C+,B+
...,...,...,...,...,...,...
WY,Lake Nicole,754 Weaver Turnpike,B,D-,B,D
WY,Lake Nicole,933 Jennifer Burg,C,A+,A-,C
WY,Martintown,013 Bell Mills,C-,D,A-,B-
WY,Port Jason,624 Faulkner Orchard,A-,F,C+,C+


In [23]:
#sort_index accepts an ascending parameter
neighborhoods.sort_index(ascending = False).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
WY,Reneeshire,717 Patel Square,B,B+,D,A
WY,Port Jason,624 Faulkner Orchard,A-,F,C+,C+
WY,Martintown,013 Bell Mills,C-,D,A-,B-
WY,Lake Nicole,933 Jennifer Burg,C,A+,A-,C
WY,Lake Nicole,754 Weaver Turnpike,B,D-,B,D


In [24]:
#To give each level its own sort order, pass a list of Booleans to ascending
neighborhoods.sort_index(ascending = [True, False, False]).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
AK,Stevenshire,238 Andrew Rue,D-,A,A-,A-
AK,Scottstad,114 Jones Garden,D-,D-,D,D
AK,Scottstad,082 Leblanc Freeway,D,C-,D,B+
AK,Rowlandchester,386 Rebecca Cove,C-,A-,A+,C
AL,Vegaside,191 Mindy Meadows,B+,A-,A+,D+


In [25]:
#The level parameter sorts by a chosen MultiIndex level; by default (sort_remaining = True) the other levels only break ties
neighborhoods.sort_index(level = 1) #(level = "City")

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
AR,Allisonland,124 Diaz Brooks,C-,A+,F,C+
GA,Amyburgh,941 Brian Expressway,B,B,D-,C+
IA,Amyburgh,163 Heather Neck,F,D,A+,A-
ID,Andrewshire,952 Ellis Drive,C+,A-,C+,A
UT,Baileyfort,919 Stewart Hills,D+,C+,A,C
...,...,...,...,...,...,...
NC,West Scott,348 Jack Branch,A-,D-,A-,A
SD,West Scott,139 Hardy Vista,C+,A-,D+,B-
IN,Wilsonborough,066 Carr Road,A+,C-,B,F
NC,Wilsonshire,871 Christopher Vista,B+,B,D+,F
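
In the output above, GA comes before IA for the two Amyburgh rows. This is consistent with the `sort_remaining` parameter of `sort_index`, which defaults to `True` and uses the other levels as tie-breakers. A minimal sketch on a made-up three-row frame (the `Score` column is hypothetical):

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("IA", "Amyburgh"), ("GA", "Amyburgh"), ("AR", "Allisonland")],
    names=["State", "City"],
)
frame = pd.DataFrame({"Score": [1, 2, 3]}, index=index)

# Default: sort by City, then use State to order rows that share a City.
by_city = frame.sort_index(level="City")
order = by_city.index.tolist()

# sort_remaining=False sorts on City alone; this sketch does not rely on
# the relative order of the two tied Amyburgh rows.
city_only = frame.sort_index(level="City", sort_remaining=False)
cities = city_only.index.get_level_values("City").tolist()

print(order)
print(cities)
```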


In [26]:
#The level parameter also accepts a list of levels, applied in the order given
neighborhoods.sort_index(level = [1, 2]).head() #(level = ["City", "Street"])

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
AR,Allisonland,124 Diaz Brooks,C-,A+,F,C+
IA,Amyburgh,163 Heather Neck,F,D,A+,A-
GA,Amyburgh,941 Brian Expressway,B,B,D-,C+
ID,Andrewshire,952 Ellis Drive,C+,A-,C+,A
VT,Baileyfort,831 Norma Cove,B,D+,A+,D+


In [27]:
#Using the ascending and level parameters together
neighborhoods.sort_index(
    level = ["City", "Street"], ascending = [True, False]
).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
AR,Allisonland,124 Diaz Brooks,C-,A+,F,C+
GA,Amyburgh,941 Brian Expressway,B,B,D-,C+
IA,Amyburgh,163 Heather Neck,F,D,A+,A-
ID,Andrewshire,952 Ellis Drive,C+,A-,C+,A
UT,Baileyfort,919 Stewart Hills,D+,C+,A,C


In [28]:
#Passing axis = 1 sorts the column MultiIndex instead of the rows
#The Category level is sorted first, then Subcategory within each Category (so Museums now precedes Restaurants)
neighborhoods.sort_index(axis = 1).head(3) #(axis = "columns")

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Museums,Restaurants,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
MO,Fisherborough,244 Tracy View,F,C+,D-,A+
SD,Port Curtisville,446 Cynthia Inlet,B,C-,B,D+
WV,Jimenezview,432 John Common,A+,A,F,B


In [29]:
#Combine level and ascending with axis to customize the sort order of the columns
neighborhoods.sort_index(
    axis = 1, level = "Subcategory", ascending = False
).head(3)

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Services,Culture,Services,Culture
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Schools,Restaurants,Police,Museums
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
MO,Fisherborough,244 Tracy View,A+,C+,D-,F
SD,Port Curtisville,446 Cynthia Inlet,D+,C-,B,B
WV,Jimenezview,432 John Common,B,A,F,A+


In [30]:
neighborhoods = neighborhoods.sort_index(ascending = True)
neighborhoods.head(3)

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
AK,Rowlandchester,386 Rebecca Cove,C-,A-,A+,C
AK,Scottstad,082 Leblanc Freeway,D,C-,D,B+
AK,Scottstad,114 Jones Garden,D-,D-,D,D


# Selecting rows and columns with a MultiIndex
- With multiple levels, extracting rows and columns becomes trickier

In [31]:
data = [
    [1, 2],
    [3, 4]
]
df = pd.DataFrame(
    data = data,
    index = ["A", "B"],
    columns = ["X", "Y"]
)
df

Unnamed: 0,X,Y
A,1,2
B,3,4


In [32]:
#Square brackets extract a column from a DataFrame as a Series
df["X"]

A    1
B    3
Name: X, dtype: int64

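## Extracting one or more columns
- The columns of this DataFrame are identified by a combination of two identifiers, Category and Subcategory; the first case below uses only one of them (a single value inside the square brackets)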

In [33]:
#The new DataFrame has no Category level; its columns are a plain Index (a MultiIndex is no longer needed)
neighborhoods["Services"]

Unnamed: 0_level_0,Unnamed: 1_level_0,Subcategory,Police,Schools
State,City,Street,Unnamed: 3_level_1,Unnamed: 4_level_1
AK,Rowlandchester,386 Rebecca Cove,A+,C
AK,Scottstad,082 Leblanc Freeway,D,B+
AK,Scottstad,114 Jones Garden,D,D
AK,Stevenshire,238 Andrew Rue,A-,A-
AL,Clarkland,430 Douglas Mission,C+,B+
...,...,...,...,...
WY,Lake Nicole,754 Weaver Turnpike,B,D
WY,Lake Nicole,933 Jennifer Burg,A-,C
WY,Martintown,013 Bell Mills,A-,B-
WY,Port Jason,624 Faulkner Orchard,C+,C+


In [34]:
#A KeyError is raised if the value in the brackets is not in the outermost level of the column MultiIndex
neighborhoods["Schools"]

KeyError: 'Schools'

In [35]:
#Select a specific Category, then a Subcategory within it, by passing a tuple
#Returns a Series, which has no column index
neighborhoods[("Services", "Schools")]

State  City            Street              
AK     Rowlandchester  386 Rebecca Cove         C
       Scottstad       082 Leblanc Freeway     B+
                       114 Jones Garden         D
       Stevenshire     238 Andrew Rue          A-
AL     Clarkland       430 Douglas Mission     B+
                                               ..
WY     Lake Nicole     754 Weaver Turnpike      D
                       933 Jennifer Burg        C
       Martintown      013 Bell Mills          B-
       Port Jason      624 Faulkner Orchard    C+
       Reneeshire      717 Patel Square         A
Name: (Services, Schools), Length: 251, dtype: object

In [36]:
#To extract multiple columns from the DataFrame, pass a list of tuples
neighborhoods[[("Services", "Schools"), ("Culture", "Museums")]]

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Services,Culture
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Schools,Museums
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2
AK,Rowlandchester,386 Rebecca Cove,C,A-
AK,Scottstad,082 Leblanc Freeway,B+,C-
AK,Scottstad,114 Jones Garden,D,D-
AK,Stevenshire,238 Andrew Rue,A-,A
AL,Clarkland,430 Douglas Mission,B+,F
...,...,...,...,...
WY,Lake Nicole,754 Weaver Turnpike,D,D-
WY,Lake Nicole,933 Jennifer Burg,C,A+
WY,Martintown,013 Bell Mills,B-,D
WY,Port Jason,624 Faulkner Orchard,C+,F


In [37]:
#Extracting multiple columns; storing the list in a variable keeps the code cleaner
columns = [
    ("Services", "Schools"), 
    ("Culture", "Museums")
]
neighborhoods[columns]

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Services,Culture
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Schools,Museums
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2
AK,Rowlandchester,386 Rebecca Cove,C,A-
AK,Scottstad,082 Leblanc Freeway,B+,C-
AK,Scottstad,114 Jones Garden,D,D-
AK,Stevenshire,238 Andrew Rue,A-,A
AL,Clarkland,430 Douglas Mission,B+,F
...,...,...,...,...
WY,Lake Nicole,754 Weaver Turnpike,D,D-
WY,Lake Nicole,933 Jennifer Burg,C,A+
WY,Martintown,013 Bell Mills,B-,D
WY,Port Jason,624 Faulkner Orchard,C+,F


## Extracting one or more rows with loc
- loc accessor: extracts rows and columns by index label
- iloc accessor: extracts rows and columns by index position

In [38]:
df

Unnamed: 0,X,Y
A,1,2
B,3,4


In [39]:
df.loc["A"]

X    1
Y    2
Name: A, dtype: int64

In [40]:
df.iloc[1]

X    3
Y    4
Name: B, dtype: int64

In [41]:
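#If you know the value to select at each MultiIndex level, list them inside the brackets (Python treats the comma-separated values as a tuple)
#Because a value is given for every level, the result does not need to keep those levels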
neighborhoods.loc["TX", "Kingchester", "534 Gordon Falls"]

Category  Subcategory
Culture   Restaurants     C
          Museums        D+
Services  Police          B
          Schools         B
Name: (TX, Kingchester, 534 Gordon Falls), dtype: object

In [42]:
#With a single label in the brackets, pandas looks for it in the outermost MultiIndex level
neighborhoods.loc["CA"]

Unnamed: 0_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Subcategory,Restaurants,Museums,Police,Schools
City,Street,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
Dustinmouth,793 Cynthia Square,A-,A+,C-,A
North Jennifer,303 Alisha Road,D-,C+,C+,A+
Ryanfort,934 David Run,F,B+,F,D-


In [43]:
#The second argument in the brackets can be a column to extract or a value to look up in the next MultiIndex level; here it is the latter
neighborhoods.loc["CA", "Dustinmouth"]

Category,Culture,Culture,Services,Services
Subcategory,Restaurants,Museums,Police,Schools
Street,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
793 Cynthia Square,A-,A+,C-,A


In [44]:
#The second argument in the brackets can be a column to extract or a value to look up in the next MultiIndex level; here it is the former
neighborhoods.loc["CA", "Culture"]

Unnamed: 0_level_0,Subcategory,Restaurants,Museums
City,Street,Unnamed: 2_level_1,Unnamed: 3_level_1
Dustinmouth,793 Cynthia Square,A-,A+
North Jennifer,303 Alisha Road,D-,C+
Ryanfort,934 David Run,F,B+


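- Avoid this syntax, because what the second argument in the brackets refers to is ambiguous.
- Instead, index explicitly: pass the row index labels as the first argument to loc and the column index labels as the second
  - When a row or column key is made up of several level values, wrap them in a tuple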

In [45]:
#Consistent syntax, because the second argument to loc always stands for column index labels
neighborhoods.loc[("CA", "Dustinmouth")]

Category,Culture,Culture,Services,Services
Subcategory,Restaurants,Museums,Police,Schools
Street,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
793 Cynthia Square,A-,A+,C-,A


In [46]:
#In Python, a one-element tuple needs a comma after the element.
neighborhoods.loc[("CA", "Dustinmouth"), ("Services",)]

Subcategory,Police,Schools
Street,Unnamed: 1_level_1,Unnamed: 2_level_1
793 Cynthia Square,C-,A


- pandas distinguishes the type of the argument passed to an accessor (list or tuple)
  - List: multiple **keys**
  - Tuple: the components of a single multi-level **key**
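
The list-versus-tuple distinction can be sketched on a small made-up frame. The `("NY", "Albany")` row and the `Score` column are hypothetical and do not come from the dataset.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("CA", "Dustinmouth"), ("CA", "Ryanfort"), ("NY", "Albany")],
    names=["State", "City"],
)
frame = pd.DataFrame({"Score": [1, 2, 3]}, index=index)

# Tuple: one key whose parts address the State and City levels.
single = frame.loc[("CA", "Ryanfort"), "Score"]

# List of tuples: two separate keys, so two rows come back.
several = frame.loc[[("CA", "Dustinmouth"), ("NY", "Albany")]]

print(single)
print(several)
```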

In [47]:
neighborhoods.loc[("CA", "Dustinmouth"), ("Services", "Schools")]

Street
793 Cynthia Square    A
Name: (Services, Schools), dtype: object

In [48]:
#Select consecutive rows with Python's list-slicing syntax
neighborhoods["NE":"NH"]

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
NE,Barryborough,460 Anna Tunnel,A+,A+,B,A
NE,Shawnchester,802 Cook Cliff,D-,D+,D,A
NE,South Kennethmouth,346 Wallace Pass,C-,B-,A,A-
NE,South Nathan,821 Jake Fork,C+,D,D+,A
NH,Courtneyfort,697 Spencer Isle,A+,A+,C+,A+
NH,East Deborahberg,271 Ryan Mount,B,C,D+,B-
NH,Ingramton,430 Calvin Underpass,C+,D+,C,C-
NH,North Latoya,603 Clark Mount,D-,A-,B+,B-
NH,South Tara,559 Michael Glens,C-,C-,F,B


In [49]:
#Slicing syntax can be combined with tuple arguments
neighborhoods.loc[("NE", "Shawnchester"):("NH", "North Latoya")]

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
NE,Shawnchester,802 Cook Cliff,D-,D+,D,A
NE,South Kennethmouth,346 Wallace Pass,C-,B-,A,A-
NE,South Nathan,821 Jake Fork,C+,D,D+,A
NH,Courtneyfort,697 Spencer Isle,A+,A+,C+,A+
NH,East Deborahberg,271 Ryan Mount,B,C,D+,B-
NH,Ingramton,430 Calvin Underpass,C+,D+,C,C-
NH,North Latoya,603 Clark Mount,D-,A-,B+,B-


In [50]:
#When slicing, assigning the endpoints to variables first keeps the code simpler
start = ("NE", "Shawnchester")
end = ("NH", "North Latoya")
neighborhoods.loc[start:end]

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
NE,Shawnchester,802 Cook Cliff,D-,D+,D,A
NE,South Kennethmouth,346 Wallace Pass,C-,B-,A,A-
NE,South Nathan,821 Jake Fork,C+,D,D+,A
NH,Courtneyfort,697 Spencer Isle,A+,A+,C+,A+
NH,East Deborahberg,271 Ryan Mount,B,C,D+,B-
NH,Ingramton,430 Calvin Underpass,C+,D+,C,C-
NH,North Latoya,603 Clark Mount,D-,A-,B+,B-


In [51]:
#The endpoints do not have to supply a value for every level
#("NH") without a trailing comma is simply the string "NH"
neighborhoods.loc[("NE", "Shawnchester"):("NH")]

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
NE,Shawnchester,802 Cook Cliff,D-,D+,D,A
NE,South Kennethmouth,346 Wallace Pass,C-,B-,A,A-
NE,South Nathan,821 Jake Fork,C+,D,D+,A
NH,Courtneyfort,697 Spencer Isle,A+,A+,C+,A+
NH,East Deborahberg,271 Ryan Mount,B,C,D+,B-
NH,Ingramton,430 Calvin Underpass,C+,D+,C,C-
NH,North Latoya,603 Clark Mount,D-,A-,B+,B-
NH,South Tara,559 Michael Glens,C-,C-,F,B


## Extracting one or more rows with iloc

In [53]:
#Pass an index position to iloc to extract a single row
neighborhoods.iloc[25]

Category  Subcategory
Culture   Restaurants    A+
          Museums         A
Services  Police         A+
          Schools        C+
Name: (CT, East Jessicaland, 208 Todd Knolls), dtype: object

In [54]:
#Pass two arguments to iloc, a row position and a column position
neighborhoods.iloc[25, 2]

'A+'

In [55]:
#Pass a list of row positions to iloc to fetch multiple rows
neighborhoods.iloc[[25, 30]]

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
CT,East Jessicaland,208 Todd Knolls,A+,A,A+,C+
DC,East Lisaview,910 Sandy Ramp,A-,A+,B,B


In [56]:
#iloc supports slicing too; unlike label slices with loc, the end position is excluded
neighborhoods.iloc[25:30, 1:3]

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Museums,Police
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2
CT,East Jessicaland,208 Todd Knolls,A,A+
CT,New Adrianhaven,048 Brian Cove,C+,A+
CT,Port Mike,410 Keith Lodge,A,B+
CT,Sethstad,139 Bailey Grove,C-,C+
DC,East Jessica,149 Norman Crossing,C-,C+


In [57]:
#Negative positions are allowed in slices as well
neighborhoods.iloc[-4:, -2:]

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2
WY,Lake Nicole,933 Jennifer Burg,A-,C
WY,Martintown,013 Bell Mills,A-,B-
WY,Port Jason,624 Faulkner Orchard,C+,C+
WY,Reneeshire,717 Patel Square,D,A


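- pandas assigns an index position to each row of the DataFrame, not to the values within a given index level
- iloc cannot index across consecutive MultiIndex levels
- iloc is "a strict positional indexer that takes no account of the DataFrame's structure"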
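
The contrast between label-aware and purely positional slicing can be sketched as follows. The frame is made up for illustration, with a hypothetical `Score` column.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [
        ("NE", "Shawnchester"),
        ("NE", "South Nathan"),
        ("NH", "Ingramton"),
        ("NH", "South Tara"),
        ("WY", "Martintown"),
    ],
    names=["State", "City"],
)
frame = pd.DataFrame({"Score": [1, 2, 3, 4, 5]}, index=index)

# Label slice: understands the State level and includes every "NH" row.
by_label = frame.loc["NE":"NH"]

# Position slice: rows 0 and 1 only; the end position 2 is excluded.
by_position = frame.iloc[0:2]

print(len(by_label), len(by_position))
```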

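# Cross-sections
- Pass a value from a MultiIndex level to the xs method to extract the matching rows
  - key parameter: the value to look for
  - level parameter: the name or integer position of the index level to search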
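
As a self-contained sketch of the `key` and `level` parameters, the snippet below runs `xs` on a made-up three-row frame (the `Score` column is hypothetical). The searched level is dropped from the result.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("OR", "Lake Nicole"), ("WY", "Lake Nicole"), ("WY", "Martintown")],
    names=["State", "City"],
)
frame = pd.DataFrame({"Score": [1, 2, 3]}, index=index)

# key is the value to find; level names the index level to search.
lake_nicole = frame.xs(key="Lake Nicole", level="City")
states = lake_nicole.index.tolist()

print(lake_nicole)
```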

In [59]:
neighborhoods.xs(key = "Lake Nicole", level = 1) #level = "City"

Unnamed: 0_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,Street,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
OR,650 Angela Track,D,C-,D,F
WY,754 Weaver Turnpike,B,D-,B,D
WY,933 Jennifer Burg,C,A+,A-,C


In [60]:
#Passing "columns" to the axis parameter applies the same extraction to the columns
neighborhoods.xs(
    axis = "columns", key = "Museums", level = "Subcategory"
).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture
State,City,Street,Unnamed: 3_level_1
AK,Rowlandchester,386 Rebecca Cove,A-
AK,Scottstad,082 Leblanc Freeway,C-
AK,Scottstad,114 Jones Garden,D-
AK,Stevenshire,238 Andrew Rue,A
AL,Clarkland,430 Douglas Mission,F


In [61]:
#xs can look up a key across non-consecutive MultiIndex levels
neighborhoods.xs(
    key = ("AK", "238 Andrew Rue"), level = ["State", "Street"]#level=[0, 2]
)

Category,Culture,Culture,Services,Services
Subcategory,Restaurants,Museums,Police,Schools
City,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Stevenshire,D-,A,A-,A-


# Manipulating the index
## Resetting the index

In [62]:
neighborhoods.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
AK,Rowlandchester,386 Rebecca Cove,C-,A-,A+,C
AK,Scottstad,082 Leblanc Freeway,D,C-,D,B+
AK,Scottstad,114 Jones Garden,D-,D-,D,D
AK,Stevenshire,238 Andrew Rue,D-,A,A-,A-
AL,Clarkland,430 Douglas Mission,A,F,C+,B+


In [63]:
#reorder_levels arranges the MultiIndex levels in the specified order
#Build a list of levels and pass it to the order parameter
new_order = ["City", "State", "Street"]
neighborhoods.reorder_levels(order = new_order).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
City,State,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
Rowlandchester,AK,386 Rebecca Cove,C-,A-,A+,C
Scottstad,AK,082 Leblanc Freeway,D,C-,D,B+
Scottstad,AK,114 Jones Garden,D-,D-,D,D
Stevenshire,AK,238 Andrew Rue,D-,A,A-,A-
Clarkland,AL,430 Douglas Mission,A,F,C+,B+


In [65]:
#The order parameter also accepts a list of integers
#Each number is the current index position of a MultiIndex level
neighborhoods.reorder_levels(order = [1, 0, 2]).head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Unnamed: 1_level_1,Subcategory,Restaurants,Museums,Police,Schools
City,State,Street,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
Rowlandchester,AK,386 Rebecca Cove,C-,A-,A+,C
Scottstad,AK,082 Leblanc Freeway,D,C-,D,B+
Scottstad,AK,114 Jones Garden,D-,D-,D,D
Stevenshire,AK,238 Andrew Rue,D-,A,A-,A-
Clarkland,AL,430 Douglas Mission,A,F,C+,B+


In [70]:
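#reset_index returns a new DataFrame with the MultiIndex levels moved into columns
#Called without arguments, it turns every index level into a regular column
#and replaces the former MultiIndex with the default numeric index
#So that each new column still has a (Category, Subcategory) tuple as its name, pandas assigns an empty string as the Subcategory value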
neighborhoods.reset_index().tail()

Category,State,City,Street,Culture,Culture,Services,Services
Subcategory,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Restaurants,Museums,Police,Schools
246,WY,Lake Nicole,754 Weaver Turnpike,B,D-,B,D
247,WY,Lake Nicole,933 Jennifer Burg,C,A+,A-,C
248,WY,Martintown,013 Bell Mills,C-,D,A-,B-
249,WY,Port Jason,624 Faulkner Orchard,A-,F,C+,C+
250,WY,Reneeshire,717 Patel Square,B,B+,D,A


In [71]:
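#col_level parameter of reset_index: places the new columns' names in a different level of the column MultiIndex
#Here the names go in the Subcategory level, and the level above it (Category) gets an empty string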
neighborhoods.reset_index(col_level =1).tail() #(col_level = "Subcategory")

Category,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Culture,Culture,Services,Services
Subcategory,State,City,Street,Restaurants,Museums,Police,Schools
246,WY,Lake Nicole,754 Weaver Turnpike,B,D-,B,D
247,WY,Lake Nicole,933 Jennifer Burg,C,A+,A-,C
248,WY,Martintown,013 Bell Mills,C-,D,A-,B-
249,WY,Port Jason,624 Faulkner Orchard,A-,F,C+,C+
250,WY,Reneeshire,717 Patel Square,B,B+,D,A


In [72]:
#col_fill parameter of reset_index: replaces the empty string with a value of your choice
neighborhoods.reset_index(
    col_fill = "Address", col_level = "Subcategory"
).tail()

Category,Address,Address,Address,Culture,Culture,Services,Services
Subcategory,State,City,Street,Restaurants,Museums,Police,Schools
246,WY,Lake Nicole,754 Weaver Turnpike,B,D-,B,D
247,WY,Lake Nicole,933 Jennifer Burg,C,A+,A-,C
248,WY,Martintown,013 Bell Mills,C-,D,A-,B-
249,WY,Port Jason,624 Faulkner Orchard,A-,F,C+,C+
250,WY,Reneeshire,717 Patel Square,B,B+,D,A
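
Because the empty strings are hard to see in rendered output, it can help to look at the column tuples directly. The sketch below uses a made-up one-row frame and integer `col_level=1` in place of the level name. It is an illustration, not a replacement for the cells above.

```python
import pandas as pd

rows = pd.MultiIndex.from_tuples([("WY", "Martintown")], names=["State", "City"])
cols = pd.MultiIndex.from_tuples(
    [("Culture", "Museums"), ("Services", "Police")],
    names=["Category", "Subcategory"],
)
frame = pd.DataFrame([["D", "A-"]], index=rows, columns=cols)

# Default: names land in the top level, the lower level is padded with "".
default = frame.reset_index().columns.tolist()[:2]

# col_level=1: names land in the lower level, the top level is padded with "".
lower = frame.reset_index(col_level=1).columns.tolist()[:2]

# col_fill replaces the "" padding.
filled = frame.reset_index(col_level=1, col_fill="Address").columns.tolist()[:2]

print(default, lower, filled, sep="\n")
```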


In [74]:
#level parameter of reset_index: moves a single index level into a column
neighborhoods.reset_index(level = "Street").tail()

Unnamed: 0_level_0,Category,Street,Culture,Culture,Services,Services
Unnamed: 0_level_1,Subcategory,Unnamed: 2_level_1,Restaurants,Museums,Police,Schools
State,City,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
WY,Lake Nicole,754 Weaver Turnpike,B,D-,B,D
WY,Lake Nicole,933 Jennifer Burg,C,A+,A-,C
WY,Martintown,013 Bell Mills,C-,D,A-,B-
WY,Port Jason,624 Faulkner Orchard,A-,F,C+,C+
WY,Reneeshire,717 Patel Square,B,B+,D,A


In [75]:
neighborhoods.reset_index(level = ["Street", "City"]).tail()

Category,City,Street,Culture,Culture,Services,Services
Subcategory,Unnamed: 1_level_1,Unnamed: 2_level_1,Restaurants,Museums,Police,Schools
State,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
WY,Lake Nicole,754 Weaver Turnpike,B,D-,B,D
WY,Lake Nicole,933 Jennifer Burg,C,A+,A-,C
WY,Martintown,013 Bell Mills,C-,D,A-,B-
WY,Port Jason,624 Faulkner Orchard,A-,F,C+,C+
WY,Reneeshire,717 Patel Square,B,B+,D,A


In [76]:
#Passing drop = True to reset_index discards the specified level instead of turning it into a column
neighborhoods.reset_index(level = "Street", drop = True).tail()

Unnamed: 0_level_0,Category,Culture,Culture,Services,Services
Unnamed: 0_level_1,Subcategory,Restaurants,Museums,Police,Schools
State,City,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
WY,Lake Nicole,B,D-,B,D
WY,Lake Nicole,C,A+,A-,C
WY,Martintown,C-,D,A-,B-
WY,Port Jason,A-,F,C+,C+
WY,Reneeshire,B,B+,D,A


In [77]:
neighborhoods = neighborhoods.reset_index()

## Setting the index

In [78]:
neighborhoods.head(3)

Category,State,City,Street,Culture,Culture,Services,Services
Subcategory,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Restaurants,Museums,Police,Schools
0,AK,Rowlandchester,386 Rebecca Cove,C-,A-,A+,C
1,AK,Scottstad,082 Leblanc Freeway,D,C-,D,B+
2,AK,Scottstad,114 Jones Garden,D-,D-,D,D


In [79]:
#set_index: sets one or more DataFrame columns as the new index
neighborhoods.set_index(keys = "City").head()

Category,State,Street,Culture,Culture,Services,Services
Subcategory,Unnamed: 1_level_1,Unnamed: 2_level_1,Restaurants,Museums,Police,Schools
City,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
Rowlandchester,AK,386 Rebecca Cove,C-,A-,A+,C
Scottstad,AK,082 Leblanc Freeway,D,C-,D,B+
Scottstad,AK,114 Jones Garden,D-,D-,D,D
Stevenshire,AK,238 Andrew Rue,D-,A,A-,A-
Clarkland,AL,430 Douglas Mission,A,F,C+,B+


In [80]:
#set_index: to target a column under a column MultiIndex, pass the keys parameter a tuple with a value for each level
neighborhoods.set_index(keys = ("Culture", "Museums")).head()

Category,State,City,Street,Culture,Services,Services
Subcategory,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Restaurants,Police,Schools
"(Culture, Museums)",Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
A-,AK,Rowlandchester,386 Rebecca Cove,C-,A+,C
C-,AK,Scottstad,082 Leblanc Freeway,D,D,B+
D-,AK,Scottstad,114 Jones Garden,D-,D,D
A,AK,Stevenshire,238 Andrew Rue,D-,A-,A-
F,AL,Clarkland,430 Douglas Mission,A,C+,B+
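
As a small sketch of the tuple form of `keys`, the snippet below builds a hypothetical two-column frame with a two-level column index and promotes one of those columns to the row index. The `toy` data is invented for illustration.

```python
import pandas as pd

# Hypothetical frame whose columns form a two-level MultiIndex
columns = pd.MultiIndex.from_tuples(
    [("Culture", "Museums"), ("Services", "Police")],
    names=["Category", "Subcategory"],
)
toy = pd.DataFrame([["A-", "A+"], ["C-", "D"]], columns=columns)

# One tuple identifies exactly one column: a value for each column level
by_museums = toy.set_index(keys=("Culture", "Museums"))

print(by_museums.index.tolist())    # ['A-', 'C-']
print(by_museums.index.name)        # ('Culture', 'Museums')
print(by_museums.columns.tolist())  # [('Services', 'Police')]
```

The resulting index is named after the whole tuple, which is why the output above labels it `(Culture, Museums)`.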


In [81]:
# set_index: to build a MultiIndex on the row axis, pass a list of columns to keys
neighborhoods.set_index(keys = ["State", "City"]).head()

Unnamed: 0_level_0,Category,Street,Culture,Culture,Services,Services
Unnamed: 0_level_1,Subcategory,Unnamed: 2_level_1,Restaurants,Museums,Police,Schools
State,City,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
AK,Rowlandchester,386 Rebecca Cove,C-,A-,A+,C
AK,Scottstad,082 Leblanc Freeway,D,C-,D,B+
AK,Scottstad,114 Jones Garden,D-,D-,D,D
AK,Stevenshire,238 Andrew Rue,D-,A,A-,A-
AL,Clarkland,430 Douglas Mission,A,F,C+,B+
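
The order of the list passed to `keys` determines the order of the resulting levels. A minimal sketch on a hypothetical flat frame (single-level columns, unlike `neighborhoods`):

```python
import pandas as pd

# Hypothetical flat frame with ordinary, single-level columns
toy = pd.DataFrame(
    {
        "State": ["AK", "AK", "AL"],
        "City": ["Scottstad", "Scottstad", "Clarkland"],
        "Street": ["082 Leblanc Freeway", "114 Jones Garden", "430 Douglas Mission"],
    }
)

# State becomes the outer level, City the inner level
indexed = toy.set_index(keys=["State", "City"])

# Reversing the list reverses the level order
swapped = toy.set_index(keys=["City", "State"])

print(list(indexed.index.names))  # ['State', 'City']
print(list(swapped.index.names))  # ['City', 'State']
print(indexed.columns.tolist())   # ['Street']
```

Putting the broader category first (State before City) usually matches how hierarchical data is read and sliced.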


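- Many arrangements and combinations are available for reshaping a dataset to suit an analysis
- When defining a DataFrame's index, consider:
  - Which values matter most for the problem at hand
  - What the key information is
  - Whether several pieces of data need to be grouped together
  - Which data points should be stored as rows and which as columns
  - Whether the rows or columns form groups or categories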

# Coding challenge

In [83]:
investments = pd.read_csv("investments.csv")
investments.head()

Unnamed: 0,Name,Market,Status,State,Funding Rounds
0,#waywire,News,Acquired,NY,1
1,&TV Communications,Games,Operating,CA,2
2,-R- Ranch and Mine,Tourism,Operating,TX,2
3,004 Technologies,Software,Operating,IL,1
4,1-4 All,Software,Operating,NC,1


In [84]:
# Count unique values per column -> columns with few unique values usually hold categorical data and suit index levels
investments.nunique()

Name              27763
Market              693
Status                3
State                61
Funding Rounds       16
dtype: int64
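
One way to turn this inspection into a rough filter is to compare each column's unique count with the number of rows. This is only a sketch: the `toy` frame is a four-row stand-in, and the `threshold` of 0.5 is an arbitrary assumption, not a pandas convention.

```python
import pandas as pd

# Hypothetical four-row stand-in for the investments data
toy = pd.DataFrame(
    {
        "Name": ["#waywire", "&TV Communications", "004 Technologies", "1-4 All"],
        "Status": ["Acquired", "Operating", "Operating", "Operating"],
        "State": ["NY", "CA", "IL", "NC"],
    }
)

counts = toy.nunique()

# Assumed cutoff: a column is a candidate index level when at most
# half of its values are distinct
threshold = 0.5
candidates = counts[counts / len(toy) <= threshold].index.tolist()

print(candidates)  # ['Status']
```

On real data the cutoff still needs judgment; a column such as `Market` with 693 unique values may be categorical yet too fine-grained to serve as a useful index level.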

In [85]:
investments = investments.set_index(
    keys = ["Status", "Funding Rounds", "State"]
).sort_index()

In [86]:
investments.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Name,Market
Status,Funding Rounds,State,Unnamed: 3_level_1,Unnamed: 4_level_1
Acquired,1,AB,Hallpass Media,Games
Acquired,1,AL,EnteGreat,Enterprise Software
Acquired,1,AL,Onward Behavioral Health,Biotechnology
Acquired,1,AL,Proxsys,Biotechnology
Acquired,1,AZ,Envox Group,Public Relations


In [87]:
# 1. Extract all rows whose Status is 'Closed'
investments.loc[("Closed",)].head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Name,Market
Funding Rounds,State,Unnamed: 2_level_1,Unnamed: 3_level_1
1,AB,Cardinal Media Technologies,Social Network Media
1,AB,Easy Bill Online,Tracking
1,AB,Globel Direct,Public Relations
1,AB,Ph03nix New Media,Games
1,AL,Naubo,News


In [88]:
# 2. Extract all rows whose Status is 'Acquired' and whose Funding Rounds is 10
investments.loc[("Acquired", 10)]

Unnamed: 0_level_0,Name,Market
State,Unnamed: 1_level_1,Unnamed: 2_level_1
NY,Genesis Networks,Web Hosting
TX,ACTIVE Network,Software


In [89]:
# 3. Extract all rows whose Status is 'Operating', Funding Rounds is 6, and State is 'NJ'
investments.loc[("Operating", 6, "NJ")]

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Name,Market
Status,Funding Rounds,State,Unnamed: 3_level_1,Unnamed: 4_level_1
Operating,6,NJ,Agile Therapeutics,Biotechnology
Operating,6,NJ,Agilence,Retail Technology
Operating,6,NJ,Edge Therapeutics,Biotechnology
Operating,6,NJ,Nistica,Web Hosting


In [91]:
# 4. Extract all rows whose Status is 'Closed' and whose Funding Rounds is 8 (Name column only)
investments.loc[("Closed", 8), ("Name",)]

Unnamed: 0_level_0,Name
State,Unnamed: 1_level_1
CA,CipherMax
CA,Dilithium Networks
CA,Moblyng
CA,SolFocus
CA,Solyndra
FL,Extreme Enterprises
GA,MedShape
NC,Biolex Therapeutics
WA,Cozi Group
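
The tuple patterns used in challenges 1, 2, and 4 can be replayed on a small frame to see which index levels survive each selection. The sketch below uses hypothetical data (the `Name` values are placeholders) with the same three index levels.

```python
import pandas as pd

# Hypothetical frame with the same three index levels as investments
toy = (
    pd.DataFrame(
        {
            "Status": ["Closed", "Closed", "Operating", "Closed"],
            "Funding Rounds": [1, 8, 6, 8],
            "State": ["AB", "CA", "NJ", "FL"],
            "Name": ["w", "x", "y", "z"],
        }
    )
    .set_index(["Status", "Funding Rounds", "State"])
    .sort_index()
)

# One-element tuple: matches the first level, which is then dropped
closed = toy.loc[("Closed",)]

# Two-element tuple: matches the first two levels, leaving only State
closed_8 = toy.loc[("Closed", 8)]

# Second argument selects columns; a one-element tuple keeps a DataFrame
names_only = toy.loc[("Closed", 8), ("Name",)]

print(list(closed.index.names))     # ['Funding Rounds', 'State']
print(closed_8.index.tolist())      # ['CA', 'FL']
print(names_only.columns.tolist())  # ['Name']
```

Each value in the row tuple consumes one level from the left, so the levels that remain in the result are the ones the tuple did not specify.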


In [93]:
# 5. Extract all rows whose State is 'NJ', regardless of Status or Funding Rounds
investments.xs(key = "NJ", level = "State").head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Name,Market
Status,Funding Rounds,Unnamed: 2_level_1,Unnamed: 3_level_1
Acquired,1,AkaRx,Biotechnology
Acquired,1,Aptalis Pharma,Biotechnology
Acquired,1,Cadent,Software
Acquired,1,Cancer Genetics,Health And Wellness
Acquired,1,Clacendix,E-Commerce
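
By default `xs` drops the level it matched on, which is why `State` is missing from the output above. A minimal sketch on hypothetical data, also showing the `drop_level` parameter that keeps the level:

```python
import pandas as pd

# Hypothetical frame with the same three index levels as investments
toy = (
    pd.DataFrame(
        {
            "Status": ["Acquired", "Closed", "Operating"],
            "Funding Rounds": [1, 2, 6],
            "State": ["NJ", "NJ", "CA"],
            "Name": ["a", "b", "c"],
        }
    )
    .set_index(["Status", "Funding Rounds", "State"])
    .sort_index()
)

# Match on an inner level without specifying the outer ones
nj = toy.xs(key="NJ", level="State")

# Same rows, but the State level stays in the index
nj_kept = toy.xs(key="NJ", level="State", drop_level=False)

print(list(nj.index.names))       # ['Status', 'Funding Rounds']
print(list(nj_kept.index.names))  # ['Status', 'Funding Rounds', 'State']
print(nj["Name"].tolist())        # ['a', 'b']
```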


In [94]:
# 6. Move the MultiIndex levels back into the DataFrame as columns
investments = investments.reset_index()
investments.head()

Unnamed: 0,Status,Funding Rounds,State,Name,Market
0,Acquired,1,AB,Hallpass Media,Games
1,Acquired,1,AL,EnteGreat,Enterprise Software
2,Acquired,1,AL,Onward Behavioral Health,Biotechnology
3,Acquired,1,AL,Proxsys,Biotechnology
4,Acquired,1,AZ,Envox Group,Public Relations
