# Hierarchical Indexing in Pandas: Coding Practice Questions

1. Create a Pandas Series named `data` with a MultiIndex using arrays `[['a', 'a', 'b', 'b'], [1, 2, 1, 2]]` and values `[10, 20, 30, 40]`. Display the Series.

In [1]:
import numpy as np
import pandas as pd
data = pd.Series([10, 20, 30, 40], index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]])
data

a  1    10
   2    20
b  1    30
   2    40
dtype: int64

2. Check the index of the Series `data`.

In [2]:
data.index

MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )

3. Select the value associated with the index `('a', 1)` from `data`.

In [3]:
data_0 = data.loc[('a', 1)]
data_0

10

4. Slice the Series `data` to get values associated with the index 'a'.

In [4]:
data_1 = data[:('a', 2)]
data_1

a  1    10
   2    20
dtype: int64

In [34]:
data['a']

1    10
2    20
dtype: int64
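
The two cells above return the same values but differently shaped indexes. As a minimal sketch (reusing the same `data` Series), a label slice keeps both index levels, while partial indexing with a single outer label drops the outer level:

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40], index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]])

# Label slice on the outer level: both index levels are kept.
sliced = data.loc['a':'a']

# Partial indexing with one outer label: the outer level is dropped.
partial = data.loc['a']

print(sliced.index.nlevels)   # 2
print(partial.index.nlevels)  # 1
```

Label slicing on a MultiIndex generally expects the index to be sorted, which is the case for `data` here.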

5. Create a DataFrame `df` with a MultiIndex using arrays `[['A', 'A', 'B', 'B'], [1, 2, 1, 2]]` and data `[[1, 2], [3, 4], [5, 6], [7, 8]]`. Display the DataFrame.

In [38]:
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])
df

     0  1
A 1  1  2
  2  3  4
B 1  5  6
  2  7  8


6. Check the columns of the DataFrame `df`.

In [6]:
df.columns

RangeIndex(start=0, stop=2, step=1)

7. Select the data associated with the index `('A', 1)` from `df`.

In [7]:
df.loc[('A', 1)]

0    1
1    2
Name: (A, 1), dtype: int64

8. Swap the levels of the MultiIndex in `df`.

In [8]:
df_swapped = df.swaplevel(0, 1, axis=0)
df_swapped

     0  1
1 A  1  2
2 A  3  4
1 B  5  6
2 B  7  8
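
Note that `swaplevel` only exchanges the level positions; the rows stay in their original order, so the swapped index is no longer sorted. A short sketch of checking this and restoring a sorted index (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]],
                  index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])

# Swapping levels leaves the row order untouched.
swapped = df.swaplevel(0, 1)
print(swapped.index.is_monotonic_increasing)  # False

# Sorting afterwards orders rows by the new outer level, then the inner one.
swapped_sorted = swapped.sort_index()
print(swapped_sorted.index.is_monotonic_increasing)  # True
```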


9. Sort the MultiIndex of `df` by the second level.

In [9]:
print(df)
df_sorted = df.sort_index(level=1)
df_sorted

     0  1
A 1  1  2
  2  3  4
B 1  5  6
  2  7  8


     0  1
A 1  1  2
B 1  5  6
A 2  3  4
B 2  7  8


10. Reset the index of `df`.

In [39]:
df_reset = df.reset_index()  # returns a new DataFrame; df itself is unchanged
df_reset

  level_0  level_1  0  1
0       A        1  1  2
1       A        2  3  4
2       B        1  5  6
3       B        2  7  8
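
The `level_0` and `level_1` column names appear because the index levels are unnamed. As a sketch, naming the levels first gives more readable columns after the reset; the names `group` and `item` are hypothetical and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]],
                  index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])

# Give the index levels names (hypothetical: 'group' and 'item').
named = df.rename_axis(index=['group', 'item'])

# reset_index now uses those names for the new columns.
flat = named.reset_index()
print(flat.columns.tolist())
```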


11. Set the columns of `df` as a MultiIndex using arrays `[['X', 'Y'], ['alpha', 'beta']]`.

In [43]:
columns = pd.MultiIndex.from_arrays([['X', 'Y'], ['alpha', 'beta']])
df.columns = columns
df

        X    Y
    alpha beta
A 1     1    2
  2     3    4
B 1     5    6
  2     7    8


12. Aggregate the data in `df` using the `sum` function, grouping by the first level of the index.

In [12]:
result = df.groupby(level=0).sum()
result

      X    Y
  alpha beta
A     4    6
B    12   14


13. Use the `xs` method to select data at the first level of the index equal to 'A' from `df`.

In [13]:
data_A = df.xs('A', level=0)
data_A

      X    Y
  alpha beta
1     1    2
2     3    4


14. Use the `xs` method to select data at the second level of the index equal to 1 from `df`.

In [14]:
df

        X    Y
    alpha beta
A 1     1    2
  2     3    4
B 1     5    6
  2     7    8


In [15]:
data_level_1 = df.xs(1, level=1)
data_level_1

      X    Y
  alpha beta
A     1    2
B     5    6
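
By default `xs` drops the level it selects on, as the output above shows. The following sketch, which rebuilds `df` so it runs on its own, keeps that level with `drop_level=False` and selects the same rows with `pd.IndexSlice`:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]],
    columns=pd.MultiIndex.from_arrays([['X', 'Y'], ['alpha', 'beta']]),
)

# drop_level=False keeps the selected level in the result's index.
kept = df.xs(1, level=1, drop_level=False)
print(kept.index.nlevels)  # 2

# The same rows via label-based slicing (the index here is sorted).
via_slice = df.loc[pd.IndexSlice[:, 1], :]
print(via_slice.index.nlevels)  # 2
```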


15. Stack the DataFrame `df`.

In [44]:
df.stack()

             X    Y
A 1 alpha  1.0  NaN
    beta   NaN  2.0
  2 alpha  3.0  NaN
    beta   NaN  4.0
B 1 alpha  5.0  NaN
    beta   NaN  6.0
  2 alpha  7.0  NaN
    beta   NaN  8.0
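
The missing values above arise because `X` only has an `alpha` sub-column and `Y` only has a `beta` one. One possible way to avoid them, sketched here under the assumption that a long Series is acceptable, is to stack both column levels at once. How `stack` handles missing values has changed between pandas versions, and recent versions may emit a `FutureWarning` for this call:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]],
    columns=pd.MultiIndex.from_arrays([['X', 'Y'], ['alpha', 'beta']]),
)

# Stack both column levels: one entry per existing (row, column) pair.
stacked_all = df.stack([0, 1])

print(stacked_all.index.nlevels)  # 4
print(stacked_all)
```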


16. Unstack the last level of the MultiIndex in `df`.

In [45]:
df.unstack()

      X       Y
  alpha    beta
      1  2    1  2
A     1  3    2  4
B     5  7    6  8


17. Create a pivot table from `df` using the first level of the index as rows and the second level of the index as columns.

In [46]:
# Create a pivot table using the first level of the index as rows and the second level as columns
pivot_table = df.pivot_table(index=df.index.get_level_values(0), columns=df.index.get_level_values(1))

# Display the pivot table
pivot_table

      X       Y
  alpha    beta
      1  2    1  2
A     1  3    2  4
B     5  7    6  8
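
The pivot table matches the `unstack` result because each (row, column) pair occurs exactly once, so the default aggregation (mean) has nothing to combine. The sketch below illustrates this on a simplified frame; the flat column names and the level names `group` and `item` are assumptions made for readability, not part of the exercise:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    index=pd.MultiIndex.from_arrays(
        [['A', 'A', 'B', 'B'], [1, 2, 1, 2]],
        names=['group', 'item'],  # hypothetical level names
    ),
    columns=['alpha', 'beta'],
)

# Pure reshape: move the inner index level into the columns.
unstacked = df.unstack('item')

# pivot_table on the flattened data: group rows, then aggregate (mean by default).
pivoted = df.reset_index().pivot_table(index='group', columns='item')

print((unstacked.values == pivoted.values).all())  # True
```

With duplicate (row, column) pairs the two would differ: `pivot_table` would aggregate them, while `unstack` would raise an error.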


18. Melt the DataFrame `df` to long format.

In [49]:
print(df)
df.reset_index().melt(id_vars=['level_0', 'level_1'])

        X    Y
    alpha beta
A 1     1    2
  2     3    4
B 1     5    6
  2     7    8


  level_0  level_1 variable_0 variable_1  value
0       A        1          X      alpha      1
1       A        2          X      alpha      3
2       B        1          X      alpha      5
3       B        2          X      alpha      7
4       A        1          Y       beta      2
5       A        2          Y       beta      4
6       B        1          Y       beta      6
7       B        2          Y       beta      8


19. Create a cross-tabulation of the two levels of the MultiIndex in `df`.

In [21]:
cross_tab = pd.crosstab(index=df.index.get_level_values(0), columns=df.index.get_level_values(1))
cross_tab

col_0  1  2
row_0
A      1  1
B      1  1
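
The `row_0` and `col_0` labels are defaults that `pd.crosstab` assigns when its inputs are unnamed. A small sketch of supplying explicit axis labels through `rownames` and `colnames` (the labels `group` and `item` are hypothetical):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6], [7, 8]],
    index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]],
)

cross_tab = pd.crosstab(
    index=df.index.get_level_values(0),
    columns=df.index.get_level_values(1),
    rownames=['group'],  # hypothetical label for the row axis
    colnames=['item'],   # hypothetical label for the column axis
)
print(cross_tab)
```

Each cell counts how often a pair of labels occurs in the index, so every entry is 1 for this DataFrame.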


20. Convert the MultiIndex in `df` to a flat index using the `reset_index` method.

In [22]:
df_flat = df.reset_index()
df_flat

  level_0 level_1     X    Y
                  alpha beta
0       A       1     1    2
1       A       2     3    4
2       B       1     5    6
3       B       2     7    8


# Solution

1. Create a Pandas Series named `data` with a MultiIndex using arrays `[['a', 'a', 'b', 'b'], [1, 2, 1, 2]]` and values `[10, 20, 30, 40]`. Display the Series.

In [23]:
import numpy as np
import pandas as pd

In [24]:
data = pd.Series([10, 20, 30, 40], index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]])
data

a  1    10
   2    20
b  1    30
   2    40
dtype: int64

In [25]:
# 2. Check the index of the Series `data`.
data.index

MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 2)],
           )

In [26]:
# 3. Select the value associated with the index `('a', 1)` from `data`.
data_0 = data.loc[('a', 1)]
data_0

10

In [27]:
# 4. Slice the Series `data` to get values associated with the index 'a'.
data_1 = data[:('a', 2)]
data_1

a  1    10
   2    20
dtype: int64

In [28]:
# 5. Create a DataFrame df with a MultiIndex using arrays [['A', 'A', 'B', 'B'], [1, 2, 1, 2]] and data [[1, 2], [3, 4], [5, 6], [7, 8]]. Display the DataFrame.
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], index=[['A', 'A', 'B', 'B'], [1, 2, 1, 2]])
df

     0  1
A 1  1  2
  2  3  4
B 1  5  6
  2  7  8


In [29]:
# 6. Check the columns of the DataFrame df.
df.columns

RangeIndex(start=0, stop=2, step=1)

In [30]:
# 7. Select the data associated with the index ('A', 1) from df.
df.loc[('A', 1)]

0    1
1    2
Name: (A, 1), dtype: int64

In [31]:
# 8. Swap the levels of the MultiIndex in df.
df = df.swaplevel(0, 1)
df

     0  1
1 A  1  2
2 A  3  4
1 B  5  6
2 B  7  8


In [32]:
# 9. Sort the MultiIndex of df by the second level.
df = df.sort_index(level=1)
df

     0  1
1 A  1  2
2 A  3  4
1 B  5  6
2 B  7  8


In [33]:
# 10. Reset the index of df
df = df.reset_index()
df

   level_0 level_1  0  1
0        1       A  1  2
1        2       A  3  4
2        1       B  5  6
3        2       B  7  8
