# **Article 119 : MultiIndex Series and DataFrames** [![Static Badge](https://img.shields.io/badge/Open%20in%20Colab%20-%20orange?style=plastic&logo=googlecolab&labelColor=grey)](https://colab.research.google.com/github/sshrizvi/DS-Python/blob/main/Pandas/Notebooks/119_multiindex_series_and_dataframes.ipynb)

|🔴 **NOTE** 🔴|
|:-----------:|
|This notebook contains the practical implementations of the concepts discussed in the following article.|
| Here is Article 119 - [MultiIndex Series and DataFrames](../Articles/119_multiindex_series_and_dataframes.md) |

### 📦 **Importing Relevant Libraries**

In [1]:
import numpy as np
import pandas as pd

### 🚀 **Creating a MultiIndex Object**

#### **1. From Tuples**

In [2]:
multiindex_tuple = [
    ('USA', 'California'),
    ('USA', 'Texas'),
    ('India', 'Delhi'),
    ('India', 'Mumbai')
]

multiindex_obj_from_tuple = pd.MultiIndex.from_tuples(
    tuples = multiindex_tuple,
    names = ['Country', 'State']
)

multiindex_obj_from_tuple

MultiIndex([(  'USA', 'California'),
            (  'USA',      'Texas'),
            ('India',      'Delhi'),
            ('India',     'Mumbai')],
           names=['Country', 'State'])

#### **2. From Product**

In [3]:
# Note: these values are city names; the level is nonetheless labelled 'Country' below
countries = ['New York', 'Los Angeles', 'Chicago']
products = ['Laptop', 'Smartphone', 'Tablet']

multiindex_obj_from_prod = pd.MultiIndex.from_product(
    iterables = [countries, products],
    names = ['Country', 'Product']
)

multiindex_obj_from_prod

MultiIndex([(   'New York',     'Laptop'),
            (   'New York', 'Smartphone'),
            (   'New York',     'Tablet'),
            ('Los Angeles',     'Laptop'),
            ('Los Angeles', 'Smartphone'),
            ('Los Angeles',     'Tablet'),
            (    'Chicago',     'Laptop'),
            (    'Chicago', 'Smartphone'),
            (    'Chicago',     'Tablet')],
           names=['Country', 'Product'])
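
As a rough illustration of what `from_product` does, the sketch below builds the same index by hand from the Cartesian product of the two lists. The names `cities`, `items`, `by_product` and `by_tuples` are illustrative and do not appear in the notebook.

```python
from itertools import product

import pandas as pd

cities = ['New York', 'Los Angeles', 'Chicago']
items = ['Laptop', 'Smartphone', 'Tablet']

# from_product pairs every value of the first list with every value of the second
by_product = pd.MultiIndex.from_product(
    [cities, items], names=['Country', 'Product']
)

# The same pairs, generated explicitly and handed to from_tuples
by_tuples = pd.MultiIndex.from_tuples(
    list(product(cities, items)), names=['Country', 'Product']
)

print(len(by_product))                # 3 * 3 = 9 entries
print(by_product.equals(by_tuples))   # both routes should give the same index
```

`from_product` is usually the shorter choice when every combination is wanted; `from_tuples` suits irregular combinations such as the Country/State pairs above.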

#### **3. From Arrays**

In [4]:
arrays = [
    [1, 1, 2, 2],
    ['red', 'blue', 'red', 'blue']
]

multiindex_obj_from_arrays = pd.MultiIndex.from_arrays(
    arrays = arrays,
    names=('number', 'color')
)

multiindex_obj_from_arrays

MultiIndex([(1,  'red'),
            (1, 'blue'),
            (2,  'red'),
            (2, 'blue')],
           names=['number', 'color'])

### 🚀 **Creating a MultiIndex Series**

In [5]:
units_sold = np.array(
    object = [100, 200, 300, 110, 210, 310, 120, 220, 320]
)

product_sales_series = pd.Series(
    data = units_sold,
    name = 'Units Sold',
    index = multiindex_obj_from_prod
)

product_sales_series

Country      Product   
New York     Laptop        100
             Smartphone    200
             Tablet        300
Los Angeles  Laptop        110
             Smartphone    210
             Tablet        310
Chicago      Laptop        120
             Smartphone    220
             Tablet        320
Name: Units Sold, dtype: int32
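
A minimal sketch of how values can be read back from such a Series, assuming the same index and data as above. The names `sales`, `chicago_tablet` and `new_york` are illustrative only.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [['New York', 'Los Angeles', 'Chicago'], ['Laptop', 'Smartphone', 'Tablet']],
    names=['Country', 'Product'],
)
sales = pd.Series(
    [100, 200, 300, 110, 210, 310, 120, 220, 320],
    index=idx,
    name='Units Sold',
)

# A full tuple (one label per level) selects a single value
chicago_tablet = sales.loc[('Chicago', 'Tablet')]

# An outer-level label alone selects a sub-Series indexed by the remaining level
new_york = sales.loc['New York']

print(chicago_tablet)          # 320
print(new_york.tolist())       # [100, 200, 300]
print(new_york.index.name)     # Product
```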

### 📦 **Stacking**  
**Method :** `pandas.DataFrame.stack(...)`

#### **1. Stacking Single-Level Column**

##### ⚠️ **Example Data**

In [6]:
df = pd.DataFrame(
    data = {
        'Math': [85, 90],
        'Science': [88, 92]
    },
    index = pd.Index(['Alice', 'Bob'], name = 'Name')
)

df

Name,Math,Science
Alice,85,88
Bob,90,92


##### ⚡ **Implementation**

In [7]:
df.stack()

Name          
Alice  Math       85
       Science    88
Bob    Math       90
       Science    92
dtype: int64
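
The sketch below, using the same small table, suggests how `stack()` and `unstack()` undo each other in this simple case. The names `scores`, `stacked` and `restored` are illustrative; the round trip is shown for this data only, not as a general guarantee.

```python
import pandas as pd

scores = pd.DataFrame(
    {'Math': [85, 90], 'Science': [88, 92]},
    index=pd.Index(['Alice', 'Bob'], name='Name'),
)

# stack(): the column labels become the innermost level of the row index
stacked = scores.stack()

# unstack(): the innermost row-index level moves back to the columns
restored = stacked.unstack()

print(stacked.index.nlevels)               # 2
print(stacked.loc[('Bob', 'Science')])     # 92
print(restored.equals(scores))
```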

#### **2. Stacking MultiIndex Column**

##### ⚠️ **Example Data**

In [8]:
products = ['Laptop', 'Smartphone']
quarters = ['Q1', 'Q2']
metrics = ['Units Sold', 'Revenue']

multi_col = pd.MultiIndex.from_product(
    iterables = [products, quarters, metrics],
    names = ['Product', 'Quarter', 'Metric']
)

np.random.seed(42)
data = np.random.randint(
    low = 100,
    high = 1000,
    size = (5, len(multi_col))
)

multi_col_df = pd.DataFrame(
    data = data,
    columns = multi_col,
    index = ['Store A', 'Store B', 'Store C', 'Store D', 'Store E']
)

multi_col_df

Product,Laptop,Laptop,Laptop,Laptop,Smartphone,Smartphone,Smartphone,Smartphone
Quarter,Q1,Q1,Q2,Q2,Q1,Q1,Q2,Q2
Metric,Units Sold,Revenue,Units Sold,Revenue,Units Sold,Revenue,Units Sold,Revenue
Store A,202,535,960,370,206,171,800,120
Store B,714,221,566,314,430,558,187,472
Store C,199,971,763,230,761,408,869,443
Store D,591,513,905,485,291,376,260,559
Store E,413,121,352,847,956,660,574,158


##### ⚡ **Implementation**

In [9]:
multi_col_df.stack(
    future_stack = True
)

,Product,Laptop,Laptop,Smartphone,Smartphone
,Quarter,Q1,Q2,Q1,Q2
,Metric,,,,
Store A,Units Sold,202,960,206,800
Store A,Revenue,535,370,171,120
Store B,Units Sold,714,566,430,187
Store B,Revenue,221,314,558,472
Store C,Units Sold,199,763,761,869
Store C,Revenue,971,230,408,443
Store D,Units Sold,591,905,291,260
Store D,Revenue,513,485,376,559
Store E,Units Sold,413,352,956,574
Store E,Revenue,121,847,660,158


In [10]:
multi_col_df.stack(
    level = 0,
    future_stack = True
)

,Quarter,Q1,Q1,Q2,Q2
,Metric,Units Sold,Revenue,Units Sold,Revenue
,Product,,,,
Store A,Laptop,202,535,960,370
Store A,Smartphone,206,171,800,120
Store B,Laptop,714,221,566,314
Store B,Smartphone,430,558,187,472
Store C,Laptop,199,971,763,230
Store C,Smartphone,761,408,869,443
Store D,Laptop,591,513,905,485
Store D,Smartphone,291,376,260,559
Store E,Laptop,413,121,352,847
Store E,Smartphone,956,660,574,158


In [11]:
multi_col_df.stack(
    level ='Quarter',
    future_stack = True
)

,Product,Laptop,Laptop,Smartphone,Smartphone
,Metric,Units Sold,Revenue,Units Sold,Revenue
,Quarter,,,,
Store A,Q1,202,535,206,171
Store A,Q2,960,370,800,120
Store B,Q1,714,221,430,558
Store B,Q2,566,314,187,472
Store C,Q1,199,971,761,408
Store C,Q2,763,230,869,443
Store D,Q1,591,513,291,376
Store D,Q2,905,485,260,559
Store E,Q1,413,121,956,660
Store E,Q2,352,847,574,158
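
The `level` argument accepts either a position or a level name. The sketch below uses a smaller, made-up two-level table (`wide`, not the notebook's data) to suggest that the two spellings select the same level. It omits `future_stack` so that it also runs on pandas versions that lack the parameter; some versions may emit a `FutureWarning` as a result.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product(
    [['Laptop', 'Smartphone'], ['Q1', 'Q2']],
    names=['Product', 'Quarter'],
)
wide = pd.DataFrame(
    np.arange(8).reshape(2, 4),      # Store A: 0..3, Store B: 4..7
    index=['Store A', 'Store B'],
    columns=cols,
)

by_name = wide.stack(level='Quarter')   # refer to the level by its name
by_position = wide.stack(level=1)       # 'Quarter' is the level at position 1

print(by_name.equals(by_position))
print(by_name.loc[('Store B', 'Q2'), 'Smartphone'])   # 7
```

Level names tend to be the more readable option once a table has three or more levels, since positions shift whenever a level is added or removed.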


In [12]:
multi_col_df.stack(
    level = [0,1],
    future_stack = True
)

,,Metric,Units Sold,Revenue
,Product,Quarter,,
Store A,Laptop,Q1,202,535
Store A,Laptop,Q2,960,370
Store A,Smartphone,Q1,206,171
Store A,Smartphone,Q2,800,120
Store B,Laptop,Q1,714,221
Store B,Laptop,Q2,566,314
Store B,Smartphone,Q1,430,558
Store B,Smartphone,Q2,187,472
Store C,Laptop,Q1,199,971
Store C,Laptop,Q2,763,230
Store C,Smartphone,Q1,761,408
Store C,Smartphone,Q2,869,443
Store D,Laptop,Q1,591,513
Store D,Laptop,Q2,905,485
Store D,Smartphone,Q1,291,376
Store D,Smartphone,Q2,260,559
Store E,Laptop,Q1,413,121
Store E,Laptop,Q2,352,847
Store E,Smartphone,Q1,956,660
Store E,Smartphone,Q2,574,158


### 📦 **Unstacking**  
**Method :** `pandas.DataFrame.unstack(...)`

#### **1. Unstacking Single-Level Index**

##### ⚠️ **Example Data**

In [13]:
df = pd.DataFrame(
    data = {
        'Math': [85, 90],
        'Science': [88, 92]
    },
    index = pd.Index(['Alice', 'Bob'], name = 'Name')
)

df

Name,Math,Science
Alice,85,88
Bob,90,92


##### ⚡ **Implementation**

In [14]:
df.unstack()

         Name 
Math     Alice    85
         Bob      90
Science  Alice    88
         Bob      92
dtype: int64

#### **2. Unstacking a Multi-Level Index**

##### ⚠️ **Example Data**

In [15]:
cities = ['New York', 'Los Angeles', 'Chicago']
years = [2022, 2023]

index = pd.MultiIndex.from_product(
    iterables = [cities, years],
    names = ['City', 'Year']
)

np.random.seed(42)
data = {
    'Laptop Sales': np.random.randint(100, 1000, size=len(index)),
    'Smartphone Sales': np.random.randint(100, 1000, size=len(index)),
    'Tablet Sales': np.random.randint(100, 1000, size=len(index))
}

multi_index_df = pd.DataFrame(
    data = data,
    index = index
)

multi_index_df

City,Year,Laptop Sales,Smartphone Sales,Tablet Sales
New York,2022,202,800,430
New York,2023,535,120,558
Los Angeles,2022,960,714,187
Los Angeles,2023,370,221,472
Chicago,2022,206,566,199
Chicago,2023,171,314,971


##### ⚡ **Implementation**

In [16]:
multi_index_df.unstack()

,Laptop Sales,Laptop Sales,Smartphone Sales,Smartphone Sales,Tablet Sales,Tablet Sales
Year,2022,2023,2022,2023,2022,2023
City,,,,,,
Chicago,206,171,566,314,199,971
Los Angeles,960,370,714,221,187,472
New York,202,535,800,120,430,558
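
With no arguments, `unstack()` moves the innermost row level (`level=-1`) to the columns, and the remaining row labels come back sorted, which is why Chicago is listed first above. A minimal sketch on made-up numbers (the frame `sales` is illustrative, not the notebook's data):

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [['New York', 'Chicago'], [2022, 2023]],
    names=['City', 'Year'],
)
sales = pd.DataFrame({'Laptop Sales': [10, 20, 30, 40]}, index=idx)

default = sales.unstack()                # level=-1, here the 'Year' level
by_name = sales.unstack(level='Year')    # the same level, referred to by name

print(default.equals(by_name))
print(list(default.index))               # ['Chicago', 'New York']
print(default.loc['Chicago', ('Laptop Sales', 2023)])   # 40
```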


In [17]:
multi_index_df.unstack(
    level = 0
)

,Laptop Sales,Laptop Sales,Laptop Sales,Smartphone Sales,Smartphone Sales,Smartphone Sales,Tablet Sales,Tablet Sales,Tablet Sales
City,Chicago,Los Angeles,New York,Chicago,Los Angeles,New York,Chicago,Los Angeles,New York
Year,,,,,,,,,
2022,206,960,202,566,714,800,199,187,430
2023,171,370,535,314,221,120,971,472,558


In [18]:
multi_index_df.unstack(
    level = [0, 1]
)

                  City         Year
Laptop Sales      New York     2022    202
                               2023    535
                  Los Angeles  2022    960
                               2023    370
                  Chicago      2022    206
                               2023    171
Smartphone Sales  New York     2022    800
                               2023    120
                  Los Angeles  2022    714
                               2023    221
                  Chicago      2022    566
                               2023    314
Tablet Sales      New York     2022    430
                               2023    558
                  Los Angeles  2022    187
                               2023    472
                  Chicago      2022    199
                               2023    971
dtype: int32

In [19]:
multi_index_df.unstack().unstack()

                  Year  City       
Laptop Sales      2022  Chicago        206
                        Los Angeles    960
                        New York       202
                  2023  Chicago        171
                        Los Angeles    370
                        New York       535
Smartphone Sales  2022  Chicago        566
                        Los Angeles    714
                        New York       800
                  2023  Chicago        314
                        Los Angeles    221
                        New York       120
Tablet Sales      2022  Chicago        199
                        Los Angeles    187
                        New York       430
                  2023  Chicago        971
                        Los Angeles    472
                        New York       558
dtype: int32

### 🚀 **MultiIndex DataFrame**

#### 📌 **Creating a MultiIndex DataFrame**

##### **Example 1**

In [20]:
regions = ['North', 'South']
stores = ['Store A', 'Store B']
years = [2022, 2023]

index = pd.MultiIndex.from_product(
    iterables = [regions, stores, years],
    names = ['Region', 'Store', 'Year']
)

data = {
    'Total Sales ($)': np.random.randint(50000, 200000, size=len(index)),
    'Profit ($)': np.random.randint(5000, 30000, size=len(index)),
    'Units Sold': np.random.randint(1000, 10000, size=len(index)),
}

df1 = pd.DataFrame(
    data = data,
    index = index
)

df1

Region,Store,Year,Total Sales ($),Profit ($),Units Sold
North,Store A,2022,66023,10051,8849
North,Store A,2023,91090,11420,3047
North,Store B,2022,117221,22568,3747
North,Store B,2023,114820,25939,1189
South,Store A,2022,50769,24769,3734
South,Store A,2023,109735,11396,4005
South,Store B,2022,114925,13666,5658
South,Store B,2023,55311,23942,2899


##### **Example 2**

In [21]:
departments = ['CSE', 'ECE']
semesters = ['Sem 1', 'Sem 2']
subjects = ['Maths', 'DSA', 'Python']

index = pd.MultiIndex.from_product(
    iterables = [departments, semesters, subjects],
    names = ['Department', 'Semester', 'Subject']
)

data = {
    'Average Score': np.random.randint(50, 90, size=len(index)),
    'Highest Score': np.random.randint(85, 100, size=len(index)),
    'Pass Percentage': np.random.randint(70, 100, size=len(index))
}

df2 = pd.DataFrame(
    data = data,
    index = index
)

df2

Department,Semester,Subject,Average Score,Highest Score,Pass Percentage
CSE,Sem 1,Maths,52,89,83
CSE,Sem 1,DSA,86,86,86
CSE,Sem 1,Python,56,88,73
CSE,Sem 2,Maths,70,96,87
CSE,Sem 2,DSA,58,99,77
CSE,Sem 2,Python,88,96,73
ECE,Sem 1,Maths,67,91,71
ECE,Sem 1,DSA,53,96,99
ECE,Sem 1,Python,74,97,75
ECE,Sem 2,Maths,63,92,91


##### **Example 3**

In [22]:
hospitals = ['Hospital A', 'Hospital B']
departments = ['Cardiology', 'Neurology']
patients = ['P001', 'P002']
row_index = pd.MultiIndex.from_product([hospitals, departments, patients], names=['Hospital', 'Department', 'Patient ID'])

years = [2022, 2023]
metrics = ['Blood Pressure', 'Cholesterol']
checkups = ['Initial', 'Final']
col_index = pd.MultiIndex.from_product([years, metrics, checkups], names=['Year', 'Metric', 'Checkup'])

data = np.random.randint(
    low = 70,
    high = 180,
    size = (len(row_index), len(col_index))
)

df3 = pd.DataFrame(
    data = data,
    index = row_index,
    columns = col_index
)

df3

,,Year,2022,2022,2022,2022,2023,2023,2023,2023
,,Metric,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol
,,Checkup,Initial,Final,Initial,Final,Initial,Final,Initial,Final
Hospital,Department,Patient ID,,,,,,,,
Hospital A,Cardiology,P001,123,162,132,87,159,113,103,143
Hospital A,Cardiology,P002,131,169,83,164,117,84,141,147
Hospital A,Neurology,P001,156,131,109,154,149,178,151,122
Hospital A,Neurology,P002,93,95,158,129,178,110,98,84
Hospital B,Cardiology,P001,114,134,158,140,78,157,70,177
Hospital B,Cardiology,P002,77,157,132,80,150,77,104,104
Hospital B,Neurology,P001,102,74,175,172,110,97,76,142
Hospital B,Neurology,P002,141,81,103,102,117,92,131,157


#### 📌 **Accessing Columns in a MultiIndex DataFrame**

In [23]:
df1['Profit ($)']

Region  Store    Year
North   Store A  2022    10051
                 2023    11420
        Store B  2022    22568
                 2023    25939
South   Store A  2022    24769
                 2023    11396
        Store B  2022    13666
                 2023    23942
Name: Profit ($), dtype: int32

In [24]:
type(df1['Profit ($)'])

pandas.core.series.Series

In [25]:
df3[2022]

,,Metric,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol
,,Checkup,Initial,Final,Initial,Final
Hospital,Department,Patient ID,,,,
Hospital A,Cardiology,P001,123,162,132,87
Hospital A,Cardiology,P002,131,169,83,164
Hospital A,Neurology,P001,156,131,109,154
Hospital A,Neurology,P002,93,95,158,129
Hospital B,Cardiology,P001,114,134,158,140
Hospital B,Cardiology,P002,77,157,132,80
Hospital B,Neurology,P001,102,74,175,172
Hospital B,Neurology,P002,141,81,103,102


In [26]:
df3[2022]['Blood Pressure']

,,Checkup,Initial,Final
Hospital,Department,Patient ID,,
Hospital A,Cardiology,P001,123,162
Hospital A,Cardiology,P002,131,169
Hospital A,Neurology,P001,156,131
Hospital A,Neurology,P002,93,95
Hospital B,Cardiology,P001,114,134
Hospital B,Cardiology,P002,77,157
Hospital B,Neurology,P001,102,74
Hospital B,Neurology,P002,141,81


In [27]:
df3[(2022, 'Blood Pressure')]

,,Checkup,Initial,Final
Hospital,Department,Patient ID,,
Hospital A,Cardiology,P001,123,162
Hospital A,Cardiology,P002,131,169
Hospital A,Neurology,P001,156,131
Hospital A,Neurology,P002,93,95
Hospital B,Cardiology,P001,114,134
Hospital B,Cardiology,P002,77,157
Hospital B,Neurology,P001,102,74
Hospital B,Neurology,P002,141,81


In [28]:
df3[2022]['Blood Pressure']['Initial']

Hospital    Department  Patient ID
Hospital A  Cardiology  P001          123
                        P002          131
            Neurology   P001          156
                        P002           93
Hospital B  Cardiology  P001          114
                        P002           77
            Neurology   P001          102
                        P002          141
Name: Initial, dtype: int32
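
Chained lookups such as `df3[2022]['Blood Pressure']['Initial']` can also be written as a single lookup with the full column key. The sketch below uses a made-up frame `readings` with the same column layout as `df3`; the values and the names `chained` and `direct` are illustrative.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product(
    [[2022, 2023], ['Blood Pressure', 'Cholesterol'], ['Initial', 'Final']],
    names=['Year', 'Metric', 'Checkup'],
)
readings = pd.DataFrame(
    np.arange(16).reshape(2, 8),     # P001: 0..7, P002: 8..15
    index=['P001', 'P002'],
    columns=cols,
)

# Three separate lookups, each returning an intermediate object
chained = readings[2022]['Blood Pressure']['Initial']

# One lookup with a tuple holding one label per column level
direct = readings[(2022, 'Blood Pressure', 'Initial')]

print(chained.tolist())    # [0, 8]
print(direct.tolist())     # [0, 8]
print(direct.name)         # the full tuple is kept as the Series name
```

The values match; only the resulting Series name differs (`'Initial'` for the chained form, the full tuple for the direct form).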

**Extracting the *Initial* columns of each health metric using a stepped slice**

In [29]:
df3.loc[:,(2022, 'Blood Pressure'):(2023, 'Cholesterol'):2]

,,Year,2022,2022,2023,2023
,,Metric,Blood Pressure,Cholesterol,Blood Pressure,Cholesterol
,,Checkup,Initial,Initial,Initial,Initial
Hospital,Department,Patient ID,,,,
Hospital A,Cardiology,P001,123,132,159,103
Hospital A,Cardiology,P002,131,83,117,141
Hospital A,Neurology,P001,156,109,149,151
Hospital A,Neurology,P002,93,158,178,98
Hospital B,Cardiology,P001,114,158,78,70
Hospital B,Cardiology,P002,77,132,150,104
Hospital B,Neurology,P001,102,175,110,76
Hospital B,Neurology,P002,141,103,117,131
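
The step of 2 works because the *Initial* columns happen to sit at every other position. As a hedged alternative, `DataFrame.xs` can select them by label instead. The sketch below uses a made-up frame `readings` with the same column layout as `df3`, and compares the label-based result with a purely positional one.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product(
    [[2022, 2023], ['Blood Pressure', 'Cholesterol'], ['Initial', 'Final']],
    names=['Year', 'Metric', 'Checkup'],
)
readings = pd.DataFrame(
    np.arange(16).reshape(2, 8),
    index=['P001', 'P002'],
    columns=cols,
)

# Positional: every second column, which relies on the column order
by_position = readings.iloc[:, ::2]

# Label-based: every column whose 'Checkup' level equals 'Initial';
# drop_level=False keeps all three column levels in the result
by_label = readings.xs('Initial', axis=1, level='Checkup', drop_level=False)

print(by_label.equals(by_position))
print(by_label.iloc[0].tolist())     # [0, 2, 4, 6]
```

The label-based form keeps working if the columns are later reordered, whereas the stepped slice would silently pick different columns.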


#### 📌 **Accessing Rows in a MultiIndex DataFrame**

**Use `loc[]` or `iloc[]` to access rows**

In [30]:
df1.loc['North']

Store,Year,Total Sales ($),Profit ($),Units Sold
Store A,2022,66023,10051,8849
Store A,2023,91090,11420,3047
Store B,2022,117221,22568,3747
Store B,2023,114820,25939,1189


In [31]:
df1.loc['North'].loc['Store A']

Year,Total Sales ($),Profit ($),Units Sold
2022,66023,10051,8849
2023,91090,11420,3047


In [32]:
df1.loc[('North', 'Store A')]

Year,Total Sales ($),Profit ($),Units Sold
2022,66023,10051,8849
2023,91090,11420,3047


**Showing only *2022* data (every other row, since the years alternate)**

In [33]:
df1.iloc[::2]

Region,Store,Year,Total Sales ($),Profit ($),Units Sold
North,Store A,2022,66023,10051,8849
North,Store B,2022,117221,22568,3747
South,Store A,2022,50769,24769,3734
South,Store B,2022,114925,13666,5658
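
`iloc[::2]` picks the 2022 rows only because 2022 and 2023 alternate in the index. A label-based sketch with `DataFrame.xs` is shown below on a made-up frame `sales` that has the same index layout as `df1`; the values are illustrative.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [['North', 'South'], ['Store A', 'Store B'], [2022, 2023]],
    names=['Region', 'Store', 'Year'],
)
sales = pd.DataFrame({'Units Sold': list(range(8))}, index=idx)

# Positional: every second row, which depends on the row order
by_position = sales.iloc[::2]

# Label-based: rows whose 'Year' level equals 2022, keeping all index levels
by_label = sales.xs(2022, level='Year', drop_level=False)

print(by_label.equals(by_position))
print(by_label['Units Sold'].tolist())   # [0, 2, 4, 6]
```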


#### 📌 **Accessing Both Rows and Columns in a MultiIndex DataFrame**

**Accessing Alternate Rows and All Columns**

In [34]:
df1.iloc[0::2]

Region,Store,Year,Total Sales ($),Profit ($),Units Sold
North,Store A,2022,66023,10051,8849
North,Store B,2022,117221,22568,3747
South,Store A,2022,50769,24769,3734
South,Store B,2022,114925,13666,5658


**Accessing All Rows and Alternate Columns**

In [35]:
df1.iloc[::,0::2]

Region,Store,Year,Total Sales ($),Units Sold
North,Store A,2022,66023,8849
North,Store A,2023,91090,3047
North,Store B,2022,117221,3747
North,Store B,2023,114820,1189
South,Store A,2022,50769,3734
South,Store A,2023,109735,4005
South,Store B,2022,114925,5658
South,Store B,2023,55311,2899


In [36]:
df3.loc[:, 0::2]

,,Year,2022,2022,2023,2023
,,Metric,Blood Pressure,Cholesterol,Blood Pressure,Cholesterol
,,Checkup,Initial,Initial,Initial,Initial
Hospital,Department,Patient ID,,,,
Hospital A,Cardiology,P001,123,132,159,103
Hospital A,Cardiology,P002,131,83,117,141
Hospital A,Neurology,P001,156,109,149,151
Hospital A,Neurology,P002,93,158,178,98
Hospital B,Cardiology,P001,114,158,78,70
Hospital B,Cardiology,P002,77,132,150,104
Hospital B,Neurology,P001,102,175,110,76
Hospital B,Neurology,P002,141,103,117,131


In [37]:
df3.loc[:,(2022, 'Blood Pressure'):(2023, 'Cholesterol'):2]

,,Year,2022,2022,2023,2023
,,Metric,Blood Pressure,Cholesterol,Blood Pressure,Cholesterol
,,Checkup,Initial,Initial,Initial,Initial
Hospital,Department,Patient ID,,,,
Hospital A,Cardiology,P001,123,132,159,103
Hospital A,Cardiology,P002,131,83,117,141
Hospital A,Neurology,P001,156,109,149,151
Hospital A,Neurology,P002,93,158,178,98
Hospital B,Cardiology,P001,114,158,78,70
Hospital B,Cardiology,P002,77,132,150,104
Hospital B,Neurology,P001,102,175,110,76
Hospital B,Neurology,P002,141,103,117,131


**Accessing the *Initial* values of each metric for *P001* only**

In [38]:
df3.iloc[0::2, 0::2]

,,Year,2022,2022,2023,2023
,,Metric,Blood Pressure,Cholesterol,Blood Pressure,Cholesterol
,,Checkup,Initial,Initial,Initial,Initial
Hospital,Department,Patient ID,,,,
Hospital A,Cardiology,P001,123,132,159,103
Hospital A,Neurology,P001,156,109,149,151
Hospital B,Cardiology,P001,114,158,78,70
Hospital B,Neurology,P001,102,175,110,76


In [39]:
df3.loc[
    ('Hospital A', 'Cardiology', 'P001')::2,
    (2022, 'Blood Pressure')::2
]

,,Year,2022,2022,2023,2023
,,Metric,Blood Pressure,Cholesterol,Blood Pressure,Cholesterol
,,Checkup,Initial,Initial,Initial,Initial
Hospital,Department,Patient ID,,,,
Hospital A,Cardiology,P001,123,132,159,103
Hospital A,Neurology,P001,156,109,149,151
Hospital B,Cardiology,P001,114,158,78,70
Hospital B,Neurology,P001,102,175,110,76
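
Both selections above depend on *P001* rows and *Initial* columns sitting at even positions. One way to express the same intent by label is to build boolean masks from `get_level_values`. This is a sketch on a smaller, made-up frame `checks`, not the notebook's `df3`.

```python
import numpy as np
import pandas as pd

rows = pd.MultiIndex.from_product(
    [['Hospital A', 'Hospital B'], ['P001', 'P002']],
    names=['Hospital', 'Patient ID'],
)
cols = pd.MultiIndex.from_product(
    [[2022, 2023], ['Initial', 'Final']],
    names=['Year', 'Checkup'],
)
checks = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# One boolean mask per axis, built from the level of interest
row_mask = checks.index.get_level_values('Patient ID') == 'P001'
col_mask = checks.columns.get_level_values('Checkup') == 'Initial'

by_label = checks.loc[row_mask, col_mask]
by_position = checks.iloc[0::2, 0::2]

print(by_label.equals(by_position))
print(by_label.to_numpy().tolist())   # [[0, 2], [8, 10]]
```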


#### 📌 **Sorting in a MultiIndex DataFrame**

**Sorting the data so that *P002* appears at the top**

In [40]:
df3.sort_index(
    axis = 0,
    level = ['Patient ID'],
    ascending = False
)

,,Year,2022,2022,2022,2022,2023,2023,2023,2023
,,Metric,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol
,,Checkup,Initial,Final,Initial,Final,Initial,Final,Initial,Final
Hospital,Department,Patient ID,,,,,,,,
Hospital B,Neurology,P002,141,81,103,102,117,92,131,157
Hospital B,Cardiology,P002,77,157,132,80,150,77,104,104
Hospital A,Neurology,P002,93,95,158,129,178,110,98,84
Hospital A,Cardiology,P002,131,169,83,164,117,84,141,147
Hospital B,Neurology,P001,102,74,175,172,110,97,76,142
Hospital B,Cardiology,P001,114,134,158,140,78,157,70,177
Hospital A,Neurology,P001,156,131,109,154,149,178,151,122
Hospital A,Cardiology,P001,123,162,132,87,159,113,103,143


**Sorting the data so that the year 2023 appears first**

In [41]:
df3.sort_index(
    axis = 1,
    level = 'Year',
    ascending = False
)

,,Year,2023,2023,2023,2023,2022,2022,2022,2022
,,Metric,Cholesterol,Cholesterol,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol,Blood Pressure,Blood Pressure
,,Checkup,Initial,Final,Initial,Final,Initial,Final,Initial,Final
Hospital,Department,Patient ID,,,,,,,,
Hospital A,Cardiology,P001,103,143,159,113,132,87,123,162
Hospital A,Cardiology,P002,141,147,117,84,83,164,131,169
Hospital A,Neurology,P001,151,122,149,178,109,154,156,131
Hospital A,Neurology,P002,98,84,178,110,158,129,93,95
Hospital B,Cardiology,P001,70,177,78,157,158,140,114,134
Hospital B,Cardiology,P002,104,104,150,77,132,80,77,157
Hospital B,Neurology,P001,76,142,110,97,175,172,102,74
Hospital B,Neurology,P002,131,157,117,92,103,102,141,81


#### 📦 **Transpose**

In [42]:
df3.transpose()

,,Hospital,Hospital A,Hospital A,Hospital A,Hospital A,Hospital B,Hospital B,Hospital B,Hospital B
,,Department,Cardiology,Cardiology,Neurology,Neurology,Cardiology,Cardiology,Neurology,Neurology
,,Patient ID,P001,P002,P001,P002,P001,P002,P001,P002
Year,Metric,Checkup,,,,,,,,
2022,Blood Pressure,Initial,123,131,156,93,114,77,102,141
2022,Blood Pressure,Final,162,169,131,95,134,157,74,81
2022,Cholesterol,Initial,132,83,109,158,158,132,175,103
2022,Cholesterol,Final,87,164,154,129,140,80,172,102
2023,Blood Pressure,Initial,159,117,149,178,78,150,110,117
2023,Blood Pressure,Final,113,84,178,110,157,77,97,92
2023,Cholesterol,Initial,103,141,151,98,70,104,76,131
2023,Cholesterol,Final,143,147,122,84,177,104,142,157


In [43]:
df3.T

,,Hospital,Hospital A,Hospital A,Hospital A,Hospital A,Hospital B,Hospital B,Hospital B,Hospital B
,,Department,Cardiology,Cardiology,Neurology,Neurology,Cardiology,Cardiology,Neurology,Neurology
,,Patient ID,P001,P002,P001,P002,P001,P002,P001,P002
Year,Metric,Checkup,,,,,,,,
2022,Blood Pressure,Initial,123,131,156,93,114,77,102,141
2022,Blood Pressure,Final,162,169,131,95,134,157,74,81
2022,Cholesterol,Initial,132,83,109,158,158,132,175,103
2022,Cholesterol,Final,87,164,154,129,140,80,172,102
2023,Blood Pressure,Initial,159,117,149,178,78,150,110,117
2023,Blood Pressure,Final,113,84,178,110,157,77,97,92
2023,Cholesterol,Initial,103,141,151,98,70,104,76,131
2023,Cholesterol,Final,143,147,122,84,177,104,142,157


#### 📦 **Swaplevel**

In [44]:
df3.swaplevel()

,,Year,2022,2022,2022,2022,2023,2023,2023,2023
,,Metric,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol
,,Checkup,Initial,Final,Initial,Final,Initial,Final,Initial,Final
Hospital,Patient ID,Department,,,,,,,,
Hospital A,P001,Cardiology,123,162,132,87,159,113,103,143
Hospital A,P002,Cardiology,131,169,83,164,117,84,141,147
Hospital A,P001,Neurology,156,131,109,154,149,178,151,122
Hospital A,P002,Neurology,93,95,158,129,178,110,98,84
Hospital B,P001,Cardiology,114,134,158,140,78,157,70,177
Hospital B,P002,Cardiology,77,157,132,80,150,77,104,104
Hospital B,P001,Neurology,102,74,175,172,110,97,76,142
Hospital B,P002,Neurology,141,81,103,102,117,92,131,157


In [45]:
df3.swaplevel(
    axis = 1
)

,,Year,2022,2022,2022,2022,2023,2023,2023,2023
,,Checkup,Initial,Final,Initial,Final,Initial,Final,Initial,Final
,,Metric,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol
Hospital,Department,Patient ID,,,,,,,,
Hospital A,Cardiology,P001,123,162,132,87,159,113,103,143
Hospital A,Cardiology,P002,131,169,83,164,117,84,141,147
Hospital A,Neurology,P001,156,131,109,154,149,178,151,122
Hospital A,Neurology,P002,93,95,158,129,178,110,98,84
Hospital B,Cardiology,P001,114,134,158,140,78,157,70,177
Hospital B,Cardiology,P002,77,157,132,80,150,77,104,104
Hospital B,Neurology,P001,102,74,175,172,110,97,76,142
Hospital B,Neurology,P002,141,81,103,102,117,92,131,157


In [46]:
df3.swaplevel(
    axis = 1,
    i = 'Year',
    j = 'Metric'
)

,,Metric,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol,Blood Pressure,Blood Pressure,Cholesterol,Cholesterol
,,Year,2022,2022,2022,2022,2023,2023,2023,2023
,,Checkup,Initial,Final,Initial,Final,Initial,Final,Initial,Final
Hospital,Department,Patient ID,,,,,,,,
Hospital A,Cardiology,P001,123,162,132,87,159,113,103,143
Hospital A,Cardiology,P002,131,169,83,164,117,84,141,147
Hospital A,Neurology,P001,156,131,109,154,149,178,151,122
Hospital A,Neurology,P002,93,95,158,129,178,110,98,84
Hospital B,Cardiology,P001,114,134,158,140,78,157,70,177
Hospital B,Cardiology,P002,77,157,132,80,150,77,104,104
Hospital B,Neurology,P001,102,74,175,172,110,97,76,142
Hospital B,Neurology,P002,141,81,103,102,117,92,131,157
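
As the outputs above suggest, `swaplevel` only exchanges the positions of two levels; it leaves the order of the rows or columns unchanged. A minimal sketch on a made-up frame `visits` with the same row-index layout as `df3` (the names `swapped`, `explicit` and `regrouped` are illustrative):

```python
import pandas as pd

rows = pd.MultiIndex.from_product(
    [['Hospital A', 'Hospital B'], ['Cardiology', 'Neurology'], ['P001', 'P002']],
    names=['Hospital', 'Department', 'Patient ID'],
)
visits = pd.DataFrame({'Reading': list(range(8))}, index=rows)

# Defaults are i=-2 and j=-1: the two innermost levels are exchanged
swapped = visits.swaplevel()
explicit = visits.swaplevel(i='Department', j='Patient ID')

print(list(swapped.index.names))   # ['Hospital', 'Patient ID', 'Department']
print(swapped.equals(explicit))
print(swapped['Reading'].tolist() == visits['Reading'].tolist())   # rows keep their order

# Sorting afterwards regroups the rows under the new level order
regrouped = swapped.sort_index()
print(regrouped.index[1])
```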


### 🚀 **Long Format vs. Wide Format Data**

#### ⚠️ **Data Warning**

##### **Reading Data into DataFrames**

In [47]:
covid19_global_deaths = pd.read_csv(
    filepath_or_buffer = '../Resources/Data/time_series_covid19_deaths_global.csv'
)

In [48]:
covid19_global_confirmed = pd.read_csv(
    filepath_or_buffer = '../Resources/Data/time_series_covid19_confirmed_global.csv'
)

#### 📦 **Melt Function**  
Converts Wide Format to Long Format

In [55]:
covid19_global_confirmed.melt()

,variable,value
0,Province/State,
1,Province/State,
2,Province/State,
3,Province/State,
4,Province/State,
...,...,...
312404,1/2/23,703228
312405,1/2/23,535
312406,1/2/23,11945
312407,1/2/23,334661


In [61]:
covid19_global_deaths.melt()

,variable,value
0,Province/State,
1,Province/State,
2,Province/State,
3,Province/State,
4,Province/State,
...,...,...
312404,1/2/23,5708
312405,1/2/23,0
312406,1/2/23,2159
312407,1/2/23,4024


In [68]:
unpivoted_covid19_global_confirmed = covid19_global_confirmed.melt(
    id_vars = ['Province/State', 'Country/Region', 'Lat', 'Long'],
    var_name = 'Date',
    value_name = 'Num_Of_Confirmed'
)
unpivoted_covid19_global_confirmed

,Province/State,Country/Region,Lat,Long,Date,Num_Of_Confirmed
0,,Afghanistan,33.939110,67.709953,1/22/20,0
1,,Albania,41.153300,20.168300,1/22/20,0
2,,Algeria,28.033900,1.659600,1/22/20,0
3,,Andorra,42.506300,1.521800,1/22/20,0
4,,Angola,-11.202700,17.873900,1/22/20,0
...,...,...,...,...,...,...
311248,,West Bank and Gaza,31.952200,35.233200,1/2/23,703228
311249,,Winter Olympics 2022,39.904200,116.407400,1/2/23,535
311250,,Yemen,15.552727,48.516388,1/2/23,11945
311251,,Zambia,-13.133897,27.849332,1/2/23,334661


In [69]:
unpivoted_covid19_global_deaths = covid19_global_deaths.melt(
    id_vars = ['Province/State', 'Country/Region', 'Lat', 'Long'],
    var_name = 'Date',
    value_name = 'Num_Of_Deaths'
)
unpivoted_covid19_global_deaths

,Province/State,Country/Region,Lat,Long,Date,Num_Of_Deaths
0,,Afghanistan,33.939110,67.709953,1/22/20,0
1,,Albania,41.153300,20.168300,1/22/20,0
2,,Algeria,28.033900,1.659600,1/22/20,0
3,,Andorra,42.506300,1.521800,1/22/20,0
4,,Angola,-11.202700,17.873900,1/22/20,0
...,...,...,...,...,...,...
311248,,West Bank and Gaza,31.952200,35.233200,1/2/23,5708
311249,,Winter Olympics 2022,39.904200,116.407400,1/2/23,0
311250,,Yemen,15.552727,48.516388,1/2/23,2159
311251,,Zambia,-13.133897,27.849332,1/2/23,4024
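
To make the reshaping easier to follow, here is a minimal sketch of `melt` on a tiny made-up table (`wide`) with one identifier column and two date columns. The values are invented for illustration and are not taken from the COVID-19 files.

```python
import pandas as pd

wide = pd.DataFrame({
    'Country/Region': ['A', 'B'],
    '1/22/20': [0, 1],
    '1/23/20': [2, 3],
})

# id_vars stay as columns; every other column becomes one (Date, value) row
long = wide.melt(
    id_vars=['Country/Region'],
    var_name='Date',
    value_name='Num_Of_Confirmed',
)

print(long.shape)                           # (4, 3): 2 rows x 2 date columns
print(long['Date'].tolist())                # ['1/22/20', '1/22/20', '1/23/20', '1/23/20']
print(long['Num_Of_Confirmed'].tolist())    # [0, 1, 2, 3]
```

The long table has one row per original row and melted column, which is why the melted COVID-19 frames above are so much longer than the wide files.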


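The code below merges the two long-format tables on their shared key columns to get the date-wise confirmed and death counts for each `Country/Region` row. No aggregation is applied, so a country that is split into several `Province/State` entries appears once per entry for each date.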

In [76]:
pd.merge(
    left = unpivoted_covid19_global_confirmed,
    right = unpivoted_covid19_global_deaths,
    on = ['Province/State', 'Country/Region', 'Lat', 'Long', 'Date'],
    how = 'inner'
)[['Country/Region', 'Date', 'Num_Of_Confirmed', 'Num_Of_Deaths']]

,Country/Region,Date,Num_Of_Confirmed,Num_Of_Deaths
0,Afghanistan,1/22/20,0,0
1,Albania,1/22/20,0,0
2,Algeria,1/22/20,0,0
3,Andorra,1/22/20,0,0
4,Angola,1/22/20,0,0
...,...,...,...,...
311248,West Bank and Gaza,1/2/23,703228,5708
311249,Winter Olympics 2022,1/2/23,535,0
311250,Yemen,1/2/23,11945,2159
311251,Zambia,1/2/23,334661,4024
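
Most `Province/State` values in these tables are missing, yet the inner merge keeps those rows: pandas matches missing values in key columns against each other when merging. The sketch below illustrates this on two tiny made-up frames (`confirmed`, `deaths`); the values are invented for illustration.

```python
import numpy as np
import pandas as pd

confirmed = pd.DataFrame({
    'Province/State': [np.nan, 'X'],
    'Country/Region': ['A', 'B'],
    'Date': ['1/22/20', '1/22/20'],
    'Num_Of_Confirmed': [5, 7],
})
deaths = pd.DataFrame({
    'Province/State': [np.nan, 'X'],
    'Country/Region': ['A', 'B'],
    'Date': ['1/22/20', '1/22/20'],
    'Num_Of_Deaths': [1, 2],
})

# Inner merge on all shared key columns, one of which contains a missing value
merged = pd.merge(
    left=confirmed,
    right=deaths,
    on=['Province/State', 'Country/Region', 'Date'],
    how='inner',
)[['Country/Region', 'Date', 'Num_Of_Confirmed', 'Num_Of_Deaths']]

print(len(merged))   # 2: the row with a missing Province/State is kept
print(merged)
```

This differs from SQL joins, where `NULL` keys never match, so it is worth keeping in mind when moving such a merge to a database.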
