In [2]:
import pandas as pd
"""
- This is a dictionary of column → values
- Each list represents a column
- All lists must be the same length → here 4 rows
"""
raw_data2 = {
    'id': [7, 8, 9, 10],
    'city': ['Berlin', 'Munich', 'London', 'Amsterdam'],
    'rank': ['21st', '32nd', '23rd', '44th'],
    'population': ['10M', '14M', '20M', '6M']
}

df2 = pd.DataFrame(
    raw_data2,
    index=pd.Index(['A', 'B', 'C', 'D'], name='letter'),
    columns=pd.Index(['id', 'city', 'rank', 'population'], name='attributes'),
)

df2

attributes,id,city,rank,population
letter,,,,
A,7,Berlin,21st,10M
B,8,Munich,32nd,14M
C,9,London,23rd,20M
D,10,Amsterdam,44th,6M


Note: In the DataFrame above,

- the row index has a name: `letter`
- the column index has a name: `attributes`

The `name` parameter assigns a label (metadata) to the Index object.
It does not affect data, computation, or performance; it is purely semantic information.
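A minimal sketch of this point, using a small hypothetical two-row frame rather than `df2`: the axis names can be read and changed without touching the values.

```python
import pandas as pd

# Hypothetical two-row frame, built only for this illustration
df = pd.DataFrame(
    {"id": [7, 8], "city": ["Berlin", "Munich"]},
    index=pd.Index(["A", "B"], name="letter"),
)

print(df.index.name)    # letter
print(df.columns.name)  # None (no name was given to the column axis)

# rename_axis changes only the axis labels, not the data
renamed = df.rename_axis(index="code", columns="attributes")
print(renamed.index.name, renamed.columns.name)  # code attributes

# The underlying values are identical before and after renaming
values_unchanged = bool((renamed.to_numpy() == df.to_numpy()).all())
print(values_unchanged)  # True
```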

Every DataFrame has two axes:
*  Axis 0 → Row index
*  Axis 1 → Column index

Each axis is represented internally as an `Index` object.
The `name` attribute is:
* a descriptor for that axis
* used in display, reshaping, joins, groupby, and exports
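As a rough illustration of how axis names carry through reshaping, the sketch below stacks a small hypothetical frame. The names `letter` and `attributes` end up as the level names of the resulting MultiIndex and can then be used to unstack by name.

```python
import pandas as pd

# Hypothetical frame with both axes named
df = pd.DataFrame(
    {"id": [7, 8], "city": ["Berlin", "Munich"]},
    index=pd.Index(["A", "B"], name="letter"),
    columns=pd.Index(["id", "city"], name="attributes"),
)

# Both axes are Index objects
row_axis, col_axis = df.axes
print(isinstance(row_axis, pd.Index), isinstance(col_axis, pd.Index))  # True True

# Stacking moves the column axis into the row index;
# the axis names become the MultiIndex level names
stacked = df.stack()
level_names = list(stacked.index.names)

# The level name can be used to reverse the operation
restored = stacked.unstack("attributes")
```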

What happens when the index is reset?
    df2.reset_index()

- If the index has a name, that name becomes the name of the new column.
- If the index has no name, the new column is called `index`.
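The two cases can be compared side by side. This is a small sketch with hypothetical frames, one with a named index and one without.

```python
import pandas as pd

# Same data, once with a named index and once with an unnamed one
named = pd.DataFrame(
    {"city": ["Berlin", "Munich"]},
    index=pd.Index(["A", "B"], name="letter"),
)
unnamed = pd.DataFrame({"city": ["Berlin", "Munich"]}, index=["A", "B"])

named_cols = named.reset_index().columns.tolist()
unnamed_cols = unnamed.reset_index().columns.tolist()

print(named_cols)    # ['letter', 'city']
print(unnamed_cols)  # ['index', 'city']
```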

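Row index names are preserved when the DataFrame is exported (Parquet, CSV, SQL): a named index is written as a column with that name.

Axis names:
- Are stored in the Parquet file as pandas-specific schema metadata
- Are preserved in Arrow in the same way
- Reach downstream systems (Spark, Athena, BigQuery) as ordinary column names for the row index; the column-axis name (`attributes`) is pandas-level metadata, so do not rely on other engines to read it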
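A minimal sketch of the export behaviour for CSV and SQL, assuming an in-memory buffer and an in-memory SQLite database; the table name `cities` is made up for this example. Parquet is left out so the snippet does not depend on an optional engine such as pyarrow.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical frame with a named row index
df = pd.DataFrame(
    {"city": ["Berlin", "Munich"]},
    index=pd.Index(["A", "B"], name="letter"),
)

# CSV: the index name becomes the header of the first column
csv_text = df.to_csv()
header = csv_text.splitlines()[0]
print(header)  # letter,city

# Reading it back by name restores the named index
restored = pd.read_csv(io.StringIO(csv_text), index_col="letter")

# SQL: the index is written as a column that carries the index name
conn = sqlite3.connect(":memory:")
df.to_sql("cities", conn)
sql_columns = pd.read_sql("SELECT * FROM cities", conn).columns.tolist()
conn.close()
print(sql_columns)  # ['letter', 'city']
```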

#### Naming the column index is a data engineering best practice
columns = pd.Index(
    ['id', 'city', 'rank', 'population'],
    name='attributes'
)

This:
- Documents schema intent
- Fixes the column order at construction time, because `columns=` selects and orders the dictionary keys
- Helps schema validation & data contracts
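One way this could be used for a lightweight schema check is sketched below. The helper `check_schema` and the constant `EXPECTED_COLUMNS` are hypothetical names introduced for this example, not part of pandas.

```python
import pandas as pd

# Hypothetical "contract": expected columns, their order, and the axis name
EXPECTED_COLUMNS = pd.Index(["id", "city", "rank", "population"], name="attributes")

# Dictionary keys deliberately in a different order
raw = {
    "population": ["10M", "14M"],
    "city": ["Berlin", "Munich"],
    "id": [7, 8],
    "rank": ["21st", "32nd"],
}

# columns= selects the keys and puts them in the declared order
df = pd.DataFrame(raw, columns=EXPECTED_COLUMNS)
print(df.columns.tolist())  # ['id', 'city', 'rank', 'population']


def check_schema(frame: pd.DataFrame, expected: pd.Index) -> bool:
    """Hypothetical helper: True if labels, order, and axis name all match."""
    # Index.equals compares the labels only, so the name is checked separately
    return frame.columns.equals(expected) and frame.columns.name == expected.name


print(check_schema(df, EXPECTED_COLUMNS))  # True
```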

This matters when:
- Building data platforms
- Working with ML features
- Designing schemas
- Using groupby / pivot / stack
- Exporting to Parquet / Spark / SQL


In a MultiIndex, level names allow:

- Clean `groupby(level='region')` calls
- Clear SQL-like semantics
- Easier debugging in large data platforms



In [None]:
pd.MultiIndex.from_product(
    [['EU', 'UK'], [2023, 2024]],
    names=['region', 'year']
)

In [3]:
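df = df2.copy()
df['region'] = ['EU', 'EU', 'UK', 'EU']
# Make 'region' the outer index level and keep the existing 'letter' index as the inner level
df = df.set_index(['region', df.index])
# Name both levels explicitly so they can be referenced by name
df.index.names = ['region', 'letter']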


In [4]:
df.groupby(level='region').count()

attributes,id,city,rank,population
region,,,,
EU,3,3,3,3
UK,1,1,1,1


### Required for advanced index types

Specialised index behaviour is provided by `pd.Index` subclasses:

| Index Type         | Example                |
| ------------------ | ---------------------- |
| `DatetimeIndex`    | Time series            |
| `CategoricalIndex` | Memory-optimized enums |
| `MultiIndex`       | OLAP-style data        |
| `RangeIndex`       | Fast integer ranges    |

pandas uses `Index` objects because alignment, joins, slicing, and broadcasting rely on labeled, immutable axes. Explicitly constructing a `pd.Index` lets you attach metadata (such as a name), declare the expected schema, and use the advanced index types listed above.
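The sketch below constructs one instance of each index type from the table and shows immutability, label-based slicing, and alignment. All values and names (`day`, `level`, `row`) are illustrative assumptions.

```python
import pandas as pd

# One example of each index type from the table above
dates = pd.date_range("2024-01-01", periods=3, freq="D", name="day")
cats = pd.CategoricalIndex(
    ["low", "high", "low"], categories=["low", "high"], name="level"
)
multi = pd.MultiIndex.from_product(
    [["EU", "UK"], [2023, 2024]], names=["region", "year"]
)
rng = pd.RangeIndex(start=0, stop=4, name="row")

# Every one of them is a pd.Index subclass
all_are_index = all(isinstance(idx, pd.Index) for idx in (dates, cats, multi, rng))

# Index objects are immutable: item assignment raises TypeError
try:
    rng[0] = 10
    is_immutable = False
except TypeError:
    is_immutable = True

# Label-based slicing on a DatetimeIndex
ts = pd.Series([1.0, 2.0, 3.0], index=dates)
tail = ts.loc["2024-01-02":].tolist()
print(tail)  # [2.0, 3.0]

# Alignment: values are matched by label, not by position
a = pd.Series([1, 2], index=pd.Index(["x", "y"], name="key"))
b = pd.Series([10, 20], index=pd.Index(["y", "x"], name="key"))
total = (a + b).to_dict()
print(total)  # {'x': 21, 'y': 12}
```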
