# Index and Columns

In [1]:
import pandas as pd
import numpy as np

## 1. Index


* The **`index`** holds the **row labels** of a pandas `Series` or `DataFrame`.
* It **identifies rows**, much like a primary key in a database table, although pandas does not require the labels to be unique.
* Labels can be integers, strings, dates, or multi-level (hierarchical).
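
The short sketch below illustrates two of the points above: index labels may repeat, and an index can hold dates. The values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the label 'a' is deliberately repeated.
s = pd.Series([10, 20, 30], index=['a', 'a', 'b'])

print(s.index.is_unique)   # False: pandas accepts duplicate labels
print(s.loc['a'])          # selecting a duplicated label returns every matching row

# A date-based index built from strings.
dates = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'])
ts = pd.Series([1.0, 2.0, 3.0], index=dates)
print(type(ts.index).__name__)  # DatetimeIndex
```

If unique labels matter for your analysis, checking `index.is_unique` after loading the data is a cheap safeguard.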

In [2]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}
df = pd.DataFrame(data)
print(df)

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


Here, `0`, `1`, and `2` are the **default row labels**: when no index is given, pandas creates a `RangeIndex` starting at 0.

### Custom index

In [4]:
df = pd.DataFrame(data, index=['a', 'b', 'c'])
print(df)

      Name  Age
a    Alice   25
b      Bob   30
c  Charlie   35


Passing `index=` assigns meaningful labels in place of the default integers; the list must contain one label per row.
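
Once rows carry custom labels, they can be selected either by label or by position. The sketch below, which reuses the same sample data, contrasts `.loc` (label-based) with `.iloc` (position-based).

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

row_by_label = df.loc['b']       # label-based lookup
row_by_position = df.iloc[1]     # position-based lookup (second row)

print(row_by_label['Name'])                   # Bob
print(row_by_label.equals(row_by_position))   # True
```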

### Index Operations

#### Accessing Index

In [5]:
df.index

Index(['a', 'b', 'c'], dtype='object')

#### Setting Index

In [7]:
df = df.set_index('Name')
df

         Age
Name
Alice     25
Bob       30
Charlie   35


`Name` is now the index and no longer a regular column; the previous `a`, `b`, `c` labels are discarded.
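
Two details of `set_index` are easy to miss: it returns a new `DataFrame` rather than changing the original, and `drop=False` keeps the column alongside the index. A minimal sketch with the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

by_name = df.set_index('Name')              # new frame; df is unchanged
kept = df.set_index('Name', drop=False)     # keep 'Name' as a column too

print(list(df.columns))            # ['Name', 'Age']
print(list(by_name.columns))       # ['Age']
print(list(kept.columns))          # ['Name', 'Age']
print(by_name.loc['Bob', 'Age'])   # 30
```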

#### Resetting Index

In [10]:
df = df.reset_index()
df

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
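
By default `reset_index()` moves the old labels into a column. When the old labels are not needed, `drop=True` discards them instead. The sketch below shows both behaviours on a frame with an unnamed custom index.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
                  index=['a', 'b', 'c'])

as_column = df.reset_index()            # old labels move into a column named 'index'
discarded = df.reset_index(drop=True)   # old labels are thrown away

print(list(as_column.columns))   # ['index', 'Name', 'Age']
print(list(discarded.columns))   # ['Name', 'Age']
print(list(discarded.index))     # [0, 1, 2]
```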


#### Renaming Index

In [13]:
df.index = ['X', 'Y', 'Z']
df

      Name  Age
X    Alice   25
Y      Bob   30
Z  Charlie   35
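
Assigning to `df.index` replaces every label at once, so the list must have one entry per row. To change only some labels, or to name the index itself, `rename(index=...)` and `rename_axis()` are possible alternatives. In this sketch the name `row_id` is a hypothetical choice.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Relabel only rows 0 and 2; row 1 keeps its label.
renamed = df.rename(index={0: 'X', 2: 'Z'})
print(list(renamed.index))   # ['X', 1, 'Z']

# Give the index itself a name (the labels are unchanged).
named = df.rename_axis('row_id')
print(named.index.name)      # row_id
```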


#### MultiIndex (Hierarchical Index)

In [16]:
df = pd.DataFrame({
    'City': ['New York', 'New York', 'London', 'London'],
    'Year': [2020, 2021, 2020, 2021],
    'Sales': [250, 270, 300, 320]
})

df = df.set_index(['City', 'Year'])
df

               Sales
City     Year
New York 2020    250
         2021    270
London   2020    300
         2021    320


✅ A MultiIndex is useful for time series, grouped analysis, and panel data.
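
A sketch of how rows might be selected from the `(City, Year)` MultiIndex built above: by outer label, by a full tuple, or by a cross-section on the inner level.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'New York', 'London', 'London'],
    'Year': [2020, 2021, 2020, 2021],
    'Sales': [250, 270, 300, 320]
}).set_index(['City', 'Year'])

london = df.loc['London']                      # all rows for one outer label
one_cell = df.loc[('London', 2021), 'Sales']   # a single (City, Year) pair
by_year = df.xs(2020, level='Year')            # cross-section on the inner level

print(list(london.index))       # [2020, 2021]
print(one_cell)                 # 320
print(list(by_year['Sales']))   # [250, 300]
```

For larger frames, calling `sort_index()` first is generally advisable, since slicing a MultiIndex works best on sorted labels.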

**Index in Real-World Data Science**

| Scenario                    | Index Usage                |
| --------------------------- | -------------------------- |
| Time series forecasting     | Index = datetime           |
| Log or transaction data     | Index = unique ID          |
| Geo/City-based data         | Index = city/state codes   |
| Panel data (multi-variable) | MultiIndex: (region, year) |
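
As a sketch of the first scenario in the table, a datetime index allows rows to be selected with a partial date string. The sales figures below are invented for illustration.

```python
import pandas as pd

# Hypothetical daily sales figures, for illustration only.
sales = pd.DataFrame(
    {'Sales': [100, 120, 90, 110]},
    index=pd.to_datetime(['2024-01-30', '2024-01-31', '2024-02-01', '2024-02-02'])
)

february = sales.loc['2024-02']   # partial-string selection of a whole month
print(len(february))              # 2
print(february['Sales'].sum())    # 200
```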

## 2. Columns

* The **columns** are the **labeled fields/variables/features** in a `DataFrame`.
* Each column is a **pandas Series** (1D).
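
The sketch below checks the second point directly: single brackets return a `Series`, while a list of column names returns a `DataFrame`, even when the list has one entry.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

age_series = df['Age']     # single brackets: a Series
age_frame = df[['Age']]    # double brackets: a one-column DataFrame

print(type(age_series).__name__)           # Series
print(type(age_frame).__name__)            # DataFrame
print(age_series.index.equals(df.index))   # True: the column shares the row index
```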

In [20]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}
df = pd.DataFrame(data)
print(df)

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


In [21]:
df.columns

Index(['Name', 'Age'], dtype='object')

### Column Operations

#### Access a Column

In [24]:
df['Age']

0    25
1    30
2    35
Name: Age, dtype: int64

#### Access Multiple Columns

In [25]:
df[['Age', 'Name']]

   Age     Name
0   25    Alice
1   30      Bob
2   35  Charlie


#### Rename Columns

In [28]:
df.rename(columns={'Age': 'Years'}, inplace=True)
df

      Name  Years
0    Alice     25
1      Bob     30
2  Charlie     35
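
`inplace=True` modifies `df` directly. A common alternative is to leave it out and work with the returned `DataFrame`, as in this sketch. Note also that mapping keys that match no column are ignored by default; `Height` here is a hypothetical name that does not exist in the frame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

renamed = df.rename(columns={'Age': 'Years'})   # returns a new DataFrame

print(list(df.columns))        # ['Name', 'Age']  (original unchanged)
print(list(renamed.columns))   # ['Name', 'Years']

# Keys that match no column are ignored by default.
same = df.rename(columns={'Height': 'Cm'})
print(list(same.columns))      # ['Name', 'Age']
```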


#### Change All Column Names

In [32]:
df.columns = ['FullName', 'Age']
df

   FullName  Age
0     Alice   25
1       Bob   30
2   Charlie   35


#### Add a New Column

In [35]:
df['Salary'] = [50000, 60000, 70000]
df

   FullName  Age  Salary
0     Alice   25   50000
1       Bob   30   60000
2   Charlie   35   70000
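
Bracket assignment always appends the new column at the end. Two other options are sketched below: `insert()` to choose the position, and `assign()` to get a new `DataFrame` back. `MonthlySalary` is a hypothetical column name used only for this example.

```python
import pandas as pd

df = pd.DataFrame({'FullName': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# insert() modifies df in place and places the column at position 1.
df.insert(1, 'Salary', [50000, 60000, 70000])
print(list(df.columns))   # ['FullName', 'Salary', 'Age']

# assign() returns a new DataFrame; the new column is derived from an existing one.
with_monthly = df.assign(MonthlySalary=df['Salary'] / 12)
print(list(with_monthly.columns))  # ['FullName', 'Salary', 'Age', 'MonthlySalary']
```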


#### Drop a Column

In [37]:
df.drop('Age', axis=1, inplace=True)
df

   FullName  Salary
0     Alice   50000
1       Bob   60000
2   Charlie   70000
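
The same drop can be written with the `columns=` keyword, which avoids having to remember the `axis` number. This sketch also shows `errors='ignore'`, which skips missing labels; `Bonus` is a hypothetical column that does not exist in the frame.

```python
import pandas as pd

df = pd.DataFrame({
    'FullName': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
})

trimmed = df.drop(columns=['Age'])   # same effect as drop('Age', axis=1)
print(list(trimmed.columns))         # ['FullName', 'Salary']
print(list(df.columns))              # ['FullName', 'Age', 'Salary']  (original unchanged)

# errors='ignore' skips labels that are not present instead of raising KeyError.
safe = df.drop(columns=['Bonus'], errors='ignore')
print(list(safe.columns))            # ['FullName', 'Age', 'Salary']
```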


**Columns in Real-World Data Science**

| Use Case            | Columns Represent                     |
| ------------------- | ------------------------------------- |
| Health dataset      | Patient name, age, disease, diagnosis |
| Sales dataset       | Product name, price, region, date     |
| NLP dataset         | Sentence, word count, sentiment score |
| Forecasting dataset | Date, sales, temperature              |

# 📌 Summary Table

| Feature  | Index                                | Columns                                   |
| -------- | ------------------------------------ | ----------------------------------------- |
| Role     | Labels for rows                      | Labels for variables/features             |
| Type     | `pd.Index` or `pd.MultiIndex`        | `pd.Index`                                |
| Access   | `df.index`                           | `df.columns`, `df['Col']`, `df.Col`       |
| Modify   | `df.set_index()`, `df.reset_index()` | `df.rename()`, `df.drop()`, `df.insert()` |
| Real Use | Time, location, ID                   | Features used in modeling or analysis     |
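
One caveat about the `df.Col` form listed in the table: attribute access only works when the column name is a valid Python identifier and does not clash with an existing `DataFrame` attribute. The column names in this sketch are hypothetical and chosen to show both failure cases.

```python
import pandas as pd

# Hypothetical column names chosen to show where attribute access breaks down.
df = pd.DataFrame({'Age': [25, 30], 'Full Name': ['Alice', 'Bob'], 'count': [1, 2]})

print(df.Age.tolist())            # [25, 30]  (valid identifier, no name clash)
print(df['Full Name'].tolist())   # ['Alice', 'Bob']  (the space rules out attribute access)
print(callable(df.count))         # True: df.count is the DataFrame method, not the column
print(df['count'].tolist())       # [1, 2]
```

Bracket access works for every column name, which makes it the safer default.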
