# **Data Selection & Indexing**

In [1]:
import pandas as pd

# 8. Column Selection Techniques & Axis Operations

Column selection and axis operations are essential for **navigating**, **transforming**, and **analyzing** pandas data structures, especially in real-world datasets with many features.


## 📋 Topics Covered

| Subsection | Concept                                         |
| ---------- | ----------------------------------------------- |
| 1.        | Column Selection by Label                       |
| 2.        | Selecting Multiple Columns                      |
| 3.        | Column Selection by Data Type (`select_dtypes`) |
| 4.        | Column Selection by Regex                       |
| 5.        | Column Selection by Position                    |
| 6.        | Dropping Columns                                |
| 7.        | Axis-based Operations (`axis=0` vs `axis=1`)    |
| 8.        | Transposing DataFrames                          |
| 9.        | Working with `columns` attribute                |
| 10.       | Dynamic Column Selection                        |

## 1. Column Selection by Label

### ✅ Syntax:

```python
df['column_name']
```

In [2]:
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

df

Unnamed: 0,name,age,salary
0,Alice,25,50000
1,Bob,30,60000
2,Charlie,35,70000


In [4]:
df['age']

0    25
1    30
2    35
Name: age, dtype: int64
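A single label returns a `Series`, while a one-element list returns a one-column `DataFrame`. The short sketch below rebuilds the same sample frame so it runs on its own; the variable names `age_series`, `age_frame`, and `age_loc` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

age_series = df['age']       # single label -> Series
age_frame = df[['age']]      # list with one label -> DataFrame
age_loc = df.loc[:, 'age']   # explicit label-based form, also a Series

print(type(age_series).__name__)  # Series
print(type(age_frame).__name__)   # DataFrame
print(age_frame.shape)            # (3, 1)
```

The distinction matters when a later step expects two-dimensional input, such as a feature matrix.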

## 2. Selecting Multiple Columns

In [5]:
df[['name', 'age']]

Unnamed: 0,name,age
0,Alice,25
1,Bob,30
2,Charlie,35


### 📌 Real-World Use:

Select only the columns needed for ML model features or data export.
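As a rough sketch of that workflow, the list passed to `df[...]` controls both which columns are returned and their order. The names `feature_cols`, `wanted`, and the missing `'bonus'` column are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

# Columns come back in the order given in the list
feature_cols = ['salary', 'age']          # hypothetical feature list
features = df[feature_cols]

# A label that does not exist raises KeyError, so filter the list first
wanted = ['age', 'salary', 'bonus']       # 'bonus' is not in df
present = [c for c in wanted if c in df.columns]
safe = df[present]

print(list(features.columns))  # ['salary', 'age']
print(present)                 # ['age', 'salary']
```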

## 3. Column Selection by Data Type

### ✅ Using `select_dtypes()`:

In [6]:
df.select_dtypes(include='number')

Unnamed: 0,age,salary
0,25,50000
1,30,60000
2,35,70000


In [7]:
df.select_dtypes(include='object')

Unnamed: 0,name
0,Alice
1,Bob
2,Charlie


In [8]:
df.select_dtypes(exclude='number')

Unnamed: 0,name
0,Alice
1,Bob
2,Charlie


### 📌 Real-World Use:

Keep only numeric columns for correlation matrix or scaling.
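One way this might look in practice, assuming the same sample frame: select the numeric columns once, then compute a correlation matrix and a simple z-score scaling on that subset. The manual scaling formula is a stand-in for whatever scaler a real pipeline would use.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

numeric = df.select_dtypes(include='number')   # drops the string column

# Pairwise Pearson correlation between the numeric columns
corr = numeric.corr()

# Simple z-score scaling (sample standard deviation, ddof=1)
scaled = (numeric - numeric.mean()) / numeric.std()

print(corr)
print(scaled)
```

In this toy data `age` and `salary` increase in lockstep, so their correlation is 1.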

## 4. Column Selection by Regex Pattern

### ✅ Using `filter()`:

In [9]:
# Select columns ending with 'e'
df.filter(regex='e$')

Unnamed: 0,name,age
0,Alice,25
1,Bob,30
2,Charlie,35


### 📌 Real-World Use:

Select columns matching naming conventions like `Q1_sales`, `Q2_sales`, etc.
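A minimal sketch of that naming-convention case follows. The `sales` frame and its column names are made up for illustration; only `filter(regex=...)` and `filter(like=...)` are pandas features.

```python
import pandas as pd

# Hypothetical quarterly data, invented for this example
sales = pd.DataFrame({
    'region': ['North', 'South'],
    'Q1_sales': [100, 150],
    'Q2_sales': [120, 130],
    'Q1_returns': [5, 8],
})

# Regex: a 'Q', one digit, then '_sales' as the whole column name
quarterly_sales = sales.filter(regex=r'^Q\d_sales$')

# like= is a plain substring match, no regex needed
q1_cols = sales.filter(like='Q1')

print(list(quarterly_sales.columns))  # ['Q1_sales', 'Q2_sales']
print(list(q1_cols.columns))          # ['Q1_sales', 'Q1_returns']
```

Anchoring the pattern with `^` and `$` avoids accidental matches, since an unanchored pattern matches anywhere in the column name.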

## 5. Column Selection by Position

### ✅ Using `.iloc`:

In [10]:
df.iloc[:, [0, 2]]

Unnamed: 0,name,salary
0,Alice,50000
1,Bob,60000
2,Charlie,70000
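Beyond a list of positions, `.iloc` also accepts slices and negative indices. The sketch below shows a few common forms on the same sample frame; `columns.get_loc` is included as one way to look up a position from a label.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

first_two = df.iloc[:, :2]      # slice: positions 0 and 1 (end is exclusive)
last_col = df.iloc[:, -1]       # single negative position -> Series
every_other = df.iloc[:, ::2]   # step of 2: positions 0 and 2

# Translate a label into its integer position
salary_pos = df.columns.get_loc('salary')

print(list(first_two.columns))    # ['name', 'age']
print(last_col.name)              # salary
print(list(every_other.columns))  # ['name', 'salary']
print(salary_pos)                 # 2
```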


## 6. Dropping Columns

### ✅ Using `drop()`:

In [11]:
df.drop('age', axis=1)

Unnamed: 0,name,salary
0,Alice,50000
1,Bob,60000
2,Charlie,70000


In [12]:
# To drop multiple columns:
df.drop(['age', 'salary'], axis=1)

Unnamed: 0,name
0,Alice
1,Bob
2,Charlie
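Two details are easy to miss with `drop()`: it returns a new DataFrame and leaves the original untouched, and the `columns=` keyword is an equivalent spelling of `axis=1`. The sketch below also uses `errors='ignore'` to skip labels that are absent; `'bonus'` is a hypothetical column that does not exist in `df`.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

without_age = df.drop(columns='age')     # same result as df.drop('age', axis=1)
same = df.drop('age', axis=1)

# Missing labels normally raise KeyError; errors='ignore' skips them
tolerant = df.drop(columns=['age', 'bonus'], errors='ignore')

print(list(without_age.columns))  # ['name', 'salary']
print(list(df.columns))           # ['name', 'age', 'salary'] (original unchanged)
print(list(tolerant.columns))     # ['name', 'salary']
```

To keep the change, assign the result back, for example `df = df.drop(columns='age')`.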


## 7. Axis-based Operations (`axis=0` vs `axis=1`)

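| Axis                   | Direction of the operation          | Result               |
| ---------------------- | ----------------------------------- | -------------------- |
| `axis=0` (`'index'`)   | Down the rows, within each column   | One value per column |
| `axis=1` (`'columns'`) | Across the columns, within each row | One value per row    |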

In [13]:
df.sum(axis=0) # Sum of each column

name      AliceBobCharlie
age                    90
salary             180000
dtype: object

In [15]:
try:
    print(df.sum(axis=1))  # Sum of each row; raises here because 'name' holds strings
except Exception as e:
    print(e)

can only concatenate str (not "int") to str


In [16]:
df[['age', 'salary']].sum(axis=1) # Sum of each row

0    50025
1    60030
2    70035
dtype: int64
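Instead of selecting the numeric columns by hand, the `numeric_only=True` argument of `sum()` is another option worth knowing. The sketch below applies it along both axes on the same sample frame.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

# axis=0: collapse the rows, one total per numeric column
col_totals = df.sum(axis=0, numeric_only=True)

# axis=1: collapse the columns, one total per row (string column skipped)
row_totals = df.sum(axis=1, numeric_only=True)

print(col_totals)   # age 90, salary 180000
print(row_totals)   # 50025, 60030, 70035
```

The string aliases `axis='index'` and `axis='columns'` are accepted in place of `0` and `1` and can make the intent easier to read.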

## 8. Transposing DataFrames

### ✅ `.T` for transpose:

In [17]:
df.T

Unnamed: 0,0,1,2
name,Alice,Bob,Charlie
age,25,30,35
salary,50000,60000,70000


This flips rows ↔ columns.
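One side effect to keep in mind: after transposing a frame with mixed column types, each new column holds a string and two integers, so pandas falls back to the generic `object` dtype. The sketch below checks this on the sample frame and contrasts it with a purely numeric subset.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

t = df.T
print(t.shape)    # (3, 3)
print(t.dtypes)   # every column is object

# A numeric-only frame keeps its integer dtype when transposed
numeric_t = df[['age', 'salary']].T
print(numeric_t.dtypes)
```

If numeric operations are needed after a transpose, selecting the numeric columns first avoids the dtype fallback.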

## 9. Working with `columns` attribute

### ✅ Rename all columns at once:

In [19]:
df.columns = ['Names', 'Age', 'Salary']

df

Unnamed: 0,Names,Age,Salary
0,Alice,25,50000
1,Bob,30,60000
2,Charlie,35,70000


### ✅ Add prefix or suffix:

In [22]:
df.add_prefix('emp_')

Unnamed: 0,emp_Names,emp_Age,emp_Salary
0,Alice,25,50000
1,Bob,30,60000
2,Charlie,35,70000


In [21]:
df.add_suffix('_2023')

Unnamed: 0,Names_2023,Age_2023,Salary_2023
0,Alice,25,50000
1,Bob,30,60000
2,Charlie,35,70000
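Assigning to `df.columns` replaces every label and requires a list of exactly the right length. When only some labels need to change, `rename()` with a mapping or a function is a common alternative. The sketch below assumes the renamed frame from this section; the target name `'Name'` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Names': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
})

# Rename a single column via a mapping; other labels are left alone
renamed = df.rename(columns={'Names': 'Name'})

# Apply a function to every label
lowered = df.rename(columns=str.lower)

print(list(renamed.columns))  # ['Name', 'Age', 'Salary']
print(list(lowered.columns))  # ['names', 'age', 'salary']
print(list(df.columns))       # ['Names', 'Age', 'Salary'] (original unchanged)
```

Like `add_prefix()` and `add_suffix()`, `rename()` returns a new DataFrame rather than modifying `df`.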


## 10. Dynamic Column Selection (Advanced)

### ✅ Based on condition:

In [24]:
df

Unnamed: 0,Names,Age,Salary
0,Alice,25,50000
1,Bob,30,60000
2,Charlie,35,70000


In [38]:
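# Note: mean(axis=1) averages across each row, so this mask filters rows, not columns.
# No row average exceeds 55000, hence the empty result.
df[df[['Salary', 'Age']].mean(axis=1) > 55000]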

Unnamed: 0,Names,Age,Salary


In [43]:
try:
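    # Fails: the boolean mask is indexed by ['Salary', 'Age'] only,
    # while df has three columns, so the labels cannot be aligned.
    print(df.loc[:, df[['Salary', 'Age']].mean() > 55000])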
except Exception as e:
    print(e)

Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
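The error arises because the boolean Series covers only `Salary` and `Age`, while `df` has three columns. One possible fix, sketched below under the assumption that the goal is to keep columns whose mean exceeds 55000, is to turn the condition into a list of column labels (or a mask spanning all columns) before indexing. The names `col_means`, `keep`, and `mask` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Names': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
})

# Column means over the numeric columns only
col_means = df.select_dtypes(include='number').mean()

# Option 1: convert the condition into column labels
keep = col_means[col_means > 55000].index
selected = df[keep]

# Option 2: build a boolean mask with one entry per column of df
mask = df.columns.isin(keep)
selected_via_mask = df.loc[:, mask]

print(list(selected.columns))  # ['Salary']
```

Both options return the same one-column frame here, since only `Salary` has a mean above the threshold.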


## ✅ Real-World Use Cases

| Scenario                                      | Tool                        |
| --------------------------------------------- | --------------------------- |
| Drop unwanted columns before model training   | `df.drop()`                 |
| Select numeric columns for PCA                | `df.select_dtypes()`        |
| Dynamic selection by pattern (like `sales_`)  | `df.filter(regex='sales_')` |
| Axis control in aggregations                  | `.sum(axis=1)`              |
| Rename all columns for clarity                | `df.columns = [...]`        |


## ✅ Summary Table

| Technique           | Method                 | Example                      |
| ------------------- | ---------------------- | ---------------------------- |
| Select column       | `df['col']`            | `df['salary']`               |
| Select multiple     | `df[['col1', 'col2']]` | `df[['age', 'salary']]`      |
| Select by dtype     | `select_dtypes()`      | `df.select_dtypes('number')` |
| Regex column select | `filter(regex=...)`    | `df.filter(regex='^Q')`      |
| Drop column         | `drop(axis=1)`         | `df.drop('col', axis=1)`     |
| Transpose           | `.T`                   | `df.T`                       |
| Add prefix/suffix   | `add_prefix()` / `add_suffix()` | `df.add_prefix('emp_')` |
| Axis operations     | `axis=0`/`axis=1`      | `df.sum(axis=1)`             |

