# **Data Transformation**

## **1. Applying Functions**

In [11]:
import numpy as np 
import pandas as pd 

## 1. 🧠 What It Does & When to Use It

### 📌 What?

“Applying functions” means running **a custom or predefined function** over your data to transform:

* a **column (Series)**,
* a **row**, or
* an **entire DataFrame**.

This enables **custom transformations**, **data cleanup**, **feature engineering**, or applying **complex logic** that built-in functions can’t handle.


### 🎯 When to Use?

Use applying functions when:

* You need **custom logic** for transformation (e.g., classifying age groups)
* Vectorized methods are **not expressive enough**
* You want to **apply row-wise or column-wise** operations
* You need to **scale/modify/clean** values
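
As a minimal sketch of the age-group case mentioned above: the function name `age_group`, the cut-offs, and the `people` DataFrame are illustrative assumptions, not part of the original dataset.

```python
import pandas as pd

def age_group(age):
    """Hypothetical bucketing rule; the cut-offs are assumptions for illustration."""
    if age < 18:
        return "minor"
    if age < 60:
        return "adult"
    return "senior"

# Illustrative data, not the DataFrame used later in this notebook
people = pd.DataFrame({"age": [12, 25, 47, 63]})

# Series.map calls age_group once per value and collects the results
people["age_group"] = people["age"].map(age_group)
print(people)
```

Custom branching like this is where a plain Python function is often easier to read than a chain of vectorized conditions.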


## 2. 🧾 Syntax & Core Parameters

There are **three key pandas methods** for applying functions:

| Method                 | Used On     | Axis                             | Purpose                             |
| ---------------------- | ----------- | -------------------------------- | ----------------------------------- |
| `Series.map()`         | `Series`    | N/A                              | Element-wise transformation         |
| `DataFrame.apply()`    | `DataFrame` | `axis=0` (cols), `axis=1` (rows) | Row/col-wise transformation         |
| `DataFrame.applymap()` | `DataFrame` | N/A (element-wise)               | Apply function to every single cell |

Note: `DataFrame.applymap()` is deprecated as of pandas 2.1; the same operation is now spelled `DataFrame.map()`.

---

### 🔹 Basic Syntax

```python
# 1. Series.map()
series.map(function)

# 2. DataFrame.apply()
df.apply(function, axis=0)  # axis=0: once per column (default); axis=1: once per row

# 3. DataFrame.applymap()
df.applymap(function)
```


## 3. 🧰 Different Methods & Techniques

### 🔸 1. `Series.map()`

* Used only on Series
* Performs **element-wise** transformation

```python
df['col'].map(lambda x: x*2)
```

🔹 You can also pass:

* A function
* A dictionary (for mapping values)
* A Series
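
The dictionary and Series forms can be sketched as follows. The `codes` and `lookup` objects are made-up examples; the point to notice is that values with no matching key come back as `NaN`.

```python
import pandas as pd

codes = pd.Series(["a", "b", "c", "a"])

# Dictionary lookup: "c" has no entry in the dict, so it becomes NaN
by_dict = codes.map({"a": "alpha", "b": "beta"})

# Series lookup: the index of `lookup` plays the role of the dict keys
lookup = pd.Series({"a": 1, "b": 2, "c": 3})
by_series = codes.map(lookup)

print(by_dict)
print(by_series)
```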

### 🔸 2. `DataFrame.apply()`

* Works on **entire rows or columns**
* Supports **row-wise** (`axis=1`) and **column-wise** (`axis=0`) operations

```python
df.apply(lambda row: row['A'] + row['B'], axis=1)
```

Can be used to:

* Compute values based on multiple columns
* Aggregate or transform across rows/columns
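
To make the two directions concrete, here is a small sketch on a made-up `scores` DataFrame. With `axis=0` the function receives each column as a Series; with `axis=1` it receives each row.

```python
import pandas as pd

scores = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# axis=0 (the default): the lambda is called once per column
col_range = scores.apply(lambda col: col.max() - col.min(), axis=0)

# axis=1: the lambda is called once per row
row_total = scores.apply(lambda row: row["A"] + row["B"], axis=1)

print(col_range)  # indexed by column name
print(row_total)  # indexed by row label
```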



### 🔸 3. `DataFrame.applymap()`

* Use when you want to **modify every single cell** in a DataFrame

```python
df.applymap(lambda x: x.upper() if isinstance(x, str) else x)
```

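⚠️ `applymap()` exists only on DataFrames, not Series. It is deprecated as of pandas 2.1 in favor of `DataFrame.map()`, which behaves the same way.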
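
One way to touch every cell without depending on whether the installed pandas offers `applymap()` or its newer name `DataFrame.map()` is to run `Series.map()` column by column. This is a sketch of that workaround, not a replacement the pandas documentation prescribes; `upper_if_str` and `mixed` are illustrative names.

```python
import pandas as pd

def upper_if_str(value):
    # Leave non-string cells (numbers, None, ...) untouched
    return value.upper() if isinstance(value, str) else value

mixed = pd.DataFrame({"city": ["pune", "delhi"], "code": [1, 2]})

# apply() hands over one column at a time; map() then visits each cell in it
result = mixed.apply(lambda col: col.map(upper_if_str))
print(result)
```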


## 4. ⚠️ Common Pitfalls & Best Practices

### ❌ Pitfalls

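* **Performance**: `apply`, `map`, and `applymap` call a Python function once per element, row, or column, so they are usually **slower** than vectorized operations. Use them only when vectorized logic isn't sufficient.
* Using `applymap` on a whole DataFrame when only one column needs changing; `map` on that column does less work.
* Misunderstanding `axis`:

  * `axis=0` → column-wise (the function receives each column)
  * `axis=1` → row-wise (the function receives each row)
* Returning the wrong shape from the function (e.g., returning a list inside a row-wise `apply` yields a Series of lists rather than new columns)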
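
The return-shape pitfall can be seen in a short sketch. The `df_xy` DataFrame and the column names `total` and `product` are assumptions for illustration.

```python
import pandas as pd

df_xy = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

# Returning a list: the result is a single Series whose cells are lists
as_lists = df_xy.apply(
    lambda row: [row["x"] + row["y"], row["x"] * row["y"]], axis=1
)

# Returning a Series: each entry becomes a named column of a new DataFrame
as_frame = df_xy.apply(
    lambda row: pd.Series(
        {"total": row["x"] + row["y"], "product": row["x"] * row["y"]}
    ),
    axis=1,
)

# result_type="expand" also spreads a returned list across columns (0, 1, ...)
expanded = df_xy.apply(
    lambda row: [row["x"] + row["y"], row["x"] * row["y"]],
    axis=1,
    result_type="expand",
)

print(type(as_lists).__name__, type(as_frame).__name__, expanded.shape)
```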

### ✅ Best Practices

* Prefer **vectorized operations** first, then fall back to `map()`/`apply()` if necessary.
* Use **`map()`** to transform a single column (Series).
* Use **`apply()`** with `axis=1` for row-wise custom logic.
* Use **`applymap()`** only when every cell needs the same operation.
* Profile your code with `%timeit` if you're processing large data.
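
Outside a notebook, the standard-library `timeit` module can stand in for `%timeit`. The sketch below compares an `apply`-based flag with a vectorized `np.where` version on synthetic data; the data size, the helper names `with_apply` and `vectorized`, and the age threshold are all assumptions, and the actual timings depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic ages, generated only for this timing comparison
rng = np.random.default_rng(0)
big = pd.DataFrame({"age": rng.integers(18, 65, size=10_000)})

def with_apply():
    # One Python-level function call per element
    return big["age"].apply(lambda a: "YES" if a > 30 else "NO")

def vectorized():
    # A single NumPy operation over the whole column
    return pd.Series(np.where(big["age"] > 30, "YES", "NO"), index=big.index)

t_apply = timeit.timeit(with_apply, number=5)
t_vec = timeit.timeit(vectorized, number=5)
print(f"apply: {t_apply:.4f}s  vectorized: {t_vec:.4f}s")
```

Both functions produce the same labels, so the comparison isolates the cost of the per-element Python calls.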


## 5. 🧪 Examples on Real/Pseudo Data

Let’s use a small DataFrame:

In [12]:
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 32, 37],
    'salary': [50000, 60000, 80000]
}
df = pd.DataFrame(data)

df

|     | name    | age | salary |
| --- | ------- | --- | ------ |
| 0   | Alice   | 25  | 50000  |
| 1   | Bob     | 32  | 60000  |
| 2   | Charlie | 37  | 80000  |


### ▶️ `map()` Example

In [13]:
# Convert names to uppercase
df['name'] = df['name'].map(str.upper)
df

|     | name    | age | salary |
| --- | ------- | --- | ------ |
| 0   | ALICE   | 25  | 50000  |
| 1   | BOB     | 32  | 60000  |
| 2   | CHARLIE | 37  | 80000  |


### ▶️ `apply()` Column-wise Example

In [14]:
# Find max of each column
df.apply(max, axis=0)

name      CHARLIE
age            37
salary      80000
dtype: object

### ▶️ `apply()` Row-wise Example

In [15]:
# Create a new column 'senior' based on age

df['senior'] = df.apply(lambda row: 'YES' if row['age'] > 30 else 'NO', axis=1)
df

|     | name    | age | salary | senior |
| --- | ------- | --- | ------ | ------ |
| 0   | ALICE   | 25  | 50000  | NO     |
| 1   | BOB     | 32  | 60000  | YES    |
| 2   | CHARLIE | 37  | 80000  | YES    |


### ▶️ `applymap()` Example

In [16]:
# Prefix every salary value with ₹ (this turns the column into strings)
df[['salary']] = df[['salary']].applymap(lambda cell: f"₹{cell}")
df

|     | name    | age | salary | senior |
| --- | ------- | --- | ------ | ------ |
| 0   | ALICE   | 25  | ₹50000 | NO     |
| 1   | BOB     | 32  | ₹60000 | YES    |
| 2   | CHARLIE | 37  | ₹80000 | YES    |



## 6. 🌍 Real-World Use Cases

### 📦 E-commerce

* **Apply discount** dynamically to each item:

  ```python
  df['final_price'] = df['price'].apply(lambda x: x*0.9 if x > 1000 else x)
  ```

### 🧾 Finance

* **Classify salary slabs**:

  ```python
  df['slab'] = df['salary'].apply(lambda x: 'High' if x > 70000 else 'Medium' if x > 50000 else 'Low')
  ```

### 📊 HR Analytics

* Combine age and salary to create a **risk factor**:

  ```python
  df['risk'] = df.apply(lambda row: (row['age'] / row['salary']) * 1000, axis=1)
  ```
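
The three use cases above are simple enough that vectorized equivalents exist, which fits the earlier advice to try vectorized operations first. The sketch below is one possible rewrite, assuming a made-up `shop` DataFrame that carries all three input columns; `np.where` and `np.select` stand in for the lambdas.

```python
import numpy as np
import pandas as pd

# Illustrative data combining the columns used in the three use cases
shop = pd.DataFrame({
    "price": [500, 1500, 2000],
    "age": [25, 32, 37],
    "salary": [50000, 60000, 80000],
})

# E-commerce: 10% discount on items priced above 1000
shop["final_price"] = np.where(shop["price"] > 1000, shop["price"] * 0.9, shop["price"])

# Finance: conditions are checked in order, and the first match wins
conditions = [shop["salary"] > 70000, shop["salary"] > 50000]
shop["slab"] = np.select(conditions, ["High", "Medium"], default="Low")

# HR analytics: plain column arithmetic, no apply needed
shop["risk"] = shop["age"] / shop["salary"] * 1000

print(shop)
```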

## 📌 Summary Table

| Method       | Scope            | Use Case Example                                              |
| ------------ | ---------------- | ------------------------------------------------------------- |
| `map()`      | Series (column)  | Format strings, map values                                    |
| `apply()`    | Row/column       | Combine columns, apply row logic                              |
| `applymap()` | Entire DataFrame | Format all cells (deprecated in pandas 2.1; use `DataFrame.map()`) |

