# **Data Transformation**

## **2. Lambda Functions**

In [1]:
import numpy as np 
import pandas as pd 


## 1. 🧠 What It Does & When to Use It

### 📌 What?

A **lambda function** is an **anonymous**, **inline function** defined using the `lambda` keyword. It is used to **apply simple operations** without defining a named function.

In pandas, lambda functions are commonly used with:

* `apply()`
* `map()`
* `applymap()` (deprecated since pandas 2.1, where it was renamed to `DataFrame.map()`)
* `groupby().apply()`

They allow **quick, custom, and often one-time-use logic** for transforming data.
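As a rough sketch of how these methods differ in what they hand to the lambda, the example below uses a small hypothetical `scores` DataFrame (not part of the original notebook). The `hasattr` check is an assumption-driven workaround so the element-wise call runs on both older and newer pandas versions.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
scores = pd.DataFrame({"math": [70, 85], "art": [90, 60]})

# Series.map / Series.apply: the lambda receives one value at a time.
math_plus_five = scores["math"].map(lambda x: x + 5)

# DataFrame.apply: the lambda receives a whole column (axis=0 is the default).
column_ranges = scores.apply(lambda col: col.max() - col.min())

# Element-wise over a whole DataFrame: DataFrame.map in pandas >= 2.1,
# DataFrame.applymap in older versions.
elementwise = scores.map if hasattr(scores, "map") else scores.applymap
passed = elementwise(lambda x: x >= 70)

print(math_plus_five.tolist())
print(column_ranges.to_dict())
print(passed)
```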

---

### 🎯 When to Use?

Use lambda functions when:

* You want to **apply short, simple logic** on the fly
* You **don’t need to reuse the function**
* You want to **write concise and readable code** during data transformation
* You're working with **custom conditional logic**


## 2. 🧾 Syntax & Core Parameters

### 🔹 Basic Syntax

```python
lambda arguments: expression
```

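* No `def` or `return` statement needed; the value of the expression is returned automatically
* The body must be a single expression (no assignments, loops, or other statements)
* Can have multiple parameters (but typically 1 in pandas use cases)
* Usually paired with `apply()`/`map()` in pandas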
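To make the syntax concrete, here is a minimal sketch comparing a named function with the equivalent lambda. The names `double`, `double_lambda`, and `add` are illustrative only.

```python
# A named function and the equivalent lambda (illustrative names).
def double(x):
    return x * 2

double_lambda = lambda x: x * 2  # the expression's value is returned implicitly

# A lambda can take more than one argument.
add = lambda a, b: a + b

print(double(4), double_lambda(4), add(2, 3))
```

Binding a lambda to a name is done here only for comparison; in practice a lambda is usually passed directly as an argument.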

---

### 🔸 Example:

```python
lambda x: x * 2
```

Can be used like:

```python
df['col'].apply(lambda x: x * 2)
```


## 3. 🧰 Methods & Techniques

Lambda functions are most often used in pandas with:

### 🔹 A. `Series.map()` with lambda

```python
df['name'].map(lambda x: x.upper())
```

---

### 🔹 B. `DataFrame.apply()` with lambda

#### Column-wise:

```python
df.apply(lambda col: col.mean(), axis=0)
```

#### Row-wise:

```python
df.apply(lambda row: row['A'] + row['B'], axis=1)
```

---

### 🔹 C. Conditional assignment with lambda

```python
df['status'] = df['age'].apply(lambda x: 'Senior' if x > 60 else 'Adult')
```

---

### 🔹 D. Lambda with `groupby().apply()`

```python
df.groupby('gender')['salary'].apply(lambda x: x.mean())
```
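The group-wise pattern can be sketched on a small hypothetical `staff` DataFrame (the data and variable names are assumptions for illustration). The lambda receives each group's `salary` Series; for a plain mean, the built-in aggregation gives the same result with less code.

```python
import pandas as pd

# Hypothetical data, for illustration only.
staff = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "salary": [52000, 48000, 61000, 55000],
})

# The lambda receives each group's salary Series; here: spread within the group.
salary_range = staff.groupby("gender")["salary"].apply(lambda s: s.max() - s.min())

# For a plain mean, the built-in aggregation matches the lambda version.
mean_builtin = staff.groupby("gender")["salary"].mean()
mean_lambda = staff.groupby("gender")["salary"].apply(lambda s: s.mean())

print(salary_range.to_dict())
print(mean_builtin.to_dict())
```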


## 4. ⚠️ Common Pitfalls & Best Practices

### ❌ Pitfalls

| Problem             | Explanation                                                                |
| ------------------- | -------------------------------------------------------------------------- |
| Overusing lambdas   | A lambda is limited to a single expression, and nested conditionals quickly become unreadable. Use named functions instead. |
| Performance         | `apply()` with a lambda runs Python code once per element or row, so it is usually much slower than the equivalent vectorized operation. |
| Forgetting `axis=1` | `DataFrame.apply` defaults to `axis=0` and passes columns; a lambda that expects a row (e.g. `row['A']`) then typically raises a `KeyError`. |
| Using side effects  | Avoid print/logging inside lambdas; they're meant to return values.        |
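The `axis=1` pitfall can be reproduced with a short sketch; the `frame` DataFrame below is hypothetical and exists only to trigger the error.

```python
import pandas as pd

frame = pd.DataFrame({"A": [1, 2], "B": [10, 20]})  # hypothetical data

# Correct: axis=1 passes each row, so 'A' and 'B' are valid labels in it.
row_sums = frame.apply(lambda row: row["A"] + row["B"], axis=1)

# Without axis=1 the lambda receives each column (indexed 0, 1),
# so looking up the label 'A' fails.
try:
    frame.apply(lambda row: row["A"] + row["B"])
    failed = False
except KeyError:
    failed = True

print(row_sums.tolist(), failed)
```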

### ✅ Best Practices

* Use lambdas for **short** and **simple expressions**
* Prefer **vectorized operations** where possible
* Use **named functions** once the logic needs more than one simple expression (for example, several branches)
* Comment lambdas if their intent isn’t obvious
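A minimal sketch of these practices, assuming a hypothetical `people` DataFrame: a vectorized expression replacing a simple lambda, `np.where` replacing an if/else lambda, and a named function (`age_band`, with illustrative thresholds) for multi-branch logic.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only.
people = pd.DataFrame({"age": [25, 32, 67], "salary": [50000, 60000, 80000]})

# Lambda + apply: runs Python code once per value.
doubled_apply = people["salary"].apply(lambda x: x * 2)
# Vectorized equivalent: one operation on the whole column.
doubled_vec = people["salary"] * 2

# Vectorized conditional with np.where instead of an if/else lambda.
status = np.where(people["age"] > 60, "Senior", "Adult")

# Named function for multi-branch logic (thresholds are illustrative).
def age_band(age):
    """Return a coarse age band for a single age value."""
    if age < 30:
        return "Young"
    if age <= 60:
        return "Adult"
    return "Senior"

bands = people["age"].apply(age_band)

print(doubled_vec.tolist(), list(status), bands.tolist())
```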


## 5. 🧪 Examples on Real/Pseudo Data

Let's use the same sample DataFrame:

In [2]:
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 32, 67],
    'salary': [50000, 60000, 80000]
}
df = pd.DataFrame(data)

df

|   | name    | age | salary |
| - | ------- | --- | ------ |
| 0 | Alice   | 25  | 50000  |
| 1 | Bob     | 32  | 60000  |
| 2 | Charlie | 67  | 80000  |


### ▶️ 1. Simple Transformation

In [3]:
# Double the salary (Series.apply passes one value at a time, so x is a scalar)
df['salary_double'] = df['salary'].apply(lambda x: x * 2)
df

|   | name    | age | salary | salary_double |
| - | ------- | --- | ------ | ------------- |
| 0 | Alice   | 25  | 50000  | 100000        |
| 1 | Bob     | 32  | 60000  | 120000        |
| 2 | Charlie | 67  | 80000  | 160000        |


### ▶️ 2. Conditional Labeling

In [4]:
# Flag people as senior if age > 60 (the comparison already returns True/False)
df['is_senior'] = df['age'].apply(lambda age: age > 60)
df

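|   | name    | age | salary | salary_double | is_senior |
| - | ------- | --- | ------ | ------------- | --------- |
| 0 | Alice   | 25  | 50000  | 100000        | False     |
| 1 | Bob     | 32  | 60000  | 120000        | False     |
| 2 | Charlie | 67  | 80000  | 160000        | True      |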


### ▶️ 3. Row-wise Operation with Multiple Columns

In [5]:
# Calculate income-to-age ratio
df['income_ratio'] = df.apply(lambda row: row['salary']/row['age'], axis=1)
df

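|   | name    | age | salary | salary_double | is_senior | income_ratio |
| - | ------- | --- | ------ | ------------- | --------- | ------------ |
| 0 | Alice   | 25  | 50000  | 100000        | False     | 2000.0       |
| 1 | Bob     | 32  | 60000  | 120000        | False     | 1875.0       |
| 2 | Charlie | 67  | 80000  | 160000        | True      | 1194.029851  |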


### ▶️ 4. String Cleaning

In [6]:
# Uppercase names using map (the lambda receives one name string at a time)

df['name_upper'] = df['name'].map(lambda name: name.upper())
df

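|   | name    | age | salary | salary_double | is_senior | income_ratio | name_upper |
| - | ------- | --- | ------ | ------------- | --------- | ------------ | ---------- |
| 0 | Alice   | 25  | 50000  | 100000        | False     | 2000.0       | ALICE      |
| 1 | Bob     | 32  | 60000  | 120000        | False     | 1875.0       | BOB        |
| 2 | Charlie | 67  | 80000  | 160000        | True      | 1194.029851  | CHARLIE    |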


### ▶️ 5. Group-wise Application

In [8]:
# Build a grouping key with custom logic: label each salary as 'high' or 'low'

df['salary_level'] = df['salary'].apply(lambda x: 'high' if x > 70000 else 'low')
df

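|   | name    | age | salary | salary_double | is_senior | income_ratio | name_upper | salary_level |
| - | ------- | --- | ------ | ------------- | --------- | ------------ | ---------- | ------------ |
| 0 | Alice   | 25  | 50000  | 100000        | False     | 2000.0       | ALICE      | low          |
| 1 | Bob     | 32  | 60000  | 120000        | False     | 1875.0       | BOB        | low          |
| 2 | Charlie | 67  | 80000  | 160000        | True      | 1194.029851  | CHARLIE    | high         |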
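The cell above only creates the label column. A possible next step, sketched here on a standalone copy of the sample data (named `sample` so it does not overwrite `df`), is to group by that label and apply a lambda per group.

```python
import pandas as pd

# Rebuild the sample data so the sketch runs on its own.
sample = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 32, 67],
    "salary": [50000, 60000, 80000],
})
sample["salary_level"] = sample["salary"].apply(
    lambda x: "high" if x > 70000 else "low"
)

# Average age within each salary level; the lambda receives one group's ages.
avg_age = sample.groupby("salary_level")["age"].apply(lambda s: s.mean())

print(avg_age.to_dict())
```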



## 6. 🌍 Real-World Use Cases

### 🛍️ E-commerce

```python
# Create discount flag
df['discount_flag'] = df['price'].apply(lambda x: 'Yes' if x > 1000 else 'No')
```

### 💰 Finance

```python
# Simple credit risk score: loan-to-income ratio (0 when income is not positive)
df['risk_score'] = df.apply(lambda row: row['loan'] / row['income'] if row['income'] > 0 else 0, axis=1)
```
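For larger tables, the same ratio could be computed without a row-wise lambda. The sketch below assumes a hypothetical `loans` DataFrame with `loan` and `income` columns and compares both approaches.

```python
import pandas as pd

# Hypothetical data, for illustration only; one row has zero income.
loans = pd.DataFrame({"loan": [20000, 5000, 10000], "income": [50000, 0, 40000]})

# Row-wise lambda, as in the snippet above.
by_row = loans.apply(
    lambda row: row["loan"] / row["income"] if row["income"] > 0 else 0, axis=1
)

# Vectorized alternative: divide whole columns, then replace the ratio
# with 0 wherever income is not positive.
ratio = loans["loan"].div(loans["income"])
vectorized = ratio.where(loans["income"] > 0, 0)

print(by_row.tolist())
print(vectorized.tolist())
```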

### 🧬 Healthcare

```python
# Categorize blood sugar levels
df['sugar_category'] = df['blood_sugar'].apply(lambda x: 'Normal' if x < 120 else 'High')
```

### 🧾 HR Analytics

```python
# Seniority level from experience
df['seniority'] = df['experience'].apply(lambda x: 'Senior' if x > 10 else 'Junior')
```

## ✅ Summary

| Feature            | Notes                                                                              |
| ------------------ | ---------------------------------------------------------------------------------- |
| Anonymous Function | Created using `lambda` keyword                                                     |
| Use with           | `.apply()`, `.map()`, `.applymap()` (deprecated since pandas 2.1), `groupby().apply()` |
| Best for           | Quick, custom one-line logic                                                       |
| Avoid for          | Complex or multi-step logic                                                        |


<center><b>Thanks</b></center>