# **Data Transformation**

## **6. Value Transformations**

In [1]:
import numpy as np 
import pandas as pd 

## ✅ 1. What it does and when to use it

**Value transformations** involve modifying, replacing, or converting the data values within a DataFrame or Series. These transformations are vital in cleaning, normalizing, or preparing data for analysis or modeling.

**Use cases include:**

* Fixing incorrect or inconsistent values
* Mapping categories or labels
* Changing data types for compatibility (e.g., int → float, object → datetime)
* Normalizing or scaling values
* Performing mathematical transformations

## 🧠 2. Syntax and Core Parameters

Here are the most common transformation functions:

| Method         | Purpose                        |
| -------------- | ------------------------------ |
| `df.replace()` | Replace specific values        |
| `df.astype()`  | Convert data types             |
| `df.round()`   | Round numeric values           |
| `df.abs()`     | Absolute value                 |
| `df.clip()`    | Limit values within a range    |
| `df.fillna()`  | Fill missing values            |
| `Series.map()` | Map each value in a Series     |


## 🧪 3. Methods and Techniques

### A. **Replacing values**

```python
df['Gender'] = df['Gender'].replace({'M': 'Male', 'F': 'Female'})
```

* Can also replace numeric values:

  ```python
  df['score'] = df['score'].replace(-1, np.nan)
  ```
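
Both forms can be tried on a small, hypothetical DataFrame. The column names and values below are made up for illustration, and `-1` is assumed to be a "missing" sentinel. Because `replace()` returns a new Series, the result is assigned back:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    'Gender': ['M', 'F', 'M'],
    'score': [72, -1, 88],   # -1 is assumed to mean "no score recorded"
})

# replace() returns a new Series; assign it back to keep the change
df['Gender'] = df['Gender'].replace({'M': 'Male', 'F': 'Female'})
df['score'] = df['score'].replace(-1, np.nan)

print(df)
```

After the second call the `score` column holds floats, since `NaN` cannot be stored in an integer column.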

---

### B. **Type conversion with `astype()` and `pd.to_datetime()`**

```python
df['Year'] = df['Year'].astype(int)
df['Price'] = df['Price'].astype(float)
df['Date'] = pd.to_datetime(df['Date'])
```
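
`astype()` raises an error if any value cannot be converted. As a hedged sketch with made-up values, `pd.to_numeric(..., errors='coerce')` is one way to turn unparseable entries into `NaN` instead of failing:

```python
import pandas as pd

# Hypothetical column containing one value that is not a number
raw = pd.Series(['2019', '2020', 'unknown'])

# astype(int) cannot convert 'unknown' and raises ValueError
try:
    raw.astype(int)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' converts bad entries to NaN
years = pd.to_numeric(raw, errors='coerce')

print(astype_failed)
print(years)
```

The coerced result is a float column, so the `NaN` rows can be inspected or filled before any later cast to integers.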

---

### C. **Rounding numbers**

```python
df['Salary'] = df['Salary'].round(2)  # Round to 2 decimal places
```

---

### D. **Absolute values**

```python
df['Profit'] = df['Profit'].abs()
```

---

### E. **Clipping values**

Constrain values to a range: anything below `lower` becomes `lower`, and anything above `upper` becomes `upper`.

```python
df['score'] = df['score'].clip(lower=0, upper=100)
```
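
A small sketch with hypothetical scores shows the effect. Values inside the range are left untouched:

```python
import pandas as pd

# Hypothetical scores, two of them outside the valid 0-100 range
scores = pd.Series([-5, 42, 100, 130])

clipped = scores.clip(lower=0, upper=100)

print(clipped.tolist())  # [0, 42, 100, 100]
```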

---

### F. **Filling missing values**

```python
df['Age'] = df['Age'].fillna(df['Age'].mean())
```
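
As a minimal illustration with made-up ages, `mean()` skips missing values by default, so the fill value is computed from the known entries only:

```python
import numpy as np
import pandas as pd

# Hypothetical ages with one missing entry
ages = pd.Series([20.0, np.nan, 40.0])

# mean() ignores NaN: (20 + 40) / 2 = 30
filled = ages.fillna(ages.mean())

print(filled.tolist())  # [20.0, 30.0, 40.0]
```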

---

### G. **Mapping with functions**

```python
df['Grade'] = df['Score'].map(lambda x: 'Pass' if x > 40 else 'Fail')
```
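
`map()` also accepts a dictionary. The sketch below uses hypothetical grade codes and labels to show one behavior worth knowing: any value missing from the dictionary becomes `NaN`, whereas `replace()` would leave it unchanged.

```python
import pandas as pd

# Hypothetical grade codes; 'Z' has no entry in the mapping
codes = pd.Series(['A', 'B', 'Z'])
mapping = {'A': 'Excellent', 'B': 'Good'}

mapped = codes.map(mapping)        # 'Z' -> NaN
replaced = codes.replace(mapping)  # 'Z' stays 'Z'

print(mapped)
print(replaced)
```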


## ⚠️ 4. Common Pitfalls and Best Practices

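| Pitfall                                       | Fix                                                                                                                      |
| --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `astype()` raises on values it cannot convert | Use `pd.to_numeric(..., errors='coerce')` or `pd.to_datetime(..., errors='coerce')` to turn bad values into `NaN`/`NaT`  |
| Replacing too broadly (e.g., every 0 to NaN)  | Restrict the replacement to the columns and values that are actually invalid                                             |
| Calling `map()` on a DataFrame                | `Series.map()` works in every version; `DataFrame.map()` exists only in pandas 2.1+ (earlier versions use `applymap()`)  |
| Forgetting to assign the result back          | These methods return a new object, so write `df['col'] = ...`                                                            |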

**Best Practices:**

* Call `.copy()` on a filtered or sliced DataFrame before modifying it, to avoid `SettingWithCopyWarning`.
* Verify the result with `.head()`, `.info()`, or `.dtypes`.
* Prefer `map()` for element-wise transformations on a Series; use `apply()` on a DataFrame to work row by row or column by column.
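
The `.copy()` advice can be sketched as follows, using a hypothetical `score` column. Taking an explicit copy of the filtered rows makes it clear that later changes apply to the subset only:

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({'score': [30, 60, 90]})

# Explicit copy: 'high' is independent of 'df'
high = df[df['score'] > 50].copy()
high['score'] = high['score'] + 5

print(high['score'].tolist())  # [65, 95]
print(df['score'].tolist())    # [30, 60, 90]
```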


## 🔍 5. Examples on Real/Pseudo Data

### Example 1: Replace & Map

In [2]:
df = pd.DataFrame({
    'gender': ['M', 'F', 'M', 'F'],
    'score': [89, 45, 67, 90]
})

df

|     | gender | score |
| --- | ------ | ----- |
| 0   | M      | 89    |
| 1   | F      | 45    |
| 2   | M      | 67    |
| 3   | F      | 90    |


In [4]:
# Replace values
df['gender'] = df['gender'].replace({'M': 'Male', 'F': 'Female'})

# Map pass/fail
df['result'] = df['score'].map(lambda x: 'Pass' if x >= 50 else 'Fail')

display(df)

|     | gender | score | result |
| --- | ------ | ----- | ------ |
| 0   | Male   | 89    | Pass   |
| 1   | Female | 45    | Fail   |
| 2   | Male   | 67    | Pass   |
| 3   | Female | 90    | Pass   |


### Example 2: Type Conversion & Rounding

In [5]:
df = pd.DataFrame({
    'amount': ['100.00', '250.50', '400.99'],
    'date': ['2024-01-01', '2024-01-05', '2024-02-01']
})

df

|     | amount | date       |
| --- | ------ | ---------- |
| 0   | 100.00 | 2024-01-01 |
| 1   | 250.50 | 2024-01-05 |
| 2   | 400.99 | 2024-02-01 |


In [6]:
df.dtypes

amount    object
date      object
dtype: object

In [7]:
df['amount'] = df['amount'].astype(float)
df['amount'] = df['amount'].round(1)

df['date'] = pd.to_datetime(df['date'])

df

|     | amount | date       |
| --- | ------ | ---------- |
| 0   | 100.0  | 2024-01-01 |
| 1   | 250.5  | 2024-01-05 |
| 2   | 401.0  | 2024-02-01 |


In [8]:
df.dtypes

amount           float64
date      datetime64[ns]
dtype: object

## 🌍 6. Real-World Use Cases

| Use Case                               | Value Transformation Technique                           |
| -------------------------------------- | -------------------------------------------------------- |
| **Cleaning survey data**               | `replace()` incorrect codes with readable labels         |
| **Converting age column to numeric**   | `pd.to_numeric()` with `errors='coerce'`                 |
| **Capping outliers in financial data** | `clip(lower=x, upper=y)`                                 |
| **Creating age groups**                | Use `map()` or custom function with `apply()`            |
| **Normalizing income**                 | `df['income'] / df['income'].max()` followed by rounding |
| **Filling missing sensor values**      | `ffill()` or `fillna()` with the column mean             |
| **Fixing units mismatch**              | Multiply column with factor, then round or convert type  |


## ✅ Summary Table

| Task                        | Method              |
| --------------------------- | ------------------- |
| Replace specific values     | `replace()`         |
| Convert type                | `astype()`          |
| Round decimal numbers       | `round()`           |
| Get absolute values         | `abs()`             |
| Limit values in range       | `clip()`            |
| Fill missing values         | `fillna()`          |
| Transform with custom logic | `map()` / `apply()` |

