Merged
5 changes: 2 additions & 3 deletions README.md
@@ -90,12 +90,11 @@ Total time: 0.0247s
| Category | Description | Examples |
|----------|-------------|----------|
| **col_** | Column operations | `col_rename`, `col_drop`, `col_cast`, `col_add`, `col_select` |
| **math_** | Arithmetic operations | `math_add`, `math_multiply`, `math_clamp`, `math_round`, `math_abs` |
| **math_** | Arithmetic & scaling | `math_add`, `math_multiply`, `math_standardize`, `math_minmax`, `math_clamp` |
| **rows_** | Row filtering & reshaping | `rows_filter`, `rows_drop_nulls`, `rows_sort`, `rows_unique`, `rows_pivot` |
| **str_** | String operations | `str_lower`, `str_upper`, `str_strip`, `str_replace`, `str_split` |
| **dt_** | Datetime operations | `dt_year`, `dt_month`, `dt_parse`, `dt_age_years`, `dt_diff_days` |
| **map_** | Value mapping | `map_values`, `map_discretize`, `map_case`, `map_from_column` |
| **enc_** | Categorical encoding | `enc_onehot`, `enc_ordinal`, `enc_label` |
| **map_** | Value mapping & encoding | `map_values`, `map_discretize`, `map_onehot`, `map_ordinal` |

## Installation

7 changes: 7 additions & 0 deletions docs/api/index.md
@@ -82,6 +82,13 @@ All TransformPlan operations at a glance. Click method names for detailed docume
| [`math_percent_of`](ops/math.md) | Calculate percentage of one column relative to another |
| [`math_cumsum`](ops/math.md) | Calculate cumulative sum (optionally grouped) |
| [`math_rank`](ops/math.md) | Calculate rank of values |
| [`math_standardize`](ops/math.md) | Z-score standardization (mean=0, std=1) |
| [`math_minmax`](ops/math.md) | Min-max normalization to a range |
| [`math_robust_scale`](ops/math.md) | Robust scaling using median and IQR |
| [`math_log`](ops/math.md) | Logarithmic transform |
| [`math_sqrt`](ops/math.md) | Square root transform |
| [`math_power`](ops/math.md) | Power transform |
| [`math_winsorize`](ops/math.md) | Clip values to percentiles or bounds |

### Row Operations

187 changes: 0 additions & 187 deletions docs/api/ops/encoding.md

This file was deleted.

57 changes: 55 additions & 2 deletions docs/api/ops/map.md
@@ -1,10 +1,10 @@
# Map Operations

Value mapping, discretization, and transformation operations.
Value mapping, discretization, encoding, and transformation operations.

## Overview

Map operations transform column values using dictionaries, bins, or other columns. They're useful for categorization, value replacement, and data normalization.
Map operations transform column values using dictionaries, bins, or encoding schemes. They're useful for categorization, value replacement, data normalization, and ML feature preparation.

```python
from transformplan import TransformPlan
@@ -13,6 +13,7 @@ plan = (
    TransformPlan()
    .map_values("status", {"A": "Active", "I": "Inactive"})
    .map_discretize("age", bins=[18, 35, 55], labels=["Young", "Adult", "Senior"])
    .map_onehot("color", categories=["red", "green", "blue"], drop="first")
)
```

@@ -29,6 +30,9 @@ plan = (
- map_null_to_value
- map_value_to_null
- map_from_column
- map_onehot
- map_ordinal
- map_label

## Examples

@@ -155,3 +159,52 @@ plan = TransformPlan().map_value_to_null("score", -999)
# Replace null with default
plan = TransformPlan().map_null_to_value("category", "Uncategorized")
```

### One-Hot Encoding

```python
# Basic one-hot encoding
plan = TransformPlan().map_onehot(
    column="color",
    categories=["red", "green", "blue"]
)
# Creates columns: color_red, color_green, color_blue

# Drop first category to avoid multicollinearity (for regression models)
plan = TransformPlan().map_onehot(
    column="color",
    categories=["red", "green", "blue"],
    drop="first"
)
# Creates columns: color_green, color_blue (drops color_red)
```
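
The effect of dropping the first category can be illustrated without the library. The sketch below is plain Python, not TransformPlan's implementation; the helper name `onehot` and its `drop_first` flag are hypothetical and only mirror the column naming shown in the comments above.

```python
def onehot(values, categories, prefix, drop_first=False):
    """Build 0/1 indicator columns for a fixed, ordered category list.

    Hypothetical helper for illustration only; not part of TransformPlan.
    """
    # With drop_first, the first category becomes the implicit baseline:
    # a row that is all zeros means "first category".
    kept = categories[1:] if drop_first else categories
    return {f"{prefix}_{cat}": [int(v == cat) for v in values] for cat in kept}


colors = ["red", "blue", "green", "red"]
full = onehot(colors, ["red", "green", "blue"], prefix="color")
reduced = onehot(colors, ["red", "green", "blue"], prefix="color", drop_first=True)

print(full)     # three indicator columns
print(reduced)  # two indicator columns; "red" rows are all zeros
```

With all three columns present, each row sums to 1, so any one column is determined by the other two. That redundancy is the multicollinearity the `drop="first"` example avoids.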

### Ordinal Encoding

```python
# Ordinal encoding with meaningful order
plan = TransformPlan().map_ordinal(
    column="size",
    categories=["small", "medium", "large"]
)
# Maps: small -> 0, medium -> 1, large -> 2
```
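
The mapping in the comment above is the position of each value in the `categories` list. A minimal plain-Python sketch of that idea follows; `ordinal_encode` is a hypothetical helper, and returning `None` for values outside the list is an assumption made here, not documented TransformPlan behavior.

```python
def ordinal_encode(values, categories):
    """Map each value to its index in the ordered category list.

    Hypothetical helper; unknown values become None (an assumption).
    """
    index = {cat: i for i, cat in enumerate(categories)}
    return [index.get(v) for v in values]


sizes = ["medium", "small", "large", "xl"]
codes = ordinal_encode(sizes, ["small", "medium", "large"])
print(codes)
```

Because the integers carry the order of the list, this encoding suits categories with a natural ranking; for unordered categories, one-hot encoding avoids implying an order.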

### Label Encoding

```python
# Label encoding (alphabetically sorted by default)
plan = TransformPlan().map_label(column="department")
# Maps alphabetically: Engineering -> 0, HR -> 1, Sales -> 2
```
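
The alphabetical mapping in the comment above can be reproduced in a few lines of plain Python. This is a sketch, not the library's code: `label_encode` is a hypothetical helper, and it uses Python's default string ordering (by code point), which may differ from the library's sort for mixed-case or non-ASCII labels.

```python
def label_encode(values):
    """Assign integer codes to distinct values in sorted order.

    Hypothetical helper; returns the mapping and the encoded list.
    """
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return mapping, [mapping[v] for v in values]


departments = ["Sales", "Engineering", "HR", "Sales"]
mapping, codes = label_encode(departments)
print(mapping)
print(codes)
```

Unlike ordinal encoding, the codes here depend on which values happen to be present, so a mapping derived from one dataset should be saved and reused if later data must be encoded consistently.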

### ML Feature Preparation

```python
# One-hot encode nominal features (dropping the first category to avoid
# multicollinearity) and ordinal-encode features with a natural order
plan = (
    TransformPlan()
    .map_onehot("color", categories=["red", "green", "blue"], drop="first")
    .map_ordinal("quality", categories=["low", "medium", "high"])
)
```
56 changes: 56 additions & 0 deletions docs/api/ops/math.md
@@ -39,6 +39,13 @@ plan = (
- math_percent_of
- math_cumsum
- math_rank
- math_standardize
- math_minmax
- math_robust_scale
- math_log
- math_sqrt
- math_power
- math_winsorize

## Examples

@@ -128,3 +135,52 @@ plan = TransformPlan().math_rank(
    group_by="category"
)
```

### Scaling Operations

```python
# Z-score standardization (explicit params for reproducibility)
plan = TransformPlan().math_standardize("income", mean=50000, std=25000)

# Derive from data
plan = TransformPlan().math_standardize("income")

# Min-max normalization to [0, 1]
plan = TransformPlan().math_minmax("age", min_val=0, max_val=100)

# Custom range
plan = TransformPlan().math_minmax("score", min_val=0, max_val=100, feature_range=(0, 10))

# Robust scaling (resistant to outliers)
plan = TransformPlan().math_robust_scale("salary", median=60000, iqr=30000)
```
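
For reference, the three scalings correspond to standard textbook formulas. The sketch below writes them out in plain Python using the same parameter names as the examples above; it is an assumption that the library computes exactly these expressions (for instance, whether a derived `std` is the population or sample estimate is not stated here).

```python
def standardize(x, mean, std):
    """Z-score: distance from the mean in units of standard deviation."""
    return (x - mean) / std


def minmax(x, min_val, max_val, feature_range=(0.0, 1.0)):
    """Rescale x from [min_val, max_val] into feature_range."""
    lo, hi = feature_range
    return lo + (x - min_val) / (max_val - min_val) * (hi - lo)


def robust_scale(x, median, iqr):
    """Center on the median and scale by the interquartile range."""
    return (x - median) / iqr


z = standardize(75000.0, mean=50000.0, std=25000.0)
unit = minmax(25.0, min_val=0.0, max_val=100.0)
custom = minmax(50.0, min_val=0.0, max_val=100.0, feature_range=(0.0, 10.0))
robust = robust_scale(90000.0, median=60000.0, iqr=30000.0)
print(z, unit, custom, robust)
```

Passing the statistics explicitly, as the first example in each pair does, keeps the transform identical across training and inference data; deriving them from the data makes the result depend on whichever dataset is being processed.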

### Transform Operations

```python
# Natural log
plan = TransformPlan().math_log("price")

# Log base 10
plan = TransformPlan().math_log("price", base=10)

# Log with offset for zeros
plan = TransformPlan().math_log("count", offset=1) # log(x + 1)

# Square root
plan = TransformPlan().math_sqrt("variance")

# Power transform
plan = TransformPlan().math_power("value", exponent=2) # square
plan = TransformPlan().math_power("value", exponent=0.5) # sqrt
```
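
The `offset` example exists because the logarithm is undefined at zero. A small standard-library sketch makes that concrete; `log_transform` is a hypothetical stand-in whose `base` and `offset` parameters mirror the examples above, and it is not the library's implementation.

```python
import math


def log_transform(x, base=None, offset=0.0):
    """Return log(x + offset), natural log unless a base is given."""
    shifted = x + offset
    return math.log(shifted) if base is None else math.log(shifted, base)


# log(0) is undefined; Python's math.log raises ValueError for it.
zero_failed = False
try:
    log_transform(0)
except ValueError:
    zero_failed = True

shifted_zero = log_transform(0, offset=1)  # log(0 + 1)
base10 = log_transform(100, base=10)       # approximately 2
print(zero_failed, shifted_zero, base10)
```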

### Outlier Handling

```python
# Winsorize by percentiles
plan = TransformPlan().math_winsorize("salary", lower=0.05, upper=0.95)

# Winsorize by explicit values
plan = TransformPlan().math_winsorize("salary", lower_value=20000, upper_value=200000)
```
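
Winsorizing clips values to bounds instead of removing rows. The plain-Python sketch below shows both variants under stated assumptions: the helper names are hypothetical, and the percentile bounds use linear interpolation between ranks, which may not match the method the library uses.

```python
def clip(values, lower, upper):
    """Clamp every value into [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def percentile(sorted_vals, q):
    """q-th quantile (0 <= q <= 1) by linear interpolation between ranks."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize_percentile(values, lower, upper):
    """Clip values to the bounds found at the given quantiles."""
    ordered = sorted(values)
    return clip(values, percentile(ordered, lower), percentile(ordered, upper))


salaries = [10_000, 30_000, 50_000, 70_000, 500_000]
by_value = clip(salaries, 20_000, 200_000)
by_percentile = winsorize_percentile(list(range(101)), 0.05, 0.95)
print(by_value)
print(min(by_percentile), max(by_percentile))
```

The row count is unchanged in both cases; only the extreme values are pulled in to the bounds.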
3 changes: 1 addition & 2 deletions docs/index.md
@@ -93,8 +93,7 @@ Total time: 0.0247s
| **rows_** | Row filtering & reshaping | `rows_filter`, `rows_drop_nulls`, `rows_sort`, `rows_unique`, `rows_pivot` |
| **str_** | String operations | `str_lower`, `str_upper`, `str_strip`, `str_replace`, `str_split` |
| **dt_** | Datetime operations | `dt_year`, `dt_month`, `dt_parse`, `dt_age_years`, `dt_diff_days` |
| **map_** | Value mapping | `map_values`, `map_discretize`, `map_case`, `map_from_column` |
| **enc_** | Categorical encoding | `enc_onehot`, `enc_ordinal`, `enc_label` |
| **map_** | Value mapping & encoding | `map_values`, `map_discretize`, `map_onehot`, `map_ordinal` |


## Getting Started
1 change: 0 additions & 1 deletion mkdocs.yml
@@ -91,4 +91,3 @@ nav:
- String Operations: api/ops/string.md
- Datetime Operations: api/ops/datetime.md
- Map Operations: api/ops/map.md
- Encoding Operations: api/ops/encoding.md