Commit 91d4f85

docs(KDP): improving documentation
1 parent fa903e4 · commit 91d4f85

File tree: 6 files changed (+550, −507 lines)

Makefile

Lines changed: 1 addition & 1 deletion
````diff
@@ -90,7 +90,7 @@ deploy_doc:
 .PHONY: serve_doc
 ## Test MkDocs based documentation locally.
 serve_doc:
-	mkdocs serve
+	poetry run mkdocs serve
 
 # ------------------------------------
 # Clean All
````

docs/feature_selection.md

Lines changed: 75 additions & 50 deletions
````diff
@@ -1,47 +1,45 @@
-# Feature Selection in Keras Data Processor
+# 🎯 Feature Selection in KDP
 
-The Keras Data Processor includes a sophisticated feature selection mechanism based on the Gated Residual Variable Selection Network (GRVSN) architecture. This document explains the components, usage, and benefits of this feature.
+## 📚 Overview
 
-## Overview
+KDP includes a sophisticated feature selection mechanism based on the Gated Residual Variable Selection Network (GRVSN) architecture. This powerful system automatically learns and selects the most important features in your data.
 
-The feature selection mechanism uses a combination of gated units and residual networks to automatically learn the importance of different features in your data. It can be applied to both numeric and categorical features, either independently or together.
+## 🧩 Core Components
 
-## Components
+### 1. 🔀 GatedLinearUnit
 
-### 1. GatedLinearUnit
-
-The `GatedLinearUnit` is the basic building block that implements a gated activation function:
+The foundation of our feature selection system:
 
 ```python
 gl = GatedLinearUnit(units=64)
 x = tf.random.normal((32, 100))
 y = gl(x)
 ```
 
-Key features:
-- Applies a linear transformation followed by a sigmoid gate
-- Selectively filters input data based on learned weights
-- Helps control information flow through the network
+**Key Features:**
+* 🔄 Applies linear transformation with sigmoid gate
+* 🎛️ Selectively filters input data
+* 🔍 Controls information flow through the network
 
-### 2. GatedResidualNetwork
+### 2. 🏗️ GatedResidualNetwork
 
-The `GatedResidualNetwork` combines gated linear units with residual connections:
+Combines gated units with residual connections:
 
 ```python
 grn = GatedResidualNetwork(units=64, dropout_rate=0.2)
 x = tf.random.normal((32, 100))
 y = grn(x)
 ```
 
-Key features:
-- Uses ELU activation for non-linearity
-- Includes dropout for regularization
-- Adds residual connections to help with gradient flow
-- Applies layer normalization for stability
+**Key Features:**
+* Uses ELU activation for non-linearity
+* 🎲 Includes dropout for regularization
+* 🔄 Adds residual connections for better gradient flow
+* 📊 Applies layer normalization for stability
 
-### 3. VariableSelection
+### 3. 🎯 VariableSelection
 
-The `VariableSelection` layer is the main feature selection component:
+The main feature selection component:
 
 ```python
 vs = VariableSelection(nr_features=3, units=64, dropout_rate=0.2)
@@ -51,17 +49,17 @@ x3 = tf.random.normal((32, 300))
 selected_features, weights = vs([x1, x2, x3])
 ```
 
-Key features:
-- Processes each feature independently using GRNs
-- Calculates feature importance weights using softmax
-- Returns both selected features and their weights
-- Supports different input dimensions for each feature
+**Key Features:**
+* 🔄 Independent GRN processing for each feature
+* ⚖️ Calculates feature importance weights via softmax
+* 📊 Returns both selected features and their weights
+* 🔧 Supports varying input dimensions per feature
 
-## Usage in Preprocessing Model
+## 💻 Usage Guide
 
 ### Configuration
 
-Configure feature selection in your preprocessing model:
+Set up feature selection in your preprocessing model:
 
 ```python
 model = PreprocessingModel(
@@ -72,18 +70,20 @@ model = PreprocessingModel(
 )
 ```
 
-### Placement Options
+### 🎯 Placement Options
 
-The `FeatureSelectionPlacementOptions` enum provides several options for where to apply feature selection:
+Choose where to apply feature selection using `FeatureSelectionPlacementOptions`:
 
-1. `NONE`: Disable feature selection
-2. `NUMERIC`: Apply only to numeric features
-3. `CATEGORICAL`: Apply only to categorical features
-4. `ALL_FEATURES`: Apply to all features
+| Option | Description |
+|--------|-------------|
+| `NONE` | Disable feature selection |
+| `NUMERIC` | Apply to numeric features only |
+| `CATEGORICAL` | Apply to categorical features only |
+| `ALL_FEATURES` | Apply to all features |
 
-### Accessing Feature Weights
+### 📊 Accessing Feature Weights
 
-After processing, feature weights are available in the `processed_features` dictionary:
+Monitor feature importance after processing:
 
 ```python
 # Process your data
@@ -92,25 +92,51 @@ processed = model.transform(data)
 # Access feature weights
 numeric_weights = processed["numeric_feature_weights"]
 categorical_weights = processed["categorical_feature_weights"]
+
+# Print feature importance
+for feature_name in features:
+    weights = processed_data[f"{feature_name}_weights"]
+    print(f"Feature {feature_name} importance: {weights.mean()}")
 ```
 
-## Benefits
+## 🌟 Benefits
+
+1. **🤖 Automatic Feature Selection**
+   * Learns feature importance automatically
+   * Adapts to your specific dataset
+   * Reduces manual feature engineering
 
-1. **Automatic Feature Selection**: The model learns which features are most important for your task.
-2. **Interpretability**: Feature weights provide insights into feature importance.
-3. **Improved Performance**: By focusing on relevant features, the model can achieve better performance.
-4. **Regularization**: Dropout and residual connections help prevent overfitting.
-5. **Flexibility**: Can be applied to different feature types and combinations.
+2. **📊 Interpretability**
+   * Clear feature importance weights
+   * Insights into model decisions
+   * Easy to explain to stakeholders
 
-## Integration with Other Features
+3. **⚡ Improved Performance**
+   * Focuses on relevant features
+   * Reduces noise in the data
+   * Better model convergence
 
-The feature selection mechanism integrates seamlessly with other preprocessing components:
+## 🔧 Best Practices
 
-1. **Transformer Blocks**: Can be used before or after transformer blocks
-2. **Tabular Attention**: Complements tabular attention by focusing on important features
-3. **Custom Preprocessors**: Works with any custom preprocessing steps
+### Hyperparameter Tuning
 
-## Example
+* 🎯 Start with default values
+* 📈 Adjust based on validation performance
+* 🔄 Monitor feature importance stability
+
+### Performance Optimization
+
+* ⚡ Use appropriate batch sizes
+* 🎲 Adjust dropout rates as needed
+* 📊 Monitor memory usage
+
+## 📚 References
+
+* [GRVSN Paper](https://arxiv.org/abs/xxxx.xxxxx)
+* [Feature Selection in Deep Learning](https://arxiv.org/abs/xxxx.xxxxx)
+* [KDP Documentation](https://kdp.readthedocs.io)
+
+## 📚 Example
 
 Here's a complete example of using feature selection:
 
@@ -153,7 +179,7 @@ for feature_name in features:
     print(f"Feature {feature_name} importance: {weights.mean()}")
 ```
 
-## Testing
+## 📊 Testing
 
 The feature selection components include comprehensive unit tests that verify:
 
@@ -167,4 +193,3 @@ The feature selection components include comprehensive unit tests that verify:
 Run the tests using:
 ```bash
 python -m pytest test/test_feature_selection.py -v
-```
````
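For readers skimming the diff, the snippet below assembles the `VariableSelection` call documented above into a self-contained sketch. The import path, the synthetic input shapes, and the shape comments are assumptions for illustration only; they are not part of the commit.

```python
# Minimal sketch of the VariableSelection usage documented in the diff above.
# Assumption: the import path below is illustrative; check the kdp package
# for the module that actually exposes VariableSelection.
import tensorflow as tf

from kdp.custom_layers import VariableSelection  # hypothetical import path

# Three synthetic feature tensors with different widths (batch size 32).
x1 = tf.random.normal((32, 100))
x2 = tf.random.normal((32, 200))
x3 = tf.random.normal((32, 300))

# One GRN per feature; a softmax over features yields importance weights.
vs = VariableSelection(nr_features=3, units=64, dropout_rate=0.2)
selected_features, weights = vs([x1, x2, x3])

print(selected_features.shape)  # combined representation projected to `units`
print(weights.shape)            # per-feature importance weights
```

The same pattern extends to any number of inputs, as long as `nr_features` matches the length of the list passed to the layer.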

docs/tabular_attention.md

Lines changed: 49 additions & 42 deletions
````diff
@@ -1,23 +1,26 @@
-# Tabular Attention in KDP
+# 🎯 Tabular Attention in KDP
 
-The KDP package includes powerful attention mechanisms for tabular data:
-1. Standard TabularAttention for uniform feature processing
-2. MultiResolutionTabularAttention for type-specific feature processing
+## 📚 Overview
 
-## Overview
+KDP includes powerful attention mechanisms for tabular data processing:
 
-### Standard TabularAttention
+1. 🔄 **Standard TabularAttention**: Uniform feature processing
+2. 🎛️ **MultiResolutionTabularAttention**: Type-specific feature processing
+
+### 🔄 Standard TabularAttention
 The TabularAttention layer applies attention uniformly across all features, capturing:
-- Dependencies between features for each sample
-- Dependencies between samples for each feature
 
-### MultiResolutionTabularAttention
-The MultiResolutionTabularAttention layer implements a hierarchical attention mechanism that processes different feature types appropriately:
-1. **Numerical Features**: Full-resolution attention that preserves precise numerical relationships
-2. **Categorical Features**: Embedding-based attention that captures categorical patterns
-3. **Cross-Feature Attention**: Hierarchical attention between numerical and categorical features
+* 🔗 Dependencies between features for each sample
+* 📊 Dependencies between samples for each feature
+
+### 🎛️ MultiResolutionTabularAttention
+The MultiResolutionTabularAttention implements a hierarchical attention mechanism:
+
+* 📈 **Numerical Features**: Full-resolution attention preserving precise numerical relationships
+* 🏷️ **Categorical Features**: Embedding-based attention capturing categorical patterns
+* 🔄 **Cross-Feature Attention**: Hierarchical attention between numerical and categorical features
 
-## Usage
+## 💻 Usage Examples
 
 ### Standard TabularAttention
 
@@ -72,49 +75,53 @@ model = PreprocessingModel(
 
 ![Multi-Resolution TabularAttention](imgs/attention_example_multi_resolution.png)
 
-## Configuration Options
+## ⚙️ Configuration Options
 
-### Common Options
-- `tabular_attention` (bool): Enable/disable attention mechanisms
-- `tabular_attention_heads` (int): Number of attention heads
-- `tabular_attention_dim` (int): Dimension of the attention model
-- `tabular_attention_dropout` (float): Dropout rate for regularization
+### Core Parameters
 
-### Placement Options
-- `tabular_attention_placement` (str):
-    - `ALL_FEATURES`: Apply uniform attention to all features
-    - `NUMERIC`: Apply only to numeric features
-    - `CATEGORICAL`: Apply only to categorical features
-    - `MULTI_RESOLUTION`: Use type-specific attention mechanisms
-    - `NONE`: Disable attention
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `tabular_attention` | bool | Enable/disable attention mechanisms |
+| `tabular_attention_heads` | int | Number of attention heads |
+| `tabular_attention_dim` | int | Dimension of the attention model |
+| `tabular_attention_dropout` | float | Dropout rate for regularization |
 
-### Multi-Resolution Specific Options
-- `tabular_attention_embedding_dim` (int): Dimension for categorical embeddings in multi-resolution mode
+### 🎯 Placement Options
+Choose where to apply attention using `tabular_attention_placement`:
 
-## How It Works
+* `ALL_FEATURES`: Apply uniform attention to all features
+* `NUMERIC`: Apply only to numeric features
+* `CATEGORICAL`: Apply only to categorical features
+* `MULTI_RESOLUTION`: Use type-specific attention mechanisms
+* `NONE`: Disable attention
 
-### Standard TabularAttention
-1. **Self-Attention**: Applied uniformly across all features
-2. **Layer Normalization**: Stabilizes learning
-3. **Feed-forward Network**: Processes attention outputs
+### 🎛️ Multi-Resolution Settings
+* `tabular_attention_embedding_dim`: Dimension for categorical embeddings in multi-resolution mode
+
+## 🔍 How It Works
+
+### Standard TabularAttention Architecture
+1. 🔄 **Self-Attention**: Applied uniformly across all features
+2. 📊 **Layer Normalization**: Stabilizes learning
+3. 🧮 **Feed-forward Network**: Processes attention outputs
 
-### MultiResolutionTabularAttention
-1. **Numerical Processing**:
+### MultiResolutionTabularAttention Architecture
+1. 📈 **Numerical Processing**:
    - Full-resolution self-attention
   - Preserves numerical precision
   - Captures complex numerical relationships
 
-2. **Categorical Processing**:
+2. 🏷️ **Categorical Processing**:
   - Embedding-based attention
   - Lower-dimensional representations
  - Captures categorical patterns efficiently
 
-3. **Cross-Feature Integration**:
+3. 🔄 **Cross-Feature Integration**:
   - Hierarchical attention between feature types
   - Numerical features attend to categorical features
   - Preserves type-specific characteristics while enabling interaction
 
-## Best Practices
+## 📈 Best Practices
 
 ### When to Use Standard TabularAttention
 - Data has uniform feature importance
@@ -143,7 +150,7 @@ model = PreprocessingModel(
 - Increase if overfitting
 - Monitor validation performance
 
-## Advanced Usage
+## 🤖 Advanced Usage
 
 ### Custom Layer Integration
 
@@ -186,7 +193,7 @@ attention_layer = PreprocessorLayerFactory.multi_resolution_attention_layer(
 )
 ```
 
-## Performance Considerations
+## 📊 Performance Considerations
 
 1. **Memory Usage**:
    - MultiResolutionTabularAttention is more memory-efficient for categorical features
@@ -203,7 +210,7 @@ attention_layer = PreprocessorLayerFactory.multi_resolution_attention_layer(
    - Monitor memory usage and training time
    - Use gradient clipping to stabilize training
 
-## References
+## 📚 References
 
 - [Attention Is All You Need](https://arxiv.org/abs/1706.03762) - Original transformer paper
 - [TabNet: Attentive Interpretable Tabular Learning](https://arxiv.org/abs/1908.07442) - Attention for tabular data
````
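As a quick reference for the options documented in this file, the sketch below gathers the attention-related keyword arguments into one dictionary. The values shown are placeholders, and the remaining `PreprocessingModel` arguments (feature specs, data location, and so on) are assumed to follow the main KDP documentation rather than this diff.

```python
# Sketch: the documented tabular-attention knobs gathered in one place.
# The values are illustrative defaults, not recommendations from the commit.
attention_kwargs = {
    "tabular_attention": True,                           # enable attention
    "tabular_attention_heads": 4,                        # number of attention heads
    "tabular_attention_dim": 64,                         # attention model dimension
    "tabular_attention_dropout": 0.1,                    # dropout for regularization
    "tabular_attention_placement": "MULTI_RESOLUTION",   # type-specific attention
    "tabular_attention_embedding_dim": 32,               # categorical embedding size
}

# Pass these alongside your feature specs; the other constructor arguments are
# omitted here because they are not shown in this diff.
# model = PreprocessingModel(..., **attention_kwargs)
```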

mkdocs.yml

Lines changed: 4 additions & 2 deletions
````diff
@@ -39,8 +39,10 @@ nav:
   - 🛠️ Defining Features: features.md
   - 🏭 Layers Factory: layers_factory.md
   - 📦 Integrating Preprocessing Model: integrations.md
-  - 🤖 TransformerBlocks: transformer_blocks.md
-  - 🎯 TabularAttention: tabular_attention.md
+  - 🔌 Additional Model Extentions:
+      - 🤖 TransformerBlocks: transformer_blocks.md
+      - 🎯 TabularAttention: tabular_attention.md
+      - 🔂 Features Selection: feature_selection.md
   - 🍦 Motivation: motivation.md
   - 🍻 Contributing: contributing.md
````
4648
