# Tabular Attention in KDP

The KDP package includes two attention mechanisms for tabular data:
1. Standard TabularAttention for uniform feature processing
2. MultiResolutionTabularAttention for type-specific feature processing

## Overview

### Standard TabularAttention
The TabularAttention layer applies attention uniformly across all features, capturing:
- Dependencies between features for each sample
- Dependencies between samples for each feature
### MultiResolutionTabularAttention
The MultiResolutionTabularAttention layer implements a hierarchical attention mechanism that processes each feature type at a resolution suited to it:
1. **Numerical Features**: Full-resolution attention that preserves precise numerical relationships
2. **Categorical Features**: Embedding-based attention that captures categorical patterns
3. **Cross-Feature Attention**: Hierarchical attention between numerical and categorical features

## Usage

### Standard TabularAttention

```python
from kdp.processor import PreprocessingModel, TabularAttentionPlacementOptions

model = PreprocessingModel(
    # ... other parameters ...
    tabular_attention=True,
    tabular_attention_heads=4,
    tabular_attention_dim=64,
    tabular_attention_dropout=0.1,
    tabular_attention_placement=TabularAttentionPlacementOptions.ALL_FEATURES.value,
)
```

### Multi-Resolution TabularAttention

```python
from kdp.processor import PreprocessingModel, TabularAttentionPlacementOptions

model = PreprocessingModel(
    # ... other parameters ...
    tabular_attention=True,
    tabular_attention_heads=4,
    tabular_attention_dim=64,
    tabular_attention_dropout=0.1,
    tabular_attention_embedding_dim=32,  # Dimension for categorical embeddings
    tabular_attention_placement=TabularAttentionPlacementOptions.MULTI_RESOLUTION.value,
)
```

## Configuration Options

### Common Options
- `tabular_attention` (bool): Enable/disable attention mechanisms
- `tabular_attention_heads` (int): Number of attention heads
- `tabular_attention_dim` (int): Dimension of the attention model
- `tabular_attention_dropout` (float): Dropout rate for regularization

### Placement Options
- `tabular_attention_placement` (str):
  - `ALL_FEATURES`: Apply uniform attention to all features
  - `NUMERIC`: Apply only to numeric features (see the example below)
  - `CATEGORICAL`: Apply only to categorical features
  - `MULTI_RESOLUTION`: Use type-specific attention mechanisms
  - `NONE`: Disable attention
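
For instance, a model that applies attention only to its numeric features can be configured as follows. This is a minimal sketch: it assumes `TabularAttentionPlacementOptions` exposes a `NUMERIC` member matching the option name above, and it elides the rest of the model configuration.

```python
from kdp.processor import PreprocessingModel, TabularAttentionPlacementOptions

model = PreprocessingModel(
    # ... other parameters ...
    tabular_attention=True,
    tabular_attention_heads=4,
    tabular_attention_dim=64,
    tabular_attention_dropout=0.1,
    # NUMERIC is assumed to mirror the placement option listed above
    tabular_attention_placement=TabularAttentionPlacementOptions.NUMERIC.value,
)
```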

### Multi-Resolution Specific Options
- `tabular_attention_embedding_dim` (int): Dimension for categorical embeddings in multi-resolution mode

## How It Works

### Standard TabularAttention
1. **Self-Attention**: Applied uniformly across all features
2. **Layer Normalization**: Stabilizes learning
3. **Feed-forward Network**: Processes attention outputs
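
Conceptually, these three steps map onto stock Keras layers as in the sketch below. This is an illustration of the mechanism, not KDP's internal implementation, and it assumes features arrive as a 3D tensor of shape `(batch, num_features, d_model)`:

```python
import tensorflow as tf

d_model, num_heads, dropout = 64, 4, 0.1

# Features as a 3D tensor: (batch, num_features, d_model)
features = tf.keras.Input(shape=(10, d_model))

# 1. Self-attention across the feature axis
attn = tf.keras.layers.MultiHeadAttention(
    num_heads=num_heads, key_dim=d_model, dropout=dropout
)(features, features)

# 2. Residual connection + layer normalization
x = tf.keras.layers.LayerNormalization()(features + attn)

# 3. Feed-forward network applied feature-wise, with a second residual + norm
ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(d_model * 4, activation="relu"),
    tf.keras.layers.Dense(d_model),
])
outputs = tf.keras.layers.LayerNormalization()(x + ffn(x))

block = tf.keras.Model(features, outputs)
```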

### MultiResolutionTabularAttention
1. **Numerical Processing**:
   - Full-resolution self-attention
   - Preserves numerical precision
   - Captures complex numerical relationships

2. **Categorical Processing**:
   - Embedding-based attention
   - Lower-dimensional representations
   - Captures categorical patterns efficiently

3. **Cross-Feature Integration**:
   - Hierarchical attention between feature types
   - Numerical features attend to categorical features
   - Preserves type-specific characteristics while enabling interaction
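
The data flow can be sketched with standard Keras layers as follows. This is a conceptual illustration under assumed shapes, not the actual `MultiResolutionTabularAttention` code (the real layer is used directly in Advanced Usage below):

```python
import tensorflow as tf

d_model, embedding_dim, num_heads = 64, 32, 4

# Assumed input shapes: (batch, n_features, width) per feature type
numerical = tf.keras.Input(shape=(8, d_model))
categorical = tf.keras.Input(shape=(5, embedding_dim))

# 1. Numerical processing: full-resolution self-attention at d_model width
num_attended = tf.keras.layers.MultiHeadAttention(num_heads, key_dim=d_model)(
    numerical, numerical
)

# 2. Categorical processing: self-attention over lower-dimensional embeddings
cat_attended = tf.keras.layers.MultiHeadAttention(num_heads, key_dim=embedding_dim)(
    categorical, categorical
)

# 3. Cross-feature integration: numerical features attend to categorical features
#    (embeddings are projected up to d_model so the widths match)
cat_projected = tf.keras.layers.Dense(d_model)(cat_attended)
cross_attended = tf.keras.layers.MultiHeadAttention(num_heads, key_dim=d_model)(
    num_attended, cat_projected
)
```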

## Best Practices

### When to Use Standard TabularAttention
- Data has uniform feature importance
- Features are of similar scales
- Memory usage is a concern

### When to Use MultiResolutionTabularAttention
- Mixed numerical and categorical features
- Different feature types have different importance
- Need to preserve type-specific characteristics
- Complex interactions between feature types

### Configuration Tips
1. **Attention Heads**:
   - Start with 4-8 heads
   - Increase for complex relationships
   - Monitor computational cost

2. **Dimensions**:
   - `tabular_attention_dim`: Based on feature complexity
   - `tabular_attention_embedding_dim`: Usually smaller than main dimension
   - Balance between expressiveness and efficiency

3. **Dropout**:
   - Start with 0.1
   - Increase if overfitting
   - Monitor validation performance
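
Putting these tips together, a configuration tuned for a larger, more complex feature space might look like the following. The values are illustrative only; the parameters are the same ones documented in the Usage and Configuration Options sections above.

```python
from kdp.processor import PreprocessingModel, TabularAttentionPlacementOptions

model = PreprocessingModel(
    # ... other parameters ...
    tabular_attention=True,
    tabular_attention_heads=8,           # more heads for complex relationships
    tabular_attention_dim=128,           # sized up for a richer feature space
    tabular_attention_embedding_dim=64,  # kept smaller than the main dimension
    tabular_attention_dropout=0.2,       # raised from 0.1 to counter overfitting
    tabular_attention_placement=TabularAttentionPlacementOptions.MULTI_RESOLUTION.value,
)
```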

## Advanced Usage

### Custom Layer Integration

```python
from kdp.custom_layers import MultiResolutionTabularAttention
import tensorflow as tf

# Example feature counts and per-feature widths (adjust to your data)
num_numerical, numerical_dim = 8, 1
num_categorical, categorical_dim = 5, 1

# Separate inputs for numerical and categorical features
numerical_inputs = tf.keras.Input(shape=(num_numerical, numerical_dim))
categorical_inputs = tf.keras.Input(shape=(num_categorical, categorical_dim))

# Multi-resolution attention over both feature types
attention_layer = MultiResolutionTabularAttention(
    num_heads=4,
    d_model=64,
    embedding_dim=32,
    dropout_rate=0.1
)

# Returns the attended numerical and categorical representations
num_attended, cat_attended = attention_layer(numerical_inputs, categorical_inputs)

# Combine both streams and add a prediction head
combined = tf.keras.layers.Concatenate(axis=1)([num_attended, cat_attended])
outputs = tf.keras.layers.Dense(1)(combined)

model = tf.keras.Model(
    inputs=[numerical_inputs, categorical_inputs],
    outputs=outputs
)
```
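
The resulting model can be used like any other Keras model; for example, to check the output shape with random inputs matching the placeholder dimensions defined above (illustration only):

```python
import numpy as np

# Random batch matching the input shapes above
x_num = np.random.rand(4, num_numerical, numerical_dim).astype("float32")
x_cat = np.random.rand(4, num_categorical, categorical_dim).astype("float32")

# Run a forward pass and inspect the output shape
preds = model([x_num, x_cat])
print(preds.shape)
```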

### Layer Factory Usage

```python
from kdp.layers_factory import PreprocessorLayerFactory

attention_layer = PreprocessorLayerFactory.multi_resolution_attention_layer(
    num_heads=4,
    d_model=64,
    embedding_dim=32,
    dropout_rate=0.1,
    name="custom_multi_attention"
)
```

## Performance Considerations

1. **Memory Usage**:
   - MultiResolutionTabularAttention is more memory-efficient for categorical features
   - Uses lower-dimensional embeddings for categorical data
   - Consider batch size when using multiple attention heads

2. **Computational Cost**:
   - Standard TabularAttention: O(n²) for n features
   - MultiResolutionTabularAttention: O(n_num² + n_cat²) for numerical and categorical self-attention, plus an O(n_num · n_cat) cross-attention term
   - For example, 20 numerical and 10 categorical features give roughly 20² + 10² + 20·10 = 700 scored feature pairs instead of 30² = 900 with uniform attention
   - Balance between resolution and performance

3. **Training Tips**:
   - Start with smaller dimensions and increase if needed
   - Monitor memory usage and training time
   - Use gradient clipping to stabilize training (see the sketch below)
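
For instance, gradient clipping can be enabled directly on the Keras optimizer when compiling the model that consumes the preprocessed features (a minimal sketch):

```python
import tensorflow as tf

# Clip the global gradient norm to keep attention training stable
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)
model.compile(optimizer=optimizer, loss="mse")
```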

## References

- [Attention Is All You Need](https://arxiv.org/abs/1706.03762) - Original transformer paper
- [TabNet: Attentive Interpretable Tabular Learning](https://arxiv.org/abs/1908.07442) - Attention for tabular data
- [Heterogeneous Graph Attention Network](https://arxiv.org/abs/1903.07293) - Multi-type attention mechanisms