# Time Series Inference Guide

This guide explains how to use time series features for inference in keras-data-processor, including how to handle the unique requirements and challenges they present.

## Understanding Time Series Inference Requirements

Time series features have special requirements that differ from other feature types:

1. **Historical Context**: Unlike standard features, which can operate on single data points, time series features require historical context to compute transformations such as lags, moving averages, and differences.

2. **Temporal Ordering**: Data must be ordered chronologically for time series features to work correctly.

3. **Group Integrity**: When using group-based time series (such as store-level sales), the history for each group must be kept complete and separate from other groups.

4. **Minimum History Length**: Each transformation requires a specific minimum amount of history (see the sketch after this list):
   - Lag features need at least `max(lags)` historical points
   - Rolling windows need at least `window_size` historical points
   - Differencing needs at least `order` historical points

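These minimums combine: a feature that uses several transformations needs enough history to satisfy the largest of them. The sketch below is purely illustrative; the parameter names (`lags`, `window_size`, `order`) mirror the terms used above rather than any specific configuration attribute in the library.

```python
# Hypothetical helper: compute the minimum history length a feature needs,
# taking the largest requirement across its transformations.
def minimum_history(lags=None, window_size=None, order=None) -> int:
    candidates = [1]  # at least one point is always needed
    if lags:
        candidates.append(max(lags))    # lag features need max(lags) points
    if window_size:
        candidates.append(window_size)  # rolling windows need window_size points
    if order:
        candidates.append(order)        # differencing needs `order` points
    return max(candidates)

# A feature with 1- and 7-step lags, a 14-step rolling window,
# and first-order differencing needs at least 14 historical points.
print(minimum_history(lags=[1, 7], window_size=14, order=1))  # 14
```
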
## The TimeSeriesInferenceFormatter

The `TimeSeriesInferenceFormatter` class helps bridge the gap between raw time series data and the format required by the preprocessor during inference. It:

1. **Analyzes Requirements**: Examines your preprocessor to determine the exact requirements for each time series feature
2. **Validates Data**: Checks whether your inference data meets these requirements
3. **Formats Data**: Combines historical and new data, then sorts by time and group
4. **Converts to Tensors**: Automatically converts the data to TensorFlow tensors for prediction

### Basic Usage

```python
from kdp.time_series.inference import TimeSeriesInferenceFormatter

# Create a formatter with your trained preprocessor
formatter = TimeSeriesInferenceFormatter(preprocessor)

# Get a human-readable description of the requirements
print(formatter.describe_requirements())

# Prepare data for inference
formatted_data = formatter.prepare_inference_data(
    data=new_data,                  # The data point(s) to predict
    historical_data=historical_df,  # Historical context for time series features
    to_tensors=True                 # Convert the output to TensorFlow tensors
)

# Make a prediction
prediction = preprocessor.predict(formatted_data)
```

### Understanding Requirements

To understand what your model needs for inference:

```python
# Check whether the preprocessor has time series features
has_ts_features = formatter.is_time_series_preprocessor()

# Get detailed requirements
requirements = formatter.min_history_requirements

# Print the requirements for each time series feature
for feature, reqs in requirements.items():
    print(f"Feature: {feature}")
    print(f"  Minimum history: {reqs['min_history']} data points")
    print(f"  Sort by: {reqs['sort_by']}")
    print(f"  Group by: {reqs['group_by']}")
```

### Common Inference Scenarios

#### Single-Point Inference (Will Fail)

This will fail for time series features because they need historical context:

```python
import numpy as np

single_point = {
    "date": "2023-02-01",
    "store_id": "Store_A",
    "sales": np.nan,  # What we want to predict
}

# This will raise a ValueError about insufficient history
formatter.prepare_inference_data(single_point)
```

#### Inference with Historical Context

```python
import numpy as np
import pandas as pd

# Historical data (past 14 days)
historical_data = df.loc[df["date"] >= (prediction_date - pd.Timedelta(days=14))]

# New point to predict
new_point = {
    "date": prediction_date.strftime("%Y-%m-%d"),
    "store_id": "Store_A",
    "sales": np.nan,  # What we want to predict
}

# Prepare the data with historical context
formatted_data = formatter.prepare_inference_data(
    new_point,
    historical_data,
    to_tensors=True
)

# Make prediction
prediction = preprocessor.predict(formatted_data)
```

#### Multi-Step Forecasting

For multi-step forecasting, you need to:
1. Make the first prediction
2. Add that prediction to the history
3. Move forward and repeat

```python
# Start with historical data
history = historical_df.copy()
forecasts = []

# Generate 7-day forecast
for _ in range(7):
    # Calculate the next date to predict
    next_date = (pd.to_datetime(history["date"].iloc[-1]) +
                 pd.Timedelta(days=1)).strftime("%Y-%m-%d")

    # Create the next point to predict
    next_point = {
        "date": next_date,
        "store_id": "Store_A",
        "sales": np.nan,  # To be predicted
    }

    # Format data for prediction
    formatted_data = formatter.format_for_incremental_prediction(
        history,
        next_point,
        to_tensors=True
    )

    # Make prediction
    prediction = preprocessor.predict(formatted_data)
    predicted_value = prediction["sales"][-1].numpy()

    # Record the forecast
    forecasts.append({
        "date": next_date,
        "store_id": "Store_A",
        "sales": predicted_value
    })

    # Add prediction to history for next step
    history = pd.concat([
        history,
        pd.DataFrame([{"date": next_date, "store_id": "Store_A", "sales": predicted_value}])
    ], ignore_index=True)
```

## Best Practices for Time Series Inference

1. **Provide Ample History**: Always provide more history than the minimum required; this improves prediction quality.

2. **Maintain Data Format**: Keep the same data format between training and inference:
   - Same column names and types
   - Same temporal granularity (daily, hourly, etc.)
   - Same grouping structure

3. **Handle Edge Cases** (a gap-filling sketch follows this list):
   - New groups that weren't in the training data
   - Gaps in historical data
   - Irregularly sampled time series

4. **Use the Formatter Methods**:
   - `describe_requirements()` to understand what's needed
   - `prepare_inference_data()` for one-off predictions
   - `format_for_incremental_prediction()` for step-by-step forecasting

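For gaps in historical data specifically, a common approach is to reindex each group onto a complete calendar before handing the data to the formatter. The sketch below is a minimal example using pandas; it assumes the daily `date`/`store_id`/`sales` layout from the examples above, and the `fill_daily_gaps` helper name is hypothetical.

```python
import pandas as pd

def fill_daily_gaps(history: pd.DataFrame) -> pd.DataFrame:
    """Reindex each store onto a complete daily calendar and forward-fill values."""
    history = history.assign(date=pd.to_datetime(history["date"]))
    filled = []
    for store_id, group in history.groupby("store_id"):
        group = group.set_index("date").sort_index()
        full_range = pd.date_range(group.index.min(), group.index.max(), freq="D")
        group = group.reindex(full_range).ffill()
        group["store_id"] = store_id  # reindexing leaves NaN in new rows; restore the key
        filled.append(group.rename_axis("date").reset_index())
    return pd.concat(filled, ignore_index=True)

# Use the gap-free history as the formatter's historical context
historical_data = fill_daily_gaps(historical_df)
```

Forward-filling is only one possible imputation choice; interpolation may suit strongly trending series better. If your training data used string dates, convert the `date` column back to the same format before passing it to the formatter.
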
## Troubleshooting

Common errors and their solutions:

### "Feature requires historical context"
- **Problem**: You're trying to use a single data point with time series features
- **Solution**: Provide historical data as context

### "Requires at least X data points"
- **Problem**: You don't have enough history for the time series transformations
- **Solution**: Provide more historical points (at least the minimum required)

### "Requires grouping by X"
- **Problem**: Missing the column used for grouping in time series features
- **Solution**: Ensure your data includes all required grouping columns

### "Requires sorting by X"
- **Problem**: Missing the column used for sorting (usually a date/time column)
- **Solution**: Ensure your data includes all required sorting columns

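To catch these problems programmatically instead of reading stack traces, you can wrap the preparation step. The sketch below is a hypothetical `safe_prepare` helper; it assumes validation failures surface as `ValueError` (as the insufficient-history example above shows), so adjust the exception type if your version raises something more specific.

```python
def safe_prepare(formatter, new_point, history):
    """Hypothetical wrapper: prepare inference data and report requirements on failure."""
    try:
        return formatter.prepare_inference_data(
            data=new_point,
            historical_data=history,
            to_tensors=True,
        )
    except ValueError as err:
        # Print the formatter's own requirement summary next to the error,
        # which usually makes the missing piece (history, group, sort column) obvious.
        print(f"Could not prepare inference data: {err}")
        print(formatter.describe_requirements())
        raise
```
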
## Advanced Usage

For more complex scenarios, the formatter provides additional options:

```python
# When you need more control over data preparation
formatted_data = formatter.prepare_inference_data(
    data=new_data,
    historical_data=historical_data,
    fill_missing=True,  # Try to fill missing values or context
    to_tensors=False    # Keep as Python/NumPy types for inspection
)

# Manually convert the prepared data to tensors (note: private helper)
tf_data = formatter._convert_to_tensors(formatted_data)

# Generate a multi-step forecast directly
forecast_df = formatter.generate_multi_step_forecast(
    history=historical_data,
    future_dates=future_dates_list,
    group_id="Store_A",
    steps=7  # Generate 7 steps ahead
)
```

## Example Code

See the full examples in:
- `examples/time_series_inference_simple.py` for a simplified example
- `examples/time_series_inference.py` for a complete example with model prediction