UnicoLab
diff --git a/‎docs/features/time_series_features.md‎
Lines changed: 129 additions & 0 deletions b/‎docs/features/time_series_features.md‎
Lines changed: 129 additions & 0 deletions
diff --git a/‎docs/time_series_inference.md‎
Lines changed: 229 additions & 0 deletions b/‎docs/time_series_inference.md‎
Lines changed: 229 additions & 0 deletions
@@ -933,6 +933,135 @@ sensor_feature = TimeSeriesFeature(
   </a>
 </div>
 
+## 🔍 Inference with Time Series Features
+
+<div class="inference-section">
+  <p>Time series preprocessing requires special consideration during inference. Unlike static features, time series transformations depend on historical data and context.</p>
+
+  <h3>Minimal Requirements for Inference</h3>
+
+  <div class="table-container">
+    <table class="inference-table">
+      <thead>
+        <tr>
+          <th>Transformation</th>
+          <th>Minimum Data Required</th>
+          <th>Notes</th>
+        </tr>
+      </thead>
+      <tbody>
+        <tr>
+          <td><code>Lag Features</code></td>
+          <td>max(lags) previous time points</td>
+          <td>If largest lag is 14, you need 14 previous data points</td>
+        </tr>
+        <tr>
+          <td><code>Rolling Statistics</code></td>
+          <td>window_size previous points</td>
+          <td>For a 7-day window, you need 7 previous points</td>
+        </tr>
+        <tr>
+          <td><code>Differencing</code></td>
+          <td>order previous points</td>
+          <td>First-order differencing requires 1 previous point</td>
+        </tr>
+        <tr>
+          <td><code>Moving Averages</code></td>
+          <td>max(periods) previous points</td>
+          <td>For periods [7,14,28], you need 28 previous points</td>
+        </tr>
+        <tr>
+          <td><code>Wavelet Transform</code></td>
+          <td>2^levels previous points</td>
+          <td>For 3 levels, you need at least 8 previous points</td>
+        </tr>
+      </tbody>
+    </table>
+  </div>
+
+  <h3>Example: Single-Point Inference</h3>
+
+  <p>For single-point or incremental inference with time series features:</p>
+
+  <div class="code-container">
+
+```python
+# INCORRECT - Will fail with time series features
+single_point = {"date": "2023-06-01", "store_id": "Store_1", "sales": 150.0}
+prediction = model.predict(single_point)  # ❌ Missing historical context
+
+# CORRECT - Include historical context
+inference_data = {
+    "date": ["2023-05-25", "2023-05-26", ..., "2023-06-01"],  # Include history
+    "store_id": ["Store_1", "Store_1", ..., "Store_1"],       # Same group
+    "sales": [125.0, 130.0, ..., 150.0]                       # Historical values
+}
+prediction = model.predict(inference_data)  # ✅ Last row will have prediction
+```
+
+  </div>
+
+  <h3>Strategies for Ongoing Predictions</h3>
+
+  <p>For forecasting multiple steps into the future:</p>
+
+  <div class="code-container">
+
+```python
+# Multi-step forecasting with KDP
+import pandas as pd
+
+# 1. Start with historical data
+history_df = pd.DataFrame({
+    "date": pd.date_range("2023-01-01", "2023-05-31"),
+    "store_id": "Store_1",
+    "sales": historical_values  # Your historical data
+})
+
+# 2. Create future dates to predict
+future_dates = pd.date_range("2023-06-01", "2023-06-30")
+forecast_horizon = len(future_dates)
+
+# 3. Initialize with history
+working_df = history_df.copy()
+
+# 4. Iterative forecasting
+for i in range(forecast_horizon):
+    # Prepare next date to forecast
+    next_date = future_dates[i]
+    next_row = pd.DataFrame({
+        "date": [next_date],
+        "store_id": ["Store_1"],
+        "sales": [None]  # Unknown value we want to predict
+    })
+
+    # Add to working data
+    temp_df = pd.concat([working_df, next_row])
+
+    # Make prediction (returns all rows, take last one)
+    prediction = model.predict(temp_df).iloc[-1]["sales"]
+
+    # Update the working dataframe with the prediction
+    next_row["sales"] = prediction
+    working_df = pd.concat([working_df, next_row])
+
+# Final forecast is in the last forecast_horizon rows
+forecast = working_df.tail(forecast_horizon)
+```
+
+  </div>
+
+  <h3>Key Considerations for Inference</h3>
+
+  <ul>
+    <li><strong>Group Integrity</strong>: Maintain the same groups used during training</li>
+    <li><strong>Chronological Order</strong>: Ensure data is properly sorted by time</li>
+    <li><strong>Sufficient History</strong>: Provide enough history for each group</li>
+    <li><strong>Empty Fields</strong>: For auto-regressive forecasting, leave future values as None or NaN</li>
+    <li><strong>Overlapping Windows</strong>: For multi-step forecasts, consider whether predictions should feed back as inputs</li>
+  </ul>
+</div>
+
 <style>
 /* Base styling */
 body {
 
@@ -0,0 +1,229 @@
+# Time Series Inference Guide
+
+This guide explains how to properly use time series features for inference in keras-data-processor, including handling the unique requirements and challenges they present.
+
+## Understanding Time Series Inference Requirements
+
+Time series features have special requirements that differ from other feature types:
+
+1. **Historical Context**: Unlike standard features which can operate on single data points, time series features require historical context to compute transformations like lags, moving averages, etc.
+
+2. **Temporal Ordering**: Data must be properly ordered chronologically for time series features to work correctly.
+
+3. **Group Integrity**: When using group-based time series (like store-level sales), the data for each group must maintain its integrity.
+
+4. **Minimum History Length**: Each transformation requires a specific minimum history length:
+   - Lag features need at least `max(lags)` historical points
+   - Rolling windows need at least `window_size` historical points
+   - Differencing needs at least `order` historical points
+
+## The TimeSeriesInferenceFormatter
+
+The `TimeSeriesInferenceFormatter` class helps bridge the gap between raw time series data and the format required by the preprocessor during inference. It:
+
+1. **Analyzes Requirements**: Examines your preprocessor to determine the exact requirements for each time series feature
+2. **Validates Data**: Checks if your inference data meets these requirements
+3. **Formats Data**: Combines historical and new data, sorts by time and group
+4. **Converts to Tensors**: Automatically converts the data to TensorFlow tensors for prediction
+
+### Basic Usage
+
+```python
+from kdp.time_series.inference import TimeSeriesInferenceFormatter
+
+# Create a formatter with your trained preprocessor
+formatter = TimeSeriesInferenceFormatter(preprocessor)
+
+# Get human-readable description of requirements
+print(formatter.describe_requirements())
+
+# Prepare data for inference
+formatted_data = formatter.prepare_inference_data(
+    data=new_data,                 # The data point(s) to predict
+    historical_data=historical_df, # Historical context for time series features
+    to_tensors=True                # Convert output to TensorFlow tensors
+)
+
+# Make a prediction
+prediction = preprocessor.predict(formatted_data)
+```
+
+### Understanding Requirements
+
+To understand what your model needs for inference:
+
+```python
+# Check if the preprocessor has time series features
+has_ts_features = formatter.is_time_series_preprocessor()
+
+# Get detailed requirements
+requirements = formatter.min_history_requirements
+
+# For each time series feature
+for feature, reqs in requirements.items():
+    print(f"Feature: {feature}")
+    print(f"  Minimum history: {reqs['min_history']} data points")
+    print(f"  Sort by: {reqs['sort_by']}")
+    print(f"  Group by: {reqs['group_by']}")
+```
+
+### Common Inference Scenarios
+
+#### Single-Point Inference (Will Fail)
+
+This will fail for time series features because they need historical context:
+
+```python
+single_point = {
+    "date": "2023-02-01",
+    "store_id": "Store_A",
+    "sales": np.nan,  # What we want to predict
+}
+
+# This will raise a ValueError about insufficient history
+formatter.prepare_inference_data(single_point)
+```
+
+#### Inference with Historical Context
+
+```python
+# Historical data (past 14 days)
+historical_data = df.loc[df["date"] >= (prediction_date - pd.Timedelta(days=14))]
+
+# New point to predict
+new_point = {
+    "date": prediction_date.strftime("%Y-%m-%d"),
+    "store_id": "Store_A",
+    "sales": np.nan,  # What we want to predict
+}
+
+# Prepare the data with historical context
+formatted_data = formatter.prepare_inference_data(
+    new_point,
+    historical_data,
+    to_tensors=True
+)
+
+# Make prediction
+prediction = preprocessor.predict(formatted_data)
+```
+
+#### Multi-Step Forecasting
+
+For multi-step forecasting, you need to:
+1. Make the first prediction
+2. Add that prediction to the history
+3. Move forward and repeat
+
+```python
+# Start with historical data
+history = historical_df.copy()
+forecasts = []
+
+# Generate 7-day forecast
+for i in range(7):
+    # Calculate the next date to predict
+    next_date = (pd.to_datetime(history["date"].iloc[-1]) +
+                pd.Timedelta(days=1)).strftime("%Y-%m-%d")
+
+    # Create the next point to predict
+    next_point = {
+        "date": next_date,
+        "store_id": "Store_A",
+        "sales": np.nan,  # To be predicted
+    }
+
+    # Format data for prediction
+    formatted_data = formatter.format_for_incremental_prediction(
+        history,
+        next_point,
+        to_tensors=True
+    )
+
+    # Make prediction
+    prediction = preprocessor.predict(formatted_data)
+    predicted_value = prediction["sales"][-1].numpy()
+
+    # Record the forecast
+    forecasts.append({
+        "date": next_date,
+        "store_id": "Store_A",
+        "sales": predicted_value
+    })
+
+    # Add prediction to history for next step
+    history = pd.concat([
+        history,
+        pd.DataFrame([{"date": next_date, "store_id": "Store_A", "sales": predicted_value}])
+    ], ignore_index=True)
+```
+
+## Best Practices for Time Series Inference
+
+1. **Provide Ample History**: Always provide more history than the minimum required - this improves prediction quality.
+
+2. **Maintain Data Format**: Keep the same data format between training and inference:
+   - Same column names and types
+   - Same temporal granularity (daily, hourly, etc.)
+   - Same grouping structure
+
+3. **Handle Edge Cases**:
+   - New groups that weren't in training data
+   - Gaps in historical data
+   - Irregularly sampled time series
+
+4. **Use the Formatter Methods**:
+   - `describe_requirements()` to understand what's needed
+   - `prepare_inference_data()` for one-off predictions
+   - `format_for_incremental_prediction()` for step-by-step forecasting
+
+## Troubleshooting
+
+Common errors and their solutions:
+
+### "Feature requires historical context"
+- **Problem**: You're trying to use a single data point with time series features
+- **Solution**: Provide historical data as context
+
+### "Requires at least X data points"
+- **Problem**: You don't have enough history for the time series transformations
+- **Solution**: Provide more historical points (at least the minimum required)
+
+### "Requires grouping by X"
+- **Problem**: Missing the column used for grouping in time series features
+- **Solution**: Ensure your data includes all required grouping columns
+
+### "Requires sorting by X"
+- **Problem**: Missing the column used for sorting (usually a date/time column)
+- **Solution**: Ensure your data includes all required sorting columns
+
+## Advanced Usage
+
+For more complex scenarios, the formatter provides additional options:
+
+```python
+# When you need more control over data preparation
+formatted_data = formatter.prepare_inference_data(
+    data=new_data,
+    historical_data=historical_data,
+    fill_missing=True,   # Try to fill missing values or context
+    to_tensors=False     # Keep as Python/NumPy types for inspection
+)
+
+# Manual control of tensor conversion
+tf_data = formatter._convert_to_tensors(formatted_data)
+
+# Getting generated multi-step forecast
+forecast_df = formatter.generate_multi_step_forecast(
+    history=historical_data,
+    future_dates=future_dates_list,
+    group_id="Store_A",
+    steps=7  # Generate 7 steps ahead
+)
+```
+
+## Example Code
+
+See the full examples in:
+- `examples/time_series_inference_simple.py` for a simplified example
+- `examples/time_series_inference.py` for a complete example with model prediction