Commit c83cd48

feat(KDP): adding timeseries inference formatter
1 parent e1f77f4 commit c83cd48

File tree

14 files changed

+2285
-0
lines changed

docs/features/time_series_features.md

Lines changed: 129 additions & 0 deletions
@@ -933,6 +933,135 @@ sensor_feature = TimeSeriesFeature(
933933
</a>
934934
</div>
935935

936+
## 🔍 Inference with Time Series Features
937+
938+
<div class="inference-section">
939+
<p>Time series preprocessing requires special consideration during inference. Unlike static features, time series transformations depend on historical data and context.</p>
940+
941+
<h3>Minimal Requirements for Inference</h3>
942+
943+
<div class="table-container">
<table class="inference-table">
  <thead>
    <tr>
      <th>Transformation</th>
      <th>Minimum Data Required</th>
      <th>Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>Lag Features</code></td>
      <td>max(lags) previous time points</td>
      <td>If largest lag is 14, you need 14 previous data points</td>
    </tr>
    <tr>
      <td><code>Rolling Statistics</code></td>
      <td>window_size previous points</td>
      <td>For a 7-day window, you need 7 previous points</td>
    </tr>
    <tr>
      <td><code>Differencing</code></td>
      <td>order previous points</td>
      <td>First-order differencing requires 1 previous point</td>
    </tr>
    <tr>
      <td><code>Moving Averages</code></td>
      <td>max(periods) previous points</td>
      <td>For periods [7,14,28], you need 28 previous points</td>
    </tr>
    <tr>
      <td><code>Wavelet Transform</code></td>
      <td>2^levels previous points</td>
      <td>For 3 levels, you need at least 8 previous points</td>
    </tr>
  </tbody>
</table>
</div>
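
<p>As a rough illustration of the table above, the overall history requirement is the largest of the per-transformation minimums. The helper below is a sketch only: <code>min_history_required</code> and its parameter names are hypothetical and simply mirror the table rows; they are not KDP configuration keys.</p>

<div class="code-container">

```python
def min_history_required(lags=None, window_size=None, order=None,
                         periods=None, levels=None):
    """Return the largest history length implied by the table's rules.

    Hypothetical helper: the parameter names mirror the table rows,
    not any actual KDP configuration keys.
    """
    needs = []
    if lags:
        needs.append(max(lags))       # lag features: max(lags)
    if window_size:
        needs.append(window_size)     # rolling statistics: window_size
    if order:
        needs.append(order)           # differencing: order
    if periods:
        needs.append(max(periods))    # moving averages: max(periods)
    if levels:
        needs.append(2 ** levels)     # wavelet transform: 2^levels
    return max(needs, default=0)


required = min_history_required(
    lags=[1, 7, 14], window_size=7, order=1, periods=[7, 14, 28], levels=3
)
print(required)  # 28, driven by the 28-period moving average
```

</div>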
981+
982+
<h3>Example: Single-Point Inference</h3>
983+
984+
<p>For single-point or incremental inference with time series features:</p>
985+
986+
<div class="code-container">
987+
988+
```python
# INCORRECT - Will fail with time series features
single_point = {"date": "2023-06-01", "store_id": "Store_1", "sales": 150.0}
prediction = model.predict(single_point)  # ❌ Missing historical context

# CORRECT - Include historical context
inference_data = {
    "date": ["2023-05-25", "2023-05-26", ..., "2023-06-01"],  # Include history
    "store_id": ["Store_1", "Store_1", ..., "Store_1"],  # Same group
    "sales": [125.0, 130.0, ..., 150.0]  # Historical values
}
prediction = model.predict(inference_data)  # ✅ Last row will have prediction
```
1001+
1002+
</div>
1003+
1004+
<h3>Strategies for Ongoing Predictions</h3>
1005+
1006+
<p>For forecasting multiple steps into the future:</p>
1007+
1008+
<div class="code-container">
1009+
1010+
```python
# Multi-step forecasting with KDP
import numpy as np
import pandas as pd

# 1. Start with historical data
history_df = pd.DataFrame({
    "date": pd.date_range("2023-01-01", "2023-05-31"),
    "store_id": "Store_1",
    "sales": historical_values  # Your historical data (one value per date)
})

# 2. Create future dates to predict
future_dates = pd.date_range("2023-06-01", "2023-06-30")
forecast_horizon = len(future_dates)

# 3. Initialize with history
working_df = history_df.copy()

# 4. Iterative forecasting
for i in range(forecast_horizon):
    # Prepare next date to forecast
    next_date = future_dates[i]
    next_row = pd.DataFrame({
        "date": [next_date],
        "store_id": ["Store_1"],
        "sales": [np.nan]  # Unknown value we want to predict (NaN keeps the column numeric)
    })

    # Add to working data; ignore_index avoids duplicate index labels
    temp_df = pd.concat([working_df, next_row], ignore_index=True)

    # Make prediction (returns all rows, take last one)
    prediction = model.predict(temp_df).iloc[-1]["sales"]

    # Update the working dataframe with the prediction
    next_row["sales"] = prediction
    working_df = pd.concat([working_df, next_row], ignore_index=True)

# Final forecast is in the last forecast_horizon rows
forecast = working_df.tail(forecast_horizon)
```
1051+
1052+
</div>
1053+
1054+
<h3>Key Considerations for Inference</h3>
1055+
1056+
<ul>
1057+
<li><strong>Group Integrity</strong>: Maintain the same groups used during training</li>
1058+
<li><strong>Chronological Order</strong>: Ensure data is properly sorted by time</li>
1059+
<li><strong>Sufficient History</strong>: Provide enough history for each group</li>
1060+
<li><strong>Empty Fields</strong>: For auto-regressive forecasting, leave future values as None or NaN</li>
1061+
<li><strong>Overlapping Windows</strong>: For multi-step forecasts, consider whether predictions should feed back as inputs</li>
1062+
</ul>
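
<p>The last point, feeding predictions back as inputs, can be sketched without any model at all. In the example below, <code>recursive_forecast</code> and <code>mean_of_last_three</code> are hypothetical names, and the averaging function is only a stand-in for a trained model; the point is the feedback loop, where errors in early steps carry into later ones.</p>

<div class="code-container">

```python
def recursive_forecast(history, steps, predict_next):
    """Forecast `steps` values, feeding each prediction back as input."""
    working = list(history)      # copy so the caller's history is untouched
    forecast = []
    for _ in range(steps):
        value = predict_next(working)
        forecast.append(value)
        working.append(value)    # the prediction becomes history for the next step
    return forecast


def mean_of_last_three(values):
    # Stand-in for a trained model: average of the three latest values.
    return sum(values[-3:]) / 3


history = [3.0, 6.0, 9.0]
print(recursive_forecast(history, steps=2, predict_next=mean_of_last_three))
# [6.0, 7.0] - the second step already depends on the first prediction
```

</div>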
1063+
</div>
1064+
9361065
<style>
9371066
/* Base styling */
9381067
body {

docs/time_series_inference.md

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
1+
# Time Series Inference Guide
2+
3+
This guide explains how to use time series features at inference time in keras-data-processor (KDP), and how to handle the requirements that are specific to them.
4+
5+
## Understanding Time Series Inference Requirements
6+
7+
Time series features have special requirements that differ from other feature types:
8+
9+
1. **Historical Context**: Unlike standard features, which can operate on single data points, time series features need historical context to compute transformations such as lags and moving averages.
10+
11+
2. **Temporal Ordering**: Data must be properly ordered chronologically for time series features to work correctly.
12+
13+
3. **Group Integrity**: When using group-based time series (like store-level sales), the data for each group must maintain its integrity.
14+
15+
4. **Minimum History Length**: Each transformation requires a specific minimum history length:
16+
- Lag features need at least `max(lags)` historical points
17+
- Rolling windows need at least `window_size` historical points
18+
- Differencing needs at least `order` historical points
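
To see why these minimums exist, the transformations can be written out by hand. This is a simplified sketch, not KDP's implementation: the function names are hypothetical, and the rolling mean is assumed to cover the `window` points before the current one. Positions without enough history come out as `None`.

```python
def lag(values, k):
    """Value k steps earlier, or None where that history does not exist."""
    return [values[i - k] if i >= k else None for i in range(len(values))]


def rolling_mean(values, window):
    """Mean of the `window` preceding values (assumed convention), None until available."""
    out = []
    for i in range(len(values)):
        if i < window:
            out.append(None)
        else:
            out.append(sum(values[i - window:i]) / window)
    return out


def difference(values, order=1):
    """Apply first differencing `order` times; the first `order` slots are None."""
    diffs = list(values)
    for _ in range(order):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return [None] * min(order, len(values)) + diffs


sales = [10.0, 12.0, 15.0, 11.0]
print(lag(sales, 2))           # [None, None, 10.0, 12.0]
print(rolling_mean(sales, 2))  # [None, None, 11.0, 13.5]
print(difference(sales, 1))    # [None, 2.0, 3.0, -4.0]
```

A single inference row corresponds to position 0 in each list, which is exactly where every transformation has nothing to compute from.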
19+
20+
## The TimeSeriesInferenceFormatter
21+
22+
The `TimeSeriesInferenceFormatter` class helps bridge the gap between raw time series data and the format required by the preprocessor during inference. It:
23+
24+
1. **Analyzes Requirements**: Examines your preprocessor to determine the exact requirements for each time series feature
25+
2. **Validates Data**: Checks if your inference data meets these requirements
26+
3. **Formats Data**: Combines historical and new data, sorts by time and group
27+
4. **Converts to Tensors**: Automatically converts the data to TensorFlow tensors for prediction
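
The "combine and sort" step can be pictured roughly as follows. This is an illustrative stand-in rather than the formatter's actual code: `combine_and_sort` and its default column names are hypothetical, and dates are assumed to be ISO-formatted strings so that plain string ordering matches chronological ordering.

```python
def combine_and_sort(historical_rows, new_rows, sort_by="date", group_by="store_id"):
    """Merge history with new rows and order them by group, then by time.

    Hypothetical helper; assumes ISO date strings (YYYY-MM-DD).
    """
    combined = list(historical_rows) + list(new_rows)
    return sorted(combined, key=lambda row: (row[group_by], row[sort_by]))


history = [
    {"date": "2023-01-02", "store_id": "Store_B", "sales": 7.0},
    {"date": "2023-01-01", "store_id": "Store_A", "sales": 5.0},
    {"date": "2023-01-02", "store_id": "Store_A", "sales": 6.0},
]
new_rows = [{"date": "2023-01-03", "store_id": "Store_A", "sales": None}]

ordered = combine_and_sort(history, new_rows)
for row in ordered:
    print(row["store_id"], row["date"], row["sales"])
```

After sorting, the row to predict sits directly after its own group's history, which is what lag and window computations rely on.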
28+
29+
### Basic Usage
30+
31+
```python
from kdp.time_series.inference import TimeSeriesInferenceFormatter

# Create a formatter with your trained preprocessor
formatter = TimeSeriesInferenceFormatter(preprocessor)

# Get human-readable description of requirements
print(formatter.describe_requirements())

# Prepare data for inference
formatted_data = formatter.prepare_inference_data(
    data=new_data,                  # The data point(s) to predict
    historical_data=historical_df,  # Historical context for time series features
    to_tensors=True                 # Convert output to TensorFlow tensors
)

# Make a prediction
prediction = preprocessor.predict(formatted_data)
```
50+
51+
### Understanding Requirements
52+
53+
To understand what your model needs for inference:
54+
55+
```python
# Check if the preprocessor has time series features
has_ts_features = formatter.is_time_series_preprocessor()

# Get detailed requirements
requirements = formatter.min_history_requirements

# For each time series feature
for feature, reqs in requirements.items():
    print(f"Feature: {feature}")
    print(f"  Minimum history: {reqs['min_history']} data points")
    print(f"  Sort by: {reqs['sort_by']}")
    print(f"  Group by: {reqs['group_by']}")
```
69+
70+
### Common Inference Scenarios
71+
72+
#### Single-Point Inference (Will Fail)
73+
74+
This will fail for time series features because they need historical context:
75+
76+
```python
import numpy as np

single_point = {
    "date": "2023-02-01",
    "store_id": "Store_A",
    "sales": np.nan,  # What we want to predict
}

# This will raise a ValueError about insufficient history
formatter.prepare_inference_data(single_point)
```
86+
87+
#### Inference with Historical Context
88+
89+
```python
import numpy as np
import pandas as pd

# Assumes `df` has a datetime "date" column and `prediction_date` is a pd.Timestamp.
# Historical data (past 14 days, excluding the prediction date itself)
window_start = prediction_date - pd.Timedelta(days=14)
historical_data = df.loc[(df["date"] >= window_start) & (df["date"] < prediction_date)]

# New point to predict
new_point = {
    "date": prediction_date.strftime("%Y-%m-%d"),
    "store_id": "Store_A",
    "sales": np.nan,  # What we want to predict
}

# Prepare the data with historical context
formatted_data = formatter.prepare_inference_data(
    new_point,
    historical_data,
    to_tensors=True
)

# Make prediction
prediction = preprocessor.predict(formatted_data)
```
110+
111+
#### Multi-Step Forecasting
112+
113+
For multi-step forecasting, you need to:
114+
1. Make the first prediction
115+
2. Add that prediction to the history
116+
3. Move forward and repeat
117+
118+
```python
import numpy as np
import pandas as pd

# Start with historical data (assumed sorted by date, single group "Store_A")
history = historical_df.copy()
forecasts = []

# Generate 7-day forecast
for i in range(7):
    # Calculate the next date to predict
    next_date = (pd.to_datetime(history["date"].iloc[-1]) +
                 pd.Timedelta(days=1)).strftime("%Y-%m-%d")

    # Create the next point to predict
    next_point = {
        "date": next_date,
        "store_id": "Store_A",
        "sales": np.nan,  # To be predicted
    }

    # Format data for prediction
    formatted_data = formatter.format_for_incremental_prediction(
        history,
        next_point,
        to_tensors=True
    )

    # Make prediction
    prediction = preprocessor.predict(formatted_data)
    predicted_value = prediction["sales"][-1].numpy()

    # Record the forecast
    forecasts.append({
        "date": next_date,
        "store_id": "Store_A",
        "sales": predicted_value
    })

    # Add prediction to history for next step
    history = pd.concat([
        history,
        pd.DataFrame([{"date": next_date, "store_id": "Store_A", "sales": predicted_value}])
    ], ignore_index=True)
```
160+
161+
## Best Practices for Time Series Inference
162+
163+
1. **Provide Ample History**: Provide more history than the minimum required whenever it is available; this generally improves prediction quality.
164+
165+
2. **Maintain Data Format**: Keep the same data format between training and inference:
166+
- Same column names and types
167+
- Same temporal granularity (daily, hourly, etc.)
168+
- Same grouping structure
169+
170+
3. **Handle Edge Cases**:
171+
- New groups that weren't in training data
172+
- Gaps in historical data
173+
- Irregularly sampled time series
174+
175+
4. **Use the Formatter Methods**:
176+
- `describe_requirements()` to understand what's needed
177+
- `prepare_inference_data()` for one-off predictions
178+
- `format_for_incremental_prediction()` for step-by-step forecasting
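
Two of the edge cases listed above, gaps in the history and groups not seen in training, can be checked before calling the formatter. The sketch below uses only the standard library; `find_gaps` and `unseen_groups` are hypothetical helper names, and a daily granularity is assumed by default.

```python
from datetime import date, timedelta


def find_gaps(dates, step=timedelta(days=1)):
    """Return (previous, current) pairs where consecutive dates are more than `step` apart."""
    ordered = sorted(dates)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > step]


def unseen_groups(inference_groups, training_groups):
    """Return groups present at inference time that were absent from training."""
    return sorted(set(inference_groups) - set(training_groups))


dates = [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 5)]
print(find_gaps(dates))  # one gap: 2023-01-02 -> 2023-01-05
print(unseen_groups(["Store_A", "Store_C"], ["Store_A", "Store_B"]))  # ['Store_C']
```

How to respond to a detected gap or a new group (fill, drop, or reject the request) depends on the use case and is left open here.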
179+
180+
## Troubleshooting
181+
182+
Common errors and their solutions:
183+
184+
### "Feature requires historical context"
185+
- **Problem**: You're trying to use a single data point with time series features
186+
- **Solution**: Provide historical data as context
187+
188+
### "Requires at least X data points"
189+
- **Problem**: You don't have enough history for the time series transformations
190+
- **Solution**: Provide more historical points (at least the minimum required)
191+
192+
### "Requires grouping by X"
193+
- **Problem**: Missing the column used for grouping in time series features
194+
- **Solution**: Ensure your data includes all required grouping columns
195+
196+
### "Requires sorting by X"
197+
- **Problem**: Missing the column used for sorting (usually a date/time column)
198+
- **Solution**: Ensure your data includes all required sorting columns
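
The checks behind these errors can be approximated ahead of time. The function below is a hypothetical pre-flight check, not part of KDP: it takes a list of row dictionaries and one requirement entry with the `min_history`, `sort_by`, and `group_by` keys shown earlier in this guide, and it assumes the minimum applies to the total number of rows supplied per group.

```python
from collections import Counter


def diagnose(rows, requirement):
    """List problems that would likely prevent time series inference on `rows`."""
    problems = []
    columns = set().union(*(row.keys() for row in rows)) if rows else set()
    sort_by = requirement.get("sort_by")
    group_by = requirement.get("group_by")
    min_history = requirement.get("min_history", 0)

    if sort_by and sort_by not in columns:
        problems.append(f"missing sort column: {sort_by}")
    if group_by and group_by not in columns:
        problems.append(f"missing group column: {group_by}")

    # Count rows per group; without a usable group column, treat all rows as one group.
    if group_by and group_by in columns:
        counts = Counter(row.get(group_by) for row in rows)
    else:
        counts = {"<all rows>": len(rows)}
    for group, count in counts.items():
        if count < min_history:
            problems.append(f"{group}: {count} rows, needs at least {min_history}")
    return problems


rows = [
    {"date": "2023-01-01", "sales": 5.0},
    {"date": "2023-01-02", "sales": 6.0},
]
requirement = {"min_history": 3, "sort_by": "date", "group_by": "store_id"}
for problem in diagnose(rows, requirement):
    print(problem)
```

An empty list means none of the three conditions was detected; it does not guarantee that the formatter itself will accept the data.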
199+
200+
## Advanced Usage
201+
202+
For more complex scenarios, the formatter provides additional options:
203+
204+
```python
# When you need more control over data preparation
formatted_data = formatter.prepare_inference_data(
    data=new_data,
    historical_data=historical_data,
    fill_missing=True,  # Try to fill missing values or context
    to_tensors=False    # Keep as Python/NumPy types for inspection
)

# Manual control of tensor conversion
# (the leading underscore marks this as a private method by Python convention)
tf_data = formatter._convert_to_tensors(formatted_data)

# Generate a multi-step forecast
forecast_df = formatter.generate_multi_step_forecast(
    history=historical_data,
    future_dates=future_dates_list,
    group_id="Store_A",
    steps=7  # Generate 7 steps ahead
)
```
224+
225+
## Example Code
226+
227+
See the full examples in:
228+
- `examples/time_series_inference_simple.py` for a simplified example
229+
- `examples/time_series_inference.py` for a complete example with model prediction
