### Rolling Window Sequence
A rolling window sequence is a fixed-size window of consecutive time steps that moves ("rolls") along a time series. At each position, the window captures one segment of the data (for example, the last 30 cycles of sensor readings), which can then be used as input for models or calculations. The window then shifts forward by one or more time steps. Every window covers the same number of points and keeps them in temporal order, so each sequence reflects the recent context of the time step it ends on.

- Why do we generate rolling window sequences?

  - This step is essential for time-series modeling techniques (like LSTMs or GRUs) that require input data shaped as sequences of fixed length rather than individual time points.

  - Rolling windows create these context-rich, fixed-size sequences from the continuous stream of data for each engine, capturing temporal dependencies and trends.

  - They allow models to learn from patterns that span multiple cycles, rather than from isolated measurements.

  - Rolling statistics or aggregates computed in earlier steps summarize a window as a single value per row. Sequence generation is still needed afterwards, because it reshapes the data into the 3D array `(num_sequences, window_size, num_features)` used for model training.
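
To make the idea concrete, here is a minimal sketch on a toy array of 6 time steps and 2 features, assuming a window of 3 and a stride of 1. The names `toy`, `window` and `stride` are illustrative and not part of the project code.

```python
import numpy as np

# Toy series: 6 time steps, 2 features (values are arbitrary)
toy = np.arange(12).reshape(6, 2)
window = 3   # hypothetical window length
stride = 1   # hypothetical step between consecutive windows

# Each window holds `window` consecutive rows, in temporal order
toy_windows = np.array([
    toy[start : start + window]
    for start in range(0, len(toy) - window + 1, stride)
])

print(toy_windows.shape)  # (4, 3, 2): 4 windows, 3 steps each, 2 features
print(toy_windows[0])     # rows 0..2 of the toy series
print(toy_windows[-1])    # rows 3..5 of the toy series
```

A series of length 6 with a window of 3 yields 6 - 3 + 1 = 4 windows; a larger stride would skip start positions and produce fewer.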

In [None]:
# 1. Imports and Data Loading
import pandas as pd
import numpy as np

# Load the preprocessed dataset from the previous step (adjust the path as needed)
df = pd.read_csv('C:/Users/win10/Desktop/Project_Oct25/prognosAI-Infosys-intern-project/data/processed/cmapss_preprocessed.csv')

# Basic info
print("Dataset shape:", df.shape)
df.head()


In [None]:
exclude_cols = ['engine_id', 'cycle', 'dataset_id']  # Identifier columns, not model features
feature_cols = [col for col in df.columns if col not in exclude_cols]

print(f"Feature columns ({len(feature_cols)}): {feature_cols}")

# Verify all feature columns are numeric
numeric_check = df[feature_cols].dtypes.apply(lambda x: np.issubdtype(x, np.number)).all()
assert numeric_check, "Non-numeric columns found in feature_cols!"

# Sort data by engine_id and cycle to ensure correct temporal order
df = df.sort_values(['engine_id', 'cycle']).reset_index(drop=True)


In [None]:
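def generate_rolling_windows(data, engine_col, features, window_size=30, cycle_col='cycle'):
    """Build fixed-length sequences per engine.

    Returns an array of shape (num_sequences, window_size, num_features), plus
    the engine ID and the last cycle of each sequence. Engines with fewer than
    window_size rows produce no sequences. Rows must already be sorted by cycle.
    """
    sequences = []
    engine_ids = []
    cycle_ids = []

    for engine in data[engine_col].unique():
        engine_data = data[data[engine_col] == engine]
        engine_features = engine_data[features].values
        engine_cycles = engine_data[cycle_col].values

        # Each window ends at row i and covers the window_size rows up to and including it
        for i in range(window_size - 1, len(engine_data)):
            seq = engine_features[i - window_size + 1 : i + 1]
            sequences.append(seq)
            engine_ids.append(engine)
            # Read the cycle from the column array so it keeps its original dtype
            cycle_ids.append(engine_cycles[i])

    # Stack into a single 3D array for modeling
    sequences = np.array(sequences)
    return sequences, engine_ids, cycle_ids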
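
For larger datasets, the inner Python loop can be replaced by NumPy's `sliding_window_view` (available from NumPy 1.20). The sketch below is an alternative, not the method used in this notebook. `windows_for_engine` is a hypothetical helper, and it assumes the rows it receives belong to a single engine and are already sorted by cycle.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windows_for_engine(engine_features, window_size):
    """Return all windows of one engine as (num_windows, window_size, num_features)."""
    if len(engine_features) < window_size:
        # Too few cycles for even one window
        return np.empty((0, window_size, engine_features.shape[1]))
    # sliding_window_view puts the window axis last: (num_windows, num_features, window_size)
    view = sliding_window_view(engine_features, window_size, axis=0)
    # Move the window axis to the middle and copy, since the view is read-only
    return view.transpose(0, 2, 1).copy()

# Small check against explicit slicing: 5 time steps, 2 features, window of 3
demo = np.arange(10, dtype=float).reshape(5, 2)
vec = windows_for_engine(demo, 3)
loop = np.array([demo[i - 2 : i + 1] for i in range(2, len(demo))])
print(vec.shape)                  # (3, 3, 2)
print(np.array_equal(vec, loop))  # True
```

The explicit loop is easier to read and is fast enough for modest data sizes; the vectorized form mainly pays off when there are many long engine histories.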



In [None]:
window_size = 30  # Typical rolling window length; adjust as needed
sequences, engine_ids, cycle_ids = generate_rolling_windows(df, 'engine_id', feature_cols, window_size)

print("Shape of rolling window sequences:", sequences.shape)  # (num_sequences, window_size, num_features)
print("Example sequence shape:", sequences[0].shape)


In [None]:
# Print the first sequence info
print(f"Engine ID: {engine_ids[0]}, Cycle: {cycle_ids[0]}")
print("Sequence data for first time window (shape {}):".format(sequences[0].shape))
print(sequences[0])


In [None]:
assert sequences.shape[1] == window_size, "Sequence window length mismatch"

# Check sequences integrity: cycles should increase within each engine
# When engine changes, cycle resets (decreases), which is expected
for i in range(1, len(cycle_ids)):
    if engine_ids[i] == engine_ids[i-1]:
        # Same engine: each sequence ends on a later cycle, so cycles must strictly increase
        assert cycle_ids[i] > cycle_ids[i-1], f"Cycle order violation within engine {engine_ids[i]}"
    # Different engine: cycle can reset (no assertion needed)

print("✓ Cycle order validation passed")
print(f"  Total sequences: {len(sequences)}")
print(f"  Unique engines: {len(set(engine_ids))}")
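
A further sanity check is to compare the number of generated sequences with the count implied by the window arithmetic: an engine with n cycles yields max(0, n - window_size + 1) sequences. The sketch below shows that check on a small made-up frame. `expected_sequence_count` is a hypothetical helper, and the column name `engine_id` is taken from the cells above.

```python
import pandas as pd

def expected_sequence_count(frame, engine_col, window_size):
    """Sum max(0, n - window_size + 1) over the row counts n of each engine."""
    rows_per_engine = frame.groupby(engine_col).size()
    return int((rows_per_engine - window_size + 1).clip(lower=0).sum())

# Made-up example: engine 1 has 5 cycles, engine 2 has 2 cycles
demo_df = pd.DataFrame({
    'engine_id': [1, 1, 1, 1, 1, 2, 2],
    'cycle':     [1, 2, 3, 4, 5, 1, 2],
})
demo_count = expected_sequence_count(demo_df, 'engine_id', window_size=3)
print(demo_count)  # 3 (engine 2 is too short for a window of 3)
```

In this notebook, the same helper could back an assertion such as `len(sequences) == expected_sequence_count(df, 'engine_id', window_size)`.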

In [None]:
# Save sequences and metadata for modeling
np.save('rolling_window_sequences.npy', sequences)
pd.DataFrame({'engine_id': engine_ids, 'cycle': cycle_ids}).to_csv('sequence_metadata.csv', index=False)
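
When the saved files are read back in the modeling step, it is worth confirming that the array and its metadata still line up row for row. The following is a minimal sketch of that round trip, using a temporary directory and made-up data so that it does not touch the real outputs; the `demo_*` names are illustrative.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up stand-ins for `sequences`, `engine_ids` and `cycle_ids`
demo_sequences = np.zeros((4, 3, 2))
demo_meta = pd.DataFrame({'engine_id': [1, 1, 2, 2], 'cycle': [3, 4, 3, 4]})

with tempfile.TemporaryDirectory() as tmp:
    seq_path = os.path.join(tmp, 'rolling_window_sequences.npy')
    meta_path = os.path.join(tmp, 'sequence_metadata.csv')
    np.save(seq_path, demo_sequences)
    demo_meta.to_csv(meta_path, index=False)

    # Read both files back before the temporary directory is removed
    loaded_sequences = np.load(seq_path)
    loaded_meta = pd.read_csv(meta_path)

# One metadata row per sequence, in the same order
assert len(loaded_sequences) == len(loaded_meta)
print(loaded_sequences.shape, loaded_meta.shape)  # (4, 3, 2) (4, 2)
```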

### Observations: