# Quick Start Guide - Order Reconstruction Challenge

This notebook demonstrates how to use the order reconstruction pipeline.

## What we'll cover:
1. Setting up imports
2. Loading data with `DataLoader`
3. Exploring the data structure
4. Extracting features with `FeatureExtractor`
5. Using the `SignalProcessingAlgorithm`
6. Running the full pipeline

## 1. Setup and Imports

## 2. Load Data

Let's start by loading the first five data files so we can explore them quickly.

## 3. Explore Data Structure

Let's look at what's in a single file: its shape, columns, first few rows, and basic statistics.

### Visualize acceleration signal

## 4. Extract Features

Now let's extract features from all loaded files using the `FeatureExtractor`.
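To make the feature names used later in this notebook (`rms`, `kurtosis`, `spectral_energy`, `peak_frequency`) more concrete, here is a minimal sketch of how such features could be computed with NumPy. It is an illustration only: the function name `sketch_features` is hypothetical, and the formulas in `src/feature_extractor.py` may differ (for example in windowing, normalization, or the kurtosis convention).

```python
import numpy as np


def sketch_features(signal, sampling_rate):
    """Compute four illustrative features from a 1-D acceleration signal.

    Hypothetical helper, not the FeatureExtractor implementation.
    """
    x = np.asarray(signal, dtype=float)
    centered = x - x.mean()

    # Time-domain features
    rms = np.sqrt(np.mean(x ** 2))
    variance = np.mean(centered ** 2)
    kurtosis = np.mean(centered ** 4) / variance ** 2  # Pearson (non-excess) kurtosis

    # Frequency-domain features from the one-sided magnitude spectrum
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sampling_rate)
    spectral_energy = np.sum(spectrum ** 2)
    peak_frequency = freqs[np.argmax(spectrum)]

    return {
        "rms": rms,
        "kurtosis": kurtosis,
        "spectral_energy": spectral_energy,
        "peak_frequency": peak_frequency,
    }


# Synthetic check signal: a 50 Hz sine sampled at 1 kHz for one second
t = np.arange(1000) / 1000.0
demo = sketch_features(np.sin(2 * np.pi * 50 * t), sampling_rate=1000)
print(demo)
```

For the synthetic sine, the RMS comes out near 0.707 and the peak frequency at 50 Hz, which is a quick sanity check before applying the same idea to real acceleration data.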

## 5. Using the SignalProcessingAlgorithm

Now let's use the full algorithm to predict the chronological order of the loaded files.
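As a point of comparison, a naive baseline can be sketched by sorting files on a single feature. This is not what `SignalProcessingAlgorithm` does internally; it only assumes that a feature such as `rms` tends to grow over time, which may not hold for your data. The helper name `naive_order_by_feature`, the file IDs, and the feature values below are all hypothetical toy values.

```python
import pandas as pd


def naive_order_by_feature(features, column="rms"):
    """Return file IDs sorted by ascending value of one feature column.

    Hypothetical baseline: assumes the chosen feature increases monotonically
    over the files' true chronological order.
    """
    return features[column].sort_values(kind="stable").index.tolist()


# Toy features table indexed by made-up file IDs
toy_features = pd.DataFrame(
    {"rms": [0.9, 0.2, 0.5], "kurtosis": [4.1, 3.0, 3.3]},
    index=[7, 3, 5],
)

toy_order = naive_order_by_feature(toy_features)
print(toy_order)  # [3, 5, 7]
```

A baseline like this is mainly useful for checking whether a more elaborate algorithm actually improves on a single-feature ranking.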

### Visualize feature trends over predicted order

## 6. Running the Full Pipeline

Instead of running code manually in notebooks, you can use the main pipeline script:

```bash
# Edit src/config.py to configure your experiment
# Then run:
python -m src.main
```

This will:
1. Load all data files
2. Preprocess according to config
3. Run the selected algorithm
4. Save results to `results/<experiment_name>/`

## Next Steps

1. **Explore more features**: Check `src/feature_extractor.py` for all available features
2. **Try different algorithms**: Create your own algorithm class inheriting from `BaseAlgorithm`
3. **Tune parameters**: Adjust `src/config.py` for different preprocessing and algorithm settings
4. **Full dataset**: Remove the `limit` argument from the `load_all()` call to process all files
5. **Evaluate**: If you have the true order, use `OrderInference.compute_spearman_footrule()` to score your prediction
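For intuition about the evaluation step, the Spearman footrule distance in its standard form is the sum, over all items, of the absolute difference between each item's predicted position and its true position. The sketch below implements that textbook definition; the function name `spearman_footrule` is hypothetical, and `OrderInference.compute_spearman_footrule()` may use a different signature or normalization.

```python
def spearman_footrule(predicted_order, true_order):
    """Sum of absolute position differences between two orderings.

    Hypothetical stand-alone helper implementing the standard definition;
    0 means the two orderings are identical.
    """
    if sorted(predicted_order) != sorted(true_order):
        raise ValueError("Both orderings must contain the same file IDs")

    # Map each file ID to its position in the true ordering
    true_position = {file_id: pos for pos, file_id in enumerate(true_order)}

    return sum(
        abs(pos - true_position[file_id])
        for pos, file_id in enumerate(predicted_order)
    )


# Made-up file IDs for illustration
true_order = [3, 5, 7, 9]
swapped = spearman_footrule([5, 3, 7, 9], true_order)
reversed_score = spearman_footrule([9, 7, 5, 3], true_order)
print(swapped, reversed_score)  # 2 8
```

Lower is better: swapping two adjacent files costs 2, while fully reversing four files costs 8.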

In [None]:
# Reorder features by predicted chronological order
ordered_features = features.loc[predicted_order]

# Plot a few key features
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

features_to_plot = ['rms', 'kurtosis', 'spectral_energy', 'peak_frequency']

# Pair each subplot axis with the feature it will display
for ax, feature_name in zip(axes.flat, features_to_plot):
    values = ordered_features[feature_name].values
    
    ax.plot(values, marker='o', markersize=8, linewidth=2)
    ax.set_title(f'{feature_name} over Time', fontsize=14, fontweight='bold')
    ax.set_xlabel('Predicted Chronological Position', fontsize=12)
    ax.set_ylabel('Feature Value', fontsize=12)
    ax.grid(True, alpha=0.3)
    
plt.tight_layout()
plt.show()

In [None]:
# Configure the algorithm
algo_config = {
    'sampling_rate': config.DATASET_CONFIG['sampling_rate'],
    'frequency_bands': config.DATASET_CONFIG['fault_band_centers'],
    'features': ['rms', 'kurtosis', 'spectral_energy']
}

# Initialize algorithm
algorithm = SignalProcessingAlgorithm(algo_config)

# Run the algorithm
predicted_order, features = algorithm.run(data)

print(f"\nPredicted chronological order: {predicted_order}")
print("\nThis means:")
for i, file_id in enumerate(predicted_order):
    print(f"  Position {i+1}: file_{file_id}.csv")

In [None]:
# Initialize feature extractor
feature_extractor = FeatureExtractor(sampling_rate=config.DATASET_CONFIG['sampling_rate'])

# Extract features from all files
features = feature_extractor.extract_features_from_all(
    data,
    fault_bands=config.DATASET_CONFIG['fault_band_centers']
)

print(f"Extracted features shape: {features.shape}")
print("\nFeature names:")
for i, col in enumerate(features.columns, 1):
    print(f"  {i}. {col}")

print("\nFeatures DataFrame:")
display(features)

In [None]:
# Plot first 10000 samples
plt.figure(figsize=(15, 5))
plt.plot(first_file['acceleration'][:10000])
plt.title(f'Acceleration Signal - file_{first_file_id}.csv (first 10k samples)')
plt.xlabel('Sample')
plt.ylabel('Acceleration')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Get the first loaded file (mappings iterate over their keys)
first_file_id = next(iter(data))
first_file = data[first_file_id]

print(f"File: file_{first_file_id}.csv")
print(f"Shape: {first_file.shape}")
print(f"\nColumns: {first_file.columns.tolist()}")
print("\nFirst few rows:")
display(first_file.head())

print("\nBasic statistics:")
display(first_file.describe())

In [None]:
# Initialize data loader
data_loader = DataLoader(config.DATA_DIR)

# Load first 5 files for quick exploration
print("Loading sample files...")
data = data_loader.load_all(limit=5)

print(f"\nLoaded {len(data)} files")
print(f"File IDs: {list(data.keys())}")

In [None]:
import sys
from pathlib import Path

# Add parent directory to path so we can import src modules
sys.path.insert(0, str(Path.cwd().parent))

# Import our modules
from src.data_loader import DataLoader
from src.preprocessor import Preprocessor
from src.feature_extractor import FeatureExtractor
from src.algorithms.signal_processing import SignalProcessingAlgorithm
from src import config

# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Visualization settings
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("✓ Imports successful!")