# Power Quality Anomaly Detection - Tutorial

This notebook walks through the Power Quality (PQ) Anomaly Detection system, from generating data to saving and reloading trained models.

## Topics Covered:
1. Data Generation and Loading
2. Feature Extraction
3. Model Training
4. Evaluation and Visualization
5. Making Predictions

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sys
import os

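# Add the project root to the path so the `src` package can be imported
# (assumes the notebook is run from the project root)
sys.path.insert(0, os.path.abspath('.'))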

from src.data_loader import PQDataLoader
from src.feature_extraction import FeatureExtractor
from src.model_training import PQModelTrainer
from src.visualization import PQVisualizer

print("✓ Libraries imported successfully")

## 1. Generate Synthetic Dataset

We'll generate a small dataset with 5 classes of power quality anomalies.

In [None]:
# Initialize data loader
data_loader = PQDataLoader(data_dir='data')

# Generate dataset (200 samples per class)
waveforms, labels = data_loader.generate_synthetic_dataset(n_samples=200)

print(f"Dataset shape: {waveforms.shape}")
print(f"Number of samples: {len(waveforms)}")
print(f"Classes: {np.unique(labels)}")
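# Count samples per class directly instead of assuming the classes are balanced
classes, counts = np.unique(labels, return_counts=True)
print(f"Samples per class: { {str(c): int(n) for c, n in zip(classes, counts)} }")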
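
The internals of `generate_synthetic_dataset` are not shown in this notebook. As a rough sketch of what synthetic power quality waveforms can look like, the block below builds a clean sine, a sag, and a harmonic-distorted wave with plain NumPy. The sampling rate (6400 Hz), the 60 Hz fundamental, the class names other than `Harmonic`, and every function name here are assumptions for illustration, not the loader's actual implementation.

```python
import numpy as np

FS = 6400        # assumed sampling rate in Hz (1280 samples over 0.2 s)
F0 = 60          # assumed fundamental frequency in Hz
DURATION = 0.2   # seconds

def time_axis(fs=FS, duration=DURATION):
    """Evenly spaced sample times with spacing exactly 1/fs."""
    return np.arange(int(round(fs * duration))) / fs

def normal_wave(t, f0=F0):
    """Clean unit-amplitude sine."""
    return np.sin(2 * np.pi * f0 * t)

def sag_wave(t, f0=F0, depth=0.4, start=0.05, end=0.15):
    """Sine whose amplitude drops by `depth` between `start` and `end` seconds."""
    envelope = np.ones_like(t)
    envelope[(t >= start) & (t < end)] = 1.0 - depth
    return envelope * np.sin(2 * np.pi * f0 * t)

def harmonic_wave(t, f0=F0, h3=0.15, h5=0.10):
    """Fundamental plus 3rd and 5th harmonics."""
    return (np.sin(2 * np.pi * f0 * t)
            + h3 * np.sin(2 * np.pi * 3 * f0 * t)
            + h5 * np.sin(2 * np.pi * 5 * f0 * t))

t = time_axis()
demo = {
    "Normal": normal_wave(t),
    "Sag": sag_wave(t),
    "Harmonic": harmonic_wave(t),
}
for name, wave in demo.items():
    print(f"{name}: shape={wave.shape}, peak={np.max(np.abs(wave)):.3f}")
```

A real generator would typically also randomize depth, timing, and phase and add measurement noise so that the classifier does not memorize one fixed shape.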

## 2. Visualize Sample Waveforms

Let's visualize one sample from each class.

In [None]:
# Initialize visualizer
visualizer = PQVisualizer()

# Get one sample from each class
unique_classes = np.unique(labels)
sample_waveforms = []
sample_labels = []

for class_name in unique_classes:
    idx = np.where(labels == class_name)[0][0]
    sample_waveforms.append(waveforms[idx])
    sample_labels.append(class_name)

# Plot multiple waveforms
fig = visualizer.plot_multiple_waveforms(
    sample_waveforms,
    sample_labels,
    title="Sample Waveforms by Class"
)
plt.show()

## 3. Detailed Waveform Analysis

Analyze a single waveform in both time and frequency domains.

In [None]:
# Select a harmonic waveform for analysis
harmonic_idx = np.where(labels == 'Harmonic')[0][0]
harmonic_waveform = waveforms[harmonic_idx]

# Plot combined analysis
fig = visualizer.plot_waveform_with_fft(
    harmonic_waveform,
    title="Harmonic Distortion Analysis"
)
plt.show()
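
What `plot_waveform_with_fft` computes internally is not shown here. A minimal sketch of a one-sided amplitude spectrum, assuming a 6400 Hz sampling rate and a 60 Hz fundamental (both assumptions), looks like this:

```python
import numpy as np

fs = 6400   # assumed sampling rate in Hz
f0 = 60     # assumed fundamental frequency in Hz

# Test signal: fundamental plus a 20 % third harmonic
t = np.arange(1280) / fs
wave = np.sin(2 * np.pi * f0 * t) + 0.2 * np.sin(2 * np.pi * 3 * f0 * t)

# One-sided amplitude spectrum; scaling by 2/N makes a unit sine read as ~1.0
spectrum = np.abs(np.fft.rfft(wave)) * 2 / len(wave)
freqs = np.fft.rfftfreq(len(wave), d=1 / fs)

# Report the two strongest components in ascending frequency order
strongest = np.argsort(spectrum)[::-1][:2]
for k in sorted(strongest):
    print(f"{freqs[k]:.0f} Hz: amplitude {spectrum[k]:.3f}")
```

With 1280 samples at 6400 Hz the frequency resolution is 5 Hz, so both 60 Hz and 180 Hz fall exactly on FFT bins and no windowing is needed for this toy signal. Real recordings rarely line up this neatly and usually benefit from a window function.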

## 4. Extract Features

Extract time-domain and frequency-domain features from all waveforms.

In [None]:
# Initialize feature extractor
feature_extractor = FeatureExtractor()

# Extract features from all waveforms
features = feature_extractor.extract_features_batch(waveforms)
feature_names = feature_extractor.get_feature_names()

print(f"Features extracted: {features.shape[1]}")
print(f"Feature names: {feature_names}")

# Create DataFrame for better visualization
features_df = pd.DataFrame(features, columns=feature_names)
features_df['label'] = labels

# Display first few rows
print("\nSample features:")
features_df.head()
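
To make features such as `rms_voltage` and `thd` less of a black box, here is a hedged sketch of how they could be computed with NumPy. The function names, the 6400 Hz sampling rate, and the choice of ten harmonics are assumptions; `FeatureExtractor` may define these features differently.

```python
import numpy as np

def rms(wave):
    """Root-mean-square value of the waveform."""
    return float(np.sqrt(np.mean(np.square(wave))))

def thd(wave, fs, f0, n_harmonics=10):
    """Total harmonic distortion: harmonic amplitudes (2nd..n-th) relative to the fundamental."""
    n = len(wave)
    spectrum = np.abs(np.fft.rfft(wave)) * 2 / n
    freqs = np.fft.rfftfreq(n, d=1 / fs)

    def amplitude_at(f):
        # Amplitude of the FFT bin closest to frequency f
        return spectrum[np.argmin(np.abs(freqs - f))]

    fundamental = amplitude_at(f0)
    harmonics = [amplitude_at(k * f0)
                 for k in range(2, n_harmonics + 1) if k * f0 < fs / 2]
    return float(np.sqrt(np.sum(np.square(harmonics))) / fundamental)

fs, f0 = 6400, 60   # assumed sampling rate and fundamental
t = np.arange(1280) / fs
clean = np.sin(2 * np.pi * f0 * t)
distorted = (clean
             + 0.15 * np.sin(2 * np.pi * 3 * f0 * t)
             + 0.10 * np.sin(2 * np.pi * 5 * f0 * t))

print(f"RMS (clean): {rms(clean):.4f}")
print(f"THD (clean): {thd(clean, fs, f0):.4f}")
print(f"THD (distorted): {thd(distorted, fs, f0):.4f}")
```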

## 5. Feature Analysis by Class

Analyze how features differ across anomaly types.

In [None]:
# Statistical summary by class
summary = features_df.groupby('label')[['rms_voltage', 'thd', 'dip_percentage', 'swell_percentage']].mean()
print("Average feature values by class:")
print(summary)

# Visualize
summary.plot(kind='bar', figsize=(12, 6))
plt.title('Average Feature Values by Anomaly Type')
plt.ylabel('Value')
plt.xticks(rotation=45)
plt.legend(loc='best')
plt.tight_layout()
plt.show()
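
The exact definitions of `dip_percentage` and `swell_percentage` are not given in this notebook. One plausible reading, sketched below under that assumption, compares windowed RMS values against the nominal RMS: the deepest drop gives the dip and the highest rise gives the swell. The window length, sampling rate, and function names are all illustrative.

```python
import numpy as np

fs, f0 = 6400, 60   # assumed sampling rate and fundamental
window = 320        # 3 cycles of 60 Hz at 6400 Hz (0.05 s)

# Unit sine with a 40 % sag between 0.05 s and 0.15 s (samples 320..959)
t = np.arange(1280) / fs
envelope = np.ones(1280)
envelope[320:960] = 0.6
wave = envelope * np.sin(2 * np.pi * f0 * t)

def windowed_rms(wave, window):
    """RMS of consecutive non-overlapping windows."""
    n_windows = len(wave) // window
    blocks = wave[: n_windows * window].reshape(n_windows, window)
    return np.sqrt(np.mean(blocks ** 2, axis=1))

def dip_and_swell(wave, window, nominal_rms):
    """Largest RMS drop and rise relative to nominal, in percent."""
    r = windowed_rms(wave, window)
    dip = max(0.0, (nominal_rms - r.min()) / nominal_rms * 100)
    swell = max(0.0, (r.max() - nominal_rms) / nominal_rms * 100)
    return float(dip), float(swell)

nominal = 1 / np.sqrt(2)   # RMS of a unit-amplitude sine
dip, swell = dip_and_swell(wave, window, nominal)
print(f"Dip: {dip:.1f} %, swell: {swell:.1f} %")
```

Under this definition a sag class should show a large dip and almost no swell, which is the kind of separation the bar chart above is meant to reveal.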

## 6. Train Machine Learning Models

Train three tree-based classifiers (Random Forest, XGBoost, and LightGBM) on the extracted features.

In [None]:
# Initialize trainer
trainer = PQModelTrainer(model_dir='models')

# Prepare data (80-20 train-test split)
X_train, X_test, y_train, y_test = trainer.prepare_data(features, labels, test_size=0.2)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
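
Whether `prepare_data` stratifies the split is not shown here. For a multi-class problem it usually helps to keep class proportions equal in both sets, so here is a minimal NumPy sketch of a stratified split. The function name and the toy data are assumptions for illustration.

```python
import numpy as np

def stratified_split(X, y, test_size=0.2, seed=42):
    """Split so that each class contributes about `test_size` of its samples to the test set."""
    rng = np.random.default_rng(seed)
    test_idx = []
    for cls in np.unique(y):
        cls_idx = np.flatnonzero(y == cls)
        rng.shuffle(cls_idx)
        n_test = int(round(len(cls_idx) * test_size))
        test_idx.extend(cls_idx[:n_test])

    test_mask = np.zeros(len(y), dtype=bool)
    test_mask[np.array(test_idx, dtype=int)] = True
    return X[~test_mask], X[test_mask], y[~test_mask], y[test_mask]

# Toy data: 30 samples of class "A" and 20 of class "B"
X = np.arange(100).reshape(50, 2)
y = np.array(["A"] * 30 + ["B"] * 20)

X_tr, X_te, y_tr, y_te = stratified_split(X, y)
print("Train:", X_tr.shape, "Test:", X_te.shape)
print("Test class counts:", {c: int((y_te == c).sum()) for c in np.unique(y)})
```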

In [None]:
# Train Random Forest
print("Training Random Forest...")
rf_model = trainer.train_random_forest(X_train, y_train)
print("✓ Random Forest trained")

In [None]:
# Train XGBoost
print("Training XGBoost...")
xgb_model = trainer.train_xgboost(X_train, y_train)
print("✓ XGBoost trained")

In [None]:
# Train LightGBM
print("Training LightGBM...")
lgb_model = trainer.train_lightgbm(X_train, y_train)
print("✓ LightGBM trained")

## 7. Evaluate Models

Compare the performance of all trained models on the held-out test set.

In [None]:
# Evaluate all models
results = trainer.evaluate_all_models(X_test, y_test)

# Create comparison DataFrame
comparison = pd.DataFrame({
    'Model': list(results.keys()),
    'Accuracy': [r['accuracy'] for r in results.values()],
    'F1 (macro)': [r['f1_macro'] for r in results.values()],
    'F1 (weighted)': [r['f1_weighted'] for r in results.values()]
})

print("\nModel Comparison:")
print(comparison.to_string(index=False))

# Visualize
comparison.plot(x='Model', kind='bar', figsize=(10, 6))
plt.title('Model Performance Comparison')
plt.ylabel('Score')
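# Zoom in on the top of the range, but widen the axis if any score falls below 0.8
lowest = comparison.drop(columns='Model').min().min()
plt.ylim(min(0.8, lowest - 0.05), 1.0)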
plt.legend(loc='lower right')
plt.tight_layout()
plt.show()
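
The comparison reports accuracy, macro F1, and weighted F1. To show how the three can differ, the sketch below derives them from a confusion matrix with NumPy. It assumes rows are true classes and columns are predictions; the function name and the example matrix are made up for illustration and are not results from this notebook.

```python
import numpy as np

def scores_from_confusion(cm):
    """Accuracy, macro F1, and weighted F1 from a confusion matrix (rows = true classes)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)     # true samples per class
    predicted = cm.sum(axis=0)   # predicted samples per class

    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "f1_macro": float(f1.mean()),                          # every class counts equally
        "f1_weighted": float(np.average(f1, weights=support)),  # classes weighted by size
    }

# Illustrative imbalanced case: 20 samples of class 0, 5 samples of class 1
example = [[18, 2],
           [1, 4]]
for name, value in scores_from_confusion(example).items():
    print(f"{name}: {value:.4f}")
```

Macro F1 drops faster than weighted F1 when a small class is classified poorly, which is why it is worth reporting both when classes are imbalanced.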

## 8. Confusion Matrix

Visualize prediction errors for the best model.

In [None]:
# Get best model (highest accuracy)
best_model_name = max(results.items(), key=lambda x: x[1]['accuracy'])[0]
best_result = results[best_model_name]

print(f"Best model: {best_model_name}")
print(f"Accuracy: {best_result['accuracy']:.4f}")

# Plot confusion matrix
fig = visualizer.plot_confusion_matrix(
    best_result['confusion_matrix'],
    trainer.class_names,
    title=f"Confusion Matrix - {best_model_name.upper()}"
)
plt.show()
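
For reference, a confusion matrix can be built by counting (true, predicted) label pairs. The sketch below does this by hand and then looks up the most frequent misclassification. The class names and labels are hypothetical, and the row/column convention (rows = true classes) is an assumption about what `results[...]['confusion_matrix']` holds.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, class_names):
    """Count (true, predicted) pairs; rows are true classes, columns are predictions."""
    index = {name: i for i, name in enumerate(class_names)}
    cm = np.zeros((len(class_names), len(class_names)), dtype=int)
    for true_label, pred_label in zip(y_true, y_pred):
        cm[index[true_label], index[pred_label]] += 1
    return cm

class_names = ["Normal", "Sag", "Swell"]   # hypothetical class names
y_true = ["Normal", "Normal", "Normal", "Sag", "Sag", "Sag", "Swell"]
y_pred = ["Normal", "Sag", "Sag", "Sag", "Sag", "Normal", "Swell"]

cm = confusion_matrix(y_true, y_pred, class_names)
print(cm)

# Find the largest off-diagonal cell, i.e. the most frequent error
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
i, j = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(f"Most frequent confusion: true {class_names[i]} predicted as "
      f"{class_names[j]} ({off_diag[i, j]} samples)")
```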

## 9. Feature Importance

Identify which features are most important for classification.

In [None]:
# Get feature importance for tree-based model
importance = trainer.get_feature_importance('random_forest')

if importance is not None:
    # Plot feature importance
    fig = visualizer.plot_feature_importance(
        feature_names,
        importance,
        title="Feature Importance - Random Forest",
        top_n=15
    )
    plt.show()
    
    # Print the top features (at most 10, in case fewer were extracted)
    indices = np.argsort(importance)[::-1]
    print("\nTop 10 Most Important Features:")
    for rank, idx in enumerate(indices[:10], start=1):
        print(f"{rank}. {feature_names[idx]}: {importance[idx]:.4f}")
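
The importances above come from the trained model itself. A model-agnostic cross-check is permutation importance: shuffle one feature at a time and measure how much accuracy drops. The sketch below demonstrates the idea on toy data with a simple nearest-centroid classifier standing in for the real models. Everything in it is an assumption for illustration, and for brevity it scores on the training data, whereas a held-out set would be preferable in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature 0 separates the classes, feature 1 is pure noise
n = 200
y = rng.integers(0, 2, size=n)
X = np.column_stack([
    y * 2.0 + rng.normal(0, 0.3, n),
    rng.normal(0, 1.0, n),
])

# Stand-in model: assign each sample to the nearest class centroid
centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

def accuracy(X, y):
    return float(np.mean(predict(X) == y))

def permutation_importance(X, y, n_repeats=10):
    """Mean accuracy drop when each feature column is shuffled."""
    baseline = accuracy(X, y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j] += baseline - accuracy(X_perm, y)
    return drops / n_repeats

imp = permutation_importance(X, y)
for j, value in enumerate(imp):
    print(f"feature {j}: mean accuracy drop {value:.3f}")
```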

## 10. Make Predictions on New Data

Use a trained model to classify a new waveform.

In [None]:
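# Generate a new test waveform: 1280 points spanning 0.2 s
# (_generate_sag is a private helper of PQDataLoader, called here for demonstration only)
t = np.linspace(0, 0.2, 1280)
test_waveform = data_loader._generate_sag(t, 60)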

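# Extract features and order them to match the training columns
# (assumes the dict keys are the names returned by get_feature_names())
test_features_dict = feature_extractor.extract_all_features(test_waveform)
test_features = np.array([[test_features_dict[name] for name in feature_names]])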

# Predict
predictions, probabilities = trainer.predict(test_features, model_name='xgboost')

print(f"\nPredicted class: {predictions[0]}")
print("\nClass probabilities:")
for class_name, prob in zip(trainer.class_names, probabilities[0]):
    print(f"  {class_name}: {prob:.4f} ({prob*100:.2f}%)")

# Visualize the test waveform
fig = visualizer.plot_waveform(
    test_waveform,
    title=f"Test Waveform - Predicted: {predictions[0]}"
)
plt.show()
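
Two details matter when predicting on a single waveform: the feature vector has to follow the same column order used in training, and the probability vector has to be read against the right class names. The helpers below sketch both steps. The function names, feature values, class names, and probabilities are all made up for illustration.

```python
import numpy as np

def to_feature_row(feature_dict, feature_names):
    """Order a feature dict to match the training columns; returns shape (1, n_features)."""
    missing = [name for name in feature_names if name not in feature_dict]
    if missing:
        raise KeyError(f"Missing features: {missing}")
    return np.array([[feature_dict[name] for name in feature_names]], dtype=float)

def top_k(probabilities, class_names, k=2):
    """The k most likely classes with their probabilities, most likely first."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(class_names[i], float(probabilities[i])) for i in order]

# Dict order differs from the training column order on purpose
feature_names = ["rms_voltage", "thd"]
feature_dict = {"thd": 0.02, "rms_voltage": 0.42}
row = to_feature_row(feature_dict, feature_names)
print("Feature row:", row)

class_names = ["Normal", "Sag", "Swell"]   # hypothetical class names
probs = np.array([0.10, 0.85, 0.05])       # hypothetical model output
for name, p in top_k(probs, class_names):
    print(f"{name}: {p:.2%}")
```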

## 11. Save Models

Save the trained models and the dataset for later use.

In [None]:
# Save all models
trainer.save_models()
print("✓ Models saved to 'models/' directory")

# Save dataset
data_loader.save_dataset(waveforms, labels)
print("✓ Dataset saved to 'data/' directory")

## 12. Load Saved Models

Demonstrate loading previously saved models.

In [None]:
# Create new trainer instance
new_trainer = PQModelTrainer(model_dir='models')

# Load saved models
new_trainer.load_models()
print(f"✓ Loaded {len(new_trainer.models)} models")
print(f"Available models: {list(new_trainer.models.keys())}")
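
The on-disk format used by `save_models` and `load_models` is not shown in this notebook. As a generic sketch of the save/load round trip, the block below writes one file per model with the standard-library `pickle` module and reads them back into a dict keyed by file name. The file naming scheme and the stand-in model objects are assumptions.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in "models": plain dicts, used only to demonstrate the round trip
models = {
    "random_forest": {"n_estimators": 100},
    "xgboost": {"max_depth": 6},
}

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "models"
    model_dir.mkdir()

    # Save each model to its own file
    for name, model in models.items():
        with open(model_dir / f"{name}.pkl", "wb") as f:
            pickle.dump(model, f)

    # Load every saved model back, keyed by file stem
    loaded = {}
    for path in sorted(model_dir.glob("*.pkl")):
        with open(path, "rb") as f:
            loaded[path.stem] = pickle.load(f)

print(f"Loaded {len(loaded)} models: {list(loaded)}")
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.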

## Summary

In this tutorial, we:
1. Generated synthetic power quality waveforms
2. Visualized different types of anomalies
3. Extracted signal processing features
4. Trained multiple ML models
5. Evaluated and compared model performance
6. Made predictions on new data
7. Saved the models and dataset, then reloaded the models

### Next Steps:
- Try the web application: `streamlit run app.py`
- Train on larger datasets for better accuracy
- Experiment with custom features
- Integrate with real power quality monitoring hardware