# Cloud Usage Anomaly Detection - Refactored Example

This notebook demonstrates the refactored implementation of the LSTM autoencoder-based anomaly detection system for daily cloud usage costs.

## Improvements Over Original Code

1. **Modular Architecture**: Code separated into logical modules
2. **Type Safety**: Type hints throughout
3. **Error Handling**: Comprehensive validation and error messages
4. **Logging**: Python's standard `logging` module instead of print statements
5. **Configuration**: Centralized, type-safe configuration
6. **Documentation**: Detailed docstrings for all functions
7. **Testability**: Easy to unit test components
8. **Reusability**: Components can be used independently

```python
# Import necessary libraries
import sys
sys.path.append('../')  # make the local anomaly_detector package importable

from anomaly_detector.pipeline import AnomalyDetectionPipeline
from anomaly_detector.config import Config
import pandas as pd
import matplotlib.pyplot as plt
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)

print("✅ Imports successful")
```
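`basicConfig(level=logging.INFO)` sets the level of the root logger, which every library shares. If the package's modules create their loggers with `logging.getLogger(__name__)` (an assumption; the module internals are not shown in this notebook), their verbosity can be tuned by package name without touching other libraries. A minimal sketch:

```python
import logging

# Root handler at INFO, as in the cell above (no-op if logging is already configured).
logging.basicConfig(level=logging.INFO)

# Assumption: modules inside anomaly_detector call logging.getLogger(__name__),
# so their loggers are children of the "anomaly_detector" logger.
package_logger = logging.getLogger("anomaly_detector")
package_logger.setLevel(logging.DEBUG)  # more detail from this package only

# Quieten a chatty third-party library (chosen here only as an example).
logging.getLogger("matplotlib").setLevel(logging.WARNING)

# A child logger with no level of its own inherits DEBUG from the package logger.
child = logging.getLogger("anomaly_detector.pipeline")
child.debug("visible: effective level is DEBUG")
```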

## 1. Configuration Setup

Configuration is grouped into sections (`data`, `model`, `detection`), each with typed fields and default values. Start from `Config()` and override only the fields you need.
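As an illustration of the pattern, here is a hypothetical sketch built on standard-library dataclasses. The class names `DataSettings`, `ModelSettings`, `DetectionSettings`, and `Settings` are invented stand-ins (deliberately not named `Config`, so they do not shadow the real import); the field names mirror this notebook, but every default value is a placeholder, not the package's actual default.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the package's config classes. Field names mirror
# this notebook; the defaults are placeholders, not the package's values.
@dataclass
class DataSettings:
    customer_id: int = 0


@dataclass
class ModelSettings:
    window_size: int = 7
    lstm_units: int = 64
    epochs: int = 50

    def validate(self) -> None:
        if self.window_size < 2:
            raise ValueError(f"window_size must be >= 2, got {self.window_size}")
        if self.lstm_units <= 0 or self.epochs <= 0:
            raise ValueError("lstm_units and epochs must be positive")


@dataclass
class DetectionSettings:
    decrease_threshold: float = -0.3  # fractional change: -0.3 is a 30% drop

    def validate(self) -> None:
        if not -1.0 <= self.decrease_threshold < 0.0:
            raise ValueError("decrease_threshold must be in [-1.0, 0.0)")


@dataclass
class Settings:
    # default_factory gives each Settings instance its own section objects.
    data: DataSettings = field(default_factory=DataSettings)
    model: ModelSettings = field(default_factory=ModelSettings)
    detection: DetectionSettings = field(default_factory=DetectionSettings)

    def validate(self) -> None:
        # Call after in-place edits like settings.model.epochs = 50;
        # attribute assignment does not re-run any construction-time checks.
        self.model.validate()
        self.detection.validate()


settings = Settings()
settings.model.window_size = 1
try:
    settings.validate()
except ValueError as exc:
    print(exc)  # window_size must be >= 2, got 1
```

Type hints are not enforced at runtime, and the notebook edits fields after construction, so a separate `validate()` step (rather than only `__post_init__`) is what catches bad overrides in this design.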

```python
# Create configuration with default values
config = Config()

# Override selected fields as needed
config.data.customer_id = 900
config.model.window_size = 7
config.model.epochs = 50
config.detection.decrease_threshold = -0.3  # -0.3 means a 30% drop

print("Configuration:")
print(f"  Customer ID: {config.data.customer_id}")
print(f"  Window Size: {config.model.window_size}")
print(f"  LSTM Units: {config.model.lstm_units}")
print(f"  Decrease Threshold: {config.detection.decrease_threshold:.0%}")
```
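With `window_size = 7`, each input sample for an LSTM autoencoder is typically a run of seven consecutive days. A stdlib sketch of that windowing follows; a stride of 1 and the absence of scaling are assumptions, since the package's preprocessing code is not shown, and the cost values are made up for illustration.

```python
from typing import List, Sequence


def make_windows(values: Sequence[float], window_size: int) -> List[List[float]]:
    """Return overlapping windows of length window_size with stride 1."""
    if window_size < 1:
        raise ValueError("window_size must be positive")
    if len(values) < window_size:
        return []  # not enough history for a single window
    return [list(values[i:i + window_size])
            for i in range(len(values) - window_size + 1)]


# Illustrative daily costs (made-up values).
daily_cost = [10.0, 11.0, 10.5, 12.0, 11.5, 10.0, 9.5, 10.2, 3.0, 10.8]
windows = make_windows(daily_cost, window_size=7)
print(len(windows))  # 4
print(windows[0])
```

A series of N days yields N - window_size + 1 windows; for a univariate LSTM each window would then be shaped as (window_size, 1).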

## 2. Initialize Pipeline

The pipeline takes the configuration and coordinates every stage: data preparation, model training, anomaly detection, and optional LLM-generated explanations.

```python
# Initialize the pipeline
pipeline = AnomalyDetectionPipeline(config)

print("✅ Pipeline initialized")
```

## 3. Run Complete Pipeline

### Option A: Using Mock Data (for demonstration)

```python
# Run the pipeline end to end on mock data
anomaly_df, explained_anomalies = pipeline.run(
    use_mock_data=True,
    use_llm_api=False,  # Set to True if you have a Gemini API key
    verbose=1,
    save_plot='anomaly_results.png'
)

print("\n✅ Pipeline execution completed")
```

### Option B: Using Real Data

```python
# Uncomment to use real data
# df = pd.read_csv('../../total_clean.csv', parse_dates=['Date'])
# anomaly_df, explained_anomalies = pipeline.run(
#     df=df,
#     use_mock_data=False,
#     use_llm_api=False,
#     verbose=1
# )
```
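Before passing a real CSV to the pipeline, it can help to fail fast on missing columns or unparseable rows. A stdlib sketch, assuming the file has a `Date` column in ISO `YYYY-MM-DD` form and a `TotalCost` column; `Date` appears in the cell above and `TotalCost` in the results below, but the full schema the pipeline requires (for example, a customer ID column) is not shown here. `load_daily_costs` is a hypothetical helper, not part of the package.

```python
import csv
import io
from datetime import date
from typing import Iterable, List, Tuple

REQUIRED_COLUMNS = ("Date", "TotalCost")  # assumed schema


def load_daily_costs(lines: Iterable[str]) -> List[Tuple[date, float]]:
    """Parse CSV rows into (date, cost) pairs, failing fast on bad input."""
    reader = csv.DictReader(lines)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            rows.append((date.fromisoformat(row["Date"]), float(row["TotalCost"])))
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
    return rows


sample = io.StringIO("Date,TotalCost\n2024-01-01,12.5\n2024-01-02,11.0\n")
print(load_daily_costs(sample))
```

For a real file, open it with `newline=''` as the `csv` module recommends, e.g. `with open('../../total_clean.csv', newline='') as f: rows = load_daily_costs(f)`.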

## 4. Analyze Results

View the detected anomalies and their details.

```python
from IPython.display import display  # built into Jupyter; imported explicitly for clarity

print("\n" + "=" * 70)
print("ANOMALY DETECTION RESULTS")
print("=" * 70)

# Summary statistics: score 1.0 = strong, 0.5 = weak, 0 = normal
anomalies_only = anomaly_df[anomaly_df['anomaly_score'] > 0]
total_anomalies = len(anomalies_only)
strong_anomalies = len(explained_anomalies)

print(f"\nTotal anomalies detected: {total_anomalies}")
print(f"Strong anomalies (score=1.0): {strong_anomalies}")
print(f"Weak anomalies (score=0.5): {total_anomalies - strong_anomalies}")

# Display all anomaly rows
print("\nAll Detected Anomalies:")
if not anomalies_only.empty:
    display(anomalies_only[['TotalCost', 'anomaly_score', 'pct_change', 'lstm_flag', 'decrease_flag']])
else:
    print("No anomalies detected.")
```
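The table above has two flag columns (`lstm_flag`, `decrease_flag`) and two nonzero score levels. One rule consistent with that, assumed here rather than taken from the package, is to average the two boolean flags: one flag gives 0.5, both give 1.0. The decrease flag is sketched as a day-over-day fractional change at or below `decrease_threshold`; whether the package uses `<` or `<=`, and whether it stores `pct_change` as a fraction or a percentage, is not shown. Both functions are hypothetical and the values are made up.

```python
from typing import List, Optional, Sequence


def pct_changes(costs: Sequence[float]) -> List[Optional[float]]:
    """Day-over-day fractional change; None for the first day or a zero base."""
    changes: List[Optional[float]] = [None]
    for prev, curr in zip(costs, costs[1:]):
        changes.append(None if prev == 0 else (curr - prev) / prev)
    return changes


def anomaly_scores(lstm_flags: Sequence[bool], costs: Sequence[float],
                   decrease_threshold: float = -0.3) -> List[float]:
    """Hypothetical rule: mean of the LSTM flag and the sharp-decrease flag."""
    if len(lstm_flags) != len(costs):
        raise ValueError("lstm_flags and costs must have the same length")
    scores = []
    for lstm_flag, change in zip(lstm_flags, pct_changes(costs)):
        decrease_flag = change is not None and change <= decrease_threshold
        scores.append((int(lstm_flag) + int(decrease_flag)) / 2)
    return scores


costs = [100.0, 102.0, 60.0, 101.0, 100.0]
lstm = [False, False, True, True, False]
print(anomaly_scores(lstm, costs))  # [0.0, 0.0, 1.0, 0.5, 0.0]
```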

```python
# Display strong anomalies with explanations
if not explained_anomalies.empty:
    print("\n" + "=" * 70)
    print("STRONG ANOMALIES WITH EXPLANATIONS")
    print("=" * 70)

    for date, row in explained_anomalies.iterrows():
        print(f"\n📅 Date: {date.strftime('%Y-%m-%d')}")
        print(f"   Score: {row['anomaly_score']:.1f}")
        print(f"   Cost: ${row['TotalCost']:.2f}")
        print(f"   Change: {row['pct_change']:.1f}%")
        print(f"   MSE: {row['mse']:.6f}")
        print(f"   💡 Explanation: {row.get('explanation', 'N/A')}")
else:
    print("\nNo strong anomalies detected.")
```
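The `mse` column is the autoencoder's reconstruction error. A common way to turn such errors into a boolean flag is to fit a threshold of mean plus k standard deviations on errors from normal training windows; the package's actual rule and its value of k are not shown in this notebook, so both are assumptions here. The numbers below are illustrative, and the "reconstructions" are a stand-in for model output.

```python
import statistics
from typing import List, Sequence


def reconstruction_mse(window: Sequence[float], reconstruction: Sequence[float]) -> float:
    """Mean squared error between an input window and its reconstruction."""
    if not window or len(window) != len(reconstruction):
        raise ValueError("window and reconstruction must be non-empty and equal length")
    return sum((a - b) ** 2 for a, b in zip(window, reconstruction)) / len(window)


def fit_threshold(train_errors: Sequence[float], k: float = 3.0) -> float:
    """Assumed rule: mean + k * population std of errors on normal training windows."""
    if len(train_errors) < 2:
        raise ValueError("need at least two training errors")
    return statistics.mean(train_errors) + k * statistics.pstdev(train_errors)


def flag_errors(errors: Sequence[float], threshold: float) -> List[bool]:
    """True where the reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


# Errors observed on normal training windows (made-up illustrative values).
train_errors = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011]
threshold = fit_threshold(train_errors, k=3.0)

# New windows and stand-in "model reconstructions" (a flat line at 1.0).
new_windows = [[1.0, 1.1, 0.9], [1.0, 0.2, 1.0]]
reconstructions = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
errors = [reconstruction_mse(w, r) for w, r in zip(new_windows, reconstructions)]
print(flag_errors(errors, threshold))  # [False, True]
```

Fitting the threshold on training errors, rather than on the same errors being scored, keeps a single large spike from inflating the standard deviation and masking itself.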

## 5. Visualization

View the generated plot showing anomalies in the time series.

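```python
# The pipeline saved its figure to 'anomaly_results.png' (save_plot above).
# Jupyter's inline backend typically closes figures when the cell that created
# them finishes, so plt.show() in a new cell shows nothing; display the file.
from IPython.display import Image, display

display(Image(filename='anomaly_results.png'))
```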

## 6. Export Results

Save results to CSV files for further analysis.

```python
# Save all rows, including scores and flags, to CSV
anomaly_df.to_csv('anomaly_results.csv')
print("✅ All results saved to anomaly_results.csv")

if not explained_anomalies.empty:
    explained_anomalies.to_csv('strong_anomalies.csv')
    print("✅ Strong anomalies saved to strong_anomalies.csv")
```

## 7. Advanced Usage: Individual Components

The refactored code allows you to use components independently.

```python
# Example: use only the data preprocessor
from anomaly_detector.data_preprocessing import DataPreprocessor, generate_mock_data
from anomaly_detector.config import DataConfig

# Generate 60 days of mock data
mock_data = generate_mock_data(customer_id=100, n_days=60)

# Preprocess into a time series
preprocessor = DataPreprocessor(DataConfig())
ts = preprocessor.load_and_prepare_data(mock_data)

print(f"✅ Loaded {len(ts)} data points")
print(ts.head())
```
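`generate_mock_data` is called above but its body is not shown. For reference, here is a stdlib sketch of a comparable generator: a seeded daily series with a weekly cycle, Gaussian noise, and one injected sharp drop, so detection can be checked against a known answer. The function name, parameters, and signal shape are hypothetical, not the package's implementation.

```python
import math
import random
from datetime import date, timedelta
from typing import List, Tuple


def make_mock_costs(n_days: int = 60, base: float = 100.0, drop_day: int = 40,
                    drop_factor: float = 0.4, seed: int = 0) -> List[Tuple[date, float]]:
    """Hypothetical mock series: weekly cycle + noise, with one injected drop."""
    rng = random.Random(seed)  # local RNG so results are reproducible
    start = date(2024, 1, 1)
    series = []
    for day in range(n_days):
        weekly = 10.0 * math.sin(2 * math.pi * day / 7)  # 7-day cycle
        cost = base + weekly + rng.gauss(0.0, 2.0)
        if day == drop_day:
            cost *= drop_factor  # default 0.4, i.e. a 60% decrease
        series.append((start + timedelta(days=day), round(cost, 2)))
    return series


mock = make_mock_costs()
print(len(mock), mock[0])
```

Seeding a local `random.Random` instance, rather than the global generator, keeps repeated runs identical without affecting other code that uses `random`.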

## Summary of Improvements

### Code Quality
- ✅ Separated concerns into modules
- ✅ Added type hints everywhere
- ✅ Comprehensive error handling
- ✅ Logging via Python's `logging` module
- ✅ Detailed documentation

### Architecture
- ✅ Configuration management
- ✅ Dependency injection
- ✅ Single Responsibility Principle
- ✅ Open/Closed Principle
- ✅ DRY (Don't Repeat Yourself)
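Dependency injection means a component receives its collaborators instead of constructing them, which is what allows runs with `use_llm_api=False` and tests without a Gemini key. A minimal sketch using `typing.Protocol`; `Anomaly`, `Explainer`, `TemplateExplainer`, and `MiniPipeline` are hypothetical names, not the package's classes.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Anomaly:
    day: str
    cost: float
    pct_change: float  # fractional, e.g. -0.4 is a 40% drop


class Explainer(Protocol):
    def explain(self, anomaly: Anomaly) -> str: ...


class TemplateExplainer:
    """Offline stand-in for an LLM-backed explainer: deterministic, no network."""

    def explain(self, anomaly: Anomaly) -> str:
        direction = "drop" if anomaly.pct_change < 0 else "rise"
        return f"{anomaly.day}: {abs(anomaly.pct_change):.0%} {direction} to {anomaly.cost:.2f}"


class MiniPipeline:
    def __init__(self, explainer: Explainer) -> None:
        # The explainer is passed in rather than built here, so callers choose
        # an LLM-backed one in production and a fake one in tests.
        self.explainer = explainer

    def explain_all(self, anomalies: List[Anomaly]) -> List[str]:
        return [self.explainer.explain(a) for a in anomalies]


pipeline_sketch = MiniPipeline(TemplateExplainer())
print(pipeline_sketch.explain_all([Anomaly("2024-02-10", 36.1, -0.62)]))
# ['2024-02-10: 62% drop to 36.10']
```

Because `Explainer` is a structural protocol, any object with a matching `explain` method works; adding a new explainer requires no change to `MiniPipeline`, which is the Open/Closed idea in small form.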

### Maintainability
- ✅ Easy to test
- ✅ Easy to extend
- ✅ Easy to debug
- ✅ Reusable components
- ✅ Clear interfaces
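In practice, "easy to test" means small pure functions can be checked in isolation. A sketch using the standard `unittest` module against a hypothetical `is_sharp_decrease` helper (not a function from the package); `exit=False` lets it run inside a notebook cell without stopping the kernel.

```python
import unittest


def is_sharp_decrease(prev: float, curr: float, threshold: float = -0.3) -> bool:
    """Hypothetical helper: True if the fractional change is at or below threshold."""
    if prev <= 0:
        return False  # no meaningful percentage change from a zero or negative base
    return (curr - prev) / prev <= threshold


class SharpDecreaseTest(unittest.TestCase):
    def test_flags_large_drop(self) -> None:
        self.assertTrue(is_sharp_decrease(100.0, 60.0))   # -40%

    def test_ignores_small_drop(self) -> None:
        self.assertFalse(is_sharp_decrease(100.0, 90.0))  # -10%

    def test_boundary_is_inclusive(self) -> None:
        self.assertTrue(is_sharp_decrease(100.0, 70.0))   # exactly -30%

    def test_zero_base(self) -> None:
        self.assertFalse(is_sharp_decrease(0.0, 5.0))


if __name__ == "__main__":
    # argv avoids parsing Jupyter's own arguments; exit=False keeps the kernel alive.
    unittest.main(argv=["ignored"], exit=False)
```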

### Security
- ✅ API keys from environment
- ✅ Input validation
- ✅ No hardcoded secrets
- ✅ Safe error messages
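Reading an API key from the environment might look like the sketch below. The variable name `GEMINI_API_KEY` is an assumption (the package's actual variable name is not shown in this notebook). The pattern is to fall back to running without LLM explanations when the key is absent, and to name the variable in messages without ever echoing its value.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)

API_KEY_ENV_VAR = "GEMINI_API_KEY"  # assumed name; check the package docs


def get_api_key(env_var: str = API_KEY_ENV_VAR) -> Optional[str]:
    """Return the key from the environment, or None if unset or blank."""
    key = os.environ.get(env_var, "").strip()
    return key or None


def resolve_use_llm(requested: bool, env_var: str = API_KEY_ENV_VAR) -> bool:
    """Enable LLM explanations only if requested and a key is actually present."""
    if requested and get_api_key(env_var) is None:
        # Safe message: names the variable, never prints a secret value.
        logger.warning("%s is not set; continuing without LLM explanations", env_var)
        return False
    return requested


use_llm = resolve_use_llm(requested=True)
print("LLM explanations enabled:", use_llm)
```

The result can then be passed straight through, e.g. `pipeline.run(use_mock_data=True, use_llm_api=use_llm)`.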