# seisCAE Quick Start Guide

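This notebook demonstrates how to use seisCAE for seismic event clustering: run the full pipeline, explore the resulting catalog, view the generated figures, and export the results.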

## Installation

```bash
pip install seisCAE
```

Or, for development, install from source in editable mode:

```bash
git clone https://github.com/iSuthar/seisCAE.git
cd seisCAE
pip install -e .
```

## 1. Full Pipeline Approach (Easiest)

In [None]:
import logging

from seiscae import Pipeline, load_config
from seiscae.utils import setup_logging

# Log at INFO level (logging.INFO == 20)
setup_logging(level=logging.INFO)

# Load default configuration
config = load_config('../../configs/default.yaml')

# Modify config if needed
config.set('training.epochs', 100)  # Reduce for quick test
config.set('hardware.gpu', 0)  # Use GPU 0

# Create pipeline
pipeline = Pipeline(config)
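
The dotted keys passed to `config.set` above (`'training.epochs'`, `'hardware.gpu'`) presumably address nested sections of the YAML file. As a rough illustration of that idea only, here is a minimal sketch on a plain dictionary. It is not seisCAE's implementation: `set_by_dotted_key` is a hypothetical helper and the dictionary contents are made up.

```python
from typing import Any


def set_by_dotted_key(cfg: dict, dotted_key: str, value: Any) -> None:
    """Set cfg['a']['b'] = value for dotted_key 'a.b', creating sections as needed."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for name in parents:
        # Walk down one level, creating an empty section if it is missing.
        node = node.setdefault(name, {})
    node[leaf] = value


# Hypothetical configuration contents, for illustration only.
cfg = {"training": {"epochs": 300}, "hardware": {}}
set_by_dotted_key(cfg, "training.epochs", 100)
set_by_dotted_key(cfg, "hardware.gpu", 0)
print(cfg)
```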

In [None]:
# Run complete pipeline
results = pipeline.run(
    data_path='../../data/seismic',  # Your data directory
    output_dir='../../results/quickstart',
)

In [None]:
# View results
print(f"Total events detected: {results.n_events}")
print(f"Number of clusters: {results.n_clusters}")
print(f"Output directory: {results.output_dir}")

## 2. Explore Results

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load results
catalog_df = results.catalog.to_dataframe()
catalog_df['cluster'] = results.labels

# View catalog
catalog_df.head()

In [None]:
# Cluster distribution
cluster_counts = catalog_df['cluster'].value_counts().sort_index()

plt.figure(figsize=(10, 6))
cluster_counts.plot(kind='bar', color='steelblue', edgecolor='black')
plt.xlabel('Cluster ID')
plt.ylabel('Number of Events')
plt.title('Event Distribution Across Clusters')
plt.grid(True, alpha=0.3)
plt.show()
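
Raw counts are hard to compare between runs with different catalog sizes. A minimal sketch of reporting each cluster's share of the catalog alongside its count, using made-up labels in place of `results.labels`:

```python
import pandas as pd

# Made-up cluster labels standing in for results.labels.
labels = pd.Series([0, 0, 1, 2, 2, 2, 1, 0, 2, 2], name="cluster")

summary = pd.DataFrame({
    # Absolute number of events per cluster, ordered by cluster ID.
    "n_events": labels.value_counts().sort_index(),
    # Same counts expressed as a fraction of all events.
    "fraction": labels.value_counts(normalize=True).sort_index(),
})
print(summary)
```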

In [None]:
# Get events from specific cluster
cluster_0 = results.get_cluster(0)
print(f"Cluster 0 has {len(cluster_0)} events")

# Analyze cluster characteristics
cluster_0_df = cluster_0.to_dataframe()
print("\nCluster 0 Statistics:")
print(f"  Average duration: {cluster_0_df['duration'].mean():.2f} seconds")
print(f"  Average energy: {cluster_0_df['energy'].mean():.4f}")

## 3. View Visualizations

In [None]:
from pathlib import Path

from IPython.display import Image, display

# Directory holding the generated figures (Path() also accepts a plain string)
vis_dir = Path(results.output_dir) / 'visualizations'

# Display training history
display(Image(filename=str(vis_dir / 'training_history.png')))

In [None]:
# Display UMAP projection
display(Image(filename=str(vis_dir / 'latent_umap.png')))

In [None]:
# Display cluster examples
display(Image(filename=str(vis_dir / 'cluster_0_examples.png')))
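
Loading an image fails if the file does not exist, for example when a run did not produce a particular figure. A small sketch of listing the PNG files that are actually present before displaying them. `list_figures` is a hypothetical helper, not part of seisCAE. The `visualizations` subdirectory name and the output path are taken from the cells above.

```python
from pathlib import Path


def list_figures(output_dir, subdir="visualizations"):
    """Return the PNG files under output_dir/subdir, sorted by name."""
    fig_dir = Path(output_dir) / subdir
    if not fig_dir.is_dir():
        # No figures directory: nothing to show.
        return []
    return sorted(fig_dir.glob("*.png"))


# Print the names of the figures available for the quick-start run.
for path in list_figures("../../results/quickstart"):
    print(path.name)
```

Each returned path can then be passed to `Image(filename=str(path))` as in the cells above.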

## 4. Export Results

In [None]:
# Export catalog with clusters to Excel (pandas needs the openpyxl package to write .xlsx)
catalog_df.to_excel(results.output_dir / 'catalog_with_clusters.xlsx', index=False)
print("Exported catalog to Excel")

# Export cluster summaries
cluster_summary = catalog_df.groupby('cluster').agg({
    'duration': ['mean', 'std', 'count'],
    'energy': ['mean', 'std']
})

cluster_summary.to_excel(results.output_dir / 'cluster_summary.xlsx')
print("Cluster summary exported")
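
Aggregating with several functions per column yields two-level column headers, which can be awkward to work with in spreadsheets and CSV files. A sketch of flattening those headers into single strings before export, using made-up data with the same `duration` and `energy` column names:

```python
import pandas as pd

# Made-up catalog standing in for catalog_df.
demo_df = pd.DataFrame({
    "cluster": [0, 0, 1, 1],
    "duration": [2.0, 4.0, 10.0, 14.0],
    "energy": [0.1, 0.3, 1.0, 2.0],
})

demo_summary = demo_df.groupby("cluster").agg({
    "duration": ["mean", "std", "count"],
    "energy": ["mean", "std"],
})

# Join the two header levels: ('duration', 'mean') -> 'duration_mean'.
demo_summary.columns = ["_".join(col) for col in demo_summary.columns]
print(demo_summary.columns.tolist())
```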

## Next Steps

- See `02_custom_models.ipynb` for using custom model architectures
- See `03_advanced_clustering.ipynb` for advanced clustering techniques
- Check the documentation for more features