[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/renewable_energy/02_Smart_Grid_Management.ipynb)

# Smart Grid Management - Stream Processing & Real-Time Monitoring

## Overview

This notebook demonstrates **smart grid management** using Semantica, with a focus on **stream processing**, **real-time monitoring**, and **failure prediction**. The pipeline ingests grid sensor data, monitors grid health in real time, and predicts failures using temporal pattern detection and anomaly detection.

### Key Features

- **Stream Processing**: Ingests grid sensor data as real-time streams
- **Real-Time Monitoring**: Tracks grid health at minute-level granularity
- **Failure Prediction**: Uses temporal pattern detection and reasoning to predict grid failures
- **Anomaly Detection**: Detects anomalies in grid sensor data using graph analytics
- **Temporal Pattern Detection**: Identifies patterns in sensor data streams over time
- **Alert Generation**: Generates alerts based on sensor anomalies and failure patterns

### Learning Objectives

- Understand how to process real-time sensor streams for grid monitoring
- Learn to build temporal knowledge graphs with minute-level granularity
- Master failure prediction using reasoning and pattern detection
- Explore anomaly detection in grid sensor networks
- Practice real-time temporal queries at specific time points
- Analyze grid health and generate predictive alerts

### Pipeline Flow

```mermaid
graph TD
    A[Stream Data Ingestion] --> B[Document Parsing]
    B --> C[Text Processing]
    C --> D[Entity Extraction]
    D --> E[Relationship Extraction]
    E --> F[Deduplication]
    F --> G[Temporal KG Construction]
    G --> H[Embedding Generation]
    H --> I[Vector Store]
    G --> J[Temporal Queries]
    G --> K[Failure Pattern Detection]
    G --> L[Anomaly Detection]
    I --> M[GraphRAG Queries]
    J --> N[Visualization]
    K --> N
    L --> N
    G --> O[Export]
```

---


In [None]:
%pip install -qU semantica networkx matplotlib plotly pandas faiss-cpu beautifulsoup4 groq sentence-transformers scikit-learn


---

## Configuration & Setup

Configure API keys and set up constants for the smart grid management pipeline, including temporal granularity set to minute for real-time monitoring.


In [None]:
import os

os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "your-key-here")

# Configuration constants
EMBEDDING_DIMENSION = 384
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
CHUNK_SIZE = 100
CHUNK_OVERLAP = 10
TEMPORAL_GRANULARITY = "minute"  # Fine-grained for real-time sensor monitoring


---

## Stream Data Ingestion

Ingest grid sensor data from real-time streams including Kafka, MQTT, and file-based sources for stream processing.
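
Before a Kafka or MQTT broker is available, the stream can be simulated locally. The following is a minimal sketch, not part of Semantica: `sensor_stream` and `micro_batches` are hypothetical helper names, and the sample lines are copied from the fallback data used in this notebook. It shows one common way to turn a continuous stream into small batches that later stages can process.

```python
from itertools import islice
from typing import Iterable, Iterator, List

SAMPLE_LINES = [
    "2024-01-01 10:00:00 - Sensor S001: Voltage 230V, Current 10A, Status: Normal",
    "2024-01-01 10:01:00 - Sensor S002: Voltage 225V, Current 9.5A, Status: Normal",
    "2024-01-01 10:02:00 - Sensor S001: Voltage 210V, Current 12A, Status: Warning (voltage drop)",
]


def sensor_stream(lines: Iterable[str]) -> Iterator[str]:
    """Yield non-empty messages one at a time, as a broker consumer would."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def micro_batches(stream: Iterator[str], size: int) -> Iterator[List[str]]:
    """Group a stream into lists of at most `size` messages."""
    while True:
        batch = list(islice(stream, size))
        if not batch:
            return
        yield batch


batches = list(micro_batches(sensor_stream(SAMPLE_LINES), size=2))
print([len(batch) for batch in batches])
```

The last batch may be smaller than `size`; a real consumer would typically also flush on a timeout so that slow periods do not delay alerts.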


In [None]:
from semantica.ingest import StreamIngestor, FileIngestor
import os

os.makedirs("data", exist_ok=True)

documents = []

# Example: Stream ingestion from Kafka (commented - requires Kafka setup)
# stream_ingestor = StreamIngestor()
# stream_docs = stream_ingestor.ingest("kafka://localhost:9092/grid-sensors", method="kafka")

# Example: Stream ingestion from MQTT (commented - requires MQTT broker)
# stream_ingestor = StreamIngestor()
# mqtt_docs = stream_ingestor.ingest("mqtt://broker.example.com/sensors", method="mqtt")

# Fallback: Sample sensor stream data
sensor_data = """
2024-01-01 10:00:00 - Sensor S001: Voltage 230V, Current 10A, Status: Normal
2024-01-01 10:01:00 - Sensor S002: Voltage 225V, Current 9.5A, Status: Normal
2024-01-01 10:02:00 - Sensor S001: Voltage 210V, Current 12A, Status: Warning (voltage drop)
2024-01-01 10:03:00 - Sensor S003: Voltage 200V, Current 15A, Status: Alert (potential failure)
2024-01-01 10:04:00 - Sensor S001: Voltage 205V, Current 11A, Status: Warning
2024-01-01 10:05:00 - Sensor S002: Voltage 220V, Current 9A, Status: Normal
"""

with open("data/grid_sensors.txt", "w", encoding="utf-8") as f:
    f.write(sensor_data)

file_ingestor = FileIngestor()
documents = file_ingestor.ingest("data/grid_sensors.txt")

print(f"Ingested {len(documents)} documents from sensor stream")


---

## Document Parsing

Parse structured sensor data from various formats including JSON, CSV, and time-series data.
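
For the plain-text sample used in this notebook, each line follows a fixed pattern, so it can also be turned into a structured record with a regular expression. This is an illustrative sketch that is independent of `DocumentParser`; the field names in the returned dictionary and the `parse_reading` helper are assumptions made for the example.

```python
import re
from datetime import datetime
from typing import Optional

# Pattern for lines such as:
# "2024-01-01 10:02:00 - Sensor S001: Voltage 210V, Current 12A, Status: Warning (voltage drop)"
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - Sensor (?P<sensor>\w+): "
    r"Voltage (?P<voltage>[\d.]+)V, Current (?P<current>[\d.]+)A, "
    r"Status: (?P<status>\w+)(?: \((?P<note>[^)]*)\))?"
)


def parse_reading(line: str) -> Optional[dict]:
    """Return a structured reading, or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
        "sensor": match["sensor"],
        "voltage": float(match["voltage"]),
        "current": float(match["current"]),
        "status": match["status"],
        "note": match["note"],  # None when the line has no parenthesized note
    }


reading = parse_reading(
    "2024-01-01 10:02:00 - Sensor S001: Voltage 210V, Current 12A, Status: Warning (voltage drop)"
)
print(reading)
```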


In [None]:
from semantica.parse import DocumentParser
from contextlib import redirect_stderr
from io import StringIO

parser = DocumentParser()

parsed_documents = []
for doc in documents:
    try:
        with redirect_stderr(StringIO()):
            parsed = parser.parse(
                doc.content if hasattr(doc, 'content') else str(doc),
                format="auto"
            )
            parsed_documents.append(parsed)
    except Exception:
        parsed_documents.append(doc.content if hasattr(doc, 'content') else str(doc))

print(f"Parsed {len(parsed_documents)} documents")


---

## Text Processing

Normalize sensor data and split documents into fixed-size chunks using token chunking. Fixed-size chunks keep the work per chunk predictable, which suits real-time stream processing.
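
To make the effect of `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a small sketch of fixed-size chunking with overlap. It assumes whitespace-separated tokens for simplicity; the tokenizer that `TextSplitter` actually uses may count tokens differently, and `chunk_tokens` is a hypothetical helper.

```python
from typing import List


def chunk_tokens(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into chunks of `chunk_size` tokens that share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the window already reached the end of the text
    return chunks


chunks = chunk_tokens("a b c d e f g h i j", chunk_size=4, chunk_overlap=1)
print(chunks)
```

Each chunk repeats the last token of the previous one, so a reading that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.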


In [None]:
from semantica.normalize import TextNormalizer
from semantica.split import TextSplitter
from contextlib import redirect_stderr
from io import StringIO

normalizer = TextNormalizer()
normalized_docs = []

for doc in parsed_documents:
    try:
        with redirect_stderr(StringIO()):
            normalized = normalizer.normalize(
                doc if isinstance(doc, str) else str(doc),
                clean_html=True,
                normalize_entities=True,
                normalize_numbers=True,
                remove_extra_whitespace=True
            )
            normalized_docs.append(normalized)
    except Exception:
        normalized_docs.append(doc if isinstance(doc, str) else str(doc))

# Use token chunking for fixed-size sensor data chunks (optimized for real-time processing)
token_splitter = TextSplitter(
    method="token",
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)

chunked_docs = []
for doc_text in normalized_docs:
    try:
        with redirect_stderr(StringIO()):
            chunks = token_splitter.split(doc_text)
            chunked_docs.extend([chunk.content if hasattr(chunk, 'content') else str(chunk) for chunk in chunks])
    except Exception:
        chunked_docs.append(doc_text)

print(f"Processed {len(chunked_docs)} token-based chunks")


---

## Entity Extraction

Extract smart grid entities including sensors, grids, failures, alerts, predictions, and anomalies from sensor data.
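
Later cells read entities with `e.get("type")` and `e.get("name")`, so it helps to see what such records look like. The sketch below produces them with regular expressions instead of an LLM. It is a swapped-in, rule-based stand-in for illustration, not how `NERExtractor` works; the `STATUS_TO_TYPE` mapping and the `Alert@S003` naming scheme are assumptions.

```python
import re
from typing import Dict, List

# Assumed mapping from a status keyword to an entity type
STATUS_TO_TYPE = {"Warning": "Anomaly", "Alert": "Alert"}


def extract_entities(line: str) -> List[Dict[str, str]]:
    """Return name/type records for the sensor and any non-normal status on a line."""
    entities = []
    sensor = re.search(r"Sensor (\w+)", line)
    status = re.search(r"Status: (\w+)", line)
    if sensor:
        entities.append({"name": sensor.group(1), "type": "Sensor"})
    if status and status.group(1) in STATUS_TO_TYPE:
        label = status.group(1)
        owner = sensor.group(1) if sensor else "unknown"
        entities.append({"name": f"{label}@{owner}", "type": STATUS_TO_TYPE[label]})
    return entities


alert_line = "2024-01-01 10:03:00 - Sensor S003: Voltage 200V, Current 15A, Status: Alert (potential failure)"
normal_line = "2024-01-01 10:00:00 - Sensor S001: Voltage 230V, Current 10A, Status: Normal"
print(extract_entities(alert_line))
print(extract_entities(normal_line))
```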


In [None]:
from semantica.semantic_extract import NERExtractor
from contextlib import redirect_stderr
from io import StringIO

extractor = NERExtractor(
    provider="groq",
    model="llama-3.1-8b-instant"
)

entity_types = [
    "Sensor", "Grid", "Failure", "Alert", "Prediction", "Anomaly"
]

all_entities = []
for chunk in chunked_docs[:10]:  # Limit for demo
    try:
        with redirect_stderr(StringIO()):
            entities = extractor.extract(
                chunk,
                entity_types=entity_types
            )
            all_entities.extend(entities)
    except Exception:
        pass

print(f"Extracted {len(all_entities)} entities")


---

## Relationship Extraction

Extract grid relationships including sensor detection, alert triggers, failure predictions, and anomaly indicators.
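
A relationship can be thought of as a typed edge between two entities. The following rule-based sketch derives such edges from a sensor line; it only illustrates the shape of the data and does not reflect the output format of `RelationExtractor`. The `source`/`type`/`target` keys and the status-to-relation mapping are assumptions.

```python
import re
from typing import Dict, List

# Assumed mapping from a status keyword to one of the relation types listed in this notebook
STATUS_TO_RELATION = {"Warning": "detects", "Alert": "triggers"}


def extract_relations(line: str) -> List[Dict[str, str]]:
    """Return one edge per non-normal status found on the line."""
    match = re.search(r"Sensor (\w+):.*Status: (\w+)", line)
    if not match or match.group(2) not in STATUS_TO_RELATION:
        return []
    sensor, status = match.groups()
    return [{
        "source": sensor,
        "type": STATUS_TO_RELATION[status],
        "target": f"{status}@{sensor}",
    }]


relations = extract_relations(
    "2024-01-01 10:03:00 - Sensor S003: Voltage 200V, Current 15A, Status: Alert (potential failure)"
)
print(relations)
```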


In [None]:
from semantica.semantic_extract import RelationExtractor
from contextlib import redirect_stderr
from io import StringIO

relation_extractor = RelationExtractor(
    provider="groq",
    model="llama-3.1-8b-instant"
)

relation_types = [
    "detects", "triggers", "predicts",
    "indicates", "located_in"
]

all_relationships = []
for chunk in chunked_docs[:10]:  # Limit for demo
    try:
        with redirect_stderr(StringIO()):
            relationships = relation_extractor.extract(
                chunk,
                relation_types=relation_types
            )
            all_relationships.extend(relationships)
    except Exception:
        pass

print(f"Extracted {len(all_relationships)} relationships")


---

## Deduplication

Deduplicate sensor and grid entities to ensure accurate real-time monitoring.
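
The role of the similarity threshold can be illustrated with the standard library. This sketch uses `difflib.SequenceMatcher` as the similarity measure, which is an assumption for the example; `DuplicateDetector` may score similarity differently, and `dedupe` is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def dedupe(entities: List[Dict[str, str]], threshold: float) -> List[Dict[str, str]]:
    """Keep the first entity of every group whose names are at least `threshold` similar."""
    kept: List[Dict[str, str]] = []
    for entity in entities:
        name = entity["name"].lower()
        is_duplicate = any(
            SequenceMatcher(None, name, other["name"].lower()).ratio() >= threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(entity)
    return kept


sensors = [
    {"name": "S001", "type": "Sensor"},
    {"name": "s001", "type": "Sensor"},  # same sensor, different casing
    {"name": "S002", "type": "Sensor"},
]
unique = dedupe(sensors, threshold=0.9)
print([entity["name"] for entity in unique])
```

Short identifiers that differ in a single character are already quite similar under this measure (`S001` and `S002` score 0.75), so a low threshold would merge distinct sensors. That is one reason to keep the sensor threshold high.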


In [None]:
from semantica.deduplication import DuplicateDetector

detector = DuplicateDetector()

# Deduplicate entities
sensors = [e for e in all_entities if e.get("type") == "Sensor"]
grids = [e for e in all_entities if e.get("type") == "Grid"]

sensor_duplicates = detector.detect_duplicates(sensors, threshold=0.9)
grid_duplicates = detector.detect_duplicates(grids, threshold=0.85)

deduplicated_sensors = detector.resolve_duplicates(sensors, sensor_duplicates)
deduplicated_grids = detector.resolve_duplicates(grids, grid_duplicates)

# Update entities list
all_entities = [e for e in all_entities if e.get("type") not in ["Sensor", "Grid"]]
all_entities.extend(deduplicated_sensors)
all_entities.extend(deduplicated_grids)

print(f"Deduplicated: {len(sensors)} -> {len(deduplicated_sensors)} sensors")
print(f"Deduplicated: {len(grids)} -> {len(deduplicated_grids)} grids")


---

## Temporal Knowledge Graph Construction

Build a temporal knowledge graph with minute-level granularity for real-time grid monitoring and failure prediction.
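
Minute-level granularity means that events are grouped by the minute in which they occurred. The sketch below shows that idea with plain `datetime` objects; it is not the `GraphBuilder` implementation, and the `truncate` helper and the sub-minute timestamps are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def truncate(ts: datetime, granularity: str) -> datetime:
    """Round a timestamp down to the start of its minute or hour."""
    if granularity == "minute":
        return ts.replace(second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


# Hypothetical events with second-level timestamps
events = [
    ("S001", datetime(2024, 1, 1, 10, 2, 5)),
    ("S003", datetime(2024, 1, 1, 10, 2, 40)),
    ("S001", datetime(2024, 1, 1, 10, 4, 0)),
]

buckets = defaultdict(list)
for sensor, ts in events:
    buckets[truncate(ts, "minute")].append(sensor)

for minute, sensors_in_minute in sorted(buckets.items()):
    print(minute.isoformat(), sensors_in_minute)
```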


In [None]:
from semantica.kg import GraphBuilder
from datetime import datetime
import re

builder = GraphBuilder(enable_temporal=True, temporal_granularity=TEMPORAL_GRANULARITY)

# Matches timestamps in the sample data, e.g. "2024-01-01 10:03:00"
TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

# Add temporal metadata to relationships with minute-level precision
temporal_relationships = []
for rel in all_relationships:
    temporal_rel = rel.copy()
    # Use the timestamp embedded in the relationship if there is one;
    # otherwise fall back to the ingestion time
    match = TIMESTAMP_PATTERN.search(str(rel))
    if match:
        temporal_rel["timestamp"] = datetime.strptime(
            match.group(), "%Y-%m-%d %H:%M:%S"
        ).isoformat()
    else:
        temporal_rel["timestamp"] = datetime.now().isoformat()
    temporal_relationships.append(temporal_rel)

kg = builder.build(
    entities=all_entities,
    relationships=temporal_relationships
)

print(f"Built temporal KG with {len(kg.get('entities', []))} entities and {len(kg.get('relationships', []))} relationships")


---

## Embedding Generation & Vector Store

Generate embeddings for sensor data and store them in a vector database for semantic search and anomaly detection.
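
Semantic search over a vector store comes down to ranking stored vectors by their similarity to a query vector. Here is a dependency-free sketch using cosine similarity on toy 3-dimensional vectors. The vectors and labels are invented for illustration; real embeddings from the configured model have 384 dimensions, and FAISS uses optimized index structures rather than this linear scan.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: Dict[str, List[float]], query: List[float], top_k: int) -> List[Tuple[str, float]]:
    """Return the `top_k` entries most similar to the query, best first."""
    scored = [(doc_id, cosine(vector, query)) for doc_id, vector in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy index: ids mapped to made-up embedding vectors
index = {
    "normal": [1.0, 0.0, 0.0],
    "warning": [0.6, 0.8, 0.0],
    "alert": [0.0, 1.0, 0.0],
}
hits = search(index, query=[0.0, 0.9, 0.1], top_k=2)
print(hits)
```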


In [None]:
from semantica.embeddings import EmbeddingGenerator
from semantica.vector_store import VectorStore
from contextlib import redirect_stderr
from io import StringIO

embedding_gen = EmbeddingGenerator(
    model_name=EMBEDDING_MODEL,
    dimension=EMBEDDING_DIMENSION
)

# Generate embeddings for chunks, keeping each chunk paired with its embedding
# so that a failed generation cannot shift the alignment
embedded_chunks = []
for chunk in chunked_docs[:20]:  # Limit for demo
    try:
        with redirect_stderr(StringIO()):
            embedded_chunks.append((chunk, embedding_gen.generate(chunk)))
    except Exception:
        pass

embeddings = [embedding for _, embedding in embedded_chunks]

# Create vector store
vector_store = VectorStore(backend="faiss", dimension=EMBEDDING_DIMENSION)

# Add embeddings to vector store
for i, (chunk, embedding) in enumerate(embedded_chunks):
    try:
        vector_store.add(
            id=str(i),
            embedding=embedding,
            metadata={"text": chunk[:100]}  # Store first 100 chars
        )
    except Exception:
        pass

print(f"Generated {len(embeddings)} embeddings and stored in vector database")


---

## Temporal Graph Queries

Query the temporal knowledge graph at specific time points for real-time monitoring. Passing the same timestamp as both ends of `time_range` turns a range query into a point-in-time lookup, which enables minute-level grid health checks.
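
The idea behind a time-range query can be shown on a plain list of timestamped edges. This is a simplified sketch, not the `TemporalGraphQuery` implementation; the edge records, their keys, and the `in_range` helper are hypothetical.

```python
from datetime import datetime
from typing import Dict, List

FMT = "%Y-%m-%d %H:%M:%S"


def in_range(edges: List[Dict[str, str]], start: str, end: str) -> List[Dict[str, str]]:
    """Return edges whose timestamp lies in the closed interval [start, end]."""
    lo, hi = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
    return [e for e in edges if lo <= datetime.strptime(e["timestamp"], FMT) <= hi]


# Hypothetical timestamped edges
edges = [
    {"source": "S001", "type": "detects", "target": "voltage_drop", "timestamp": "2024-01-01 10:02:00"},
    {"source": "S003", "type": "triggers", "target": "potential_failure", "timestamp": "2024-01-01 10:03:00"},
    {"source": "S001", "type": "detects", "target": "voltage_drop", "timestamp": "2024-01-01 10:04:00"},
]

# Point-in-time query: start and end are the same instant
at_point = in_range(edges, "2024-01-01 10:03:00", "2024-01-01 10:03:00")

# History of one sensor over a window
history = [
    e for e in in_range(edges, "2024-01-01 10:00:00", "2024-01-01 10:05:00")
    if e["source"] == "S001"
]
print(len(at_point), len(history))
```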


In [None]:
from semantica.kg import TemporalGraphQuery
from contextlib import redirect_stderr
from io import StringIO

temporal_query = TemporalGraphQuery(kg)

try:
    with redirect_stderr(StringIO()):
        # Query graph at specific time point (real-time monitoring)
        query_time = "2024-01-01 10:03:00"
        alerts_at_time = temporal_query.query_temporal_paths(
            source=None,
            time_range=(query_time, query_time)
        )
        print(f"Retrieved alerts at time point: {query_time}")
        
        # Query sensor history over time range
        if all_entities:
            sensor_entities = [e for e in all_entities if e.get("type") == "Sensor"]
            if sensor_entities:
                sensor_id = sensor_entities[0].get("name", "")
                if sensor_id:
                    history = temporal_query.query_temporal_paths(
                        source=sensor_id,
                        time_range=("2024-01-01 10:00:00", "2024-01-01 10:05:00")
                    )
                    print(f"Retrieved temporal history for sensor: {sensor_id}")
        
        # Query evolution of alerts over time
        evolution = temporal_query.query_evolution(
            entity_type="Alert",
            time_granularity=TEMPORAL_GRANULARITY
        )
        print("Analyzed alert evolution over time")
except Exception as exc:
    print(f"Temporal queries skipped: {exc}")


---

## Failure Pattern Detection

Use rule-based reasoning over the knowledge graph to detect failure patterns and predict grid failures. Catching these patterns early is what makes proactive grid management possible.
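
Voltage-threshold rules of the kind used in this section can be checked by hand against the sample readings. The sketch below does this in plain Python; it does not use the `Reasoner` API, and `sensors_below` is a hypothetical helper. The readings are the voltages from the sample sensor data in this notebook.

```python
from typing import List, Tuple

# (sensor, voltage in volts) pairs from the sample sensor data, in arrival order
READINGS: List[Tuple[str, float]] = [
    ("S001", 230.0),
    ("S002", 225.0),
    ("S001", 210.0),
    ("S003", 200.0),
    ("S001", 205.0),
    ("S002", 220.0),
]


def sensors_below(readings: List[Tuple[str, float]], limit: float) -> List[str]:
    """Return the sorted ids of sensors with at least one reading strictly below `limit`."""
    return sorted({sensor for sensor, voltage in readings if voltage < limit})


potential_failure = sensors_below(READINGS, 200.0)  # rule with the 200 V threshold
grid_failure_risk = sensors_below(READINGS, 210.0)  # rule with the 210 V threshold
print(potential_failure, grid_failure_risk)
```

Because the comparison is strict, the 200 V reading from S003 does not satisfy the 200 V rule, even though the source data marks it as an alert. If boundary values should raise alerts, the rule needs `<=` or a slightly higher threshold.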


In [None]:
from semantica.reasoning import Reasoner
from contextlib import redirect_stderr
from io import StringIO

reasoner = Reasoner(kg)

try:
    with redirect_stderr(StringIO()):
        # Add rules for failure prediction
        rules = [
            "IF Sensor detects Voltage < 200V THEN Alert triggers potential_failure",
            "IF Sensor detects Voltage < 210V THEN Failure predicts grid_failure",
            "IF Sensor detects Anomaly AND Anomaly indicates voltage_drop THEN Alert triggers warning",
            "IF Sensor located_in Grid AND Grid has Failure THEN Prediction predicts grid_outage"
        ]
        
        for rule in rules:
            reasoner.add_rule(rule)
        
        # Find failure patterns
        failure_patterns = reasoner.find_patterns(pattern_type="failure")
        print(f"Detected {len(failure_patterns)} failure patterns")
        
        # Infer failure predictions
        inferred_predictions = reasoner.infer_facts()
        print(f"Inferred {len(inferred_predictions)} failure predictions")
except Exception as exc:
    print(f"Failure pattern detection skipped: {exc}")


---

## Anomaly Detection

Detect anomalies in the grid structure using graph analytics: overall graph statistics, paths that connect sensors to alerts, and entities that were extracted with the `Anomaly` type.
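
One simple structural signal is unusual connectivity: a node with far more edges than its peers. The sketch below flags such nodes with a z-score on node degree. It is an illustrative technique chosen for this example, not what `GraphAnalyzer` computes; the edge list and the threshold of 2.0 are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import List, Tuple


def degree_outliers(edges: List[Tuple[str, str]], z_threshold: float) -> List[str]:
    """Return nodes whose degree is more than `z_threshold` standard deviations above the mean."""
    degree: Counter = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    values = list(degree.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every node has the same degree, so nothing stands out
    return sorted(node for node, d in degree.items() if (d - mu) / sigma > z_threshold)


# Hypothetical graph: S003 is linked to four alerts, the other sensors to none
graph_edges = [
    ("S001", "Grid"), ("S002", "Grid"), ("S003", "Grid"),
    ("S003", "A1"), ("S003", "A2"), ("S003", "A3"), ("S003", "A4"),
]
outliers = degree_outliers(graph_edges, z_threshold=2.0)
print(outliers)
```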


In [None]:
from semantica.kg import GraphAnalyzer
from contextlib import redirect_stderr
from io import StringIO

graph_analyzer = GraphAnalyzer(kg)

try:
    with redirect_stderr(StringIO()):
        # Analyze graph structure for anomalies
        stats = graph_analyzer.get_statistics()
        print(f"Graph statistics: {stats.get('num_nodes', 0)} nodes, {stats.get('num_edges', 0)} edges")
        
        # Find paths between sensors and alerts (anomaly detection)
        if all_entities:
            sensor_entities = [e for e in all_entities if e.get("type") == "Sensor"]
            alert_entities = [e for e in all_entities if e.get("type") == "Alert"]
            if sensor_entities and alert_entities:
                source = sensor_entities[0].get("name", "")
                target = alert_entities[0].get("name", "") if alert_entities else ""
                if source and target:
                    anomaly_paths = graph_analyzer.find_paths(source=source, target=target, max_length=3)
                    print(f"Found {len(anomaly_paths)} paths between sensor and alert (anomaly detection)")
        
        # Collect the entities that were extracted with the "Anomaly" type
        anomalies = [e for e in all_entities if e.get("type") == "Anomaly"]
        print(f"Found {len(anomalies)} entities typed as Anomaly")
except Exception as exc:
    print(f"Anomaly detection skipped: {exc}")


---

## GraphRAG Queries

Use hybrid retrieval combining vector search and graph traversal to answer complex real-time monitoring questions.
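
Hybrid retrieval can be pictured as two steps: vector search proposes seed nodes, and graph traversal adds their neighbors as extra context. The sketch below shows the second step. It is a conceptual illustration rather than the `AgentContext` implementation; `hybrid_retrieve` and the adjacency map are hypothetical.

```python
from typing import Dict, List


def hybrid_retrieve(vector_hits: List[str], adjacency: Dict[str, List[str]], hops: int = 1) -> List[str]:
    """Expand vector-search hits with graph neighbors up to `hops` steps away."""
    seen = list(dict.fromkeys(vector_hits))  # de-duplicate while keeping rank order
    frontier = list(seen)
    for _ in range(hops):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, []):
                if neighbor not in seen:
                    seen.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen


# Hypothetical graph neighborhood around a sensor returned by vector search
adjacency = {
    "S003": ["potential_failure", "Grid"],
    "potential_failure": ["grid_outage"],
}
one_hop = hybrid_retrieve(["S003"], adjacency, hops=1)
two_hops = hybrid_retrieve(["S003"], adjacency, hops=2)
print(one_hop, two_hops)
```

More hops give the model more context but also more noise, so the hop count is usually kept small.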


In [None]:
from semantica.context import AgentContext
from contextlib import redirect_stderr
from io import StringIO

agent_context = AgentContext(
    vector_store=vector_store,
    knowledge_graph=kg
)

queries = [
    "What sensors are showing alerts?",
    "What failures are predicted in the grid?",
    "What anomalies were detected at 10:03:00?",
    "Which sensors are indicating potential grid failures?"
]

for query in queries:
    try:
        with redirect_stderr(StringIO()):
            results = agent_context.query(
                query=query,
                top_k=5
            )
            print(f"Query: {query}")
            print(f"Found {len(results.get('results', []))} relevant results")
    except Exception:
        pass


---

## Visualization

Visualize the smart grid knowledge graph to explore sensor relationships, alerts, and failure patterns.


In [None]:
from semantica.visualization import KGVisualizer
from contextlib import redirect_stderr
from io import StringIO

visualizer = KGVisualizer()

try:
    with redirect_stderr(StringIO()):
        visualizer.visualize(
            kg,
            output_path="smart_grid_kg.html",
            layout="force_directed"
        )
        print("Knowledge graph visualization saved to smart_grid_kg.html")
except Exception as exc:
    print(f"Visualization skipped: {exc}")


---

## Export

Export the knowledge graph in multiple formats for grid monitoring reports and further analysis.
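
For a graph held as a dictionary with `entities` and `relationships` lists, JSON and CSV exports need only the standard library. The sketch below writes to in-memory strings; the example graph and its field names are assumptions, and `GraphExporter` may lay out its files differently.

```python
import csv
import io
import json

# Hypothetical minimal graph in the entities/relationships shape
graph = {
    "entities": [
        {"name": "S003", "type": "Sensor"},
        {"name": "Alert@S003", "type": "Alert"},
    ],
    "relationships": [
        {"source": "S003", "type": "triggers", "target": "Alert@S003"},
    ],
}

# JSON keeps the whole structure in one document
as_json = json.dumps(graph, indent=2)

# CSV is flat, so export the relationships as one row per edge
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["source", "type", "target"], lineterminator="\n")
writer.writeheader()
writer.writerows(graph["relationships"])
as_csv = buffer.getvalue()

print(as_csv)
```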


In [None]:
from semantica.export import GraphExporter
from contextlib import redirect_stderr
from io import StringIO

exporter = GraphExporter()

try:
    with redirect_stderr(StringIO()):
        # Export as JSON
        exporter.export(kg, format="json", output_path="smart_grid_kg.json")
        
        # Export as GraphML
        exporter.export(kg, format="graphml", output_path="smart_grid_kg.graphml")
        
        # Export as CSV (for monitoring reports)
        exporter.export(kg, format="csv", output_path="smart_grid_kg.csv")
        
        print("Exported knowledge graph in JSON, GraphML, and CSV formats")
except Exception as exc:
    print(f"Export skipped: {exc}")
