[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/renewable_energy/01_Energy_Market_Analysis.ipynb)

# Energy Market Analysis - Temporal KGs & Trend Prediction

## Overview

This notebook demonstrates **energy market analysis** using Semantica with focus on **temporal knowledge graphs**, **trend prediction**, and **market entity extraction**. The pipeline analyzes pricing trends and market movements using temporal market knowledge graphs to predict energy market trends and forecast pricing.

### Key Features

- **Temporal Knowledge Graphs**: Builds temporal KGs to track energy market trends over time
- **Trend Prediction**: Uses temporal analysis and reasoning to predict market movements
- **Market Entity Extraction**: Extracts energy market entities (Market, Price, Region, Trend, Forecast, EnergyType)
- **Temporal Pattern Detection**: Identifies patterns in energy pricing and market trends
- **Seed Data Integration**: Uses market foundation data for entity resolution
- **Forecasting**: Emphasizes reasoning-based market forecasting

### Learning Objectives

- Understand how to build temporal knowledge graphs for market analysis
- Learn to detect temporal patterns in energy pricing data
- Master trend prediction using reasoning and pattern detection
- Explore temporal graph queries for market trend analysis
- Practice market entity extraction and relationship mapping
- Analyze energy market trends and forecasting

### Pipeline Flow

```mermaid
graph TD
    A[Data Ingestion] --> B[Seed Data Loading]
    A --> C[Document Parsing]
    B --> D[Text Processing]
    C --> D
    D --> E[Entity Extraction]
    E --> F[Relationship Extraction]
    F --> G[Deduplication]
    G --> H[Temporal KG Construction]
    H --> I[Embedding Generation]
    I --> J[Vector Store]
    H --> K[Temporal Pattern Detection]
    H --> L[Temporal Queries]
```


---


In [1]:
%pip install -qU semantica networkx matplotlib plotly pandas faiss-cpu beautifulsoup4 groq sentence-transformers scikit-learn


Note: you may need to restart the kernel to use updated packages.




---

## Configuration & Setup

Configure API keys and set up constants for the energy market analysis pipeline, including temporal granularity for trend tracking.


In [2]:
import os

os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "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")

# Configuration constants
EMBEDDING_DIMENSION = 384
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200
TEMPORAL_GRANULARITY = "day"  # For market trend tracking


---

## Data Ingestion

Ingest energy market data from multiple sources including RSS feeds, web APIs, and local files.


In [8]:
from semantica.ingest import FeedIngestor, WebIngestor, FileIngestor
import os

os.makedirs("data", exist_ok=True)

documents = []

# Ingest from energy market RSS feeds
energy_feeds = [
    "https://www.energycentral.com/rss",
    "https://www.renewableenergyworld.com/rss",
    "https://www.greentechmedia.com/rss",
    "https://www.utilitydive.com/rss",
    "https://www.power-eng.com/rss",
    "https://www.energy-storage.news/rss",
    "https://www.pv-magazine.com/rss",
    "https://www.windpowermonthly.com/rss",
    "https://www.rechargenews.com/rss",
    "https://www.energystoragejournal.com/rss",
    "https://feeds.feedburner.com/EnergyBiz",
    "https://www.energy.gov/rss-feeds",
    "https://www.iea.org/rss",
    "https://www.irena.org/rss",
    "https://www.cleanenergywire.org/rss"
]

print(f"Ingesting from {len(energy_feeds)} RSS feeds...")
feed_ingestor = FeedIngestor()
for i, feed_url in enumerate(energy_feeds, 1):
    try:
        feed_data = feed_ingestor.ingest_feed(feed_url, validate=False)
        
        feed_count = 0
        for item in feed_data.items:
            if not item.content:
                item.content = item.description or item.title or ""
            if item.content:
                if not hasattr(item, 'metadata'):
                    item.metadata = {}
                item.metadata['source'] = feed_url
                item.metadata['source_type'] = 'rss_feed'
                documents.append(item)
                feed_count += 1
        
        print(f"  [{i}/{len(energy_feeds)}] Loaded {feed_count} documents from {feed_url}")
    except Exception as e:
        print(f"  [{i}/{len(energy_feeds)}] Failed to load {feed_url}: {str(e)[:50]}")

# Web ingestion from energy market websites
energy_web_sources = [
    "https://www.eia.gov/todayinenergy/",
    "https://www.energy.gov/",
    "https://www.epa.gov/energy",
    "https://www.nrel.gov/news/",
    "https://www.energy.gov/office-energy-efficiency-renewable-energy",
    "https://www.energy.gov/oe/office-electricity",
    "https://www.energy.gov/ne/office-nuclear-energy"
]

print(f"\nIngesting from {len(energy_web_sources)} web sources...")
web_ingestor = WebIngestor(respect_robots=True, delay=1.0)
for i, web_url in enumerate(energy_web_sources, 1):
    try:
        web_content = web_ingestor.ingest_url(web_url)
        if web_content and web_content.text:
            # Add content attribute for compatibility
            web_content.content = web_content.text
            if not hasattr(web_content, 'metadata'):
                web_content.metadata = {}
            web_content.metadata['source'] = web_url
            web_content.metadata['source_type'] = 'web_page'
            documents.append(web_content)
            print(f"  [{i}/{len(energy_web_sources)}] Loaded content from {web_url} ({len(web_content.text)} chars)")
        else:
            print(f"  [{i}/{len(energy_web_sources)}] No content from {web_url}")
    except Exception as e:
        print(f"  [{i}/{len(energy_web_sources)}] Failed to load {web_url}: {str(e)[:50]}")

print(f"\nTotal ingested: {len(documents)} documents")


Ingesting from 15 RSS feeds...
üß† Semantica is ingesting: File: energy_market.txt üîÑüì• (0.0s) | üß† Semantica is ingesting: 500 Server Error: Internal Server Error for url: https://www.energycentral.com/rss ‚ùåüì• (19.5s)

Failed to fetch feed https://www.energycentral.com/rss: 500 Server Error: Internal Server Error for url: https://www.energycentral.com/rss


  [1/15] Failed to load https://www.energycentral.com/rss: Failed to fetch feed: 500 Server Error: Internal S
üß† Semantica is ingesting: 500 Server Error: Internal Server Error for url: https://www.energycentral.com/rss ‚ùåüì• (19.5s) | üß† Semantica is ingesting: 403 Client Error: Forbidden for url: https://www.renewableenergyworld.com/rss ‚ùåüì• (0.6s)

Failed to fetch feed https://www.renewableenergyworld.com/rss: 403 Client Error: Forbidden for url: https://www.renewableenergyworld.com/rss


  [2/15] Failed to load https://www.renewableenergyworld.com/rss: Failed to fetch feed: 403 Client Error: Forbidden 


Failed to parse feed: not well-formed (invalid token): line 16, column 71


üß† Semantica is ingesting: 403 Client Error: Forbidden for url: https://www.renewableenergyworld.com/rss ‚ùåüì• (0.6s) | üß† Semantica is ingesting: Failed to parse feed: not well-formed (invalid token): line 16, column 71 ‚ùåüì• (1.9s)  [3/15] Failed to load https://www.greentechmedia.com/rss: Failed to parse feed: not well-formed (invalid tok
üß† Semantica is ingesting: Failed to parse feed: not well-formed (invalid token): line 16, column 71 ‚ùåüì• (1.9s) | üß† Semantica is ingesting: 404 Client Error: Not Found for url: https://www.utilitydive.com/rss ‚ùåüì• (1.6s)

Failed to fetch feed https://www.utilitydive.com/rss: 404 Client Error: Not Found for url: https://www.utilitydive.com/rss


  [4/15] Failed to load https://www.utilitydive.com/rss: Failed to fetch feed: 404 Client Error: Not Found 
üß† Semantica is ingesting: 404 Client Error: Not Found for url: https://www.utilitydive.com/rss ‚ùåüì• (1.6s) | üß† Semantica is ingesting: 403 Client Error: Forbidden for url: https://www.power-eng.com/rss ‚ùåüì• (0.8s)

Failed to fetch feed https://www.power-eng.com/rss: 403 Client Error: Forbidden for url: https://www.power-eng.com/rss


  [5/15] Failed to load https://www.power-eng.com/rss: Failed to fetch feed: 403 Client Error: Forbidden 
üß† Semantica is ingesting: 403 Client Error: Forbidden for url: https://www.power-eng.com/rss ‚ùåüì• (0.8s) | üß† Semantica is ingesting: Ingested 50 items |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüì•  [6/15] Loaded 50 documents from https://www.energy-storage.news/rss
üß† Semantica is ingesting: Ingested 50 items |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüì• | üß† Semantica is ingesting: 403 Client Error: Forbidden for url: https://www.pv-magazine.com/rss ‚ùåüì• (0.8s)

Failed to fetch feed https://www.pv-magazine.com/rss: 403 Client Error: Forbidden for url: https://www.pv-magazine.com/rss


  [7/15] Failed to load https://www.pv-magazine.com/rss: Failed to fetch feed: 403 Client Error: Forbidden 
üß† Semantica is ingesting: 403 Client Error: Forbidden for url: https://www.pv-magazine.com/rss ‚ùåüì• (0.8s) | üß† Semantica is ingesting: Ingested 20 items |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüì•  [8/15] Loaded 20 documents from https://www.windpowermonthly.com/rss
üß† Semantica is ingesting: Ingested 20 items |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüì• | üß† Semantica is ingesting: 404 Client Error: Not Found for url: https://www.rechargenews.com/rss?zephr_sso_ott=JBlrTY ‚ùåüì• (2.9s)

Failed to fetch feed https://www.rechargenews.com/rss: 404 Client Error: Not Found for url: https://www.rechargenews.com/rss?zephr_sso_ott=JBlrTY


  [9/15] Failed to load https://www.rechargenews.com/rss: Failed to fetch feed: 404 Client Error: Not Found 
üß† Semantica is ingesting: 404 Client Error: Not Found for url: https://www.rechargenews.com/rss?zephr_sso_ott=JBlrTY ‚ùåüì• (2.9s) | üß† Semantica is ingesting: 403 Client Error: Forbidden for url: https://www.energystoragejournal.com/rss ‚ùåüì• (3.7s)

Failed to fetch feed https://www.energystoragejournal.com/rss: 403 Client Error: Forbidden for url: https://www.energystoragejournal.com/rss


  [10/15] Failed to load https://www.energystoragejournal.com/rss: Failed to fetch feed: 403 Client Error: Forbidden 


Failed to parse feed: mismatched tag: line 17, column 6


üß† Semantica is ingesting: 403 Client Error: Forbidden for url: https://www.energystoragejournal.com/rss ‚ùåüì• (3.7s) | üß† Semantica is ingesting: Failed to parse feed: mismatched tag: line 17, column 6 ‚ùåüì• (1.0s)  [11/15] Failed to load https://feeds.feedburner.com/EnergyBiz: Failed to parse feed: mismatched tag: line 17, col
üß† Semantica is ingesting: Failed to parse feed: mismatched tag: line 17, column 6 ‚ùåüì• (1.0s) | üß† Semantica is ingesting: 404 Client Error: Not Found for url: https://www.energy.gov/rss-feeds ‚ùåüì• (2.9s)

Failed to fetch feed https://www.energy.gov/rss-feeds: 404 Client Error: Not Found for url: https://www.energy.gov/rss-feeds


  [12/15] Failed to load https://www.energy.gov/rss-feeds: Failed to fetch feed: 404 Client Error: Not Found 
üß† Semantica is ingesting: 404 Client Error: Not Found for url: https://www.energy.gov/rss-feeds ‚ùåüì• (2.9s) | üß† Semantica is ingesting: 404 Client Error: Not Found for url: https://www.iea.org/rss ‚ùåüì• (1.8s)

Failed to fetch feed https://www.iea.org/rss: 404 Client Error: Not Found for url: https://www.iea.org/rss


  [13/15] Failed to load https://www.iea.org/rss: Failed to fetch feed: 404 Client Error: Not Found 


Failed to parse feed: syntax error: line 4, column 0


üß† Semantica is ingesting: 404 Client Error: Not Found for url: https://www.iea.org/rss ‚ùåüì• (1.8s) | üß† Semantica is ingesting: Failed to parse feed: syntax error: line 4, column 0 ‚ùåüì• (1.2s)  [14/15] Failed to load https://www.irena.org/rss: Failed to parse feed: syntax error: line 4, column
üß† Semantica is ingesting: Failed to parse feed: syntax error: line 4, column 0 ‚ùåüì• (1.2s) | üß† Semantica is ingesting: Ingested 73 items |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüì•  [15/15] Loaded 73 documents from https://www.cleanenergywire.org/rss

Ingesting from 7 web sources...
üß† Semantica is ingesting: Ingested 73 items |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüì• | üß† Semantica is ingesting: Ingested https://www.eia.gov/todayinenergy/ (200) |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüì•  [1/7] Loaded content from https://www.eia.gov/todayinenergy/ (19365 chars)
üß† Semantica is ingesting: Ingested https://www.ei

URL https://www.nrel.gov/news/ blocked by robots.txt


üß† Semantica is ingesting: Ingested https://www.epa.gov/energy (200) |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüì• | üß† Semantica is ingesting: URL blocked by robots.txt: https://www.nrel.gov/news/ ‚ùåüì• (0.4s)  [4/7] Failed to load https://www.nrel.gov/news/: URL blocked by robots.txt: https://www.nrel.gov/ne
üß† Semantica is ingesting: URL blocked by robots.txt: https://www.nrel.gov/news/ ‚ùåüì• (0.4s) | üß† Semantica is ingesting: Ingested https://www.energy.gov/office-energy-efficiency-renewable-energy (200) |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüì•  [5/7] Loaded content from https://www.energy.gov/office-energy-efficiency-renewable-energy (6564 chars)
üß† Semantica is ingesting: Ingested https://www.energy.gov/office-energy-efficiency-renewable-energy (200) |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüì• | üß† Semantica is ingesting: Ingested https://www.energy.gov/oe/office-electricity (200) |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà

In [9]:
from semantica.seed import SeedDataManager

seed_manager = SeedDataManager()

# Load market foundation seed data
market_foundation = {
    "markets": ["Energy Market", "Renewable Energy Market", "Electricity Market"],
    "regions": ["North America", "Region A", "Region B", "Europe", "Asia"],
    "energy_types": ["Solar", "Wind", "Hydro", "Geothermal", "Biomass"],
    "trends": ["increasing", "decreasing", "stable", "volatile"]
}

# Convert dictionary to entities and add to seed data
seed_entities = []

# Add market entities
for market in market_foundation["markets"]:
    seed_entities.append({
        "id": f"market_{market.lower().replace(' ', '_')}",
        "name": market,
        "type": "Market",
        "source": "seed_data"
    })

# Add region entities
for region in market_foundation["regions"]:
    seed_entities.append({
        "id": f"region_{region.lower().replace(' ', '_')}",
        "name": region,
        "type": "Region",
        "source": "seed_data"
    })

# Add energy type entities
for energy_type in market_foundation["energy_types"]:
    seed_entities.append({
        "id": f"energy_type_{energy_type.lower()}",
        "name": energy_type,
        "type": "EnergyType",
        "source": "seed_data"
    })

# Add trend entities
for trend in market_foundation["trends"]:
    seed_entities.append({
        "id": f"trend_{trend.lower()}",
        "name": trend,
        "type": "Trend",
        "source": "seed_data"
    })

# Add entities to seed manager
seed_manager.seed_data.entities.extend(seed_entities)

# Create seed relationships for better graph structure
seed_relationships = []

# Link renewable energy market to energy types
renewable_market_id = "market_renewable_energy_market"
for energy_type in market_foundation["energy_types"]:
    energy_id = f"energy_type_{energy_type.lower()}"
    seed_relationships.append({
        "source": renewable_market_id,
        "target": energy_id,
        "type": "includes",
        "source_name": "seed_data"
    })

# Link regions to markets
for region in market_foundation["regions"]:
    region_id = f"region_{region.lower().replace(' ', '_')}"
    # Link to main energy market
    seed_relationships.append({
        "source": region_id,
        "target": "market_energy_market",
        "type": "located_in",
        "source_name": "seed_data"
    })

# Add relationships to seed manager
seed_manager.seed_data.relationships.extend(seed_relationships)

print(f"Loaded seed data with {len(seed_entities)} entities and {len(seed_relationships)} relationships")
print(f"Seed entities: {len(seed_manager.seed_data.entities)}, Seed relationships: {len(seed_manager.seed_data.relationships)}")


Loaded seed data with 17 entities and 10 relationships
Seed entities: 17, Seed relationships: 10


---

## Document Parsing

Parse structured energy market data from various formats including JSON, HTML, and XML.


In [10]:
from semantica.parse import DocumentParser

parser = DocumentParser()

print(f"Parsing {len(documents)} documents...")
parsed_documents = []
for i, doc in enumerate(documents, 1):
    try:
        parsed = parser.parse(
            doc.content if hasattr(doc, 'content') else str(doc),
            format="auto"
        )
        parsed_documents.append(parsed)
    except Exception:
        parsed_documents.append(doc.content if hasattr(doc, 'content') else str(doc))
    if i % 50 == 0 or i == len(documents):
        print(f"  Parsed {i}/{len(documents)} documents...")

print(f"Parsed {len(parsed_documents)} documents")


Parsing 149 documents...
üß† Semantica is parsing: Document: Two US companies, GridStor, and CPS Energy are making advancements on Texas energy storage deployments, with GridStor executing a tolling agreement and CPS Energy issuing a new request for proposals. üîÑüîç (0.0s) | üß† Semantica is parsing: Document: 400MWh BESS outage could result in US$1.2 million in monthly losses. üîÑüîç (0.0s)CPS Energy issuing a new request for proposals. üîÑüîç (0.0s)  Parsed 50/149 documents...
üß† Semantica is parsing: Document file not found: In 2026, Germany's government¬†under chancellor Friedrich Merz must end the uncertainty prevailing during its first months in office. It can no longer delay major climate and energy policy decisions to put the country on track to climate neutrality by 2045, and has pledged to reset the energy transition by lowering costs and improving resilience. However, it has yet to present crucial reforms in electricity, industry, transport, and buildings. Patienc

---

## Text Processing

Normalize energy market data and split documents using recursive chunking to preserve market context.


In [11]:
from semantica.normalize import TextNormalizer
from semantica.split import TextSplitter

normalizer = TextNormalizer()
print(f"Normalizing {len(parsed_documents)} documents...")
normalized_docs = []

for i, doc in enumerate(parsed_documents, 1):
    try:
        normalized = normalizer.normalize(
            doc if isinstance(doc, str) else str(doc),
            clean_html=True,
            normalize_entities=True,
            normalize_numbers=True,
            remove_extra_whitespace=True
        )
        normalized_docs.append(normalized)
    except Exception:
        normalized_docs.append(doc if isinstance(doc, str) else str(doc))
    if i % 50 == 0 or i == len(parsed_documents):
        print(f"  Normalized {i}/{len(parsed_documents)} documents...")

# Use recursive chunking to preserve market context
recursive_splitter = TextSplitter(
    method="recursive",
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)

print(f"Chunking {len(normalized_docs)} documents...")
chunked_docs = []
for i, doc_text in enumerate(normalized_docs, 1):
    try:
        chunks = recursive_splitter.split(doc_text)
        chunked_docs.extend([chunk.content if hasattr(chunk, 'content') else str(chunk) for chunk in chunks])
    except Exception:
        chunked_docs.append(doc_text)
    if i % 50 == 0 or i == len(normalized_docs):
        print(f"  Chunked {i}/{len(normalized_docs)} documents ({len(chunked_docs)} chunks so far)")

print(f"Created {len(chunked_docs)} chunks from {len(normalized_docs)} documents")


Normalizing 149 documents...
üß† Semantica is parsing: Document:  means you‚Äôve safely connected to the .gov website. Share sensitive information only on official, secure websites. About About Our Agency Our Agency History Our Leadership & Offices The National Nuclear Security Administration The Energy Information Administration National Laboratories Power Marketing Administrations Our Outreach Our Outreach Newsroom Digital Engagement and Media Brand Guides Our Mission Our Mission Genesis Mission Security & Safety Security & Safety Nuclear Security Energy Security Cybersecurity Environmental & Legacy Management Research, Technology, & Economic Security Emergency Response Scientific Excellence Scientific Excellence Scientific Research Database (OSTI) Nobel Prize Laureates National Science Bowl Environment Environment Earth Systems Modeling Extreme Weather Resiliency Energy Access Energy Access 2025 DOE 403 Orders Grid Deployment & Transmission Puerto Rico Grid Resilience & Transitions

In [16]:
from semantica.semantic_extract import NERExtractor

# Use regex-based extraction with custom patterns for energy market entity types
extractor = NERExtractor(
    method="regex",
    patterns={
        "Price": r"\$\d+(?:\.\d+)?/MWh|\d+(?:\.\d+)?\s*\$?/MWh|price\s+of\s+\$\d+",
        "EnergyType": r"\b(Solar|Wind|Hydro|Geothermal|Biomass|Nuclear|Coal|Gas|renewable\s+energy|fossil\s+energy|clean\s+energy)\b",
        "Region": r"\b(Region\s+[A-Z]|North\s+America|Europe|Asia|Region\s+A|Region\s+B)\b",
        "Market": r"\b(Energy\s+Market|Renewable\s+Energy\s+Market|Electricity\s+Market)\b",
        "Trend": r"\b(increasing|decreasing|stable|volatile|rising|falling)\b",
        "Forecast": r"\b(forecast|prediction|expected|projected|outlook)\b",
        "ORG": r"\b([A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\s+(?:Inc|Corp|LLC|Ltd|Company|Corporation))\b",
        "GPE": r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\b",
        "DATE": r"\b(\d{4}-\d{2}-\d{2}|\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|\d{4})\b",
        "MONEY": r"\b(\$[\d,]+(?:\.\d{2})?)\b",
    }
)

# Semantica handles batch processing automatically - just pass the list of chunks
entity_results = extractor.extract(
    chunked_docs,
    entity_types=["Market", "Price", "Region", "Trend", "Forecast", "EnergyType", "ORG", "GPE", "DATE", "MONEY"]
)

# Flatten results from batch extraction
all_entities = [entity for entities in entity_results for entity in entities]
print(f"Extracted {len(all_entities)} entities from {len(chunked_docs)} chunks")


üß† Normalizing text üîÑüîß (0.0s) | üß† Semantica is extracting: Extracting named entities from text üîÑüéØ (0.0s)            ‚úÖüéØ‚ñë‚ñë‚ñë‚ñë‚ñë‚ñë| 0.0% [0/215] üîÑüéØExtracted 4427 entities from 215 chunks


---

## Relationship Extraction

Extract market relationships including price associations, regional locations, trend indicators, and forecasting relationships.


In [17]:
from semantica.semantic_extract import RelationExtractor

# Use dependency parsing for relation extraction
relation_extractor = RelationExtractor(
    method="dependency",
    model="en_core_web_sm"
)

# Semantica handles batch processing - pass chunks and corresponding entity lists
relation_results = relation_extractor.extract(
    chunked_docs,
    entities=entity_results,  # Use entity_results (list of lists) for proper batch matching
    relation_types=["has_price", "located_in", "shows_trend", "predicts", "trades_in"]
)

# Flatten results from batch extraction
all_relationships = [rel for rels in relation_results for rel in rels]
print(f"Extracted {len(all_relationships)} relationships from {len(chunked_docs)} chunks")


üß† Semantica is extracting: Extracting named entities from text üîÑüéØ (0.0s) | üß† Semantica is extracting: Extracted 2 relations using dependency |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüéØÔøΩ| 0.0% [0/215] üîÑüéØExtracted 513 relationships from 215 chunks


## Conflict Detection and Resolution

- Detect conflicts in energy market data from multiple sources using temporal and relationship conflict detection approaches
- Resolve conflicts using `most_recent` strategy for time-sensitive market data to prioritize latest information
- Combine temporal and relationship conflict detection to ensure data consistency across entities and relationships

In [24]:
from semantica.conflicts import ConflictDetector, ConflictResolver

# Minimal conversion to dict format
entities_dict = [
    {"id": getattr(e, "text", str(i)), "name": getattr(e, "text", ""), "type": getattr(e, "label", ""), "source": getattr(e, "metadata", {}).get("source", "unknown") if hasattr(e, "metadata") and getattr(e, "metadata", {}) else "unknown"}
    for i, e in enumerate(all_entities)
]

relationships_dict = [
    {"source_id": getattr(r.subject, "text", "") if hasattr(r, "subject") else "", "target_id": getattr(r.object, "text", "") if hasattr(r, "object") else "", "type": getattr(r, "predicate", ""), "source": getattr(r, "metadata", {}).get("source", "unknown") if hasattr(r, "metadata") and getattr(r, "metadata", {}) else "unknown"}
    for r in all_relationships
] if all_relationships else []

# Detect and resolve conflicts
conflict_detector = ConflictDetector()
conflict_resolver = ConflictResolver()

temporal_conflicts = conflict_detector.detect_temporal_conflicts(entities_dict)
relationship_conflicts = conflict_detector.detect_relationship_conflicts(relationships_dict) if relationships_dict else []
all_conflicts = temporal_conflicts + relationship_conflicts

print(f"Detected {len(temporal_conflicts)} temporal conflicts, {len(relationship_conflicts)} relationship conflicts")

if all_conflicts:
    resolved = conflict_resolver.resolve_conflicts(all_conflicts, strategy="most_recent")
    print(f"Resolved {len([r for r in resolved if r.resolved])} conflicts")
else:
    print("No conflicts detected")

üß† Semantica is extracting: Extracted 2 relations using dependency |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% ‚úÖüéØ | üß† Semantica is resolving: Detected 0 relationship conflicts |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% [363/388] ‚úÖ‚ö†Ô∏è (455.1/s)63/388] üîÑ‚ö†Ô∏è (ETA: 0.1s | 464.2/s))ETA: 0.1s | 609.8/s))Detected 0 temporal conflicts, 0 relationship conflicts
No conflicts detected


In [26]:
from semantica.kg import EntityResolver
from semantica.semantic_extract import Entity

# Convert Entity objects to dictionaries for EntityResolver
entity_dicts = [
    {
        "name": getattr(e, "text", ""),
        "type": getattr(e, "label", ""),
        "start_char": getattr(e, "start_char", 0),
        "end_char": getattr(e, "end_char", 0),
        "confidence": getattr(e, "confidence", 1.0)
    }
    for e in all_entities
]

# Use EntityResolver to resolve duplicates
entity_resolver = EntityResolver(strategy="fuzzy", similarity_threshold=0.85)
resolved_entities = entity_resolver.resolve_entities(entity_dicts)

# Convert back to Entity objects
all_entities = [
    Entity(
        text=e["name"],
        label=e["type"],
        start_char=e.get("start_char", 0),
        end_char=e.get("end_char", 0),
        confidence=e.get("confidence", 1.0)
    )
    for e in resolved_entities
]

print(f"Deduplicated {len(entity_dicts)} entities to {len(all_entities)} unique entities")

üß† Semantica is deduplicating: Merging groups... 1/1 (remaining: 0) |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% [1/1] üîÑüîÑ (8.2/s) | üß† Semantica is deduplicating: Entity merge completed |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% [4/4] ‚úÖüîÑ (108.6/s))‚ñà| 100.0% [4/4] üîÑüîÑ (121.8/s) üîÑüîÑ (ETA: 0.0s | 117.1/s)248.9/s))4/s) | 1346.3/s)))))Deduplicated 4427 entities to 411 unique entities


---

## Temporal Knowledge Graph Construction

Build a temporal knowledge graph with time-aware relationships for tracking energy market trends over time.


In [28]:
from semantica.kg import GraphBuilder

# Build temporal knowledge graph with unique entities and relationships
builder = GraphBuilder(enable_temporal=True, temporal_granularity=TEMPORAL_GRANULARITY)

kg = builder.build(
    sources=all_entities,
    relationships=all_relationships
)

print(f"Built temporal KG with {len(kg.get('entities', []))} entities and {len(kg.get('relationships', []))} relationships")

üß† Semantica is deduplicating: Entity merge completed |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% [4/4] ‚úÖüîÑ (108.6/s) | üß† Semantica is building: Knowledge graph from 411 source(s) üîÑüß† (0.0s)Building graph structure...
‚úÖ Graph structure built (0.00s)
üß† Semantica is deduplicating: Entity merge completed |‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 100.0% [4/4] ‚úÖüîÑ (108.6/s) | üß† Semantica is building: Knowledge graph from 411 source(s) üîÑüß† (0.0s)
‚úÖ Knowledge Graph Build Complete
   Entities: 411
   Relationships: 513
   Total time: 2.74s
Built temporal KG with 411 entities and 513 relationships


---

## Embedding Generation & Vector Store

Generate embeddings for energy market documents and store them in a vector database for semantic search.


In [32]:
from semantica.embeddings import EmbeddingGenerator
from semantica.vector_store import VectorStore

# Generate embeddings and create vector store
embedding_gen = EmbeddingGenerator(
    model_name=EMBEDDING_MODEL,
    dimension=EMBEDDING_DIMENSION
)

chunks_to_embed = chunked_docs[:20]  # Limit for demo
embeddings = embedding_gen.generate_embeddings(chunks_to_embed, data_type="text")

# Create and populate vector store
vector_store = VectorStore(backend="faiss", dimension=EMBEDDING_DIMENSION)
metadata = [{"text": chunk[:100] if isinstance(chunk, str) else str(chunk)[:100]} for chunk in chunks_to_embed]
vector_ids = vector_store.store_vectors(vectors=embeddings, metadata=metadata)

print(f"Generated {len(embeddings)} embeddings and stored in vector database")

fastembed not available. Install with: pip install fastembed. Using fallback embedding method.
fastembed not available. Install with: pip install fastembed. Using fallback embedding method.


üß† Semantica is building: Knowledge graph from 411 source(s) üîÑüß† (0.0s) | üß† Semantica is indexing: Storing 20 vectors üîÑüìä (0.0s)Generated 20 embeddings and stored in vector database


---

## Temporal Pattern Detection

Detect temporal patterns in energy market data to identify trends. This is unique to this notebook and critical for trend prediction.


In [34]:
from semantica.kg import TemporalPatternDetector

pattern_detector = TemporalPatternDetector()

# Detect trend patterns
trend_patterns = pattern_detector.detect_temporal_patterns(
    graph=kg,
    pattern_type="trend",
    min_frequency=2,
    time_window=None
)

print(f"Detected {len(trend_patterns)} trend patterns")

# Analyze price evolution over time
price_evolution = pattern_detector.analyze_evolution(
    graph=kg,
    entity_type="Price",
    time_window=None
) if hasattr(pattern_detector, 'analyze_evolution') else []
print(f"Analyzed price evolution over time")

Detected 0 trend patterns
Analyzed price evolution over time


---

## Temporal Graph Queries

Query the temporal knowledge graph to analyze market trends over time and identify pricing patterns.


In [36]:
from semantica.kg import TemporalGraphQuery

temporal_query = TemporalGraphQuery(temporal_granularity=TEMPORAL_GRANULARITY)

# Query price trends over time using time range query
if all_entities:
    price_entities = [e for e in all_entities if getattr(e, "label", "") == "Price"]
    if price_entities:
        price_id = getattr(price_entities[0], "text", "")
        if price_id:
            history = temporal_query.query_time_range(
                graph=kg,
                query="price_history",
                start_time=None,
                end_time=None
            )
            print(f"Retrieved temporal history for price: {price_id}")

# Analyze evolution of prices over time
evolution = temporal_query.analyze_evolution(
    graph=kg,
    relationship="has_price",
    metrics=["count", "diversity"]
)
print(f"Analyzed price evolution over time")

Retrieved temporal history for price: price of $2
Analyzed price evolution over time
