[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/supply_chain/01_Supply_Chain_Data_Integration.ipynb)

# Supply Chain Data Integration - Multi-Source Ingestion & Relationship Mapping

## Overview

This notebook demonstrates **supply chain data integration** using Semantica, with a focus on **multi-source ingestion**, **relationship mapping**, and **logistics tracking**. The pipeline ingests logistics and supplier data from multiple sources, builds a supply chain knowledge graph, and analyzes the resulting supplier network.

### Key Features

- **Multi-Source Ingestion**: Ingests data from multiple logistics and supplier sources (RSS, web APIs, files)
- **Relationship Mapping**: Maps supplier relationships and logistics routes using relation extraction
- **Logistics Tracking**: Tracks products, routes, locations, and warehouses
- **Supplier Network Analysis**: Analyzes supplier centrality and community clusters
- **Seed Data Integration**: Uses supplier foundation data for entity resolution
- **KG Construction**: Builds comprehensive supply chain knowledge graphs

### Learning Objectives

- Understand how to ingest data from multiple sources for supply chain integration
- Learn to map complex supplier relationships and logistics routes
- Master supplier network analysis using centrality and community detection
- Explore relationship extraction for supply chain entities
- Practice multi-source data integration and deduplication
- Analyze supplier networks and logistics connections

### Pipeline Flow

```mermaid
graph TD
    A[Multi-Source Ingestion] --> B[Seed Data Loading]
    A --> C[Document Parsing]
    B --> D[Text Processing]
    C --> D
    D --> E[Entity Extraction]
    E --> F[Relationship Extraction]
    F --> G[Deduplication]
    G --> H[KG Construction]
    H --> I[Embedding Generation]
    I --> J[Vector Store]
    H --> K[Network Analysis]
    H --> L[Community Detection]
    J --> M[GraphRAG Queries]
    K --> N[Visualization]
    L --> N
    H --> O[Export]
```


---


In [1]:
%pip install -qU semantica networkx matplotlib plotly pandas faiss-cpu beautifulsoup4 groq sentence-transformers scikit-learn


Note: you may need to restart the kernel to use updated packages.




---

## Configuration & Setup

Configure API keys and set up constants for the supply chain data integration pipeline.


In [2]:
import os

# Read the key from the environment; never commit a real key to a notebook.
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "your-groq-api-key")

# Configuration constants
EMBEDDING_DIMENSION = 384
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200
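
A fallback value in `os.getenv` can hide a missing key until the first LLM call fails. As a hedged alternative, the sketch below fails fast when a variable is unset. `require_env` and `DEMO_API_KEY` are hypothetical names used only for illustration; they are not part of Semantica.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook.")
    return value


# Demonstration with a throwaway variable (hypothetical, not a real key).
os.environ["DEMO_API_KEY"] = "demo-value"
print(require_env("DEMO_API_KEY"))
```

In the notebook itself, the same helper could be called with `"GROQ_API_KEY"` before the LLM client is created.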


---

## Multi-Source Data Ingestion

Ingest supply chain data from multiple sources, here RSS feeds and web pages. Each source is fetched inside its own `try`/`except`, so an unreachable or malformed source is reported and skipped instead of stopping the run.


In [6]:
from semantica.ingest import FeedIngestor, WebIngestor, FileIngestor
import os

os.makedirs("data", exist_ok=True)

documents = []

# Ingest from logistics RSS feeds
logistics_feeds = [
    ("Supply Chain Dive", "https://www.supplychaindive.com/rss"),
    ("Logistics Management", "https://www.logisticsmgmt.com/rss"),
    ("SCMR", "https://www.scmr.com/rss"),
    ("DC Velocity", "https://www.dcvelocity.com/rss"),
    ("Inbound Logistics", "https://www.inboundlogistics.com/rss"),
    ("Supply Chain Brain", "https://www.supplychainbrain.com/rss"),
    ("MHL News", "https://www.mhlnews.com/rss")
]

feed_ingestor = FeedIngestor()
print(f"Ingesting from {len(logistics_feeds)} RSS feed sources...")
for i, (feed_name, feed_url) in enumerate(logistics_feeds, 1):
    try:
        feed_data = feed_ingestor.ingest_feed(feed_url, validate=False)
        feed_count = 0
        for item in feed_data.items:
            if not item.content:
                item.content = item.description or item.title or ""
            if item.content:
                if not hasattr(item, 'metadata'):
                    item.metadata = {}
                item.metadata['source'] = feed_name
                documents.append(item)
                feed_count += 1
        if feed_count > 0:
            print(f"  [{i}/{len(logistics_feeds)}] {feed_name}: {feed_count} documents")
    except Exception as e:
        print(f"  [{i}/{len(logistics_feeds)}] {feed_name}: Failed - {str(e)[:50]}")

# Web ingestion from supply chain data sources
web_sources = [
    ("Supply Chain Dive News", "https://www.supplychaindive.com/news"),
    ("Logistics Management News", "https://www.logisticsmgmt.com/news"),
    ("SCMR Articles", "https://www.scmr.com/articles"),
    ("DC Velocity Articles", "https://www.dcvelocity.com/articles")
]

web_ingestor = WebIngestor(respect_robots=True, delay=1.0)
print(f"\nIngesting from {len(web_sources)} web sources...")
for i, (source_name, web_url) in enumerate(web_sources, 1):
    try:
        web_content = web_ingestor.ingest_url(web_url)
        if web_content.text:
            # Create a document-like object from WebContent
            class WebDoc:
                def __init__(self, content, title, url, source):
                    self.content = content
                    self.title = title
                    self.url = url
                    self.metadata = {'source': source}
            doc = WebDoc(web_content.text, web_content.title, web_content.url, source_name)
            documents.append(doc)
            print(f"  [{i}/{len(web_sources)}] {source_name}: 1 document")
    except Exception as e:
        print(f"  [{i}/{len(web_sources)}] {source_name}: Failed - {str(e)[:50]}")

print(f"\nIngested {len(documents)} documents from multiple sources")


Ingesting from 7 RSS feed sources...

Failed to fetch feed https://www.supplychaindive.com/rss: 404 Client Error: Not Found for url: https://www.supplychaindive.com/rss

  [1/7] Supply Chain Dive: Failed - Failed to fetch feed: 404 Client Error: Not Found

Failed to fetch feed https://www.logisticsmgmt.com/rss: 403 Client Error: Forbidden for url: https://www.logisticsmgmt.com/rss

  [2/7] Logistics Management: Failed - Failed to fetch feed: 403 Client Error: Forbidden

Failed to fetch feed https://www.scmr.com/rss: 403 Client Error: Forbidden for url: https://www.scmr.com/rss

  [3/7] SCMR: Failed - Failed to fetch feed: 403 Client Error: Forbidden

  [4/7] DC Velocity: 30 documents

Failed to fetch feed https://www.inboundlogistics.com/rss: 403 Client Error: Forbidden for url: https://www.inboundlogistics.com/rss

  [5/7] Inbound Logistics: Failed - Failed to fetch feed: 403 Client Error: Forbidden

Failed to parse feed: not well-formed (invalid token): line 53, column 49

  [6/7] Supply Chain Brain: Failed - Failed to parse feed: not well-formed (invalid tok

Failed to fetch feed https://www.mhlnews.com/rss: 404 Client Error: Not Found for url: https://www.mhlnews.com/rss

  [7/7] MHL News: Failed - Failed to fetch feed: 404 Client Error: Not Found

Ingesting from 4 web sources...

URL https://www.supplychaindive.com/news blocked by robots.txt

  [1/4] Supply Chain Dive News: Failed - URL blocked by robots.txt: https://www.supplychain

  [2/4] Logistics Management News: 1 document

Failed to fetch URL https://www.scmr.com/articles: 404 Client Error: Not Found for url: https://www.scmr.com/articles

  [3/4] SCMR Articles: Failed - Failed to fetch URL: 404 Client Error: Not Found f

Failed to fetch URL https://www.dcvelocity.com/articles: 404 Client Error: Not Found for url: https://www.dcvelocity.com/articles

  [4/4] DC Velocity Articles: Failed - Failed to fetch URL: 404 Client Error: Not Found f

Ingested 31 documents from multiple sources
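
Most sources in the run above failed, yet the pipeline still finished with 31 documents. The sketch below isolates that failure-tolerant pattern so the failures can be inspected afterwards. It is a minimal illustration under stated assumptions: `ingest_all` and `fake_fetch` are hypothetical helpers and the URLs are placeholders; none of this is Semantica API.

```python
def ingest_all(sources, fetch):
    """Call fetch(url) for every (name, url) pair; collect documents and failures."""
    documents, failures = [], {}
    for name, url in sources:
        try:
            items = fetch(url)
        except Exception as exc:
            # Record a short reason and move on to the next source.
            failures[name] = str(exc)[:50]
            continue
        for item in items:
            documents.append({"source": name, "content": item})
    return documents, failures


def fake_fetch(url):
    """Stand-in fetcher (hypothetical): fails for URLs containing 'broken'."""
    if "broken" in url:
        raise RuntimeError("404 Client Error: Not Found")
    return ["item one", "item two"]


sources = [
    ("Good Feed", "https://example.com/rss"),
    ("Bad Feed", "https://example.com/broken"),
]
documents, failures = ingest_all(sources, fake_fetch)
print(f"{len(documents)} documents, {len(failures)} failed sources")
```

Returning the failures as a dictionary makes it straightforward to decide later whether enough sources succeeded to continue.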


In [8]:
from semantica.seed import SeedDataManager

seed_manager = SeedDataManager()

# Load supplier foundation seed data
supplier_foundation = {
    "suppliers": ["Supplier A", "Supplier B", "Supplier C", "Global Suppliers Inc"],
    "warehouses": ["Warehouse W1", "Warehouse W2", "Warehouse W3"],
    "locations": ["City C1", "City C2", "Region R1", "Region R2"],
    "products": ["Product X", "Product Y", "Product Z"],
    "routes": ["Route R1", "Route R2", "Route R3"]
}

# Convert dictionary to entity records
entity_records = []
for entity_type, entity_names in supplier_foundation.items():
    for name in entity_names:
        entity_records.append({
            "id": name.replace(" ", "_").lower(),
            "text": name,
            "name": name,
            "entity_type": entity_type.rstrip("s").capitalize(),  # Remove plural and capitalize
            "type": entity_type.rstrip("s").capitalize(),
            "source": "supplier_foundation",
            "verified": True
        })

# Add entities to seed data
seed_manager.seed_data.entities = entity_records

print(f"Loaded seed data with {len(entity_records)} entities")
print(f"Entity types: {set(e['type'] for e in entity_records)}")


Loaded seed data with 17 entities
Entity types: {'Supplier', 'Warehouse', 'Product', 'Location', 'Route'}
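
The cell above derives entity types with `rstrip("s")`, which works for these five categories but strips every trailing "s" and would mangle other plurals. A hedged alternative is an explicit mapping; `ENTITY_TYPE_NAMES` and `to_entity_type` are illustrative names, not part of Semantica.

```python
# Explicit plural-to-singular mapping for the seed categories used above.
ENTITY_TYPE_NAMES = {
    "suppliers": "Supplier",
    "warehouses": "Warehouse",
    "locations": "Location",
    "products": "Product",
    "routes": "Route",
}


def to_entity_type(category: str) -> str:
    """Map a plural category key to its entity type; fail loudly on unknown keys."""
    try:
        return ENTITY_TYPE_NAMES[category]
    except KeyError:
        raise ValueError(f"Unknown seed category: {category!r}") from None


# str.rstrip("s") removes every trailing "s", which breaks some words.
print("addresses".rstrip("s"))       # addresse
print("class".rstrip("s"))           # cla
print(to_entity_type("warehouses"))  # Warehouse
```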


---

## Text Processing

Normalize supply chain data and split documents using entity-aware chunking to preserve supplier names and relationships.
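
To show how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a minimal sketch of plain character-based chunking with overlap. It is not Semantica's entity-aware algorithm, which presumably also avoids cutting through entity mentions; `chunk_with_overlap` is a hypothetical helper.

```python
def chunk_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_with_overlap(text)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

With a size of 1000 and an overlap of 200, each new chunk starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next.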


In [9]:
from semantica.normalize import TextNormalizer
from semantica.split import TextSplitter

normalizer = TextNormalizer()
normalized_docs = []

for doc in documents:
    try:
        doc_content = doc.content if hasattr(doc, 'content') else str(doc)
        normalized = normalizer.normalize(
            doc_content,
            clean_html=True,
            normalize_entities=True,
            normalize_numbers=True,
            remove_extra_whitespace=True
        )
        normalized_docs.append(normalized)
    except Exception:
        normalized_docs.append(doc.content if hasattr(doc, 'content') else str(doc))

# Use entity-aware chunking to preserve supplier names and relationships
entity_splitter = TextSplitter(
    method="entity_aware",
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)

chunked_docs = []
for doc_text in normalized_docs:
    try:
        chunks = entity_splitter.split(doc_text)
        chunked_docs.extend([chunk.content if hasattr(chunk, 'content') else str(chunk) for chunk in chunks])
    except Exception:
        chunked_docs.append(doc_text)

print(f"Processed {len(chunked_docs)} entity-aware chunks")


Processed 115 entity-aware chunks


In [11]:
from semantica.semantic_extract import NERExtractor

# Use ML-based approach (spaCy) for entity extraction
extractor = NERExtractor(
    method="ml",
    model="en_core_web_sm"
)

entity_types = [
    "Supplier", "Product", "Route", "Location", "Logistics", "Warehouse", "Makinson"
]

all_entities = []
chunks_to_process = chunked_docs  # Process all chunks
print(f"Extracting entities from {len(chunks_to_process)} chunks using ML-based approach...")
for i, chunk in enumerate(chunks_to_process, 1):
    try:
        entities = extractor.extract(
            chunk,
            entity_types=entity_types
        )
        all_entities.extend(entities)
    except Exception as e:
        print(f"  Error processing chunk {i}: {str(e)[:50]}")
    
    if i % 5 == 0 or i == len(chunks_to_process):
        print(f"  Processed {i}/{len(chunks_to_process)} chunks ({len(all_entities)} entities found)")

print(f"Extracted {len(all_entities)} entities")


Extracting entities from 115 chunks using ML-based approach...
  Processed 5/115 chunks (0 entities found)
  Processed 10/115 chunks (0 entities found)
  Processed 15/115 chunks (5 entities found)
[output truncated]

---

## Relationship Extraction

Extract supply chain relationships between entities, focusing on five relation types: `provides`, `located_in`, `connects`, `ships_via`, and `manages`.
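
Each relationship can be thought of as a subject-predicate-object triple. The sketch below is an illustration only: the `Triple` class, the `keep_allowed` helper, and the example triples are made up for this purpose and are not Semantica objects. It filters triples to the five relation types and removes exact duplicates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    object: str


ALLOWED_RELATIONS = {"provides", "located_in", "connects", "ships_via", "manages"}


def keep_allowed(triples, allowed=ALLOWED_RELATIONS):
    """Keep triples with an allowed predicate, dropping exact duplicates in order."""
    seen = set()
    kept = []
    for triple in triples:
        if triple.predicate in allowed and triple not in seen:
            seen.add(triple)
            kept.append(triple)
    return kept


# Invented example data that reuses the seed entity names.
raw = [
    Triple("Supplier A", "provides", "Product X"),
    Triple("Supplier A", "provides", "Product X"),   # duplicate
    Triple("Supplier A", "owns", "Warehouse W1"),    # predicate not allowed
    Triple("Warehouse W1", "located_in", "City C1"),
]
for triple in keep_allowed(raw):
    print(triple)
```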


In [12]:
from semantica.semantic_extract import RelationExtractor, NERExtractor

# Use ML-based approach (dependency parsing with spaCy) for relation extraction
relation_extractor = RelationExtractor(
    method="dependency",  # ML-based dependency parsing
    model="en_core_web_sm"
)

# Create NER extractor once for efficiency
ner = NERExtractor(method="ml", model="en_core_web_sm")

relation_types = [
    "provides", "located_in", "connects",
    "ships_via", "manages"
]

all_relationships = []
chunks_to_process = chunked_docs  # Process all chunks
print(f"Extracting relationships from {len(chunks_to_process)} chunks using ML-based approach...")
for i, chunk in enumerate(chunks_to_process, 1):
    try:
        # Extract entities from chunk first (dependency parsing needs entities)
        chunk_entities = ner.extract(chunk)
        
        # Extract relationships using dependency parsing
        relationships = relation_extractor.extract(
            chunk,
            entities=chunk_entities,
            relation_types=relation_types
        )
        all_relationships.extend(relationships)
    except Exception as e:
        print(f"  Error processing chunk {i}: {str(e)[:50]}")
    
    if i % 5 == 0 or i == len(chunks_to_process):
        print(f"  Processed {i}/{len(chunks_to_process)} chunks ({len(all_relationships)} relationships found)")

print(f"Extracted {len(all_relationships)} relationships")


Extracting relationships from 115 chunks using ML-based approach...
  Processed 5/115 chunks (1 relationships found)
  Processed 10/115 chunks (3 relationships found)
[output truncated]

---

## Conflict Detection

Detect and resolve conflicts in supply chain data from multiple sources. Because sources differ in credibility, any detected conflicts are resolved with the `credibility_weighted` strategy.
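
The idea behind a credibility-weighted strategy can be sketched as a weighted vote: each source contributes its credibility score to the value it reports, and the value with the highest total wins. This is an assumption about the general technique, not a description of how `ConflictResolver` works internally; the source names and scores below are invented.

```python
from collections import defaultdict


def resolve_by_credibility(claims, credibility, default=0.5):
    """Return the claimed value with the highest summed source credibility."""
    if not claims:
        raise ValueError("claims must not be empty")
    totals = defaultdict(float)
    for source, value in claims:
        totals[value] += credibility.get(source, default)
    # On a tie, max() keeps the value that was claimed first.
    return max(totals, key=totals.get)


# Hypothetical sources disagreeing about one supplier's location.
credibility = {"carrier_feed": 0.6, "news_blog": 0.9, "erp_export": 0.5}
claims = [
    ("carrier_feed", "City C1"),
    ("news_blog", "City C2"),
    ("erp_export", "City C1"),
]
print(resolve_by_credibility(claims, credibility))  # City C1
```

Here two moderately credible sources (0.6 + 0.5 = 1.1) outweigh one highly credible source (0.9).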


In [17]:
from semantica.conflicts import ConflictDetector, ConflictResolver

detector = ConflictDetector()
resolver = ConflictResolver()

entity_dicts = [{"id": e.text, "text": e.text, "type": e.label, "confidence": e.confidence, "metadata": e.metadata} for e in all_entities]
relationship_dicts = [{"id": f"{r.subject.text}_{r.predicate}_{r.object.text}", "source_id": r.subject.text, "target_id": r.object.text, "type": r.predicate, "confidence": r.confidence, "metadata": r.metadata} for r in all_relationships] if all_relationships else []

conflicts = detector.detect_entity_conflicts(entity_dicts)
if relationship_dicts:
    conflicts.extend(detector.detect_relationship_conflicts(relationship_dicts))

print(f"Detected {len(conflicts)} conflicts")
if conflicts:
    resolver.resolve_conflicts(conflicts, strategy="credibility_weighted")


Detected 0 conflicts


---

## Deduplication

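Deduplicate extracted entities with fuzzy name matching (similarity threshold 0.85) so that variant spellings of the same supplier resolve to a single entity. The cell below resolves entities against each other and does not consult the seed data loaded earlier.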
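
To make the effect of a 0.85 threshold concrete, here is a small sketch using `difflib.SequenceMatcher` from the standard library. It is a stand-in for illustration; Semantica's `EntityResolver` may use a different similarity measure. The `similarity` and `dedupe` helpers are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dedupe(names, threshold=0.85):
    """Greedy deduplication: keep a name only if no kept name is similar enough."""
    canonical = []
    for name in names:
        if not any(similarity(name, kept) >= threshold for kept in canonical):
            canonical.append(name)
    return canonical


names = ["Global Suppliers Inc", "Global Suppliers Inc.", "global suppliers inc", "Product X"]
print(dedupe(names))  # ['Global Suppliers Inc', 'Product X']
print(round(similarity("Warehouse W1", "Warehouse W2"), 3))  # 0.917
```

The second print shows a caveat of string similarity on short identifiers: "Warehouse W1" and "Warehouse W2" score about 0.92, so a 0.85 threshold would merge two distinct warehouses. Comparing entity types or IDs as well as names is one way to guard against that.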


In [20]:
from semantica.kg import EntityResolver
from semantica.semantic_extract import Entity

entity_dicts = [{"name": e.text, "type": e.label, "confidence": e.confidence} for e in all_entities]
resolved = EntityResolver(strategy="fuzzy", similarity_threshold=0.85).resolve_entities(entity_dicts)
all_entities = [Entity(text=e["name"], label=e["type"], start_char=0, end_char=0, confidence=e.get("confidence", 1.0)) for e in resolved]
print(f"Deduplicated {len(entity_dicts)} entities to {len(all_entities)} unique entities")


Deduplicated 89 entities to 15 unique entities


---

## Knowledge Graph Construction

Build a knowledge graph from supply chain entities and relationships to enable network analysis.


In [22]:
from semantica.kg import GraphBuilder

kg = GraphBuilder().build({"entities": all_entities, "relationships": all_relationships})
print(f"Built KG with {len(kg.get('entities', []))} entities and {len(kg.get('relationships', []))} relationships")


Building graph structure...
✅ Graph structure built (0.00s)
✅ Knowledge Graph Build Complete
   Entities: 15
   Relationships: 72
   Total time: 0.48s
Built KG with 15 entities and 72 relationships


---

## Embedding Generation & Vector Store

Generate embeddings for supply chain documents and store them in a vector database for semantic search.
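
Semantic search over stored embeddings comes down to ranking vectors by similarity to a query vector. The brute-force sketch below illustrates this with cosine similarity on toy three-dimensional vectors; it is a conceptual stand-in, not the FAISS backend used in this notebook, and `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(idx, cosine_similarity(query, vec)) for idx, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for 384-dimensional chunk embeddings.
vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
results = top_k([1.0, 0.0, 0.0], vectors, k=2)
print([idx for idx, _ in results])  # [0, 1]
```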


In [25]:
from semantica.embeddings import EmbeddingGenerator
from semantica.vector_store import VectorStore

gen = EmbeddingGenerator(model_name=EMBEDDING_MODEL, dimension=EMBEDDING_DIMENSION)
embeddings = gen.generate_embeddings(chunked_docs, data_type="text")

vector_store = VectorStore(backend="faiss", dimension=EMBEDDING_DIMENSION)
metadata = [{"text": chunk[:100]} for chunk in chunked_docs]
vector_store.store_vectors(vectors=embeddings, metadata=metadata)

print(f"Generated {len(embeddings)} embeddings and stored in vector database")


fastembed not available. Install with: pip install fastembed. Using fallback embedding method.

Generated 115 embeddings and stored in vector database


---

## Supplier Network Analysis

Analyze the supplier network structure using degree and betweenness centrality. Degree centrality highlights the most connected entities, and betweenness centrality highlights entities that lie on many shortest paths between others.
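
As a worked example of the first measure, normalized degree centrality of a node is its number of distinct neighbors divided by `n - 1`, where `n` is the number of nodes. The sketch below computes it from an edge list in plain Python; the edges are invented and the function is independent of `CentralityCalculator`.

```python
def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as (u, v) pairs."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set())
        neighbors.setdefault(v, set())
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


# Invented example: one warehouse connected to two suppliers and a city.
edges = [
    ("Supplier A", "Warehouse W1"),
    ("Supplier B", "Warehouse W1"),
    ("Warehouse W1", "City C1"),
]
centrality = degree_centrality(edges)
print(centrality["Warehouse W1"])  # 1.0
```

The warehouse touches all three other nodes, so its centrality is 3 / 3 = 1.0, while each supplier and the city score 1 / 3.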


In [28]:
from semantica.kg import CentralityCalculator

calc = CentralityCalculator()
degree_centrality = calc.calculate_degree_centrality(kg)
betweenness_centrality = calc.calculate_betweenness_centrality(kg)
print(f"Degree centrality: {len(degree_centrality.get('centrality', {}))} nodes")
print(f"Betweenness centrality: {len(betweenness_centrality.get('centrality', {}))} nodes")


Degree centrality: 104 nodes
Betweenness centrality: 104 nodes


---

## Supplier Community Detection

Detect supplier communities in the supply chain network with the Louvain algorithm, and check for overlapping communities, to identify groups of closely connected suppliers.
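
Louvain works by searching for a partition with high modularity. The sketch below does not implement Louvain itself; it only computes the modularity score $Q = \sum_c \left[ L_c / m - (d_c / 2m)^2 \right]$ of a given partition on an unweighted, undirected graph, where $L_c$ is the number of edges inside community $c$, $d_c$ is the total degree of its nodes, and $m$ is the number of edges. The graph is an invented example.

```python
def modularity(edges, community_of):
    """Modularity of a partition for an unweighted, undirected edge list."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}      # edges with both endpoints in the community
    total_degree = {}  # sum of node degrees per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        total_degree[cu] = total_degree.get(cu, 0) + 1
        total_degree[cv] = total_degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in total_degree.items()
    )


# Two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}
print(round(modularity(edges, two_groups), 4))  # 0.3571
print(modularity(edges, one_group))             # 0.0
```

Splitting the graph at the bridge scores 5/14, while putting every node in one community scores 0, which is why a modularity-maximizing method prefers the two-triangle partition.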


In [30]:
from semantica.kg import CommunityDetector

detector = CommunityDetector()
communities = detector.detect_communities(kg, algorithm="louvain")
overlapping = detector.detect_overlapping_communities(kg)
print(f"Detected {len(communities.get('communities', []))} communities and {len(overlapping.get('communities', []))} overlapping communities")


Detected 37 communities and 0 overlapping communities


---

## GraphRAG Queries

Use hybrid retrieval combining vector search and graph traversal to answer complex supply chain questions.
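
One common way to combine the two signals is a weighted sum, where a parameter such as `hybrid_alpha` sets the weight of the vector score and the remainder goes to the graph score. Whether Semantica's `ContextRetriever` uses exactly this formula is an assumption; the sketch below only illustrates the general idea, with invented chunk IDs and scores.

```python
def hybrid_rank(vector_scores, graph_scores, alpha=0.6):
    """Rank items by alpha * vector score + (1 - alpha) * graph score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    ids = set(vector_scores) | set(graph_scores)
    combined = {
        item: alpha * vector_scores.get(item, 0.0)
        + (1 - alpha) * graph_scores.get(item, 0.0)
        for item in ids
    }
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical scores: chunk_2 is a weaker text match but strongly graph-connected.
vector_scores = {"chunk_1": 0.9, "chunk_2": 0.4}
graph_scores = {"chunk_2": 1.0, "chunk_3": 0.5}
ranking = hybrid_rank(vector_scores, graph_scores, alpha=0.6)
print([item for item, _ in ranking])  # ['chunk_2', 'chunk_1', 'chunk_3']
```

With `alpha=0.6`, `chunk_2` (0.24 + 0.40 = 0.64) overtakes `chunk_1` (0.54), showing how graph evidence can promote a result that vector search alone would rank lower.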


In [32]:
from semantica.context import AgentContext, ContextGraph, ContextRetriever
from semantica.llms import Groq
import os

context_graph = ContextGraph()
context_graph.build_from_entities_and_relationships(
    entities=kg.get('entities', []),
    relationships=kg.get('relationships', [])
)

retriever = ContextRetriever(vector_store=vector_store, knowledge_graph=context_graph, hybrid_alpha=0.6, max_expansion_hops=2)
context = AgentContext(vector_store=vector_store, knowledge_graph=context_graph, use_graph_expansion=True, max_expansion_hops=2)

llm = Groq(model="llama-3.1-8b-instant", api_key=os.getenv("GROQ_API_KEY"))

queries = [
    "Which suppliers provide products to Warehouse W1?",
    "What routes connect warehouses to distribution centers?",
    "Where is Supplier A located?",
    "What products are shipped via Route R1?"
]

for query in queries:
    result = context.query_with_reasoning(query, llm_provider=llm, max_results=10, max_hops=2)
    print(f"Query: {query}")
    print(f"Response: {result.get('response', 'No response generated')}")
    print(f"Confidence: {result.get('confidence', 0):.3f} | Sources: {result.get('num_sources', 0)} | Paths: {result.get('num_reasoning_paths', 0)}\n")


Embedding generation failed: Text cannot be empty or whitespace-only
Using random fallback embedding


Query: Which suppliers provide products to Warehouse W1?
Response: Based on the retrieved context, I found that Warehouse W1 receives products from two suppliers: Supplier S1 and Supplier S2.

Reasoning Path:

- Context 1: Warehouse W1 is connected to Supplier S1 through a "ships_to" relationship (Score: 0.50).
  - Entity: Warehouse W1
  - Relationship: ships_to
  - Entity: Supplier S1
- Context 1: Warehouse W1 is also connected to Supplier S2 through a "ships_to" relationship (Score: 0.50).
  - Entity: Warehouse W1
  - Relationship: ships_to
  - Entity: Supplier S2

Multi-hop connections: There are no multi-hop connections in this reasoning path as the relationships are direct.

Therefore, the suppliers that provide products to Warehouse W1 are Supplier S1 and Supplier S2.
Confidence: 0.400 | Sources: 1 | Paths: 0

Embedding generation failed: Text cannot be empty or whitespace-only
Using random fallback embedding


Query: What routes connect warehouses to distribution centers?
Response: Based on the retrieved context, I found a possible route that connects warehouses to distribution centers.

The route involves the following entities and relationships:

1. Warehouses are connected to Transportation Hubs (Score: 0.40) via the "serves" relationship.
2. Transportation Hubs are connected to Distribution Centers (Score: 0.60) via the "serves" relationship.

Therefore, the route that connects warehouses to distribution centers is:

Warehouses → Transportation Hubs (serves) → Distribution Centers (serves)

This multi-hop connection involves two relationships: "serves" between warehouses and transportation hubs, and "serves" between transportation hubs and distribution centers.

Note that the scores indicate the confidence level of each connection, with higher scores indicating more reliable information.
Confidence: 0.400 | Sources: 1 | Paths: 0

Embedding generation failed: Text cannot be empty or whitespace-only
Using random fallback embedding


Query: Where is Supplier A located?
Response: Based on the retrieved context, I was unable to determine the exact location of Supplier A. However, I can provide some information that might be related.

Context 1 has a score of 0.50, but it does not provide any information about Supplier A's location. Context 2 mentions Latin America, but there is no direct connection to Supplier A. Context 3 contains a code (end_char=13714), which does not seem to be related to a location. Context 4 mentions Asia, but again, there is no direct connection to Supplier A. Context 5 mentions the Mediterranean, but it is not clear how this is related to Supplier A.

Unfortunately, without more specific information, I am unable to provide a precise answer to the question of where Supplier A is located.
Confidence: 0.255 | Sources: 10 | Paths: 0

Embedding generation failed: Text cannot be empty or whitespace-only
Using random fallback embedding


Query: What products are shipped via Route R1?
Response: Based on the retrieved context, I found the following information:

The context mentions Route R1, but it does not explicitly state which products are shipped via this route. However, it does mention a relationship between Route R1 and a warehouse (Entity: Warehouse W1), indicating that Route R1 is used for shipping products from Warehouse W1.

To answer the question, I need to make a multi-hop connection between Route R1, Warehouse W1, and the products shipped from Warehouse W1.

Reasoning Path:

1. Route R1 is associated with Warehouse W1 (Relationship: "serves").
2. Warehouse W1 ships products (Relationship: "ships").
3. The products shipped by Warehouse W1 are Electronics and Home Appliances (Relationship: "carries").

Based on this reasoning path, I can conclude that the products shipped via Route R1 are Electronics and Home Appliances.

Answer: The products shipped via Route R1 are Electronics and Home Appliances.
Confidenc

---

## Visualization

Visualize the supply chain knowledge graph to explore supplier relationships, logistics routes, and network structure.


In [33]:
from semantica.visualization import KGVisualizer

viz = KGVisualizer(layout="force")
fig = viz.visualize_network(kg, output="interactive")
fig.show() if fig else None


Visualization generated: 15 nodes, 0 edges

---

## Export

Export the knowledge graph in multiple formats for supply chain analysis and reporting.
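
For readers who want to see what a JSON graph export can look like, here is a minimal node-link sketch using only the standard library. The layout (`nodes` and `edges` keys) is an assumption chosen for illustration and is not necessarily the schema that `GraphExporter` writes; `to_node_link` is a hypothetical helper.

```python
import json


def to_node_link(entities, relationships):
    """Build a node-link dictionary from (name, type) and (source, type, target) tuples."""
    return {
        "nodes": [{"id": name, "type": etype} for name, etype in entities],
        "edges": [
            {"source": source, "type": rtype, "target": target}
            for source, rtype, target in relationships
        ],
    }


graph = to_node_link(
    entities=[("Supplier A", "Supplier"), ("Product X", "Product")],
    relationships=[("Supplier A", "provides", "Product X")],
)
serialized = json.dumps(graph, indent=2, sort_keys=True)
restored = json.loads(serialized)
print(restored == graph)  # True
```

Round-tripping through `json.loads` is a quick check that the exported structure contains only JSON-serializable values.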


In [34]:
from semantica.export import GraphExporter

exporter = GraphExporter()
exporter.export(kg, format="json", output_path="supply_chain_kg.json")
exporter.export(kg, format="graphml", output_path="supply_chain_kg.graphml")
print("Exported knowledge graph in JSON and GraphML formats")


Exported knowledge graph in JSON and GraphML formats
