[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/supply_chain/01_Supply_Chain_Data_Integration.ipynb)

# Supply Chain Data Integration - Multi-Source Ingestion & KG Construction

## Overview

This notebook demonstrates **supply chain data integration** using Semantica, with a focus on **multi-source ingestion**, **relationship mapping**, and **logistics tracking**. The pipeline ingests logistics and supplier data and builds a supply chain knowledge graph (KG) from it.

### Key Features

- **Multi-Source Ingestion**: Ingests data from logistics RSS feeds and supplier files
- **Relationship Mapping**: Maps supplier relationships and logistics routes
- **Logistics Tracking**: Tracks products, routes, and locations
- **KG Construction**: Builds a supply chain knowledge graph from the extracted entities and relationships
- **Supplier Deduplication**: Detects and resolves duplicate supplier entities across sources

### Pipeline Architecture

1. **Phase 0**: Setup & Configuration
2. **Phase 1**: Multi-Source Data Ingestion (Logistics, Suppliers)
3. **Phase 2**: Entity Extraction (Supplier, Product, Route, Location, Logistics)
4. **Phase 3**: Supply Chain Knowledge Graph Construction
5. **Phase 4**: Relationship Mapping
6. **Phase 5**: Logistics Tracking
7. **Phase 6**: Visualization & Export

---

## Installation


In [None]:
%pip install -qU semantica networkx matplotlib plotly pandas groq


---

## Phase 0: Setup & Configuration


In [None]:
import os
from semantica.core import Semantica, ConfigManager

# Use the key from the environment if it is set; otherwise replace "your-key" with your own
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "your-key")

config_dict = {
    "project_name": "Supply_Chain_Data_Integration",
    "extraction": {"provider": "groq", "model": "llama-3.1-8b-instant"},
    "knowledge_graph": {"backend": "networkx"}
}

config = ConfigManager().load_from_dict(config_dict)
core = Semantica(config=config)
print("Configured for supply chain data integration with multi-source ingestion focus")


---

## Phase 1: Real Data Ingestion (Logistics RSS Feeds)

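Ingest supply chain data from logistics RSS feeds. The `logistics_feeds` list is empty by default, so the cell falls back to a small sample text file until you add feed URLs. The same cell then normalizes the text, builds the knowledge graph, and deduplicates supplier entities.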
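To make the shape of feed data concrete, here is a minimal standard-library sketch of reading items from an RSS 2.0 document. It is not how `FeedIngestor` works internally; the inlined feed, the `parse_rss_items` helper, and the output keys (`title`, `content`, `link`) are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 payload, inlined so the example needs no network access.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Logistics News</title>
    <item>
      <title>Port congestion eases</title>
      <description>Container dwell times fell this week.</description>
      <link>https://example.com/port-congestion</link>
    </item>
    <item>
      <title>New rail corridor opens</title>
      <description>Freight capacity expands between two hubs.</description>
      <link>https://example.com/rail-corridor</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> with its title, description, and link."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "content": (item.findtext("description") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items


rss_items = parse_rss_items(SAMPLE_RSS)
for entry in rss_items:
    print(entry["title"], "->", entry["link"])
```

Each resulting dict plays the role of one ingested document: the `content` field is the text that would go on to normalization and extraction.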


In [None]:
from semantica.ingest import FeedIngestor, FileIngestor
from semantica.normalize import TextNormalizer
from semantica.deduplication import DuplicateDetector
import os

os.makedirs("data", exist_ok=True)

# Ingest from logistics RSS feeds
logistics_feeds = [
    # Add logistics news RSS feeds here
]

documents = []
for feed_url in logistics_feeds:
    try:
        feed_ingestor = FeedIngestor()
        feed_documents = feed_ingestor.ingest(feed_url, method="rss")
        documents.extend(feed_documents)
    except Exception as e:
        print(f"Feed ingestion failed for {feed_url}: {e}")

# Fallback: write and ingest a small sample file when no feed returned documents
if not documents:
    supply_data = """
    Supplier A provides Product X to Warehouse W1 located in City C1.
    Supplier B provides Product Y to Warehouse W2 located in City C2.
    Route R1 connects Warehouse W1 to Distribution Center D1.
    Logistics: Product X shipped via Route R1 from W1 to D1.
    """
    with open("data/supply_chain.txt", "w", encoding="utf-8") as f:
        f.write(supply_data)
    documents = FileIngestor().ingest("data/supply_chain.txt")
    print(f"Ingested {len(documents)} documents from sample data")

# Normalize document text: strip HTML, normalize entity names, collapse whitespace
normalizer = TextNormalizer()
normalized_documents = []
for doc in documents:
    normalized_text = normalizer.normalize(
        doc.content if hasattr(doc, 'content') else str(doc),
        clean_html=True,
        normalize_entities=True,
        remove_extra_whitespace=True
    )
    normalized_documents.append(normalized_text)

print(f"Normalized {len(normalized_documents)} documents")

# Build supply chain knowledge graph
result = core.build_knowledge_base(
    sources=normalized_documents,
    custom_entity_types=["Supplier", "Product", "Route", "Location", "Logistics"],
    graph=True
)

kg = result["knowledge_graph"]
entities = result["entities"]

# Deduplicate suppliers (case-insensitive match on the entity type)
suppliers = [e for e in entities if "supplier" in str(e.get("type", "")).lower()]
detector = DuplicateDetector()
duplicates = detector.detect_duplicates(suppliers, threshold=0.9)
deduplicated_suppliers = detector.resolve_duplicates(suppliers, duplicates)

print(f"Built supply chain KG with {len(kg.get('entities', []))} entities")
print(f"Deduplicated: {len(suppliers)} -> {len(deduplicated_suppliers)} unique suppliers")
print("Focus: Multi-source ingestion, relationship mapping, logistics tracking, KG construction")
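
The deduplication step relies on a similarity threshold (0.9 in the cell above). As a rough illustration of threshold-based matching, the sketch below compares names with `difflib.SequenceMatcher` from the standard library. This is a stand-in technique, not the algorithm `DuplicateDetector` uses; the `similarity` and `dedupe_by_name` helpers, the `name` key, and the company names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two names, in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def dedupe_by_name(records, threshold=0.9):
    """Keep the first record of each group of near-identical names."""
    unique = []
    for record in records:
        name = record.get("name", "")
        # Skip the record if it is close enough to one we already kept.
        if any(similarity(name, kept.get("name", "")) >= threshold for kept in unique):
            continue
        unique.append(record)
    return unique


# Hypothetical supplier entities; the second name contains a typo.
sample_suppliers = [
    {"name": "Acme Logistics", "type": "Supplier"},
    {"name": "Acme Logistcs", "type": "Supplier"},
    {"name": "Borealis Freight", "type": "Supplier"},
]

unique_suppliers = dedupe_by_name(sample_suppliers)
print(f"{len(sample_suppliers)} -> {len(unique_suppliers)} unique suppliers")
```

One caveat with this particular measure: very short names that differ by a single character, such as "Supplier A" and "Supplier B", score exactly 0.9 and would be merged at this threshold. When real names are that short, a stricter threshold or an exact-match rule is the safer choice.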


In [None]:
# Map supplier relationships (predicates mentioning "supplier" or "provides")
supplier_relations = [
    r for r in kg.get("relationships", [])
    if any(term in str(r.get("predicate", "")).lower() for term in ("supplier", "provides"))
]

# Track logistics routes (Route and Logistics entities)
logistics_routes = [
    e for e in kg.get("entities", [])
    if e.get("type") in ("Route", "Logistics")
]

print(f"Relationship mapping: {len(supplier_relations)} supplier relationships mapped")
print(f"Logistics tracking: {len(logistics_routes)} route and logistics entities tracked")
print("This cookbook emphasizes multi-source ingestion and relationship mapping")
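
Counting relationships is a first step; for tracking, it helps to index them so you can ask which products a supplier provides or which sites a route connects. The sketch below assumes each relationship is a dict with `subject`, `predicate`, and `object` keys. Only `predicate` appears in the cell above, so the other two keys, the predicate names, and the `index_by_predicate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical triples mirroring the sample text used in the fallback data.
sample_relationships = [
    {"subject": "Supplier A", "predicate": "provides", "object": "Product X"},
    {"subject": "Supplier B", "predicate": "provides", "object": "Product Y"},
    {"subject": "Warehouse W1", "predicate": "located_in", "object": "City C1"},
    {"subject": "Route R1", "predicate": "connects", "object": "Warehouse W1"},
    {"subject": "Route R1", "predicate": "connects", "object": "Distribution Center D1"},
]


def index_by_predicate(relationships):
    """Group relationships as predicate -> subject -> list of objects."""
    index = defaultdict(lambda: defaultdict(list))
    for rel in relationships:
        index[rel["predicate"]][rel["subject"]].append(rel["object"])
    return index


index = index_by_predicate(sample_relationships)
supplier_products = dict(index["provides"])   # supplier -> products provided
route_endpoints = dict(index["connects"])     # route -> sites it connects

for supplier, products in supplier_products.items():
    print(f"{supplier} provides {', '.join(products)}")
for route, sites in route_endpoints.items():
    print(f"{route} connects {' and '.join(sites)}")
```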


---

## Phase 6: Visualization


In [None]:
from semantica.visualization import KGVisualizer

visualizer = KGVisualizer()
visualizer.visualize(kg, output_path="supply_chain_kg.html")

print("Supply chain data integration complete")
print("Emphasizes: Multi-source ingestion, relationship mapping, logistics tracking")
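
The pipeline outline lists export alongside visualization. A minimal sketch of a JSON export follows, assuming the graph is a dict with `entities` and `relationships` lists (the shape the earlier cells read with `kg.get(...)`). The `export_graph` helper, the sample graph, and the output filename are hypothetical; with the real graph you would pass `kg` instead of `sample_graph`.

```python
import json
import tempfile
from pathlib import Path


def export_graph(graph, path):
    """Write the entity and relationship lists of a dict-like graph to JSON."""
    payload = {
        "entities": list(graph.get("entities", [])),
        "relationships": list(graph.get("relationships", [])),
    }
    out = Path(path)
    out.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return out


# Hypothetical graph standing in for `kg`.
sample_graph = {
    "entities": [
        {"name": "Supplier A", "type": "Supplier"},
        {"name": "Product X", "type": "Product"},
    ],
    "relationships": [
        {"subject": "Supplier A", "predicate": "provides", "object": "Product X"},
    ],
}

# Written to the system temp directory so the example leaves the project untouched.
export_path = Path(tempfile.gettempdir()) / "supply_chain_kg.json"
exported = export_graph(sample_graph, export_path)

# Read the file back to confirm the export round-trips.
reloaded = json.loads(exported.read_text(encoding="utf-8"))
print(f"Exported {len(reloaded['entities'])} entities and "
      f"{len(reloaded['relationships'])} relationships to {exported.name}")
```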
