[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/healthcare/02_Drug_Interactions_Analysis.ipynb)

# Drug Interactions Analysis - Ontology & Reasoning

## Overview

This notebook demonstrates **drug interactions analysis** using Semantica, with a focus on **multi-source correlation**, **safety ontology** generation, and **interaction detection**. The pipeline ingests FDA feeds, medical literature, and drug interaction sources, then detects drug-drug and drug-condition interactions using ontology-based reasoning and temporal pattern detection.

### Key Features

- **Ontology Generation**: Creates safety ontologies for drug interactions
- **Multi-Source Correlation**: Correlates data from FDA databases and medical literature
- **Interaction Detection**: Detects drug-drug and drug-condition interactions
- **Safety Ontology**: Uses domain ontologies for safety analysis
- **Reasoning**: Applies custom rules over the knowledge graph to infer potential interactions
- **Conflict Resolution**: Resolves conflicting interaction reports from multiple sources
- **Temporal Pattern Detection**: Identifies interaction patterns over time

### Learning Objectives

- Understand how to generate safety ontologies from knowledge graphs
- Learn to detect and resolve conflicts in drug interaction data
- Master temporal pattern detection for interaction analysis
- Explore reasoning-based interaction inference
- Practice multi-source correlation and data integration
- Analyze drug network structures and community detection

### Pipeline Flow

```mermaid
graph TD
    A[Data Ingestion] --> B[Document Parsing]
    B --> C[Text Processing]
    C --> D[Entity Extraction]
    D --> E[Relationship Extraction]
    E --> F[Deduplication]
    F --> G[Conflict Detection]
    G --> H[KG Construction]
    H --> I[Embedding Generation]
    I --> J[Vector Store]
    H --> K[Ontology Generation]
    H --> L[Graph Analytics]
    H --> M[Temporal Patterns]
    K --> N[Reasoning]
    M --> N
    J --> O[GraphRAG Queries]
    L --> P[Visualization]
    N --> P
    K --> Q[Export RDF/TTL]
```



In [None]:
%pip install -qU semantica networkx matplotlib plotly pandas faiss-cpu beautifulsoup4 groq sentence-transformers scikit-learn rdflib


---

## Configuration & Setup

Configure API keys and set up constants for the drug interactions analysis pipeline, including ontology base URI.


In [None]:
import os

os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "your-key-here")

# Configuration constants
EMBEDDING_DIMENSION = 384
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200
ONTOLOGY_BASE_URI = "https://drug-safety.example.org/ontology/"
TEMPORAL_GRANULARITY = "day"  # For interaction pattern detection


---

## Data Ingestion

Ingest drug interaction data from FDA RSS feeds and PubMed search feeds. If no feed returns any documents, the cell falls back to a small sample file of drug interaction statements so the rest of the pipeline can still run.


In [None]:
from semantica.ingest import FeedIngestor, WebIngestor, FileIngestor
from contextlib import redirect_stderr
from io import StringIO
import os

os.makedirs("data", exist_ok=True)

documents = []

# Ingest from FDA drug safety RSS feeds
fda_feeds = [
    "https://www.fda.gov/about-fda/contact-fda/stay-informed/rss-feeds/fda-drug-safety-communications",
    "https://www.fda.gov/about-fda/contact-fda/stay-informed/rss-feeds/fda-press-releases"
]

for feed_url in fda_feeds:
    try:
        with redirect_stderr(StringIO()):
            feed_ingestor = FeedIngestor()
            feed_docs = feed_ingestor.ingest(feed_url, method="rss")
            documents.extend(feed_docs)
    except Exception as exc:
        # Report the skipped feed instead of hiding the failure
        print(f"  Skipped {feed_url}: {exc}")

# Ingest from PubMed RSS (drug interactions)
pubmed_feeds = [
    "https://pubmed.ncbi.nlm.nih.gov/rss/search/1?term=drug+interaction&limit=10",
    "https://pubmed.ncbi.nlm.nih.gov/rss/search/1?term=drug+safety&limit=10",
    "https://pubmed.ncbi.nlm.nih.gov/rss/search/1?term=adverse+drug+reaction&limit=10"
]

for feed_url in pubmed_feeds:
    try:
        with redirect_stderr(StringIO()):
            feed_ingestor = FeedIngestor()
            feed_docs = feed_ingestor.ingest(feed_url, method="rss")
            documents.extend(feed_docs)
    except Exception as exc:
        # Report the skipped feed instead of hiding the failure
        print(f"  Skipped {feed_url}: {exc}")

# Example: Web ingestion from DrugBank API (commented - requires API key)
# web_ingestor = WebIngestor()
# drugbank_docs = web_ingestor.ingest("https://go.drugbank.com/releases/latest", method="api")

# Fallback: Sample drug interaction data
if not documents:
    drug_data = """
    Warfarin interacts with Aspirin, increasing bleeding risk. Severity: Major.
    Metformin should not be used with patients having kidney disease. Contraindication: Severe renal impairment.
    Ibuprofen can interact with ACE inhibitors, reducing effectiveness. Severity: Moderate.
    Statins may interact with grapefruit juice, increasing side effects. Mechanism: CYP3A4 inhibition.
    Digoxin interacts with Amiodarone, increasing toxicity risk. Severity: Major.
    Aspirin and Warfarin together increase bleeding risk significantly.
    Metformin contraindicated in patients with creatinine clearance < 30 mL/min.
    """
    with open("data/drug_interactions.txt", "w", encoding="utf-8") as f:
        f.write(drug_data)
    file_ingestor = FileIngestor()
    documents = file_ingestor.ingest("data/drug_interactions.txt")

print(f"Ingested {len(documents)} documents")
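
To make the feed step less opaque, here is a minimal sketch of what reading an RSS feed involves, using only the standard library. It does not reproduce `FeedIngestor`; the `SAMPLE_RSS` string, the `parse_rss_items` helper, and the `example.org` links are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder RSS 2.0 document (invented content, not real FDA or PubMed data)
SAMPLE_RSS = (
    '<rss version="2.0"><channel><title>Sample safety feed</title>'
    "<item><title>Example communication 1</title>"
    "<link>https://example.org/1</link>"
    "<description>Placeholder summary one.</description></item>"
    "<item><title>Example communication 2</title>"
    "<link>https://example.org/2</link>"
    "<description>Placeholder summary two.</description></item>"
    "</channel></rss>"
)


def parse_rss_items(xml_text: str) -> list[dict]:
    """Return one dict per <item> with its title, link, and description."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "content": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


items = parse_rss_items(SAMPLE_RSS)
print(f"Parsed {len(items)} feed items")
```

Each resulting dictionary plays roughly the role of one ingested document: the `content` field is what later parsing and chunking steps would operate on.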


In [None]:
from semantica.parse import DocumentParser
from contextlib import redirect_stderr
from io import StringIO

parser = DocumentParser()

print(f"Parsing {len(documents)} documents...")
parsed_documents = []
for i, doc in enumerate(documents, 1):
    try:
        with redirect_stderr(StringIO()):
            parsed = parser.parse(
                doc.content if hasattr(doc, 'content') else str(doc),
                format="auto"
            )
            parsed_documents.append(parsed)
    except Exception:
        parsed_documents.append(doc.content if hasattr(doc, 'content') else str(doc))
    if i % 50 == 0 or i == len(documents):
        print(f"  Parsed {i}/{len(documents)} documents...")

print(f"Parsed {len(parsed_documents)} documents")


---

## Text Processing

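Normalize drug names and use relation-aware chunking so that each drug interaction triplet (subject, relation, object) stays within a single chunk where possible. A triplet that is split across chunks cannot be recovered by the per-chunk relationship extraction that follows.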


In [None]:
from semantica.normalize import TextNormalizer
from semantica.split import TextSplitter
from contextlib import redirect_stderr
from io import StringIO

normalizer = TextNormalizer()
print(f"Normalizing {len(parsed_documents)} documents...")
normalized_docs = []

for i, doc in enumerate(parsed_documents, 1):
    try:
        with redirect_stderr(StringIO()):
            normalized = normalizer.normalize(
                doc if isinstance(doc, str) else str(doc),
                clean_html=True,
                normalize_entities=True,
                remove_extra_whitespace=True
            )
            normalized_docs.append(normalized)
    except Exception:
        normalized_docs.append(doc if isinstance(doc, str) else str(doc))
    if i % 50 == 0 or i == len(parsed_documents):
        print(f"  Normalized {i}/{len(parsed_documents)} documents...")

# Use relation-aware chunking to preserve drug interaction triplets
relation_splitter = TextSplitter(
    method="relation_aware",
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)

print(f"Chunking {len(normalized_docs)} documents...")
chunked_docs = []
for i, doc_text in enumerate(normalized_docs, 1):
    try:
        with redirect_stderr(StringIO()):
            chunks = relation_splitter.split(doc_text)
            chunked_docs.extend([chunk.content if hasattr(chunk, 'content') else str(chunk) for chunk in chunks])
    except Exception:
        chunked_docs.append(doc_text)
    if i % 50 == 0 or i == len(normalized_docs):
        print(f"  Chunked {i}/{len(normalized_docs)} documents ({len(chunked_docs)} chunks so far)")

print(f"Created {len(chunked_docs)} chunks from {len(normalized_docs)} documents")
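
The effect of `CHUNK_SIZE` and `CHUNK_OVERLAP` can be seen in a plain sliding-window splitter. This is a simplified sketch, not the `relation_aware` method used above: it splits on character counts only, and `chunk_with_overlap` is a hypothetical helper written for this illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into character windows where neighbors share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "Warfarin interacts with Aspirin, increasing bleeding risk. " * 3
chunks = chunk_with_overlap(sample, chunk_size=60, overlap=15)

for chunk in chunks:
    print(repr(chunk))
```

The overlap is what gives a sentence that straddles a window boundary a chance to appear whole in the next chunk. A relation-aware splitter goes further by choosing boundaries that avoid cutting through a relation in the first place.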


---

## Entity Extraction

Extract drug interaction entities including drugs, interactions, conditions, side effects, contraindications, and mechanisms.


In [None]:
from semantica.semantic_extract import NERExtractor
from contextlib import redirect_stderr
from io import StringIO

extractor = NERExtractor(
    provider="groq",
    model="llama-3.1-8b-instant"
)

entity_types = [
    "Drug", "Interaction", "Condition", "SideEffect",
    "Contraindication", "Mechanism"
]

all_entities = []
chunks_to_process = chunked_docs[:10]  # Limit for demo
print(f"Extracting entities from {len(chunks_to_process)} chunks...")
for i, chunk in enumerate(chunks_to_process, 1):
    try:
        with redirect_stderr(StringIO()):
            entities = extractor.extract(
                chunk,
                entity_types=entity_types
            )
            all_entities.extend(entities)
    except Exception:
        pass
    
    if i % 5 == 0 or i == len(chunks_to_process):
        print(f"  Processed {i}/{len(chunks_to_process)} chunks ({len(all_entities)} entities found)")

print(f"Extracted {len(all_entities)} entities")


---

## Relationship Extraction

Extract drug interaction relationships including interacts_with, causes, increases_risk, contraindicated_with, has_mechanism, and affects.


In [None]:
from semantica.semantic_extract import RelationExtractor
from contextlib import redirect_stderr
from io import StringIO

relation_extractor = RelationExtractor(
    provider="groq",
    model="llama-3.1-8b-instant"
)

relation_types = [
    "interacts_with", "causes", "increases_risk",
    "contraindicated_with", "has_mechanism", "affects"
]

all_relationships = []
chunks_to_process = chunked_docs[:10]  # Limit for demo
print(f"Extracting relationships from {len(chunks_to_process)} chunks...")
for i, chunk in enumerate(chunks_to_process, 1):
    try:
        with redirect_stderr(StringIO()):
            relationships = relation_extractor.extract(
                chunk,
                relation_types=relation_types
            )
            all_relationships.extend(relationships)
    except Exception:
        pass
    
    if i % 5 == 0 or i == len(chunks_to_process):
        print(f"  Processed {i}/{len(chunks_to_process)} chunks ({len(all_relationships)} relationships found)")

print(f"Extracted {len(all_relationships)} relationships")


---

## Deduplication

Deduplicate drug entities and interaction records to ensure data consistency.


In [None]:
from semantica.kg import EntityResolver
from semantica.semantic_extract import Entity

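# Convert extracted entities to dictionaries for EntityResolver.
# Entities may be dicts or Entity objects (text/label/confidence attributes),
# so read the fields in a way that works for both.
def entity_to_dict(e):
    if isinstance(e, dict):
        return {
            "name": e.get("name", e.get("text", "")),
            "type": e.get("type", e.get("label", "")),
            "confidence": e.get("confidence", 1.0),
        }
    return {
        "name": getattr(e, "text", ""),
        "type": getattr(e, "label", ""),
        "confidence": getattr(e, "confidence", 1.0),
    }

print(f"Converting {len(all_entities)} entities to dictionaries...")
entity_dicts = [entity_to_dict(e) for e in all_entities]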

# Use semantic strategy for drug names (handles synonyms and variations)
# Semantic matching is essential for drug names which may have multiple representations
entity_resolver = EntityResolver(strategy="semantic", similarity_threshold=0.85)

print(f"Resolving duplicates in {len(entity_dicts)} entities using semantic matching...")
resolved_entities = entity_resolver.resolve_entities(entity_dicts)

# Convert back to Entity objects
print(f"Converting {len(resolved_entities)} resolved entities back to Entity objects...")
merged_entities = [
    Entity(text=e["name"], label=e["type"], confidence=e.get("confidence", 1.0))
    if isinstance(e, dict) else e
    for e in resolved_entities
]

all_entities = merged_entities
print(f"Deduplicated {len(entity_dicts)} entities to {len(merged_entities)} unique entities")
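
The role of `similarity_threshold` can be illustrated with a small threshold-based merge. This sketch uses `difflib.SequenceMatcher` string similarity as a stand-in for semantic (embedding-based) similarity, so it only catches spelling and casing variants; the `deduplicate` helper and the sample entities are assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(entities: list[dict], threshold: float = 0.85) -> list[dict]:
    """Merge entities of the same type whose names are at least `threshold` similar."""
    unique: list[dict] = []
    for entity in entities:
        match = next(
            (
                u for u in unique
                if u["type"] == entity["type"]
                and similarity(u["name"], entity["name"]) >= threshold
            ),
            None,
        )
        if match is None:
            unique.append(dict(entity))  # first time this entity is seen
        else:
            # Keep the first spelling, but remember the best confidence
            match["confidence"] = max(match["confidence"], entity["confidence"])
    return unique


sample_entities = [
    {"name": "Warfarin", "type": "Drug", "confidence": 0.90},
    {"name": "warfarin", "type": "Drug", "confidence": 0.95},
    {"name": "Aspirin", "type": "Drug", "confidence": 0.80},
]
deduped = deduplicate(sample_entities)
print([(e["name"], e["confidence"]) for e in deduped])
```

Raising the threshold makes merging more conservative. A real semantic strategy would also be expected to link brand and generic names, which string similarity cannot do.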


---

## Conflict Detection

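Detect and resolve conflicts in drug interaction data from multiple sources. Different sources can report the same interaction differently, so this step finds those disagreements and resolves them with a voting strategy that keeps the value most sources agree on.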


In [None]:
from semantica.conflicts import ConflictDetector, ConflictResolver
from contextlib import redirect_stderr
from io import StringIO

# Use relationship conflict detection for drug interaction disagreements
# voting strategy aggregates multiple sources for interaction data
conflict_detector = ConflictDetector()
conflict_resolver = ConflictResolver()

try:
    with redirect_stderr(StringIO()):
        print(f"Detecting relationship conflicts in drug interaction data...")
        # Detect relationship conflicts (multiple sources may report different interactions)
        conflicts = conflict_detector.detect_conflicts(
            entities=all_entities,
            relationships=all_relationships,
            method="relationship"  # Focus on relationship conflicts
        )
        
        print(f"Detected {len(conflicts)} relationship conflicts in drug interaction data")
        
        # Resolve conflicts using voting strategy (aggregate multiple sources)
        if conflicts:
            print(f"Resolving conflicts using voting strategy...")
            resolved = conflict_resolver.resolve_conflicts(
                conflicts,
                strategy="voting"  # Majority vote from multiple sources
            )
            print(f"Resolved {len(resolved)} conflicts")
            
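            # Placeholder: the resolved records are not applied back to
            # all_relationships here, so the graph built in the next step
            # still contains the original relationships.
except Exception as exc:
    # Report the failure instead of printing a success message
    print(f"Conflict detection failed: {exc}")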
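
Majority voting itself is simple enough to sketch without the library. The example below is an illustration only: the `reports` records, their field names, and the source labels are invented, and `resolve_by_voting` is a hypothetical helper rather than the `ConflictResolver` implementation.

```python
from collections import Counter, defaultdict

# Invented example reports; the severities are illustrative, not clinical data
reports = [
    {"drugs": ("Warfarin", "Aspirin"), "severity": "Major", "source": "source_a"},
    {"drugs": ("Aspirin", "Warfarin"), "severity": "Major", "source": "source_b"},
    {"drugs": ("Warfarin", "Aspirin"), "severity": "Moderate", "source": "source_c"},
    {"drugs": ("Digoxin", "Amiodarone"), "severity": "Major", "source": "source_a"},
]


def resolve_by_voting(reports: list[dict]) -> dict:
    """Pick the most frequently reported severity for each unordered drug pair."""
    votes: dict = defaultdict(Counter)
    for report in reports:
        pair = tuple(sorted(report["drugs"]))  # (A, B) and (B, A) are the same pair
        votes[pair][report["severity"]] += 1

    resolved = {}
    for pair, counter in votes.items():
        # Ties go to the value that was reported first
        winner, support = counter.most_common(1)[0]
        resolved[pair] = {
            "severity": winner,
            "support": support,
            "had_conflict": len(counter) > 1,
        }
    return resolved


resolved_pairs = resolve_by_voting(reports)
for pair, outcome in resolved_pairs.items():
    print(pair, outcome)
```

Plain voting treats every source as equally reliable. Weighting votes by source credibility is a common refinement when, for example, a regulator's label should outweigh a single case report.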


---

## Knowledge Graph Construction

Build the drug interaction knowledge graph from extracted entities and relationships.


In [None]:
from semantica.kg import GraphBuilder

builder = GraphBuilder()

kg = builder.build(
    entities=all_entities,
    relationships=all_relationships
)

print(f"Built KG with {len(kg.get('entities', []))} entities and {len(kg.get('relationships', []))} relationships")


---

## Embedding Generation & Vector Store

Generate embeddings for drug interaction documents and store them in a vector database for semantic search.


In [None]:
from semantica.embeddings import EmbeddingGenerator
from semantica.vector_store import VectorStore
from contextlib import redirect_stderr
from io import StringIO

embedding_gen = EmbeddingGenerator(
    model_name=EMBEDDING_MODEL,
    dimension=EMBEDDING_DIMENSION
)

# Generate embeddings for chunks. Each chunk is stored together with its
# embedding so that a failed chunk cannot shift the pairing below.
embedded_chunks = []
for chunk in chunked_docs[:20]:  # Limit for demo
    try:
        with redirect_stderr(StringIO()):
            embedding = embedding_gen.generate(chunk)
            embedded_chunks.append((chunk, embedding))
    except Exception:
        pass
embeddings = [embedding for _, embedding in embedded_chunks]

# Create vector store
vector_store = VectorStore(backend="faiss", dimension=EMBEDDING_DIMENSION)

# Add embeddings to vector store
for i, (chunk, embedding) in enumerate(embedded_chunks):
    try:
        vector_store.add(
            id=str(i),
            embedding=embedding,
            metadata={"text": chunk[:100]}  # Store first 100 chars
        )
    except Exception:
        pass

print(f"Generated {len(embeddings)} embeddings and stored in vector database")
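
Semantic search over stored embeddings comes down to ranking vectors by similarity to a query vector. The sketch below shows that idea with cosine similarity in plain Python on three-dimensional toy vectors; it does not use FAISS or the `VectorStore` API, and the `toy_index` values are made up.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], index: dict, top_k: int = 2) -> list[tuple]:
    """Return the top_k (score, chunk_id) pairs, best match first."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id) for chunk_id, vec in index.items()]
    return sorted(scored, reverse=True)[:top_k]


# Toy 3-dimensional "embeddings"; real ones here would have 384 dimensions
toy_index = {
    "chunk-0": [1.0, 0.0, 0.0],
    "chunk-1": [0.7, 0.7, 0.0],
    "chunk-2": [0.0, 0.0, 1.0],
}
hits = search([0.9, 0.1, 0.0], toy_index)
for score, chunk_id in hits:
    print(f"{chunk_id}: {score:.3f}")
```

A vector index exists to avoid this exhaustive comparison on large collections, but the ranking criterion is the same.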


---

## Safety Ontology Generation

Generate a safety ontology from the knowledge graph. The ontology describes the classes and properties found in the graph under `ONTOLOGY_BASE_URI`, which gives the extracted drug interaction data an explicit, shareable schema.


In [None]:
from semantica.ontology import OntologyGenerator
from contextlib import redirect_stderr
from io import StringIO

try:
    with redirect_stderr(StringIO()):
        ontology_gen = OntologyGenerator(base_uri=ONTOLOGY_BASE_URI)
        ontology = ontology_gen.generate_from_graph(kg)
        
        print(f"Generated safety ontology with {len(ontology.get('classes', []))} classes")
        print(f"Ontology includes {len(ontology.get('properties', []))} properties")
except Exception as exc:
    # Report the failure instead of printing a success message
    print(f"Ontology generation failed: {exc}")
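
One way to picture ontology generation is as schema induction: entity types become classes, and relation types become properties whose domain and range are the types they were observed to connect. The sketch below shows that idea on a few typed triples. It is an assumption about the general technique, not a description of `OntologyGenerator`, and `derive_schema` and the tuple layout are invented for this example.

```python
from collections import defaultdict

BASE_URI = "https://drug-safety.example.org/ontology/"

# (subject, subject_type, predicate, object, object_type), taken from the sample text
typed_triples = [
    ("Warfarin", "Drug", "interacts_with", "Aspirin", "Drug"),
    ("Digoxin", "Drug", "interacts_with", "Amiodarone", "Drug"),
    ("Metformin", "Drug", "contraindicated_with", "Kidney disease", "Condition"),
    ("Statins", "Drug", "has_mechanism", "CYP3A4 inhibition", "Mechanism"),
]


def derive_schema(triples):
    """Collect classes plus the observed domain and range of each property."""
    classes = set()
    properties = defaultdict(lambda: {"domain": set(), "range": set()})
    for _, subject_type, predicate, _, object_type in triples:
        classes.update((subject_type, object_type))
        properties[predicate]["domain"].add(subject_type)
        properties[predicate]["range"].add(object_type)
    return classes, dict(properties)


classes, properties = derive_schema(typed_triples)

for cls in sorted(classes):
    print(f"Class    {BASE_URI}{cls}")
for name, spec in sorted(properties.items()):
    print(f"Property {BASE_URI}{name}: {sorted(spec['domain'])} -> {sorted(spec['range'])}")
```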


---

## Analyzing Drug Network Structure

Analyze the drug interaction knowledge graph to identify key drugs, interaction patterns, and communities.


In [None]:
from semantica.kg import GraphAnalyzer, CentralityCalculator, CommunityDetector
from contextlib import redirect_stderr
from io import StringIO

graph_analyzer = GraphAnalyzer(kg)
centrality_calc = CentralityCalculator(kg)
community_detector = CommunityDetector(kg)

try:
    with redirect_stderr(StringIO()):
        # Calculate centrality metrics
        degree_centrality = centrality_calc.degree_centrality()
        betweenness_centrality = centrality_calc.betweenness_centrality()
        
        # Find key drugs (high degree centrality)
        if degree_centrality:
            top_drugs = sorted(degree_centrality.items(), key=lambda x: x[1], reverse=True)[:5]
            print(f"Top 5 drugs by connectivity: {[d[0] for d in top_drugs]}")
        
        # Detect communities in drug interaction network
        communities = community_detector.detect_communities()
        print(f"Detected {len(communities)} communities in drug network")
        
        # Analyze graph structure
        stats = graph_analyzer.get_statistics()
        print(f"Graph statistics: {stats.get('num_nodes', 0)} nodes, {stats.get('num_edges', 0)} edges")
except Exception as exc:
    # Report the failure instead of printing a success message
    print(f"Graph analysis failed: {exc}")
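
Degree centrality, used above to rank drugs by connectivity, is the number of distinct neighbors a node has divided by the number of other nodes in the graph. The following plain-Python sketch computes it on a toy edge list; the edges are illustrative and the `degree_centrality` helper is written for this example rather than taken from `CentralityCalculator`.

```python
from collections import defaultdict

# Toy undirected interaction edges for illustration
edges = [
    ("Warfarin", "Aspirin"),
    ("Warfarin", "Amiodarone"),
    ("Digoxin", "Amiodarone"),
    ("Ibuprofen", "ACE inhibitors"),
]


def degree_centrality(edges: list[tuple]) -> dict:
    """Fraction of the other nodes that each node is directly connected to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


centrality = degree_centrality(edges)
for node, score in sorted(centrality.items(), key=lambda item: item[1], reverse=True):
    print(f"{node}: {score:.2f}")
```

In a drug interaction graph, a high score flags a drug that interacts with many others, which is a natural place to start a safety review.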


---

## Temporal Pattern Detection

Detect temporal patterns in drug interactions over time. This helps identify emerging interaction trends and safety concerns.


In [None]:
from semantica.kg import GraphBuilder, TemporalPatternDetector
from contextlib import redirect_stderr
from io import StringIO

# Build temporal graph for pattern detection
temporal_builder = GraphBuilder(enable_temporal=True, temporal_granularity=TEMPORAL_GRANULARITY)
temporal_kg = temporal_builder.build(entities=all_entities, relationships=all_relationships)

pattern_detector = TemporalPatternDetector(temporal_kg)

try:
    with redirect_stderr(StringIO()):
        # Detect interaction patterns over time
        patterns = pattern_detector.detect_patterns(
            pattern_type="interaction",
            time_granularity=TEMPORAL_GRANULARITY
        )
        
        print(f"Detected {len(patterns)} temporal interaction patterns")
        
        # Analyze evolution of drug interactions
        evolution = pattern_detector.analyze_evolution(
            entity_type="Interaction",
            time_window=None
        )
        print(f"Analyzed interaction evolution over time")
except Exception as exc:
    # Report the failure instead of printing a success message
    print(f"Temporal pattern detection failed: {exc}")
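
The idea behind `TEMPORAL_GRANULARITY` is that timestamped observations are grouped into buckets (here, days) before any trend is measured. The sketch below counts how often each drug pair is reported per bucket. The timestamps are invented, and `bucket_key` and `count_by_period` are hypothetical helpers, not part of `TemporalPatternDetector`.

```python
from collections import Counter
from datetime import datetime

# Invented timestamps for illustration only
timed_reports = [
    ("2024-01-01T09:00:00", "Warfarin", "Aspirin"),
    ("2024-01-01T17:30:00", "Aspirin", "Warfarin"),
    ("2024-01-02T08:15:00", "Digoxin", "Amiodarone"),
    ("2024-01-03T11:00:00", "Warfarin", "Aspirin"),
]

FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}


def bucket_key(timestamp: str, granularity: str) -> str:
    """Truncate an ISO timestamp to the requested granularity."""
    return datetime.fromisoformat(timestamp).strftime(FORMATS[granularity])


def count_by_period(reports: list[tuple], granularity: str = "day") -> Counter:
    """Count reports per (period, unordered drug pair)."""
    counts: Counter = Counter()
    for timestamp, drug_a, drug_b in reports:
        counts[(bucket_key(timestamp, granularity), frozenset((drug_a, drug_b)))] += 1
    return counts


daily = count_by_period(timed_reports, "day")
monthly = count_by_period(timed_reports, "month")
for (period, pair), count in sorted(daily.items(), key=lambda item: item[0][0]):
    print(period, sorted(pair), count)
```

A coarser granularity smooths out noise but hides short bursts, so the choice depends on how quickly a new safety signal needs to be noticed.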


---

## Reasoning and Interaction Detection

Use reasoning with custom rules to infer new drug interactions and detect potential safety concerns.


In [None]:
from semantica.reasoning import Reasoner
from contextlib import redirect_stderr
from io import StringIO

reasoner = Reasoner(kg)

try:
    with redirect_stderr(StringIO()):
        # Add custom rules for drug interaction inference
        rules = [
            "IF Drug A interacts_with Drug B AND Drug B interacts_with Drug C THEN Drug A may_interact_with Drug C",
            "IF Drug has_mechanism CYP3A4_inhibition AND Other_Drug metabolized_by CYP3A4 THEN Drug increases_risk interaction_with Other_Drug",
            "IF Drug contraindicated_with Condition AND Patient has Condition THEN Drug contraindicated_for Patient"
        ]
        
        for rule in rules:
            reasoner.add_rule(rule)
        
        # Infer new facts
        inferred_facts = reasoner.infer_facts()
        print(f"Inferred {len(inferred_facts)} new interaction facts")
        
        # Find interaction patterns
        interaction_patterns = reasoner.find_patterns(pattern_type="interaction")
        print(f"Found {len(interaction_patterns)} interaction patterns")
        
        # Identify drug-drug interactions
        drug_interactions = [r for r in kg.get("relationships", []) 
                           if "interact" in str(r.get("predicate", "")).lower()]
        print(f"Identified {len(drug_interactions)} drug-drug interactions")
except Exception as exc:
    # Report the failure instead of printing a success message
    print(f"Reasoning and interaction detection failed: {exc}")
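
The first rule above (if A interacts with B and B interacts with C, then A may interact with C) can be applied by hand to see what it produces. This sketch runs a single pass of that rule over a few triples. It assumes `interacts_with` is symmetric, the triples are illustrative, and `infer_transitive` is a hypothetical helper rather than the `Reasoner` rule engine.

```python
def infer_transitive(triples, predicate="interacts_with", inferred="may_interact_with"):
    """Apply 'A-B and B-C implies A may interact with C' once over the given triples."""
    neighbors: dict = {}
    for subject, pred, obj in triples:
        if pred == predicate:
            # Assumption: the interaction holds in both directions
            neighbors.setdefault(subject, set()).add(obj)
            neighbors.setdefault(obj, set()).add(subject)

    inferred_facts = set()
    for a, a_neighbors in neighbors.items():
        for b in a_neighbors:
            for c in neighbors[b]:
                # Skip A itself and pairs that are already known to interact
                if c != a and c not in a_neighbors:
                    inferred_facts.add((a, inferred, c))
    return inferred_facts


# Illustrative triples
facts = [
    ("Warfarin", "interacts_with", "Aspirin"),
    ("Aspirin", "interacts_with", "Ibuprofen"),
    ("Metformin", "contraindicated_with", "Kidney disease"),
]
candidates = infer_transitive(facts)
for fact in sorted(candidates):
    print(fact)
```

The output uses `may_interact_with` deliberately: a transitive rule generates candidates for review, and many real interactions are not transitive.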


---

## GraphRAG Queries

Use hybrid retrieval combining vector search and graph traversal to answer complex drug interaction questions.


In [None]:
from semantica.context import AgentContext
from contextlib import redirect_stderr
from io import StringIO

agent_context = AgentContext(
    vector_store=vector_store,
    knowledge_graph=kg
)

queries = [
    "What drugs interact with Warfarin?",
    "What are the contraindications for Metformin?",
    "What mechanisms cause drug interactions?",
    "Which drugs have the highest interaction risk?"
]

for query in queries:
    try:
        with redirect_stderr(StringIO()):
            results = agent_context.query(
                query=query,
                top_k=5
            )
            print(f"Query: {query}")
            print(f"Found {len(results.get('results', []))} relevant results")
    except Exception:
        pass
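
Hybrid retrieval can be pictured as two lookups whose results are combined: a similarity search over text chunks and a lookup of graph facts about the entities named in the question. The sketch below is a deliberately simplified stand-in that uses word overlap instead of embeddings; `hybrid_retrieve`, its return shape, and the sample data are assumptions for illustration and do not reflect how `AgentContext.query` is implemented.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def hybrid_retrieve(query: str, chunks: list[str], triples: list[tuple], top_k: int = 1) -> dict:
    """Combine text hits (word overlap) with graph facts about entities in the query."""
    query_terms = tokenize(query)
    # Text side: rank chunks by how many query words they share
    ranked = sorted(chunks, key=lambda c: len(query_terms & tokenize(c)), reverse=True)
    # Graph side: keep triples whose subject or object is named in the query
    facts = [t for t in triples if t[0].lower() in query_terms or t[2].lower() in query_terms]
    return {"chunks": ranked[:top_k], "facts": facts}


sample_chunks = [
    "Digoxin interacts with Amiodarone, increasing toxicity risk.",
    "Warfarin interacts with Aspirin, increasing bleeding risk.",
]
sample_triples = [
    ("Warfarin", "interacts_with", "Aspirin"),
    ("Digoxin", "interacts_with", "Amiodarone"),
]
answer = hybrid_retrieve("What drugs interact with Warfarin?", sample_chunks, sample_triples)
print(answer["chunks"])
print(answer["facts"])
```

The graph side returns precise, structured facts, while the text side supplies surrounding context such as severity and mechanism.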


---

## Visualization

Visualize the drug interaction knowledge graph to explore relationships, communities, and interaction patterns.


In [None]:
from semantica.visualization import KGVisualizer
from contextlib import redirect_stderr
from io import StringIO

visualizer = KGVisualizer()

try:
    with redirect_stderr(StringIO()):
        visualizer.visualize(
            kg,
            output_path="drug_interactions_kg.html",
            layout="force_directed"
        )
        print("Knowledge graph visualization saved to drug_interactions_kg.html")
except Exception as exc:
    # Report the failure instead of printing a success message
    print(f"Visualization failed: {exc}")


---

## Export

Export the knowledge graph in multiple formats, including JSON, GraphML, and RDF/TTL, for sharing and interoperability.


In [None]:
from semantica.export import GraphExporter
from contextlib import redirect_stderr
from io import StringIO

exporter = GraphExporter()

try:
    with redirect_stderr(StringIO()):
        # Export knowledge graph as JSON
        exporter.export(kg, format="json", output_path="drug_interactions_kg.json")
        
        # Export as GraphML
        exporter.export(kg, format="graphml", output_path="drug_interactions_kg.graphml")
        
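        # Export the graph as RDF/TTL. Note that this serializes `kg`; the
        # `ontology` object generated earlier is not written by this call.
        exporter.export(kg, format="rdf", output_path="drug_safety_ontology.ttl")
        
        print("Exported knowledge graph in JSON, GraphML, and RDF/TTL formats")
except Exception as exc:
    # Report the failure instead of printing a success message
    print(f"Export failed: {exc}")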
