[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/blockchain/01_DeFi_Protocol_Intelligence.ipynb)

# DeFi Protocol Intelligence - Risk Assessment & Ontology Reasoning

## Overview

This notebook demonstrates **DeFi protocol intelligence** with Semantica, focusing on **risk assessment**, **ontology-based reasoning**, and **relationship analysis**. The pipeline ingests DeFi data from multiple sources, extracts protocol entities and relationships, builds a knowledge graph, and assesses risk using graph reasoning.

### Key Features

- **Risk Assessment Focus**: Emphasizes knowledge graph (KG) construction and reasoning for risk evaluation
- **Ontology-Based Reasoning**: Uses domain ontologies for DeFi protocol analysis
- **Relationship Analysis**: Analyzes protocol relationships and dependencies
- **Multiple Data Sources**: Six crypto news RSS feeds, with a local text-file fallback
- **Modular Architecture**: Uses Semantica modules directly, without the core orchestrator

### Learning Objectives

- Ingest DeFi data from multiple sources (RSS feeds, with a local file fallback)
- Extract DeFi entities (Protocols, Tokens, Pools, Transactions, Risks)
- Build and analyze DeFi knowledge graphs
- Generate and utilize DeFi ontologies
- Perform risk assessment using graph reasoning
- Store and query DeFi data using vector stores and graph stores

### Pipeline Flow

```mermaid
graph TD
    A[Data Ingestion] --> B[Document Parsing]
    B --> C[Text Processing]
    C --> D[Entity Extraction]
    D --> E[Relationship Extraction]
    E --> F[Deduplication]
    F --> G[Conflict Detection]
    G --> H[Knowledge Graph]
    H --> I[Embeddings]
    I --> J[Vector Store]
    H --> K[Ontology Generation]
    K --> L[Reasoning & Risk]
    J --> M[GraphRAG Queries]
    L --> M
    H --> N[Graph Store]
    K --> O[Triplet Store]
    M --> P[Visualization]
    N --> P
    O --> P
    P --> Q[Export]
```
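
The diagram above is a dependency graph, so any topological ordering of its nodes is a valid order in which to run the stages. The following is a minimal sketch using only the Python standard library; the `PIPELINE` dictionary and the `execution_order` helper are illustrative names and are not part of Semantica.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on.
# The edges are copied from the pipeline diagram; the names are illustrative only.
PIPELINE = {
    "Document Parsing": {"Data Ingestion"},
    "Text Processing": {"Document Parsing"},
    "Entity Extraction": {"Text Processing"},
    "Relationship Extraction": {"Entity Extraction"},
    "Deduplication": {"Relationship Extraction"},
    "Conflict Detection": {"Deduplication"},
    "Knowledge Graph": {"Conflict Detection"},
    "Embeddings": {"Knowledge Graph"},
    "Vector Store": {"Embeddings"},
    "Ontology Generation": {"Knowledge Graph"},
    "Reasoning & Risk": {"Ontology Generation"},
    "GraphRAG Queries": {"Vector Store", "Reasoning & Risk"},
    "Graph Store": {"Knowledge Graph"},
    "Triplet Store": {"Ontology Generation"},
    "Visualization": {"GraphRAG Queries", "Graph Store", "Triplet Store"},
    "Export": {"Visualization"},
}


def execution_order(pipeline):
    """Return one valid order in which the stages can run."""
    return list(TopologicalSorter(pipeline).static_order())


order = execution_order(PIPELINE)
for step, stage in enumerate(order, 1):
    print(f"{step:2d}. {stage}")
```

The two branches that leave the knowledge graph (embeddings and ontology generation) do not depend on each other, which is why more than one ordering is valid.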

## Installation


In [1]:
%pip install -qU semantica networkx matplotlib plotly pandas faiss-cpu beautifulsoup4 groq sentence-transformers scikit-learn


Note: you may need to restart the kernel to use updated packages.




## Configuration & Setup


In [2]:
import os

# Read the Groq API key from the environment; never commit a real key to a notebook.
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "your-groq-api-key")

# Configuration constants
EMBEDDING_DIMENSION = 384
EMBEDDING_MODEL = "all-MiniLM-L6-v2"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200


## Ingesting DeFi Data from Multiple Sources


In [3]:
from semantica.ingest import FeedIngestor, FileIngestor, WebIngestor
import os
from contextlib import redirect_stderr
from io import StringIO

os.makedirs("data", exist_ok=True)

feed_sources = [
    # Crypto News RSS Feeds
    ("CoinDesk", "https://www.coindesk.com/arc/outboundfeeds/rss/"),
    ("CoinTelegraph", "https://cointelegraph.com/rss"),
    ("Decrypt", "https://decrypt.co/feed"),
    ("The Block", "https://www.theblock.co/rss.xml"),
    ("CryptoSlate", "https://cryptoslate.com/feed/"),
    ("CryptoNews", "https://cryptonews.com/news/feed/"),
]

feed_ingestor = FeedIngestor()
all_documents = []

print(f"Ingesting from {len(feed_sources)} feed sources...")
for i, (feed_name, feed_url) in enumerate(feed_sources, 1):
    try:
        with redirect_stderr(StringIO()):
            feed_data = feed_ingestor.ingest_feed(feed_url, validate=False)
        
        feed_count = 0
        for item in feed_data.items:
            if not item.content:
                item.content = item.description or item.title or ""
            if item.content:
                if not hasattr(item, 'metadata'):
                    item.metadata = {}
                item.metadata['source'] = feed_name
                all_documents.append(item)
                feed_count += 1
        
        if feed_count > 0:
            print(f"  [{i}/{len(feed_sources)}] {feed_name}: {feed_count} documents")
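    except Exception as e:
        # Report the failed feed instead of skipping it silently
        print(f"  [{i}/{len(feed_sources)}] {feed_name}: skipped ({str(e)[:100]})")
        continue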

if not all_documents:
    defi_data = """
    Uniswap is a decentralized exchange protocol with high liquidity pools. It uses automated market makers (AMMs) for token swaps.
    Aave is a lending protocol that offers variable and stable interest rates. Users can deposit assets to earn yield.
    Compound is a money market protocol for lending and borrowing cryptocurrencies. It uses algorithmic interest rates.
    MakerDAO uses collateralized debt positions (CDPs) for stablecoin generation. DAI is the stablecoin created.
    Curve Finance is a decentralized exchange optimized for stablecoin trading with low slippage.
    Yearn Finance aggregates yield farming strategies across multiple DeFi protocols.
    SushiSwap is a decentralized exchange and automated market maker with yield farming features.
    Balancer is a protocol for programmable liquidity and automated portfolio management.
    """
    with open("data/defi_protocols.txt", "w") as f:
        f.write(defi_data)
    file_ingestor = FileIngestor()
    all_documents = file_ingestor.ingest("data/defi_protocols.txt")

documents = all_documents
print(f"Ingested {len(documents)} documents")


Ingesting from 6 feed sources...


Status,Action,Module,Submodule,File,Time
❌,Semantica is parsing,🔍 parse,DocumentParser,p>,0.00s
❌,Semantica is parsing,🔍 parse,DocumentParser,p>,0.01s
❌,Semantica is parsing,🔍 parse,DocumentParser,p>,0.01s
❌,Semantica is parsing,🔍 parse,DocumentParser,p>,0.00s
❌,Semantica is parsing,🔍 parse,DocumentParser,p>,0.00s
❌,Semantica is parsing,🔍 parse,DocumentParser,p>,0.00s
❌,Semantica is parsing,🔍 parse,DocumentParser,p>,0.01s
✅,Semantica is normalizing,🔧 normalize,TextNormalizer,-,0.01s
✅,Semantica is extracting,🎯 semantic_extract,NERExtractor,-,3.68s
🔄,Semantica is extracting,🎯 semantic_extract,RelationExtractor,-,0.00s


  [1/6] CoinDesk: 25 documents
  [2/6] CoinTelegraph: 30 documents
  [3/6] Decrypt: 51 documents
  [4/6] The Block: 19 documents
  [5/6] CryptoSlate: 10 documents
  [6/6] CryptoNews: 20 documents
Ingested 155 documents


In [4]:
from semantica.parse import DocumentParser

parser = DocumentParser()

print(f"Parsing {len(documents)} documents...")
parsed_documents = []
for i, doc in enumerate(documents, 1):
    try:
        parsed = parser.parse(
            doc.content if hasattr(doc, 'content') else str(doc),
            content_type="text"
        )
        parsed_documents.append(parsed)
    except Exception:
        parsed_documents.append(doc)
    if i % 50 == 0 or i == len(documents):
        print(f"  Parsed {i}/{len(documents)} documents...")

documents = parsed_documents


Parsing 155 documents...
  Parsed 50/155 documents...
  Parsed 100/155 documents...
  Parsed 150/155 documents...
  Parsed 155/155 documents...


## Normalizing and Chunking DeFi Documents


In [5]:
from semantica.normalize import TextNormalizer
from semantica.split import TextSplitter

normalizer = TextNormalizer()
splitter = TextSplitter(
    method="entity_aware",
    ner_method="spacy",
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)

print(f"Normalizing {len(documents)} documents...")
normalized_documents = []
for i, doc in enumerate(documents, 1):
    normalized_text = normalizer.normalize(
        doc.content if hasattr(doc, 'content') else str(doc),
        clean_html=True,
        normalize_entities=True,
        remove_extra_whitespace=True,
        lowercase=False
    )
    normalized_documents.append(normalized_text)
    if i % 50 == 0 or i == len(documents):
        print(f"  Normalized {i}/{len(documents)} documents...")

print(f"Chunking {len(normalized_documents)} documents...")
chunked_documents = []
for i, doc_text in enumerate(normalized_documents, 1):
    try:
        with redirect_stderr(StringIO()):
            chunks = splitter.split(doc_text)
        chunked_documents.extend(chunks)
    except Exception:
        simple_splitter = TextSplitter(method="recursive", chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
        chunks = simple_splitter.split(doc_text)
        chunked_documents.extend(chunks)
    if i % 50 == 0 or i == len(normalized_documents):
        print(f"  Chunked {i}/{len(normalized_documents)} documents ({len(chunked_documents)} chunks so far)")

print(f"Created {len(chunked_documents)} chunks from {len(normalized_documents)} documents")


Normalizing 155 documents...
  Normalized 50/155 documents...
  Normalized 100/155 documents...
  Normalized 150/155 documents...
  Normalized 155/155 documents...
Chunking 155 documents...
  Chunked 50/155 documents (50 chunks so far)
  Chunked 100/155 documents (101 chunks so far)
  Chunked 150/155 documents (151 chunks so far)
  Chunked 155/155 documents (156 chunks so far)
Created 156 chunks from 155 documents
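
Only one document produced a second chunk here, because most feed items are shorter than `CHUNK_SIZE`. For longer texts, the overlap setting means consecutive chunks share some content, so an entity mentioned near a boundary appears in both. The sketch below shows the basic sliding-window idea under the assumption that sizes are measured in characters; `chunk_with_overlap` is a hypothetical helper, and unlike the `entity_aware` splitter used above, it makes no attempt to avoid cutting through an entity.

```python
def chunk_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into character windows; consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 2,500-character dummy text yields windows starting at 0, 800 and 1600.
long_text = "".join(str(i % 10) for i in range(2500))
long_chunks = chunk_with_overlap(long_text)
print([len(c) for c in long_chunks])

# A short text, like most feed items in this run, stays in a single chunk.
short_chunks = chunk_with_overlap("Aave is a lending protocol.")
print(len(short_chunks))
```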


## Extracting DeFi Entities


In [6]:
from semantica.semantic_extract import NERExtractor

entity_extractor = NERExtractor(
    method="llm",
    provider="groq",
    llm_model="llama-3.1-8b-instant",
    temperature=0.0
)

all_entities = []
error_count = 0
print(f"Extracting entities from {len(chunked_documents)} chunks...")
for i, chunk in enumerate(chunked_documents, 1):
    chunk_text = chunk.text if hasattr(chunk, 'text') else str(chunk)
    try:
        entities = entity_extractor.extract_entities(
            chunk_text,
            entity_types=["Protocol", "Token", "Pool", "Transaction", "Risk"]
        )
        all_entities.extend(entities)
    except Exception as e:
        error_count += 1
        # Print first few errors for debugging
        if error_count <= 3:
            print(f"  Warning: Error processing chunk {i}: {str(e)[:100]}")
        continue
    
    if i % 20 == 0 or i == len(chunked_documents):
        print(f"  Processed {i}/{len(chunked_documents)} chunks ({len(all_entities)} entities found)")

if error_count > 0:
    print(f"  Note: {error_count} chunks had errors during extraction")

# Case-insensitive substring match, so label variants such as "DeFi Protocol" are included
protocols = [e for e in all_entities if "protocol" in e.label.lower()]
tokens = [e for e in all_entities if "token" in e.label.lower()]
risks = [e for e in all_entities if "risk" in e.label.lower()]

print(f"Extracted {len(protocols)} protocols, {len(tokens)} tokens, {len(risks)} risks")


Extracting entities from 156 chunks...
  Processed 20/156 chunks (32 entities found)
  Processed 40/156 chunks (100 entities found)
  Processed 60/156 chunks (386 entities found)
  Processed 80/156 chunks (410 entities found)
  Processed 100/156 chunks (556 entities found)
  Processed 120/156 chunks (647 entities found)
  Processed 140/156 chunks (717 entities found)
  Processed 156/156 chunks (1010 entities found)
Extracted 113 protocols, 792 tokens, 32 risks
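
The three filtered lists account for 937 of the 1,010 extracted entities, so the remainder presumably carry other labels such as Pool or Transaction, or labels the model produced on its own. A quick tally of labels makes this visible. The sketch below uses hypothetical stand-in objects with `text` and `label` attributes, mirroring how the cell above accesses the extracted entities; `label_counts` is an illustrative helper, not a Semantica function.

```python
from collections import Counter
from types import SimpleNamespace


def label_counts(entities):
    """Count entities per label, ignoring case and surrounding whitespace."""
    return Counter(e.label.strip().lower() for e in entities)


# Hypothetical sample data; in the notebook you would pass all_entities instead.
sample_entities = [
    SimpleNamespace(text="Uniswap", label="Protocol"),
    SimpleNamespace(text="Aave", label="protocol"),
    SimpleNamespace(text="DAI", label="Token"),
    SimpleNamespace(text="ETH/USDC", label="Pool"),
    SimpleNamespace(text="smart contract exploit", label="Risk"),
]

counts = label_counts(sample_entities)
for label, n in counts.most_common():
    print(f"{label}: {n}")
```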


## Extracting DeFi Relationships


In [None]:
from semantica.semantic_extract import RelationExtractor

relation_extractor = RelationExtractor(
    method="llm",
    provider="groq",
    llm_model="llama-3.1-8b-instant",
    temperature=0.0,
    verbose=True
)

all_relationships = []
error_count = 0
print(f"Extracting relationships from {len(chunked_documents)} chunks...")

for i, chunk in enumerate(chunked_documents, 1):
    chunk_text = chunk.text if hasattr(chunk, 'text') else str(chunk)
    try:
        relationships = relation_extractor.extract_relations(
            chunk_text,
            entities=all_entities,
            relation_types=["uses", "governs", "provides", "has_risk", "interacts_with", "depends_on"],
            verbose=True
        )
        all_relationships.extend(relationships)
    except Exception as e:
        error_count += 1
        # Print first few errors for debugging
        if error_count <= 3:
            print(f"  Warning: Error processing chunk {i}: {str(e)[:100]}")
        continue
    
    if i % 20 == 0 or i == len(chunked_documents):
        print(f"  Processed {i}/{len(chunked_documents)} chunks ({len(all_relationships)} relationships found)")

if error_count > 0:
    print(f"  Note: {error_count} chunks had errors during relation extraction")

print(f"Extracted {len(all_relationships)} relationships")


Extracting relationships from 156 chunks...


## Resolving Duplicate Entities
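
The later cells expect a `merged_entities` list, but this section has no code cell. The following is a minimal stand-in written in plain Python, not Semantica's own deduplication module: it treats two entities as duplicates when their whitespace-normalized, lowercased text and their label match, and keeps the one with the highest confidence. The helper name `merge_duplicate_entities` and the matching rule are assumptions made for illustration.

```python
from types import SimpleNamespace


def _confidence(entity):
    """Return the entity's confidence, treating a missing or None value as 0.0."""
    return getattr(entity, "confidence", None) or 0.0


def merge_duplicate_entities(entities):
    """Keep one entity per (normalized text, label), preferring the highest confidence."""
    best = {}
    for entity in entities:
        key = (" ".join(entity.text.lower().split()), entity.label.lower())
        current = best.get(key)
        if current is None or _confidence(entity) > _confidence(current):
            best[key] = entity
    return list(best.values())


# Hypothetical sample data with one duplicate mention of Uniswap.
sample = [
    SimpleNamespace(text="Uniswap", label="Protocol", confidence=0.7),
    SimpleNamespace(text="uniswap ", label="Protocol", confidence=0.9),
    SimpleNamespace(text="Aave", label="Protocol", confidence=0.8),
]
deduplicated = merge_duplicate_entities(sample)
print(f"Merged {len(sample)} entities into {len(deduplicated)}")
```

In the notebook, `merged_entities = merge_duplicate_entities(all_entities)` would provide the variable used by the conflict detection and graph building cells. Exact matching misses spelling variants such as "Maker" and "MakerDAO", which a fuzzy strategy would be needed to catch.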


## Detecting and Resolving Conflicts


In [None]:
from semantica.conflicts import ConflictDetector, ConflictResolver

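# Detect conflicting relationships between DeFi protocols; the "voting" strategy
# used below resolves each conflict by majority vote across sources.
# merged_entities should come from the "Resolving Duplicate Entities" step;
# fall back to the raw entity list if that step was skipped.
merged_entities = globals().get("merged_entities", all_entities)

conflict_detector = ConflictDetector()
conflict_resolver = ConflictResolver()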

print(f"Detecting relationship conflicts in {len(merged_entities)} entities...")
conflicts = conflict_detector.detect_conflicts(
    entities=merged_entities,
    relationships=all_relationships,
    method="relationship"  # Detect conflicts in relationships
)

print(f"Detected {len(conflicts)} relationship conflicts")

if conflicts:
    print("Resolving conflicts using voting strategy...")
    resolved = conflict_resolver.resolve_conflicts(
        conflicts,
        strategy="voting"  # Majority vote from multiple sources
    )
    print(f"Resolved {len(resolved)} conflicts")
else:
    print("No conflicts detected")
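
To make the voting strategy concrete, the sketch below resolves disagreements by majority vote in plain Python. It assumes that a conflict means different sources assert different relation types for the same pair of entities; that definition, the tuple layout, and the helper names are illustrative and may differ from what `ConflictDetector` and `ConflictResolver` do internally.

```python
from collections import Counter, defaultdict


def tally_votes(claims):
    """Group (source, target, relation) claims by entity pair and count each relation."""
    votes = defaultdict(Counter)
    for source, target, relation in claims:
        votes[(source, target)][relation] += 1
    return votes


def find_conflicts(votes):
    """Return the entity pairs for which more than one relation type was claimed."""
    return [pair for pair, counter in votes.items() if len(counter) > 1]


def resolve_by_voting(votes):
    """Pick the most frequently claimed relation for every entity pair."""
    return {pair: counter.most_common(1)[0][0] for pair, counter in votes.items()}


# Hypothetical claims extracted from three different articles.
claims = [
    ("Yearn Finance", "Curve Finance", "uses"),
    ("Yearn Finance", "Curve Finance", "uses"),
    ("Yearn Finance", "Curve Finance", "depends_on"),
    ("MakerDAO", "DAI", "provides"),
]

votes = tally_votes(claims)
conflicting_pairs = find_conflicts(votes)
winners = resolve_by_voting(votes)
print(f"Conflicting pairs: {conflicting_pairs}")
print(f"Resolved: {winners}")
```

Majority voting works best when many sources cover the same fact; with only one or two mentions per pair, a confidence-weighted strategy may be more informative.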


## Building DeFi Knowledge Graph


In [None]:
from semantica.kg import GraphBuilder

graph_builder = GraphBuilder(
    merge_entities=True,
    resolve_conflicts=True,
    entity_resolution_strategy="fuzzy"
)

print("Building knowledge graph...")
kg_sources = [{
    "entities": [{"text": e.text, "type": e.label, "confidence": e.confidence} for e in merged_entities],
    "relationships": [{"source": r.source, "target": r.target, "type": r.label, "confidence": r.confidence} for r in all_relationships]
}]

kg = graph_builder.build(kg_sources)

entities_count = len(kg.get('entities', []))
relationships_count = len(kg.get('relationships', []))
print(f"Graph: {entities_count} entities, {relationships_count} relationships")


## Generating Embeddings for Protocols and Tokens


In [None]:
from semantica.embeddings import EmbeddingGenerator

embedding_gen = EmbeddingGenerator(
    provider="sentence_transformers",
    model=EMBEDDING_MODEL
)

print(f"Generating embeddings for {len(protocols)} protocols and {len(tokens)} tokens...")
protocol_texts = [p.text for p in protocols]
protocol_embeddings = embedding_gen.generate_embeddings(protocol_texts)

token_texts = [t.text for t in tokens]
token_embeddings = embedding_gen.generate_embeddings(token_texts)

print(f"Generated {len(protocol_embeddings)} protocol embeddings and {len(token_embeddings)} token embeddings")


## Populating Vector Store


In [None]:
from semantica.vector_store import VectorStore

vector_store = VectorStore(backend="faiss", dimension=EMBEDDING_DIMENSION)

print(f"Storing {len(protocol_embeddings)} protocol vectors and {len(token_embeddings)} token vectors...")
protocol_ids = vector_store.store_vectors(
    vectors=protocol_embeddings,
    metadata=[{"type": "protocol", "name": p.text, "label": p.label} for p in protocols]
)

token_ids = vector_store.store_vectors(
    vectors=token_embeddings,
    metadata=[{"type": "token", "name": t.text, "label": t.label} for t in tokens]
)

print(f"Stored {len(protocol_ids)} protocol vectors and {len(token_ids)} token vectors")
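
The vector dimension passed to the store has to match the embedding model's output size, which is 384 for `all-MiniLM-L6-v2`. Conceptually, a similarity query then ranks the stored vectors by how close they are to the query vector. The sketch below does this by brute force with cosine similarity on toy 3-dimensional vectors; it is only an illustration of the idea, the `search` helper is hypothetical, and the metric used by the FAISS backend in Semantica is not confirmed here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector, vectors, metadata, k=3):
    """Return the k (score, metadata) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vector, v), m) for v, m in zip(vectors, metadata)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors standing in for real 384-dimensional embeddings.
vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
metadata = [
    {"type": "protocol", "name": "Uniswap"},
    {"type": "protocol", "name": "SushiSwap"},
    {"type": "protocol", "name": "Aave"},
]

results = search([1.0, 0.0, 0.0], vectors, metadata, k=3)
for score, meta in results:
    print(f"{meta['name']}: {score:.3f}")
```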


## Generating DeFi Ontology


In [None]:
from semantica.ontology import OntologyGenerator

ontology_gen = OntologyGenerator(base_uri="https://defi.example.org/ontology/")
ontology = ontology_gen.generate_from_graph(kg)

print(f"Generated DeFi ontology with {len(ontology.get('classes', []))} classes")


## Reasoning and Risk Assessment


In [None]:
from semantica.reasoning import Reasoner

reasoner = Reasoner()

reasoner.add_rule("IF Protocol has_risk Risk AND Risk severity high THEN Protocol risk_level critical")
reasoner.add_rule("IF Protocol depends_on Protocol AND Protocol has_risk Risk THEN Protocol inherits Risk")

inferred_facts = reasoner.infer_facts(kg)

risk_paths = reasoner.find_paths(
    kg,
    source_type="Protocol",
    target_type="Risk",
    max_hops=2
)

print(f"Inferred {len(inferred_facts)} facts")
print(f"Found {len(risk_paths)} risk paths")
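
The two rules above can be read as forward chaining over (subject, relation, object) facts: keep applying the rules until no new fact appears. The sketch below implements exactly these two rules in plain Python to show what kind of facts they produce. It does not reproduce the `Reasoner` internals, and the sample facts are hypothetical, not real risk data.

```python
def infer(facts):
    """Apply both rules repeatedly and return only the newly inferred facts."""
    known = set(facts)
    while True:
        new = set()
        for subject, relation, obj in known:
            # Rule 1: a protocol with a high-severity risk gets risk_level critical.
            if relation == "has_risk" and (obj, "severity", "high") in known:
                new.add((subject, "risk_level", "critical"))
            # Rule 2: a protocol inherits the risks of the protocols it depends on.
            if relation == "depends_on":
                for other, other_relation, risk in known:
                    if other == obj and other_relation == "has_risk":
                        new.add((subject, "inherits", risk))
        new -= known
        if not new:
            return known - set(facts)
        known |= new


# Hypothetical facts for illustration only.
facts = {
    ("Yearn Finance", "depends_on", "Curve Finance"),
    ("Curve Finance", "has_risk", "pool exploit"),
    ("pool exploit", "severity", "high"),
}

inferred = infer(facts)
for fact in sorted(inferred):
    print(fact)
```

As written, the second rule produces `inherits` facts rather than new `has_risk` facts, so inherited risks do not propagate further along a chain of dependencies and do not trigger the first rule.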


## Storing Knowledge Graph (Optional)


In [None]:
from semantica.graph_store import GraphStore

# Optional: Store to persistent graph database
# graph_store = GraphStore(backend="neo4j", uri="bolt://localhost:7687", user="neo4j", password="password")
# graph_store.store_graph(kg)

print("Graph store step skipped (storage code is commented out for this demo)")


## Storing Ontology as RDF Triplets (Optional)


In [None]:
from semantica.triplet_store import TripletStore

# Optional: Store ontology as RDF triplets
# triplet_store = TripletStore(backend="blazegraph", endpoint="http://localhost:9999/blazegraph")
# triplet_store.add_triplets_from_ontology(ontology)

print("Triplet store step skipped (storage code is commented out for this demo)")


## GraphRAG: Hybrid Vector + Graph Queries


In [None]:
from semantica.context import AgentContext

context = AgentContext(vector_store=vector_store, knowledge_graph=kg)

query = "What protocols have high risk?"
results = context.retrieve(
    query,
    max_results=10,
    use_graph=True,
    expand_graph=True,
    include_entities=True,
    include_relationships=True
)

print(f"GraphRAG query: '{query}'")
print(f"\nRetrieved {len(results)} results:\n")
for i, result in enumerate(results[:5], 1):
    print(f"{i}. Score: {result.get('score', 0):.3f}")
    print(f"   Content: {result.get('content', '')[:200]}...")
    if result.get('related_entities'):
        print(f"   Related entities: {len(result['related_entities'])}")
    print()


## Visualizing the DeFi Knowledge Graph


In [None]:
from semantica.visualization import KGVisualizer

visualizer = KGVisualizer()
visualizer.visualize(
    kg,
    output_path="defi_protocol_kg.html",
    layout="spring",
    node_size=20
)

print("Visualization saved to defi_protocol_kg.html")


## Exporting Results


In [None]:
from semantica.export import GraphExporter

exporter = GraphExporter()
exporter.export(kg, output_path="defi_protocol_kg.json", format="json")
exporter.export(kg, output_path="defi_protocol_kg.graphml", format="graphml")
exporter.export(ontology, output_path="defi_ontology.ttl", format="rdf")

print("Exported knowledge graph to JSON and GraphML formats")
print("Exported ontology to RDF/TTL format")
