[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/blockchain/02_Transaction_Network_Analysis.ipynb)

# Transaction Network Analysis - Pattern Detection & Graph Analytics

## Overview

This notebook demonstrates **blockchain transaction network analysis** using Semantica, with a focus on **pattern detection**, **network analytics**, and **real-time processing**. The pipeline builds a transaction network from ingested documents, then uses it to detect patterns, identify whale movements, and analyze token flows.

### Key Features

- **Pattern Detection**: Emphasizes graph analytics for transaction pattern recognition
- **Network Analytics**: Uses centrality measures and community detection
- **Temporal Analysis**: Time-aware queries and transaction evolution tracking
- **Whale Tracking**: Identifies large transaction movements
- **Flow Analysis**: Analyzes token flows through the network
- **Multiple Data Sources**: Crypto news RSS feeds and block-explorer web pages, with a sample-data fallback

### Learning Objectives

- Ingest blockchain transaction data from multiple sources
- Extract transaction entities (Transactions, Wallets, Addresses, Blocks, Flows)
- Build temporal transaction network graphs
- Perform graph analytics (centrality, communities, connectivity)
- Detect patterns and whale movements
- Analyze token flows and transaction paths
- Store and query transaction data using vector stores and graph stores

### Pipeline Flow

```mermaid
graph TD
    A[Data Ingestion] --> B[Document Parsing]
    B --> C[Text Processing]
    C --> D[Entity Extraction]
    D --> E[Relationship Extraction]
    E --> F[Deduplication]
    F --> G[Conflict Detection]
    G --> H[Transaction Network Graph]
    H --> I[Embeddings]
    I --> J[Vector Store]
    H --> K[Graph Analytics]
    K --> L[Temporal Queries]
    L --> M[Pattern Detection]
    M --> N[Flow Analysis]
    J --> O[GraphRAG Queries]
    H --> P[Graph Store]
    O --> Q[Visualization]
    P --> Q
    Q --> R[Export]
```

## Installation


In [1]:
%pip install -qU semantica networkx matplotlib plotly pandas faiss-cpu beautifulsoup4 groq sentence-transformers scikit-learn


Note: you may need to restart the kernel to use updated packages.




## Configuration & Setup


In [2]:
import os

# Read the Groq API key from the environment; never commit a real key to a notebook.
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "your-groq-api-key")

# Configuration constants
EMBEDDING_DIMENSION = 384
EMBEDDING_MODEL = "all-MiniLM-L6-v2"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200
TEMPORAL_GRANULARITY = "day"


## Ingesting Blockchain Transaction Data


In [3]:
from semantica.ingest import WebIngestor, FileIngestor, FeedIngestor
import os
from contextlib import redirect_stderr
from io import StringIO

os.makedirs("data", exist_ok=True)

# Blockchain and crypto news RSS feeds
feed_sources = [
    ("CoinDesk", "https://www.coindesk.com/arc/outboundfeeds/rss/"),
    ("CoinTelegraph", "https://cointelegraph.com/rss"),
    ("Decrypt", "https://decrypt.co/feed"),
    ("The Block", "https://www.theblock.co/rss.xml"),
    ("CryptoSlate", "https://cryptoslate.com/feed/"),
    ("CryptoNews", "https://cryptonews.com/news/feed/"),
    ("Bitcoin Magazine", "https://bitcoinmagazine.com/.rss/full/"),
    ("Ethereum News", "https://ethereum.org/en/feed.xml"),
]

# Blockchain data and analytics web sources
web_sources = [
    ("Blockchain.com Stats", "https://www.blockchain.com/explorer"),
    ("Etherscan", "https://etherscan.io/"),
    ("Bitcoin Explorer", "https://blockstream.info/"),
]

# Initialize ingestors
feed_ingestor = FeedIngestor()
web_ingestor = WebIngestor()
all_documents = []

# Ingest from RSS feeds
print(f"Ingesting from {len(feed_sources)} RSS feed sources...")
for i, (feed_name, feed_url) in enumerate(feed_sources, 1):
    try:
        with redirect_stderr(StringIO()):
            feed_data = feed_ingestor.ingest_feed(feed_url, validate=False)
        
        feed_count = 0
        for item in feed_data.items:
            if not item.content:
                item.content = item.description or item.title or ""
            if item.content:
                if not hasattr(item, 'metadata'):
                    item.metadata = {}
                item.metadata['source'] = feed_name
                item.metadata['type'] = 'feed'
                all_documents.append(item)
                feed_count += 1
        
        if feed_count > 0:
            print(f"  [{i}/{len(feed_sources)}] {feed_name}: {feed_count} documents")
    except Exception as e:
        if i <= 3:  # Only report failures for the first three feeds
            print(f"  Warning: {feed_name} failed: {str(e)[:50]}")
        continue

# Ingest from web sources (transaction-related pages)
print(f"\nIngesting from {len(web_sources)} web sources...")
for i, (web_name, web_url) in enumerate(web_sources, 1):
    try:
        with redirect_stderr(StringIO()):
            web_documents = web_ingestor.ingest(web_url, method="url")
        
        web_count = 0
        for doc in web_documents:
            if not hasattr(doc, 'metadata'):
                doc.metadata = {}
            doc.metadata['source'] = web_name
            doc.metadata['type'] = 'web'
            all_documents.append(doc)
            web_count += 1
        
        if web_count > 0:
            print(f"  [{i}/{len(web_sources)}] {web_name}: {web_count} documents")
    except Exception as e:
        if i <= 2:  # Only report failures for the first two web sources
            print(f"  Warning: {web_name} failed: {str(e)[:50]}")
        continue

# Fallback to sample transaction data if no documents ingested
if not all_documents:
    print("\n⚠️ No documents ingested from feeds/web sources. Using sample transaction data...")
    tx_data = """
    Transaction 0x123 transfers 1000 ETH from wallet 0xABC to wallet 0xDEF at block 18500000.
    Transaction 0x456 transfers 500 BTC from wallet 0xGHI to wallet 0xJKL at block 18500001.
    Large transaction 0x789 moves 10000 ETH (whale movement) from wallet 0xMNO to wallet 0xPQR at block 18500002.
    Transaction 0xabc transfers 200 USDT from wallet 0xSTU to wallet 0xVWX at block 18500003.
    Transaction 0xdef transfers 5000 ETH from wallet 0xYZA to wallet 0xBCD at block 18500004.
    Transaction 0x111 transfers 3000 DAI from wallet 0xEFG to wallet 0xHIJ at block 18500005.
    Transaction 0x222 transfers 1500 USDC from wallet 0xKLM to wallet 0xNOP at block 18500006.
    Transaction 0x333 transfers 2500 LINK from wallet 0xQRS to wallet 0xTUV at block 18500007.
    Transaction 0x444 transfers 8000 MATIC from wallet 0xWXY to wallet 0xZAB at block 18500008.
    Transaction 0x555 transfers 12000 UNI from wallet 0xCDE to wallet 0xFGH at block 18500009.
    """
    with open("data/transactions.txt", "w") as f:
        f.write(tx_data)
    file_ingestor = FileIngestor()
    all_documents = file_ingestor.ingest("data/transactions.txt")
    for doc in all_documents:
        if not hasattr(doc, 'metadata'):
            doc.metadata = {}
        doc.metadata['source'] = 'Sample Data'
        doc.metadata['type'] = 'sample'

documents = all_documents

# Count unique sources properly handling different document types
unique_sources = set()
for d in documents:
    if hasattr(d, 'metadata') and d.metadata:
        source = d.metadata.get('source', 'Unknown')
        unique_sources.add(source)
    elif isinstance(d, dict) and 'metadata' in d:
        source = d['metadata'].get('source', 'Unknown')
        unique_sources.add(source)

print(f"\n✅ Total ingested: {len(documents)} documents")
print(f"   Sources: {len(unique_sources)} unique sources")
if unique_sources:
    print(f"   Source list: {', '.join(sorted(unique_sources))}")


Ingesting from 8 RSS feed sources...



  [1/8] CoinDesk: 25 documents
  [2/8] CoinTelegraph: 30 documents
  [3/8] Decrypt: 52 documents
  [4/8] The Block: 19 documents
  [5/8] CryptoSlate: 10 documents
  [6/8] CryptoNews: 20 documents

Ingesting from 3 web sources...

✅ Total ingested: 156 documents
   Sources: 6 unique sources
   Source list: CoinDesk, CoinTelegraph, CryptoNews, CryptoSlate, Decrypt, The Block
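
The feed loop above falls back from `content` to `description` to `title` before keeping an item. As a minimal sketch of that idea using only the standard library (the XML snippet and the helper name `items_from_rss` are illustrative assumptions, not part of Semantica's `FeedIngestor`):

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item RSS document, used only for illustration.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <title>Example Feed</title>
  <item><title>Whale moves 10000 ETH</title><description>A large transfer was observed.</description></item>
  <item><title>Title only</title></item>
</channel></rss>"""


def items_from_rss(xml_text, source_name):
    """Return one dict per <item>, using the description, then the title, as content."""
    root = ET.fromstring(xml_text)
    docs = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        description = (item.findtext("description") or "").strip()
        content = description or title
        if not content:
            continue  # nothing usable in this item
        docs.append({
            "content": content,
            "metadata": {"source": source_name, "type": "feed"},
        })
    return docs


docs = items_from_rss(SAMPLE_RSS, "Example Feed")
for doc in docs:
    print(doc["metadata"]["source"], "->", doc["content"])
```

Tagging every document with `source` and `type` at ingestion time is what makes the per-source summary at the end of the cell possible.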


## Parsing Transaction Documents


In [4]:
from semantica.parse import DocumentParser

parser = DocumentParser()

print(f"Parsing {len(documents)} documents...")
parsed_documents = []
for i, doc in enumerate(documents, 1):
    try:
        parsed = parser.parse(
            doc.content if hasattr(doc, 'content') else str(doc),
            content_type="text"
        )
        parsed_documents.append(parsed)
    except Exception:
        parsed_documents.append(doc)
    if i % 50 == 0 or i == len(documents):
        print(f"  Parsed {i}/{len(documents)} documents...")

documents = parsed_documents


Parsing 156 documents...
  Parsed 50/156 documents...
  Parsed 100/156 documents...
  Parsed 150/156 documents...
  Parsed 156/156 documents...


## Normalizing and Chunking Transaction Data


In [5]:
from semantica.normalize import TextNormalizer
from semantica.split import TextSplitter

normalizer = TextNormalizer()
splitter = TextSplitter(
    method="entity_aware",
    ner_method="spacy",
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)

print(f"Normalizing {len(documents)} documents...")
normalized_documents = []
for i, doc in enumerate(documents, 1):
    normalized_text = normalizer.normalize(
        doc.content if hasattr(doc, 'content') else str(doc),
        clean_html=True,
        normalize_entities=True,
        remove_extra_whitespace=True,
        lowercase=False
    )
    normalized_documents.append(normalized_text)
    if i % 50 == 0 or i == len(documents):
        print(f"  Normalized {i}/{len(documents)} documents...")

print(f"Chunking {len(normalized_documents)} documents...")
chunked_documents = []
for i, doc_text in enumerate(normalized_documents, 1):
    try:
        with redirect_stderr(StringIO()):
            chunks = splitter.split(doc_text)
        chunked_documents.extend(chunks)
    except Exception:
        simple_splitter = TextSplitter(method="recursive", chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
        chunks = simple_splitter.split(doc_text)
        chunked_documents.extend(chunks)
    if i % 50 == 0 or i == len(normalized_documents):
        print(f"  Chunked {i}/{len(normalized_documents)} documents ({len(chunked_documents)} chunks so far)")

print(f"Created {len(chunked_documents)} chunks from {len(normalized_documents)} documents")


Normalizing 156 documents...
  Normalized 50/156 documents...
  Normalized 100/156 documents...
  Normalized 150/156 documents...
  Normalized 156/156 documents...
Chunking 156 documents...
  Chunked 50/156 documents (50 chunks so far)
  Chunked 100/156 documents (101 chunks so far)
  Chunked 150/156 documents (151 chunks so far)
  Chunked 156/156 documents (157 chunks so far)
Created 157 chunks from 156 documents
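
To make the roles of `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete, here is a minimal character-based sliding-window splitter. It is a simplified stand-in, not the `entity_aware` or `recursive` method used above, and `chunk_text` is a hypothetical helper name.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


long_text = "".join(str(i % 10) for i in range(2500))
long_chunks = chunk_text(long_text)
print(len(long_chunks), [len(c) for c in long_chunks])
print(len(chunk_text("A short feed item.")))
```

A text shorter than `chunk_size` yields a single chunk under this scheme, which would be consistent with the near one-to-one ratio of documents to chunks in the output above if most feed items are short.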


## Extracting Transaction Entities


In [6]:
from semantica.semantic_extract import NERExtractor

# Initialize NERExtractor with ML method only (spaCy)
# Note: ML method extracts standard NER labels (PERSON, ORG, GPE, MONEY, etc.)
entity_extractor = NERExtractor(
    method=["ml"],
    min_confidence=0.5
)

# Extract all entities using Semantica's extract() method - handles batch processing
print(f"Extracting entities from {len(chunked_documents)} chunks using ML (spaCy)...")
batch_results = entity_extractor.extract(chunked_documents)

# Flatten results (extract() returns List[List[Entity]] for batch input)
all_entities = [entity for entity_list in batch_results for entity in entity_list]

# Use Semantica's classify_entities to group by standard labels
classified = entity_extractor.classify_entities(all_entities)

# Filter entities for blockchain transaction domain
# Look for transaction hashes, wallet addresses, block numbers, and crypto tokens
transaction_keywords = ["transaction", "tx", "0x", "transfer", "sent", "received"]
wallet_keywords = ["wallet", "address", "0x", "account"]
block_keywords = ["block", "blockchain", "height", "block number"]

transactions = [
    e for e in all_entities
    if any(kw in e.text.lower() for kw in transaction_keywords)
    or e.label == "MONEY"  # Heuristic: treat MONEY mentions as transaction amounts
]
wallets = [
    e for e in all_entities
    if any(kw in e.text.lower() for kw in wallet_keywords)
    or (len(e.text) >= 26 and e.text.startswith("0x"))  # Long 0x-prefixed strings (a full Ethereum address is 42 characters)
]
blocks = [
    e for e in all_entities
    if any(kw in e.text.lower() for kw in block_keywords)
    or e.label == "CARDINAL"  # Heuristic: CARDINAL matches any bare number, so this over-counts blocks
]

print(f"\n✅ Extraction complete!")
print(f"   Total entities: {len(all_entities)}")
print(f"   Standard labels: {list(classified.keys())}")
print(f"   Transactions (filtered): {len(transactions)}")
print(f"   Wallets/Addresses (filtered): {len(wallets)}")
print(f"   Blocks (filtered): {len(blocks)}")


Extracting entities from 157 chunks using ML (spaCy)...

✅ Extraction complete!
   Total entities: 2420
   Standard labels: ['DATE', 'CARDINAL', 'ORG', 'GPE', 'PERSON', 'FAC', 'MONEY', 'NORP', 'LOC', 'ORDINAL', 'PRODUCT', 'LAW', 'PERCENT', 'WORK_OF_ART', 'EVENT', 'TIME']
   Transactions (filtered): 161
   Wallets/Addresses (filtered): 10
   Blocks (filtered): 400
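
The standard NER labels listed above have no category for on-chain identifiers, so the keyword filters are only a rough proxy. One complementary approach is to match identifiers by their format. The sketch below assumes Ethereum-style identifiers (an address is `0x` plus 40 hex characters, a transaction hash is `0x` plus 64); the helper name `find_chain_identifiers` is hypothetical, and short placeholders such as `0xABC` from the sample data would not match.

```python
import re

# An Ethereum address is 0x followed by 40 hex characters (20 bytes);
# a transaction hash is 0x followed by 64 hex characters (32 bytes).
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{40}\b")
TX_HASH_RE = re.compile(r"\b0x[0-9a-fA-F]{64}\b")


def find_chain_identifiers(text):
    """Return addresses and transaction hashes found in text, in order of appearance."""
    return {
        "addresses": ADDRESS_RE.findall(text),
        "tx_hashes": TX_HASH_RE.findall(text),
    }


# Synthetic identifiers built for the example; they do not refer to real accounts.
sample_address = "0x" + "ab" * 20
sample_tx = "0x" + "1f" * 32
text = f"Tx {sample_tx} sent funds to {sample_address}. Placeholder 0xABC is ignored."

found = find_chain_identifiers(text)
print(found)
```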


## Extracting Transaction Relationships


In [7]:
from semantica.semantic_extract import RelationExtractor

# Use ML-based dependency parsing to avoid rate limits
relation_extractor = RelationExtractor(
    method="dependency",  # ML/NLP method - no API calls needed
    verbose=True
)

all_relationships = []
error_count = 0
print(f"Extracting relationships from {len(chunked_documents)} chunks using ML (dependency parsing)...")

for i, chunk in enumerate(chunked_documents, 1):
    chunk_text = chunk.text if hasattr(chunk, 'text') else str(chunk)
    try:
        relationships = relation_extractor.extract_relations(
            chunk_text,
            entities=all_entities,
            relation_types=["transfers", "from", "to", "in_block", "contains", "flows_to"],
            verbose=True
        )
        all_relationships.extend(relationships)
    except Exception as e:
        error_count += 1
        if error_count <= 3:
            print(f"  Warning: Error on chunk {i}: {str(e)[:100]}")
    
    if i % 20 == 0 or i == len(chunked_documents):
        print(f"  Processed {i}/{len(chunked_documents)} chunks ({len(all_relationships)} relationships found)")

if error_count > 0:
    print(f"  Note: {error_count} chunks had errors during relation extraction")

print(f"\n✅ Extracted {len(all_relationships)} relationships")


Extracting relationships from 157 chunks using ML (dependency parsing)...
  Processed 20/157 chunks (38 relationships found)
  Processed 40/157 chunks (103 relationships found)
  Processed 60/157 chunks (174 relationships found)
  Processed 80/157 chunks (218 relationships found)
  Processed 100/157 chunks (349 relationships found)
  Processed 120/157 chunks (443 relationships found)
  Processed 140/157 chunks (573 relationships found)
  Processed 157/157 chunks (656 relationships found)

✅ Extracted 656 relationships
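
Dependency parsing works on free text. When sentences follow a fixed template, as in the fallback sample data from the ingestion step, a pattern-based extractor is a simple alternative. The sketch below is not Semantica's `RelationExtractor`; the regular expression and the helper `extract_transfer_relations` are assumptions tailored to that sample format, and they emit the `from`, `to`, and `in_block` relation types named above.

```python
import re

# Matches sentences such as:
#   "Transaction 0x123 transfers 1000 ETH from wallet 0xABC to wallet 0xDEF at block 18500000."
TRANSFER_RE = re.compile(
    r"[Tt]ransaction (?P<tx>0x\w+) (?:transfers|moves) (?P<amount>\d+) (?P<token>[A-Z]+)"
    r".*? from wallet (?P<src>0x\w+) to wallet (?P<dst>0x\w+) at block (?P<block>\d+)"
)


def extract_transfer_relations(text):
    """Return (subject, predicate, object) triples for each templated transfer sentence."""
    relations = []
    for match in TRANSFER_RE.finditer(text):
        tx = match.group("tx")
        relations.append((tx, "from", match.group("src")))
        relations.append((tx, "to", match.group("dst")))
        relations.append((tx, "in_block", match.group("block")))
    return relations


sample = (
    "Transaction 0x123 transfers 1000 ETH from wallet 0xABC to wallet 0xDEF at block 18500000.\n"
    "Large transaction 0x789 moves 10000 ETH (whale movement) from wallet 0xMNO "
    "to wallet 0xPQR at block 18500002."
)
relations = extract_transfer_relations(sample)
for triple in relations:
    print(triple)
```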


## Detecting Transaction Conflicts

- **Multi-Type Detection**: Detects entity, relationship, and temporal conflicts across the transaction network
- **Most Recent Strategy**: Uses the `most_recent` resolution strategy, so the latest record wins when sources disagree
- **Source-Aware Resolution**: Considers source reliability and confidence scores for conflict resolution


In [8]:
from semantica.conflicts import ConflictDetector, ConflictResolver

# Initialize conflict detection and resolution
conflict_detector = ConflictDetector()
conflict_resolver = ConflictResolver()

# Use Semantica's conflict detection methods directly
# Detects entity, relationship, and temporal conflicts
print(f"Detecting conflicts in {len(all_entities)} entities and {len(all_relationships)} relationships...")

# Convert to dict format for conflict detection (Semantica expects dicts)
entity_dicts = [
    {"id": e.text, "text": e.text, "type": e.label, "confidence": getattr(e, 'confidence', 1.0)}
    for e in all_entities
]
relationship_dicts = [
    {
        "id": f"{r.subject.text}_{r.predicate}_{r.object.text}",
        "source_id": r.subject.text,
        "target_id": r.object.text,
        "type": r.predicate,
    }
    for r in all_relationships
]

# Detect all conflict types using Semantica's methods
all_conflicts = []
all_conflicts.extend(conflict_detector.detect_entity_conflicts(entity_dicts))
all_conflicts.extend(conflict_detector.detect_relationship_conflicts(relationship_dicts))
all_conflicts.extend(conflict_detector.detect_temporal_conflicts(entity_dicts))

print(f"Detected {len(all_conflicts)} total conflicts")

# Resolve conflicts using best strategy for transaction networks
if all_conflicts:
    print(f"Resolving conflicts using 'most_recent' strategy (best for transaction data)...")
    resolved = conflict_resolver.resolve_conflicts(all_conflicts, strategy="most_recent")
    resolved_count = len([r for r in resolved if r.resolved])
    print(f"✅ Resolved {resolved_count}/{len(all_conflicts)} conflicts")
else:
    print("✅ No conflicts detected - data is consistent")


Detecting conflicts in 2420 entities and 656 relationships...
Detected 0 total conflicts
✅ No conflicts detected - data is consistent
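
As an illustration of what a most-recent resolution strategy does, the sketch below groups records by id, flags ids whose values disagree, and keeps the value with the latest timestamp. It is a simplified model, not `ConflictResolver`'s implementation; the record fields (`value`, `timestamp`) and the helper `resolve_most_recent` are assumptions. Note that the entity dicts built above carry no timestamp field.

```python
from collections import defaultdict


def resolve_most_recent(records, key="id", field="value", time_field="timestamp"):
    """Return (conflicts, resolved): conflicting values per id, and the latest value per id."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec[key]].append(rec)

    conflicts, resolved = {}, {}
    for rec_id, group in grouped.items():
        values = {rec[field] for rec in group}
        if len(values) > 1:
            conflicts[rec_id] = sorted(values)
        # ISO 8601 timestamps of equal length sort correctly as plain strings.
        resolved[rec_id] = max(group, key=lambda r: r[time_field])[field]
    return conflicts, resolved


# Hypothetical records: two sources disagree on the amount of transaction 0x123.
records = [
    {"id": "0x123", "value": 1000, "timestamp": "2024-01-01T00:00:00"},
    {"id": "0x123", "value": 1200, "timestamp": "2024-01-02T00:00:00"},
    {"id": "0x456", "value": 500, "timestamp": "2024-01-01T12:00:00"},
]
conflicts, resolved = resolve_most_recent(records)
print(conflicts)
print(resolved)
```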


## Building Temporal Transaction Network Graph


In [9]:
from semantica.kg import GraphBuilder

# Conflicts already resolved - disable expensive operations
graph_builder = GraphBuilder(
    merge_entities=False,  # Skip entity merging (already done in conflict resolution)
    resolve_conflicts=False,  # Conflicts already resolved
    entity_resolution_strategy="exact",  
    enable_temporal=True,
    temporal_granularity=TEMPORAL_GRANULARITY,
    track_history=True,  
    version_snapshots=True 
)

# Build graph - Semantica's build() method automatically shows progress and ETA
kg_sources = [{
    "entities": [{"text": e.text, "type": e.label, "confidence": getattr(e, 'confidence', 1.0)} for e in all_entities],
    "relationships": [{"source": r.subject.text, "target": r.object.text, "type": r.predicate, "confidence": getattr(r, 'confidence', 1.0)} for r in all_relationships]
}]

kg = graph_builder.build(kg_sources)


Processing 2420 entities, 656 relationships (3076 total)...
  Entities: 121/2420 (5.0%) | ETA: 41.3m | Rate: 0.9/s
  Entities: 242/2420 (10.0%) | ETA: 40.3m | Rate: 0.9/s
  Entities: 363/2420 (15.0%) | ETA: 38.0m | Rate: 0.9/s
  Entities: 484/2420 (20.0%) | ETA: 35.7m | Rate: 0.9/s
  Entities: 605/2420 (25.0%) | ETA: 33.4m | Rate: 0.9/s
  Entities: 726/2420 (30.0%) | ETA: 31.4m | Rate: 0.9/s
  Entities: 847/2420 (35.0%) | ETA: 29.1m | Rate: 0.9/s
  Entities: 968/2420 (40.0%) | ETA: 27.9m | Rate: 0.9/s
  Entities: 1089/2420 (45.0%) | ETA: 25.5m | Rate: 0.9/s
  Entities: 1210/2420 (50.0%) | ETA: 23.2m | Rate: 0.9/s
  Entities: 1331/2420 (55.0%) | ETA: 20.9m | Rate: 0.9/s
  Entities: 1452/2420 (60.0%) | ETA: 18.6m | Rate: 0.9/s
  Entities: 1573/2420 (65.0%) | ETA: 16.2m | Rate: 0.9/s
  Entities: 1694/2420 (70.0%) | ETA: 13.9m | Rate: 0.9/s
  Entities: 1815/2420 (75.0%) | ETA: 11.7m | Rate: 0.9/s
  Entities: 1936/2420 (80.0%) | ETA: 9.3m | Rate: 0.9/s
  Entities: 2057/2420 (85.0%) | ETA: 7
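
The build above runs with `merge_entities=False`, so repeated mentions are not merged. For reference, exact-match deduplication can be sketched as follows. This is an assumption about what an "exact" strategy means (identical text and type), not a description of `GraphBuilder` internals, and `merge_exact` is a hypothetical helper.

```python
def merge_exact(entities):
    """Keep one entity per (text, type) pair, preferring the highest confidence."""
    merged = {}
    for ent in entities:
        key = (ent["text"], ent["type"])
        best = merged.get(key)
        if best is None or ent["confidence"] > best["confidence"]:
            merged[key] = ent
    return list(merged.values())


# Hypothetical mentions: exact matching is case-sensitive, so "ethereum" stays separate.
mentions = [
    {"text": "Ethereum", "type": "ORG", "confidence": 0.7},
    {"text": "Ethereum", "type": "ORG", "confidence": 0.9},
    {"text": "ethereum", "type": "ORG", "confidence": 0.8},
]
unique_entities = merge_exact(mentions)
print(unique_entities)
```

Collapsing duplicates before the build reduces the number of entities the builder has to process.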

## Generating Embeddings for Transactions and Wallets


In [10]:
from semantica.embeddings import EmbeddingGenerator

embedding_gen = EmbeddingGenerator(
    provider="sentence_transformers",
    model=EMBEDDING_MODEL
)

print(f"Generating embeddings for {len(transactions)} transactions and {len(wallets)} wallets...")
transaction_texts = [t.text for t in transactions]
transaction_embeddings = embedding_gen.generate_embeddings(transaction_texts)

wallet_texts = [w.text for w in wallets]
wallet_embeddings = embedding_gen.generate_embeddings(wallet_texts)

print(f"Generated {len(transaction_embeddings)} transaction embeddings and {len(wallet_embeddings)} wallet embeddings")


fastembed not available. Install with: pip install fastembed. Using fallback embedding method.


Generating embeddings for 161 transactions and 10 wallets...
Generated 161 transaction embeddings and 10 wallet embeddings


## Populating Vector Store


In [11]:
from semantica.vector_store import VectorStore

vector_store = VectorStore(backend="faiss", dimension=EMBEDDING_DIMENSION)

print(f"Storing {len(transaction_embeddings)} transaction vectors and {len(wallet_embeddings)} wallet vectors...")
transaction_ids = vector_store.store_vectors(
    vectors=transaction_embeddings,
    metadata=[{"type": "transaction", "name": t.text, "label": t.label} for t in transactions]
)

wallet_ids = vector_store.store_vectors(
    vectors=wallet_embeddings,
    metadata=[{"type": "wallet", "name": w.text, "label": w.label} for w in wallets]
)

print(f"Stored {len(transaction_ids)} transaction vectors and {len(wallet_ids)} wallet vectors")


fastembed not available. Install with: pip install fastembed. Using fallback embedding method.


Storing 161 transaction vectors and 10 wallet vectors...
Stored 161 transaction vectors and 10 wallet vectors
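
Conceptually, a vector store keeps vectors alongside their metadata and answers nearest-neighbor queries. The sketch below is a brute-force, in-memory stand-in written for illustration; `TinyVectorStore` is a hypothetical class and does not reflect how Semantica's `VectorStore` or FAISS are implemented. It also shows why the store's `dimension` must match the embedding size.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    def __init__(self, dimension):
        self.dimension = dimension
        self.vectors = []
        self.metadata = []

    def store_vectors(self, vectors, metadata):
        """Append vectors with their metadata and return the assigned integer ids."""
        ids = []
        for vec, meta in zip(vectors, metadata):
            if len(vec) != self.dimension:
                raise ValueError(f"expected {self.dimension} dimensions, got {len(vec)}")
            self.vectors.append(list(vec))
            self.metadata.append(meta)
            ids.append(len(self.vectors) - 1)
        return ids

    def search(self, query, k=3):
        """Return the k most similar entries as (metadata, score) pairs."""
        scored = [(cosine_similarity(query, v), i) for i, v in enumerate(self.vectors)]
        scored.sort(reverse=True)
        return [(self.metadata[i], score) for score, i in scored[:k]]


# Toy 3-dimensional vectors; real embeddings here would have EMBEDDING_DIMENSION entries.
store = TinyVectorStore(dimension=3)
ids = store.store_vectors(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]],
    [{"name": "a"}, {"name": "b"}, {"name": "c"}],
)
hits = store.search([1.0, 0.0, 0.0], k=2)
print(hits)
```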


## Analyzing Graph Structure


In [12]:
from semantica.kg import GraphAnalyzer, CentralityCalculator, CommunityDetector

graph_analyzer = GraphAnalyzer()
centrality_calc = CentralityCalculator()
community_detector = CommunityDetector()

analysis = graph_analyzer.analyze_graph(kg)

degree_centrality = centrality_calc.calculate_degree_centrality(kg)
betweenness_centrality = centrality_calc.calculate_betweenness_centrality(kg)
closeness_centrality = centrality_calc.calculate_closeness_centrality(kg)

communities = community_detector.detect_communities(kg, method="louvain")
connectivity = graph_analyzer.analyze_connectivity(kg)

print(f"Graph analytics:")
print(f"  - Communities: {len(communities)}")
print(f"  - Connected components: {len(connectivity.get('components', []))}")
print(f"  - Graph density: {analysis.get('density', 0):.3f}")
print(f"  - Central nodes (degree): {len(degree_centrality)}")


Graph analytics:
  - Communities: 4
  - Connected components: 2
  - Graph density: 0.000
  - Central nodes (degree): 4
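
The metrics printed above have simple definitions for an undirected graph: density is `2m / (n(n-1))` for `n` nodes and `m` edges, and a node's degree centrality is its degree divided by `n-1`. The sketch below computes both, plus connected components, on a toy edge list with plain Python. The function names are hypothetical and this is not the code behind `GraphAnalyzer`.

```python
def build_neighbors(edges):
    """Undirected adjacency sets from (u, v) pairs, ignoring self-loops."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def density(neighbors):
    n = len(neighbors)
    m = sum(len(nb) for nb in neighbors.values()) // 2  # each edge is counted twice
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def degree_centrality(neighbors):
    n = len(neighbors)
    if n <= 1:
        return {}
    return {node: len(nb) / (n - 1) for node, nb in neighbors.items()}


def connected_components(neighbors):
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(component)
    return components


toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
toy = build_neighbors(toy_edges)
print(f"density={density(toy):.3f}")
print(degree_centrality(toy))
print(connected_components(toy))
```

A density that prints as `0.000` is expected for a sparse graph. For example, if all 2,420 entities became nodes, 656 undirected edges would give a density of about 0.0002.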


## Temporal Graph Queries


In [14]:
from semantica.kg import TemporalGraphQuery

temporal_query = TemporalGraphQuery(
    enable_temporal_reasoning=True,
    temporal_granularity=TEMPORAL_GRANULARITY
)

query_results = temporal_query.query_at_time(
    kg,
    query={"type": "Transaction"},
    at_time="2024-01-01"
)

evolution = temporal_query.analyze_evolution(kg)
temporal_patterns = temporal_query.query_temporal_pattern(kg, pattern="sequence")

print(f"Temporal queries: {query_results.get('num_relationships', 0)} relationships at query time")
print(f"Temporal patterns detected: {temporal_patterns.get('num_patterns', 0)}")


Temporal queries: 656 relationships at query time
Temporal patterns detected: 0
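
A point-in-time query can be thought of as a filter on validity intervals: an edge is visible at time `t` if it became valid on or before `t` and has not yet expired. The sketch below assumes each edge carries `valid_from` and an optional `valid_until` ISO date; these field names and the helper `edges_at` are illustrative and are not taken from `TemporalGraphQuery`.

```python
from datetime import date


def edges_at(edges, at):
    """Return edges whose [valid_from, valid_until) interval contains the date `at`."""
    moment = date.fromisoformat(at)
    visible = []
    for edge in edges:
        start = date.fromisoformat(edge["valid_from"])
        end = edge.get("valid_until")
        if start <= moment and (end is None or moment < date.fromisoformat(end)):
            visible.append(edge)
    return visible


# Hypothetical edges with validity intervals.
timed_edges = [
    {"source": "0xABC", "target": "0x123", "valid_from": "2023-12-30", "valid_until": None},
    {"source": "0xDEF", "target": "0x456", "valid_from": "2024-01-02", "valid_until": None},
    {"source": "0xGHI", "target": "0x789", "valid_from": "2023-11-01", "valid_until": "2023-12-01"},
]
visible = edges_at(timed_edges, "2024-01-01")
print(visible)
```

The relationships passed to the graph builder earlier contain no time fields. One plausible reading of the output above is that, without validity intervals, every relationship counts as valid at the query time.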


## Detecting Patterns and Whale Movements


In [18]:

# Detect whale movements (large transactions).
# This only matches graph entities typed "Wallet" or "Address" that have a
# relationship whose type mentions "large" or "whale". Entities that carry
# standard NER labels (ORG, MONEY, ...) are skipped by the type check.
whale_wallets = []
for entity in kg.get("entities", []):
    if entity.get("type") in ["Wallet", "Address"]:
        # Check for large transaction relationships
        related_rels = [r for r in kg.get("relationships", []) 
                        if r.get("source") == entity.get("id") or r.get("target") == entity.get("id")]
        if any("large" in str(r.get("type", "")).lower() or "whale" in str(r.get("type", "")).lower() 
               for r in related_rels):
            whale_wallets.append(entity)

# Detect suspicious patterns (high frequency transactions)
suspicious_patterns = []
for wallet in wallets[:10]:
    wallet_name = wallet.text
    # Find all relationships where wallet is source or target, and connected to Transaction entities
    wallet_relationships = [
        r for r in kg.get("relationships", [])
        if (r.get("source") == wallet_name or r.get("target") == wallet_name)
    ]
    # Find connected Transaction entities
    # (named connected_tx_ids so it does not overwrite the vector-store transaction_ids)
    connected_tx_ids = set()
    for rel in wallet_relationships:
        other_entity_id = rel.get("target") if rel.get("source") == wallet_name else rel.get("source")
        # Check if the other entity is a Transaction
        other_entity = next((e for e in kg.get("entities", []) if e.get("id") == other_entity_id), None)
        if other_entity and other_entity.get("type") == "Transaction":
            connected_tx_ids.add(other_entity_id)
    
    transaction_count = len(connected_tx_ids)
    if transaction_count > 5:  # High transaction frequency
        suspicious_patterns.append({
            'wallet': wallet_name,
            'transaction_count': transaction_count
        })

print(f"Whale tracking: {len(whale_wallets)} large transaction wallets identified")
print(f"Suspicious patterns: {len(suspicious_patterns)} high-frequency wallets")


Whale tracking: 0 large transaction wallets identified
Suspicious patterns: 0 high-frequency wallets
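
An alternative to matching on relationship types is to work from structured transfers and flag whales by amount. The sketch below uses records shaped like the fallback sample data; the per-token thresholds are arbitrary illustrative values, and `find_whales` and `wallet_activity` are hypothetical helpers.

```python
from collections import Counter

# Illustrative thresholds only; sensible values depend on the token and the use case.
WHALE_THRESHOLDS = {"ETH": 10000, "BTC": 1000}


def find_whales(transfers, thresholds):
    """Return transfers whose amount meets the threshold for their token."""
    return [
        t for t in transfers
        if t["token"] in thresholds and t["amount"] >= thresholds[t["token"]]
    ]


def wallet_activity(transfers):
    """Count how many transfers each wallet appears in, as sender or receiver."""
    counts = Counter()
    for t in transfers:
        counts[t["src"]] += 1
        counts[t["dst"]] += 1
    return counts


# Structured form of three sentences from the fallback sample data.
transfers = [
    {"tx": "0x123", "amount": 1000, "token": "ETH", "src": "0xABC", "dst": "0xDEF"},
    {"tx": "0x456", "amount": 500, "token": "BTC", "src": "0xGHI", "dst": "0xJKL"},
    {"tx": "0x789", "amount": 10000, "token": "ETH", "src": "0xMNO", "dst": "0xPQR"},
]
whales = find_whales(transfers, WHALE_THRESHOLDS)
activity = wallet_activity(transfers)
print([t["tx"] for t in whales])
print([wallet for wallet, count in activity.items() if count > 5])
```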


## Analyzing Token Flows


In [19]:
# Analyze token flows through the network
from collections import deque

# Build the undirected adjacency list once, instead of once per transaction
adjacency = {}
for rel in kg.get("relationships", []):
    source = rel.get("source")
    target = rel.get("target")
    if source and target:
        adjacency.setdefault(source, [])
        if target not in adjacency[source]:
            adjacency[source].append(target)
        adjacency.setdefault(target, [])
        if source not in adjacency[target]:
            adjacency[target].append(source)

# Index entities by id (keeping the first occurrence) for constant-time lookups
entities_by_id = {}
for e in kg.get("entities", []):
    entities_by_id.setdefault(e.get("id"), e)

flow_analysis = []
for transaction in transactions[:10]:
    tx_name = transaction.text
    if tx_name not in adjacency:
        continue

    # BFS to find wallets within 2 hops
    queue = deque([(tx_name, [tx_name], 0)])
    visited = {tx_name}

    while queue:
        node, path, hops = queue.popleft()

        if hops > 2:  # Max 2 hops
            continue

        # Check if current node is a Wallet or Address
        entity = entities_by_id.get(node)
        if entity and entity.get("type") in ["Wallet", "Address"] and node != tx_name:
            flow_analysis.append({
                'transaction': tx_name,
                'flow_path': path,
                'target': node,
                'path_length': len(path) - 1
            })

        # Continue BFS
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor], hops + 1))

flow_analysis.sort(key=lambda x: x['path_length'])

print(f"Flow analysis: {len(flow_analysis)} token flow paths identified")
for i, flow in enumerate(flow_analysis[:5], 1):
    print(f"{i}. {flow['transaction']} -> {flow['target']} (path length: {flow['path_length']})")


Flow analysis: 0 token flow paths identified
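
Path search shows which wallets a transaction can reach. A complementary view of token flow is the net balance change per wallet: subtract what a wallet sent from what it received, per token. The sketch below uses hypothetical transfer records, and `net_flows` is an illustrative helper.

```python
from collections import defaultdict


def net_flows(transfers):
    """Return {wallet: {token: net_amount}}, where outflows are negative."""
    balance = defaultdict(lambda: defaultdict(int))
    for t in transfers:
        balance[t["src"]][t["token"]] -= t["amount"]
        balance[t["dst"]][t["token"]] += t["amount"]
    return {wallet: dict(tokens) for wallet, tokens in balance.items()}


# Hypothetical transfers between three wallets.
eth_transfers = [
    {"src": "A", "dst": "B", "amount": 100, "token": "ETH"},
    {"src": "B", "dst": "C", "amount": 60, "token": "ETH"},
    {"src": "A", "dst": "C", "amount": 10, "token": "ETH"},
]
flows = net_flows(eth_transfers)
for wallet, tokens in sorted(flows.items()):
    print(wallet, tokens)
```

Because every transfer debits one wallet and credits another, the net amounts for each token sum to zero across the network, which is a useful sanity check on the input data.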


## Storing Transaction Network (Optional)


In [20]:
from semantica.graph_store import GraphStore

# Optional: Store to persistent graph database
# graph_store = GraphStore(backend="neo4j", uri="bolt://localhost:7687", user="neo4j", password="password")
# graph_store.store_graph(kg)

print("Graph store configured (commented out for demo)")


Graph store configured (commented out for demo)


## GraphRAG: Hybrid Vector + Graph Queries


In [21]:
from semantica.context import AgentContext

context = AgentContext(vector_store=vector_store, knowledge_graph=kg)

query = "What are the largest transactions?"
results = context.retrieve(
    query,
    max_results=10,
    use_graph=True,
    expand_graph=True,
    include_entities=True,
    include_relationships=True
)

print(f"GraphRAG query: '{query}'")
print(f"\nRetrieved {len(results)} results:\n")
for i, result in enumerate(results[:5], 1):
    print(f"{i}. Score: {result.get('score', 0):.3f}")
    print(f"   Content: {result.get('content', '')[:200]}...")
    if result.get('related_entities'):
        print(f"   Related entities: {len(result['related_entities'])}")
    print()


Embedding generation failed: Text cannot be empty or whitespace-only
Using random fallback embedding


GraphRAG query: 'What are the largest transactions?'

Retrieved 10 results:

1. Score: 0.574
   Content: Entity(text='canton is a ORG....

2. Score: 0.500
   Content: ...

3. Score: 0.475
   Content: Entity(text='Hong kong' is a GPE....

4. Score: 0.329
   Content: HPC is a ORG. identify the Cubic Kilometre Neutrino (ORG). be Vanguard ETFs (PERSON), CoinDesk....
   Related entities: 2

5. Score: 0.324
   Content: nearly 20% is a PERCENT....
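
The hybrid idea behind the query above can be sketched in two steps: take the top vector hits, then attach each hit's direct graph neighbors as related entities. The code below is a simplified model under that assumption, not the logic of `AgentContext.retrieve`; the input shapes and the helper `expand_hits` are hypothetical.

```python
def expand_hits(hits, edges, max_neighbors=5):
    """Attach up to max_neighbors one-hop graph neighbors to each (name, score) hit."""
    neighbors = {}
    for source, target in edges:
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)

    results = []
    for name, score in hits:
        related = sorted(neighbors.get(name, set()))[:max_neighbors]
        results.append({"content": name, "score": score, "related_entities": related})
    return results


# Hypothetical vector hits (already ranked by similarity) and graph edges.
vector_hits = [("0x789", 0.91), ("0x123", 0.42)]
graph_edges = [("0x789", "0xMNO"), ("0x789", "0xPQR"), ("0x123", "0xABC")]
expanded = expand_hits(vector_hits, graph_edges)
for row in expanded:
    print(row)
```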



## Visualizing the Transaction Network


In [23]:
from semantica.visualization import KGVisualizer

# Create visualizer with force-directed layout for better interactive visualization
visualizer = KGVisualizer(
    layout="force",  # Force-directed layout for better node distribution
    color_scheme="vibrant",  # Better color scheme
    node_size=15,
    edge_width=1.5
)

# Create interactive network visualization
fig = visualizer.visualize_network(
    kg,
    output="interactive",  # Interactive Plotly visualization
    file_path="transaction_network.html",  # Also save to HTML file
    node_color_by="type",  # Color nodes by entity type
    hover_data=["type", "label"]  # Show type and label in hover tooltip
)

# Display the interactive figure in the notebook
fig.show()

print("✅ Interactive visualization displayed above")
print("📁 Visualization also saved to transaction_network.html")


✅ Interactive visualization displayed above
📁 Visualization also saved to transaction_network.html


## Exporting Results


In [None]:
from semantica.export import GraphExporter, export_csv

# Export to graph formats using GraphExporter
graph_exporter = GraphExporter()
graph_exporter.export(kg, file_path="transaction_network.json", format="json")
graph_exporter.export(kg, file_path="transaction_network.graphml", format="graphml")
graph_exporter.export(kg, file_path="transaction_network.gexf", format="gexf")
graph_exporter.export(kg, file_path="transaction_network.dot", format="dot")

# Export to CSV using export_csv convenience function
# This creates separate CSV files for entities and relationships
export_csv(kg, "transaction_network")

print("✅ Exported transaction network to multiple formats:")
print("   - JSON: transaction_network.json")
print("   - GraphML: transaction_network.graphml")
print("   - GEXF: transaction_network.gexf")
print("   - DOT: transaction_network.dot")
print("   - CSV: transaction_network_entities.csv, transaction_network_relationships.csv")


✅ Exported transaction network to multiple formats:
   - JSON: transaction_network.json
   - GraphML: transaction_network.graphml
   - GEXF: transaction_network.gexf
   - DOT: transaction_network.dot
   - CSV: transaction_network.csv
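
To show why a graph is split into separate entity and relationship tables for CSV, here is a standard-library sketch that serializes a small graph dict into two CSV texts. It assumes the graph is a dict with `entities` and `relationships` lists, as used in the analysis cells above; `kg_to_csv_text` is a hypothetical helper and not Semantica's `export_csv`.

```python
import csv
import io


def kg_to_csv_text(kg):
    """Return (entities_csv, relationships_csv) as strings."""
    entity_buf = io.StringIO()
    entity_writer = csv.writer(entity_buf, lineterminator="\n")
    entity_writer.writerow(["id", "type"])
    for ent in kg.get("entities", []):
        entity_writer.writerow([ent.get("id", ""), ent.get("type", "")])

    rel_buf = io.StringIO()
    rel_writer = csv.writer(rel_buf, lineterminator="\n")
    rel_writer.writerow(["source", "target", "type"])
    for rel in kg.get("relationships", []):
        rel_writer.writerow([rel.get("source", ""), rel.get("target", ""), rel.get("type", "")])

    return entity_buf.getvalue(), rel_buf.getvalue()


# Toy graph used only for illustration.
toy_kg = {
    "entities": [
        {"id": "0x123", "type": "Transaction"},
        {"id": "0xABC", "type": "Wallet"},
    ],
    "relationships": [{"source": "0x123", "target": "0xABC", "type": "from"}],
}
entities_csv, relationships_csv = kg_to_csv_text(toy_kg)
print(entities_csv)
print(relationships_csv)
```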
