[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/blockchain/02_Transaction_Network_Analysis.ipynb)

# Transaction Network Analysis - Pattern Detection & Graph Analytics

## Overview

This notebook demonstrates **blockchain transaction network analysis** using Semantica, with a focus on **pattern detection**, **network analytics**, and **temporal analysis**. The pipeline builds a transaction network graph and uses it to detect patterns, identify whale movements (unusually large transfers), and trace token flows.

### Key Features

- **Pattern Detection**: Graph analytics for recognizing recurring transaction patterns
- **Network Analytics**: Centrality measures, community detection, and connectivity analysis
- **Temporal Analysis**: Time-aware queries and tracking of how the transaction network evolves
- **Whale Tracking**: Identifies wallets involved in unusually large transfers
- **Flow Analysis**: Traces token flows along multi-hop transaction paths
- **Flexible Data Sources**: Ingests from blockchain APIs, with a local sample dataset as fallback; further APIs, analytics platforms, or databases can be added

### Learning Objectives

- Ingest blockchain transaction data from multiple sources
- Extract transaction entities (Transactions, Wallets, Addresses, Blocks, Flows)
- Build temporal transaction network graphs
- Perform graph analytics (centrality, communities, connectivity)
- Detect patterns and whale movements
- Analyze token flows and transaction paths
- Store and query transaction data using vector stores and graph stores

### Pipeline Flow

```mermaid
graph TD
    A[Data Ingestion] --> B[Document Parsing]
    B --> C[Text Processing]
    C --> D[Entity Extraction]
    D --> E[Relationship Extraction]
    E --> F[Deduplication]
    F --> G[Conflict Detection]
    G --> H[Transaction Network Graph]
    H --> I[Embeddings]
    I --> J[Vector Store]
    H --> K[Graph Analytics]
    K --> L[Temporal Queries]
    L --> M[Pattern Detection]
    M --> N[Flow Analysis]
    J --> O[GraphRAG Queries]
    H --> P[Graph Store]
    O --> Q[Visualization]
    P --> Q
    Q --> R[Export]
```

## Installation


In [None]:
%pip install -qU semantica networkx matplotlib plotly pandas faiss-cpu beautifulsoup4 groq sentence-transformers scikit-learn


## Configuration & Setup


In [None]:
import os

# The LLM-based extraction steps below call Groq; export GROQ_API_KEY before running
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "your-key-here")

# Configuration constants
EMBEDDING_DIMENSION = 384
EMBEDDING_MODEL = "all-MiniLM-L6-v2"
CHUNK_SIZE = 1000
CHUNK_OVERLAP = 200
TEMPORAL_GRANULARITY = "day"


## Ingesting Blockchain Transaction Data


In [None]:
from semantica.ingest import WebIngestor, FileIngestor
import os
from contextlib import redirect_stderr
from io import StringIO

os.makedirs("data", exist_ok=True)

# Example blockchain API endpoints (in production, use actual API keys)
api_sources = [
    ("Blockchain.com Stats", "https://api.blockchain.info/stats"),
    # Add more API endpoints as needed
]

web_ingestor = WebIngestor()
all_documents = []

print(f"Ingesting from {len(api_sources)} API sources...")
for i, (api_name, api_url) in enumerate(api_sources, 1):
    try:
        with redirect_stderr(StringIO()):
            api_documents = web_ingestor.ingest(api_url, method="url")
        
        api_count = 0
        for doc in api_documents:
            if not hasattr(doc, 'metadata'):
                doc.metadata = {}
            doc.metadata['source'] = api_name
            all_documents.append(doc)
            api_count += 1
        
        if api_count > 0:
            print(f"  [{i}/{len(api_sources)}] {api_name}: {api_count} documents")
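    except Exception as exc:
        # Skip unreachable or unparseable endpoints, but report why
        print(f"  [{i}/{len(api_sources)}] {api_name}: skipped ({exc})")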

if not all_documents:
    tx_data = """
    Transaction 0x123 transfers 1000 ETH from wallet 0xABC to wallet 0xDEF at block 18500000.
    Transaction 0x456 transfers 500 BTC from wallet 0xGHI to wallet 0xJKL at block 18500001.
    Large transaction 0x789 moves 10000 ETH (whale movement) from wallet 0xMNO to wallet 0xPQR at block 18500002.
    Transaction 0xabc transfers 200 USDT from wallet 0xSTU to wallet 0xVWX at block 18500003.
    Transaction 0xdef transfers 5000 ETH from wallet 0xYZA to wallet 0xBCD at block 18500004.
    Transaction 0x111 transfers 3000 DAI from wallet 0xEFG to wallet 0xHIJ at block 18500005.
    Transaction 0x222 transfers 1500 USDC from wallet 0xKLM to wallet 0xNOP at block 18500006.
    """
    with open("data/transactions.txt", "w") as f:
        f.write(tx_data)
    file_ingestor = FileIngestor()
    all_documents = file_ingestor.ingest("data/transactions.txt")

documents = all_documents
print(f"Ingested {len(documents)} documents")
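The sample sentences in the fallback branch follow one fixed template, so a deterministic parser can recover amounts and tokens that the entity types used later (Transaction, Wallet, Address, Block, Flow) do not capture. The sketch below is a hypothetical helper, not part of Semantica: `TxRecord` and `parse_transactions` are illustrative names, and the regular expression assumes exactly the sentence shape of the sample data.

```python
import re
from dataclasses import dataclass


@dataclass
class TxRecord:
    """Hypothetical structured view of one sample transaction sentence."""
    tx_hash: str
    amount: float
    token: str
    sender: str
    receiver: str
    block: int


# Matches the sample template, including the "Large transaction ... (whale movement)" variant:
# "Transaction 0x123 transfers 1000 ETH from wallet 0xABC to wallet 0xDEF at block 18500000."
TX_PATTERN = re.compile(
    r"[Tt]ransaction (?P<tx>0x\w+) (?:transfers|moves) (?P<amount>[\d.]+) (?P<token>[A-Z]+)"
    r".*? from wallet (?P<sender>0x\w+) to wallet (?P<receiver>0x\w+) at block (?P<block>\d+)"
)


def parse_transactions(text: str) -> list[TxRecord]:
    """Return one TxRecord per sentence in `text` that matches TX_PATTERN."""
    return [
        TxRecord(
            tx_hash=m["tx"],
            amount=float(m["amount"]),
            token=m["token"],
            sender=m["sender"],
            receiver=m["receiver"],
            block=int(m["block"]),
        )
        for m in TX_PATTERN.finditer(text)
    ]
```

Applied to `tx_data`, it returns seven records, one per sample sentence. A parsed copy like this is useful as ground truth when checking what the LLM extractor recovers.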


## Parsing Transaction Documents


In [None]:
from semantica.parse import DocumentParser

parser = DocumentParser()

print(f"Parsing {len(documents)} documents...")
parsed_documents = []
for i, doc in enumerate(documents, 1):
    try:
        parsed = parser.parse(
            doc.content if hasattr(doc, 'content') else str(doc),
            content_type="text"
        )
        parsed_documents.append(parsed)
    except Exception:
        parsed_documents.append(doc)
    if i % 50 == 0 or i == len(documents):
        print(f"  Parsed {i}/{len(documents)} documents...")

documents = parsed_documents


## Normalizing and Chunking Transaction Data


In [None]:
from semantica.normalize import TextNormalizer
from semantica.split import TextSplitter

normalizer = TextNormalizer()
splitter = TextSplitter(
    method="entity_aware",
    ner_method="spacy",
    chunk_size=CHUNK_SIZE,
    chunk_overlap=CHUNK_OVERLAP
)

print(f"Normalizing {len(documents)} documents...")
normalized_documents = []
for i, doc in enumerate(documents, 1):
    normalized_text = normalizer.normalize(
        doc.content if hasattr(doc, 'content') else str(doc),
        clean_html=True,
        normalize_entities=True,
        remove_extra_whitespace=True,
        lowercase=False
    )
    normalized_documents.append(normalized_text)
    if i % 50 == 0 or i == len(documents):
        print(f"  Normalized {i}/{len(documents)} documents...")

print(f"Chunking {len(normalized_documents)} documents...")
chunked_documents = []
for i, doc_text in enumerate(normalized_documents, 1):
    try:
        with redirect_stderr(StringIO()):
            chunks = splitter.split(doc_text)
        chunked_documents.extend(chunks)
    except Exception:
        simple_splitter = TextSplitter(method="recursive", chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
        chunks = simple_splitter.split(doc_text)
        chunked_documents.extend(chunks)
    if i % 50 == 0 or i == len(normalized_documents):
        print(f"  Chunked {i}/{len(normalized_documents)} documents ({len(chunked_documents)} chunks so far)")

print(f"Created {len(chunked_documents)} chunks from {len(normalized_documents)} documents")
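`CHUNK_SIZE` and `CHUNK_OVERLAP` control how much text each chunk holds and how much consecutive chunks share, so any span no longer than the overlap (a wallet address, say) lies whole inside at least one chunk. The sketch below shows plain fixed-size windows, assuming sizes are measured in characters; Semantica's `entity_aware` and `recursive` methods choose split points more carefully, and whether they count characters or tokens is an assumption here.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split `text` into fixed-size character windows; consecutive windows share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```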


## Extracting Transaction Entities


In [None]:
from semantica.semantic_extract import NERExtractor

entity_extractor = NERExtractor(
    method="llm",
    provider="groq",
    llm_model="llama-3.1-8b-instant",
    temperature=0.0
)

all_entities = []
print(f"Extracting entities from {len(chunked_documents)} chunks...")
for i, chunk in enumerate(chunked_documents, 1):
    chunk_text = chunk.text if hasattr(chunk, 'text') else str(chunk)
    try:
        entities = entity_extractor.extract_entities(
            chunk_text,
            entity_types=["Transaction", "Wallet", "Address", "Block", "Flow"]
        )
        all_entities.extend(entities)
    except Exception:
        continue
    
    if i % 20 == 0 or i == len(chunked_documents):
        print(f"  Processed {i}/{len(chunked_documents)} chunks ({len(all_entities)} entities found)")

transactions = [e for e in all_entities if e.label == "Transaction" or "transaction" in e.label.lower()]
wallets = [e for e in all_entities if e.label in ["Wallet", "Address"] or "wallet" in e.label.lower() or "address" in e.label.lower()]
blocks = [e for e in all_entities if e.label == "Block" or "block" in e.label.lower()]

print(f"Extracted {len(transactions)} transactions, {len(wallets)} wallets/addresses, {len(blocks)} blocks")


## Extracting Transaction Relationships


In [None]:
from semantica.semantic_extract import RelationExtractor

relation_extractor = RelationExtractor(
    method="llm",
    provider="groq",
    llm_model="llama-3.1-8b-instant",
    temperature=0.0
)

all_relationships = []
print(f"Extracting relationships from {len(chunked_documents)} chunks...")
for i, chunk in enumerate(chunked_documents, 1):
    chunk_text = chunk.text if hasattr(chunk, 'text') else str(chunk)
    try:
        relationships = relation_extractor.extract_relations(
            chunk_text,
            entities=all_entities,
            relation_types=["transfers", "from", "to", "in_block", "contains", "flows_to"]
        )
        all_relationships.extend(relationships)
    except Exception:
        continue
    
    if i % 20 == 0 or i == len(chunked_documents):
        print(f"  Processed {i}/{len(chunked_documents)} chunks ({len(all_relationships)} relationships found)")

print(f"Extracted {len(all_relationships)} relationships")


## Resolving Duplicate Transactions


In [None]:
from semantica.kg import EntityResolver
from semantica.semantic_extract import Entity

# Convert Entity objects to dictionaries for EntityResolver
print(f"Converting {len(all_entities)} entities to dictionaries...")
entity_dicts = [{"name": e.text, "type": e.label, "confidence": e.confidence} for e in all_entities]

# Use EntityResolver class to resolve duplicates
entity_resolver = EntityResolver(strategy="fuzzy", similarity_threshold=0.85)

print(f"Resolving duplicates in {len(entity_dicts)} entities...")
resolved_entities = entity_resolver.resolve_entities(entity_dicts)

# Convert back to Entity objects
print(f"Converting {len(resolved_entities)} resolved entities back to Entity objects...")
merged_entities = [
    Entity(text=e["name"], label=e["type"], confidence=e.get("confidence", 1.0))
    for e in resolved_entities
]

print(f"Deduplicated {len(entity_dicts)} entities to {len(merged_entities)} unique entities")
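The `fuzzy` strategy with `similarity_threshold=0.85` merges names whose string similarity is at least 0.85. The sketch below uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; Semantica's own scorer may differ. It also shows why fuzzy matching is risky for on-chain identifiers: two 42-character Ethereum-style addresses that differ in one character score about 0.98 and get merged, even though they are different accounts. For hashes and addresses, exact case-insensitive matching is usually the safer choice.

```python
from difflib import SequenceMatcher


def fuzzy_dedupe(names: list[str], threshold: float = 0.85) -> list[str]:
    """Keep the first name of every group whose case-insensitive similarity ratio is >= threshold."""
    kept: list[str] = []
    for name in names:
        key = name.strip().lower()
        is_duplicate = any(
            SequenceMatcher(None, key, other.strip().lower()).ratio() >= threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(name)
    return kept
```

This greedy version compares each name only against names already kept, so results can depend on input order.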


## Detecting Transaction Conflicts


In [None]:
from semantica.conflicts import ConflictDetector

conflict_detector = ConflictDetector()

conflicts = conflict_detector.detect_conflicts(merged_entities, all_relationships)

if conflicts:
    resolved = conflict_detector.resolve_conflicts(conflicts, strategy="highest_confidence")
    print(f"Detected {len(conflicts)} conflicts, resolved {len(resolved)}")
else:
    print("No conflicts detected")
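A conflict here means two sources assert different values for the same fact, for example two block numbers for one transaction. The sketch below shows one plausible reading of `strategy="highest_confidence"`: group claims by (entity, attribute) and keep the value with the largest confidence. The `(entity, attribute, value, confidence)` tuple format is a hypothetical representation, not Semantica's internal one.

```python
from collections import defaultdict


def find_conflicts(claims):
    """claims: iterable of (entity, attribute, value, confidence) tuples.
    Return {(entity, attribute): {values}} for facts asserted with more than one value."""
    values = defaultdict(set)
    for entity, attribute, value, _confidence in claims:
        values[(entity, attribute)].add(value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


def resolve_highest_confidence(claims):
    """Return {(entity, attribute): value}, keeping the most confident value for each fact."""
    best = {}
    for entity, attribute, value, confidence in claims:
        key = (entity, attribute)
        if key not in best or confidence > best[key][1]:
            best[key] = (value, confidence)
    return {key: value for key, (value, _confidence) in best.items()}
```

Ties keep whichever claim was seen first, so input order matters when confidences are equal.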


## Building Temporal Transaction Network Graph


In [None]:
from semantica.kg import GraphBuilder

graph_builder = GraphBuilder(
    merge_entities=True,
    resolve_conflicts=True,
    entity_resolution_strategy="fuzzy",
    enable_temporal=True,
    temporal_granularity=TEMPORAL_GRANULARITY
)

print("Building knowledge graph...")
kg_sources = [{
    "entities": [{"text": e.text, "type": e.label, "confidence": e.confidence} for e in merged_entities],
    "relationships": [{"source": r.source, "target": r.target, "type": r.label, "confidence": r.confidence} for r in all_relationships]
}]

kg = graph_builder.build(kg_sources)

entities_count = len(kg.get('entities', []))
relationships_count = len(kg.get('relationships', []))
print(f"Graph: {entities_count} entities, {relationships_count} relationships")
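With `enable_temporal=True` and `temporal_granularity="day"`, timestamps are presumably compared at day resolution. The sketch below shows what day-level bucketing does: each timestamp is truncated to midnight, and edges on the same calendar day share a bucket. The `hour` and `month` options and the `(timestamp, edge_id)` input format are assumptions for illustration. Note that the sample sentences carry block numbers but no dates; real timestamps would come from block headers.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical granularity options; the notebook itself only uses "day".
TRUNCATORS = {
    "hour": lambda ts: ts.replace(minute=0, second=0, microsecond=0),
    "day": lambda ts: ts.replace(hour=0, minute=0, second=0, microsecond=0),
    "month": lambda ts: ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0),
}


def bucket_by_granularity(events: list[tuple[datetime, str]], granularity: str = "day") -> dict[datetime, list[str]]:
    """Group (timestamp, edge_id) pairs by the timestamp truncated to `granularity`."""
    if granularity not in TRUNCATORS:
        raise ValueError(f"unsupported granularity: {granularity}")
    truncate = TRUNCATORS[granularity]
    buckets = defaultdict(list)
    for ts, edge_id in events:
        buckets[truncate(ts)].append(edge_id)
    return dict(buckets)
```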


## Generating Embeddings for Transactions and Wallets


In [None]:
from semantica.embeddings import EmbeddingGenerator

embedding_gen = EmbeddingGenerator(
    provider="sentence_transformers",
    model=EMBEDDING_MODEL
)

print(f"Generating embeddings for {len(transactions)} transactions and {len(wallets)} wallets...")
transaction_texts = [t.text for t in transactions]
transaction_embeddings = embedding_gen.generate_embeddings(transaction_texts)

wallet_texts = [w.text for w in wallets]
wallet_embeddings = embedding_gen.generate_embeddings(wallet_texts)

print(f"Generated {len(transaction_embeddings)} transaction embeddings and {len(wallet_embeddings)} wallet embeddings")


## Populating Vector Store


In [None]:
from semantica.vector_store import VectorStore

vector_store = VectorStore(backend="faiss", dimension=EMBEDDING_DIMENSION)

print(f"Storing {len(transaction_embeddings)} transaction vectors and {len(wallet_embeddings)} wallet vectors...")
transaction_ids = vector_store.store_vectors(
    vectors=transaction_embeddings,
    metadata=[{"type": "transaction", "name": t.text, "label": t.label} for t in transactions]
)

wallet_ids = vector_store.store_vectors(
    vectors=wallet_embeddings,
    metadata=[{"type": "wallet", "name": w.text, "label": w.label} for w in wallets]
)

print(f"Stored {len(transaction_ids)} transaction vectors and {len(wallet_ids)} wallet vectors")
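Once vectors are stored, a similarity search ranks them by closeness to a query embedding. The exact brute-force version below scores every stored vector with cosine similarity and returns the top k; it is illustrative only and assumes nothing about which index type Semantica configures. (A flat FAISS inner-product index over L2-normalized vectors produces the same ranking.)

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: list[list[float]], metadata: list[dict], k: int = 3) -> list[tuple[float, dict]]:
    """Score every stored vector against `query` and return the k best (score, metadata) pairs."""
    scored = [(cosine(query, v), m) for v, m in zip(vectors, metadata)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # sort on score only, never on the dicts
    return scored[:k]
```

Sorting on the score alone avoids comparing metadata dicts when two scores tie.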


## Analyzing Graph Structure


In [None]:
from semantica.kg import GraphAnalyzer, CentralityCalculator, CommunityDetector

graph_analyzer = GraphAnalyzer()
centrality_calc = CentralityCalculator()
community_detector = CommunityDetector()

analysis = graph_analyzer.analyze_graph(kg)

degree_centrality = centrality_calc.calculate_degree_centrality(kg)
betweenness_centrality = centrality_calc.calculate_betweenness_centrality(kg)
closeness_centrality = centrality_calc.calculate_closeness_centrality(kg)

communities = community_detector.detect_communities(kg, method="louvain")
connectivity = graph_analyzer.analyze_connectivity(kg)

print(f"Graph analytics:")
print(f"  - Communities: {len(communities)}")
print(f"  - Connected components: {len(connectivity.get('components', []))}")
print(f"  - Graph density: {analysis.get('density', 0):.3f}")
print(f"  - Central nodes (degree): {len(degree_centrality)}")
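The metrics printed above have simple definitions on small graphs. The sketch below treats the transaction graph as undirected (so components are weakly connected components), normalizes degree centrality by n - 1 as NetworkX does, and computes density as 2m / (n(n - 1)). Semantica's `CentralityCalculator` and `GraphAnalyzer` may normalize differently, and the edge-list input is an assumption.

```python
from collections import defaultdict, deque


def build_undirected(edges):
    """Adjacency sets from (u, v) pairs; direction is ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_centrality(adj):
    """Degree / (n - 1), the normalization NetworkX uses; single-node graphs score 1.0."""
    n = len(adj)
    if n <= 1:
        return {node: 1.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def connected_components(adj):
    """Breadth-first search from each unvisited node; returns a list of node sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return components


def density(adj):
    """Undirected density 2m / (n(n - 1))."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0
```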


## Temporal Graph Queries


In [None]:
from semantica.kg import TemporalGraphQuery

temporal_query = TemporalGraphQuery(
    enable_temporal_reasoning=True,
    temporal_granularity=TEMPORAL_GRANULARITY
)

query_results = temporal_query.query_at_time(
    kg,
    query={"type": "Transaction"},
    at_time="2024-01-01"
)

evolution = temporal_query.analyze_evolution(kg)
temporal_patterns = temporal_query.detect_temporal_patterns(kg, pattern_type="sequence")

print(f"Temporal queries: {len(query_results)} transactions at query time")
print(f"Temporal patterns detected: {len(temporal_patterns)}")
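A point-in-time query keeps only the facts that were valid at the requested moment. The sketch below assumes each edge carries hypothetical `valid_from` and `valid_until` fields (with `None` meaning no end date) and treats the interval as half-open; how `TemporalGraphQuery.query_at_time` actually stores validity is not shown in this notebook.

```python
from datetime import date


def edges_at(edges: list[dict], at: date) -> list[dict]:
    """Keep edges valid at `at`, using the half-open interval [valid_from, valid_until).
    valid_until=None means the edge has no end date."""
    return [
        e for e in edges
        if e["valid_from"] <= at and (e["valid_until"] is None or at < e["valid_until"])
    ]


def valid_edge_counts(edges: list[dict], days: list[date]) -> dict[date, int]:
    """A tiny evolution view: how many edges are valid on each requested day."""
    return {day: len(edges_at(edges, day)) for day in days}
```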


## Detecting Patterns and Whale Movements


In [None]:
# Detect whale movements (large transactions)
whale_wallets = []
for entity in kg.get("entities", []):
    if entity.get("type") in ["Wallet", "Address"]:
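        # Flag wallets whose relationships carry a "large" or "whale" label.
        # The relation_types requested during extraction do not include these labels,
        # so this only matches if the LLM emits them on its own.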
        related_rels = [r for r in kg.get("relationships", []) 
                        if r.get("source") == entity.get("id") or r.get("target") == entity.get("id")]
        if any("large" in str(r.get("type", "")).lower() or "whale" in str(r.get("type", "")).lower() 
               for r in related_rels):
            whale_wallets.append(entity)

# Flag high-activity wallets: more than 5 transactions within one hop (first 10 wallets only)
suspicious_patterns = []
for wallet in wallets[:10]:
    wallet_name = wallet.text
    paths = graph_analyzer.find_paths(
        kg,
        source=wallet_name,
        target_type="Transaction",
        max_hops=1
    )
    if len(paths) > 5:  # High transaction frequency
        suspicious_patterns.append({
            'wallet': wallet_name,
            'transaction_count': len(paths)
        })

print(f"Whale tracking: {len(whale_wallets)} large transaction wallets identified")
print(f"Suspicious patterns: {len(suspicious_patterns)} high-frequency wallets")
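The whale check above depends on relationship labels. A more direct heuristic flags transfers whose amount crosses a per-token threshold, and counts how many transfers each wallet takes part in. The thresholds, dict keys, and function names below are hypothetical choices for illustration; sensible thresholds depend on the chain, the token, and the period analyzed.

```python
from collections import Counter

# Hypothetical per-token whale thresholds; tune them for the chain, token, and period you analyze.
WHALE_THRESHOLDS = {"ETH": 10_000, "BTC": 1_000, "USDT": 10_000_000, "USDC": 10_000_000, "DAI": 10_000_000}


def find_whale_transfers(transfers, thresholds=WHALE_THRESHOLDS):
    """transfers: dicts with 'tx', 'amount', 'token', 'sender', 'receiver' keys (assumed shape).
    Return the transfers whose amount reaches the threshold for their token."""
    return [
        t for t in transfers
        if t["token"] in thresholds and t["amount"] >= thresholds[t["token"]]
    ]


def wallets_involved(transfers):
    """Every sender and receiver address that appears in `transfers`."""
    return {t["sender"] for t in transfers} | {t["receiver"] for t in transfers}


def high_activity_wallets(transfers, threshold=5):
    """Wallets taking part in more than `threshold` transfers (same cut-off as the cell above)."""
    counts = Counter()
    for t in transfers:
        counts[t["sender"]] += 1
        counts[t["receiver"]] += 1
    return {wallet: n for wallet, n in counts.items() if n > threshold}
```

On the seven sample transfers, only 0x789 (10000 ETH) crosses these thresholds, matching the sentence that labels it a whale movement.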


## Analyzing Token Flows


In [None]:
# Analyze token flows through the network
flow_analysis = []
for transaction in transactions[:10]:
    tx_name = transaction.text
    paths = graph_analyzer.find_paths(
        kg,
        source=tx_name,
        target_type="Wallet",
        max_hops=2
    )
    for path in paths:
        if path.get('target_type') in ['Wallet', 'Address']:
            flow_analysis.append({
                'transaction': tx_name,
                'flow_path': path.get('path', []),
                'target': path.get('target'),
                'path_length': len(path.get('path', []))
            })

flow_analysis.sort(key=lambda x: x['path_length'])

print(f"Flow analysis: {len(flow_analysis)} token flow paths identified")
for i, flow in enumerate(flow_analysis[:5], 1):
    print(f"{i}. {flow['transaction']} -> {flow['target']} (path length: {flow['path_length']})")
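Token flow analysis follows directed edges (sender to receiver) for a bounded number of hops, which is what `max_hops=2` limits above. The breadth-first sketch below enumerates simple paths from a source node; the adjacency-dict input is an assumed representation, not the structure `GraphAnalyzer.find_paths` uses internally.

```python
from collections import deque


def paths_within_hops(adj: dict[str, list[str]], source: str, max_hops: int) -> list[list[str]]:
    """Breadth-first enumeration of simple directed paths from `source` with 1..max_hops edges."""
    results = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            results.append(path)
        if len(path) - 1 == max_hops:
            continue  # hop budget used up; do not extend this path
        for nbr in adj.get(path[-1], []):
            if nbr not in path:  # no revisits, so paths stay simple and cycles cannot loop forever
                queue.append(path + [nbr])
    return results
```

Breadth-first order returns shorter paths first. The number of paths can grow quickly with `max_hops` on dense graphs, so keep the bound small.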


## Storing Transaction Network (Optional)


In [None]:
from semantica.graph_store import GraphStore

# Optional: Store to persistent graph database
# graph_store = GraphStore(backend="neo4j", uri="bolt://localhost:7687", user="neo4j", password="password")
# graph_store.store_graph(kg)

print("Graph store step skipped; uncomment the lines above to persist the graph to Neo4j")


## GraphRAG: Hybrid Vector + Graph Queries


In [None]:
from semantica.context import AgentContext

context = AgentContext(vector_store=vector_store, knowledge_graph=kg)

query = "What are the largest transactions?"
results = context.retrieve(
    query,
    max_results=10,
    use_graph=True,
    expand_graph=True,
    include_entities=True,
    include_relationships=True
)

print(f"GraphRAG query: '{query}'")
print(f"\nRetrieved {len(results)} results:\n")
for i, result in enumerate(results[:5], 1):
    print(f"{i}. Score: {result.get('score', 0):.3f}")
    print(f"   Content: {result.get('content', '')[:200]}...")
    if result.get('related_entities'):
        print(f"   Related entities: {len(result['related_entities'])}")
    print()
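With `expand_graph=True`, retrieval presumably starts from vector-search hits and pulls in nearby graph nodes. The sketch below illustrates that expansion step: given seed node IDs and an undirected adjacency dict (both assumed inputs), it returns every node within `hops` hops. It is a conceptual illustration, not the implementation of `AgentContext.retrieve`.

```python
def expand_seeds(seed_ids, adj, hops=1):
    """Return the seed nodes plus every node within `hops` hops in an undirected adjacency dict."""
    seen = set(seed_ids)
    frontier = set(seed_ids)
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            next_frontier |= set(adj.get(node, ())) - seen
        seen |= next_frontier
        frontier = next_frontier
        if not frontier:
            break  # nothing new to expand
    return seen
```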


## Visualizing the Transaction Network


In [None]:
from semantica.visualization import KGVisualizer

visualizer = KGVisualizer()
visualizer.visualize(
    kg,
    output_path="transaction_network.html",
    layout="hierarchical",
    node_size=20
)

print("Visualization saved to transaction_network.html")


## Exporting Results


In [None]:
from semantica.export import GraphExporter

exporter = GraphExporter()
exporter.export(kg, output_path="transaction_network.json", format="json")
exporter.export(kg, output_path="transaction_network.graphml", format="graphml")
exporter.export(kg, output_path="transaction_network.csv", format="csv")

print("Exported transaction network to JSON, GraphML, and CSV formats")
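If you need an edge list with a known column layout (for a spreadsheet or Neo4j's `LOAD CSV`), writing it directly with the standard `csv` module is straightforward. The sketch assumes relationship dicts with the `source`, `target`, `type`, and `confidence` keys built earlier in this notebook; the exact layout of `GraphExporter`'s CSV output is not documented here.

```python
import csv
from typing import IO, Iterable

FIELDS = ["source", "target", "type", "confidence"]


def edges_to_csv(relationships: Iterable[dict], stream: IO[str]) -> None:
    """Write relationship dicts as CSV rows; missing keys become empty cells, extra keys are dropped."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    for rel in relationships:
        writer.writerow(rel)
```

When writing to a file, open it with `newline=""` as the `csv` module documentation recommends, e.g. `open("transaction_edges.csv", "w", newline="")`.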
