[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/finance/02_Fraud_Detection.ipynb)

# Fraud Detection - Temporal KGs & Pattern Detection

## Overview

This notebook demonstrates **fraud detection** with Semantica, focusing on **temporal knowledge graphs**, **anomaly detection**, and **pattern recognition**. The pipeline builds a temporal knowledge graph from a transaction stream and analyzes it to detect fraud patterns and anomalies in real time.

### Key Features

- **Temporal Knowledge Graphs**: Builds temporal KGs to track transaction patterns over time
- **Anomaly Detection**: Uses graph-based pattern recognition to identify fraud
- **Real-Time Alerts**: Generates automated alerts for detected fraud patterns
- **Pattern Recognition**: Identifies suspicious transaction patterns
- **Stream Processing**: Ingests a transaction stream (simulated here with a sample log file; Kafka ingestion via `StreamIngestor` is shown as a commented-out option)
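
To make the idea of a temporal knowledge graph concrete, here is a minimal, standard-library sketch in which every transaction is a timestamped edge between two accounts and queries are restricted to a time window. `TemporalEdge`, `TemporalGraph`, and the `transfers_to` predicate are hypothetical names for illustration; this is not Semantica's internal representation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TemporalEdge:
    """A directed, timestamped relationship (hypothetical; not Semantica's data model)."""
    source: str
    target: str
    predicate: str
    timestamp: datetime
    amount: float


@dataclass
class TemporalGraph:
    """Minimal temporal graph: a list of edges plus time-window queries."""
    edges: list[TemporalEdge] = field(default_factory=list)

    def add(self, edge: TemporalEdge) -> None:
        self.edges.append(edge)

    def window(self, start: datetime, end: datetime) -> list[TemporalEdge]:
        """Edges whose timestamp falls in the half-open interval [start, end)."""
        return [e for e in self.edges if start <= e.timestamp < end]

    def outgoing(self, account: str, start: datetime, end: datetime) -> list[TemporalEdge]:
        """Edges leaving `account` within [start, end)."""
        return [e for e in self.window(start, end) if e.source == account]


# Usage with three transfers modeled on the sample log used later in this notebook
t0 = datetime(2024, 1, 1, 10, 0)
graph = TemporalGraph()
graph.add(TemporalEdge("A123", "B456", "transfers_to", t0, 1000.0))
graph.add(TemporalEdge("A123", "C789", "transfers_to", t0 + timedelta(minutes=1), 5000.0))
graph.add(TemporalEdge("B456", "E789", "transfers_to", t0 + timedelta(minutes=4), 2000.0))

recent = graph.outgoing("A123", t0, t0 + timedelta(minutes=2))
print([e.target for e in recent])  # ['B456', 'C789']
```

The linear scan keeps the sketch short; a real temporal store would index edges by time (for example, a sorted list searched with `bisect`) so window queries do not touch every edge.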

### Pipeline Architecture

1. **Phase 0**: Setup & Configuration
2. **Phase 1**: Transaction Stream Ingestion
3. **Phase 2**: Transaction Entity Extraction
4. **Phase 3**: Temporal Knowledge Graph Construction
5. **Phase 4**: Pattern Detection
6. **Phase 5**: Anomaly Detection
7. **Phase 6**: Real-Time Alerting
8. **Phase 7**: Visualization & Export
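
Phase 6 (Real-Time Alerting) has no dedicated cell in this notebook. As a hedged sketch of what it could involve, the class below evaluates each transaction as it arrives, keeps a per-account sliding window of recent transfers, and emits JSON-serializable alerts. `StreamingAlerter`, the rule names, and the thresholds (3 transfers within 5 minutes, $20,000) are illustrative assumptions, not Semantica features.

```python
import json
from collections import defaultdict, deque
from datetime import datetime, timedelta


class StreamingAlerter:
    """Hypothetical real-time alerting sketch; assumes transactions arrive in time order."""

    def __init__(self, window=timedelta(minutes=5), max_in_window=3, large_amount=20_000.0):
        self.window = window
        self.max_in_window = max_in_window
        self.large_amount = large_amount
        # account -> timestamps of that account's recent outgoing transfers
        self.recent = defaultdict(deque)

    def process(self, tx: dict) -> list[dict]:
        """Evaluate one transaction and return any alerts it triggers."""
        alerts = []
        timestamps = self.recent[tx["source"]]
        timestamps.append(tx["timestamp"])
        # Evict transfers more than `window` older than the current one
        while tx["timestamp"] - timestamps[0] > self.window:
            timestamps.popleft()
        if len(timestamps) >= self.max_in_window:
            alerts.append({"account": tx["source"], "rule": "velocity",
                           "count": len(timestamps), "at": tx["timestamp"].isoformat()})
        if tx["amount"] >= self.large_amount:
            alerts.append({"account": tx["source"], "rule": "large_amount",
                           "amount": tx["amount"], "at": tx["timestamp"].isoformat()})
        return alerts


t0 = datetime(2024, 1, 1, 10, 0)
stream = [  # (minutes after 10:00, source, amount), mirroring the sample log
    (0, "A123", 1000.0), (1, "A123", 5000.0), (2, "A123", 10000.0),
    (4, "B456", 2000.0), (5, "A123", 50000.0),
]
alerter = StreamingAlerter()
all_alerts = []
for minutes, source, amount in stream:
    tx = {"timestamp": t0 + timedelta(minutes=minutes), "source": source, "amount": amount}
    all_alerts.extend(alerter.process(tx))

# One JSON object per line, e.g. for a .jsonl alert log or a message queue
alert_lines = "\n".join(json.dumps(a) for a in all_alerts)
print(alert_lines)
```

As written, an account keeps re-triggering the velocity rule while its window stays above the threshold; a production alerting step would usually deduplicate or rate-limit alerts per account.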

---

## Installation


In [None]:
%pip install -qU semantica networkx matplotlib plotly pandas groq


---

## Phase 0: Setup & Configuration


In [None]:
import os
from semantica.core import Semantica, ConfigManager
from semantica.reasoning import GraphReasoner

# Replace "your-key" with a real Groq API key, or export GROQ_API_KEY before starting the notebook
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "your-key")

config_dict = {
    "project_name": "Fraud_Detection",
    "extraction": {"provider": "groq", "model": "llama-3.1-8b-instant"},
    "knowledge_graph": {"backend": "networkx", "temporal": True}
}

config = ConfigManager().load_from_dict(config_dict)
core = Semantica(config=config)
print("Configured Semantica for fraud detection with temporal KG support")


---

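## Phase 1: Transaction Stream Ingestion

Ingest transaction data. In production, `StreamIngestor` would read from a live stream such as Kafka; here, a sample transaction log is written to disk and loaded with `FileIngestor`.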
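
The sample log written in the next cell uses one line per event, for example `2024-01-01 10:00:00 - Transaction $1000 from Account A123 to Account B456`. If you want structured records alongside the LLM-based extraction used later, a small regular-expression parser is enough for this format. The sketch below is an illustration under that assumption (standard library only; `parse_transactions` is a hypothetical helper, not part of Semantica). Lines without an amount and counterparty, such as the "Multiple rapid transactions" line, are skipped.

```python
import re
from datetime import datetime

# Matches lines such as:
# "2024-01-01 10:00:00 - Transaction $1000 from Account A123 to Account B456 (note)"
TX_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - .*?"
    r"\$(?P<amount>[\d,]+(?:\.\d+)?) from Account (?P<src>\w+) "
    r"to Account (?P<dst>\w+)(?: \((?P<note>[^)]*)\))?$"
)


def parse_transactions(text: str) -> list[dict]:
    """Parse transfer lines into dicts; lines that do not match are skipped."""
    records = []
    for line in text.strip().splitlines():
        m = TX_PATTERN.match(line.strip())
        if m is None:
            continue  # e.g. "Multiple rapid transactions ..." has no amount or target
        records.append({
            "timestamp": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
            "amount": float(m["amount"].replace(",", "")),
            "source": m["src"],
            "target": m["dst"],
            "note": m["note"],  # None when the line has no trailing "(...)"
        })
    return records


sample = """
2024-01-01 10:00:00 - Transaction $1000 from Account A123 to Account B456
2024-01-01 10:03:00 - Multiple rapid transactions from Account A123 (suspicious)
2024-01-01 10:05:00 - Large transaction $50000 from Account A123 to Account F012 (fraud alert)
"""
records = parse_transactions(sample)
print(len(records))  # 2
```

In the notebook, `parse_transactions(tx_data)` would apply the same parser to the full sample log.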


In [None]:
from semantica.ingest import StreamIngestor, FileIngestor
import os

os.makedirs("data", exist_ok=True)

# Production option (not run here): ingest from a Kafka topic with StreamIngestor
# stream_ingestor = StreamIngestor()
# stream_documents = stream_ingestor.ingest("kafka://localhost:9092/transactions", method="kafka")

# Demo: write a sample transaction log to disk and ingest it as a file
tx_data = """
2024-01-01 10:00:00 - Transaction $1000 from Account A123 to Account B456
2024-01-01 10:01:00 - Transaction $5000 from Account A123 to Account C789
2024-01-01 10:02:00 - Transaction $10000 from Account A123 to Account D012 (unusual pattern)
2024-01-01 10:03:00 - Multiple rapid transactions from Account A123 (suspicious)
2024-01-01 10:04:00 - Transaction $2000 from Account B456 to Account E789
2024-01-01 10:05:00 - Large transaction $50000 from Account A123 to Account F012 (fraud alert)
"""

with open("data/transactions.txt", "w") as f:
    f.write(tx_data)

documents = FileIngestor().ingest("data/transactions.txt")
print(f"Ingested {len(documents)} documents from the sample transaction log")


---

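## Phase 2: Normalization, Temporal KG Construction & Conflict Detection

Normalize the transaction text, build a temporal knowledge graph from it, and detect (and, where found, resolve) conflicting facts such as inconsistent account information from different sources.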
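
The code below calls `resolve_conflicts(conflicts, strategy="highest_confidence")`. As a rough illustration of what a highest-confidence strategy generally means (not how Semantica implements it), this standard-library sketch groups claims by `(entity, attribute)`, treats groups with more than one distinct value as conflicts, and keeps the value with the highest confidence. The claim fields (`entity`, `attribute`, `value`, `confidence`, `source`) and the sample claims are hypothetical.

```python
from collections import defaultdict


def find_conflicts(claims: list[dict]) -> dict[tuple, list[dict]]:
    """Group claims by (entity, attribute); groups with >1 distinct value conflict."""
    groups = defaultdict(list)
    for claim in claims:
        groups[(claim["entity"], claim["attribute"])].append(claim)
    return {key: group for key, group in groups.items()
            if len({c["value"] for c in group}) > 1}


def resolve_by_highest_confidence(conflicts: dict[tuple, list[dict]]) -> dict[tuple, object]:
    """For each conflicting key, keep the value of the most confident claim."""
    return {key: max(group, key=lambda c: c["confidence"])["value"]
            for key, group in conflicts.items()}


# Hypothetical claims about the same accounts from two source systems
claims = [
    {"entity": "A123", "attribute": "account_type", "value": "checking", "confidence": 0.9, "source": "core_banking"},
    {"entity": "A123", "attribute": "account_type", "value": "savings", "confidence": 0.6, "source": "crm"},
    {"entity": "B456", "attribute": "country", "value": "US", "confidence": 0.8, "source": "crm"},
]
demo_conflicts = find_conflicts(claims)
demo_resolved = resolve_by_highest_confidence(demo_conflicts)
print(demo_resolved)  # {('A123', 'account_type'): 'checking'}
```

Ties go to the first claim seen, because `max` returns the first maximal element; a real resolver might break ties by source priority or recency instead.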


In [None]:
from semantica.normalize import TextNormalizer
from semantica.conflicts import ConflictDetector

# Normalize transaction data
normalizer = TextNormalizer()
normalized_documents = []
for doc in documents:
    normalized_text = normalizer.normalize(
        doc.content if hasattr(doc, 'content') else str(doc),
        clean_html=True,
        normalize_entities=True,
        normalize_numbers=True,  # Normalize transaction amounts
        remove_extra_whitespace=True
    )
    normalized_documents.append(normalized_text)

print(f"Normalized {len(normalized_documents)} documents")

# Build temporal knowledge graph
result = core.build_knowledge_base(
    sources=normalized_documents,
    custom_entity_types=["Transaction", "Account", "Device", "Pattern", "Anomaly"],
    graph=True,
    temporal=True
)

kg = result["knowledge_graph"]
entities = result["entities"]

# Detect conflicts in transaction data (e.g., conflicting account information)
detector = ConflictDetector()
conflicts = detector.detect_conflicts(entities, kg.get("relationships", []))

print(f"Built temporal transaction KG with {len(kg.get('entities', []))} entities")
print(f"Detected {len(conflicts)} conflicts in transaction data")
if conflicts:
    resolved = detector.resolve_conflicts(conflicts, strategy="highest_confidence")
    print(f"Resolved {len(resolved)} conflicts")
print("Focus: Temporal KGs, anomaly detection, pattern recognition, real-time alerts")


---

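## Phase 3-4: Temporal Pattern & Anomaly Detection

Detect fraud patterns with `TemporalPatternDetector` and `TemporalGraphQuery`, then use `GraphReasoner` and a scan of relationship predicates to flag suspicious accounts.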
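
The internals of `TemporalPatternDetector` are not shown here. For intuition about one common temporal fraud signal, transaction velocity (many outgoing transfers from one account in a short window), here is an independent sliding-window sketch using only the standard library. `detect_rapid_fanout`, the 5-minute window, and the threshold of 3 transfers are illustrative assumptions; the input is a list of dicts with `timestamp`, `source`, and `target` keys.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def detect_rapid_fanout(transactions, window=timedelta(minutes=5), min_count=3):
    """Return {account: summary} for senders with >= min_count transfers within `window`.

    The summary describes the busiest window seen for that account.
    """
    by_source = defaultdict(list)
    for tx in transactions:
        by_source[tx["source"]].append(tx)

    flagged = {}
    for account, txs in by_source.items():
        txs.sort(key=lambda t: t["timestamp"])
        recent = deque()
        for tx in txs:
            recent.append(tx)
            # Keep only transfers no more than `window` older than the current one
            while tx["timestamp"] - recent[0]["timestamp"] > window:
                recent.popleft()
            best = flagged.get(account)
            if len(recent) >= min_count and (best is None or len(recent) > best["count"]):
                flagged[account] = {
                    "count": len(recent),
                    "counterparties": sorted({t["target"] for t in recent}),
                    "window_start": recent[0]["timestamp"],
                }
    return flagged


# Transfers mirroring the sample log (minutes after 10:00, source, target)
t0 = datetime(2024, 1, 1, 10, 0)
demo_txs = [
    {"timestamp": t0 + timedelta(minutes=m), "source": s, "target": d}
    for m, s, d in [(0, "A123", "B456"), (1, "A123", "C789"), (2, "A123", "D012"),
                    (4, "B456", "E789"), (5, "A123", "F012")]
]
velocity_flags = detect_rapid_fanout(demo_txs)
print(velocity_flags["A123"]["count"], velocity_flags["A123"]["counterparties"])
# 4 ['B456', 'C789', 'D012', 'F012']
```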


In [None]:
from semantica.kg import TemporalGraphQuery, TemporalPatternDetector

# Initialize temporal pattern detector
temporal_query = TemporalGraphQuery(enable_temporal_reasoning=True, temporal_granularity="minute")
pattern_detector = TemporalPatternDetector()

# Detect temporal fraud patterns
fraud_patterns = pattern_detector.detect_patterns(kg, pattern_type="fraud")
temporal_patterns = temporal_query.detect_temporal_patterns(kg, pattern_type="sequence")

print(f"Detected {len(fraud_patterns)} fraud patterns")
print(f"Detected {len(temporal_patterns)} temporal patterns")


In [None]:
# Detect fraud patterns using reasoning
reasoner = GraphReasoner(kg)
reasoning_patterns = reasoner.find_patterns(pattern_type="fraud")

# Flag Account entities that are the source of a relationship whose predicate
# mentions "suspicious" or "fraud"
flag_terms = ("suspicious", "fraud")
flagged_sources = {
    r.get("source")
    for r in kg.get("relationships", [])
    if any(term in str(r.get("predicate", "")).lower() for term in flag_terms)
}
suspicious_accounts = [
    e for e in kg.get("entities", [])
    if e.get("type") == "Account" and e.get("id") in flagged_sources
]

print(f"Pattern detection: {len(reasoning_patterns)} fraud patterns from reasoning")
print(f"Temporal patterns: {len(temporal_patterns)} temporal sequence patterns")
print(f"Anomaly detection: {len(suspicious_accounts)} suspicious accounts flagged")
print("This cookbook emphasizes temporal KGs, TemporalPatternDetector, and pattern-based fraud detection")


---
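
The suspicious-account check above depends on relationship predicates produced by LLM extraction. A complementary numeric signal, sketched here under stated assumptions, flags a transfer whose amount exceeds a multiple of the sender's average earlier transfer. `flag_amount_spikes`, the 3x ratio, and the two-transfer minimum history are hypothetical choices for illustration, and transactions are assumed to be in chronological order.

```python
from collections import defaultdict


def flag_amount_spikes(transactions, ratio=3.0, min_history=2):
    """Flag transfers larger than `ratio` x the mean of the sender's earlier transfers.

    Assumes `transactions` are in chronological order; senders with fewer than
    `min_history` earlier transfers are not scored.
    """
    history = defaultdict(list)  # sender -> amounts seen so far
    spikes = []
    for tx in transactions:
        past = history[tx["source"]]
        if len(past) >= min_history:
            baseline = sum(past) / len(past)
            if baseline > 0 and tx["amount"] > ratio * baseline:
                spikes.append({**tx, "baseline": baseline, "ratio": tx["amount"] / baseline})
        past.append(tx["amount"])
    return spikes


# Amounts mirroring the sample log, in time order
demo_amounts = [
    {"source": "A123", "target": "B456", "amount": 1000.0},
    {"source": "A123", "target": "C789", "amount": 5000.0},
    {"source": "A123", "target": "D012", "amount": 10000.0},
    {"source": "B456", "target": "E789", "amount": 2000.0},
    {"source": "A123", "target": "F012", "amount": 50000.0},
]
amount_spikes = flag_amount_spikes(demo_amounts)
for spike in amount_spikes:
    print(spike["target"], round(spike["ratio"], 2))
```

A mean is pulled upward by earlier outliers; a median-based baseline is a common, more robust alternative when account histories are noisy.

---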

## Phase 7: Visualization & Summary

Visualize the temporal fraud-detection knowledge graph (written to `fraud_detection_kg.html`) and print a pipeline summary.


In [None]:
from semantica.visualization import KGVisualizer

visualizer = KGVisualizer()
visualizer.visualize(kg, output_path="fraud_detection_kg.html", layout="temporal")

print("Fraud detection analysis complete")
print("\n=== Pipeline Summary ===")
print(f"✓ Ingested {len(documents)} documents from the sample transaction log")
print(f"✓ Normalized {len(normalized_documents)} documents")
print(f"✓ Detected {len(conflicts)} conflicts (resolved with the highest-confidence strategy)")
print(f"✓ Built temporal KG with {len(kg.get('entities', []))} entities")
print(f"✓ Detected {len(reasoning_patterns)} fraud patterns and {len(suspicious_accounts)} suspicious accounts")
print("✓ Focus: temporal KGs, TemporalPatternDetector, anomaly detection, and pattern recognition")
