[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/finance/02_Fraud_Detection.ipynb)

# Fraud Detection - Temporal KGs & Pattern Detection

## Overview

This notebook demonstrates **fraud detection** with Semantica, focusing on **temporal knowledge graphs**, **anomaly detection**, and **pattern recognition**. The pipeline ingests transaction data, builds a temporal knowledge graph from it, and queries that graph for fraud patterns and anomalies.

### Key Features

- **Temporal Knowledge Graphs**: Builds temporal KGs to track transaction patterns over time
- **Pattern Detection**: Uses `TemporalPatternDetector` and reasoning to detect fraud patterns
- **Anomaly Detection**: Uses graph-based pattern recognition to flag anomalous transactions
- **Conflict Detection**: Detects conflicting transaction data from multiple sources
- **Real-Time Stream Processing**: Shows how to configure stream ingestion (Kafka/RabbitMQ); the stream code is commented out and must be configured for production
- **Comprehensive Data Sources**: Local CSV and JSON files, payment processor APIs, and references to public fraud datasets
- **Modular Architecture**: Uses Semantica modules directly, without the core orchestrator

### Learning Objectives

- Ingest transaction data from streams and APIs
- Extract transaction entities (Transactions, Accounts, Devices, Patterns, Anomalies)
- Build temporal transaction knowledge graphs
- Perform temporal queries and pattern detection
- Detect fraud patterns using graph reasoning
- Analyze transaction networks using graph analytics
- Store and query transaction data using vector stores

### Pipeline Flow

```mermaid
graph TD
    A[Data Ingestion] --> B[Document Parsing]
    B --> C[Text Processing]
    C --> D[Entity Extraction]
    D --> E[Relationship Extraction]
    E --> F[Deduplication]
    F --> G[Conflict Detection]
    G --> H[Temporal Knowledge Graph]
    H --> I[Embeddings]
    I --> J[Vector Store]
    H --> K[Temporal Queries]
    K --> L[Temporal Pattern Detection]
    L --> M[Reasoning & Fraud]
    M --> N[Graph Analytics]
    J --> O[GraphRAG Queries]
    N --> O
    O --> P[Visualization]
    P --> Q[Export]
```

## Installation


In [None]:
%pip install -qU semantica networkx matplotlib plotly pandas faiss-cpu beautifulsoup4 groq sentence-transformers scikit-learn


## Configuration & Setup


In [1]:
import os

# Reuse GROQ_API_KEY from the environment; the LLM extraction cells need a non-empty key.
os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY", "")

# Configuration constants
EMBEDDING_DIMENSION = 384
EMBEDDING_MODEL = "all-MiniLM-L6-v2"
CHUNK_SIZE = 500
CHUNK_OVERLAP = 50
TEMPORAL_GRANULARITY = "minute"
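
The configuration cell falls back to an empty string when `GROQ_API_KEY` is unset, which postpones the failure until the first LLM call. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper written for this example and is not part of Semantica.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of Semantica.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the LLM cells")
    return value


# Usage (commented out so the sketch runs without credentials):
# groq_key = require_env("GROQ_API_KEY")
```

Calling it once at the top of the notebook surfaces a missing key before any ingestion or extraction work starts.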


## Ingesting Transaction Data from Streams


In [2]:
import os
from contextlib import redirect_stderr
from io import StringIO

from semantica.ingest import StreamIngestor, WebIngestor, FileIngestor

os.makedirs("data", exist_ok=True)

all_documents = []

# ============================================================================
# REAL DATA SOURCE 1: CSV Transaction Data
# ============================================================================
# Ingest transaction data from CSV file with comprehensive fraud indicators
print("📊 Ingesting transaction data from CSV...")
csv_path = "data/transactions.csv"
file_ingestor = FileIngestor()

try:
    csv_documents = file_ingestor.ingest(csv_path)
    for doc in csv_documents:
        if not hasattr(doc, 'metadata'):
            doc.metadata = {}
        doc.metadata['source'] = 'Transaction CSV'
        doc.metadata['data_type'] = 'transaction'
        all_documents.append(doc)
    print(f"  ✅ Loaded {len(csv_documents)} documents from CSV")
except Exception as e:
    print(f"  ⚠️  CSV ingestion failed: {e}")

# ============================================================================
# REAL DATA SOURCE 2: JSON Account & Fraud Pattern Data
# ============================================================================
# Ingest account metadata, device information, and fraud patterns from JSON
print("📊 Ingesting account and fraud pattern data from JSON...")
json_path = "data/accounts.json"

try:
    json_documents = file_ingestor.ingest(json_path)
    for doc in json_documents:
        if not hasattr(doc, 'metadata'):
            doc.metadata = {}
        doc.metadata['source'] = 'Account JSON'
        doc.metadata['data_type'] = 'account_metadata'
        all_documents.append(doc)
    print(f"  ✅ Loaded {len(json_documents)} documents from JSON")
except Exception as e:
    print(f"  ⚠️  JSON ingestion failed: {e}")

# ============================================================================
# REAL DATA SOURCE 3: External Payment Processor API (Example)
# ============================================================================
# In production, you would use real APIs like:
# - Stripe API: https://api.stripe.com/v1/charges
# - PayPal API: https://api.paypal.com/v1/payments
# - Square API: https://connect.squareup.com/v2/payments
#
# These endpoints require authentication, so the unauthenticated requests below
# are expected to fail and the notebook falls back to the local data sources.
# In production, replace them with your own endpoints and credentials.
print("📊 Attempting to ingest from payment processor API...")
payment_apis = [
    "https://api.stripe.com/v1/charges",  # Stripe (requires API key)
    "https://api.paypal.com/v1/payments",  # PayPal (requires OAuth)
    # Add your actual API endpoints here
]

web_ingestor = WebIngestor()
api_success = False

for api_url in payment_apis:
    try:
        with redirect_stderr(StringIO()):
            api_documents = web_ingestor.ingest(api_url, method="url")
        if api_documents:
            for doc in api_documents:
                if not hasattr(doc, 'metadata'):
                    doc.metadata = {}
                doc.metadata['source'] = f'Payment API ({api_url.split("//")[1].split("/")[0]})'
                doc.metadata['data_type'] = 'api_transaction'
                all_documents.append(doc)
            print(f"  ✅ Loaded {len(api_documents)} documents from {api_url}")
            api_success = True
            break
    except Exception:
        # Unauthenticated or unreachable endpoint: try the next one
        continue

if not api_success:
    print("  ℹ️  API endpoints require authentication. Using local data sources.")

# ============================================================================
# REAL DATA SOURCE 4: Stream Ingestion (Kafka/RabbitMQ)
# ============================================================================
# For real-time fraud detection, ingest from message streams
# Example Kafka configuration:
# stream_config = {
#     "bootstrap_servers": "localhost:9092",
#     "topic": "transactions",
#     "group_id": "fraud_detection"
# }
# stream_ingestor = StreamIngestor()
# stream_documents = stream_ingestor.ingest(stream_config, method="kafka")
print("📊 Stream ingestion (Kafka/RabbitMQ) - Configure in production")

# ============================================================================
# PUBLIC FRAUD DETECTION DATASETS (References)
# ============================================================================
# For additional training and testing, consider these public datasets:
# 
# 1. Credit Card Fraud Detection (Kaggle)
#    URL: https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
#    - 284,807 transactions, 492 fraudulent (0.172% fraud rate)
#    - Download and ingest: file_ingestor.ingest("creditcard.csv")
#
# 2. IEEE-CIS Fraud Detection (Kaggle)
#    URL: https://www.kaggle.com/competitions/ieee-fraud-detection
#    - 590,540 transactions, ~3.5% fraudulent
#    - 431 features (400 numerical, 31 categorical)
#
# 3. PaySim Synthetic Financial Dataset
#    URL: https://www.kaggle.com/datasets/ealaxi/paysim1
#    - ~6.36 million mobile money transactions
#    - ~0.13% fraud rate
#
# 4. UCI Credit Card Dataset
#    URL: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
#    - 30,000 credit card clients
#    - 23 features including payment history
#
# To use these datasets:
# 1. Download the dataset files
# 2. Place them in the data/ directory
# 3. Use FileIngestor to load: file_ingestor.ingest("data/dataset.csv")

# ============================================================================
# Fallback: Generate sample data if no sources available
# ============================================================================
if not all_documents:
    print("⚠️  No data sources found. Generating sample transaction data...")
    tx_data = """
    2024-01-01 10:00:00 - Transaction $1000 from Account A123 to Account B456
    2024-01-01 10:01:00 - Transaction $5000 from Account A123 to Account C789
    2024-01-01 10:02:00 - Transaction $10000 from Account A123 to Account D012 (unusual pattern)
    2024-01-01 10:03:00 - Multiple rapid transactions from Account A123 (suspicious)
    2024-01-01 10:04:00 - Transaction $2000 from Account B456 to Account E789
    2024-01-01 10:05:00 - Large transaction $50000 from Account A123 to Account F012 (fraud alert)
    2024-01-01 10:06:00 - Transaction $1500 from Account C789 to Account G345
    2024-01-01 10:07:00 - Unusual device login from Account A123 (suspicious activity)
    """
    with open("data/transactions.txt", "w") as f:
        f.write(tx_data)
    file_ingestor = FileIngestor()
    all_documents = file_ingestor.ingest("data/transactions.txt")

documents = all_documents
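# Documents from the fallback path may not carry a metadata dict, so read it defensively
source_names = {(getattr(doc, 'metadata', None) or {}).get('source', 'unknown') for doc in documents}
print(f"\n✅ Total ingested: {len(documents)} documents from {len(source_names)} sources")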


📊 Ingesting transaction data from CSV...



  ✅ Loaded 1 documents from CSV
📊 Ingesting account and fraud pattern data from JSON...
  ✅ Loaded 1 documents from JSON
📊 Attempting to ingest from payment processor API...
  ℹ️  API endpoints require authentication. Using local data sources.
📊 Stream ingestion (Kafka/RabbitMQ) - Configure in production

✅ Total ingested: 2 documents from 2 sources
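
The ingestion cell repeats the same metadata-tagging loop for each source. As a sketch, that loop could be factored into one function. `tag_documents` is a hypothetical helper, and the `SimpleNamespace` objects below only stand in for whatever `FileIngestor.ingest` returns.

```python
from types import SimpleNamespace


def tag_documents(docs, source, data_type):
    """Attach source metadata to each document and return them as a list.

    Hypothetical helper; mirrors the per-source loops in the ingestion cell.
    """
    tagged = []
    for doc in docs:
        # Some ingested objects may not carry a metadata dict yet.
        if not isinstance(getattr(doc, "metadata", None), dict):
            doc.metadata = {}
        doc.metadata["source"] = source
        doc.metadata["data_type"] = data_type
        tagged.append(doc)
    return tagged


# Stand-in objects; real documents would come from FileIngestor.ingest(...)
demo_docs = [
    SimpleNamespace(name="transactions.csv"),
    SimpleNamespace(name="accounts.json", metadata=None),
]
tagged = tag_documents(demo_docs, source="Transaction CSV", data_type="transaction")
sources = {d.metadata["source"] for d in tagged}
```

With a helper like this, each source block reduces to one `ingest` call followed by `all_documents.extend(tag_documents(...))`.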


In [22]:
documents

[FileObject(path='d:\\Work\\semantica\\cookbook\\use_cases\\finance\\data\\transactions.csv', name='transactions.csv', size=3356, file_type='csv', mime_type='application/vnd.ms-excel', content=b'timestamp,transaction_id,from_account,to_account,amount,currency,transaction_type,merchant_category,device_id,ip_address,location,is_fraud,risk_score,notes\r\n2024-01-01 10:00:00,TX001,A123,B456,1000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,0,0.15,Normal transfer\r\n2024-01-01 10:01:00,TX002,A123,C789,5000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,0,0.25,Second transfer from same account\r\n2024-01-01 10:02:00,TX003,A123,D012,10000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,1,0.85,Unusual pattern - rapid large transfers\r\n2024-01-01 10:03:00,TX004,A123,E345,7500.00,USD,transfer,internal,DEV002,203.0.113.5,California,1,0.90,Multiple rapid transactions - device change\r\n2024-01-01 10:04:00,TX005,B456,E789,2000.00,USD,transfer,internal,DEV003,198.51.100.1,Chicago

## Parsing Transaction Documents


In [3]:
from semantica.parse import DocumentParser

parser = DocumentParser()

print(f"Parsing {len(documents)} documents...")
parsed_documents = []
for i, doc in enumerate(documents, 1):
    try:
        parsed = parser.parse(
            doc.content if hasattr(doc, 'content') else str(doc),
            content_type="text"
        )
        parsed_documents.append(parsed)
    except Exception:
        parsed_documents.append(doc)
    if i % 50 == 0 or i == len(documents):
        print(f"  Parsed {i}/{len(documents)} documents...")

documents = parsed_documents


Parsing 2 documents...
  Parsed 2/2 documents...


In [5]:
documents[0].content

b'timestamp,transaction_id,from_account,to_account,amount,currency,transaction_type,merchant_category,device_id,ip_address,location,is_fraud,risk_score,notes\r\n2024-01-01 10:00:00,TX001,A123,B456,1000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,0,0.15,Normal transfer\r\n2024-01-01 10:01:00,TX002,A123,C789,5000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,0,0.25,Second transfer from same account\r\n2024-01-01 10:02:00,TX003,A123,D012,10000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,1,0.85,Unusual pattern - rapid large transfers\r\n2024-01-01 10:03:00,TX004,A123,E345,7500.00,USD,transfer,internal,DEV002,203.0.113.5,California,1,0.90,Multiple rapid transactions - device change\r\n2024-01-01 10:04:00,TX005,B456,E789,2000.00,USD,transfer,internal,DEV003,198.51.100.1,Chicago,0,0.20,Normal transfer\r\n2024-01-01 10:05:00,TX006,A123,F012,50000.00,USD,transfer,internal,DEV002,203.0.113.5,California,1,0.95,Large transaction - fraud alert\r\n2024-01-01 10:06:00,TX007,C
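
The output above shows that `content` is still a `bytes` object after parsing. Calling `str()` on bytes yields a string that starts with `b'` and contains escaped `\r\n` sequences, which is what appears in the normalized text and chunks later in this notebook. A minimal sketch of decoding first, using only the standard library; `to_text` is a hypothetical helper and the rows are copied from the output above.

```python
import csv
import io


def to_text(content, encoding="utf-8"):
    """Decode bytes to str; pass other values through str() unchanged."""
    if isinstance(content, (bytes, bytearray)):
        return bytes(content).decode(encoding, errors="replace")
    return str(content)


# Header and two rows copied from the output above.
raw = (
    b"timestamp,transaction_id,from_account,to_account,amount,currency,"
    b"transaction_type,merchant_category,device_id,ip_address,location,"
    b"is_fraud,risk_score,notes\r\n"
    b"2024-01-01 10:00:00,TX001,A123,B456,1000.00,USD,transfer,internal,"
    b"DEV001,192.168.1.1,New York,0,0.15,Normal transfer\r\n"
    b"2024-01-01 10:02:00,TX003,A123,D012,10000.00,USD,transfer,internal,"
    b"DEV001,192.168.1.1,New York,1,0.85,Unusual pattern - rapid large transfers\r\n"
)

text = to_text(raw)
# newline="" leaves line endings untouched so the csv module can handle \r\n itself
rows = list(csv.DictReader(io.StringIO(text, newline="")))
flagged = [r["transaction_id"] for r in rows if r["is_fraud"] == "1"]
```

Passing `to_text(doc.content)` to the parser, normalizer, and splitter instead of `doc.content` would keep the `b'` prefix and escaped line breaks out of the chunks.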

## Normalizing and Chunking Transaction Data


In [6]:
from semantica.normalize import TextNormalizer
from semantica.split import TextSplitter

normalizer = TextNormalizer()
# Use sentence chunking for transaction logs
splitter = TextSplitter(method="sentence", chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)

print(f"Normalizing {len(documents)} documents...")
normalized_documents = []
for i, doc in enumerate(documents, 1):
    normalized_text = normalizer.normalize(
        doc.content if hasattr(doc, 'content') else str(doc),
        clean_html=True,
        normalize_entities=True,
        normalize_numbers=True,
        remove_extra_whitespace=True,
        lowercase=False
    )
    normalized_documents.append(normalized_text)
    if i % 50 == 0 or i == len(documents):
        print(f"  Normalized {i}/{len(documents)} documents...")

print(f"Chunking {len(normalized_documents)} documents...")
chunked_documents = []
for i, doc_text in enumerate(normalized_documents, 1):
    try:
        with redirect_stderr(StringIO()):
            chunks = splitter.split(doc_text)
        chunked_documents.extend(chunks)
    except Exception:
        simple_splitter = TextSplitter(method="recursive", chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP)
        chunks = simple_splitter.split(doc_text)
        chunked_documents.extend(chunks)
    if i % 50 == 0 or i == len(normalized_documents):
        print(f"  Chunked {i}/{len(normalized_documents)} documents ({len(chunked_documents)} chunks so far)")

print(f"Created {len(chunked_documents)} chunks from {len(normalized_documents)} documents")


Normalizing 2 documents...
  Normalized 2/2 documents...
Chunking 2 documents...
  Chunked 2/2 documents (7 chunks so far)
Created 7 chunks from 2 documents
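
Sentence-based splitting treats a CSV file as running prose, so a chunk can start mid-row and lose the header. One possible alternative for tabular input is to chunk by whole rows and repeat the header in each chunk. This is a sketch only: `chunk_csv_rows` is a hypothetical helper, not Semantica's `TextSplitter`.

```python
def chunk_csv_rows(csv_text, rows_per_chunk=3):
    """Split CSV text into chunks of whole rows, repeating the header in each chunk.

    Hypothetical alternative to sentence splitting; not Semantica's TextSplitter.
    """
    lines = [ln for ln in csv_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header, body = lines[0], lines[1:]
    chunks = []
    for start in range(0, len(body), rows_per_chunk):
        chunks.append("\n".join([header] + body[start:start + rows_per_chunk]))
    return chunks


# Columns trimmed for brevity; values follow the transaction CSV.
sample = "\n".join([
    "timestamp,transaction_id,from_account,to_account,amount",
    "2024-01-01 10:00:00,TX001,A123,B456,1000.00",
    "2024-01-01 10:01:00,TX002,A123,C789,5000.00",
    "2024-01-01 10:02:00,TX003,A123,D012,10000.00",
    "2024-01-01 10:03:00,TX004,A123,E345,7500.00",
])
chunks = chunk_csv_rows(sample, rows_per_chunk=3)
```

Each chunk then carries the column names an LLM extractor needs to interpret the values.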


In [8]:
chunked_documents

[Chunk(text="b'timestamp,transaction_id,from_account,to_account,amount,currency,transaction_type,merchant_category,device_id,ip_address,location,is_fraud,risk_score,notes\\r\\n2024-01-01 10:00:00,TX001,A123,B456,1000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,0,0.15,Normal transfer\\r\\n2024-01-01 10:01:00,TX002,A123,C789,5000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,0,0.25,Second transfer from same account\\r\\n2024-01-01 10:02:00,TX003,A123,D012,10000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,1,0.85,Unusual pattern - rapid large transfers\\r\\n2024-01-01 10:03:00,TX004,A123,E345,7500.00,USD,transfer,internal,DEV002,203.0.113.5,California,1,0.90,Multiple rapid transactions - device change\\r\\n2024-01-01 10:04:00,TX005,B456,E789,2000.00,USD,transfer,internal,DEV003,198.51.100.1,Chicago,0,0.20,Normal transfer\\r\\n2024-01-01 10:05:00,TX006,A123,F012,50000.00,USD,transfer,internal,DEV002,203.0.113.5,California,1,0.95,Large transaction - fraud alert\\r\\n

In [7]:
normalized_documents

["b'timestamp,transaction_id,from_account,to_account,amount,currency,transaction_type,merchant_category,device_id,ip_address,location,is_fraud,risk_score,notes\\r\\n2024-01-01 10:00:00,TX001,A123,B456,1000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,0,0.15,Normal transfer\\r\\n2024-01-01 10:01:00,TX002,A123,C789,5000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,0,0.25,Second transfer from same account\\r\\n2024-01-01 10:02:00,TX003,A123,D012,10000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,1,0.85,Unusual pattern - rapid large transfers\\r\\n2024-01-01 10:03:00,TX004,A123,E345,7500.00,USD,transfer,internal,DEV002,203.0.113.5,California,1,0.90,Multiple rapid transactions - device change\\r\\n2024-01-01 10:04:00,TX005,B456,E789,2000.00,USD,transfer,internal,DEV003,198.51.100.1,Chicago,0,0.20,Normal transfer\\r\\n2024-01-01 10:05:00,TX006,A123,F012,50000.00,USD,transfer,internal,DEV002,203.0.113.5,California,1,0.95,Large transaction - fraud alert\\r\\n2024-01-01 

## Extracting Transaction Entities


In [32]:
from semantica.semantic_extract import NERExtractor

entity_extractor = NERExtractor(
    method="llm",
    provider="groq",
    llm_model="llama-3.1-8b-instant",
    temperature=0.0
)

all_entities = []
print(f"Extracting entities from {len(chunked_documents)} chunks...")
for i, chunk in enumerate(chunked_documents, 1):
    chunk_text = chunk.text if hasattr(chunk, 'text') else str(chunk)
    try:
        entities = entity_extractor.extract_entities(
            chunk_text,
            entity_types=["Transaction", "Account", "Device", "Pattern", "Anomaly"]
        )
        all_entities.extend(entities)
    except Exception:
        continue
    
    if i % 20 == 0 or i == len(chunked_documents):
        print(f"  Processed {i}/{len(chunked_documents)} chunks ({len(all_entities)} entities found)")

# Case-insensitive substring match on the label (also covers exact "Transaction", "Account", etc.)
transactions = [e for e in all_entities if "transaction" in e.label.lower()]
accounts = [e for e in all_entities if "account" in e.label.lower()]
anomalies = [e for e in all_entities if any(k in e.label.lower() for k in ("anomaly", "pattern"))]

print(f"Extracted {len(transactions)} transactions, {len(accounts)} accounts, {len(anomalies)} anomalies/patterns")


Extracting entities from 7 chunks...
✅ Semantica is extracting: Extracted 110 entities using llm (NERExtractor, 1.70s)
✅ Semantica is extracting: Extracted 37 entities using llm (NERExtractor, 0.87s)
✅ Semantica is extracting: Extracted 30 entities using llm (NERExtractor, 0.68s)
[output truncated]

In [27]:
all_entities

[Entity(text='TX001', label='Transaction', start_char=0, end_char=0, confidence=1.0, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed', 'batch_index': 0}),
 Entity(text='A123', label='Account', start_char=0, end_char=0, confidence=1.0, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed', 'batch_index': 0}),
 Entity(text='B456', label='Account', start_char=0, end_char=0, confidence=1.0, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed', 'batch_index': 0}),
 Entity(text='1000.00', label='Amount', start_char=0, end_char=0, confidence=0.8846153846153846, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed', 'batch_index': 0}),
 Entity(text='USD', label='Currency', start_char=0, end_char=0, confidence=0.8797455728054047, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed'

In [33]:
accounts

[Entity(text='A123', label='Account', start_char=0, end_char=0, confidence=1.0, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed', 'batch_index': 0}),
 Entity(text='B456', label='Account', start_char=0, end_char=0, confidence=1.0, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed', 'batch_index': 0}),
 Entity(text='C789', label='Account', start_char=0, end_char=0, confidence=1.0, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed', 'batch_index': 0}),
 Entity(text='D012', label='Account', start_char=0, end_char=0, confidence=1.0, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed', 'batch_index': 0}),
 Entity(text='E345', label='Account', start_char=0, end_char=0, confidence=1.0, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed', 'batch_index': 0}),
 Entity(text='B
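
The pipeline diagram lists a deduplication step, and the same account can be extracted once per chunk it appears in. A minimal sketch of deduplicating by `(text, label)` while keeping the highest-confidence copy; `DemoEntity` is a stand-in with the fields visible in the output above, and `deduplicate` is a hypothetical helper rather than a Semantica API.

```python
from dataclasses import dataclass


@dataclass
class DemoEntity:
    """Stand-in with the fields seen in the Entity output above."""
    text: str
    label: str
    confidence: float = 1.0


def deduplicate(entities):
    """Keep one entity per (text, label) pair, preferring the highest confidence."""
    best = {}
    for e in entities:
        # Normalize case so 'a123' and 'A123' collapse to one key
        key = (e.text.strip().upper(), e.label.strip().lower())
        if key not in best or e.confidence > best[key].confidence:
            best[key] = e
    return list(best.values())


demo = [
    DemoEntity("A123", "Account", 1.0),
    DemoEntity("a123", "Account", 0.9),   # same account, different casing
    DemoEntity("B456", "Account", 1.0),
    DemoEntity("TX001", "Transaction", 1.0),
]
unique = deduplicate(demo)
```

Running something like this before embedding and graph construction keeps the counts closer to the number of distinct accounts and transactions in the data.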

## Extracting Transaction Relationships


In [34]:
from semantica.semantic_extract import RelationExtractor

relation_extractor = RelationExtractor(
    method="llm",
    provider="groq",
    llm_model="llama-3.1-8b-instant",
    temperature=0.0
)

all_relationships = []
print(f"Extracting relationships from {len(chunked_documents)} chunks...")
for i, chunk in enumerate(chunked_documents, 1):
    chunk_text = chunk.text if hasattr(chunk, 'text') else str(chunk)
    try:
        relationships = relation_extractor.extract_relations(
            chunk_text,
            entities=all_entities,
            relation_types=["from", "to", "triggers", "detects", "associated_with", "causes"]
        )
        all_relationships.extend(relationships)
    except Exception:
        continue
    
    if i % 20 == 0 or i == len(chunked_documents):
        print(f"  Processed {i}/{len(chunked_documents)} chunks ({len(all_relationships)} relationships found)")

print(f"Extracted {len(all_relationships)} relationships")


Extracting relationships from 7 chunks...
🔄 Semantica is extracting: Extracting relations from 217 entities
✅ Semantica is extracting: Extracted 219 relations using llm (RelationExtractor, 7.63s)
✅ Semantica is extracting: Extracted 11 relations using llm (RelationExtractor, 22.02s)
✅ Semantica is extracting: Extracted 28 relations using llm (RelationExtractor, 13.41s)
✅ Semantica is extracting: Extracted 221 relations using llm (RelationExtractor)
[output truncated]

In [35]:
all_relationships

[Relation(subject=Entity(text='A123', label='Account', start_char=0, end_char=0, confidence=1.0, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed', 'batch_index': 0}), predicate='triggers', object=Entity(text='unusual transaction patterns', label='Pattern', start_char=0, end_char=0, confidence=0.95, metadata={'provider': 'groq', 'model': 'llama-3.1-8b-instant', 'extraction_method': 'llm_typed', 'batch_index': 0}), confidence=0.975, context="b'timestamp,transaction_id,from_account,to_account,amount,currency,transaction_type,merchant_category,device_id,ip_address,location,is_fraud,risk_score,notes\\r\\n2024-01-01 10:00:00,TX001,A123,B456,1000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,0,0.15,Normal transfer\\r\\n2024-01-01 10:01:00,TX002,A123,C789,5000.00,USD,transfer,internal,DEV001,192.168.1.1,New York,0,0.25,Second transfer from same account\\r\\n2024-01-01 10:02:00,TX003,A123,D012,10000.00,USD,transfer,internal,DEV001,192.168.1.
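
The relation extraction loop passes the full `all_entities` list with every chunk, and the progress log reports 217 entities for a single call. One option is to pass only the entities whose text occurs in the current chunk. The sketch below assumes plain substring matching is good enough; `entities_in_chunk` is a hypothetical helper, and whether `extract_relations` benefits from the narrower list is an assumption.

```python
from collections import namedtuple

# Stand-in for the Entity objects shown above
DemoEntity = namedtuple("DemoEntity", ["text", "label"])


def entities_in_chunk(chunk_text, entities):
    """Return entities whose surface text occurs in the chunk, without duplicates."""
    seen = set()
    selected = []
    for e in entities:
        key = (e.text, e.label)
        if e.text and e.text in chunk_text and key not in seen:
            seen.add(key)
            selected.append(e)
    return selected


all_demo_entities = [
    DemoEntity("A123", "Account"),
    DemoEntity("B456", "Account"),
    DemoEntity("F012", "Account"),
    DemoEntity("A123", "Account"),  # duplicate extracted from another chunk
]
chunk = "2024-01-01 10:00:00,TX001,A123,B456,1000.00,USD,transfer"
local = entities_in_chunk(chunk, all_demo_entities)
```

In the loop above, `entities=entities_in_chunk(chunk_text, all_entities)` would replace `entities=all_entities`.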

## Detecting Transaction Conflicts


In [11]:
from semantica.conflicts import ConflictDetector, ConflictResolver
from semantica.conflicts.methods import detect_conflicts

# Use logical conflict detection for fraud rules
# expert_review strategy flags conflicts for manual review by fraud analysts
conflict_detector = ConflictDetector()
conflict_resolver = ConflictResolver()

# Convert all entities to dictionaries for conflict detection
print(f"Converting {len(all_entities)} entities to dictionaries...")
entity_dicts = []
for e in all_entities:
    entity_dict = {
        "id": e.text if hasattr(e, 'text') else str(e),
        "name": e.text if hasattr(e, 'text') else str(e),
        "text": e.text if hasattr(e, 'text') else str(e),
        "type": e.label if hasattr(e, 'label') else "ENTITY",
        "label": e.label if hasattr(e, 'label') else "ENTITY",
        "confidence": getattr(e, 'confidence', 1.0),
        "metadata": getattr(e, 'metadata', {})
    }
    entity_dicts.append(entity_dict)

# Convert all relationships to dictionaries for conflict detection
print(f"Converting {len(all_relationships)} relationships to dictionaries...")
relationship_dicts = []
for r in all_relationships:
    # Handle different relationship object formats
    if hasattr(r, 'subject') and hasattr(r, 'predicate') and hasattr(r, 'object'):
        # Relation object format: subject, predicate, object
        source = r.subject.text if hasattr(r.subject, 'text') else str(r.subject)
        target = r.object.text if hasattr(r.object, 'text') else str(r.object)
        rel_type = r.predicate if isinstance(r.predicate, str) else str(r.predicate)
    elif hasattr(r, 'source') and hasattr(r, 'target'):
        # Alternative format: source, target, type/label
        source = r.source.text if hasattr(r.source, 'text') else str(r.source)
        target = r.target.text if hasattr(r.target, 'text') else str(r.target)
        rel_type = getattr(r, 'label', getattr(r, 'type', 'RELATED_TO'))
    else:
        # Fallback: try to extract from dict-like object
        source = getattr(r, 'source_id', getattr(r, 'source', 'UNKNOWN'))
        target = getattr(r, 'target_id', getattr(r, 'target', 'UNKNOWN'))
        rel_type = getattr(r, 'type', getattr(r, 'label', 'RELATED_TO'))
    
    relationship_dict = {
        "id": f"{source}_{rel_type}_{target}",
        "source_id": source,
        "target_id": target,
        "type": rel_type,
        "confidence": getattr(r, 'confidence', 1.0),
        "metadata": getattr(r, 'metadata', {})
    }
    relationship_dicts.append(relationship_dict)

print(f"Detecting logical conflicts in {len(entity_dicts)} entities and {len(relationship_dicts)} relationships...")

# Detect logical conflicts (e.g., conflicting fraud indicators)
# Use the standalone function from methods module which accepts method as keyword argument
conflicts = detect_conflicts(entity_dicts, method="logical")

# Also detect relationship conflicts
relationship_conflicts = conflict_detector.detect_relationship_conflicts(relationship_dicts)
all_conflicts = conflicts + relationship_conflicts

print(f"Detected {len(conflicts)} entity conflicts and {len(relationship_conflicts)} relationship conflicts (total: {len(all_conflicts)} conflicts)")

if all_conflicts:
    print("Resolving conflicts using expert_review strategy...")
    resolved = conflict_resolver.resolve_conflicts(
        all_conflicts,
        strategy="expert_review"  # Manual review by fraud analysts
    )
    print(f"Resolved {len(resolved)} conflicts (flagged for expert review)")
else:
    print("No conflicts detected")


Converting 305 entities to dictionaries...
Converting 320 relationships to dictionaries...
Detecting logical conflicts in 305 entities and 320 relationships...
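
What `method="logical"` checks internally is not shown in this notebook. As an illustration of one kind of logical conflict, the sketch below reports entity ids that appear with more than one type, using the same dictionary shape as `entity_dicts`. `find_label_conflicts` is a hypothetical helper, not Semantica's detection algorithm.

```python
from collections import defaultdict


def find_label_conflicts(entity_dicts):
    """Group entity dicts by id and report ids that carry more than one type."""
    types_by_id = defaultdict(set)
    for e in entity_dicts:
        types_by_id[e["id"]].add(e["type"])
    return {eid: sorted(types) for eid, types in types_by_id.items() if len(types) > 1}


demo_entities = [
    {"id": "A123", "type": "Account"},
    {"id": "A123", "type": "Account"},
    {"id": "DEV002", "type": "Device"},
    {"id": "DEV002", "type": "Account"},  # contradictory typing for the same id
]
conflicts_demo = find_label_conflicts(demo_entities)
```

A check like this is cheap to run on LLM-extracted entities, where the same identifier can be labelled differently in different chunks.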

## Building Temporal Transaction Knowledge Graph


In [12]:
from semantica.kg import GraphBuilder

graph_builder = GraphBuilder(
    merge_entities=True,
    resolve_conflicts=True,
    entity_resolution_strategy="fuzzy",
    enable_temporal=True,
    temporal_granularity=TEMPORAL_GRANULARITY
)

print(f"Building knowledge graph from {len(all_entities)} entities and {len(all_relationships)} relationships...")

# GraphBuilder handles Entity and Relationship objects directly
kg = graph_builder.build({
    "entities": all_entities,
    "relationships": all_relationships
})

entities_count = len(kg.get('entities', []))
relationships_count = len(kg.get('relationships', []))
print(f"Graph: {entities_count} entities, {relationships_count} relationships")


Building knowledge graph from 305 entities and 320 relationships...
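
`GraphBuilder` is configured with `temporal_granularity="minute"`. How Semantica buckets timestamps internally is not shown here, so the following is only a sketch of what minute granularity could mean: timestamps are truncated to the minute and edges are grouped by that bucket. The third row is hypothetical and exists only to show two events landing in the same bucket.

```python
from collections import defaultdict
from datetime import datetime


def to_minute(ts: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' and truncate to the minute."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(second=0, microsecond=0)


# (timestamp, from_account, to_account, amount)
edges = [
    ("2024-01-01 10:00:00", "A123", "B456", 1000.00),
    ("2024-01-01 10:01:00", "A123", "C789", 5000.00),
    ("2024-01-01 10:01:45", "A123", "Z999", 10.00),   # hypothetical row to show bucketing
]

by_minute = defaultdict(list)
for ts, src, dst, amount in edges:
    by_minute[to_minute(ts)].append((src, dst, amount))
```

A coarser granularity merges more events into each bucket, which trades temporal resolution for a smaller graph.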

## Generating Embeddings for Transactions and Accounts


In [36]:
from semantica.embeddings import EmbeddingGenerator

embedding_gen = EmbeddingGenerator(
    provider="sentence_transformers",
    model=EMBEDDING_MODEL
)

print(f"Generating embeddings for {len(transactions)} transactions and {len(accounts)} accounts...")
transaction_texts = [t.text for t in transactions]
transaction_embeddings = embedding_gen.generate_embeddings(transaction_texts)

account_texts = [a.text for a in accounts]
account_embeddings = embedding_gen.generate_embeddings(account_texts)

print(f"Generated {len(transaction_embeddings)} transaction embeddings and {len(account_embeddings)} account embeddings")


Generating embeddings for 28 transactions and 58 accounts...
Generated 28 transaction embeddings and 58 account embeddings
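
Embeddings are typically compared with cosine similarity. The sketch below shows that comparison on toy three-dimensional vectors using only the standard library; it does not call `EmbeddingGenerator`, and the vectors are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy vectors standing in for real embeddings
emb_a = [1.0, 0.0, 1.0]
emb_b = [2.0, 0.0, 2.0]  # same direction as emb_a
emb_c = [0.0, 1.0, 0.0]  # orthogonal to emb_a

print(f"a vs b: {cosine_similarity(emb_a, emb_b):.3f}")
print(f"a vs c: {cosine_similarity(emb_a, emb_c):.3f}")
```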


## Populating Vector Store


In [14]:
from semantica.vector_store import VectorStore

vector_store = VectorStore(backend="faiss", dimension=EMBEDDING_DIMENSION)

print(f"Storing {len(transaction_embeddings)} transaction vectors and {len(account_embeddings)} account vectors...")
transaction_ids = vector_store.store_vectors(
    vectors=transaction_embeddings,
    metadata=[{"type": "transaction", "name": t.text, "label": t.label} for t in transactions]
)

account_ids = vector_store.store_vectors(
    vectors=account_embeddings,
    metadata=[{"type": "account", "name": a.text, "label": a.label} for a in accounts]
)

print(f"Stored {len(transaction_ids)} transaction vectors and {len(account_ids)} account vectors")


Storing 47 transaction vectors and 61 account vectors...
Stored 47 transaction vectors and 61 account vectors
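
Conceptually, a flat vector index stores each vector next to its metadata and answers a query by exhaustive distance comparison. The sketch below illustrates that idea in plain Python. `TinyVectorStore` is a hypothetical class written for this example; it is neither the Semantica `VectorStore` nor FAISS.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory store: exhaustive search by Euclidean distance."""

    def __init__(self, dimension):
        self.dimension = dimension
        self._vectors = []
        self._metadata = []

    def add(self, vector, metadata):
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(vector)}")
        self._vectors.append(list(vector))
        self._metadata.append(metadata)
        return len(self._vectors) - 1  # position doubles as the vector id

    def search(self, query, k=3):
        """Return up to k (distance, metadata) pairs, nearest first."""
        scored = [
            (math.dist(query, vec), meta)
            for vec, meta in zip(self._vectors, self._metadata)
        ]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]


store = TinyVectorStore(dimension=2)
store.add([0.0, 0.0], {"type": "account", "name": "A123"})
store.add([1.0, 1.0], {"type": "transaction", "name": "T001"})
store.add([5.0, 5.0], {"type": "transaction", "name": "T002"})

nearest = store.search([0.9, 1.2], k=2)
for distance, meta in nearest:
    print(f"{meta['name']}: {distance:.3f}")
```

Keeping the `type` field in the metadata, as the cell above does, lets a caller filter search hits down to accounts or transactions after retrieval.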


## Temporal Graph Queries


In [15]:
from semantica.kg import TemporalGraphQuery

temporal_query = TemporalGraphQuery(
    enable_temporal_reasoning=True,
    temporal_granularity=TEMPORAL_GRANULARITY
)

query_results = temporal_query.query_at_time(
    kg,
    query={"type": "Transaction"},
    at_time="2024-01-01 10:05:00"
)

evolution = temporal_query.analyze_evolution(kg)
pattern_results = temporal_query.query_temporal_pattern(kg, pattern="sequence")

print(f"Temporal queries: {len(query_results.get('entities', []))} transactions at query time")
print(f"Temporal patterns detected: {pattern_results.get('num_patterns', 0)}")


Temporal queries: 79 transactions at query time
Temporal patterns detected: 0
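
A point-in-time query boils down to keeping the entities whose validity interval contains the requested moment. The sketch below shows that filter with the standard library. The `valid_from` and `valid_to` field names are assumptions made for this example; how `TemporalGraphQuery.query_at_time` reads temporal metadata is not shown in this notebook.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def entities_at_time(entities, at_time):
    """Keep entities whose [valid_from, valid_to] interval contains at_time.

    A missing valid_to is treated as "still valid".
    """
    moment = datetime.strptime(at_time, TIME_FORMAT)
    matches = []
    for entity in entities:
        start = datetime.strptime(entity["valid_from"], TIME_FORMAT)
        end_raw = entity.get("valid_to")
        end = datetime.strptime(end_raw, TIME_FORMAT) if end_raw else None
        if start <= moment and (end is None or moment <= end):
            matches.append(entity)
    return matches


# Hypothetical entities with explicit validity intervals
sample_entities = [
    {"id": "T001", "valid_from": "2024-01-01 10:00:00", "valid_to": "2024-01-01 10:03:00"},
    {"id": "T002", "valid_from": "2024-01-01 10:04:00"},
    {"id": "T003", "valid_from": "2024-01-01 10:10:00"},
]

active = entities_at_time(sample_entities, "2024-01-01 10:05:00")
print([e["id"] for e in active])
```

If entities carry no temporal metadata at all, a filter like this has nothing to exclude, which is one thing to check when a point-in-time query returns more results than expected.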


## Temporal Pattern Detection


In [16]:
from semantica.kg import TemporalPatternDetector

pattern_detector = TemporalPatternDetector()

# Detect temporal fraud patterns (using sequence pattern detection)
fraud_patterns = pattern_detector.detect_temporal_patterns(kg, pattern_type="sequence")

# Detect sequence patterns (rapid transactions, unusual timing)
sequence_patterns = pattern_detector.detect_temporal_patterns(kg, pattern_type="sequence")

print(f"Detected {len(fraud_patterns)} fraud patterns")
print(f"Detected {len(sequence_patterns)} sequence patterns")


Detected 0 fraud patterns
Detected 0 sequence patterns
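
The comment in the cell above mentions rapid transactions as one kind of sequence pattern. When the detector reports nothing, a hand-rolled velocity check is a useful cross-check. The sketch below flags accounts with at least `min_count` transactions inside a sliding time window. The function names, the `(account_id, timestamp)` input shape, and the thresholds are all assumptions for illustration, not part of `TemporalPatternDetector`.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def max_transactions_in_window(timestamps, window):
    """Largest number of timestamps that fit inside any interval of length `window`."""
    ordered = sorted(timestamps)
    best = 0
    start = 0
    for end in range(len(ordered)):
        # Shrink the window from the left until it spans at most `window`
        while ordered[end] - ordered[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def flag_rapid_accounts(transactions, window=timedelta(minutes=5), min_count=3):
    """Return sorted account ids with at least min_count transactions in one window."""
    by_account = defaultdict(list)
    for account_id, timestamp in transactions:
        by_account[account_id].append(timestamp)
    return sorted(
        account_id
        for account_id, stamps in by_account.items()
        if max_transactions_in_window(stamps, window) >= min_count
    )


base = datetime(2024, 1, 1, 10, 0, 0)
sample_transactions = [
    ("A123", base),
    ("A123", base + timedelta(minutes=1)),
    ("A123", base + timedelta(minutes=2)),
    ("B456", base),
    ("B456", base + timedelta(minutes=30)),
    ("B456", base + timedelta(minutes=60)),
]

flagged = flag_rapid_accounts(sample_transactions)
print(flagged)
```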


## Reasoning and Fraud Detection


In [37]:
from semantica.reasoning import Reasoner
from semantica.kg import ConnectivityAnalyzer

reasoner = Reasoner()

reasoner.add_rule("IF Account from Transaction AND Transaction amount > 10000 AND Transaction count > 3 THEN Account triggers Anomaly")
reasoner.add_rule("IF Transaction from Account AND Account triggers Anomaly THEN Transaction associated_with Pattern")

inferred_facts = reasoner.infer_facts(kg)

# Use Semantica's built-in analyze_connectivity for path finding.
# Separate names keep the `accounts` list used by the embedding cells intact.
account_entities = [e for e in kg.get('entities', []) if e.get('type') == 'Account']
anomaly_entities = [e for e in kg.get('entities', []) if e.get('type') == 'Anomaly']

fraud_paths = []
if account_entities and anomaly_entities:
    account_id = account_entities[0].get('id') or account_entities[0].get('text') or account_entities[0].get('name')
    anomaly_id = anomaly_entities[0].get('id') or anomaly_entities[0].get('text') or anomaly_entities[0].get('name')
    if account_id and anomaly_id:
        path_result = analyze_connectivity(kg, method="paths", source=account_id, target=anomaly_id)
        if path_result.get('exists'):
            fraud_paths = [path_result]

print(f"Inferred {len(inferred_facts)} facts")
print(f"Found {len(fraud_paths)} fraud paths")

Inferred 0 facts
Found 0 fraud paths
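
When the reasoner infers no facts, it helps to evaluate the first rule by hand on the raw transactions. The sketch below encodes one reading of that rule: an account triggers an anomaly when it has more than three transactions and at least one of them exceeds 10,000. That reading, the function name, and the `account`/`amount` record fields are assumptions for this example; the rule string itself does not pin down the exact semantics.

```python
from collections import defaultdict


def accounts_triggering_anomaly(transactions, amount_threshold=10000, min_count=3):
    """Return sorted account ids with more than min_count transactions,
    at least one of which exceeds amount_threshold."""
    counts = defaultdict(int)
    has_large = defaultdict(bool)
    for tx in transactions:
        counts[tx["account"]] += 1
        if tx["amount"] > amount_threshold:
            has_large[tx["account"]] = True
    return sorted(a for a in counts if counts[a] > min_count and has_large[a])


# Hypothetical transaction records
sample_transactions = [
    {"account": "A123", "amount": 500},
    {"account": "A123", "amount": 12000},
    {"account": "A123", "amount": 300},
    {"account": "A123", "amount": 15000},
    {"account": "B456", "amount": 20000},
    {"account": "B456", "amount": 25000},
    {"account": "C789", "amount": 100},
    {"account": "C789", "amount": 200},
    {"account": "C789", "amount": 300},
    {"account": "C789", "amount": 400},
]

anomalous = accounts_triggering_anomaly(sample_transactions)
print(anomalous)
```

In this sample only `A123` meets both conditions: `B456` has large amounts but too few transactions, and `C789` has enough transactions but none above the threshold.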


## Analyzing Transaction Network Structure


In [18]:
from semantica.kg import GraphAnalyzer, CommunityDetector

graph_analyzer = GraphAnalyzer()
community_detector = CommunityDetector()

analysis = graph_analyzer.analyze_graph(kg)

communities = community_detector.detect_communities(kg, method="louvain")
connectivity = graph_analyzer.analyze_connectivity(kg)

# Detect suspicious account communities
suspicious_communities = []
for community in communities:
    community_accounts = [e for e in kg.get("entities", []) 
                          if e.get("id") in community and e.get("type") == "Account"]
    if len(community_accounts) > 0:
        # Check if community has suspicious patterns
        suspicious_communities.append({
            "community_id": len(suspicious_communities),
            "account_count": len(community_accounts)
        })

print(f"Graph analytics:")
print(f"  - Communities: {len(communities)}")
print(f"  - Connected components: {len(connectivity.get('components', []))}")
print(f"  - Graph density: {analysis.get('density', 0):.3f}")
print(f"  - Suspicious communities: {len(suspicious_communities)}")


✅ Semantica is building: Detected 17 communities
Graph analytics:
  - Communities: 4
  - Connected components: 12
  - Graph density: 0.000
  - Suspicious communities: 0
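
Density and connected components can be computed by hand on a small graph to cross-check the figures above. The sketch below uses the undirected definition of density, 2E / (N(N-1)), and a union-find pass for components. Whether `GraphAnalyzer` uses the same definitions is an assumption, and the node and edge data are hypothetical.

```python
def graph_density(num_nodes, num_edges):
    """Density of a simple undirected graph: 2E / (N * (N - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2.0 * num_edges / (num_nodes * (num_nodes - 1))


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {node: node for node in nodes}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target in edges:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_s] = root_t

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical graph: three entities attached to A123, plus an isolated account
sample_nodes = ["A123", "T001", "T002", "D001", "B456"]
sample_edges = [("T001", "A123"), ("T002", "A123"), ("D001", "A123")]

components = connected_components(sample_nodes, sample_edges)
density = graph_density(len(sample_nodes), len(sample_edges))
print(f"Components: {components}")
print(f"Density: {density:.3f}")
```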


## GraphRAG: Hybrid Vector + Graph Queries


In [19]:
import os

from semantica.context import AgentContext, ContextGraph, ContextRetriever
from semantica.llms import Groq

# Initialize AgentContext with knowledge graph for GraphRAG
context = AgentContext(
    vector_store=vector_store,
    knowledge_graph=kg,
    hybrid_alpha=0.7,  # 70% graph, 30% vector
    use_graph_expansion=True
)

# Build context graph using ContextGraph directly
print("Building context graph from knowledge graph...")
context_graph = ContextGraph()

# Convert KG entities and relationships to context graph format
kg_entities = kg.get('entities', [])[:50]
kg_relationships = kg.get('relationships', [])[:100]

# Build context graph from entities and relationships
graph_result = context_graph.build_from_entities_and_relationships(
    entities=kg_entities,
    relationships=kg_relationships
)

print(f"Context graph built: {len(context_graph.nodes)} nodes, {len(context_graph.edges)} edges")

# Store transaction data in context graph for better retrieval
print("\nStoring transaction data in context graph...")
for i, entity in enumerate(kg.get('entities', [])[:20]):  # Store sample entities
    entity_text = f"{entity.get('text', entity.get('name', ''))} is a {entity.get('type', 'Entity')}"
    context.store(
        content=entity_text,
        metadata={"type": entity.get('type'), "source": "fraud_detection"},
        entities=[entity],
        extract_entities=False,  # Already extracted
        link_entities=True
    )

# Get context graph statistics
stats = context.stats()
print(f"\nContext Graph Statistics:")
print(f"  - Total memories: {stats.get('total_memories', 0)}")
print(f"  - Graph nodes: {stats.get('graph_nodes', 0)}")
print(f"  - Graph edges: {stats.get('graph_edges', 0)}")

# Initialize Groq LLM for reasoning
llm = Groq(
    model="llama-3.1-8b-instant",
    api_key=os.getenv("GROQ_API_KEY")
)

# Query with multi-hop reasoning using Groq LLM and context graph
queries = [
    "What accounts have suspicious transaction patterns?",
    "Which accounts show signs of fraud based on device changes?",
    "What are the relationships between fraudulent transactions and accounts?"
]

print("\n" + "=" * 80)
print("GraphRAG with Multi-Hop Reasoning (Groq LLM + Context Graph)")
print("=" * 80)

for query in queries:
    print(f"\n{'='*80}")
    print(f"Query: {query}")
    print(f"{'='*80}\n")
    
    # Use query_with_reasoning for better responses with context graph
    result = context.query_with_reasoning(
        query=query,
        llm_provider=llm,
        max_results=15,
        max_hops=3,  # Multi-hop reasoning through context graph
        min_score=0.2
    )
    
    print(f"Generated Response:\n{result.get('response', 'No response')}\n")
    
    if result.get('reasoning_path'):
        print(f"Reasoning Path:\n{result.get('reasoning_path')}\n")
    
    print(f"Confidence: {result.get('confidence', 0):.3f}")
    print(f"Sources Used: {result.get('num_sources', 0)}")
    print(f"Reasoning Paths: {result.get('num_reasoning_paths', 0)}")
    print()

litellm library not installed. Install with: pip install litellm


Building context graph from knowledge graph...
Context graph built: 50 nodes, 0 edges

Storing transaction data in context graph...

Embedding generation failed: Text cannot be empty or whitespace-only
Using random fallback embedding


Generated Response:
Based on the retrieved context and reasoning paths, I can identify accounts with suspicious transaction patterns.

The Account takeover pattern (Pattern) is triggered by multiple instances of A123 (Account) and is associated with unusual transaction patterns (Context 1). This pattern is also triggered by A123 (Account) in Context 2, which is related to fraud (fraud_related Pattern).

Additionally, A123 (Account) is associated with suspicious_inflow (Pattern) (Context 3), which suggests that there may be unusual or suspicious inflows of funds into this account.

Furthermore, rapid_transfers (Pattern) is triggered by A123 (Account) and is associated with Account takeover pattern (Pattern) (Context 4). This suggests that A123 (Account) may be involved in rapid or suspicious transfers.

Therefore, based on the retrieved context and reasoning paths, A123 (Account) appears to have suspicious transaction patterns.

**Reasoning Path:** A123 (Account) --[triggers]--> Account

Embedding generation failed: Text cannot be empty or whitespace-only
Using random fallback embedding


Generated Response:
Based on the retrieved context and reasoning paths, I can identify accounts that show signs of fraud based on device changes.

The key relationships that indicate fraud are:

- "triggers" relationship between A123 (Account) and Account takeover pattern (Pattern) (Context 2 and Context 3).
- "associated_with" relationship between multiple untrusted devices (Device) and unusual transaction patterns (Context 4).
- "device_hopping" relationship that triggers multiple untrusted devices (Context 4).

From these relationships, we can infer that A123 (Account) is associated with device changes that indicate fraud. Specifically, the multi-hop connections reveal that:

- A123 (Account) triggers Account takeover pattern (Pattern), which is associated with unusual transaction patterns (Path 2).
- A123 (Account) is also associated with multiple untrusted devices, which are triggered by device_hopping (Path 1).

Therefore, based on the context and reasoning paths, A123 (Account) 

Embedding generation failed: Text cannot be empty or whitespace-only
Using random fallback embedding


Generated Response:
Based on the retrieved context and reasoning paths, the relationships between fraudulent transactions and accounts can be summarized as follows:

Fraudulent transactions are often associated with specific accounts that exhibit unusual patterns. The account A123 (Context 3) is linked to the Account takeover pattern (Context 2), which is a known pattern of fraudulent activity. This pattern is triggered by A123 and causes account compromise (Context 2).

A123 is also associated with triggering unusual transaction patterns (Context 3), which is a common indicator of fraudulent activity. Furthermore, A123 is linked to multiple anomalous transactions (ANOM001, ANOM002, ANOM003) (Context 3).

The account F012 (Context 1) is linked to the fraud-related pattern, which suggests that it may be involved in fraudulent activity. However, the specific nature of this relationship is not explicitly stated.

The account D012 (Context 4) and F012 (Context 5) do not have direct connect
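
The `hybrid_alpha=0.7` setting in the cell above is commented as "70% graph, 30% vector". One plausible way to read that is a weighted blend of a graph relevance score and a vector similarity score, followed by the `min_score` cutoff. The sketch below implements that reading. The formula, the function names, and the candidate scores are assumptions for illustration; the notebook does not show how `AgentContext` combines the two signals internally.

```python
def hybrid_score(graph_score, vector_score, alpha=0.7):
    """Weighted blend: alpha weights the graph score, (1 - alpha) the vector score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * graph_score + (1.0 - alpha) * vector_score


def rank_candidates(candidates, alpha=0.7, min_score=0.2):
    """Score each candidate, drop those below min_score, and sort best first."""
    scored = [
        (hybrid_score(c["graph_score"], c["vector_score"], alpha), c["id"])
        for c in candidates
    ]
    kept = [pair for pair in scored if pair[0] >= min_score]
    return sorted(kept, reverse=True)


# Hypothetical retrieval candidates with made-up scores
sample_candidates = [
    {"id": "A123", "graph_score": 0.9, "vector_score": 0.4},
    {"id": "F012", "graph_score": 0.2, "vector_score": 0.8},
    {"id": "D012", "graph_score": 0.1, "vector_score": 0.1},
]

ranked = rank_candidates(sample_candidates)
for score, candidate_id in ranked:
    print(f"{candidate_id}: {score:.2f}")
```

With a graph-heavy weighting like this, a context graph that has nodes but no edges contributes little, so retrieval quality would lean on the vector side.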

## Visualizing the Temporal Fraud Detection Knowledge Graph


In [20]:
from semantica.visualization import TemporalVisualizer
from datetime import datetime, timedelta

# Prepare temporal KG with timestamps for interactive visualization
# Extract timestamps from entities (if they have temporal metadata)
timestamps = {}
entities = kg.get('entities', [])
relationships = kg.get('relationships', [])

# Build timestamps map from entity metadata or relationships
for entity in entities:
    entity_id = entity.get('id') or entity.get('text') or entity.get('name', '')
    if entity_id:
        # Extract timestamp from entity metadata if available
        entity_times = []
        if 'timestamp' in entity:
            entity_times.append(entity['timestamp'])
        elif 'temporal' in entity:
            entity_times.extend(entity.get('temporal', []))
        else:
            # Use relationships to infer timestamps
            for rel in relationships:
                if rel.get('source') == entity_id or rel.get('target') == entity_id:
                    if 'timestamp' in rel:
                        entity_times.append(rel['timestamp'])
        
        if entity_times:
            timestamps[entity_id] = sorted(list(set(entity_times)))

# If no timestamps found, create synthetic timestamps based on entity order
if not timestamps:
    base_time = datetime(2024, 1, 1, 10, 0, 0)
    for i, entity in enumerate(entities[:50]):  # Limit to first 50 for performance
        entity_id = entity.get('id') or entity.get('text') or entity.get('name', '')
        if entity_id:
            # Assign timestamps in sequence
            entity_time = base_time + timedelta(minutes=i)
            timestamps[entity_id] = [entity_time.strftime("%Y-%m-%d %H:%M:%S")]

# Create temporal KG structure
temporal_kg = {
    "entities": entities[:50],  # Limit for performance
    "relationships": relationships[:100],  # Limit for performance
    "timestamps": timestamps
}

# Initialize TemporalVisualizer
temporal_viz = TemporalVisualizer()

print("Generating interactive temporal dashboard...")
# Create interactive temporal dashboard
dashboard_fig = temporal_viz.visualize_temporal_dashboard(
    temporal_kg,
    output="interactive",
    title="Fraud Detection - Temporal Knowledge Graph Dashboard"
)

# Display the interactive figure
if dashboard_fig:
    dashboard_fig.show()
    print("\n✅ Interactive temporal dashboard displayed above")
else:
    print("⚠️  Dashboard generation failed")

print("\nGenerating interactive network evolution animation...")
# Create interactive network evolution animation
evolution_fig = temporal_viz.visualize_network_evolution(
    temporal_kg,
    output="interactive",
    title="Fraud Detection - Network Evolution Over Time"
)

# Display the interactive animation
if evolution_fig:
    evolution_fig.show()
    print("\n✅ Interactive network evolution animation displayed above")
else:
    print("⚠️  Network evolution visualization failed")

Generating interactive temporal dashboard...



✅ Interactive temporal dashboard displayed above

Generating interactive network evolution animation...



✅ Interactive network evolution animation displayed above
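
A network evolution view is essentially a series of cumulative snapshots: at each distinct timestamp, show every entity first seen at or before that moment. The sketch below derives those snapshots from a `timestamps` map shaped like the one built in the cell above (entity id to a list of timestamp strings). The helper name and sample data are hypothetical, and this is not how `TemporalVisualizer` is known to work internally.

```python
def cumulative_snapshots(timestamps):
    """Map each distinct first-seen timestamp to the sorted ids visible by then.

    Timestamps are 'YYYY-MM-DD HH:MM:SS' strings, which sort chronologically
    as plain text.
    """
    first_seen = {
        entity_id: min(times) for entity_id, times in timestamps.items() if times
    }
    snapshots = {}
    for moment in sorted(set(first_seen.values())):
        snapshots[moment] = sorted(
            entity_id for entity_id, seen in first_seen.items() if seen <= moment
        )
    return snapshots


# Hypothetical timestamps map in the same shape as the notebook's
sample_timestamps = {
    "A123": ["2024-01-01 10:00:00"],
    "T001": ["2024-01-01 10:01:00", "2024-01-01 10:05:00"],
    "T002": ["2024-01-01 10:02:00"],
}

snapshots = cumulative_snapshots(sample_timestamps)
for moment, visible in snapshots.items():
    print(f"{moment}: {visible}")
```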


## Exporting Results


In [21]:
from semantica.export import GraphExporter, CSVExporter

# Export to JSON and GraphML using GraphExporter
graph_exporter = GraphExporter()
graph_exporter.export(kg, output_path="fraud_detection_kg.json", format="json")
graph_exporter.export(kg, output_path="fraud_detection_kg.graphml", format="graphml")

# Export to CSV using CSVExporter
csv_exporter = CSVExporter()
csv_exporter.export_knowledge_graph(kg, "fraud_detection_alerts")
# Creates: fraud_detection_alerts_entities.csv, fraud_detection_alerts_relationships.csv

print("✅ Exported fraud detection knowledge graph:")
print("   - JSON: fraud_detection_kg.json")
print("   - GraphML: fraud_detection_kg.graphml")
print("   - CSV entities: fraud_detection_alerts_entities.csv")
print("   - CSV relationships: fraud_detection_alerts_relationships.csv")

✅ Exported fraud detection knowledge graph:
   - JSON: fraud_detection_kg.json
   - GraphML: fraud_detection_kg.graphml
   - CSV entities: fraud_detection_alerts_entities.csv
   - CSV relationships: fraud_detection_alerts_relationships.csv
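
After an export it is worth reading the files back to confirm nothing was lost. The sketch below shows that round trip with the standard library only, writing a small hypothetical graph to JSON and CSV in a temporary directory. It does not use `GraphExporter` or `CSVExporter`, and the column layout is an assumption rather than the format those exporters produce.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_entities_csv(kg, path):
    """Write one row per entity with id and type columns; return the row count."""
    rows = kg.get("entities", [])
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "type"], extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Hypothetical graph used only for this round-trip check
sample_kg = {
    "entities": [
        {"id": "A123", "type": "Account"},
        {"id": "T001", "type": "Transaction"},
    ],
    "relationships": [{"source": "T001", "target": "A123", "type": "from"}],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = Path(tmp_dir) / "sample_kg.json"
    csv_path = Path(tmp_dir) / "sample_kg_entities.csv"

    json_path.write_text(json.dumps(sample_kg, indent=2), encoding="utf-8")
    written = export_entities_csv(sample_kg, csv_path)

    # Read both files back to confirm nothing was lost
    reloaded = json.loads(json_path.read_text(encoding="utf-8"))
    with open(csv_path, newline="", encoding="utf-8") as handle:
        csv_rows = list(csv.DictReader(handle))

print(f"JSON round trip intact: {reloaded == sample_kg}")
print(f"CSV rows written: {written}, read back: {len(csv_rows)}")
```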
