[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/introduction/18_Conflict_Detection.ipynb)

# Conflict Detection

## Overview

This notebook demonstrates how to detect and resolve conflicts in knowledge graphs using Semantica's conflict modules. You'll learn to use `ConflictDetector`, `SourceTracker`, and `ConflictResolver`.

**Documentation**: [API Reference](https://semantica.readthedocs.io/reference/conflicts/)

### Learning Objectives

- Use `ConflictDetector` to detect conflicts
- Use `SourceTracker` to track data sources
- Use `ConflictResolver` to resolve conflicts

## Installation

Install Semantica from PyPI:

```bash
pip install semantica
# Or with all optional dependencies:
pip install semantica[all]
```

---

## Step 1: Conflict Detection

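The sample data below contains several records per entity `id`, each from a different source, so the detector has disagreements to find in names, founding years, and types. Each `detect_*` method targets one kind of conflict; `detect_conflicts` runs the general check.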
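
To make the idea of a value conflict concrete, here is a minimal pure-Python sketch that does not use Semantica. It assumes a conflict simply means "records with the same `id` report different values for one property"; the helper name `find_value_conflicts` is hypothetical, and the library's own detection logic may differ.

```python
from collections import defaultdict

def find_value_conflicts(records, property_name):
    """Return {entity_id: {value: [sources]}} for ids whose sources disagree."""
    values_by_id = defaultdict(dict)
    for record in records:
        if property_name in record:
            # Map each distinct value to the list of sources that reported it.
            reported = values_by_id[record["id"]]
            reported.setdefault(record[property_name], []).append(record.get("source"))
    # Keep only ids with more than one distinct value.
    return {
        entity_id: values
        for entity_id, values in values_by_id.items()
        if len(values) > 1
    }

sample = [
    {"id": "e1", "name": "Apple Inc.", "source": "wikipedia"},
    {"id": "e1", "name": "Apple Incorporated", "source": "official_site"},
    {"id": "e2", "name": "Microsoft", "source": "source1"},
]
name_conflicts = find_value_conflicts(sample, "name")
print(name_conflicts)
```

Only `e1` is reported here, because `e2` has a single record in this reduced sample.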


In [None]:
%pip install -U "semantica[all]"
import semantica
print(semantica.__version__)

In [None]:
from semantica.conflicts import ConflictDetector
from datetime import datetime

# Initialize detector with configuration
detector = ConflictDetector(
    confidence_threshold=0.7,
    track_provenance=True,
    conflict_fields={"Company": ["name", "founded", "revenue"]}
)

# Sample entities from multiple sources
entities = [
    {"id": "e1", "name": "Apple Inc.", "founded": 1976, "type": "Company", 
     "source": "wikipedia", "confidence": 0.9},
    {"id": "e1", "name": "Apple Incorporated", "founded": 1976, "type": "Company",
     "source": "official_site", "confidence": 0.95},
    {"id": "e1", "name": "Apple Inc.", "founded": 1977, "type": "Company",
     "source": "news", "confidence": 0.7},
    {"id": "e2", "name": "Microsoft", "type": "Company", "founded": 1975, "source": "source1"},
    {"id": "e2", "name": "Microsoft Corporation", "type": "Organization", 
     "founded": 1975, "source": "source2"},
]

# 1.1 Value Conflict Detection
value_conflicts = detector.detect_value_conflicts(entities, "name")

# 1.2 Type Conflict Detection
type_conflicts = detector.detect_type_conflicts(entities)

# 1.3 Temporal Conflict Detection
temporal_conflicts = detector.detect_temporal_conflicts(entities)

# 1.4 Logical Conflict Detection
logical_entities = [
    {"id": "e3", "type": "Person", "name": "John Doe", "source": "source1"},
    {"id": "e3", "type": "Organization", "name": "John Doe", "source": "source2"},
]
logical_conflicts = detector.detect_logical_conflicts(logical_entities)

# 1.5 Relationship Conflict Detection
relationships = [
    {"id": "rel1", "source_id": "e1", "target_id": "e2", "type": "competes_with", "source": "source1"},
    {"id": "rel1", "source_id": "e1", "target_id": "e2", "type": "partners_with", "source": "source2"},
]
rel_conflicts = detector.detect_relationship_conflicts(relationships)

# 1.6 General Conflict Detection (all types)
all_conflicts = detector.detect_conflicts(entities)

# Get conflict report
report = detector.get_conflict_report()


## Step 2: Source Tracking

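`SourceTracker` records which source supplied each entity and property value, and stores a credibility score per source. The same tracker is passed to the resolver in Step 3.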
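
As a rough mental model of property-level provenance, the sketch below keeps a list of source records per `(entity_id, property)` pair. It is a standalone illustration: `ProvenanceRecord` and `SimpleProvenanceLog` are hypothetical names and are not part of Semantica.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional

@dataclass(frozen=True)
class ProvenanceRecord:
    """One claim: a value plus where and when it was reported."""
    value: Any
    document: str
    confidence: float
    timestamp: Optional[datetime] = None

class SimpleProvenanceLog:
    """Stores claims keyed by (entity_id, property_name), in insertion order."""

    def __init__(self):
        self._records = {}

    def track(self, entity_id, property_name, record):
        self._records.setdefault((entity_id, property_name), []).append(record)

    def sources_for(self, entity_id, property_name):
        # Return a copy so callers cannot mutate the log by accident.
        return list(self._records.get((entity_id, property_name), []))

log = SimpleProvenanceLog()
log.track("e1", "name", ProvenanceRecord("Apple Inc.", "wikipedia", 0.9, datetime(2023, 1, 15)))
log.track("e1", "name", ProvenanceRecord("Apple Incorporated", "official_site", 0.95, datetime(2023, 3, 20)))

name_sources = [record.document for record in log.sources_for("e1", "name")]
print(name_sources)
```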


In [None]:
from semantica.conflicts import SourceTracker, SourceReference
from datetime import datetime

# Initialize source tracker
tracker = SourceTracker()

# Create source references with metadata
source1 = SourceReference(
    document="wikipedia",
    page=1,
    section="Company Information",
    timestamp=datetime(2023, 1, 15),
    confidence=0.9
)

source2 = SourceReference(
    document="official_site",
    section="About Us",
    timestamp=datetime(2023, 3, 20),
    confidence=0.95
)

# Track property sources
tracker.track_property_source("e1", "name", "Apple Inc.", source1)
tracker.track_property_source("e1", "name", "Apple Incorporated", source2)
tracker.track_property_source("e1", "founded", 1976, source1)

# Track entity sources
tracker.track_entity_source("e1", source1)

# Set source credibility scores
tracker.set_source_credibility("wikipedia", 0.85)
tracker.set_source_credibility("official_site", 0.95)

# Retrieve property sources
prop_source = tracker.get_property_sources("e1", "name")

# Get entity sources
entity_sources = tracker.get_entity_sources("e1")

# Get all source credibilities
all_credibilities = tracker.get_all_source_credibilities()

# Generate traceability chain
chain = tracker.generate_traceability_chain("e1", "name")

# Generate source report
source_report = tracker.generate_source_report("e1")  # separate name so the Step 1 `report` is not overwritten


## Step 3: Conflict Resolution

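`ConflictResolver` applies a named strategy to a list of detected conflicts. The cell below runs each of the six strategies against the name conflicts from Step 1 so you can compare the results.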
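
The difference between voting and credibility weighting is easiest to see on a small case where they disagree. The sketch below is a standalone illustration under simple assumptions (one vote per source, weights summed per value); the function names, the `blog` source, and the scores for `news` and `blog` are made up for the example and do not describe Semantica's internals.

```python
from collections import Counter, defaultdict

def resolve_by_voting(claims):
    """Pick the value reported by the most sources; ties go to the value seen first."""
    counts = Counter(value for value, _source in claims)
    return counts.most_common(1)[0][0]

def resolve_by_credibility(claims, credibility, default=0.5):
    """Pick the value whose supporting sources have the highest total credibility."""
    weights = defaultdict(float)
    for value, source in claims:
        weights[value] += credibility.get(source, default)
    return max(weights, key=weights.get)

# (value, source) pairs for the "founded" property of one entity.
claims = [(1977, "news"), (1977, "blog"), (1976, "official_site")]
credibility = {"news": 0.4, "blog": 0.3, "official_site": 0.95}

voted = resolve_by_voting(claims)
weighted = resolve_by_credibility(claims, credibility)
print(voted, weighted)
```

Two low-credibility sources outvote the official site under voting, while the credibility-weighted sum favors the single trusted source.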


In [None]:
from semantica.conflicts import ConflictResolver

# Initialize resolver with source tracker
resolver = ConflictResolver(
    default_strategy="voting",
    source_tracker=tracker
)

# Resolve conflicts using different strategies
if value_conflicts:
    # Voting strategy
    voting_results = resolver.resolve_conflicts(value_conflicts, strategy="voting")
    
    # Credibility-weighted strategy
    credibility_results = resolver.resolve_conflicts(value_conflicts, strategy="credibility_weighted")
    
    # Most recent strategy
    recent_results = resolver.resolve_conflicts(value_conflicts, strategy="most_recent")
    
    # Highest confidence strategy
    confidence_results = resolver.resolve_conflicts(value_conflicts, strategy="highest_confidence")
    
    # First seen strategy
    first_seen_results = resolver.resolve_conflicts(value_conflicts, strategy="first_seen")
    
    # Manual review strategy
    manual_results = resolver.resolve_conflicts(value_conflicts, strategy="manual_review")


## Part 4: Conflict Analysis

`ConflictAnalyzer` aggregates detected conflicts by type, severity, source, entity, and property, and reports patterns, trends, and recommendations.


In [None]:
from semantica.conflicts import ConflictAnalyzer

# Initialize analyzer
analyzer = ConflictAnalyzer()

# Comprehensive analysis
analysis = analyzer.analyze_conflicts(all_conflicts)

# Analysis by type, severity, and source
by_type = analysis['by_type']['counts']
by_severity = analysis['by_severity']['counts']
by_source = analysis['by_source']['counts']

# Top entities and properties
top_entities = analysis['by_entity']['top_entities']
top_properties = analysis['by_property']['top_properties']

# Patterns and recommendations
patterns = analysis['patterns']
recommendations = analysis['recommendations']

# Trend analysis
trends = analyzer.analyze_trends(all_conflicts)

# Generate insights report
insights = analyzer.generate_insights_report(all_conflicts)


## Part 5: Investigation Guides

`InvestigationGuideGenerator` creates guides for manual review.

**Guide Components:**
- Conflict summary, investigation steps, recommended actions
- Source information, context, severity assessment

**Use Cases:** High-severity conflicts, ambiguous cases, compliance, QA workflows

**Export Formats:** Markdown checklists, detailed reports, structured context
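
For a sense of what a Markdown checklist export might look like, here is a small standalone sketch. The `render_checklist` helper, the summary text, and the steps are all hypothetical; the actual output of `InvestigationGuideGenerator` may be structured differently.

```python
def render_checklist(summary, steps):
    """Render a conflict summary and its investigation steps as a Markdown task list."""
    lines = [f"### {summary}", ""]
    lines += [f"- [ ] {step}" for step in steps]
    return "\n".join(lines)

checklist_text = render_checklist(
    "Conflicting values for e1.name",
    [
        "Compare the source documents",
        "Check source credibility scores",
        "Record the chosen value and the reason",
    ],
)
print(checklist_text)
```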


In [None]:
from semantica.conflicts import InvestigationGuideGenerator

# Initialize guide generator
guide_generator = InvestigationGuideGenerator(source_tracker=tracker)

# Generate a guide for a single conflict and export it as a checklist.
# Both calls stay inside the guard because `guide` only exists when
# at least one conflict was detected.
if value_conflicts:
    guide = guide_generator.generate_guide(value_conflicts[0])
    checklist = guide_generator.export_investigation_checklist(guide, format="markdown")

# Generate guides for multiple conflicts
guides = guide_generator.generate_guides(value_conflicts[:3])

# Generate conflict report
conflict_report = guide_generator.generate_conflict_report(value_conflicts, format="detailed")


## Part 6: Methods Module

`semantica.conflicts.methods` provides convenience functions for functional-style access.

**Functions:**
- `detect_conflicts()`: Methods: `value`, `type`, `temporal`, `logical`, `relationship`
- `resolve_conflicts()`: Methods: `voting`, `credibility_weighted`, `most_recent`, `highest_confidence`, `first_seen`, `manual_review`
- `analyze_conflicts()`: Methods: `pattern`, `type`, `severity`, `source`, `trend`
- `track_sources()`: Methods: `property`, `entity`, `relationship`
- `generate_investigation_guide()`: Methods: `guide`, `checklist`, `context`
- `list_available_methods()`: List all methods by task type
- `get_conflict_method()`: Retrieve specific method function

**Benefits:** Simpler API, method discovery, consistent interface, extensible


In [None]:
from semantica.conflicts.methods import (
    detect_conflicts,
    resolve_conflicts,
    analyze_conflicts,
    track_sources,
    generate_investigation_guide,
    list_available_methods,
    get_conflict_method
)

# Detection methods
value_conflicts_method = detect_conflicts(entities, method="value", property_name="name")
type_conflicts_method = detect_conflicts(entities, method="type")
temporal_conflicts_method = detect_conflicts(entities, method="temporal")
logical_conflicts_method = detect_conflicts(logical_entities, method="logical")

# Resolution methods
if value_conflicts_method:
    voting_results = resolve_conflicts(value_conflicts_method, method="voting")
    credibility_results = resolve_conflicts(value_conflicts_method, method="credibility_weighted")

# Analysis methods
pattern_analysis = analyze_conflicts(all_conflicts, method="pattern")
type_analysis = analyze_conflicts(all_conflicts, method="type")
severity_analysis = analyze_conflicts(all_conflicts, method="severity")
source_analysis = analyze_conflicts(all_conflicts, method="source")
trend_analysis = analyze_conflicts(all_conflicts, method="trend")

# Source tracking methods
source_ref = SourceReference(document="test_source", confidence=0.9)
track_sources("e1", method="property", property_name="name", value="Test", source=source_ref)
track_sources("e1", method="entity", source=source_ref)

# Investigation guide methods
if value_conflicts_method:
    guide_method = generate_investigation_guide(value_conflicts_method[0], method="guide")
    checklist_method = generate_investigation_guide(value_conflicts_method[0], method="checklist")
    context_method = generate_investigation_guide(value_conflicts_method[0], method="context")

# List available methods
all_methods = list_available_methods()

# Get specific method
voting_method = get_conflict_method("resolution", "voting")


## Part 7: Method Registry

`method_registry` provides a plugin system for custom methods.

**Registration:** Task type (`detection`, `resolution`, `analysis`, `tracking`, `investigation`), method name, function

**Use Cases:**
- Domain-specific resolution logic
- External system integration
- A/B testing strategies
- ML model integration
- Hybrid resolution approaches
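
Conceptually, a registry like this is a two-level mapping from task type and method name to a callable. The sketch below is a toy stand-in to show the pattern; `ToyRegistry` is a hypothetical class, not Semantica's `method_registry`, although its method names mirror the calls used in the next cell.

```python
class ToyRegistry:
    """Maps (task, name) to a callable, e.g. ("resolution", "first_value")."""

    def __init__(self):
        self._methods = {}

    def register(self, task, name, func):
        self._methods.setdefault(task, {})[name] = func

    def get(self, task, name):
        try:
            return self._methods[task][name]
        except KeyError:
            raise KeyError(f"No {task!r} method named {name!r}") from None

    def unregister(self, task, name):
        # Removing a missing method is a no-op.
        self._methods.get(task, {}).pop(name, None)

    def list_all(self, task):
        return sorted(self._methods.get(task, {}))

registry = ToyRegistry()
registry.register("resolution", "first_value", lambda values: values[0])

picked = registry.get("resolution", "first_value")(["Apple Inc.", "Apple Incorporated"])
names_before = registry.list_all("resolution")
registry.unregister("resolution", "first_value")
names_after = registry.list_all("resolution")
print(picked, names_before, names_after)
```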


In [None]:
from semantica.conflicts import method_registry, ResolutionResult

# Register custom resolution method
def custom_resolution(conflicts, **kwargs):
    """Custom resolution that always picks the first value."""
    results = []
    for conflict in conflicts:
        if conflict.conflicting_values:
            result = ResolutionResult(
                conflict_id=conflict.conflict_id,
                resolved=True,
                resolved_value=conflict.conflicting_values[0],
                resolution_strategy="custom_first",
                confidence=0.8,
                resolution_notes="Custom: Always use first value"
            )
            results.append(result)
    return results

# Register the custom method
method_registry.register("resolution", "custom_first", custom_resolution)

# List registered methods
registered = method_registry.list_all("resolution")

# Use custom method
if value_conflicts:
    custom_results = resolve_conflicts(value_conflicts, method="custom_first")

# Unregister method
method_registry.unregister("resolution", "custom_first")


## Part 8: Configuration

`ConflictsConfig` manages settings that can be set programmatically, through environment variables, or per method.

**Global Settings:**
- `confidence_threshold`, `default_strategy`, `auto_resolve`, `track_provenance`

**Method-Specific:** Voting (`min_sources`, `tie_breaker`), credibility-weighted (`min_credibility`), most_recent (`time_field`), etc.

**Priority (highest first):** Method-specific → Global → Environment variables → Defaults

**Best Practices:** Set source credibility early, configure conflict fields, use method-specific configs
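
The priority order can be read as a layered lookup: check the most specific layer first and fall through to the defaults. The sketch below illustrates that reading only. The `lookup_setting` helper, the `CONFLICTS_` environment prefix, and the default values are assumptions made for the example, not Semantica's documented behavior.

```python
import os

# Illustrative defaults (assumed values, not necessarily Semantica's).
DEFAULTS = {"confidence_threshold": 0.7, "default_strategy": "voting"}

def lookup_setting(key, method_config=None, global_config=None, environ=None,
                   env_prefix="CONFLICTS_"):
    """Resolve a setting: method-specific, then global, then environment, then defaults."""
    environ = os.environ if environ is None else environ
    for layer in (method_config or {}, global_config or {}):
        if key in layer:
            return layer[key]
    env_key = env_prefix + key.upper()
    if env_key in environ:
        return environ[env_key]  # environment values arrive as strings
    return DEFAULTS.get(key)

# A dict stands in for os.environ so the example has no side effects.
fake_env = {"CONFLICTS_CONFIDENCE_THRESHOLD": "0.6"}

from_global = lookup_setting(
    "confidence_threshold",
    global_config={"confidence_threshold": 0.8},
    environ=fake_env,
)
from_env = lookup_setting("confidence_threshold", environ=fake_env)
from_default = lookup_setting("default_strategy", environ=fake_env)
print(from_global, from_env, from_default)
```

Note that a value read from the environment stays a string in this sketch; a real implementation would need to parse it into the expected type.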


In [None]:
from semantica.conflicts import ConflictsConfig, conflicts_config

# Using global config instance
conflicts_config.set("confidence_threshold", 0.8)
conflicts_config.set("default_strategy", "credibility_weighted")
threshold = conflicts_config.get("confidence_threshold", default=0.7)

# Method-specific configuration
conflicts_config.set_method_config("voting", min_sources=2, tie_breaker="confidence")
conflicts_config.set_method_config("credibility_weighted", min_credibility=0.5)
voting_config = conflicts_config.get_method_config("voting")

# Create custom config instance
custom_config = ConflictsConfig()
custom_config.set("confidence_threshold", 0.9)
custom_config.set("auto_resolve", True)
all_config = custom_config.get_all()


## Part 9: Complete Workflow

End-to-end example: integrating company data from multiple sources.

**Workflow:** Initialize → Track Sources → Detect → Resolve → Analyze → Generate Guides → Build Final Entity

**Scenario:** Three sources (Wikipedia, Official Site, Financial DB) with conflicts in name, founding year, and type classifications.


In [None]:
# Complete workflow: Company data integration from multiple sources
from datetime import datetime

from semantica.conflicts import (
    ConflictDetector, ConflictResolver, ConflictAnalyzer,
    SourceTracker, InvestigationGuideGenerator, SourceReference
)

# Sample company data from multiple sources
company_data = [
    {"id": "company_1", "name": "Apple Inc.", "founded": 1976, "revenue": 394328000000,
     "headquarters": "Cupertino, California", "type": "Company",
     "source": "wikipedia", "confidence": 0.85, "timestamp": datetime(2023, 1, 15)},
    {"id": "company_1", "name": "Apple Inc.", "founded": 1976, "revenue": 394328000000,
     "headquarters": "Cupertino, CA", "type": "Company",
     "source": "official_site", "confidence": 0.95, "timestamp": datetime(2023, 3, 20)},
    {"id": "company_1", "name": "Apple Incorporated", "founded": 1977, "revenue": 394328000000,
     "headquarters": "Cupertino", "type": "Organization",
     "source": "financial_db", "confidence": 0.80, "timestamp": datetime(2023, 2, 10)},
]

# Initialize all components
detector = ConflictDetector(
    confidence_threshold=0.7,
    track_provenance=True,
    conflict_fields={"Company": ["name", "founded", "revenue", "headquarters"]}
)

tracker = SourceTracker()
tracker.set_source_credibility("wikipedia", 0.85)
tracker.set_source_credibility("official_site", 0.95)
tracker.set_source_credibility("financial_db", 0.80)

resolver = ConflictResolver(default_strategy="credibility_weighted", source_tracker=tracker)
analyzer = ConflictAnalyzer()
guide_generator = InvestigationGuideGenerator(source_tracker=tracker)

# Step 1: Track sources
for entity in company_data:
    source_ref = SourceReference(
        document=entity["source"],
        confidence=entity["confidence"],
        timestamp=entity["timestamp"]
    )
    tracker.track_property_source(entity["id"], "name", entity["name"], source_ref)
    tracker.track_property_source(entity["id"], "founded", entity["founded"], source_ref)

# Step 2: Detect conflicts
detected_conflicts = detector.detect_entity_conflicts(company_data, entity_type="Company")

# Step 3: Resolve conflicts
resolved_data = {}
for conflict in detected_conflicts:
    results = resolver.resolve_conflicts([conflict], strategy="credibility_weighted")
    # Guard against an empty result list before reading the first entry
    if results and results[0].resolved:
        resolved_data[conflict.property_name] = results[0].resolved_value

# Step 4: Analyze
analysis = analyzer.analyze_conflicts(detected_conflicts)

# Step 5: Generate guides for unresolved conflicts
unresolved = [c for c in detected_conflicts if c.property_name not in resolved_data]
if unresolved:
    guides = guide_generator.generate_guides(unresolved)

# Final resolved entity
final_entity = {"id": "company_1", "type": "Company", **resolved_data}


## Summary

### Key Features

✅ **Detection**: Value, type, temporal, logical, relationship conflicts  
✅ **Resolution**: 6 strategies (voting, credibility-weighted, most_recent, first_seen, highest_confidence, manual_review)  
✅ **Source Tracking**: Provenance, credibility, traceability chains  
✅ **Analysis**: Patterns, trends, recommendations  
✅ **Investigation Guides**: Automated guides and checklists  
✅ **Methods Module**: Convenience functions for all operations  
✅ **Method Registry**: Custom method registration  
✅ **Configuration**: Global and method-specific settings

### Best Practices

1. Set source credibility before detection
2. Choose strategies based on data characteristics
3. Enable provenance tracking for audits
4. Analyze patterns before resolving
5. Use guides for high-severity conflicts
6. Configure conflict fields to focus on critical properties

### Common Patterns

- **Integration**: Track → Detect → Resolve → Analyze
- **QA**: Detect → Analyze → Generate guides → Review
- **Auto**: Detect → Resolve → Analyze
- **Assessment**: Track → Analyze → Adjust credibility

### Next Steps

- `04_Conflict_Resolution_Strategies.ipynb` - Advanced strategies
- `06_Multi_Source_Data_Integration.ipynb` - Integration workflows
- [API Reference](https://semantica.readthedocs.io/reference/conflicts/)
- [Usage Guide](../semantica/conflicts/conflicts_usage.md)
