[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/advanced/01_Advanced_Extraction.ipynb)

# Advanced Extraction

## Overview

This notebook demonstrates advanced semantic extraction using EventDetector, CoreferenceResolver, TripletExtractor, SemanticAnalyzer, SemanticNetworkExtractor, LLMEnhancer, and ExtractionValidator.


**Documentation**: [API Reference](https://semantica.readthedocs.io/reference/semantic_extract/)

### Learning Objectives

- Use EventDetector to detect events
- Use CoreferenceResolver to resolve coreferences
- Use TripletExtractor to extract RDF triplets
- Use SemanticAnalyzer for semantic analysis
- Use SemanticNetworkExtractor to extract semantic networks
- Use LLMEnhancer for LLM-based enhancement
- Use ExtractionValidator to validate extractions

## Installation

Install Semantica from PyPI:

```bash
pip install semantica
# Or with all optional dependencies (the quotes stop shells such as zsh from expanding the brackets):
pip install "semantica[all]"
```

---

## Workflow: Event Detection → Coreference Resolution → Triplet Extraction → Semantic Analysis → Network Extraction → LLM Enhancement → Validation
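
The workflow passes one piece of text through a fixed sequence of stages, and later stages build on earlier results. A minimal sketch of that chaining, assuming each stage is a plain function that reads from and adds to a shared context dict. The stage functions here are hypothetical stand-ins, not Semantica calls:

```python
import re
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Stage = Callable[[Context], Context]

def run_pipeline(text: str, stages: List[Stage]) -> Context:
    """Pass a shared context through each stage in order."""
    context: Context = {"text": text}
    for stage in stages:
        context = stage(context)
    return context

# Hypothetical stand-in stages; in this notebook the real work is done by
# EventDetector, TripletExtractor, and the other Semantica components.
def find_trigger_verbs(ctx: Context) -> Context:
    ctx["events"] = re.findall(r"\b(founded|led)\b", ctx["text"])
    return ctx

def find_years(ctx: Context) -> Context:
    ctx["years"] = re.findall(r"\b\d{4}\b", ctx["text"])
    return ctx

result = run_pipeline(
    "Apple Inc. was founded by Steve Jobs in 1976. The company is now led by Tim Cook.",
    [find_trigger_verbs, find_years],
)
print(result["events"], result["years"])
```

Keeping everything in one context means a later stage, such as validation, can read whatever an earlier stage produced without extra plumbing.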


```python
!pip install -q semantica
```



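## Step 1: Event Detection

Import the extractors used throughout this notebook and detect events in a short sample text.

```python
from semantica.semantic_extract import (
    EventDetector, CoreferenceResolver, TripletExtractor,
    SemanticAnalyzer, SemanticNetworkExtractor, LLMEnhancer, ExtractionValidator
)

# Sample text reused by every step below.
text = "Apple Inc. was founded by Steve Jobs in 1976. The company is now led by Tim Cook."

event_detector = EventDetector()
events = event_detector.detect_events(text)

print(f"Detected {len(events)} events")
# Show up to three events with their type and (truncated) matched text.
for event in events[:3]:
    print(f"  Event: {event.event_type} - {event.text[:50]}")
```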




```text
Detected 1 events
  Event: founded - founded
```
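
A common way to detect events is to match trigger words against a lexicon and report each matched span with its event type. The sketch below illustrates that idea with a hypothetical `TRIGGERS` table and `Event` class; it is not necessarily how `EventDetector` works internally, although the output above (`founded - founded`) is consistent with trigger matching:

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    event_type: str
    text: str
    start: int
    end: int

# Hypothetical trigger lexicon: event type -> regex matching its trigger word.
TRIGGERS = {
    "founded": r"\bfound(?:ed|ing)\b",
    "acquired": r"\bacquir(?:ed|es|ing)\b",
}

def detect_events(text: str) -> List[Event]:
    events = []
    for event_type, pattern in TRIGGERS.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            events.append(Event(event_type, m.group(0), m.start(), m.end()))
    # Return events in the order they occur in the text.
    return sorted(events, key=lambda e: e.start)

text = "Apple Inc. was founded by Steve Jobs in 1976. The company is now led by Tim Cook."
for event in detect_events(text):
    print(f"  Event: {event.event_type} - {event.text}")
```

Only one event is found because the lexicon has no entry for "led"; adding one would also pick up the leadership change in the second sentence.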


## Step 2: Coreference Resolution

Link mentions that refer to the same entity, such as a pronoun or a phrase like "The company", into coreference chains.


```python
coreference_resolver = CoreferenceResolver()

coreferences = coreference_resolver.resolve(text)

print(f"Resolved {len(coreferences)} coreference chains")
```


```text
Resolved 0 coreference chains
```
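
The resolver reported no chains here, even though "The company" in the second sentence refers to Apple Inc. To make concrete what such a chain contains, here is a deliberately naive sketch that links an anaphoric phrase to the closest preceding mention of a compatible entity type. The `ENTITIES` list and `ANAPHORS` table are hypothetical inputs, not Semantica output:

```python
import re
from typing import Dict, List, Tuple

# Hypothetical entity mentions (surface text, type), e.g. from an NER step.
ENTITIES: List[Tuple[str, str]] = [
    ("Apple Inc.", "ORG"),
    ("Steve Jobs", "PERSON"),
    ("Tim Cook", "PERSON"),
]

# Hypothetical anaphoric phrases and the entity type each one can refer to.
ANAPHORS: Dict[str, str] = {"the company": "ORG", "he": "PERSON"}

def resolve_chains(text: str) -> Dict[str, List[str]]:
    """Link each anaphor to the closest preceding mention of a compatible type."""
    mentions = [(text.find(surface), surface, etype)
                for surface, etype in ENTITIES if surface in text]
    chains: Dict[str, List[str]] = {}
    for phrase, wanted_type in ANAPHORS.items():
        pattern = rf"\b{re.escape(phrase)}\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            candidates = [(start, surface) for start, surface, etype in mentions
                          if etype == wanted_type and start < m.start()]
            if candidates:
                _, antecedent = max(candidates)  # closest preceding mention
                chains.setdefault(antecedent, [antecedent]).append(m.group(0))
    return chains

text = "Apple Inc. was founded by Steve Jobs in 1976. The company is now led by Tim Cook."
print(resolve_chains(text))
```

Real resolvers use syntax, gender and number agreement, and learned scores rather than a fixed phrase table, but the output has the same shape: each chain groups every mention of one entity.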


## Step 3: Triplet Extraction

Extract (subject, predicate, object) triplets, the statement form used by RDF.


```python
triplet_extractor = TripletExtractor()

triplets = triplet_extractor.extract_triplets(text)

print(f"Extracted {len(triplets)} triplets")
for triplet in triplets[:3]:
    print(f"  ({triplet.get('subject', '')}, {triplet.get('predicate', '')}, {triplet.get('object', '')})")
```


```text
Extracted 2 triplets
  (Apple Inc., founded_by, Steve Jobs)
  (Steve Jobs, located_in, 1976)
```
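
Triplets become RDF once subjects and predicates are IRIs. A minimal sketch that serializes triplets shaped like the ones above (dicts with `subject`, `predicate`, and `object` keys, values copied from the output) to N-Triples, assuming a hypothetical base namespace `http://example.org/` and treating a bare four-digit object as an `xsd:gYear` literal:

```python
import re
from typing import Dict, List
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for entities and predicates
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"

def iri(label: str) -> str:
    """Mint an IRI from a label, e.g. "Apple Inc." -> <http://example.org/Apple_Inc.>."""
    return f"<{BASE}{quote(label.replace(' ', '_'))}>"

def to_ntriples(triplets: List[Dict[str, str]]) -> str:
    lines = []
    for t in triplets:
        obj = t["object"]
        # Treat a bare four-digit object as a year literal rather than a resource.
        if re.fullmatch(r"\d{4}", obj):
            obj_term = f'"{obj}"^^<{XSD_GYEAR}>'
        else:
            obj_term = iri(obj)
        lines.append(f"{iri(t['subject'])} {iri(t['predicate'])} {obj_term} .")
    return "\n".join(lines)

triplets = [
    {"subject": "Apple Inc.", "predicate": "founded_by", "object": "Steve Jobs"},
    {"subject": "Steve Jobs", "predicate": "located_in", "object": "1976"},
]
print(to_ntriples(triplets))
```

Serializing faithfully also makes the second triplet's problem visible: `located_in` points at a year literal. For anything beyond a demo, an RDF library such as rdflib handles IRI escaping and serialization more robustly.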


## Step 4: Semantic Analysis

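Label the semantic roles in the text, that is, which phrase acts as the agent, the patient, the time, and so on for each predicate.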


```python
semantic_analyzer = SemanticAnalyzer()

semantic_roles = semantic_analyzer.analyze_semantic_roles(text)

print(f"Analyzed semantic roles: {len(semantic_roles)}")
```


```text
Analyzed semantic roles: 6
```
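
To show what a role assignment looks like for the first sentence, here is a toy pattern for the passive construction "X was <verb> by Y in Z". The regex `PASSIVE` and the role names are hypothetical and cover only this sentence shape; they do not reproduce the six roles `SemanticAnalyzer` returned above:

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for passive sentences of the form "X was <verb>ed by Y [in YEAR]".
PASSIVE = re.compile(
    r"(?P<patient>[A-Z][\w.]*(?: [A-Z][\w.]*)*) was (?P<predicate>\w+ed) by "
    r"(?P<agent>[A-Z]\w*(?: [A-Z]\w*)*)(?: in (?P<time>\d{4}))?"
)

def label_roles(sentence: str) -> Optional[Dict[str, str]]:
    m = PASSIVE.search(sentence)
    if m is None:
        return None
    # Drop optional roles that did not match.
    return {role: value for role, value in m.groupdict().items() if value is not None}

print(label_roles("Apple Inc. was founded by Steve Jobs in 1976."))
print(label_roles("The company is now led by Tim Cook."))
```

The second sentence is active voice, so the passive pattern finds nothing; covering it would need a separate active-voice pattern, which is why practical role labellers rely on a parser instead of regexes.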


## Step 5: Semantic Network Extraction

Build a semantic network in which entities are nodes and the relations between them are edges.


```python
semantic_network_extractor = SemanticNetworkExtractor()

semantic_network = semantic_network_extractor.extract_network(text)

print(f"Extracted semantic network with {len(semantic_network.get('nodes', []))} nodes")
print(f"Edges: {len(semantic_network.get('edges', []))}")
```


```text
Extracted semantic network with 4 nodes
Edges: 2
```
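
These counts are consistent with treating each extracted entity as a node and each triplet as a directed edge: the run found four entities (Apple Inc., Steve Jobs, 1976, Tim Cook) and two triplets. A minimal sketch of that construction, assuming the entities and triplets are already plain Python values; `build_network` is a hypothetical helper, not part of Semantica:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_network(entities: List[str], triplets: List[Tuple[str, str, str]]) -> Dict:
    """Entities become nodes; each (subject, predicate, object) triplet becomes a directed edge."""
    nodes = list(dict.fromkeys(entities))  # de-duplicate while keeping order
    for subject, _, obj in triplets:
        for endpoint in (subject, obj):
            if endpoint not in nodes:
                nodes.append(endpoint)
    edges = [{"source": s, "label": p, "target": o} for s, p, o in triplets]
    # Adjacency list for quick neighbour lookups.
    adjacency: Dict[str, List[str]] = defaultdict(list)
    for s, _, o in triplets:
        adjacency[s].append(o)
    return {"nodes": nodes, "edges": edges, "adjacency": dict(adjacency)}

entities = ["Apple Inc.", "Steve Jobs", "1976", "Tim Cook"]
triplets = [("Apple Inc.", "founded_by", "Steve Jobs"), ("Steve Jobs", "located_in", "1976")]
network = build_network(entities, triplets)
print(len(network["nodes"]), len(network["edges"]))
```

Tim Cook ends up as an isolated node because no extracted triplet mentions him, which mirrors the gap left by the unresolved "The company is now led by Tim Cook."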


## Step 6: LLM Enhancement

Refine the extractions with a large language model (LLM).


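```python
llm_enhancer = LLMEnhancer()

# LLMEnhancer has no `enhance_extractions` method; calling
# llm_enhancer.enhance_extractions(events, text) raises:
# AttributeError: 'LLMEnhancer' object has no attribute 'enhance_extractions'
# List the public methods of the installed version and pick the enhancement
# call documented in the API reference linked above.
print([name for name in dir(llm_enhancer) if not name.startswith("_")])
```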
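
Whatever method your installed version exposes, LLM-based enhancement generally means sending the text and the rule-based extractions to a model and parsing a structured reply. The sketch below shows that round trip with a stub in place of a real model; `build_prompt`, `enhance`, `stub_llm`, and the JSON reply format are all hypothetical, and no Semantica or LLM-provider API is used:

```python
import json
from typing import Callable, Dict, List

Triplet = Dict[str, str]

def build_prompt(text: str, triplets: List[Triplet]) -> str:
    return (
        "For each triplet, decide whether the text supports it. Reply with a JSON list "
        'of objects with keys "subject", "predicate", "object" and "keep" (true/false).\n'
        f"Text: {text}\n"
        f"Triplets: {json.dumps(triplets)}"
    )

def enhance(text: str, triplets: List[Triplet], llm: Callable[[str], str]) -> List[Triplet]:
    """Send the extractions to a model and keep only the triplets it confirms."""
    judged = json.loads(llm(build_prompt(text, triplets)))
    return [
        {key: item[key] for key in ("subject", "predicate", "object")}
        for item in judged
        if item.get("keep") is True
    ]

def stub_llm(prompt: str) -> str:
    # Canned reply standing in for a real model call (no network access).
    return json.dumps([
        {"subject": "Apple Inc.", "predicate": "founded_by", "object": "Steve Jobs", "keep": True},
        {"subject": "Steve Jobs", "predicate": "located_in", "object": "1976", "keep": False},
    ])

text = "Apple Inc. was founded by Steve Jobs in 1976. The company is now led by Tim Cook."
triplets = [
    {"subject": "Apple Inc.", "predicate": "founded_by", "object": "Steve Jobs"},
    {"subject": "Steve Jobs", "predicate": "located_in", "object": "1976"},
]
print(enhance(text, triplets, stub_llm))
```

Passing the model in as a callable keeps the logic testable without an API key; a real client would also need to handle replies that are not valid JSON.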

## Step 7: Extraction Validation

Check the extractions and report whether they are valid, along with a confidence score.


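```python
extraction_validator = ExtractionValidator()

# This cell was not executed in the original run, so `validate(events, text)` and the
# `valid` / `confidence` attributes are unverified; check them against the API reference.
if hasattr(extraction_validator, "validate"):
    validation_result = extraction_validator.validate(events, text)
    print("Extraction validation:")
    print(f"  Valid: {validation_result.valid}")
    print(f"  Confidence: {validation_result.confidence:.3f}")
else:
    print([name for name in dir(extraction_validator) if not name.startswith("_")])
```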
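
Independently of `ExtractionValidator`'s exact interface, a validation pass can be as simple as a set of rules applied to each triplet, with the pass rate as the confidence score. The rules and the `ValidationResult` class below are hypothetical: every subject and object must occur in the source text, and a `located_in` object must not be a bare year, which catches the `(Steve Jobs, located_in, 1976)` triplet from Step 3:

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ValidationResult:
    valid: bool
    confidence: float
    errors: List[str] = field(default_factory=list)

def validate_triplets(triplets: List[Dict[str, str]], text: str) -> ValidationResult:
    """Apply simple rules to each triplet; confidence is the fraction that pass."""
    errors: List[str] = []
    passed = 0
    for t in triplets:
        problems = [f"{role} '{t[role]}' not found in text"
                    for role in ("subject", "object") if t[role] not in text]
        # A location predicate whose object is a bare year is almost certainly mislabelled.
        if t["predicate"] == "located_in" and re.fullmatch(r"\d{4}", t["object"]):
            problems.append(f"located_in object '{t['object']}' looks like a year")
        errors.extend(problems)
        if not problems:
            passed += 1
    confidence = passed / len(triplets) if triplets else 0.0
    # An empty extraction is reported as invalid rather than trivially valid.
    return ValidationResult(valid=bool(triplets) and not errors, confidence=confidence, errors=errors)

text = "Apple Inc. was founded by Steve Jobs in 1976. The company is now led by Tim Cook."
triplets = [
    {"subject": "Apple Inc.", "predicate": "founded_by", "object": "Steve Jobs"},
    {"subject": "Steve Jobs", "predicate": "located_in", "object": "1976"},
]
result = validate_triplets(triplets, text)
print(f"  Valid: {result.valid}")
print(f"  Confidence: {result.confidence:.3f}")
```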


## Summary

You've learned advanced extraction capabilities:

- **EventDetector**: Event detection and classification
- **CoreferenceResolver**: Coreference resolution
- **TripletExtractor**: RDF triplet extraction
- **SemanticAnalyzer**: Semantic analysis and role labeling
- **SemanticNetworkExtractor**: Semantic network extraction
- **LLMEnhancer**: LLM-based extraction enhancement
- **ExtractionValidator**: Extraction validation
