[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/introduction/08_Building_Knowledge_Graphs.ipynb)

# Building Knowledge Graphs

## Overview

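This notebook demonstrates how to build a knowledge graph from extracted entities and relationships using Semantica's graph-building modules. You'll use `GraphBuilder` to assemble the graph, `EntityResolver` to resolve conflicting entities, and the `semantica.deduplication` module to merge duplicates.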

**Documentation**: [API Reference](https://semantica.readthedocs.io/reference/kg/)

### Learning Objectives

- Use `GraphBuilder` to construct knowledge graphs
- Use `EntityResolver` to resolve entity conflicts

**Note**: For deduplication, use the `semantica.deduplication` module.

## Installation

Install Semantica from PyPI:

```bash
pip install semantica
# Or with all optional dependencies (quoted so shells such as zsh don't expand the brackets):
pip install "semantica[all]"
```

---

## Step 1: Build Knowledge Graph

Construct a knowledge graph from entities and relationships.


In [1]:
!pip install semantica






In [4]:
from semantica.kg import GraphBuilder
from semantica.semantic_extract import NERExtractor, RelationExtractor

builder = GraphBuilder()
ner_extractor = NERExtractor()
relation_extractor = RelationExtractor()

text = "Apple Inc. is a technology company. Tim Cook is the CEO of Apple Inc. Apple Inc. is headquartered in Cupertino, California."

entities_list = ner_extractor.extract(text)
relationships_list = relation_extractor.extract(text, entities_list)

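# Convert the first five extracted entities into the dict format passed to the builder.
entities = []
for i, entity in enumerate(entities_list[:5], 1):
    entities.append({
        "id": f"e{i}",
        "type": entity.label,
        "name": entity.text,
        "properties": {}
    })

# Simplified wiring for this demo: each of the first three relations is
# attached from entity e1 to the next entity by position, not by the
# endpoints the extractor actually found.
relationships = []
for i, rel in enumerate(relationships_list[:3], 1):
    if i + 1 > len(entities):
        break  # no entity left to use as a target
    relationships.append({
        "source": "e1",
        "target": f"e{i+1}",
        "type": rel.predicate,
        "properties": {}
    })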

knowledge_graph = builder.build(entities, relationships)

print(f"Built knowledge graph with {len(knowledge_graph.get('entities', []))} entities")
print(f"Relationships: {len(knowledge_graph.get('relationships', []))}")
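
The cell above links relationships to entities by position. A minimal sketch of the alternative, linking each relationship to entity ids by name, is shown here in plain Python. The `Mention` and `Triple` classes, their fields, and the sample labels are hypothetical stand-ins for extractor output, not Semantica's actual types; only the output dict layout follows the cell above.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for extractor output; Semantica's real objects may differ.
@dataclass
class Mention:
    text: str
    label: str

@dataclass
class Triple:
    subject: str    # text of the source entity
    predicate: str
    object: str     # text of the target entity

def build_graph_inputs(mentions, triples):
    """Assign one id per distinct entity name, then wire relations by name."""
    entities, id_by_name = [], {}
    for mention in mentions:
        if mention.text in id_by_name:
            continue  # this surface form already has an id
        entity_id = f"e{len(entities) + 1}"
        id_by_name[mention.text] = entity_id
        entities.append({"id": entity_id, "type": mention.label,
                         "name": mention.text, "properties": {}})

    relationships = []
    for triple in triples:
        source = id_by_name.get(triple.subject)
        target = id_by_name.get(triple.object)
        if source is None or target is None:
            continue  # skip relations whose endpoints were not extracted
        relationships.append({"source": source, "target": target,
                              "type": triple.predicate, "properties": {}})
    return entities, relationships

mentions = [Mention("Apple Inc.", "ORG"), Mention("Tim Cook", "PERSON"),
            Mention("Apple Inc.", "ORG"), Mention("Cupertino", "GPE")]
triples = [Triple("Tim Cook", "CEO_OF", "Apple Inc."),
           Triple("Apple Inc.", "HEADQUARTERED_IN", "Cupertino"),
           Triple("Apple Inc.", "FOUNDED_BY", "Steve Jobs")]  # dropped: unknown endpoint

entities, relationships = build_graph_inputs(mentions, triples)
print([e["id"] for e in entities])
# → ['e1', 'e2', 'e3']
print([(r["source"], r["type"], r["target"]) for r in relationships])
# → [('e2', 'CEO_OF', 'e1'), ('e1', 'HEADQUARTERED_IN', 'e3')]
```

Looking endpoints up by name keeps every edge attached to the entities it actually mentions, and dropping relations with unknown endpoints avoids dangling ids in the graph.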

## Step 2: Entity Resolution

Resolve conflicting and duplicate entities with `EntityResolver`. In the run below, five input entities resolve to four.


In [7]:
from semantica.kg import EntityResolver

entity_resolver = EntityResolver()

resolved_entities = entity_resolver.resolve_entities(entities)

print(f"Original entities: {len(entities)}")
print(f"Resolved entities: {len(resolved_entities)}")

Original entities: 5
Resolved entities: 4
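
To make the idea of resolution concrete, here is a rough sketch of one common approach: normalize each name and group entities that share a type and normalized name. This is an illustration only and is not how `EntityResolver` is implemented; the `normalize` and `resolve` helpers, the suffix list, the `aliases` field, and the sample data are all assumptions.

```python
import re
from collections import defaultdict

# Assumed list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "ltd", "llc", "co"}

def normalize(name):
    """Lowercase, drop punctuation, and strip trailing corporate suffixes."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def resolve(entities):
    """Group entities by (type, normalized name); keep the first of each group."""
    groups = defaultdict(list)
    for entity in entities:
        groups[(entity["type"], normalize(entity["name"]))].append(entity)
    resolved = []
    for members in groups.values():
        canonical = dict(members[0])  # shallow copy of the first mention
        canonical["aliases"] = sorted({m["name"] for m in members})
        resolved.append(canonical)
    return resolved

# Hypothetical sample data in the same dict layout as the notebook's entities.
sample = [
    {"id": "e1", "type": "ORG", "name": "Apple Inc.", "properties": {}},
    {"id": "e2", "type": "PERSON", "name": "Tim Cook", "properties": {}},
    {"id": "e3", "type": "ORG", "name": "Apple", "properties": {}},
    {"id": "e4", "type": "GPE", "name": "Cupertino", "properties": {}},
    {"id": "e5", "type": "GPE", "name": "California", "properties": {}},
]
resolved = resolve(sample)
print(len(sample), "->", len(resolved))
# → 5 -> 4
print(resolved[0]["aliases"])
# → ['Apple', 'Apple Inc.']
```

Including the entity type in the grouping key keeps, for example, a person and a company with the same name from being collapsed into one node.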


## Step 3: Deduplication

Detect groups of near-duplicate entities in the graph and merge each group into a single entity.


In [13]:
from semantica.deduplication import DuplicateDetector, EntityMerger, MergeStrategy

# Detect duplicates
detector = DuplicateDetector(similarity_threshold=0.8)
duplicate_groups = detector.detect_duplicate_groups(knowledge_graph.get('entities', []))

# Merge duplicates
merger = EntityMerger()
merge_operations = merger.merge_duplicates(
    knowledge_graph.get('entities', []),
    strategy=MergeStrategy.KEEP_MOST_COMPLETE
)

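# This list holds only the entities produced by merge operations. Entities
# that were never merged are not included, so a count of 0 means no merges
# happened, not that the graph is empty.
deduplicated_entities = [op.merged_entity for op in merge_operations]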

print(f"Original entities: {len(knowledge_graph.get('entities', []))}")
print(f"Deduplicated entities: {len(deduplicated_entities)}")


Original entities: 4
Deduplicated entities: 0
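
The following sketch illustrates threshold-based duplicate detection and a "keep the most complete" merge using only the standard library. It is not the `semantica.deduplication` implementation: the helper names and sample data are hypothetical, string similarity comes from `difflib.SequenceMatcher`, and "most complete" is assumed here to mean "has the most properties". It also shows how to carry unmerged entities into the final list.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Case-insensitive similarity ratio between two names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_duplicate_groups(entities, threshold=0.8):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []
    for entity in entities:
        for group in groups:
            if similarity(entity["name"], group[0]["name"]) >= threshold:
                group.append(entity)
                break
        else:
            groups.append([entity])  # no similar group found; start a new one
    return groups

def keep_most_complete(group):
    """Pick the member with the most properties (ties go to the earliest)."""
    return max(group, key=lambda e: len(e["properties"]))

def deduplicate(entities, threshold=0.8):
    """Return (merged, untouched): one survivor per duplicate group, plus singletons."""
    groups = find_duplicate_groups(entities, threshold)
    merged = [keep_most_complete(g) for g in groups if len(g) > 1]
    untouched = [g[0] for g in groups if len(g) == 1]
    return merged, untouched

# Hypothetical sample data in the same dict layout as the notebook's entities.
sample = [
    {"id": "e1", "type": "ORG", "name": "Apple Inc.", "properties": {}},
    {"id": "e2", "type": "ORG", "name": "Apple Inc", "properties": {"industry": "technology"}},
    {"id": "e3", "type": "PERSON", "name": "Tim Cook", "properties": {}},
    {"id": "e4", "type": "GPE", "name": "Cupertino", "properties": {}},
]
merged, untouched = deduplicate(sample)
final_entities = merged + untouched
print(len(merged), len(untouched), len(final_entities))
# → 1 2 3
```

Keeping `merged` and `untouched` separate makes it explicit that a merge step which finds no duplicates should leave the entity list unchanged rather than empty.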


## Summary

You've learned how to build knowledge graphs:

- **GraphBuilder**: Constructs a knowledge graph from entities and relationships
- **EntityResolver**: Resolves conflicting and duplicate entities
- **Deduplication**: The `semantica.deduplication` module detects and merges duplicate entities

Next: Learn how to analyze graphs in the Graph_Analytics notebook.
