[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/introduction/01_Welcome_to_Semantica.ipynb)

# Welcome to Semantica

## Overview

This notebook introduces **Semantica**, a knowledge graph and semantic processing framework for building production-ready semantic AI applications.

**Documentation**: [Getting Started](https://semantica.readthedocs.io/getting-started/) • [Concepts](https://semantica.readthedocs.io/concepts/) • [API Reference](https://semantica.readthedocs.io/reference/)

### What You'll Learn

- What Semantica is and why it's useful
- How to install and configure the framework
- How the framework architecture is organized
- Key concepts and terminology
- Next steps for getting started

## What is Semantica?

**Semantica** is a production-ready framework for:

- **Building Knowledge Graphs**: Transform unstructured data into structured knowledge graphs
- **Semantic Processing**: Extract entities, relationships, and meaning from text, images, and audio
- **GraphRAG**: Graph-based retrieval augmented generation
- **Temporal Analysis**: Time-aware knowledge graphs
- **Multi-Modal Processing**: Handle text, images, audio, and structured data
- **Enterprise Features**: Quality assurance, conflict resolution, ontology generation

### Use Cases

- Threat intelligence and cybersecurity
- Healthcare and medical research
- Financial analysis and fraud detection
- Supply chain optimization
- Research and knowledge management
- Multi-agent AI systems


## Installation & Setup

### Prerequisites

Before installing Semantica, ensure you have:
- Python 3.8 or higher
- pip package manager
- (Optional) Virtual environment for isolation

### Installation Methods

```bash
# Method 1: Install from PyPI (recommended)
pip install semantica

# Or install with all optional dependencies
# (quoted so that shells such as zsh do not expand the brackets):
pip install "semantica[all]"

# Method 2: Install from source (development version)
git clone https://github.com/Hawksight-AI/semantica.git
cd semantica
pip install -e .

# Or with all optional dependencies:
pip install -e ".[all]"

# Verify installation (runs Python from the shell)
python -c "import semantica; print(semantica.__version__)"
```

### Configuration

```bash
# Set up environment variables for API keys and configuration
# export SEMANTICA_API_KEY=your_openai_key
# export SEMANTICA_EMBEDDING_PROVIDER=openai
# export SEMANTICA_MODEL_NAME=gpt-4

# Or use a config file (config.yaml):
# api_keys:
#   openai: your_key_here
#   anthropic: your_key_here
# embedding:
#   provider: openai
#   model: text-embedding-3-large
#   dimensions: 3072
# knowledge_graph:
#   backend: networkx  # or neo4j, arangodb
#   temporal: true
```
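
Inside a notebook, these settings can be read back with the standard library alone. The sketch below is not Semantica's own configuration loader: `load_settings` and `DEFAULTS` are hypothetical names, and only the three variable names and their example values come from the block above.

```python
import os

# Variable names and example values are taken from the configuration block
# above; treating them as defaults is an assumption made for illustration.
DEFAULTS = {
    "SEMANTICA_EMBEDDING_PROVIDER": "openai",
    "SEMANTICA_MODEL_NAME": "gpt-4",
}


def load_settings(environ=None):
    """Collect Semantica-related settings from an environment mapping."""
    env = os.environ if environ is None else environ
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # The API key has no sensible default, so a missing key is stored as None.
    settings["SEMANTICA_API_KEY"] = env.get("SEMANTICA_API_KEY")
    return settings


# Pass an explicit mapping here so the example does not depend on the real shell.
settings = load_settings({"SEMANTICA_MODEL_NAME": "gpt-4"})
missing = [name for name, value in settings.items() if value is None]
print("missing settings:", missing)
```

Calling `load_settings()` with no argument reads the real process environment, which is useful for failing early when a key has not been exported.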

---

## Framework Architecture Overview

Semantica is organized into modular components, each handling a specific aspect of semantic processing:

### 1. INGEST MODULE - Data Ingestion
**Purpose**: Ingest data from various sources

**Components**:
- `FileIngestor`: Read files (PDF, DOCX, HTML, JSON, CSV, etc.)
- `WebIngestor`: Scrape and ingest web pages
- `FeedIngestor`: Process RSS/Atom feeds
- `StreamIngestor`: Real-time data streaming (Kafka, RabbitMQ, Kinesis, Pulsar)
- `DBIngestor`: Database queries and ingestion (PostgreSQL, MySQL, SQLite, Oracle, SQL Server)
- `EmailIngestor`: Process email messages (IMAP, POP3)
- `RepoIngestor`: Git repository analysis
- `MCPIngestor`: Model Context Protocol server integration

**Example**:
```python
from semantica.ingest import (
    FileIngestor, WebIngestor, FeedIngestor, StreamIngestor,
    DBIngestor, EmailIngestor, RepoIngestor, MCPIngestor,
)

file_ingestor = FileIngestor()
web_ingestor = WebIngestor()
documents = file_ingestor.ingest("data/")  # ingest a local directory
web_docs = web_ingestor.ingest("https://example.com")  # ingest a web page
```

### 2. PARSE MODULE - Document Parsing
**Purpose**: Parse and extract content from various formats

**Components**:
- `DocumentParser`: Main parser orchestrator
- `PDFParser`: Extract text, tables, images from PDFs
- `DOCXParser`: Parse Word documents
- `HTMLParser`: Extract content from HTML
- `JSONParser`: Parse structured JSON data
- `ExcelParser`: Process spreadsheets
- `ImageParser`: OCR and image analysis
- `CodeParser`: Parse source code files

**Example**:
```python
from semantica.parse import DocumentParser
parser = DocumentParser()
parsed_docs = parser.parse(documents)
```

### 3. NORMALIZE MODULE - Text Normalization
**Purpose**: Clean and normalize text for processing

**Components**:
- `TextNormalizer`: Main normalization orchestrator
- `TextCleaner`: Remove noise, fix encoding
- `DataCleaner`: Clean structured data
- `EntityNormalizer`: Normalize entity names
- `DateNormalizer`: Standardize date formats
- `NumberNormalizer`: Normalize numeric values
- `LanguageDetector`: Detect document language
- `EncodingHandler`: Handle character encoding

**Example**:
```python
from semantica.normalize import TextNormalizer
normalizer = TextNormalizer()
normalized = normalizer.normalize(parsed_docs)
```

### 4. SEMANTIC_EXTRACT MODULE - Entity & Relationship Extraction
**Purpose**: Extract entities, relationships, and semantic information

**Components**:
- `NERExtractor`: Named Entity Recognition
- `RelationExtractor`: Extract relationships between entities
- `SemanticAnalyzer`: Deep semantic analysis
- `SemanticNetworkExtractor`: Extract semantic networks

**Example**:
```python
from semantica.semantic_extract import NERExtractor, RelationExtractor
# `normalized` is the output of the NORMALIZE step above
extractor = NERExtractor()
entities = extractor.extract(normalized)
relation_extractor = RelationExtractor()
relationships = relation_extractor.extract(normalized, entities)
```

### 5. KG MODULE - Knowledge Graph Construction
**Purpose**: Build and manage knowledge graphs

**Components**:
- `GraphBuilder`: Construct knowledge graphs from entities/relationships
- `GraphAnalyzer`: Analyze graph structure and properties
- `GraphValidator`: Validate graph quality and consistency
- `EntityResolver`: Resolve entity conflicts and duplicates
- `ConflictDetector`: Detect conflicting information
- `CentralityCalculator`: Calculate node importance metrics
- `CommunityDetector`: Detect communities in graphs
- `ConnectivityAnalyzer`: Analyze graph connectivity
- `TemporalQuery`: Query temporal knowledge graphs
- `Deduplicator`: Remove duplicate entities/relationships

**Example**:
```python
from semantica.kg import GraphBuilder, GraphAnalyzer
builder = GraphBuilder()
kg = builder.build(entities, relationships)
analyzer = GraphAnalyzer()
metrics = analyzer.analyze(kg)
```
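
To make the builder and analyzer roles concrete, here is a library-independent sketch of the same idea in plain Python: relationships become an adjacency map, and degree centrality ranks nodes by how many neighbours they have. The function names and sample triples are hypothetical, and this does not show how `GraphBuilder` or `CentralityCalculator` are implemented.

```python
from collections import defaultdict


def build_graph(relationships):
    """Build an undirected adjacency map from (source, relation, target) triples."""
    adjacency = defaultdict(set)
    for source, _relation, target in relationships:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return dict(adjacency)


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbours) / (n - 1) for node, neighbours in adjacency.items()}


# Hypothetical sample data, standing in for extracted entities and relationships.
triples = [
    ("Alice", "WORKS_AT", "Acme"),
    ("Bob", "WORKS_AT", "Acme"),
    ("Acme", "LOCATED_IN", "Berlin"),
]
graph = build_graph(triples)
centrality = degree_centrality(graph)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])
```

In this sample, `Acme` is connected to every other node, so it receives the highest possible score.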

### 6. EMBEDDINGS MODULE - Embedding Generation
**Purpose**: Generate vector embeddings for various data types

**Components**:
- `EmbeddingGenerator`: Main embedding orchestrator
- `TextEmbedder`: Generate text embeddings
- `ImageEmbedder`: Generate image embeddings
- `AudioEmbedder`: Generate audio embeddings
- `MultimodalEmbedder`: Combine multiple modalities
- `EmbeddingOptimizer`: Optimize embedding quality
- `ProviderAdapters`: Support for OpenAI, Cohere, etc.

**Example**:
```python
from semantica.embeddings import EmbeddingGenerator
generator = EmbeddingGenerator()
embeddings = generator.generate(documents)
```

### 7. VECTOR_STORE MODULE - Vector Database Operations
**Purpose**: Store and search vector embeddings

**Components**:
- `VectorStore`: Main vector store interface
- `FAISSAdapter`: FAISS integration
- `HybridSearch`: Combine vector and keyword search
- `VectorRetriever`: Retrieve relevant vectors

**Example**:
```python
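from semantica.vector_store import VectorStore, HybridSearch
# `embeddings` and `documents` come from the earlier steps; `metadata` is
# assumed to be supplied by the caller (for example, per-document source info).
query = "knowledge graph construction"  # example query string
vector_store = VectorStore()
vector_store.store(embeddings, documents, metadata)
hybrid_search = HybridSearch(vector_store)
results = hybrid_search.search(query, top_k=10)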
```
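
The idea of combining vector and keyword search can be sketched without any library. The example below assumes one simple scheme, a weighted sum of cosine similarity and keyword overlap. The names `hybrid_search` and `alpha` and the toy two-dimensional vectors are hypothetical, and Semantica's `HybridSearch` may score results differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of distinct query terms that occur in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(query, query_vector, corpus, alpha=0.5, top_k=10):
    """Rank (text, vector) pairs by alpha * vector score + (1 - alpha) * keyword score."""
    scored = [
        (alpha * cosine_similarity(query_vector, vector)
         + (1 - alpha) * keyword_score(query, text), text)
        for text, vector in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy corpus with hand-written vectors; real embeddings have many more dimensions.
corpus = [
    ("graph databases store nodes and edges", [1.0, 0.0]),
    ("vector search finds similar embeddings", [0.0, 1.0]),
]
results = hybrid_search("vector search", [0.0, 1.0], corpus, top_k=1)
print(results)
```

Setting `alpha` closer to 1 favours semantic similarity, while values closer to 0 favour exact term matches.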

### 8. GRAPH_STORE MODULE - Persistent Graph Database Operations
**Purpose**: Store and query property graphs in Neo4j, KuzuDB, or FalkorDB

**Components**:
- `GraphStore`: Main graph store interface
- `Neo4jAdapter`: Neo4j integration (enterprise features)
- `KuzuAdapter`: KuzuDB integration (embedded, no server)
- `FalkorDBAdapter`: FalkorDB integration (Redis-based, ultra-fast)

**Example**:
```python
from semantica.graph_store import GraphStore
store = GraphStore(backend="kuzu", database_path="./my_graph_db")
store.connect()
# Create two nodes so the relationship has a valid target
node = store.create_node(["Person"], {"name": "John", "age": 30})
other = store.create_node(["Person"], {"name": "Jane", "age": 28})
store.create_relationship(node["id"], other["id"], "KNOWS", {"since": 2020})
results = store.execute_query("MATCH (p:Person) RETURN p.name")
store.close()
```

### 9. REASONING MODULE - Inference and Reasoning
**Purpose**: Perform logical inference and reasoning

**Components**:
- `InferenceEngine`: Main inference orchestrator
- `RuleManager`: Manage inference rules
- `DeductiveReasoner`: Deductive reasoning
- `AbductiveReasoner`: Abductive reasoning
- `ExplanationGenerator`: Generate explanations for inferences
- `RETEEngine`: RETE algorithm for rule matching

**Example**:
```python
from semantica.reasoning import InferenceEngine, RuleManager
inference_engine = InferenceEngine()
rule_manager = RuleManager()
new_facts = inference_engine.forward_chain(kg, rule_manager)
```
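
Forward chaining itself is a small algorithm: apply every rule whose premises are already known, add the conclusions, and repeat until nothing new appears. The sketch below illustrates it in plain Python with ground facts only (no variables or pattern matching). The `forward_chain` function and the rule format are hypothetical and are not Semantica's `InferenceEngine`.

```python
def forward_chain(facts, rules):
    """Apply rules until no new facts appear; return only the derived facts.

    Each rule is a (premises, conclusion) pair, where premises is a set of facts.
    """
    known = set(facts)
    derived = set()
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are known and it adds something new.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                derived.add(conclusion)
                changed = True
    return derived


facts = {("Socrates", "is_a", "Human")}
rules = [
    ({("Socrates", "is_a", "Human")}, ("Socrates", "is_a", "Mortal")),
    ({("Socrates", "is_a", "Mortal")}, ("Socrates", "will", "Die")),
]
new_facts = forward_chain(facts, rules)
print(sorted(new_facts))
```

The second rule only fires after the first has added its conclusion, which is why the loop repeats until a full pass produces no change.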

### 10. ONTOLOGY MODULE - Ontology Generation
**Purpose**: Generate and manage ontologies

**Components**:
- `OntologyGenerator`: Generate ontologies from knowledge graphs
- `OntologyValidator`: Validate ontology structure
- `OWLGenerator`: Generate OWL format ontologies
- `PropertyGenerator`: Generate ontology properties
- `ClassInferrer`: Infer ontology classes

**Example**:
```python
from semantica.ontology import OntologyGenerator
generator = OntologyGenerator()
ontology = generator.generate_from_graph(kg)
```

### 11. EXPORT MODULE - Data Export
**Purpose**: Export data in various formats

**Components**:
- `JSONExporter`: Export to JSON
- `RDFExporter`: Export to RDF/XML
- `CSVExporter`: Export to CSV
- `GraphExporter`: Export to graph formats (GraphML, GEXF)
- `OWLExporter`: Export to OWL
- `VectorExporter`: Export vectors

**Example**:
```python
from semantica.export import JSONExporter, RDFExporter
json_exporter = JSONExporter()
json_exporter.export(kg, "output.json")
```

### 12. VISUALIZATION MODULE - Graph Visualization
**Purpose**: Visualize knowledge graphs and analytics

**Components**:
- `KGVisualizer`: Visualize knowledge graphs
- `EmbeddingVisualizer`: Visualize embeddings (t-SNE, PCA, UMAP)
- `QualityVisualizer`: Visualize quality metrics
- `AnalyticsVisualizer`: Visualize graph analytics
- `TemporalVisualizer`: Visualize temporal data

**Example**:
```python
from semantica.visualization import KGVisualizer
visualizer = KGVisualizer()
visualizer.visualize(kg)
```

### 13. PIPELINE MODULE - Pipeline Orchestration
**Purpose**: Build and execute processing pipelines

**Components**:
- `PipelineBuilder`: Build complex pipelines
- `ExecutionEngine`: Execute pipelines
- `FailureHandler`: Handle pipeline failures
- `ParallelismManager`: Enable parallel processing
- `ResourceScheduler`: Schedule resources

**Example**:
```python
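from semantica.ingest import FileIngestor
from semantica.parse import DocumentParser
from semantica.pipeline import PipelineBuilder
builder = PipelineBuilder()
# Parentheses let the method chain span several lines without backslashes
pipeline = (
    builder.add_step("ingest", FileIngestor())
    .add_step("parse", DocumentParser())
    .build()
)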
```
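
The chained `add_step(...).build()` style relies on each `add_step` call returning the builder itself. The sketch below shows that pattern with two hypothetical classes, `MiniPipelineBuilder` and `MiniPipeline`, using plain string functions as steps. It illustrates the design only and says nothing about how `PipelineBuilder` or `ExecutionEngine` work internally.

```python
class MiniPipeline:
    """Runs each named step in order, feeding one step's output to the next."""

    def __init__(self, steps):
        self.steps = list(steps)

    def run(self, data):
        for _name, func in self.steps:
            data = func(data)
        return data


class MiniPipelineBuilder:
    """Collects steps; add_step returns self so calls can be chained."""

    def __init__(self):
        self._steps = []

    def add_step(self, name, func):
        self._steps.append((name, func))
        return self

    def build(self):
        return MiniPipeline(self._steps)


pipeline = (
    MiniPipelineBuilder()
    .add_step("strip", str.strip)
    .add_step("lower", str.lower)
    .add_step("tokenize", str.split)
    .build()
)
tokens = pipeline.run("  Knowledge Graphs  ")
print(tokens)
```

Keeping the builder separate from the built pipeline means the step list is fixed once `build()` is called, which makes a pipeline safe to reuse across inputs.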


