11 May 15:26

github-actions

2ef6e9f

v0.5.0 Latest

Latest

Semantica 0.5.0 — Distance Intelligence & Ontology Hub 🚀

Released: 2026-05-11

Highlights

Distance Intelligence — measure semantic distance between any two nodes across the graph, API, and Explorer
Ontology Hub — full workspace for browsing, loading, aligning, and validating ontologies in the Explorer
Parquet Ingest — native Parquet file ingestion support
KnowledgeGraph dataclass — first-class KnowledgeGraph type with native visualizer support
MCP Server — modular, pipx-installable MCP server package with 4 new plugin bundles

New Features

Distance Intelligence

Semantic distance between nodes across ContextGraph, REST API, and Knowledge Explorer — #502, #512 by @KaifAhmad1
Node distance semantics in PathResponse (step count, hop distance, relationship type) — #472, #477 by @KaifAhmad1
Bidirectional path finding via directed=false parameter — #469, #476 by @KaifAhmad1
Distance intelligence optimization and UI polish — #550 by @KaifAhmad1

Ontology Hub

Ontology Hub workspace in the Explorer: Registry, Loader, Entity Search, SKOS browser — #518, #521, #523 by @KaifAhmad1
Alignments panel, SHACL Studio, and ontology health dashboard — #524 by @KaifAhmad1 and @ZohaibHassan16

Knowledge Explorer UI

Full Explorer UI with graph, decision, provenance, temporal, and vocabulary workspaces — #453 by @KaifAhmad1 and @ZohaibHassan16
Redesigned landing page — #516 by @ZohaibHassan16
Grouped graph view — collapse related nodes into logical clusters — #493 by @ZohaibHassan16
Graph declutter mode and calm layout — #483 by @ZohaibHassan16
Stabilized local graph interaction — #487 by @Sameer6305
Welcome screen and SPA root handler — #501, #463 by @KaifAhmad1
Explorer visual refresh and graph element rendering system — #503 by @ZohaibHassan16
Indexed search for large graphs (bisect-based, thread-safe) — #481 by @ZohaibHassan16

Parquet Ingest

Native Parquet file ingestion (ParquetIngester) — #548 by @Luffy2208

KnowledgeGraph Dataclass

New KnowledgeGraph dataclass with native KGVisualizer support — #471, #474 by @KaifAhmad1
All visualize_* methods now accept KnowledgeGraph objects directly — #459 by @KaifAhmad1

MCP Server & Plugins

Modular MCP server package at repo root, installable via pipx — by @KaifAhmad1
New plugin bundles: Windsurf, Cline, Continue, VS Code — by @KaifAhmad1
OpenClaw integration module — #460 by @KaifAhmad1
Plugin local install fixed (hooks field, auto-load) — #489 by @serge

Deduplication

DuplicateDetector now supports max_results, top_k_per_entity, min_similarity, and sort_by — by @KaifAhmad1

Cookbooks

Datalog-style reasoning end-to-end notebook — #457 by @KaifAhmad1
Manual ontology + Snowflake mapping cookbook — by @KaifAhmad1

Bug Fixes 🐛

Windows install failure from gpu in [all] extra — #538 by @KaifAhmad1
UnicodeEncodeError on cp1252 / Windows consoles in progress tracker — #537 by @KaifAhmad1
Circular import in semantic_extract — #536 by @ZohaibHassan16
Lazy-load optional ingest backends — uninstalled extras no longer raise on import — #535 by @ZohaibHassan16
ConflictDetector.detect_conflicts duplicate method definition — #539 by @KaifAhmad1
DuplicateDetector merged group key normalization — #540 by @ZohaibHassan16
MCP server package structure for pipx installation — #544 by @KaifAhmad1
OWL/Turtle exporter silent data-property omission — #478 / #479 by @KaifAhmad1
Blazegraph literal serialization and prefixed datatype expansion — #450 by @KaifAhmad1
TripletStore IRI resolution against ontology namespace base URI — #447 / #451 by @KaifAhmad1
OWL generation — preserve user-facing schema fields, align IRIs with namespace — #449 by @KaifAhmad1
Provenance upstream ancestor traversal and direction classification — #480 by @Sameer6305
DeepSeek provider switched to OpenAI SDK client — #482 by @lingli
Distance intelligence visibility and slash-safe API calls — #513 / #515 by @ZohaibHassan16
Explorer graph motion and live layout restore — #486 by @KaifAhmad1
Polynomial ReDoS regex removed from format detection — #521 by @KaifAhmad1

Security 🔒

Patched 12 vulnerabilities (CRITICAL → LOW) across path traversal, injection, and unsafe deserialization — #452 by @KaifAhmad1
Updated mkdocs ≥1.6.1 — #507
Updated mkdocs-mermaid2-plugin — #508
Updated mkdocs-jupyter — #509
Updated mkdocs-material — #511
Updated pymdown-extensions — #510

Dependencies

Docker base image: python 3.12-slim → 3.14-slim — #466
Docker base image: node 20-alpine → 25-alpine — #465
pytest-benchmark ≥5.2.3 — #522

Contributors

@KaifAhmad1 — Distance intelligence, Ontology Hub, KnowledgeGraph dataclass, TripletStore, Blazegraph, MCP server, security, Windows fixes, Explorer welcome screen, OWL exporter, OpenClaw, MCP packaging, Datalog cookbook
@ZohaibHassan16 — Explorer UI overhaul, grouped view, graph declutter, indexed search, landing page, lazy ingest, circular import fix
@Sameer6305 — Provenance upstream traversal fix, local graph interaction
@lingli — DeepSeek provider fix
@Luffy2208 — Parquet ingest support
@serge — Plugin local install fix
@dependabot — Automated dependency and security updates

Breaking Changes

None.

Contributors

serge, lingli, and 5 other contributors

Assets 6

08 Apr 05:16

github-actions

v0.4.0

1d04005

v0.4.0

Semantica v0.4.0 — Release Notes

Released: 2026-04-08
PyPI: pip install semantica==0.4.0
Tag: v0.4.0
Full Changelog: CHANGELOG.md

v0.4.0 is the largest feature release to date. It ships a complete bi-temporal intelligence stack, a production-ready Knowledge Explorer API, first-class SHACL validation, SKOS vocabulary management, ontology alignment & diff, an Agno agentic framework integration, a Datalog reasoning engine, and a broad sweep of reliability, performance, and security fixes.

Test suite: 886 passed · 9 skipped · 0 failed

What's New

Temporal Intelligence

A full bi-temporal model is now baked into the core. Every entity, relationship, decision, and provenance record can carry valid time (when a fact was true in the world) and transaction time (when it was recorded in the system).

Core Temporal Data Model (PR #396)

New semantica.kg.temporal_model with shared parsing, normalization, and serialization helpers used across all temporal APIs
TemporalBound and BiTemporalFact exported from semantica.kg
valid, transaction, and both time-axis filtering in all temporal queries
TemporalValidationError raised consistently on invalid inputs — no silent coercions
History-preserving revisions in TemporalVersionManager.apply_revision() with supersession semantics

Temporal Query Engine: Point-in-Time Correctness (PR #397)

TemporalGraphQuery.reconstruct_at_time(graph, at_time) — builds a consistent point-in-time subgraph without mutating the source
query_at_time() uses reconstruction internally so returned subgraphs never contain dangling edges
TemporalConsistencyReport — detects inverted intervals, relationships outside entity lifetimes, missing endpoints, overlapping same-type relationships, and temporal gaps
validate_temporal_consistency(graph) available as a top-level module function
Sequence and cycle pattern detection with pattern_type, signature, frequency, and per-occurrence detail
Calendar-aligned temporal evolution bucketing via temporal_granularity
Causal ordering controls on find_temporal_paths() — enforce_causal_ordering, ordering_strategy (strict, overlap, loose)

Deterministic Temporal Reasoning Engine (PR #398)

New semantica.kg.temporal_reasoning — zero LLM calls, pure deterministic reasoning
Full Allen interval algebra via IntervalRelation — all 13 relations (before, meets, overlaps, starts, during, finishes, equals, and inverses)
TemporalReasoningEngine with helpers for interval merging, gap analysis, coverage calculation, timelines, and retroactive coverage
Circular import risk between semantica.reasoning and semantica.kg eliminated; semantica.reasoning access preserved via re-exports

Temporal Awareness in Context Graph (PR #399)

Decision dataclass carries valid_from / valid_until validity windows — superseded decisions remain in the graph (immutable history)
find_precedents_by_scenario(include_superseded=False, as_of=None) — defaults exclude expired decisions; as_of enables point-in-time queries
ContextGraph.state_at(timestamp) — serializable point-in-time snapshot; source graph never mutated
CausalChainAnalyzer.trace_at_time(event_id, at_time) — reconstructs causal chain using only edges recorded up to at_time
AgentContext.checkpoint(label), diff_checkpoints(label1, label2), flush_checkpoint(label) — named in-memory snapshots with structured diffs

Temporal Metadata Extraction from Text (PR #400)

extract_relations_llm(extract_temporal_bounds=True) — each returned Relation gains valid_from, valid_until, temporal_confidence (0.0–1.0), and temporal_source_text; default False is 100% backward-compatible
Calibrated confidence anchors baked into the prompt: 1.00 = full ISO date → 0.00 = no temporal signal
New TemporalNormalizer — zero LLM calls, pure regex + dateutil:
- normalize(value) → (valid_from, valid_until) UTC datetime tuple or None
- normalize_phrase(phrase) → domain metadata dict or None
- 13-domain default phrase map covering General/Policy, Healthcare, Cybersecurity, Supply Chain, Finance, and Energy
- Ambiguous DD/MM/YYYY inputs issue TemporalAmbiguityWarning — never silently guesses locale
- User-supplied phrase_map merged over defaults at construction

Temporal Provenance & Export (PR #401)

ProvenanceTracker.track_entity() auto-stamps recorded_at on every new record
query_recorded_between(start, end) — returns all provenance records within an inclusive time range
revision_history(fact_id) — complete revision chain ordered by recorded_at ascending
export_audit_log(fact_ids, format) — "json" (pretty-printed) or "csv" (with header row)
RDFExporter.export_to_rdf(include_temporal=True, time_axis="valid"|"transaction"|"both") — emits OWL-Time triples for all temporally-annotated relationships
create_snapshot() stamps "format_version": "1.0"; validate_snapshot() and migrate_snapshot() for stable snapshot lifecycle management

Temporal GraphRAG Integration (PR #402)

TemporalGraphRetriever — drop-in wrapper for any ContextRetriever; filters retrieved entities and relationships to a point in time; at_time=None is a true passthrough
ContextRetriever.query_with_reasoning(at_time=..., header_template=...) — structured temporal header prepended to LLM context; format-string injection guard via str.replace
TemporalQueryRewriter — extracts temporal_intent, at_time, start_time, end_time, and rewritten_query from natural language; regex-only by default, optional LLM-assisted mode

Ontology & Knowledge Representation

SHACL Shape Generation & Validation (PR #318)

SHACLGenerator derives SHACL node and property shapes from any Semantica ontology dict — zero hand-authoring required
Three quality tiers: "basic" (structure + cardinality), "standard" (adds sh:in, sh:pattern, inheritance), "strict" (adds sh:closed true + sh:ignoredProperties)
Output formats: Turtle, JSON-LD, N-Triples; iterative multi-level inheritance propagation, cycle-safe
OntologyEngine.to_shacl(), export_shacl(), and validate_graph(explain=True) — plain-English explanations for all 7 SHACL constraint types
SHACLValidationReport with conforms, violations, warnings, summary(), explain_violations(), to_dict()
Install: pip install semantica[shacl]

SKOS Vocabulary Module (PR #319)

TripletStore.add_skos_concept() — assembles and stores all required SKOS triples automatically via existing add_triplets() API
TripletStore.get_skos_concepts(scheme_uri=None) — SPARQL-backed retrieval with multi-value altLabel/broader/narrower collapsing
OntologyEngine.list_vocabularies(), list_concepts(scheme_uri), search_concepts(query, scheme_uri=None) — injection-safe SPARQL throughout
NamespaceManager.get_skos_uri(local_name) and build_concept_scheme_uri(name) namespace helpers

Ontology Alignment API (PR #361)

OntologyEngine.create_alignment(source_uri, target_uri, predicate) — stores triples using standard OWL/SKOS predicates (owl:equivalentClass, skos:exactMatch, skos:relatedMatch, etc.)
get_alignments(entity_uri) — bidirectional retrieval of all alignments for an entity
ReuseManager.suggest_alignments(target, source) — O(N+M) hashmap heuristic over exact label matches
QueryEngine.expand_entity_uri(uri, store, use_alignments=True) — SPARQL expansion to automatically include aligned equivalents in queries
SPARQL injection hardened in list_alignments and build_values_clause

Ontology Diff & Migration (PR #367)

VersionManager.diff_ontologies(base, target) — structured diff covering classes, properties, individuals, and axioms
ChangeLogAnalyzer.analyze(diff) — classifies impact: CRITICAL/BREAKING, HIGH/BREAKING, MEDIUM/POTENTIALLY_BREAKING, INFO/NON_BREAKING
ImpactReport and generate_change_report(diff) — structured output with summary, impact_classification, and recommendations
OntologyEngine.compare_versions(base_id, target_id, run_validation=True, graph_data=...) — end-to-end orchestrator with optional validation and graph-instance checks

Knowledge Explorer

A full FastAPI backend for the Semantica dashboard. Install with pip install semantica[explorer] and launch via semantica-explorer --graph my_graph.json.

Graph API (PR #384)

GET /api/graph/nodes|edges|stats — type/keyword filter, skip/limit pagination
GET /api/graph/node/{id}/neighbors — BFS traversal, configurable depth 1–5
GET /api/graph/node/{id}/path — BFS or Dijkstra, dispatched via algorithm param
POST /api/graph/search — full-text search across node content and metadata

Analytics, Decisions & Temporal (PR #384)

GET /api/analytics — centrality, community detection, connectivity (comma-separated metrics param)
GET /api/decisions/{id}/chain|precedents|compliance — causal chain BFS, ranked precedent retrieval, in-graph compliance edge scan
GET /api/temporal/snapshot|diff|patterns — point-in-time snapshots, node-set diffs between timestamps, pattern detection

Enrichment & Export (PR #384)

POST /api/enrich/extract|links|dedup|reason — NLP extraction, link prediction, deduplication, forward/backward inference
POST /api/export — 12 formats: JSON, Turtle, RDF-XML, N-Triples, CSV, GraphML, GEXF, OWL, Cypher, AQL, YAML; temp files always cleaned via try/finally
POST /api/import — JSON/JSON-LD multipart upload with WebSocket real-time progress events

SKOS Vocabulary REST API (PR #426)

`GE...

Contributors

Alex-yang00, KaifAhmad1, and 3 other contributors

Assets 4

10 Mar 22:07

github-actions

v0.3.0

43a8f82

v0.3.0

🧠 Semantica v0.3.0 — First Stable Release

Released: 2026-03-10 | PyPI: pip install semantica | Python: 3.8 – 3.12 | License: MIT

The first Production/Stable release of Semantica — an open-source framework for building context graphs and decision intelligence layers for AI agents. This release consolidates everything shipped across three stages: 0.3.0-alpha (2026-02-19), 0.3.0-beta (2026-03-07), and 0.3.0 stable (2026-03-10).

pip install --upgrade semantica

No breaking changes. All new parameters carry safe defaults. All new methods are purely additive.

🚦 Release Highlights

🕐 Temporal Validity — valid_from/valid_until on nodes & edges; query what's active at any point in time
🔗 Cross-Graph Navigation — link separate ContextGraph instances; navigate across them; survives save/load
⚖️ Weighted BFS Traversal — filter multi-hop queries by edge confidence with min_weight
🧠 Decision Intelligence — full lifecycle: record → causal chain → impact analysis → precedent search → policy enforcement
🔄 Delta Processing — SPARQL-based incremental graph diffs; only changed data flows through the pipeline
🗃️ Deduplication v2 — 6.98x faster semantic dedup, 63.6% faster candidate generation
📤 New Export Formats — ArangoDB AQL, Apache Parquet (Spark/BigQuery/Databricks ready)
🗄️ Graph Backends — Apache AGE, PgVector, AWS Neptune, FalkorDB
✅ 886+ tests passing — 0 failures

👥 Contributors

Contributor	Areas
@KaifAhmad1	Lead maintainer — context graph, decision intelligence, KG algorithms, semantic extraction, pipeline, provenance, bug fixes, release management
@ZohaibHassan16	Deduplication v2 suite, incremental/delta processing, benchmark suite
@Sameer6305	Apache AGE backend, PgVector store, Snowflake connector, Apache Arrow export
@tibisabau	ArangoDB AQL export, Apache Parquet export
@d4ndr4d3	ResourceScheduler deadlock fix

✨ v0.3.0 Stable — Context Graph Feature Completeness

Shipped 2026-03-10 · All changes by @KaifAhmad1

🕐 Temporal Validity Windows

Nodes and edges now carry first-class valid_from / valid_until ISO datetime fields — stored directly on the ContextNode and ContextEdge dataclasses, not buried in metadata.

New API:

add_node(valid_from=..., valid_until=...) and add_edge(valid_from=..., valid_until=...) — set validity window at creation
node.is_active(at_time=None) and edge.is_active(at_time=None) — returns True if live at the given time (defaults to now)
graph.find_active_nodes(node_type=None, at_time=None) — filters entire graph to active nodes only

Bug fixes:

is_active() crashed with TypeError on tz-aware datetime inputs — fixed by normalising to tz-naive UTC via new _parse_iso_dt() helper
Validity fields silently lost during serialisation — fixed across all four paths: add_nodes(), add_edges(), to_dict(), from_dict()

🔗 Cross-Graph Navigation

Separate ContextGraph instances can now be linked and navigated between. Links are fully durable — they survive save_to_file() / load_from_file() and reconnect via a registry.

New API:

graph.graph_id — stable UUID assigned at init; persisted to JSON
link_graph(other_graph, source_node_id, target_node_id) — creates a navigable bridge; returns link_id
navigate_to(link_id) — returns (other_graph, target_node_id)
resolve_links({graph_id: instance}) — reconnects links after load; returns count resolved
save_to_file() — now writes a links section alongside nodes and edges
load_from_file() — restores graph_id and populates _unresolved_links

Bug fix: Previous implementation auto-created marker targets as phantom "entity" nodes — fixed by pre-creating a "cross_graph_link" typed ContextNode before inserting the marker edge.

14 new tests in tests/context/test_cross_graph_navigation.py covering link creation, phantom-node prevention, partial registry resolution, and full save/load round-trips.

⚖️ Weighted Multi-Hop BFS Traversal

get_neighbors() now accepts a min_weight threshold to confine traversal to high-confidence causal links only. Default 0.0 passes all edges — fully backward-compatible.

🔧 Additional Fixes in v0.3.0 Stable

PipelineBuilder.add_step() return type annotation corrected from "PipelineBuilder" to "PipelineStep"
test_hybrid_search_performance fixed to accumulate a true search_times list; threshold relaxed to < 5.0s for real sentence-transformers latency

🔧 v0.3.0-beta — Semantic Extraction, Deduplication v2, New Export Formats

Shipped 2026-03-07

🧩 Semantic Extraction Fixes — @KaifAhmad1 (PR #354, #355)

LLM Relation Extraction:

Unmatched subjects/objects now produce a synthetic UNKNOWN entity instead of silently dropping the relation
Orphaned legacy block in _parse_relation_result that appended every relation twice has been removed
extraction_method parameter added — typed extraction paths now record "llm_typed" instead of "llm"

Reasoner Pattern Matching:

_match_pattern in reasoner.py fully rewritten — splits patterns on ?var placeholders, escapes only literal segments, uses backreferences for repeated variables and non-greedy .+? to prevent over-consumption

RDF Export Aliases:

RDFExporter now accepts "ttl", "nt", "xml", "rdf", and "json-ld" as format aliases — zero API changes

Tests added: tests/reasoning/test_reasoner.py (4 tests), tests/semantic_extract/test_relation_extractor.py (6 tests), tests/export/test_rdf_exporter.py (8 tests)

🔄 Incremental / Delta Processing — @ZohaibHassan16, @KaifAhmad1 (PR #349)

Native SPARQL-based diff between graph snapshots — only changed triples enter the pipeline
delta_mode flag in PipelineBuilder for near-real-time incremental workloads
Version snapshot management with graph URI tracking and per-snapshot metadata storage
prune_versions() for automatic retention cleanup of old snapshots

Bug fixes: corrected SPARQL variable order, fixed class references, resolved duplicate dictionary keys.

🗃️ Deduplication v2 Suite — @ZohaibHassan16, @KaifAhmad1 (PR #338, #339, #340, #344)

Three independently opt-in tiers — legacy mode remains the default, fully backward-compatible.

Candidate Generation v2 (PR #338):

New blocking_v2 and hybrid_v2 strategies replace O(N²) pair enumeration
Multi-key blocking with normalised token prefixes, type-aware keys, and optional phonetic (Soundex) matching
Deterministic max_candidates_per_entity budgeting with stable sorting
63.6% faster in worst-case scenarios (0.259s → 0.094s for 100 entities)

Two-Stage Scoring Prefilter (PR #339):

Fast gates for type mismatch, name-length ratio, and token overlap eliminate expensive semantic scoring for obvious non-matches
Configurable thresholds: min_length_ratio, min_token_overlap_ratio, required_shared_token
18–25% faster batch processing when enabled (prefilter_enabled=False by default)

Semantic Relationship Deduplication v2 (PR #340):

Canonicalisation engine with predicate synonym mapping (e.g. works_for → employed_by)
O(1) hash matching for exact canonical signatures before any semantic scoring
Weighted scoring: 60% predicate + 40% object with explainable semantic_match_score in metadata
6.98x faster than legacy mode (83ms vs 579ms)
dedup_triplets() infinite recursion bug fixed; promoted to first-class API in methods.py

Migration guide: MIGRATION_V2.md with complete examples for all v2 strategies (PR #344)

📤 New Export Formats — @tibisabau (PR #342, #343)

ArangoDB AQL Export (PR #342):

Full AQL INSERT statement generation for vertices and edges
Configurable collection names with validation and sanitisation; batch processing (default: 1000)
export_arango() convenience function; .aql auto-detection in the unified exporter
17 tests — 100% pass rate

Apache Parquet Export (PR #343):

Columnar storage with configurable compression: snappy, gzip, brotli, zstd, lz4, none
Explicit Apache Arrow schemas with type safety and field normalisation
Analytics-ready: pandas, Spark, Snowflake, BigQuery, Databricks
export_parquet() convenience function; .parquet auto-detection
25 tests — 100% pass rate

🐛 Beta Bug Fixes — @KaifAhmad1

Context module:

retrieve_decision_precedents — entity extraction correctly gated on use_hybrid_search=True
_extract_entities_from_query — switched to word[0].isupper() to capture camelCase identifiers like CreditCard
Added missing expand_context() (BFS traversal) and _get_decision_query() methods
Fixed hybrid_retrieval, dynamic_context_traversal, multi_hop_context_assembly for correct single-pass BFS
Fixed _retrieve_from_vector fallback to prevent empty content and negative similarity scores

KG module:

calculate_pagerank — added alpha/max_iter aliases; return format structured to {"centrality": scores, "rankings": sorted_list}
community_detector._to_networkx — fixed silent edge-loss when a NetworkX graph is passed directly
Added 9 domain-specific tracking methods to AlgorithmTrackerWithProvenance
Created provenance_tracker.py with ProvenanceTracker; correctly exported from semantica.kg

Pipeline module:

Retry loop fixed — now correctly iterate...

Contributors

d4ndr4d3, tibisabau, and 3 other contributors

Assets 4

07 Mar 11:28

github-actions

v0.3.0-beta

26b3b9b

v0.3.0-beta Pre-release

Pre-release

Semantica v0.3.0-beta — Release Notes

Date: 2026-03-07 | Tag: v0.3.0-beta | Status: Internal Beta (Pre-release)

Consolidates all alpha and unreleased features for internal validation ahead of the public 0.3.0 launch.

What's New

Semantic Extraction & Reasoning

Multi-Founder LLM Extraction Fix (#354) — Unmatched relation subjects/objects now produce synthetic UNKNOWN entities instead of being silently dropped; all LLM-returned co-founders preserved
Reasoner Pattern Matching Rewrite (#354) — _match_pattern correctly handles multi-word values, pre-bound variables, repeated variable backreferences, and non-greedy separators

Export

RDF / TTL Alias Fix (#355) — format="ttl", "nt", "xml", "rdf", "json-ld" all resolve without breaking existing callers
ArangoDB AQL Export (#342) — Full AQL INSERT generation for vertices and edges; configurable batching; 17 tests passing
Apache Parquet Export (#343) — Columnar storage with configurable compression (snappy, gzip, brotli, zstd, lz4); explicit Arrow schemas; 25 tests passing

Deduplication v2 (Epic #333)

Candidate Generation v2 (#338) — blocking_v2 / hybrid_v2 strategies with multi-key and phonetic blocking; 63.6% faster worst-case
Two-Stage Scoring Prefilter (#339) — Fast prefilter gates before expensive semantic scoring; 18–25% faster batch processing
Semantic Deduplication v2 (#340) — Opt-in semantic_v2 with canonicalization, O(1) hash matching, weighted scoring; 6.98x speedup; fixed infinite recursion bug
Migration Guide (#344) — MIGRATION_V2.md with full examples; 5.86x speedup confirmed; backward compatible

Incremental / Delta Processing

Delta Processing (#349) — Native SPARQL delta computation between graph snapshots; delta_mode pipeline config; prune_versions() for snapshot retention; production-ready for near real-time pipelines

Bug Fixes

NameError — missing Type import in utils/helpers.py; removed unused import from config_manager.py
Context module — fixed retrieve_decision_precedents, hybrid_retrieval, dynamic_context_traversal, multi_hop_context_assembly, _retrieve_from_vector, _extract_entities_from_query; added missing expand_context and _get_decision_query methods
Knowledge Graph module — fixed calculate_pagerank, community_detector._to_networkx, detect_communities, _build_adjacency; added ProvenanceTracker and 9 domain-specific tracking methods
Pipeline module — fixed retry loop in execution_engine; added RecoveryAction with LINEAR / EXPONENTIAL / FIXED backoff; fixed add_step return value; added validate alias
Test files — replaced emoji with ASCII for Windows cp1252 compatibility; fixed assertion ordering and loop bugs across 4 test files

Test Results

Passing	Skipped (external services)	Failed
~840	36	0

Contributors

@KaifAhmad1 · @ZohaibHassan16 · @tibisabau

Contributors

tibisabau, KaifAhmad1, and ZohaibHassan16

Assets 4

19 Feb 18:46

github-actions

v0.3.0-alpha

d5e2637

v0.3.0-alpha Pre-release

Pre-release

🎉 Semantica v0.3.0-alpha Release

This alpha release introduces comprehensive decision tracking capabilities, advanced knowledge graph algorithms, and production-ready architecture for testing.

🚀 Major Features

Decision Tracking System

Complete decision lifecycle management with audit trails
Provenance tracking and lineage management
Policy compliance and exception handling
Decision influence analysis and impact scoring

Advanced Knowledge Graph Algorithms

Node2Vec embeddings for semantic similarity
Centrality analysis (degree, betweenness, closeness, eigenvector)
Community detection and graph analytics
Path finding and link prediction

Enhanced Context Module

Unified AgentContext with granular feature flags
Decision tracking integration
Production-ready architecture with validation
GraphStore capability validation

Vector Store Features

Hybrid search combining semantic, structural, and category similarity
Advanced retrieval with configurable weights
FastEmbed integration for efficient operations

🧪 Testing & Quality

113+ tests passing across context and core modules
Comprehensive decision tracking test coverage
Enhanced error handling and edge case testing
Fixed all critical test failures for release readiness

📦 Installation

pip install semantica==0.3.0a0

Assets 4

09 Feb 07:26

github-actions

v0.2.7

affe3aa

Semantica 0.2.7

Overview

Release 0.2.7 adds Snowflake integration, Apache Arrow export, and benchmark suite.

🚀 New Features

Snowflake Connector for Data Ingestion

PR #276 by @Sameer6305

Native Snowflake connector with multi-authentication support (password, OAuth, key-pair, SSO). Includes table/query ingestion, schema introspection, and SQL injection prevention.

Tests: 24/24 passing
Dependency: db-snowflake optional

Apache Arrow Export Support

PR #273 by @Sameer6305

High-performance columnar export with explicit schemas, compression, and Pandas/DuckDB compatibility.

Tests: 20/20 passing
Dependency: db-arrow optional

Comprehensive Benchmark Suite

PR #289 by @ZohaibHassan16, @KaifAhmad1

137+ benchmarks across all modules with regression detection and CI/CD integration.

Features: Statistical analysis, environment-agnostic design, CLI tool

📊 Quality Assurance

Total Tests: 44/44 passing
Breaking Changes: None
Backward Compatible: Yes

🛠 Installation

pip install semantica==0.2.7
pip install semantica[db-snowflake,db-arrow]==0.2.7

🙏 Contributors

@Sameer6305: Snowflake Connector, Arrow Export
@ZohaibHassan16: Benchmark Suite implementation
@KaifAhmad1: Benchmark enhancements, CI/CD integration

🔗 Links

GitHub: https://github.com/Hawksight-AI/semantica
PyPI: https://pypi.org/project/semantica/
Benchmarks: python benchmarks/benchmark_runner.py

📈 Performance

Text Processing: >10,000 ops/sec
Arrow Export: 10x faster
Benchmark Coverage: 137+ tests

Thanks to all contributors for making this release possible!

Contributors

KaifAhmad1, ZohaibHassan16, and Sameer6305

Assets 4

03 Feb 05:10

github-actions

v0.2.6

a4ab3fd

Semantica v0.2.6

Release Date: February 3, 2026

We're excited to announce Semantica v0.2.6, featuring major enhancements in provenance tracking, change management, and several important bug fixes!

🎉 Highlights

Major Features

W3C PROV-O Compliant Provenance Tracking - Enterprise-grade lineage tracking across all 17 modules
Enhanced Change Management - Version control for knowledge graphs and ontologies
CSV Ingestion Improvements - Auto-detection and robust error handling
Comprehensive Test Coverage - 80-86% coverage for ingestion modules

Bug Fixes

Temperature compatibility for LLM providers
JenaStore empty graph initialization

✨ New Features & Enhancements

W3C PROV-O Compliant Provenance Tracking

PRs: #254, #246 | Contributor: @KaifAhmad1

A comprehensive provenance tracking system with W3C PROV-O compliance across all 17 Semantica modules.

Core Module:

ProvenanceManager for centralized tracking
W3C PROV-O schemas (Activity, Entity, Agent)
Storage backends: InMemory and SQLite
SHA-256 integrity verification

Module Integrations:

Semantic Extract, LLMs (Groq, OpenAI, HuggingFace, LiteLLM)
Pipeline, Context, Ingest, Embeddings
Graph/Vector/Triplet stores
Reasoning, Conflicts, Deduplication
Export, Parse, Normalize, Ontology, Visualization

Features:

Complete lineage tracking: Document → Chunk → Entity → Relationship → Graph
LLM tracking: tokens, costs, latency
Source tracking and bridge axioms for domain transformations

Compliance:

W3C PROV-O, FDA 21 CFR Part 11, SOX, HIPAA, TNFD

Testing:

237 tests covering core functionality, all 17 module integrations, edge cases, backward compatibility

Design:

Opt-in with provenance=False by default
Zero breaking changes
No new dependencies

Enhanced Change Management Module

PRs: #248, #243 | Contributor: @KaifAhmad1

Enterprise-grade version control for knowledge graphs and ontologies with persistent storage and audit trails.

Core Classes:

TemporalVersionManager - Knowledge graph versioning
OntologyVersionManager - Ontology versioning
ChangeLogEntry - Change metadata tracking

Storage:

SQLite (persistent) and in-memory backends
Thread-safe operations

Features:

SHA-256 checksums for integrity
Detailed entity/relationship diffs
Structural ontology comparison
Email validation

Compliance:

HIPAA, SOX, FDA 21 CFR Part 11
Immutable audit trails

Testing:

104 tests (100% pass)
Unit, integration, compliance, performance, edge cases

Performance:

17.6ms for 10k entities
510+ ops/sec concurrent
Handles 5k+ entity graphs

Migration:

Backward compatible
Simplified class names
Zero external dependencies

CSV Ingestion Enhancements

PR: #244 | Contributor: @saloni0318

Robust CSV parsing with auto-detection and error handling.

Features:

Auto-detect CSV encoding using chardet
Auto-detect delimiter using csv.Sniffer
Tolerant decoding and malformed-row handling (on_bad_lines='warn')
Optional chunked reading for large files
Metadata tracks detected values

Testing:

Expanded unit tests covering:
- Multiple delimiters
- Quoted/multiline fields
- Header overrides
- Chunked reading
- NaN preservation

Comprehensive Test Coverage

TextNormalizer Tests

PR: #242 | Contributor: @ZohaibHassan16

Added focused test coverage for TextNormalizer behavior across various inputs.

Integration Test Improvements

PR: #241 | Contributor: @KaifAhmad1

Introduced integration test marker
Reduced noisy warnings in ingest tests

Ingest Unit Tests

PRs: #239, #232 | Contributor: @Mohammed2372

Comprehensive unit tests for ingestion modules (file, web, and feed ingestors).

Coverage:

File scanning: local/cloud (S3/GCS/Azure)
Web ingestion: URL/sitemap/robots.txt
RSS/Atom feed parsing

Testing:

998 lines of test code
Mocked external dependencies for fast, isolated execution

Results:

file_ingestor: 86% coverage
web_ingestor: 86% coverage
feed_ingestor: 80% coverage

Covers happy paths, edge cases, and error handling.

🐛 Bug Fixes

Temperature Compatibility Fix

PRs: #256, #252 | Contributors: @F0rt1s, @IGES-Institut

Fixed hardcoded temperature=0.3 that broke compatibility with models requiring specific temperature values (e.g., gpt-5-mini).

Changes:

Added _add_if_set helper method to BaseProvider
Only passes parameters when explicitly set
When temperature=None, parameter is omitted allowing APIs to use model defaults
Updated all 5 providers: OpenAI, Groq, Gemini, Ollama, DeepSeek

Impact:

Reduced code by ~85 lines with cleaner parameter handling
Comprehensive test coverage added (10 temperature tests, all passing)
Backward compatible - no breaking changes

JenaStore Empty Graph Bug

PRs: #257, #258 | Contributor: @ZohaibHassan16

Fixed ProcessingError: Graph not initialized when operating on empty (but initialized) graphs.

Changes:

Replaced implicit if not self.graph: checks with explicit if self.graph is None: validation
Updated 5 methods: add_triplets, get_triplets, delete_triplet, execute_sparql, serialize
Properly distinguishes None (uninitialized) from empty graphs (initialized with 0 triplets)

Impact:

Unblocks benchmarking suite
Enables fresh deployments
Improves testing workflows

📦 Installation

pip install semantica==0.2.6

Or upgrade from a previous version:

pip install --upgrade semantica

🙏 Contributors

Special thanks to all contributors who made this release possible:

@KaifAhmad1 - Provenance tracking, change management, test improvements
@saloni0318 - CSV ingestion enhancements
@ZohaibHassan16 - TextNormalizer tests, JenaStore bug fix
@Mohammed2372 - Comprehensive ingest unit tests
@F0rt1s - Temperature compatibility fix
@IGES-Institut - Temperature compatibility fix

📚 Documentation

Documentation: https://semantica.readthedocs.io
GitHub: https://github.com/Hawksight-AI/semantica
PyPI: https://pypi.org/project/semantica/

🔗 Links

🚀 What's Next?

Stay tuned for upcoming features in future releases. Check our GitHub Issues to see what we're working on!

Full Changelog: v0.2.5...v0.2.6

Contributors

F0rt1s, Mohammed2372, and 4 other contributors

Assets 4

27 Jan 16:26

github-actions

v0.2.5

3968a45

Deep Extraction, BYOM & Pinecone Support (v0.2.5)

Semantica v0.2.5

🚀 Release Highlights

This release brings native Pinecone Vector Store support, configurable LLM retry logic, and major enhancements to the Semantic Extraction module, including robust support for custom Hugging Face models (BYOM), improved NER/Relation extraction, and completed Triplet extraction logic.

🌟 New Features

Pinecone Vector Store Support

Implemented native PineconeStore with full CRUD capabilities.
Support for serverless and pod-based indexes, namespaces, and metadata filtering.
Fully integrated with the unified VectorStore interface and registry.
(Closes #219, Resolves #220)

Configurable LLM Retry Logic

Exposed max_retries parameter in NERExtractor, RelationExtractor, and TripletExtractor.
Defaults to 3 retries to handle JSON validation failures or API timeouts gracefully.
Propagated retry configuration through chunked processing helpers for consistent long-document handling.

Bring Your Own Model (BYOM) Support

Custom Hugging Face Models: Enabled full support for custom models in NERExtractor, RelationExtractor, and TripletExtractor.
Custom Tokenizers: Added support for models with non-standard tokenization requirements.
Runtime Overrides: extract(model=...) now correctly overrides configuration defaults.

Enhanced Extraction Capabilities

NER: Added configurable aggregation strategies (simple, first, average, max) and robust IOB/BILOU parsing.
Relation Extraction: Implemented standard entity marker techniques (<subj>, <obj>) and structured output parsing.
Triplet Extraction: Added specialized parsing for Seq2Seq models (e.g., REBEL) to generate structured triplets directly from text.

🐛 Bug Fixes

LLM Extraction Stability: Fixed infinite retry loops by strictly enforcing max_retries limits.
Model Parameter Precedence: Resolved issues where config defaults overrode runtime arguments.
Import Handling: Fixed circular import issues in test suites via improved mocking strategies.

📦 Installation

pip install semantica==0.2.5

Assets 6

22 Jan 07:20

github-actions

v0.2.4

b382a7d

Semantica v0.2.4

Added

Ontology Ingestion Module:
- Implemented OntologyIngestor for parsing RDF/OWL files (Turtle, RDF/XML, JSON-LD, N3).
- Added ingest_ontology and unified ingest(source_type="ontology") interface.
- Added recursive directory scanning for batch ontology ingestion.
- Added OntologyData dataclass for consistent metadata.
Documentation:
- Updated ontology_usage.md and ontology.md with usage examples and API details.
Tests:
- Added comprehensive test suite tests/ingest/test_ontology_ingestor.py.
- Added examples/demo_ontology_ingest.py for end-to-end demonstration.

Assets 4

20 Jan 06:39

github-actions

v0.2.3

fa8544c

Semantica v0.2.3

We are excited to announce Semantica v0.2.3! This release focuses on stability, performance, and developer experience improvements, including critical fixes for LLM relation extraction, high-performance vector store ingestion, and resolved circular dependencies.

🚀 Added

Vector Store High-Performance Ingestion

New add_documents API: Added high-throughput ingestion with automatic embedding generation, batching, and parallel processing.
embed_batch Helper: Efficiently generate embeddings for lists of texts without immediate storage.
Parallel Defaults: Enabled default parallel ingestion in VectorStore (default: max_workers=6) for faster processing.
Documentation: Added dedicated guide docs/vector_store_usage.md for high-performance configuration.
Tests: Added tests/vector_store/test_vector_store_parallel.py covering parallel vs. sequential performance and edge cases.

Amazon Neptune Dev Environment

CloudFormation Template: Added cookbook/introduction/neptune-setup.yaml to provision a development Neptune cluster with public endpoints and IAM auth.
Documentation: Updated cookbook/introduction/21_Amazon_Neptune_Store.ipynb with deployment guides, cost estimates, and IAM best practices.
Linting: Added cfn-lint to pre-commit hooks for CloudFormation validation.

Comprehensive Test Suite

Unit Tests: Added tests/test_relations_llm.py covering typed and structured response paths for relation extraction.
Integration Tests: Added tests/integration/test_relations_groq.py for real Groq API validation.

🐛 Fixed

LLM Relation Extraction Parsing

Zero Relations Fix: Resolved issue where relation extraction returned zero results despite successful API calls.
Response Normalization: Normalized typed responses from Instructor/OpenAI/Groq to a consistent dictionary format.
JSON Fallback: Added structured JSON fallback when typed generation yields empty results.
Parameter Cleanup: Removed unsupported kwargs (max_tokens, max_entities_prompt) from internal calls to prevent API errors.

Pipeline Circular Import

Resolved Import Cycles: Fixed circular dependency between pipeline_builder and pipeline_validator (Issues #192, #193).
Lazy Loading: Implemented lazy loading for PipelineValidator to ensure stable imports.

JupyterLab Stability

Progress Output Control: Added SEMANTICA_DISABLE_JUPYTER_PROGRESS environment variable.
Memory Fix: Fallback to console-style output when enabled to prevent JupyterLab out-of-memory errors from infinite scrolling tables (Issue #181).

⚡ Changed

Relation Extraction API

Simplified Interface: Removed unused kwargs to prevent parameter leakage.
Better Debugging: Improved error handling and verbose logging for extraction workflows.
Robust Parsing: Enhanced post-response parsing stability across different LLM providers.

Vector Store Defaults

Standardized Concurrency: Set default max_workers=6 for VectorStore parallel ingestion.
Simplified Usage: Updated documentation to rely on smart defaults rather than manual configuration.

Assets 4

Uh oh!

Uh oh!

Releases: semantica-agi/semantica

v0.5.0

Semantica 0.5.0 — Distance Intelligence & Ontology Hub 🚀

Highlights

New Features

Distance Intelligence

Ontology Hub

Knowledge Explorer UI

Parquet Ingest

KnowledgeGraph Dataclass

MCP Server & Plugins

Deduplication

Cookbooks

Bug Fixes 🐛

Security 🔒

Dependencies

Contributors

Breaking Changes

Contributors

Uh oh!

v0.4.0

Semantica v0.4.0 — Release Notes

What's New

Temporal Intelligence

Ontology & Knowledge Representation

Knowledge Explorer

Contributors

Uh oh!

v0.3.0

🧠 Semantica v0.3.0 — First Stable Release

🚦 Release Highlights

👥 Contributors

✨ v0.3.0 Stable — Context Graph Feature Completeness

🕐 Temporal Validity Windows

🔗 Cross-Graph Navigation

⚖️ Weighted Multi-Hop BFS Traversal

🔧 Additional Fixes in v0.3.0 Stable

🔧 v0.3.0-beta — Semantic Extraction, Deduplication v2, New Export Formats

🧩 Semantic Extraction Fixes — @KaifAhmad1 (PR #354, #355)

🔄 Incremental / Delta Processing — @ZohaibHassan16, @KaifAhmad1 (PR #349)

🗃️ Deduplication v2 Suite — @ZohaibHassan16, @KaifAhmad1 (PR #338, #339, #340, #344)

📤 New Export Formats — @tibisabau (PR #342, #343)

🐛 Beta Bug Fixes — @KaifAhmad1

Contributors

Uh oh!

v0.3.0-beta

Semantica v0.3.0-beta — Release Notes

What's New

Semantic Extraction & Reasoning

Export

Deduplication v2 (Epic #333)

Incremental / Delta Processing

Bug Fixes

Test Results

Contributors

Contributors

Uh oh!

v0.3.0-alpha

🎉 Semantica v0.3.0-alpha Release

🚀 Major Features

Decision Tracking System

Advanced Knowledge Graph Algorithms

Enhanced Context Module

Vector Store Features

🧪 Testing & Quality

📦 Installation

Uh oh!

Semantica 0.2.7

Overview

🚀 New Features

Snowflake Connector for Data Ingestion

Apache Arrow Export Support

Comprehensive Benchmark Suite

📊 Quality Assurance

🛠 Installation

🙏 Contributors

🔗 Links

📈 Performance