Releases: semantica-agi/semantica
v0.5.0
Semantica 0.5.0 β Distance Intelligence & Ontology Hub π
Released: 2026-05-11
Highlights
- Distance Intelligence β measure semantic distance between any two nodes across the graph, API, and Explorer
- Ontology Hub β full workspace for browsing, loading, aligning, and validating ontologies in the Explorer
- Parquet Ingest β native Parquet file ingestion support
- KnowledgeGraph dataclass β first-class
KnowledgeGraphtype with native visualizer support - MCP Server β modular,
pipx-installable MCP server package with 4 new plugin bundles
New Features
Distance Intelligence
- Semantic distance between nodes across
ContextGraph, REST API, and Knowledge Explorer β #502, #512 by @KaifAhmad1 - Node distance semantics in
PathResponse(step count, hop distance, relationship type) β #472, #477 by @KaifAhmad1 - Bidirectional path finding via
directed=falseparameter β #469, #476 by @KaifAhmad1 - Distance intelligence optimization and UI polish β #550 by @KaifAhmad1
Ontology Hub
- Ontology Hub workspace in the Explorer: Registry, Loader, Entity Search, SKOS browser β #518, #521, #523 by @KaifAhmad1
- Alignments panel, SHACL Studio, and ontology health dashboard β #524 by @KaifAhmad1 and @ZohaibHassan16
Knowledge Explorer UI
- Full Explorer UI with graph, decision, provenance, temporal, and vocabulary workspaces β #453 by @KaifAhmad1 and @ZohaibHassan16
- Redesigned landing page β #516 by @ZohaibHassan16
- Grouped graph view β collapse related nodes into logical clusters β #493 by @ZohaibHassan16
- Graph declutter mode and calm layout β #483 by @ZohaibHassan16
- Stabilized local graph interaction β #487 by @Sameer6305
- Welcome screen and SPA root handler β #501, #463 by @KaifAhmad1
- Explorer visual refresh and graph element rendering system β #503 by @ZohaibHassan16
- Indexed search for large graphs (bisect-based, thread-safe) β #481 by @ZohaibHassan16
Parquet Ingest
- Native Parquet file ingestion (
ParquetIngester) β #548 by @Luffy2208
KnowledgeGraph Dataclass
- New
KnowledgeGraphdataclass with nativeKGVisualizersupport β #471, #474 by @KaifAhmad1 - All
visualize_*methods now acceptKnowledgeGraphobjects directly β #459 by @KaifAhmad1
MCP Server & Plugins
- Modular MCP server package at repo root, installable via
pipxβ by @KaifAhmad1 - New plugin bundles: Windsurf, Cline, Continue, VS Code β by @KaifAhmad1
- OpenClaw integration module β #460 by @KaifAhmad1
- Plugin local install fixed (hooks field, auto-load) β #489 by @serge
Deduplication
DuplicateDetectornow supportsmax_results,top_k_per_entity,min_similarity, andsort_byβ by @KaifAhmad1
Cookbooks
- Datalog-style reasoning end-to-end notebook β #457 by @KaifAhmad1
- Manual ontology + Snowflake mapping cookbook β by @KaifAhmad1
Bug Fixes π
- Windows install failure from
gpuin[all]extra β #538 by @KaifAhmad1 UnicodeEncodeErroron cp1252 / Windows consoles in progress tracker β #537 by @KaifAhmad1- Circular import in
semantic_extractβ #536 by @ZohaibHassan16 - Lazy-load optional ingest backends β uninstalled extras no longer raise on import β #535 by @ZohaibHassan16
ConflictDetector.detect_conflictsduplicate method definition β #539 by @KaifAhmad1DuplicateDetectormerged group key normalization β #540 by @ZohaibHassan16- MCP server package structure for
pipxinstallation β #544 by @KaifAhmad1 - OWL/Turtle exporter silent data-property omission β #478 / #479 by @KaifAhmad1
- Blazegraph literal serialization and prefixed datatype expansion β #450 by @KaifAhmad1
- TripletStore IRI resolution against ontology namespace base URI β #447 / #451 by @KaifAhmad1
- OWL generation β preserve user-facing schema fields, align IRIs with namespace β #449 by @KaifAhmad1
- Provenance upstream ancestor traversal and direction classification β #480 by @Sameer6305
- DeepSeek provider switched to OpenAI SDK client β #482 by @lingli
- Distance intelligence visibility and slash-safe API calls β #513 / #515 by @ZohaibHassan16
- Explorer graph motion and live layout restore β #486 by @KaifAhmad1
- Polynomial ReDoS regex removed from format detection β #521 by @KaifAhmad1
Security π
- Patched 12 vulnerabilities (CRITICAL β LOW) across path traversal, injection, and unsafe deserialization β #452 by @KaifAhmad1
- Updated
mkdocsβ₯1.6.1 β #507 - Updated
mkdocs-mermaid2-pluginβ #508 - Updated
mkdocs-jupyterβ #509 - Updated
mkdocs-materialβ #511 - Updated
pymdown-extensionsβ #510
Dependencies
- Docker base image:
python 3.12-slimβ3.14-slimβ #466 - Docker base image:
node 20-alpineβ25-alpineβ #465 pytest-benchmarkβ₯5.2.3 β #522
Contributors
- @KaifAhmad1 β Distance intelligence, Ontology Hub, KnowledgeGraph dataclass, TripletStore, Blazegraph, MCP server, security, Windows fixes, Explorer welcome screen, OWL exporter, OpenClaw, MCP packaging, Datalog cookbook
- @ZohaibHassan16 β Explorer UI overhaul, grouped view, graph declutter, indexed search, landing page, lazy ingest, circular import fix
- @Sameer6305 β Provenance upstream traversal fix, local graph interaction
- @lingli β DeepSeek provider fix
- @Luffy2208 β Parquet ingest support
- @serge β Plugin local install fix
- @dependabot β Automated dependency and security updates
Breaking Changes
None.
v0.4.0
Semantica v0.4.0 β Release Notes
Released: 2026-04-08
PyPI: pip install semantica==0.4.0
Tag: v0.4.0
Full Changelog: CHANGELOG.md
v0.4.0 is the largest feature release to date. It ships a complete bi-temporal intelligence stack, a production-ready Knowledge Explorer API, first-class SHACL validation, SKOS vocabulary management, ontology alignment & diff, an Agno agentic framework integration, a Datalog reasoning engine, and a broad sweep of reliability, performance, and security fixes.
Test suite: 886 passed Β· 9 skipped Β· 0 failed
What's New
Temporal Intelligence
A full bi-temporal model is now baked into the core. Every entity, relationship, decision, and provenance record can carry valid time (when a fact was true in the world) and transaction time (when it was recorded in the system).
Core Temporal Data Model (PR #396)
- New
semantica.kg.temporal_modelwith shared parsing, normalization, and serialization helpers used across all temporal APIs TemporalBoundandBiTemporalFactexported fromsemantica.kgvalid,transaction, andbothtime-axis filtering in all temporal queriesTemporalValidationErrorraised consistently on invalid inputs β no silent coercions- History-preserving revisions in
TemporalVersionManager.apply_revision()with supersession semantics
Temporal Query Engine: Point-in-Time Correctness (PR #397)
TemporalGraphQuery.reconstruct_at_time(graph, at_time)β builds a consistent point-in-time subgraph without mutating the sourcequery_at_time()uses reconstruction internally so returned subgraphs never contain dangling edgesTemporalConsistencyReportβ detects inverted intervals, relationships outside entity lifetimes, missing endpoints, overlapping same-type relationships, and temporal gapsvalidate_temporal_consistency(graph)available as a top-level module function- Sequence and cycle pattern detection with
pattern_type,signature,frequency, and per-occurrence detail - Calendar-aligned temporal evolution bucketing via
temporal_granularity - Causal ordering controls on
find_temporal_paths()βenforce_causal_ordering,ordering_strategy(strict,overlap,loose)
Deterministic Temporal Reasoning Engine (PR #398)
- New
semantica.kg.temporal_reasoningβ zero LLM calls, pure deterministic reasoning - Full Allen interval algebra via
IntervalRelationβ all 13 relations (before,meets,overlaps,starts,during,finishes,equals, and inverses) TemporalReasoningEnginewith helpers for interval merging, gap analysis, coverage calculation, timelines, and retroactive coverage- Circular import risk between
semantica.reasoningandsemantica.kgeliminated;semantica.reasoningaccess preserved via re-exports
Temporal Awareness in Context Graph (PR #399)
Decisiondataclass carriesvalid_from/valid_untilvalidity windows β superseded decisions remain in the graph (immutable history)find_precedents_by_scenario(include_superseded=False, as_of=None)β defaults exclude expired decisions;as_ofenables point-in-time queriesContextGraph.state_at(timestamp)β serializable point-in-time snapshot; source graph never mutatedCausalChainAnalyzer.trace_at_time(event_id, at_time)β reconstructs causal chain using only edges recorded up toat_timeAgentContext.checkpoint(label),diff_checkpoints(label1, label2),flush_checkpoint(label)β named in-memory snapshots with structured diffs
Temporal Metadata Extraction from Text (PR #400)
extract_relations_llm(extract_temporal_bounds=True)β each returnedRelationgainsvalid_from,valid_until,temporal_confidence(0.0β1.0), andtemporal_source_text; defaultFalseis 100% backward-compatible- Calibrated confidence anchors baked into the prompt:
1.00= full ISO date β0.00= no temporal signal - New
TemporalNormalizerβ zero LLM calls, pure regex +dateutil:normalize(value)β(valid_from, valid_until)UTC datetime tuple orNonenormalize_phrase(phrase)β domain metadata dict orNone- 13-domain default phrase map covering General/Policy, Healthcare, Cybersecurity, Supply Chain, Finance, and Energy
- Ambiguous
DD/MM/YYYYinputs issueTemporalAmbiguityWarningβ never silently guesses locale - User-supplied
phrase_mapmerged over defaults at construction
Temporal Provenance & Export (PR #401)
ProvenanceTracker.track_entity()auto-stampsrecorded_aton every new recordquery_recorded_between(start, end)β returns all provenance records within an inclusive time rangerevision_history(fact_id)β complete revision chain ordered byrecorded_atascendingexport_audit_log(fact_ids, format)β"json"(pretty-printed) or"csv"(with header row)RDFExporter.export_to_rdf(include_temporal=True, time_axis="valid"|"transaction"|"both")β emits OWL-Time triples for all temporally-annotated relationshipscreate_snapshot()stamps"format_version": "1.0";validate_snapshot()andmigrate_snapshot()for stable snapshot lifecycle management
Temporal GraphRAG Integration (PR #402)
TemporalGraphRetrieverβ drop-in wrapper for anyContextRetriever; filters retrieved entities and relationships to a point in time;at_time=Noneis a true passthroughContextRetriever.query_with_reasoning(at_time=..., header_template=...)β structured temporal header prepended to LLM context; format-string injection guard viastr.replaceTemporalQueryRewriterβ extractstemporal_intent,at_time,start_time,end_time, andrewritten_queryfrom natural language; regex-only by default, optional LLM-assisted mode
Ontology & Knowledge Representation
SHACL Shape Generation & Validation (PR #318)
SHACLGeneratorderives SHACL node and property shapes from any Semantica ontology dict β zero hand-authoring required- Three quality tiers:
"basic"(structure + cardinality),"standard"(addssh:in,sh:pattern, inheritance),"strict"(addssh:closed true+sh:ignoredProperties) - Output formats: Turtle, JSON-LD, N-Triples; iterative multi-level inheritance propagation, cycle-safe
OntologyEngine.to_shacl(),export_shacl(), andvalidate_graph(explain=True)β plain-English explanations for all 7 SHACL constraint typesSHACLValidationReportwithconforms,violations,warnings,summary(),explain_violations(),to_dict()- Install:
pip install semantica[shacl]
SKOS Vocabulary Module (PR #319)
TripletStore.add_skos_concept()β assembles and stores all required SKOS triples automatically via existingadd_triplets()APITripletStore.get_skos_concepts(scheme_uri=None)β SPARQL-backed retrieval with multi-valuealtLabel/broader/narrowercollapsingOntologyEngine.list_vocabularies(),list_concepts(scheme_uri),search_concepts(query, scheme_uri=None)β injection-safe SPARQL throughoutNamespaceManager.get_skos_uri(local_name)andbuild_concept_scheme_uri(name)namespace helpers
Ontology Alignment API (PR #361)
OntologyEngine.create_alignment(source_uri, target_uri, predicate)β stores triples using standard OWL/SKOS predicates (owl:equivalentClass,skos:exactMatch,skos:relatedMatch, etc.)get_alignments(entity_uri)β bidirectional retrieval of all alignments for an entityReuseManager.suggest_alignments(target, source)β O(N+M) hashmap heuristic over exact label matchesQueryEngine.expand_entity_uri(uri, store, use_alignments=True)β SPARQL expansion to automatically include aligned equivalents in queries- SPARQL injection hardened in
list_alignmentsandbuild_values_clause
Ontology Diff & Migration (PR #367)
VersionManager.diff_ontologies(base, target)β structured diff covering classes, properties, individuals, and axiomsChangeLogAnalyzer.analyze(diff)β classifies impact:CRITICAL/BREAKING,HIGH/BREAKING,MEDIUM/POTENTIALLY_BREAKING,INFO/NON_BREAKINGImpactReportandgenerate_change_report(diff)β structured output withsummary,impact_classification, andrecommendationsOntologyEngine.compare_versions(base_id, target_id, run_validation=True, graph_data=...)β end-to-end orchestrator with optional validation and graph-instance checks
Knowledge Explorer
A full FastAPI backend for the Semantica dashboard. Install with pip install semantica[explorer] and launch via semantica-explorer --graph my_graph.json.
Graph API (PR #384)
GET /api/graph/nodes|edges|statsβ type/keyword filter, skip/limit paginationGET /api/graph/node/{id}/neighborsβ BFS traversal, configurable depth 1β5GET /api/graph/node/{id}/pathβ BFS or Dijkstra, dispatched viaalgorithmparamPOST /api/graph/searchβ full-text search across node content and metadata
Analytics, Decisions & Temporal (PR #384)
GET /api/analyticsβ centrality, community detection, connectivity (comma-separatedmetricsparam)GET /api/decisions/{id}/chain|precedents|complianceβ causal chain BFS, ranked precedent retrieval, in-graph compliance edge scanGET /api/temporal/snapshot|diff|patternsβ point-in-time snapshots, node-set diffs between timestamps, pattern detection
Enrichment & Export (PR #384)
POST /api/enrich/extract|links|dedup|reasonβ NLP extraction, link prediction, deduplication, forward/backward inferencePOST /api/exportβ 12 formats: JSON, Turtle, RDF-XML, N-Triples, CSV, GraphML, GEXF, OWL, Cypher, AQL, YAML; temp files always cleaned viatry/finallyPOST /api/importβ JSON/JSON-LD multipart upload with WebSocket real-time progress events
SKOS Vocabulary REST API (PR #426)
- `GE...
v0.3.0
π§ Semantica v0.3.0 β First Stable Release
Released: 2026-03-10 Β |Β PyPI: pip install semantica Β |Β Python: 3.8 β 3.12 Β |Β License: MIT
The first
Production/Stablerelease of Semantica β an open-source framework for building context graphs and decision intelligence layers for AI agents. This release consolidates everything shipped across three stages: 0.3.0-alpha (2026-02-19), 0.3.0-beta (2026-03-07), and 0.3.0 stable (2026-03-10).
pip install --upgrade semanticaNo breaking changes. All new parameters carry safe defaults. All new methods are purely additive.
π¦ Release Highlights
- π Temporal Validity β
valid_from/valid_untilon nodes & edges; query what's active at any point in time - π Cross-Graph Navigation β link separate
ContextGraphinstances; navigate across them; survives save/load - βοΈ Weighted BFS Traversal β filter multi-hop queries by edge confidence with
min_weight - π§ Decision Intelligence β full lifecycle: record β causal chain β impact analysis β precedent search β policy enforcement
- π Delta Processing β SPARQL-based incremental graph diffs; only changed data flows through the pipeline
- ποΈ Deduplication v2 β 6.98x faster semantic dedup, 63.6% faster candidate generation
- π€ New Export Formats β ArangoDB AQL, Apache Parquet (Spark/BigQuery/Databricks ready)
- ποΈ Graph Backends β Apache AGE, PgVector, AWS Neptune, FalkorDB
- β 886+ tests passing β 0 failures
π₯ Contributors
| Contributor | Areas |
|---|---|
| @KaifAhmad1 | Lead maintainer β context graph, decision intelligence, KG algorithms, semantic extraction, pipeline, provenance, bug fixes, release management |
| @ZohaibHassan16 | Deduplication v2 suite, incremental/delta processing, benchmark suite |
| @Sameer6305 | Apache AGE backend, PgVector store, Snowflake connector, Apache Arrow export |
| @tibisabau | ArangoDB AQL export, Apache Parquet export |
| @d4ndr4d3 | ResourceScheduler deadlock fix |
β¨ v0.3.0 Stable β Context Graph Feature Completeness
Shipped 2026-03-10 Β· All changes by @KaifAhmad1
π Temporal Validity Windows
Nodes and edges now carry first-class valid_from / valid_until ISO datetime fields β stored directly on the ContextNode and ContextEdge dataclasses, not buried in metadata.
New API:
add_node(valid_from=..., valid_until=...)andadd_edge(valid_from=..., valid_until=...)β set validity window at creationnode.is_active(at_time=None)andedge.is_active(at_time=None)β returnsTrueif live at the given time (defaults to now)graph.find_active_nodes(node_type=None, at_time=None)β filters entire graph to active nodes only
Bug fixes:
is_active()crashed withTypeErroron tz-awaredatetimeinputs β fixed by normalising to tz-naive UTC via new_parse_iso_dt()helper- Validity fields silently lost during serialisation β fixed across all four paths:
add_nodes(),add_edges(),to_dict(),from_dict()
π Cross-Graph Navigation
Separate ContextGraph instances can now be linked and navigated between. Links are fully durable β they survive save_to_file() / load_from_file() and reconnect via a registry.
New API:
graph.graph_idβ stable UUID assigned at init; persisted to JSONlink_graph(other_graph, source_node_id, target_node_id)β creates a navigable bridge; returnslink_idnavigate_to(link_id)β returns(other_graph, target_node_id)resolve_links({graph_id: instance})β reconnects links after load; returns count resolvedsave_to_file()β now writes alinkssection alongside nodes and edgesload_from_file()β restoresgraph_idand populates_unresolved_links
Bug fix: Previous implementation auto-created marker targets as phantom "entity" nodes β fixed by pre-creating a "cross_graph_link" typed ContextNode before inserting the marker edge.
14 new tests in tests/context/test_cross_graph_navigation.py covering link creation, phantom-node prevention, partial registry resolution, and full save/load round-trips.
βοΈ Weighted Multi-Hop BFS Traversal
get_neighbors() now accepts a min_weight threshold to confine traversal to high-confidence causal links only. Default 0.0 passes all edges β fully backward-compatible.
π§ Additional Fixes in v0.3.0 Stable
PipelineBuilder.add_step()return type annotation corrected from"PipelineBuilder"to"PipelineStep"test_hybrid_search_performancefixed to accumulate a truesearch_timeslist; threshold relaxed to< 5.0sfor realsentence-transformerslatency
π§ v0.3.0-beta β Semantic Extraction, Deduplication v2, New Export Formats
Shipped 2026-03-07
π§© Semantic Extraction Fixes β @KaifAhmad1 (PR #354, #355)
LLM Relation Extraction:
- Unmatched subjects/objects now produce a synthetic
UNKNOWNentity instead of silently dropping the relation - Orphaned legacy block in
_parse_relation_resultthat appended every relation twice has been removed extraction_methodparameter added β typed extraction paths now record"llm_typed"instead of"llm"
Reasoner Pattern Matching:
_match_patterninreasoner.pyfully rewritten β splits patterns on?varplaceholders, escapes only literal segments, uses backreferences for repeated variables and non-greedy.+?to prevent over-consumption
RDF Export Aliases:
RDFExporternow accepts"ttl","nt","xml","rdf", and"json-ld"as format aliases β zero API changes
Tests added: tests/reasoning/test_reasoner.py (4 tests), tests/semantic_extract/test_relation_extractor.py (6 tests), tests/export/test_rdf_exporter.py (8 tests)
π Incremental / Delta Processing β @ZohaibHassan16, @KaifAhmad1 (PR #349)
- Native SPARQL-based diff between graph snapshots β only changed triples enter the pipeline
delta_modeflag inPipelineBuilderfor near-real-time incremental workloads- Version snapshot management with graph URI tracking and per-snapshot metadata storage
prune_versions()for automatic retention cleanup of old snapshots
Bug fixes: corrected SPARQL variable order, fixed class references, resolved duplicate dictionary keys.
ποΈ Deduplication v2 Suite β @ZohaibHassan16, @KaifAhmad1 (PR #338, #339, #340, #344)
Three independently opt-in tiers β legacy mode remains the default, fully backward-compatible.
Candidate Generation v2 (PR #338):
- New
blocking_v2andhybrid_v2strategies replace O(NΒ²) pair enumeration - Multi-key blocking with normalised token prefixes, type-aware keys, and optional phonetic (Soundex) matching
- Deterministic
max_candidates_per_entitybudgeting with stable sorting - 63.6% faster in worst-case scenarios (0.259s β 0.094s for 100 entities)
Two-Stage Scoring Prefilter (PR #339):
- Fast gates for type mismatch, name-length ratio, and token overlap eliminate expensive semantic scoring for obvious non-matches
- Configurable thresholds:
min_length_ratio,min_token_overlap_ratio,required_shared_token - 18β25% faster batch processing when enabled (
prefilter_enabled=Falseby default)
Semantic Relationship Deduplication v2 (PR #340):
- Canonicalisation engine with predicate synonym mapping (e.g.
works_forβemployed_by) - O(1) hash matching for exact canonical signatures before any semantic scoring
- Weighted scoring: 60% predicate + 40% object with explainable
semantic_match_scorein metadata - 6.98x faster than legacy mode (83ms vs 579ms)
dedup_triplets()infinite recursion bug fixed; promoted to first-class API inmethods.py
Migration guide: MIGRATION_V2.md with complete examples for all v2 strategies (PR #344)
π€ New Export Formats β @tibisabau (PR #342, #343)
ArangoDB AQL Export (PR #342):
- Full AQL
INSERTstatement generation for vertices and edges - Configurable collection names with validation and sanitisation; batch processing (default: 1000)
export_arango()convenience function;.aqlauto-detection in the unified exporter- 17 tests β 100% pass rate
Apache Parquet Export (PR #343):
- Columnar storage with configurable compression: snappy, gzip, brotli, zstd, lz4, none
- Explicit Apache Arrow schemas with type safety and field normalisation
- Analytics-ready: pandas, Spark, Snowflake, BigQuery, Databricks
export_parquet()convenience function;.parquetauto-detection- 25 tests β 100% pass rate
π Beta Bug Fixes β @KaifAhmad1
Context module:
retrieve_decision_precedentsβ entity extraction correctly gated onuse_hybrid_search=True_extract_entities_from_queryβ switched toword[0].isupper()to capture camelCase identifiers likeCreditCard- Added missing
expand_context()(BFS traversal) and_get_decision_query()methods - Fixed
hybrid_retrieval,dynamic_context_traversal,multi_hop_context_assemblyfor correct single-pass BFS - Fixed
_retrieve_from_vectorfallback to prevent empty content and negative similarity scores
KG module:
calculate_pagerankβ addedalpha/max_iteraliases; return format structured to{"centrality": scores, "rankings": sorted_list}community_detector._to_networkxβ fixed silent edge-loss when a NetworkX graph is passed directly- Added 9 domain-specific tracking methods to
AlgorithmTrackerWithProvenance - Created
provenance_tracker.pywithProvenanceTracker; correctly exported fromsemantica.kg
Pipeline module:
- Retry loop fixed β now correctly iterate...
v0.3.0-beta
Semantica v0.3.0-beta β Release Notes
Date: 2026-03-07 | Tag: v0.3.0-beta | Status: Internal Beta (Pre-release)
Consolidates all alpha and unreleased features for internal validation ahead of the public 0.3.0 launch.
What's New
Semantic Extraction & Reasoning
- Multi-Founder LLM Extraction Fix (#354) β Unmatched relation subjects/objects now produce synthetic
UNKNOWNentities instead of being silently dropped; all LLM-returned co-founders preserved - Reasoner Pattern Matching Rewrite (#354) β
_match_patterncorrectly handles multi-word values, pre-bound variables, repeated variable backreferences, and non-greedy separators
Export
- RDF / TTL Alias Fix (#355) β
format="ttl","nt","xml","rdf","json-ld"all resolve without breaking existing callers - ArangoDB AQL Export (#342) β Full AQL INSERT generation for vertices and edges; configurable batching; 17 tests passing
- Apache Parquet Export (#343) β Columnar storage with configurable compression (snappy, gzip, brotli, zstd, lz4); explicit Arrow schemas; 25 tests passing
Deduplication v2 (Epic #333)
- Candidate Generation v2 (#338) β
blocking_v2/hybrid_v2strategies with multi-key and phonetic blocking; 63.6% faster worst-case - Two-Stage Scoring Prefilter (#339) β Fast prefilter gates before expensive semantic scoring; 18β25% faster batch processing
- Semantic Deduplication v2 (#340) β Opt-in
semantic_v2with canonicalization, O(1) hash matching, weighted scoring; 6.98x speedup; fixed infinite recursion bug - Migration Guide (#344) β
MIGRATION_V2.mdwith full examples; 5.86x speedup confirmed; backward compatible
Incremental / Delta Processing
- Delta Processing (#349) β Native SPARQL delta computation between graph snapshots;
delta_modepipeline config;prune_versions()for snapshot retention; production-ready for near real-time pipelines
Bug Fixes
NameErrorβ missingTypeimport inutils/helpers.py; removed unused import fromconfig_manager.py- Context module β fixed
retrieve_decision_precedents,hybrid_retrieval,dynamic_context_traversal,multi_hop_context_assembly,_retrieve_from_vector,_extract_entities_from_query; added missingexpand_contextand_get_decision_querymethods - Knowledge Graph module β fixed
calculate_pagerank,community_detector._to_networkx,detect_communities,_build_adjacency; addedProvenanceTrackerand 9 domain-specific tracking methods - Pipeline module β fixed retry loop in
execution_engine; addedRecoveryActionwith LINEAR / EXPONENTIAL / FIXED backoff; fixedadd_stepreturn value; addedvalidatealias - Test files β replaced emoji with ASCII for Windows cp1252 compatibility; fixed assertion ordering and loop bugs across 4 test files
Test Results
| Passing | Skipped (external services) | Failed |
|---|---|---|
| ~840 | 36 | 0 |
Contributors
v0.3.0-alpha
π Semantica v0.3.0-alpha Release
This alpha release introduces comprehensive decision tracking capabilities, advanced knowledge graph algorithms, and production-ready architecture for testing.
π Major Features
Decision Tracking System
- Complete decision lifecycle management with audit trails
- Provenance tracking and lineage management
- Policy compliance and exception handling
- Decision influence analysis and impact scoring
Advanced Knowledge Graph Algorithms
- Node2Vec embeddings for semantic similarity
- Centrality analysis (degree, betweenness, closeness, eigenvector)
- Community detection and graph analytics
- Path finding and link prediction
Enhanced Context Module
- Unified AgentContext with granular feature flags
- Decision tracking integration
- Production-ready architecture with validation
- GraphStore capability validation
Vector Store Features
- Hybrid search combining semantic, structural, and category similarity
- Advanced retrieval with configurable weights
- FastEmbed integration for efficient operations
π§ͺ Testing & Quality
- 113+ tests passing across context and core modules
- Comprehensive decision tracking test coverage
- Enhanced error handling and edge case testing
- Fixed all critical test failures for release readiness
π¦ Installation
pip install semantica==0.3.0a0Semantica 0.2.7
Overview
Release 0.2.7 adds Snowflake integration, Apache Arrow export, and benchmark suite.
π New Features
Snowflake Connector for Data Ingestion
PR #276 by @Sameer6305
Native Snowflake connector with multi-authentication support (password, OAuth, key-pair, SSO). Includes table/query ingestion, schema introspection, and SQL injection prevention.
Tests: 24/24 passing
Dependency: db-snowflake optional
Apache Arrow Export Support
PR #273 by @Sameer6305
High-performance columnar export with explicit schemas, compression, and Pandas/DuckDB compatibility.
Tests: 20/20 passing
Dependency: db-arrow optional
Comprehensive Benchmark Suite
PR #289 by @ZohaibHassan16, @KaifAhmad1
137+ benchmarks across all modules with regression detection and CI/CD integration.
Features: Statistical analysis, environment-agnostic design, CLI tool
π Quality Assurance
- Total Tests: 44/44 passing
- Breaking Changes: None
- Backward Compatible: Yes
π Installation
pip install semantica==0.2.7
pip install semantica[db-snowflake,db-arrow]==0.2.7π Contributors
- @Sameer6305: Snowflake Connector, Arrow Export
- @ZohaibHassan16: Benchmark Suite implementation
- @KaifAhmad1: Benchmark enhancements, CI/CD integration
π Links
- GitHub: https://github.com/Hawksight-AI/semantica
- PyPI: https://pypi.org/project/semantica/
- Benchmarks:
python benchmarks/benchmark_runner.py
π Performance
- Text Processing: >10,000 ops/sec
- Arrow Export: 10x faster
- Benchmark Coverage: 137+ tests
Thanks to all contributors for making this release possible!
Semantica v0.2.6
Semantica v0.2.6
Release Date: February 3, 2026
We're excited to announce Semantica v0.2.6, featuring major enhancements in provenance tracking, change management, and several important bug fixes!
π Highlights
Major Features
- W3C PROV-O Compliant Provenance Tracking - Enterprise-grade lineage tracking across all 17 modules
- Enhanced Change Management - Version control for knowledge graphs and ontologies
- CSV Ingestion Improvements - Auto-detection and robust error handling
- Comprehensive Test Coverage - 80-86% coverage for ingestion modules
Bug Fixes
- Temperature compatibility for LLM providers
- JenaStore empty graph initialization
β¨ New Features & Enhancements
W3C PROV-O Compliant Provenance Tracking
PRs: #254, #246 | Contributor: @KaifAhmad1
A comprehensive provenance tracking system with W3C PROV-O compliance across all 17 Semantica modules.
Core Module:
ProvenanceManagerfor centralized tracking- W3C PROV-O schemas (Activity, Entity, Agent)
- Storage backends: InMemory and SQLite
- SHA-256 integrity verification
Module Integrations:
- Semantic Extract, LLMs (Groq, OpenAI, HuggingFace, LiteLLM)
- Pipeline, Context, Ingest, Embeddings
- Graph/Vector/Triplet stores
- Reasoning, Conflicts, Deduplication
- Export, Parse, Normalize, Ontology, Visualization
Features:
- Complete lineage tracking: Document β Chunk β Entity β Relationship β Graph
- LLM tracking: tokens, costs, latency
- Source tracking and bridge axioms for domain transformations
Compliance:
- W3C PROV-O, FDA 21 CFR Part 11, SOX, HIPAA, TNFD
Testing:
- 237 tests covering core functionality, all 17 module integrations, edge cases, backward compatibility
Design:
- Opt-in with
provenance=Falseby default - Zero breaking changes
- No new dependencies
Enhanced Change Management Module
PRs: #248, #243 | Contributor: @KaifAhmad1
Enterprise-grade version control for knowledge graphs and ontologies with persistent storage and audit trails.
Core Classes:
TemporalVersionManager- Knowledge graph versioningOntologyVersionManager- Ontology versioningChangeLogEntry- Change metadata tracking
Storage:
- SQLite (persistent) and in-memory backends
- Thread-safe operations
Features:
- SHA-256 checksums for integrity
- Detailed entity/relationship diffs
- Structural ontology comparison
- Email validation
Compliance:
- HIPAA, SOX, FDA 21 CFR Part 11
- Immutable audit trails
Testing:
- 104 tests (100% pass)
- Unit, integration, compliance, performance, edge cases
Performance:
- 17.6ms for 10k entities
- 510+ ops/sec concurrent
- Handles 5k+ entity graphs
Migration:
- Backward compatible
- Simplified class names
- Zero external dependencies
CSV Ingestion Enhancements
PR: #244 | Contributor: @saloni0318
Robust CSV parsing with auto-detection and error handling.
Features:
- Auto-detect CSV encoding using
chardet - Auto-detect delimiter using
csv.Sniffer - Tolerant decoding and malformed-row handling (
on_bad_lines='warn') - Optional chunked reading for large files
- Metadata tracks detected values
Testing:
- Expanded unit tests covering:
- Multiple delimiters
- Quoted/multiline fields
- Header overrides
- Chunked reading
- NaN preservation
Comprehensive Test Coverage
TextNormalizer Tests
PR: #242 | Contributor: @ZohaibHassan16
Added focused test coverage for TextNormalizer behavior across various inputs.
Integration Test Improvements
PR: #241 | Contributor: @KaifAhmad1
- Introduced integration test marker
- Reduced noisy warnings in ingest tests
Ingest Unit Tests
PRs: #239, #232 | Contributor: @Mohammed2372
Comprehensive unit tests for ingestion modules (file, web, and feed ingestors).
Coverage:
- File scanning: local/cloud (S3/GCS/Azure)
- Web ingestion: URL/sitemap/robots.txt
- RSS/Atom feed parsing
Testing:
- 998 lines of test code
- Mocked external dependencies for fast, isolated execution
Results:
file_ingestor: 86% coverageweb_ingestor: 86% coveragefeed_ingestor: 80% coverage
Covers happy paths, edge cases, and error handling.
π Bug Fixes
Temperature Compatibility Fix
PRs: #256, #252 | Contributors: @F0rt1s, @IGES-Institut
Fixed hardcoded temperature=0.3 that broke compatibility with models requiring specific temperature values (e.g., gpt-5-mini).
Changes:
- Added
_add_if_sethelper method toBaseProvider - Only passes parameters when explicitly set
- When
temperature=None, parameter is omitted allowing APIs to use model defaults - Updated all 5 providers: OpenAI, Groq, Gemini, Ollama, DeepSeek
Impact:
- Reduced code by ~85 lines with cleaner parameter handling
- Comprehensive test coverage added (10 temperature tests, all passing)
- Backward compatible - no breaking changes
JenaStore Empty Graph Bug
PRs: #257, #258 | Contributor: @ZohaibHassan16
Fixed ProcessingError: Graph not initialized when operating on empty (but initialized) graphs.
Changes:
- Replaced implicit
if not self.graph:checks with explicitif self.graph is None:validation - Updated 5 methods:
add_triplets,get_triplets,delete_triplet,execute_sparql,serialize - Properly distinguishes
None(uninitialized) from empty graphs (initialized with 0 triplets)
Impact:
- Unblocks benchmarking suite
- Enables fresh deployments
- Improves testing workflows
π¦ Installation
pip install semantica==0.2.6Or upgrade from a previous version:
pip install --upgrade semanticaπ Contributors
Special thanks to all contributors who made this release possible:
- @KaifAhmad1 - Provenance tracking, change management, test improvements
- @saloni0318 - CSV ingestion enhancements
- @ZohaibHassan16 - TextNormalizer tests, JenaStore bug fix
- @Mohammed2372 - Comprehensive ingest unit tests
- @F0rt1s - Temperature compatibility fix
- @IGES-Institut - Temperature compatibility fix
π Documentation
- Documentation: https://semantica.readthedocs.io
- GitHub: https://github.com/Hawksight-AI/semantica
- PyPI: https://pypi.org/project/semantica/
π Links
π What's Next?
Stay tuned for upcoming features in future releases. Check our GitHub Issues to see what we're working on!
Full Changelog: v0.2.5...v0.2.6
Deep Extraction, BYOM & Pinecone Support (v0.2.5)
Semantica v0.2.5
π Release Highlights
This release brings native Pinecone Vector Store support, configurable LLM retry logic, and major enhancements to the Semantic Extraction module, including robust support for custom Hugging Face models (BYOM), improved NER/Relation extraction, and completed Triplet extraction logic.
π New Features
Pinecone Vector Store Support
- Implemented native
PineconeStorewith full CRUD capabilities. - Support for serverless and pod-based indexes, namespaces, and metadata filtering.
- Fully integrated with the unified
VectorStoreinterface and registry. - (Closes #219, Resolves #220)
Configurable LLM Retry Logic
- Exposed
max_retriesparameter inNERExtractor,RelationExtractor, andTripletExtractor. - Defaults to 3 retries to handle JSON validation failures or API timeouts gracefully.
- Propagated retry configuration through chunked processing helpers for consistent long-document handling.
Bring Your Own Model (BYOM) Support
- Custom Hugging Face Models: Enabled full support for custom models in
NERExtractor,RelationExtractor, andTripletExtractor. - Custom Tokenizers: Added support for models with non-standard tokenization requirements.
- Runtime Overrides:
extract(model=...)now correctly overrides configuration defaults.
Enhanced Extraction Capabilities
- NER: Added configurable aggregation strategies (
simple,first,average,max) and robust IOB/BILOU parsing. - Relation Extraction: Implemented standard entity marker techniques (
<subj>,<obj>) and structured output parsing. - Triplet Extraction: Added specialized parsing for Seq2Seq models (e.g., REBEL) to generate structured triplets directly from text.
π Bug Fixes
- LLM Extraction Stability: Fixed infinite retry loops by strictly enforcing
max_retrieslimits. - Model Parameter Precedence: Resolved issues where config defaults overrode runtime arguments.
- Import Handling: Fixed circular import issues in test suites via improved mocking strategies.
π¦ Installation
pip install semantica==0.2.5Semantica v0.2.4
Added
- Ontology Ingestion Module:
- Implemented
OntologyIngestorfor parsing RDF/OWL files (Turtle, RDF/XML, JSON-LD, N3). - Added
ingest_ontologyand unifiedingest(source_type="ontology")interface. - Added recursive directory scanning for batch ontology ingestion.
- Added
OntologyDatadataclass for consistent metadata.
- Implemented
- Documentation:
- Updated
ontology_usage.mdandontology.mdwith usage examples and API details.
- Updated
- Tests:
- Added comprehensive test suite
tests/ingest/test_ontology_ingestor.py. - Added
examples/demo_ontology_ingest.pyfor end-to-end demonstration.
- Added comprehensive test suite
Semantica v0.2.3
We are excited to announce Semantica v0.2.3! This release focuses on stability, performance, and developer experience improvements, including critical fixes for LLM relation extraction, high-performance vector store ingestion, and resolved circular dependencies.
π Added
Vector Store High-Performance Ingestion
- New
add_documentsAPI: Added high-throughput ingestion with automatic embedding generation, batching, and parallel processing. embed_batchHelper: Efficiently generate embeddings for lists of texts without immediate storage.- Parallel Defaults: Enabled default parallel ingestion in
VectorStore(default:max_workers=6) for faster processing. - Documentation: Added dedicated guide
docs/vector_store_usage.mdfor high-performance configuration. - Tests: Added
tests/vector_store/test_vector_store_parallel.pycovering parallel vs. sequential performance and edge cases.
Amazon Neptune Dev Environment
- CloudFormation Template: Added
cookbook/introduction/neptune-setup.yamlto provision a development Neptune cluster with public endpoints and IAM auth. - Documentation: Updated
cookbook/introduction/21_Amazon_Neptune_Store.ipynbwith deployment guides, cost estimates, and IAM best practices. - Linting: Added
cfn-lintto pre-commit hooks for CloudFormation validation.
Comprehensive Test Suite
- Unit Tests: Added
tests/test_relations_llm.pycovering typed and structured response paths for relation extraction. - Integration Tests: Added
tests/integration/test_relations_groq.pyfor real Groq API validation.
π Fixed
LLM Relation Extraction Parsing
- Zero Relations Fix: Resolved issue where relation extraction returned zero results despite successful API calls.
- Response Normalization: Normalized typed responses from Instructor/OpenAI/Groq to a consistent dictionary format.
- JSON Fallback: Added structured JSON fallback when typed generation yields empty results.
- Parameter Cleanup: Removed unsupported kwargs (
max_tokens,max_entities_prompt) from internal calls to prevent API errors.
Pipeline Circular Import
- Resolved Import Cycles: Fixed circular dependency between
pipeline_builderandpipeline_validator(Issues #192, #193). - Lazy Loading: Implemented lazy loading for
PipelineValidatorto ensure stable imports.
JupyterLab Stability
- Progress Output Control: Added
SEMANTICA_DISABLE_JUPYTER_PROGRESSenvironment variable. - Memory Fix: Fallback to console-style output when enabled to prevent JupyterLab out-of-memory errors from infinite scrolling tables (Issue #181).
β‘ Changed
Relation Extraction API
- Simplified Interface: Removed unused kwargs to prevent parameter leakage.
- Better Debugging: Improved error handling and verbose logging for extraction workflows.
- Robust Parsing: Enhanced post-response parsing stability across different LLM providers.
Vector Store Defaults
- Standardized Concurrency: Set default
max_workers=6forVectorStoreparallel ingestion. - Simplified Usage: Updated documentation to rely on smart defaults rather than manual configuration.