Skip to content

Releases: semantica-agi/semantica

v0.5.0

11 May 15:26

Choose a tag to compare

Semantica 0.5.0 β€” Distance Intelligence & Ontology Hub πŸš€

Released: 2026-05-11


Highlights

  • Distance Intelligence β€” measure semantic distance between any two nodes across the graph, API, and Explorer
  • Ontology Hub β€” full workspace for browsing, loading, aligning, and validating ontologies in the Explorer
  • Parquet Ingest β€” native Parquet file ingestion support
  • KnowledgeGraph dataclass β€” first-class KnowledgeGraph type with native visualizer support
  • MCP Server β€” modular, pipx-installable MCP server package with 4 new plugin bundles

New Features

Distance Intelligence

  • Semantic distance between nodes across ContextGraph, REST API, and Knowledge Explorer β€” #502, #512 by @KaifAhmad1
  • Node distance semantics in PathResponse (step count, hop distance, relationship type) β€” #472, #477 by @KaifAhmad1
  • Bidirectional path finding via directed=false parameter β€” #469, #476 by @KaifAhmad1
  • Distance intelligence optimization and UI polish β€” #550 by @KaifAhmad1

Ontology Hub

Knowledge Explorer UI

Parquet Ingest

  • Native Parquet file ingestion (ParquetIngester) β€” #548 by @Luffy2208

KnowledgeGraph Dataclass

  • New KnowledgeGraph dataclass with native KGVisualizer support β€” #471, #474 by @KaifAhmad1
  • All visualize_* methods now accept KnowledgeGraph objects directly β€” #459 by @KaifAhmad1

MCP Server & Plugins

  • Modular MCP server package at repo root, installable via pipx β€” by @KaifAhmad1
  • New plugin bundles: Windsurf, Cline, Continue, VS Code β€” by @KaifAhmad1
  • OpenClaw integration module β€” #460 by @KaifAhmad1
  • Plugin local install fixed (hooks field, auto-load) β€” #489 by @serge

Deduplication

  • DuplicateDetector now supports max_results, top_k_per_entity, min_similarity, and sort_by β€” by @KaifAhmad1

Cookbooks

  • Datalog-style reasoning end-to-end notebook β€” #457 by @KaifAhmad1
  • Manual ontology + Snowflake mapping cookbook β€” by @KaifAhmad1

Bug Fixes πŸ›

  • Windows install failure from gpu in [all] extra β€” #538 by @KaifAhmad1
  • UnicodeEncodeError on cp1252 / Windows consoles in progress tracker β€” #537 by @KaifAhmad1
  • Circular import in semantic_extract β€” #536 by @ZohaibHassan16
  • Lazy-load optional ingest backends β€” uninstalled extras no longer raise on import β€” #535 by @ZohaibHassan16
  • ConflictDetector.detect_conflicts duplicate method definition β€” #539 by @KaifAhmad1
  • DuplicateDetector merged group key normalization β€” #540 by @ZohaibHassan16
  • MCP server package structure for pipx installation β€” #544 by @KaifAhmad1
  • OWL/Turtle exporter silent data-property omission β€” #478 / #479 by @KaifAhmad1
  • Blazegraph literal serialization and prefixed datatype expansion β€” #450 by @KaifAhmad1
  • TripletStore IRI resolution against ontology namespace base URI β€” #447 / #451 by @KaifAhmad1
  • OWL generation β€” preserve user-facing schema fields, align IRIs with namespace β€” #449 by @KaifAhmad1
  • Provenance upstream ancestor traversal and direction classification β€” #480 by @Sameer6305
  • DeepSeek provider switched to OpenAI SDK client β€” #482 by @lingli
  • Distance intelligence visibility and slash-safe API calls β€” #513 / #515 by @ZohaibHassan16
  • Explorer graph motion and live layout restore β€” #486 by @KaifAhmad1
  • Polynomial ReDoS regex removed from format detection β€” #521 by @KaifAhmad1

Security πŸ”’

  • Patched 12 vulnerabilities (CRITICAL β†’ LOW) across path traversal, injection, and unsafe deserialization β€” #452 by @KaifAhmad1
  • Updated mkdocs β‰₯1.6.1 β€” #507
  • Updated mkdocs-mermaid2-plugin β€” #508
  • Updated mkdocs-jupyter β€” #509
  • Updated mkdocs-material β€” #511
  • Updated pymdown-extensions β€” #510

Dependencies

  • Docker base image: python 3.12-slim β†’ 3.14-slim β€” #466
  • Docker base image: node 20-alpine β†’ 25-alpine β€” #465
  • pytest-benchmark β‰₯5.2.3 β€” #522

Contributors

  • @KaifAhmad1 β€” Distance intelligence, Ontology Hub, KnowledgeGraph dataclass, TripletStore, Blazegraph, MCP server, security, Windows fixes, Explorer welcome screen, OWL exporter, OpenClaw, MCP packaging, Datalog cookbook
  • @ZohaibHassan16 β€” Explorer UI overhaul, grouped view, graph declutter, indexed search, landing page, lazy ingest, circular import fix
  • @Sameer6305 β€” Provenance upstream traversal fix, local graph interaction
  • @lingli β€” DeepSeek provider fix
  • @Luffy2208 β€” Parquet ingest support
  • @serge β€” Plugin local install fix
  • @dependabot β€” Automated dependency and security updates

Breaking Changes

None.

v0.4.0

08 Apr 05:16

Choose a tag to compare

Semantica v0.4.0 β€” Release Notes

Released: 2026-04-08
PyPI: pip install semantica==0.4.0
Tag: v0.4.0
Full Changelog: CHANGELOG.md


v0.4.0 is the largest feature release to date. It ships a complete bi-temporal intelligence stack, a production-ready Knowledge Explorer API, first-class SHACL validation, SKOS vocabulary management, ontology alignment & diff, an Agno agentic framework integration, a Datalog reasoning engine, and a broad sweep of reliability, performance, and security fixes.

Test suite: 886 passed Β· 9 skipped Β· 0 failed


What's New

Temporal Intelligence

A full bi-temporal model is now baked into the core. Every entity, relationship, decision, and provenance record can carry valid time (when a fact was true in the world) and transaction time (when it was recorded in the system).

Core Temporal Data Model (PR #396)

  • New semantica.kg.temporal_model with shared parsing, normalization, and serialization helpers used across all temporal APIs
  • TemporalBound and BiTemporalFact exported from semantica.kg
  • valid, transaction, and both time-axis filtering in all temporal queries
  • TemporalValidationError raised consistently on invalid inputs β€” no silent coercions
  • History-preserving revisions in TemporalVersionManager.apply_revision() with supersession semantics

Temporal Query Engine: Point-in-Time Correctness (PR #397)

  • TemporalGraphQuery.reconstruct_at_time(graph, at_time) β€” builds a consistent point-in-time subgraph without mutating the source
  • query_at_time() uses reconstruction internally so returned subgraphs never contain dangling edges
  • TemporalConsistencyReport β€” detects inverted intervals, relationships outside entity lifetimes, missing endpoints, overlapping same-type relationships, and temporal gaps
  • validate_temporal_consistency(graph) available as a top-level module function
  • Sequence and cycle pattern detection with pattern_type, signature, frequency, and per-occurrence detail
  • Calendar-aligned temporal evolution bucketing via temporal_granularity
  • Causal ordering controls on find_temporal_paths() β€” enforce_causal_ordering, ordering_strategy (strict, overlap, loose)

Deterministic Temporal Reasoning Engine (PR #398)

  • New semantica.kg.temporal_reasoning β€” zero LLM calls, pure deterministic reasoning
  • Full Allen interval algebra via IntervalRelation β€” all 13 relations (before, meets, overlaps, starts, during, finishes, equals, and inverses)
  • TemporalReasoningEngine with helpers for interval merging, gap analysis, coverage calculation, timelines, and retroactive coverage
  • Circular import risk between semantica.reasoning and semantica.kg eliminated; semantica.reasoning access preserved via re-exports

Temporal Awareness in Context Graph (PR #399)

  • Decision dataclass carries valid_from / valid_until validity windows β€” superseded decisions remain in the graph (immutable history)
  • find_precedents_by_scenario(include_superseded=False, as_of=None) β€” defaults exclude expired decisions; as_of enables point-in-time queries
  • ContextGraph.state_at(timestamp) β€” serializable point-in-time snapshot; source graph never mutated
  • CausalChainAnalyzer.trace_at_time(event_id, at_time) β€” reconstructs causal chain using only edges recorded up to at_time
  • AgentContext.checkpoint(label), diff_checkpoints(label1, label2), flush_checkpoint(label) β€” named in-memory snapshots with structured diffs

Temporal Metadata Extraction from Text (PR #400)

  • extract_relations_llm(extract_temporal_bounds=True) β€” each returned Relation gains valid_from, valid_until, temporal_confidence (0.0–1.0), and temporal_source_text; default False is 100% backward-compatible
  • Calibrated confidence anchors baked into the prompt: 1.00 = full ISO date β†’ 0.00 = no temporal signal
  • New TemporalNormalizer β€” zero LLM calls, pure regex + dateutil:
    • normalize(value) β†’ (valid_from, valid_until) UTC datetime tuple or None
    • normalize_phrase(phrase) β†’ domain metadata dict or None
    • 13-domain default phrase map covering General/Policy, Healthcare, Cybersecurity, Supply Chain, Finance, and Energy
    • Ambiguous DD/MM/YYYY inputs issue TemporalAmbiguityWarning β€” never silently guesses locale
    • User-supplied phrase_map merged over defaults at construction

Temporal Provenance & Export (PR #401)

  • ProvenanceTracker.track_entity() auto-stamps recorded_at on every new record
  • query_recorded_between(start, end) β€” returns all provenance records within an inclusive time range
  • revision_history(fact_id) β€” complete revision chain ordered by recorded_at ascending
  • export_audit_log(fact_ids, format) β€” "json" (pretty-printed) or "csv" (with header row)
  • RDFExporter.export_to_rdf(include_temporal=True, time_axis="valid"|"transaction"|"both") β€” emits OWL-Time triples for all temporally-annotated relationships
  • create_snapshot() stamps "format_version": "1.0"; validate_snapshot() and migrate_snapshot() for stable snapshot lifecycle management

Temporal GraphRAG Integration (PR #402)

  • TemporalGraphRetriever β€” drop-in wrapper for any ContextRetriever; filters retrieved entities and relationships to a point in time; at_time=None is a true passthrough
  • ContextRetriever.query_with_reasoning(at_time=..., header_template=...) β€” structured temporal header prepended to LLM context; format-string injection guard via str.replace
  • TemporalQueryRewriter β€” extracts temporal_intent, at_time, start_time, end_time, and rewritten_query from natural language; regex-only by default, optional LLM-assisted mode

Ontology & Knowledge Representation

SHACL Shape Generation & Validation (PR #318)

  • SHACLGenerator derives SHACL node and property shapes from any Semantica ontology dict β€” zero hand-authoring required
  • Three quality tiers: "basic" (structure + cardinality), "standard" (adds sh:in, sh:pattern, inheritance), "strict" (adds sh:closed true + sh:ignoredProperties)
  • Output formats: Turtle, JSON-LD, N-Triples; iterative multi-level inheritance propagation, cycle-safe
  • OntologyEngine.to_shacl(), export_shacl(), and validate_graph(explain=True) β€” plain-English explanations for all 7 SHACL constraint types
  • SHACLValidationReport with conforms, violations, warnings, summary(), explain_violations(), to_dict()
  • Install: pip install semantica[shacl]

SKOS Vocabulary Module (PR #319)

  • TripletStore.add_skos_concept() β€” assembles and stores all required SKOS triples automatically via existing add_triplets() API
  • TripletStore.get_skos_concepts(scheme_uri=None) β€” SPARQL-backed retrieval with multi-value altLabel/broader/narrower collapsing
  • OntologyEngine.list_vocabularies(), list_concepts(scheme_uri), search_concepts(query, scheme_uri=None) β€” injection-safe SPARQL throughout
  • NamespaceManager.get_skos_uri(local_name) and build_concept_scheme_uri(name) namespace helpers

Ontology Alignment API (PR #361)

  • OntologyEngine.create_alignment(source_uri, target_uri, predicate) β€” stores triples using standard OWL/SKOS predicates (owl:equivalentClass, skos:exactMatch, skos:relatedMatch, etc.)
  • get_alignments(entity_uri) β€” bidirectional retrieval of all alignments for an entity
  • ReuseManager.suggest_alignments(target, source) β€” O(N+M) hashmap heuristic over exact label matches
  • QueryEngine.expand_entity_uri(uri, store, use_alignments=True) β€” SPARQL expansion to automatically include aligned equivalents in queries
  • SPARQL injection hardened in list_alignments and build_values_clause

Ontology Diff & Migration (PR #367)

  • VersionManager.diff_ontologies(base, target) β€” structured diff covering classes, properties, individuals, and axioms
  • ChangeLogAnalyzer.analyze(diff) β€” classifies impact: CRITICAL/BREAKING, HIGH/BREAKING, MEDIUM/POTENTIALLY_BREAKING, INFO/NON_BREAKING
  • ImpactReport and generate_change_report(diff) β€” structured output with summary, impact_classification, and recommendations
  • OntologyEngine.compare_versions(base_id, target_id, run_validation=True, graph_data=...) β€” end-to-end orchestrator with optional validation and graph-instance checks

Knowledge Explorer

A full FastAPI backend for the Semantica dashboard. Install with pip install semantica[explorer] and launch via semantica-explorer --graph my_graph.json.

Graph API (PR #384)

  • GET /api/graph/nodes|edges|stats β€” type/keyword filter, skip/limit pagination
  • GET /api/graph/node/{id}/neighbors β€” BFS traversal, configurable depth 1–5
  • GET /api/graph/node/{id}/path β€” BFS or Dijkstra, dispatched via algorithm param
  • POST /api/graph/search β€” full-text search across node content and metadata

Analytics, Decisions & Temporal (PR #384)

  • GET /api/analytics β€” centrality, community detection, connectivity (comma-separated metrics param)
  • GET /api/decisions/{id}/chain|precedents|compliance β€” causal chain BFS, ranked precedent retrieval, in-graph compliance edge scan
  • GET /api/temporal/snapshot|diff|patterns β€” point-in-time snapshots, node-set diffs between timestamps, pattern detection

Enrichment & Export (PR #384)

  • POST /api/enrich/extract|links|dedup|reason β€” NLP extraction, link prediction, deduplication, forward/backward inference
  • POST /api/export β€” 12 formats: JSON, Turtle, RDF-XML, N-Triples, CSV, GraphML, GEXF, OWL, Cypher, AQL, YAML; temp files always cleaned via try/finally
  • POST /api/import β€” JSON/JSON-LD multipart upload with WebSocket real-time progress events

SKOS Vocabulary REST API (PR #426)

  • `GE...
Read more

v0.3.0

10 Mar 22:07

Choose a tag to compare

🧠 Semantica v0.3.0 β€” First Stable Release

Released: 2026-03-10 Β |Β  PyPI: pip install semantica Β |Β  Python: 3.8 – 3.12 Β |Β  License: MIT

The first Production/Stable release of Semantica β€” an open-source framework for building context graphs and decision intelligence layers for AI agents. This release consolidates everything shipped across three stages: 0.3.0-alpha (2026-02-19), 0.3.0-beta (2026-03-07), and 0.3.0 stable (2026-03-10).

pip install --upgrade semantica

No breaking changes. All new parameters carry safe defaults. All new methods are purely additive.


🚦 Release Highlights

  • πŸ• Temporal Validity β€” valid_from/valid_until on nodes & edges; query what's active at any point in time
  • πŸ”— Cross-Graph Navigation β€” link separate ContextGraph instances; navigate across them; survives save/load
  • βš–οΈ Weighted BFS Traversal β€” filter multi-hop queries by edge confidence with min_weight
  • 🧠 Decision Intelligence β€” full lifecycle: record β†’ causal chain β†’ impact analysis β†’ precedent search β†’ policy enforcement
  • πŸ”„ Delta Processing β€” SPARQL-based incremental graph diffs; only changed data flows through the pipeline
  • πŸ—ƒοΈ Deduplication v2 β€” 6.98x faster semantic dedup, 63.6% faster candidate generation
  • πŸ“€ New Export Formats β€” ArangoDB AQL, Apache Parquet (Spark/BigQuery/Databricks ready)
  • πŸ—„οΈ Graph Backends β€” Apache AGE, PgVector, AWS Neptune, FalkorDB
  • βœ… 886+ tests passing β€” 0 failures

πŸ‘₯ Contributors

Contributor Areas
@KaifAhmad1 Lead maintainer β€” context graph, decision intelligence, KG algorithms, semantic extraction, pipeline, provenance, bug fixes, release management
@ZohaibHassan16 Deduplication v2 suite, incremental/delta processing, benchmark suite
@Sameer6305 Apache AGE backend, PgVector store, Snowflake connector, Apache Arrow export
@tibisabau ArangoDB AQL export, Apache Parquet export
@d4ndr4d3 ResourceScheduler deadlock fix

✨ v0.3.0 Stable β€” Context Graph Feature Completeness

Shipped 2026-03-10 Β· All changes by @KaifAhmad1

πŸ• Temporal Validity Windows

Nodes and edges now carry first-class valid_from / valid_until ISO datetime fields β€” stored directly on the ContextNode and ContextEdge dataclasses, not buried in metadata.

New API:

  • add_node(valid_from=..., valid_until=...) and add_edge(valid_from=..., valid_until=...) β€” set validity window at creation
  • node.is_active(at_time=None) and edge.is_active(at_time=None) β€” returns True if live at the given time (defaults to now)
  • graph.find_active_nodes(node_type=None, at_time=None) β€” filters entire graph to active nodes only

Bug fixes:

  • is_active() crashed with TypeError on tz-aware datetime inputs β€” fixed by normalising to tz-naive UTC via new _parse_iso_dt() helper
  • Validity fields silently lost during serialisation β€” fixed across all four paths: add_nodes(), add_edges(), to_dict(), from_dict()

πŸ”— Cross-Graph Navigation

Separate ContextGraph instances can now be linked and navigated between. Links are fully durable β€” they survive save_to_file() / load_from_file() and reconnect via a registry.

New API:

  • graph.graph_id β€” stable UUID assigned at init; persisted to JSON
  • link_graph(other_graph, source_node_id, target_node_id) β€” creates a navigable bridge; returns link_id
  • navigate_to(link_id) β€” returns (other_graph, target_node_id)
  • resolve_links({graph_id: instance}) β€” reconnects links after load; returns count resolved
  • save_to_file() β€” now writes a links section alongside nodes and edges
  • load_from_file() β€” restores graph_id and populates _unresolved_links

Bug fix: Previous implementation auto-created marker targets as phantom "entity" nodes β€” fixed by pre-creating a "cross_graph_link" typed ContextNode before inserting the marker edge.

14 new tests in tests/context/test_cross_graph_navigation.py covering link creation, phantom-node prevention, partial registry resolution, and full save/load round-trips.


βš–οΈ Weighted Multi-Hop BFS Traversal

get_neighbors() now accepts a min_weight threshold to confine traversal to high-confidence causal links only. Default 0.0 passes all edges β€” fully backward-compatible.


πŸ”§ Additional Fixes in v0.3.0 Stable

  • PipelineBuilder.add_step() return type annotation corrected from "PipelineBuilder" to "PipelineStep"
  • test_hybrid_search_performance fixed to accumulate a true search_times list; threshold relaxed to < 5.0s for real sentence-transformers latency

πŸ”§ v0.3.0-beta β€” Semantic Extraction, Deduplication v2, New Export Formats

Shipped 2026-03-07

🧩 Semantic Extraction Fixes β€” @KaifAhmad1 (PR #354, #355)

LLM Relation Extraction:

  • Unmatched subjects/objects now produce a synthetic UNKNOWN entity instead of silently dropping the relation
  • Orphaned legacy block in _parse_relation_result that appended every relation twice has been removed
  • extraction_method parameter added β€” typed extraction paths now record "llm_typed" instead of "llm"

Reasoner Pattern Matching:

  • _match_pattern in reasoner.py fully rewritten β€” splits patterns on ?var placeholders, escapes only literal segments, uses backreferences for repeated variables and non-greedy .+? to prevent over-consumption

RDF Export Aliases:

  • RDFExporter now accepts "ttl", "nt", "xml", "rdf", and "json-ld" as format aliases β€” zero API changes

Tests added: tests/reasoning/test_reasoner.py (4 tests), tests/semantic_extract/test_relation_extractor.py (6 tests), tests/export/test_rdf_exporter.py (8 tests)


πŸ”„ Incremental / Delta Processing β€” @ZohaibHassan16, @KaifAhmad1 (PR #349)

  • Native SPARQL-based diff between graph snapshots β€” only changed triples enter the pipeline
  • delta_mode flag in PipelineBuilder for near-real-time incremental workloads
  • Version snapshot management with graph URI tracking and per-snapshot metadata storage
  • prune_versions() for automatic retention cleanup of old snapshots

Bug fixes: corrected SPARQL variable order, fixed class references, resolved duplicate dictionary keys.


πŸ—ƒοΈ Deduplication v2 Suite β€” @ZohaibHassan16, @KaifAhmad1 (PR #338, #339, #340, #344)

Three independently opt-in tiers β€” legacy mode remains the default, fully backward-compatible.

Candidate Generation v2 (PR #338):

  • New blocking_v2 and hybrid_v2 strategies replace O(NΒ²) pair enumeration
  • Multi-key blocking with normalised token prefixes, type-aware keys, and optional phonetic (Soundex) matching
  • Deterministic max_candidates_per_entity budgeting with stable sorting
  • 63.6% faster in worst-case scenarios (0.259s β†’ 0.094s for 100 entities)

Two-Stage Scoring Prefilter (PR #339):

  • Fast gates for type mismatch, name-length ratio, and token overlap eliminate expensive semantic scoring for obvious non-matches
  • Configurable thresholds: min_length_ratio, min_token_overlap_ratio, required_shared_token
  • 18–25% faster batch processing when enabled (prefilter_enabled=False by default)

Semantic Relationship Deduplication v2 (PR #340):

  • Canonicalisation engine with predicate synonym mapping (e.g. works_for β†’ employed_by)
  • O(1) hash matching for exact canonical signatures before any semantic scoring
  • Weighted scoring: 60% predicate + 40% object with explainable semantic_match_score in metadata
  • 6.98x faster than legacy mode (83ms vs 579ms)
  • dedup_triplets() infinite recursion bug fixed; promoted to first-class API in methods.py

Migration guide: MIGRATION_V2.md with complete examples for all v2 strategies (PR #344)


πŸ“€ New Export Formats β€” @tibisabau (PR #342, #343)

ArangoDB AQL Export (PR #342):

  • Full AQL INSERT statement generation for vertices and edges
  • Configurable collection names with validation and sanitisation; batch processing (default: 1000)
  • export_arango() convenience function; .aql auto-detection in the unified exporter
  • 17 tests β€” 100% pass rate

Apache Parquet Export (PR #343):

  • Columnar storage with configurable compression: snappy, gzip, brotli, zstd, lz4, none
  • Explicit Apache Arrow schemas with type safety and field normalisation
  • Analytics-ready: pandas, Spark, Snowflake, BigQuery, Databricks
  • export_parquet() convenience function; .parquet auto-detection
  • 25 tests β€” 100% pass rate

πŸ› Beta Bug Fixes β€” @KaifAhmad1

Context module:

  • retrieve_decision_precedents β€” entity extraction correctly gated on use_hybrid_search=True
  • _extract_entities_from_query β€” switched to word[0].isupper() to capture camelCase identifiers like CreditCard
  • Added missing expand_context() (BFS traversal) and _get_decision_query() methods
  • Fixed hybrid_retrieval, dynamic_context_traversal, multi_hop_context_assembly for correct single-pass BFS
  • Fixed _retrieve_from_vector fallback to prevent empty content and negative similarity scores

KG module:

  • calculate_pagerank β€” added alpha/max_iter aliases; return format structured to {"centrality": scores, "rankings": sorted_list}
  • community_detector._to_networkx β€” fixed silent edge-loss when a NetworkX graph is passed directly
  • Added 9 domain-specific tracking methods to AlgorithmTrackerWithProvenance
  • Created provenance_tracker.py with ProvenanceTracker; correctly exported from semantica.kg

Pipeline module:

  • Retry loop fixed β€” now correctly iterate...
Read more

v0.3.0-beta

07 Mar 11:28

Choose a tag to compare

v0.3.0-beta Pre-release
Pre-release

Semantica v0.3.0-beta β€” Release Notes

Date: 2026-03-07 | Tag: v0.3.0-beta | Status: Internal Beta (Pre-release)

Consolidates all alpha and unreleased features for internal validation ahead of the public 0.3.0 launch.


What's New

Semantic Extraction & Reasoning

  • Multi-Founder LLM Extraction Fix (#354) β€” Unmatched relation subjects/objects now produce synthetic UNKNOWN entities instead of being silently dropped; all LLM-returned co-founders preserved
  • Reasoner Pattern Matching Rewrite (#354) β€” _match_pattern correctly handles multi-word values, pre-bound variables, repeated variable backreferences, and non-greedy separators

Export

  • RDF / TTL Alias Fix (#355) β€” format="ttl", "nt", "xml", "rdf", "json-ld" all resolve without breaking existing callers
  • ArangoDB AQL Export (#342) β€” Full AQL INSERT generation for vertices and edges; configurable batching; 17 tests passing
  • Apache Parquet Export (#343) β€” Columnar storage with configurable compression (snappy, gzip, brotli, zstd, lz4); explicit Arrow schemas; 25 tests passing

Deduplication v2 (Epic #333)

  • Candidate Generation v2 (#338) β€” blocking_v2 / hybrid_v2 strategies with multi-key and phonetic blocking; 63.6% faster worst-case
  • Two-Stage Scoring Prefilter (#339) β€” Fast prefilter gates before expensive semantic scoring; 18–25% faster batch processing
  • Semantic Deduplication v2 (#340) β€” Opt-in semantic_v2 with canonicalization, O(1) hash matching, weighted scoring; 6.98x speedup; fixed infinite recursion bug
  • Migration Guide (#344) β€” MIGRATION_V2.md with full examples; 5.86x speedup confirmed; backward compatible

Incremental / Delta Processing

  • Delta Processing (#349) β€” Native SPARQL delta computation between graph snapshots; delta_mode pipeline config; prune_versions() for snapshot retention; production-ready for near real-time pipelines

Bug Fixes

  • NameError β€” missing Type import in utils/helpers.py; removed unused import from config_manager.py
  • Context module β€” fixed retrieve_decision_precedents, hybrid_retrieval, dynamic_context_traversal, multi_hop_context_assembly, _retrieve_from_vector, _extract_entities_from_query; added missing expand_context and _get_decision_query methods
  • Knowledge Graph module β€” fixed calculate_pagerank, community_detector._to_networkx, detect_communities, _build_adjacency; added ProvenanceTracker and 9 domain-specific tracking methods
  • Pipeline module β€” fixed retry loop in execution_engine; added RecoveryAction with LINEAR / EXPONENTIAL / FIXED backoff; fixed add_step return value; added validate alias
  • Test files β€” replaced emoji with ASCII for Windows cp1252 compatibility; fixed assertion ordering and loop bugs across 4 test files

Test Results

Passing Skipped (external services) Failed
~840 36 0

Contributors

@KaifAhmad1 Β· @ZohaibHassan16 Β· @tibisabau

v0.3.0-alpha

19 Feb 18:46

Choose a tag to compare

v0.3.0-alpha Pre-release
Pre-release

πŸŽ‰ Semantica v0.3.0-alpha Release

This alpha release introduces comprehensive decision tracking capabilities, advanced knowledge graph algorithms, and production-ready architecture for testing.

πŸš€ Major Features

Decision Tracking System

  • Complete decision lifecycle management with audit trails
  • Provenance tracking and lineage management
  • Policy compliance and exception handling
  • Decision influence analysis and impact scoring

Advanced Knowledge Graph Algorithms

  • Node2Vec embeddings for semantic similarity
  • Centrality analysis (degree, betweenness, closeness, eigenvector)
  • Community detection and graph analytics
  • Path finding and link prediction

Enhanced Context Module

  • Unified AgentContext with granular feature flags
  • Decision tracking integration
  • Production-ready architecture with validation
  • GraphStore capability validation

Vector Store Features

  • Hybrid search combining semantic, structural, and category similarity
  • Advanced retrieval with configurable weights
  • FastEmbed integration for efficient operations

πŸ§ͺ Testing & Quality

  • 113+ tests passing across context and core modules
  • Comprehensive decision tracking test coverage
  • Enhanced error handling and edge case testing
  • Fixed all critical test failures for release readiness

πŸ“¦ Installation

pip install semantica==0.3.0a0

Semantica 0.2.7

09 Feb 07:26

Choose a tag to compare

Overview

Release 0.2.7 adds Snowflake integration, Apache Arrow export, and benchmark suite.

πŸš€ New Features

Snowflake Connector for Data Ingestion

PR #276 by @Sameer6305

Native Snowflake connector with multi-authentication support (password, OAuth, key-pair, SSO). Includes table/query ingestion, schema introspection, and SQL injection prevention.

Tests: 24/24 passing
Dependency: db-snowflake optional

Apache Arrow Export Support

PR #273 by @Sameer6305

High-performance columnar export with explicit schemas, compression, and Pandas/DuckDB compatibility.

Tests: 20/20 passing
Dependency: db-arrow optional

Comprehensive Benchmark Suite

PR #289 by @ZohaibHassan16, @KaifAhmad1

137+ benchmarks across all modules with regression detection and CI/CD integration.

Features: Statistical analysis, environment-agnostic design, CLI tool

πŸ“Š Quality Assurance

  • Total Tests: 44/44 passing
  • Breaking Changes: None
  • Backward Compatible: Yes

πŸ›  Installation

pip install semantica==0.2.7
pip install semantica[db-snowflake,db-arrow]==0.2.7

πŸ™ Contributors

πŸ”— Links

πŸ“ˆ Performance

  • Text Processing: >10,000 ops/sec
  • Arrow Export: 10x faster
  • Benchmark Coverage: 137+ tests

Thanks to all contributors for making this release possible!

Semantica v0.2.6

03 Feb 05:10

Choose a tag to compare

Semantica v0.2.6

Release Date: February 3, 2026

We're excited to announce Semantica v0.2.6, featuring major enhancements in provenance tracking, change management, and several important bug fixes!


πŸŽ‰ Highlights

Major Features

  • W3C PROV-O Compliant Provenance Tracking - Enterprise-grade lineage tracking across all 17 modules
  • Enhanced Change Management - Version control for knowledge graphs and ontologies
  • CSV Ingestion Improvements - Auto-detection and robust error handling
  • Comprehensive Test Coverage - 80-86% coverage for ingestion modules

Bug Fixes

  • Temperature compatibility for LLM providers
  • JenaStore empty graph initialization

✨ New Features & Enhancements

W3C PROV-O Compliant Provenance Tracking

PRs: #254, #246 | Contributor: @KaifAhmad1

A comprehensive provenance tracking system with W3C PROV-O compliance across all 17 Semantica modules.

Core Module:

  • ProvenanceManager for centralized tracking
  • W3C PROV-O schemas (Activity, Entity, Agent)
  • Storage backends: InMemory and SQLite
  • SHA-256 integrity verification

Module Integrations:

  • Semantic Extract, LLMs (Groq, OpenAI, HuggingFace, LiteLLM)
  • Pipeline, Context, Ingest, Embeddings
  • Graph/Vector/Triplet stores
  • Reasoning, Conflicts, Deduplication
  • Export, Parse, Normalize, Ontology, Visualization

Features:

  • Complete lineage tracking: Document β†’ Chunk β†’ Entity β†’ Relationship β†’ Graph
  • LLM tracking: tokens, costs, latency
  • Source tracking and bridge axioms for domain transformations

Compliance:

  • W3C PROV-O, FDA 21 CFR Part 11, SOX, HIPAA, TNFD

Testing:

  • 237 tests covering core functionality, all 17 module integrations, edge cases, backward compatibility

Design:

  • Opt-in with provenance=False by default
  • Zero breaking changes
  • No new dependencies

Enhanced Change Management Module

PRs: #248, #243 | Contributor: @KaifAhmad1

Enterprise-grade version control for knowledge graphs and ontologies with persistent storage and audit trails.

Core Classes:

  • TemporalVersionManager - Knowledge graph versioning
  • OntologyVersionManager - Ontology versioning
  • ChangeLogEntry - Change metadata tracking

Storage:

  • SQLite (persistent) and in-memory backends
  • Thread-safe operations

Features:

  • SHA-256 checksums for integrity
  • Detailed entity/relationship diffs
  • Structural ontology comparison
  • Email validation

Compliance:

  • HIPAA, SOX, FDA 21 CFR Part 11
  • Immutable audit trails

Testing:

  • 104 tests (100% pass)
  • Unit, integration, compliance, performance, edge cases

Performance:

  • 17.6ms for 10k entities
  • 510+ ops/sec concurrent
  • Handles 5k+ entity graphs

Migration:

  • Backward compatible
  • Simplified class names
  • Zero external dependencies

CSV Ingestion Enhancements

PR: #244 | Contributor: @saloni0318

Robust CSV parsing with auto-detection and error handling.

Features:

  • Auto-detect CSV encoding using chardet
  • Auto-detect delimiter using csv.Sniffer
  • Tolerant decoding and malformed-row handling (on_bad_lines='warn')
  • Optional chunked reading for large files
  • Metadata tracks detected values

Testing:

  • Expanded unit tests covering:
    • Multiple delimiters
    • Quoted/multiline fields
    • Header overrides
    • Chunked reading
    • NaN preservation

Comprehensive Test Coverage

TextNormalizer Tests

PR: #242 | Contributor: @ZohaibHassan16

Added focused test coverage for TextNormalizer behavior across various inputs.

Integration Test Improvements

PR: #241 | Contributor: @KaifAhmad1

  • Introduced integration test marker
  • Reduced noisy warnings in ingest tests

Ingest Unit Tests

PRs: #239, #232 | Contributor: @Mohammed2372

Comprehensive unit tests for ingestion modules (file, web, and feed ingestors).

Coverage:

  • File scanning: local/cloud (S3/GCS/Azure)
  • Web ingestion: URL/sitemap/robots.txt
  • RSS/Atom feed parsing

Testing:

  • 998 lines of test code
  • Mocked external dependencies for fast, isolated execution

Results:

  • file_ingestor: 86% coverage
  • web_ingestor: 86% coverage
  • feed_ingestor: 80% coverage

Covers happy paths, edge cases, and error handling.


πŸ› Bug Fixes

Temperature Compatibility Fix

PRs: #256, #252 | Contributors: @F0rt1s, @IGES-Institut

Fixed hardcoded temperature=0.3 that broke compatibility with models requiring specific temperature values (e.g., gpt-5-mini).

Changes:

  • Added _add_if_set helper method to BaseProvider
  • Only passes parameters when explicitly set
  • When temperature=None, parameter is omitted allowing APIs to use model defaults
  • Updated all 5 providers: OpenAI, Groq, Gemini, Ollama, DeepSeek

Impact:

  • Reduced code by ~85 lines with cleaner parameter handling
  • Comprehensive test coverage added (10 temperature tests, all passing)
  • Backward compatible - no breaking changes

JenaStore Empty Graph Bug

PRs: #257, #258 | Contributor: @ZohaibHassan16

Fixed ProcessingError: Graph not initialized when operating on empty (but initialized) graphs.

Changes:

  • Replaced implicit if not self.graph: checks with explicit if self.graph is None: validation
  • Updated 5 methods: add_triplets, get_triplets, delete_triplet, execute_sparql, serialize
  • Properly distinguishes None (uninitialized) from empty graphs (initialized with 0 triplets)

Impact:

  • Unblocks benchmarking suite
  • Enables fresh deployments
  • Improves testing workflows

πŸ“¦ Installation

pip install semantica==0.2.6

Or upgrade from a previous version:

pip install --upgrade semantica

πŸ™ Contributors

Special thanks to all contributors who made this release possible:


πŸ“š Documentation


πŸ”— Links


πŸš€ What's Next?

Stay tuned for upcoming features in future releases. Check our GitHub Issues to see what we're working on!


Full Changelog: v0.2.5...v0.2.6

Deep Extraction, BYOM & Pinecone Support (v0.2.5)

27 Jan 16:26

Choose a tag to compare

Semantica v0.2.5

πŸš€ Release Highlights

This release brings native Pinecone Vector Store support, configurable LLM retry logic, and major enhancements to the Semantic Extraction module, including robust support for custom Hugging Face models (BYOM), improved NER/Relation extraction, and completed Triplet extraction logic.

🌟 New Features

Pinecone Vector Store Support

  • Implemented native PineconeStore with full CRUD capabilities.
  • Support for serverless and pod-based indexes, namespaces, and metadata filtering.
  • Fully integrated with the unified VectorStore interface and registry.
  • (Closes #219, Resolves #220)

Configurable LLM Retry Logic

  • Exposed max_retries parameter in NERExtractor, RelationExtractor, and TripletExtractor.
  • Defaults to 3 retries to handle JSON validation failures or API timeouts gracefully.
  • Propagated retry configuration through chunked processing helpers for consistent long-document handling.

Bring Your Own Model (BYOM) Support

  • Custom Hugging Face Models: Enabled full support for custom models in NERExtractor, RelationExtractor, and TripletExtractor.
  • Custom Tokenizers: Added support for models with non-standard tokenization requirements.
  • Runtime Overrides: extract(model=...) now correctly overrides configuration defaults.

Enhanced Extraction Capabilities

  • NER: Added configurable aggregation strategies (simple, first, average, max) and robust IOB/BILOU parsing.
  • Relation Extraction: Implemented standard entity marker techniques (<subj>, <obj>) and structured output parsing.
  • Triplet Extraction: Added specialized parsing for Seq2Seq models (e.g., REBEL) to generate structured triplets directly from text.

πŸ› Bug Fixes

  • LLM Extraction Stability: Fixed infinite retry loops by strictly enforcing max_retries limits.
  • Model Parameter Precedence: Resolved issues where config defaults overrode runtime arguments.
  • Import Handling: Fixed circular import issues in test suites via improved mocking strategies.

πŸ“¦ Installation

pip install semantica==0.2.5

Semantica v0.2.4

22 Jan 07:20

Choose a tag to compare

Added

  • Ontology Ingestion Module:
    • Implemented OntologyIngestor for parsing RDF/OWL files (Turtle, RDF/XML, JSON-LD, N3).
    • Added ingest_ontology and unified ingest(source_type="ontology") interface.
    • Added recursive directory scanning for batch ontology ingestion.
    • Added OntologyData dataclass for consistent metadata.
  • Documentation:
    • Updated ontology_usage.md and ontology.md with usage examples and API details.
  • Tests:
    • Added comprehensive test suite tests/ingest/test_ontology_ingestor.py.
    • Added examples/demo_ontology_ingest.py for end-to-end demonstration.

Semantica v0.2.3

20 Jan 06:39

Choose a tag to compare

We are excited to announce Semantica v0.2.3! This release focuses on stability, performance, and developer experience improvements, including critical fixes for LLM relation extraction, high-performance vector store ingestion, and resolved circular dependencies.

πŸš€ Added

Vector Store High-Performance Ingestion

  • New add_documents API: Added high-throughput ingestion with automatic embedding generation, batching, and parallel processing.
  • embed_batch Helper: Efficiently generate embeddings for lists of texts without immediate storage.
  • Parallel Defaults: Enabled default parallel ingestion in VectorStore (default: max_workers=6) for faster processing.
  • Documentation: Added dedicated guide docs/vector_store_usage.md for high-performance configuration.
  • Tests: Added tests/vector_store/test_vector_store_parallel.py covering parallel vs. sequential performance and edge cases.

Amazon Neptune Dev Environment

  • CloudFormation Template: Added cookbook/introduction/neptune-setup.yaml to provision a development Neptune cluster with public endpoints and IAM auth.
  • Documentation: Updated cookbook/introduction/21_Amazon_Neptune_Store.ipynb with deployment guides, cost estimates, and IAM best practices.
  • Linting: Added cfn-lint to pre-commit hooks for CloudFormation validation.

Comprehensive Test Suite

  • Unit Tests: Added tests/test_relations_llm.py covering typed and structured response paths for relation extraction.
  • Integration Tests: Added tests/integration/test_relations_groq.py for real Groq API validation.

πŸ› Fixed

LLM Relation Extraction Parsing

  • Zero Relations Fix: Resolved issue where relation extraction returned zero results despite successful API calls.
  • Response Normalization: Normalized typed responses from Instructor/OpenAI/Groq to a consistent dictionary format.
  • JSON Fallback: Added structured JSON fallback when typed generation yields empty results.
  • Parameter Cleanup: Removed unsupported kwargs (max_tokens, max_entities_prompt) from internal calls to prevent API errors.

Pipeline Circular Import

  • Resolved Import Cycles: Fixed circular dependency between pipeline_builder and pipeline_validator (Issues #192, #193).
  • Lazy Loading: Implemented lazy loading for PipelineValidator to ensure stable imports.

JupyterLab Stability

  • Progress Output Control: Added SEMANTICA_DISABLE_JUPYTER_PROGRESS environment variable.
  • Memory Fix: Fallback to console-style output when enabled to prevent JupyterLab out-of-memory errors from infinite scrolling tables (Issue #181).

⚑ Changed

Relation Extraction API

  • Simplified Interface: Removed unused kwargs to prevent parameter leakage.
  • Better Debugging: Improved error handling and verbose logging for extraction workflows.
  • Robust Parsing: Enhanced post-response parsing stability across different LLM providers.

Vector Store Defaults

  • Standardized Concurrency: Set default max_workers=6 for VectorStore parallel ingestion.
  • Simplified Usage: Updated documentation to rely on smart defaults rather than manual configuration.