v0.3.0

@github-actions github-actions released this 10 Mar 22:07
· 700 commits to main since this release

🧠 Semantica v0.3.0: First Stable Release

Released: 2026-03-10 | PyPI: pip install semantica | Python: 3.8 – 3.12 | License: MIT

The first Production/Stable release of Semantica, an open-source framework for building context graphs and decision intelligence layers for AI agents. This release consolidates everything shipped across three stages: 0.3.0-alpha (2026-02-19), 0.3.0-beta (2026-03-07), and 0.3.0 stable (2026-03-10).

pip install --upgrade semantica

No breaking changes. All new parameters carry safe defaults. All new methods are purely additive.


🚦 Release Highlights

  • 🕐 Temporal Validity: valid_from/valid_until on nodes and edges; query what's active at any point in time
  • 🔗 Cross-Graph Navigation: link separate ContextGraph instances and navigate across them; links survive save/load
  • ⚖️ Weighted BFS Traversal: filter multi-hop queries by edge confidence with min_weight
  • 🧠 Decision Intelligence: full lifecycle (record → causal chain → impact analysis → precedent search → policy enforcement)
  • 🔄 Delta Processing: SPARQL-based incremental graph diffs; only changed data flows through the pipeline
  • 🗃️ Deduplication v2: 6.98x faster semantic dedup, 63.6% faster candidate generation
  • 📤 New Export Formats: ArangoDB AQL, Apache Parquet (Spark/BigQuery/Databricks ready)
  • 🗄️ Graph Backends: Apache AGE, PgVector, AWS Neptune, FalkorDB
  • ✅ 886+ tests passing, 0 failures

👥 Contributors

Contributor | Areas
@KaifAhmad1 | Lead maintainer: context graph, decision intelligence, KG algorithms, semantic extraction, pipeline, provenance, bug fixes, release management
@ZohaibHassan16 | Deduplication v2 suite, incremental/delta processing, benchmark suite
@Sameer6305 | Apache AGE backend, PgVector store, Snowflake connector, Apache Arrow export
@tibisabau | ArangoDB AQL export, Apache Parquet export
@d4ndr4d3 | ResourceScheduler deadlock fix

✨ v0.3.0 Stable: Context Graph Feature Completeness

Shipped 2026-03-10 · All changes by @KaifAhmad1

🕐 Temporal Validity Windows

Nodes and edges now carry first-class valid_from / valid_until ISO datetime fields, stored directly on the ContextNode and ContextEdge dataclasses rather than buried in metadata.

New API:

  • add_node(valid_from=..., valid_until=...) and add_edge(valid_from=..., valid_until=...): set the validity window at creation
  • node.is_active(at_time=None) and edge.is_active(at_time=None): return True if the item is live at the given time (defaults to now)
  • graph.find_active_nodes(node_type=None, at_time=None): filters the entire graph to active nodes only

Bug fixes:

  • is_active() crashed with TypeError on tz-aware datetime inputs; fixed by normalising to tz-naive UTC via the new _parse_iso_dt() helper
  • Validity fields were silently lost during serialisation; fixed across all four paths: add_nodes(), add_edges(), to_dict(), from_dict()
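
To illustrate the tz-aware fix described above, here is a minimal standalone sketch of the normalisation idea. It is not Semantica's code: parse_iso_dt and the free function is_active are hypothetical stand-ins for the private _parse_iso_dt() helper and the is_active() methods, and treating both window bounds as inclusive is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_iso_dt(value: str) -> datetime:
    """Parse an ISO 8601 string into a tz-naive datetime expressed in UTC."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is not None:
        # Convert to UTC, then drop tzinfo so naive and aware values never mix.
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt


def is_active(valid_from: Optional[str], valid_until: Optional[str],
              at_time: Optional[datetime] = None) -> bool:
    """Return True if at_time lies inside the validity window (bounds inclusive)."""
    if at_time is None:
        moment = datetime.now(timezone.utc).replace(tzinfo=None)
    elif at_time.tzinfo is not None:
        moment = at_time.astimezone(timezone.utc).replace(tzinfo=None)
    else:
        moment = at_time
    if valid_from is not None and moment < parse_iso_dt(valid_from):
        return False
    if valid_until is not None and moment > parse_iso_dt(valid_until):
        return False
    return True
```

Because every value is reduced to tz-naive UTC before comparison, mixing aware and naive inputs no longer raises TypeError in this sketch.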

🔗 Cross-Graph Navigation

Separate ContextGraph instances can now be linked and navigated between. Links are durable: they survive save_to_file() / load_from_file() and reconnect via a registry.

New API:

  • graph.graph_id: stable UUID assigned at init; persisted to JSON
  • link_graph(other_graph, source_node_id, target_node_id): creates a navigable bridge; returns link_id
  • navigate_to(link_id): returns (other_graph, target_node_id)
  • resolve_links({graph_id: instance}): reconnects links after load; returns the count resolved
  • save_to_file(): now writes a links section alongside nodes and edges
  • load_from_file(): restores graph_id and populates _unresolved_links

Bug fix: The previous implementation auto-created marker targets as phantom "entity" nodes. This is fixed by pre-creating a "cross_graph_link" typed ContextNode before inserting the marker edge.

14 new tests in tests/context/test_cross_graph_navigation.py covering link creation, phantom-node prevention, partial registry resolution, and full save/load round-trips.
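
The link, save, load, and resolve cycle can be pictured with a toy model. MiniGraph below is a hypothetical class written only for illustration; it borrows the method names from the notes but does not reproduce ContextGraph's storage format or internals.

```python
import uuid
from typing import Dict, Optional, Tuple


class MiniGraph:
    """Toy stand-in for a graph that keeps durable links to other graphs."""

    def __init__(self, graph_id: Optional[str] = None) -> None:
        self.graph_id = graph_id or str(uuid.uuid4())
        self.links: Dict[str, dict] = {}             # link_id -> serialisable record
        self._resolved: Dict[str, "MiniGraph"] = {}  # link_id -> live instance

    def link_graph(self, other: "MiniGraph", source_node_id: str,
                   target_node_id: str) -> str:
        link_id = str(uuid.uuid4())
        self.links[link_id] = {
            "source_node_id": source_node_id,
            "target_graph_id": other.graph_id,
            "target_node_id": target_node_id,
        }
        self._resolved[link_id] = other
        return link_id

    def navigate_to(self, link_id: str) -> Tuple["MiniGraph", str]:
        if link_id not in self._resolved:
            raise KeyError(f"link {link_id} is unresolved; call resolve_links first")
        return self._resolved[link_id], self.links[link_id]["target_node_id"]

    def to_dict(self) -> dict:
        # Only ids are serialised; live object references are not.
        return {"graph_id": self.graph_id, "links": dict(self.links)}

    @classmethod
    def from_dict(cls, data: dict) -> "MiniGraph":
        graph = cls(graph_id=data["graph_id"])
        graph.links = dict(data["links"])
        return graph

    def resolve_links(self, registry: Dict[str, "MiniGraph"]) -> int:
        """Reattach live instances by graph_id and return how many were resolved."""
        resolved = 0
        for link_id, record in self.links.items():
            target = registry.get(record["target_graph_id"])
            if target is not None and link_id not in self._resolved:
                self._resolved[link_id] = target
                resolved += 1
        return resolved


a, b = MiniGraph(), MiniGraph()
link_id = a.link_graph(b, "n1", "n2")
restored = MiniGraph.from_dict(a.to_dict())       # simulates a save/load round-trip
resolved_count = restored.resolve_links({b.graph_id: b})
```

In this sketch a link only stores ids, so a reloaded graph must be handed a registry of live instances before navigate_to() works again. Passing a partial registry resolves only the links whose targets are present.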


βš–οΈ Weighted Multi-Hop BFS Traversal

get_neighbors() now accepts a min_weight threshold to confine traversal to high-confidence causal links only. Default 0.0 passes all edges β€” fully backward-compatible.
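
As a rough picture of what a weight-filtered multi-hop traversal does, the sketch below runs a breadth-first search over a plain adjacency dict. The function name neighbors_within, the hops parameter, and the data layout are assumptions made for the example; only min_weight and its 0.0 default come from the notes.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Adjacency = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, edge weight)]


def neighbors_within(adjacency: Adjacency, start: str, hops: int,
                     min_weight: float = 0.0) -> Set[str]:
    """Collect nodes reachable within `hops` edges, skipping edges below min_weight."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # hop budget exhausted on this branch
        for neighbor, weight in adjacency.get(node, []):
            if weight < min_weight or neighbor in seen:
                continue
            seen.add(neighbor)
            frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


graph = {
    "a": [("b", 0.9), ("c", 0.2)],
    "b": [("d", 0.8)],
    "c": [("e", 0.95)],
}
everything = neighbors_within(graph, "a", hops=2)
confident = neighbors_within(graph, "a", hops=2, min_weight=0.5)
```

With the threshold at 0.5, the weak a-to-c edge is dropped, so "e" is never reached even though the c-to-e edge itself is strong.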


🔧 Additional Fixes in v0.3.0 Stable

  • PipelineBuilder.add_step() return type annotation corrected from "PipelineBuilder" to "PipelineStep"
  • test_hybrid_search_performance fixed to accumulate a true search_times list; threshold relaxed to < 5.0s for real sentence-transformers latency

🔧 v0.3.0-beta: Semantic Extraction, Deduplication v2, New Export Formats

Shipped 2026-03-07

🧩 Semantic Extraction Fixes · @KaifAhmad1 (PR #354, #355)

LLM Relation Extraction:

  • Unmatched subjects/objects now produce a synthetic UNKNOWN entity instead of silently dropping the relation
  • Orphaned legacy block in _parse_relation_result that appended every relation twice has been removed
  • extraction_method parameter added; typed extraction paths now record "llm_typed" instead of "llm"

Reasoner Pattern Matching:

  • _match_pattern in reasoner.py fully rewritten: it splits patterns on ?var placeholders, escapes only literal segments, uses backreferences for repeated variables, and uses non-greedy .+? to prevent over-consumption
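
The rewrite strategy described above can be sketched with the standard re module. This is an independent illustration of the technique, not the code in reasoner.py; compile_pattern and match_pattern are hypothetical names, and anchoring the regex at both ends is an assumption.

```python
import re
from typing import Dict, Optional

_VAR = re.compile(r"\?([A-Za-z_]\w*)")  # matches ?x, ?person, ...


def compile_pattern(pattern: str) -> "re.Pattern":
    """Turn a pattern with ?var placeholders into an anchored regular expression."""
    parts = []
    seen = set()
    pos = 0
    for match in _VAR.finditer(pattern):
        # Literal text between placeholders is escaped, so "(" or "." stay literal.
        parts.append(re.escape(pattern[pos:match.start()]))
        name = match.group(1)
        if name in seen:
            parts.append(f"(?P={name})")      # repeated variable: backreference
        else:
            seen.add(name)
            parts.append(f"(?P<{name}>.+?)")  # first use: non-greedy capture
        pos = match.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def match_pattern(pattern: str, fact: str) -> Optional[Dict[str, str]]:
    """Return variable bindings if the fact matches the pattern, else None."""
    match = compile_pattern(pattern).match(fact)
    return match.groupdict() if match else None


bindings = match_pattern("parent(?x, ?y)", "parent(alice, bob)")
```

The backreference is what makes a pattern such as same(?x, ?x) reject same(a, b): the second occurrence must repeat exactly what the first one captured.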

RDF Export Aliases:

  • RDFExporter now accepts "ttl", "nt", "xml", "rdf", and "json-ld" as format aliases, with no API changes

Tests added: tests/reasoning/test_reasoner.py (4 tests), tests/semantic_extract/test_relation_extractor.py (6 tests), tests/export/test_rdf_exporter.py (8 tests)


🔄 Incremental / Delta Processing · @ZohaibHassan16, @KaifAhmad1 (PR #349)

  • Native SPARQL-based diff between graph snapshots; only changed triples enter the pipeline
  • delta_mode flag in PipelineBuilder for near-real-time incremental workloads
  • Version snapshot management with graph URI tracking and per-snapshot metadata storage
  • prune_versions() for automatic retention cleanup of old snapshots

Bug fixes: corrected SPARQL variable order, fixed class references, resolved duplicate dictionary keys.
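
Conceptually, a snapshot diff reduces to two set differences over triples. The sketch below shows that idea in plain Python rather than SPARQL, so it is a deliberately swapped-in technique; Delta and diff_snapshots are hypothetical names and the example triples are invented.

```python
from typing import NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class Delta(NamedTuple):
    added: Set[Triple]
    removed: Set[Triple]


def diff_snapshots(old: Set[Triple], new: Set[Triple]) -> Delta:
    """Return the triples that appear only in `new` and only in `old`."""
    return Delta(added=new - old, removed=old - new)


v1 = {("alice", "works_for", "acme"), ("bob", "works_for", "acme")}
v2 = {("alice", "works_for", "acme"), ("bob", "works_for", "globex")}
delta = diff_snapshots(v1, v2)
```

Only the two triples in delta would need to flow through downstream steps; the unchanged triple about alice is skipped.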


πŸ—ƒοΈ Deduplication v2 Suite β€” @ZohaibHassan16, @KaifAhmad1 (PR #338, #339, #340, #344)

Three independently opt-in tiers β€” legacy mode remains the default, fully backward-compatible.

Candidate Generation v2 (PR #338):

  • New blocking_v2 and hybrid_v2 strategies replace O(N²) pair enumeration
  • Multi-key blocking with normalised token prefixes, type-aware keys, and optional phonetic (Soundex) matching
  • Deterministic max_candidates_per_entity budgeting with stable sorting
  • 63.6% faster in worst-case scenarios (0.259s → 0.094s for 100 entities)
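
To show why blocking avoids all-pairs comparison, here is a small sketch of multi-key blocking on type-aware token prefixes. It is an assumption-laden illustration, not the blocking_v2 implementation: the key format, the prefix length of 3, and the function names are invented, and the Soundex keys and per-entity candidate budget are left out.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Entity = Tuple[str, str, str]  # (entity_id, name, entity_type)


def blocking_keys(name: str, entity_type: str, prefix_len: int = 3) -> Set[str]:
    """One key per token: the entity type plus a lower-cased token prefix."""
    return {f"{entity_type}:{token[:prefix_len]}" for token in name.lower().split()}


def candidate_pairs(entities: List[Entity]) -> Set[Tuple[str, str]]:
    """Pair up only entities that share at least one blocking key."""
    blocks: Dict[str, List[str]] = defaultdict(list)
    for entity_id, name, entity_type in entities:
        for key in blocking_keys(name, entity_type):
            blocks[key].append(entity_id)
    pairs: Set[Tuple[str, str]] = set()
    for ids in blocks.values():
        # Sorting keeps each pair in a stable (smaller_id, larger_id) order.
        pairs.update(combinations(sorted(set(ids)), 2))
    return pairs


entities = [
    ("e1", "Apple Inc", "org"),
    ("e2", "Apple Incorporated", "org"),
    ("e3", "Banana Corp", "org"),
    ("e4", "Apple Pie", "food"),
]
pairs = candidate_pairs(entities)
```

Four entities would give six pairs under full enumeration; here only e1 and e2 share a block, so a single pair reaches the scoring stage.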

Two-Stage Scoring Prefilter (PR #339):

  • Fast gates for type mismatch, name-length ratio, and token overlap eliminate expensive semantic scoring for obvious non-matches
  • Configurable thresholds: min_length_ratio, min_token_overlap_ratio, required_shared_token
  • 18–25% faster batch processing when enabled (prefilter_enabled=False by default)
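
The cheap gates can be pictured as a boolean check that runs before any embedding-based scoring. In this sketch the parameter names min_length_ratio and min_token_overlap_ratio follow the notes, but the default values, the overlap formula (shared tokens divided by the smaller token set), and the function itself are assumptions.

```python
def passes_prefilter(name_a: str, type_a: str, name_b: str, type_b: str,
                     min_length_ratio: float = 0.5,
                     min_token_overlap_ratio: float = 0.3) -> bool:
    """Return False as soon as a cheap gate shows the pair cannot be a match."""
    # Gate 1: entities of different types are never merged.
    if type_a != type_b:
        return False
    # Gate 2: names whose lengths differ too much are skipped.
    shorter, longer = sorted((len(name_a), len(name_b)))
    if longer == 0 or shorter / longer < min_length_ratio:
        return False
    # Gate 3: require a minimum share of common tokens.
    tokens_a = set(name_a.lower().split())
    tokens_b = set(name_b.lower().split())
    if not tokens_a or not tokens_b:
        return False
    overlap = len(tokens_a & tokens_b) / min(len(tokens_a), len(tokens_b))
    return overlap >= min_token_overlap_ratio
```

Only pairs that return True would go on to the expensive semantic scorer, which is where the batch-time saving would come from.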

Semantic Relationship Deduplication v2 (PR #340):

  • Canonicalisation engine with predicate synonym mapping (e.g. works_for → employed_by)
  • O(1) hash matching for exact canonical signatures before any semantic scoring
  • Weighted scoring: 60% predicate + 40% object with explainable semantic_match_score in metadata
  • 6.98x faster than legacy mode (83ms vs 579ms)
  • dedup_triplets() infinite recursion bug fixed; promoted to first-class API in methods.py
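
A minimal sketch of the canonicalise-then-hash idea follows. The synonym table holds the works_for to employed_by mapping mentioned above plus one invented entry, and canonical_signature, dedup_exact, and weighted_score are hypothetical helpers, not the library's API.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# works_for -> employed_by comes from the notes; works_at is an invented extra entry.
PREDICATE_SYNONYMS: Dict[str, str] = {
    "works_for": "employed_by",
    "works_at": "employed_by",
}


def canonical_signature(subject: str, predicate: str, obj: str) -> Triple:
    """Normalise case and whitespace and map predicate synonyms to one form."""
    pred = predicate.strip().lower()
    pred = PREDICATE_SYNONYMS.get(pred, pred)
    return (subject.strip().lower(), pred, obj.strip().lower())


def dedup_exact(triples: List[Triple]) -> List[Triple]:
    """Keep the first triple for each canonical signature (set lookup per triple)."""
    seen: Set[Triple] = set()
    unique: List[Triple] = []
    for triple in triples:
        signature = canonical_signature(*triple)
        if signature not in seen:
            seen.add(signature)
            unique.append(triple)
    return unique


def weighted_score(predicate_similarity: float, object_similarity: float) -> float:
    """Combine similarities with the 60% predicate / 40% object weighting."""
    return 0.6 * predicate_similarity + 0.4 * object_similarity


triples = [
    ("Alice", "works_for", "Acme"),
    ("alice", "employed_by", "ACME"),
    ("Bob", "works_for", "Acme"),
]
unique = dedup_exact(triples)
```

Exact canonical duplicates are removed by the hash lookup alone, so semantic scoring would only be needed for the pairs that remain.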

Migration guide: MIGRATION_V2.md with complete examples for all v2 strategies (PR #344)


📤 New Export Formats · @tibisabau (PR #342, #343)

ArangoDB AQL Export (PR #342):

  • Full AQL INSERT statement generation for vertices and edges
  • Configurable collection names with validation and sanitisation; batch processing (default: 1000)
  • export_arango() convenience function; .aql auto-detection in the unified exporter
  • 17 tests, 100% pass rate
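
For readers unfamiliar with AQL, the sketch below shows one way batched INSERT statements could be generated. It does not reproduce export_arango() or its output format: sanitise_collection and batched_inserts are hypothetical helpers, the sanitisation rule is a conservative assumption, and the FOR ... INSERT form is just one valid way to write a bulk insert in AQL.

```python
import json
import re
from typing import Dict, List


def sanitise_collection(name: str) -> str:
    """Reduce a name to letters, digits, and underscores, starting with a letter."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "c_" + cleaned
    return cleaned


def batched_inserts(documents: List[Dict[str, object]], collection: str,
                    batch_size: int = 1000) -> List[str]:
    """Emit one AQL statement per batch of documents."""
    target = sanitise_collection(collection)
    statements = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        # JSON object syntax is accepted as an AQL object literal.
        statements.append(f"FOR doc IN {json.dumps(batch)} INSERT doc INTO {target}")
    return statements


vertices = [{"_key": "a"}, {"_key": "b"}, {"_key": "c"}]
statements = batched_inserts(vertices, "my nodes", batch_size=2)
```

Edges would follow the same shape, with _from and _to attributes that reference vertex ids.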

Apache Parquet Export (PR #343):

  • Columnar storage with configurable compression: snappy, gzip, brotli, zstd, lz4, none
  • Explicit Apache Arrow schemas with type safety and field normalisation
  • Analytics-ready: pandas, Spark, Snowflake, BigQuery, Databricks
  • export_parquet() convenience function; .parquet auto-detection
  • 25 tests, 100% pass rate

🐛 Beta Bug Fixes · @KaifAhmad1

Context module:

  • retrieve_decision_precedents: entity extraction is now correctly gated on use_hybrid_search=True
  • _extract_entities_from_query: switched to word[0].isupper() to capture PascalCase identifiers like CreditCard
  • Added missing expand_context() (BFS traversal) and _get_decision_query() methods
  • Fixed hybrid_retrieval, dynamic_context_traversal, multi_hop_context_assembly for correct single-pass BFS
  • Fixed _retrieve_from_vector fallback to prevent empty content and negative similarity scores

KG module:

  • calculate_pagerank: added alpha/max_iter aliases; return format structured as {"centrality": scores, "rankings": sorted_list}
  • community_detector._to_networkx: fixed silent edge loss when a NetworkX graph is passed directly
  • Added 9 domain-specific tracking methods to AlgorithmTrackerWithProvenance
  • Created provenance_tracker.py with ProvenanceTracker; correctly exported from semantica.kg

Pipeline module:

  • Retry loop fixed; it now correctly iterates to max_retries
  • Added RecoveryAction + handle_failure(error, policy, retry_count) with LINEAR, EXPONENTIAL, and FIXED backoff
  • add_step() fixed to return the created PipelineStep
  • validate added as public alias for validate_pipeline in PipelineValidator
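
The three backoff policies named above differ only in how the delay grows with the retry count. The sketch below is a generic illustration: BackoffPolicy, backoff_delay, and run_with_retries are hypothetical names, the delay formulas and the cap are assumptions, and max_retries is read as the number of retries allowed after the first attempt.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class BackoffPolicy(Enum):
    FIXED = "fixed"
    LINEAR = "linear"
    EXPONENTIAL = "exponential"


def backoff_delay(policy: BackoffPolicy, retry_count: int,
                  base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Delay in seconds before the next attempt; retry_count starts at 0."""
    if policy is BackoffPolicy.FIXED:
        delay = base_delay
    elif policy is BackoffPolicy.LINEAR:
        delay = base_delay * (retry_count + 1)
    else:
        delay = base_delay * (2 ** retry_count)
    return min(delay, max_delay)


def run_with_retries(func: Callable[[], T], max_retries: int = 3,
                     policy: BackoffPolicy = BackoffPolicy.EXPONENTIAL,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], object] = time.sleep) -> T:
    """Call func, retrying up to max_retries times after the first failure."""
    for retry_count in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if retry_count == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(backoff_delay(policy, retry_count, base_delay))
    raise AssertionError("unreachable")
```

Injecting the sleep function keeps the loop testable without real waiting.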

Other:

  • Fixed NameError β€” missing Type import in utils/helpers.py
  • Vector store performance threshold relaxed from < 100ms to < 500ms per decision
  • Windows cp1252 encoding fixed in test files

Beta result: ~840 tests passing, 36 skipped (external services), 0 failed


🚀 v0.3.0-alpha: Foundational Features

Shipped 2026-02-19

🧠 Decision Intelligence & Agent Context · @KaifAhmad1 (PR #307, #315)

The foundational 0.3.0 feature: a complete overhaul of semantica.context for production-grade decision intelligence.

Full decision lifecycle:

  • record_decision() → add_causal_relationship() → trace_decision_chain() → analyze_decision_impact() → analyze_decision_influence() → find_similar_decisions()

AgentContext (unified wrapper):

  • Feature flags: decision_tracking, kg_algorithms, graph_expansion
  • Methods: store(), retrieve(), get_conversation_history(), get_statistics(), capture_cross_system_inputs()

Supporting components:

  • AgentMemory: working, conversation, and long-term memory tiers
  • PolicyEngine: versioned policy nodes, check_decision_rules(), PolicyException model, graceful fallback without a graph store
  • Hybrid precedent search: vector + structural + category similarity with configurable weights

9 critical bug fixes in PR #315, including causal chain depth, None metadata, nonexistent node handling, find_precedents() direction, missing from_dict(), missing properties in to_dict(), and UUID generation. All 71 context tests pass after the fixes.


📊 KG Algorithms · @KaifAhmad1 (PR #292, #293)

30+ graph algorithms across 7 categories:

  • Node embeddings: Node2Vec, DeepWalk, Word2Vec via NodeEmbedder
  • Similarity: cosine, Euclidean, Manhattan, correlation via SimilarityCalculator
  • Path finding: Dijkstra, A*, BFS, K-shortest paths via PathFinder
  • Link prediction: preferential attachment, Jaccard, Adamic-Adar via LinkPredictor
  • Centrality: degree, betweenness, closeness, PageRank via CentralityAnalyzer
  • Community detection: Louvain, Leiden, label propagation via CommunityDetector
  • Connectivity: components, bridges, density via ConnectivityAnalyzer

Decision embedding pipeline (PR #293):

  • DecisionEmbeddingPipeline: semantic + structural embeddings
  • HybridSimilarityCalculator: configurable weights (semantic: 0.7, structural: 0.3)
  • Convenience API: quick_decision(), find_precedents(), explain(), similar_to(), batch_decisions(), filter_decisions()
  • Performance: 0.028s per decision, 0.031s search, ~0.8KB memory per decision
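
The 0.7 / 0.3 weighting can be read as a weighted sum of two similarity scores. The sketch below assumes cosine similarity for both components and uses hypothetical function names; the notes do not state how HybridSimilarityCalculator computes either part.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_similarity(semantic_a: Sequence[float], semantic_b: Sequence[float],
                      structural_a: Sequence[float], structural_b: Sequence[float],
                      semantic_weight: float = 0.7,
                      structural_weight: float = 0.3) -> float:
    """Weighted sum of semantic and structural cosine similarity."""
    return (semantic_weight * cosine(semantic_a, semantic_b)
            + structural_weight * cosine(structural_a, structural_b))


# Identical semantic vectors, orthogonal structural vectors.
score = hybrid_similarity([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
```

In this example the semantic part contributes its full weight and the structural part contributes nothing, so the score equals the semantic weight.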

πŸ—„οΈ Graph Database Backends β€” @Sameer6305, @KaifAhmad1

Apache AGE (PR #311):

  • AgeStore class with full GraphStore API compatibility (openCypher via SQL on PostgreSQL)
  • SQL injection vulnerabilities fixed with comprehensive input validation
  • psycopg2-binary dependency added; migration guide included

PgVector Store (PR #303):

  • Native PostgreSQL vector storage using the pgvector extension
  • Distance metrics: cosine, L2/Euclidean, inner product with automatic score normalisation
  • HNSW and IVFFlat indexing for approximate nearest-neighbour search
  • JSONB metadata with flexible filtering; connection pooling with psycopg3/psycopg2 fallback
  • SQL injection protection via psycopg_sql.SQL(); 36+ tests with Docker integration

βš™οΈ Infrastructure β€” @d4ndr4d3, @KaifAhmad1 (PR #299, #301)

ResourceScheduler Deadlock Fix:

  • Root cause: nested lock acquisition in allocate_resources() with threading.Lock() deadlocked under concurrent load
  • Fix: replaced with threading.RLock() to allow reentrant acquisition
  • Added ValidationError when no resources can be allocated; progress tracking moved outside lock scope
  • 6 regression tests for deadlock prevention
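
The deadlock pattern and its fix can be reproduced in a few lines. The Scheduler class below is a toy written for this note, not ResourceScheduler; it raises ValueError where the library is described as raising ValidationError.

```python
import threading


class Scheduler:
    """Toy scheduler whose public method calls a helper that takes the same lock."""

    def __init__(self, capacity: int) -> None:
        # RLock lets the thread that already holds the lock acquire it again.
        # With a plain threading.Lock, allocate() would block forever in _reserve().
        self._lock = threading.RLock()
        self._free = capacity

    @property
    def free(self) -> int:
        with self._lock:
            return self._free

    def _reserve(self, amount: int) -> None:
        with self._lock:  # nested acquisition by the same thread
            if amount > self._free:
                raise ValueError("no resources can be allocated")
            self._free -= amount

    def allocate(self, amount: int) -> int:
        with self._lock:
            self._reserve(amount)
            return self._free


scheduler = Scheduler(capacity=4)
workers = [threading.Thread(target=scheduler.allocate, args=(1,)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join(timeout=5)
```

A reentrant lock still serialises different threads; it only removes the self-deadlock that occurs when one thread re-enters a section it already owns.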

Security Configuration:

  • Dependabot configured for bi-weekly security updates with manual review
  • Automated security scans (Bandit, Safety, Semgrep) on schedule
  • Zero auto-merge policy for security-critical packages

📊 Test Coverage Summary

  • semantica.context: 335 tests
  • semantica.kg: ~430 tests
  • semantica.semantic_extract: 70 tests (9 skipped: external LLM APIs)
  • semantica.reasoning: 19 tests
  • semantica.pipeline, semantica.export, semantica.deduplication: all passing
  • Real-world E2E scenarios: 85 tests
  • Grand total: 886+ passing, 0 failures