Semantica v0.2.3

Released by github-actions on 20 Jan 06:39

We are excited to announce Semantica v0.2.3! This release focuses on stability, performance, and developer experience. It includes critical fixes for LLM relation extraction, high-performance vector store ingestion, and a fix for circular imports in the pipeline modules.

🚀 Added

Vector Store High-Performance Ingestion

  • New add_documents API: Added high-throughput ingestion with automatic embedding generation, batching, and parallel processing.
  • embed_batch Helper: Efficiently generate embeddings for lists of texts without immediate storage.
  • Parallel Defaults: Enabled default parallel ingestion in VectorStore (default: max_workers=6) for faster processing.
  • Documentation: Added dedicated guide docs/vector_store_usage.md for high-performance configuration.
  • Tests: Added tests/vector_store/test_vector_store_parallel.py covering parallel vs. sequential performance and edge cases.
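The batching and parallel processing described above can be pictured with a small, library-independent sketch. It does not use Semantica's actual add_documents or embed_batch signatures, which these notes do not spell out. The names chunked, embed_in_parallel, and toy_embed are hypothetical, and only the max_workers=6 default comes from the release notes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]


def chunked(items: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def embed_in_parallel(
    texts: Sequence[str],
    embed_fn: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 32,   # hypothetical default, not taken from Semantica
    max_workers: int = 6,   # mirrors the default named in the release notes
) -> List[Vector]:
    """Embed texts batch by batch on a thread pool, preserving input order."""
    batches = chunked(texts, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the batches were submitted.
        per_batch = list(pool.map(embed_fn, batches))
    # Flatten the per-batch results back into one list of vectors.
    return [vector for batch in per_batch for vector in batch]


def toy_embed(batch: Sequence[str]) -> List[Vector]:
    """Stand-in for a real embedding model: one dimension, the text length."""
    return [[float(len(text))] for text in batch]


vectors = embed_in_parallel(["a", "bb", "ccc"], toy_embed, batch_size=2)
```

Threads suit this workload when the embedding call is I/O-bound (for example, a remote API). A CPU-bound local model may need a different strategy.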

Amazon Neptune Dev Environment

  • CloudFormation Template: Added cookbook/introduction/neptune-setup.yaml to provision a development Neptune cluster with public endpoints and IAM auth.
  • Documentation: Updated cookbook/introduction/21_Amazon_Neptune_Store.ipynb with deployment guides, cost estimates, and IAM best practices.
  • Linting: Added cfn-lint to pre-commit hooks for CloudFormation validation.

Comprehensive Test Suite

  • Unit Tests: Added tests/test_relations_llm.py covering typed and structured response paths for relation extraction.
  • Integration Tests: Added tests/integration/test_relations_groq.py for real Groq API validation.

🐛 Fixed

LLM Relation Extraction Parsing

  • Zero Relations Fix: Resolved issue where relation extraction returned zero results despite successful API calls.
  • Response Normalization: Normalized typed responses from Instructor/OpenAI/Groq to a consistent dictionary format.
  • JSON Fallback: Added structured JSON fallback when typed generation yields empty results.
  • Parameter Cleanup: Removed unsupported kwargs (max_tokens, max_entities_prompt) from internal calls to prevent API errors.
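As a rough illustration of the normalization and fallback steps above, the sketch below converts typed objects to plain dictionaries and parses the raw response as JSON only when the typed path returns nothing. It is not Semantica's implementation. The Relation dataclass, the to_dict and parse_relations helpers, and the "relations" key in the JSON payload are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Relation:
    """Hypothetical typed relation, standing in for a provider's typed output."""
    subject: str
    predicate: str
    object: str


def to_dict(item: Any) -> Dict[str, Any]:
    """Normalize one typed item to a plain dictionary."""
    if isinstance(item, dict):
        return dict(item)
    if hasattr(item, "model_dump"):  # Pydantic v2 style models
        return item.model_dump()
    if is_dataclass(item) and not isinstance(item, type):
        return asdict(item)
    raise TypeError(f"Cannot normalize {type(item).__name__}")


def parse_relations(typed_result: Optional[List[Any]], raw_text: Optional[str]) -> List[Dict[str, Any]]:
    """Prefer the typed result; fall back to JSON in raw_text if it is empty."""
    relations = [to_dict(r) for r in (typed_result or [])]
    if relations:
        return relations
    try:
        payload = json.loads(raw_text)
    except (TypeError, json.JSONDecodeError):
        return []  # neither path produced usable output
    if isinstance(payload, dict):
        payload = payload.get("relations", [])  # assumed wrapper key
    if not isinstance(payload, list):
        return []
    return [r for r in payload if isinstance(r, dict)]
```

Returning the same dictionary shape from both paths means downstream code does not need to know which provider or generation mode produced the relations.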

Pipeline Circular Import

  • Resolved Import Cycles: Fixed circular dependency between pipeline_builder and pipeline_validator (Issues #192, #193).
  • Lazy Loading: Implemented lazy loading for PipelineValidator to ensure stable imports.

JupyterLab Stability

  • Progress Output Control: Added SEMANTICA_DISABLE_JUPYTER_PROGRESS environment variable.
  • Memory Fix: Falls back to console-style output when the variable is set, preventing JupyterLab out-of-memory errors caused by infinitely scrolling tables (Issue #181).
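An environment-variable switch of this kind might be checked as sketched below. Only the variable name comes from these notes. The accepted values ("1", "true", "yes", "on"), the helper names, and the renderer labels are assumptions, so check the project documentation for the values Semantica actually honors.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "SEMANTICA_DISABLE_JUPYTER_PROGRESS"
_TRUTHY = {"1", "true", "yes", "on"}  # assumed set of enabling values


def jupyter_progress_disabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the variable is set to an (assumed) truthy value."""
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "").strip().lower() in _TRUTHY


def choose_renderer(in_notebook: bool, environ: Optional[Mapping[str, str]] = None) -> str:
    """Pick a hypothetical renderer label: rich tables only in notebooks
    where the switch is off, plain console output everywhere else."""
    if in_notebook and not jupyter_progress_disabled(environ):
        return "notebook-table"
    return "console"
```

Setting the variable in the shell or at the top of a notebook, before the library is imported, is the safest way to ensure it is seen.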

⚡ Changed

Relation Extraction API

  • Simplified Interface: Removed unused kwargs to prevent parameter leakage.
  • Better Debugging: Improved error handling and verbose logging for extraction workflows.
  • Robust Parsing: Enhanced post-response parsing stability across different LLM providers.

Vector Store Defaults

  • Standardized Concurrency: Set default max_workers=6 for VectorStore parallel ingestion.
  • Simplified Usage: Updated documentation to rely on smart defaults rather than manual configuration.