Semantica v0.2.3
We are excited to announce Semantica v0.2.3! This release focuses on stability, performance, and developer experience. Highlights include critical fixes for LLM relation extraction, new high-performance vector store ingestion, and the resolution of a circular import in the pipeline module.
🚀 Added
Vector Store High-Performance Ingestion
- New `add_documents` API: Added high-throughput ingestion with automatic embedding generation, batching, and parallel processing.
- `embed_batch` Helper: Efficiently generate embeddings for lists of texts without immediately storing them.
- Parallel Defaults: Enabled parallel ingestion by default in `VectorStore` (default: `max_workers=6`) for faster processing.
- Documentation: Added a dedicated guide, `docs/vector_store_usage.md`, for high-performance configuration.
- Tests: Added `tests/vector_store/test_vector_store_parallel.py`, covering parallel vs. sequential performance and edge cases.
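The notes don't show the signatures of `add_documents` or `embed_batch`, so the following is only a minimal sketch of the general technique they describe: split texts into batches and embed the batches concurrently on a thread pool whose default size mirrors `max_workers=6`. The names `chunk`, `embed_batch_parallel`, and `toy_embed` are hypothetical and are not Semantica's API. `toy_embed` stands in for a real embedding model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]


def chunk(items: Sequence[str], batch_size: int) -> List[Sequence[str]]:
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def embed_batch_parallel(
    texts: Sequence[str],
    embed_fn: Callable[[Sequence[str]], List[Vector]],
    batch_size: int = 32,
    max_workers: int = 6,  # mirrors the default described in the release notes
) -> List[Vector]:
    """Hypothetical helper: embed texts batch-by-batch on a thread pool, preserving input order."""
    batches = chunk(texts, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in submission order, so vectors line up with texts.
        results = pool.map(embed_fn, batches)
        return [vec for batch_vecs in results for vec in batch_vecs]


def toy_embed(batch: Sequence[str]) -> List[Vector]:
    """Hypothetical stand-in for an embedding model: [length, vowel count] per text."""
    return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in batch]


if __name__ == "__main__":
    docs = [f"document {i}" for i in range(100)]
    vectors = embed_batch_parallel(docs, toy_embed, batch_size=16)
    print(len(vectors), vectors[0])
```

Threads pay off mainly when each batch call waits on I/O, such as a remote embedding API. For CPU-bound local models running pure Python, the GIL limits the speedup.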
Amazon Neptune Dev Environment
- CloudFormation Template: Added `cookbook/introduction/neptune-setup.yaml` to provision a development Neptune cluster with public endpoints and IAM auth.
- Documentation: Updated `cookbook/introduction/21_Amazon_Neptune_Store.ipynb` with deployment guides, cost estimates, and IAM best practices.
- Linting: Added `cfn-lint` to the pre-commit hooks for CloudFormation validation.
Comprehensive Test Suite
- Unit Tests: Added `tests/test_relations_llm.py`, covering the typed and structured response paths for relation extraction.
- Integration Tests: Added `tests/integration/test_relations_groq.py` for validation against the real Groq API.
🐛 Fixed
LLM Relation Extraction Parsing
- Zero Relations Fix: Resolved issue where relation extraction returned zero results despite successful API calls.
- Response Normalization: Normalized typed responses from Instructor/OpenAI/Groq to a consistent dictionary format.
- JSON Fallback: Added structured JSON fallback when typed generation yields empty results.
- Parameter Cleanup: Removed unsupported kwargs (`max_tokens`, `max_entities_prompt`) from internal calls to prevent API errors.
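Semantica's normalization code isn't shown in these notes. The following minimal sketch illustrates the pattern they describe: typed responses are converted to plain dicts, and the code falls back to parsing JSON text when the typed path yields nothing. It assumes a response shape of a `relations` list whose items carry `subject`/`predicate`/`object` fields. `Relation`, `to_dict`, and `normalize_relations` are hypothetical, and the dataclass stands in for an Instructor/Pydantic model.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any, Dict, List


@dataclass
class Relation:
    # Hypothetical typed relation, standing in for an Instructor/Pydantic model.
    subject: str
    predicate: str
    object: str


def to_dict(item: Any) -> Dict[str, Any]:
    """Normalize one relation-like value to a plain dict."""
    if isinstance(item, dict):
        return dict(item)
    if hasattr(item, "model_dump"):  # Pydantic v2 models
        return item.model_dump()
    if is_dataclass(item):
        return asdict(item)
    return dict(vars(item))  # plain objects with attributes


def _items(result: Any) -> Any:
    """Pull the relation list out of a list, a dict, or an object with .relations."""
    if result is None:
        return []
    if isinstance(result, dict):
        return result.get("relations", [])
    return getattr(result, "relations", result)


def normalize_relations(typed_result: Any, raw_text: str = "") -> List[Dict[str, Any]]:
    """Return relations as dicts; if the typed result is empty, try JSON in raw_text."""
    relations = [to_dict(r) for r in (_items(typed_result) or [])]
    if relations or not raw_text:
        return relations
    # Fallback: accept {"relations": [...]} or a bare JSON list of relation objects.
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError:
        return []
    return [to_dict(r) for r in (_items(payload) or []) if isinstance(r, dict)]


if __name__ == "__main__":
    print(normalize_relations([Relation("Alice", "works_at", "Acme")]))
    print(normalize_relations([], '{"relations": [{"subject": "Bob", "predicate": "knows", "object": "Alice"}]}'))
```

Callers downstream then deal with one shape (a list of dicts) regardless of which provider or response path produced the relations.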
Pipeline Circular Import
- Resolved Import Cycles: Fixed a circular dependency between `pipeline_builder` and `pipeline_validator` (Issues #192, #193).
- Lazy Loading: Implemented lazy loading for `PipelineValidator` to ensure stable imports.
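One common way to lazy-load a class and break an import cycle is to move the import from module top level into the function that needs it. The following self-contained sketch writes two small modules to a temporary directory to demonstrate this. `demo_builder` and `demo_validator` are hypothetical stand-ins for `pipeline_builder` and `pipeline_validator`, and Semantica's actual fix may differ, for example by using a module-level `__getattr__`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for pipeline_builder.
BUILDER_SRC = '''
class PipelineBuilder:
    def __init__(self):
        self.steps = []

    def add_step(self, name):
        self.steps.append(name)
        return self

    def validate(self):
        # Lazy import: resolved on first call, after both modules have finished
        # loading, so the builder <-> validator cycle never triggers at import time.
        from demo_validator import PipelineValidator
        return PipelineValidator().validate(self)
'''

# Hypothetical stand-in for pipeline_validator; it imports the builder eagerly.
VALIDATOR_SRC = '''
from demo_builder import PipelineBuilder

class PipelineValidator:
    def validate(self, pipeline):
        return isinstance(pipeline, PipelineBuilder) and bool(pipeline.steps)
'''


def load_demo() -> bool:
    """Write both modules to a temp dir, import the builder, and run a validation."""
    tmp = Path(tempfile.mkdtemp())
    (tmp / "demo_builder.py").write_text(BUILDER_SRC)
    (tmp / "demo_validator.py").write_text(VALIDATOR_SRC)
    sys.path.insert(0, str(tmp))
    importlib.invalidate_caches()  # make sure the new files are visible to the finders
    builder_mod = importlib.import_module("demo_builder")
    return builder_mod.PipelineBuilder().add_step("extract").validate()


if __name__ == "__main__":
    print(load_demo())
```

If `demo_builder` instead imported `PipelineValidator` at the top of the file, importing it would raise an `ImportError` about a partially initialized module. This happens because `demo_validator` would try to import `PipelineBuilder` before the builder module had finished executing.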
JupyterLab Stability
- Progress Output Control: Added the `SEMANTICA_DISABLE_JUPYTER_PROGRESS` environment variable.
- Memory Fix: When this variable is enabled, progress reporting falls back to console-style output, preventing JupyterLab out-of-memory errors caused by infinitely scrolling tables (Issue #181).
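The notes name the variable but not its accepted values. The following sketch sets it from the first cell of a notebook and includes a hypothetical check of the kind a library might use to pick console-style output. The assumption that `"1"` (or `true`/`yes`/`on`) counts as enabled is a guess, not documented Semantica behavior. `console_progress_enabled` and `render_progress` are illustrative names only.

```python
import os
from typing import Mapping

# Set early in the notebook, before running any Semantica workflow.
# Assumption: "1" is accepted as "enabled"; check the Semantica docs for exact values.
os.environ["SEMANTICA_DISABLE_JUPYTER_PROGRESS"] = "1"

_TRUTHY = {"1", "true", "yes", "on"}  # assumed set of "enabled" values


def console_progress_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical check: a truthy value selects console-style progress output."""
    return env.get("SEMANTICA_DISABLE_JUPYTER_PROGRESS", "").strip().lower() in _TRUTHY


def render_progress(done: int, total: int, env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical renderer: one short status line instead of an ever-growing table."""
    if console_progress_enabled(env):
        return f"{done}/{total} ({100 * done // total}%)"
    return "<rich notebook table>"  # placeholder for widget-based display


if __name__ == "__main__":
    print(render_progress(3, 4))
```

Because `os.environ` is read at call time, the variable must be set before the workflow starts reporting progress, not necessarily before import.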
⚡ Changed
Relation Extraction API
- Simplified Interface: Removed unused kwargs to prevent parameter leakage.
- Better Debugging: Improved error handling and verbose logging for extraction workflows.
- Robust Parsing: Enhanced post-response parsing stability across different LLM providers.
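One generic way to keep stray options such as `max_entities_prompt` from leaking into a provider call is to filter kwargs against the callee's signature and log what gets dropped. The following sketch uses `inspect.signature` for that. It illustrates the idea only and is not necessarily how Semantica removed the kwargs. `filter_kwargs` and `call_provider` are hypothetical.

```python
import inspect
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("relation_extraction_demo")


def filter_kwargs(fn: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only kwargs that fn can accept by keyword; log and drop the rest."""
    params = inspect.signature(fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)  # fn takes **kwargs, so nothing can be rejected up front
    names = {
        n for n, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    accepted = {k: v for k, v in kwargs.items() if k in names}
    dropped = sorted(set(kwargs) - set(accepted))
    if dropped:
        logger.debug("Dropping unsupported kwargs: %s", dropped)
    return accepted


def call_provider(text: str, temperature: float = 0.0) -> str:
    # Hypothetical provider call that would fail on unknown parameters.
    return f"extract({text!r}, temperature={temperature})"


if __name__ == "__main__":
    user_kwargs = {"temperature": 0.2, "max_entities_prompt": 50, "max_tokens": 512}
    safe = filter_kwargs(call_provider, user_kwargs)
    print(call_provider("Alice works at Acme", **safe))
```

Logging the dropped names at debug level keeps calls working while still making leaked parameters visible when verbose logging is on.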
Vector Store Defaults
- Standardized Concurrency: Set the default to `max_workers=6` for `VectorStore` parallel ingestion.
- Simplified Usage: Updated the documentation to rely on smart defaults rather than manual configuration.