Deep Extraction, BYOM & Pinecone Support (v0.2.5)
Semantica v0.2.5
Release Highlights
This release brings native Pinecone Vector Store support, configurable LLM retry logic, and major enhancements to the Semantic Extraction module, including robust support for custom Hugging Face models (BYOM), improved NER/Relation extraction, and completed Triplet extraction logic.
New Features
Pinecone Vector Store Support
- Implemented native `PineconeStore` with full CRUD capabilities.
- Support for serverless and pod-based indexes, namespaces, and metadata filtering.
- Fully integrated with the unified `VectorStore` interface and registry.
- (Closes #219, Resolves #220)
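To illustrate what "unified interface and registry" can mean in practice, here is a minimal, self-contained sketch. It does not use Semantica's or Pinecone's real APIs: `BaseStore`, `InMemoryStore`, `STORE_REGISTRY`, and `register_store` are hypothetical names invented for this example, and the in-memory backend only imitates CRUD, namespaces, and metadata filtering.

```python
import math
from abc import ABC, abstractmethod

# Hypothetical registry mapping backend names to store classes (not Semantica's).
STORE_REGISTRY = {}


def register_store(name):
    """Class decorator that records a store class under a backend name."""
    def decorator(cls):
        STORE_REGISTRY[name] = cls
        return cls
    return decorator


class BaseStore(ABC):
    """Hypothetical unified interface that every backend implements."""

    @abstractmethod
    def upsert(self, item_id, vector, metadata=None, namespace=""): ...

    @abstractmethod
    def fetch(self, item_id, namespace=""): ...

    @abstractmethod
    def delete(self, item_id, namespace=""): ...

    @abstractmethod
    def query(self, vector, top_k=3, namespace="", metadata_filter=None): ...


def _cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@register_store("memory")
class InMemoryStore(BaseStore):
    """Toy backend: namespace -> {id: (vector, metadata)}."""

    def __init__(self):
        self._data = {}

    def upsert(self, item_id, vector, metadata=None, namespace=""):
        self._data.setdefault(namespace, {})[item_id] = (list(vector), dict(metadata or {}))

    def fetch(self, item_id, namespace=""):
        return self._data.get(namespace, {}).get(item_id)

    def delete(self, item_id, namespace=""):
        return self._data.get(namespace, {}).pop(item_id, None) is not None

    def query(self, vector, top_k=3, namespace="", metadata_filter=None):
        wanted = metadata_filter or {}
        scored = []
        for item_id, (stored, meta) in self._data.get(namespace, {}).items():
            # Keep only items whose metadata matches every requested key/value.
            if all(meta.get(key) == value for key, value in wanted.items()):
                scored.append((_cosine(vector, stored), item_id))
        scored.sort(reverse=True)
        return [item_id for _, item_id in scored[:top_k]]


# Callers pick a backend by name and then use only the shared interface.
store = STORE_REGISTRY["memory"]()
store.upsert("a", [1.0, 0.0], {"lang": "en"}, namespace="docs")
store.upsert("b", [0.9, 0.1], {"lang": "de"}, namespace="docs")
store.upsert("c", [0.0, 1.0], {"lang": "en"}, namespace="docs")
hits = store.query([1.0, 0.0], top_k=2, namespace="docs", metadata_filter={"lang": "en"})
```

The point of the pattern is that code written against the shared interface does not change when the backend name in the registry lookup changes.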
Configurable LLM Retry Logic
- Exposed `max_retries` parameter in `NERExtractor`, `RelationExtractor`, and `TripletExtractor`.
- Defaults to 3 retries to handle JSON validation failures or API timeouts gracefully.
- Propagated retry configuration through chunked processing helpers for consistent long-document handling.
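The bounded-retry behavior described above can be sketched in a few lines. This is an illustration only, not Semantica's implementation: `call_with_retries` and `generate` are hypothetical names, and the sketch assumes `max_retries` counts retries after one initial attempt (the release notes do not say whether the first call is included).

```python
import json


def call_with_retries(generate, max_retries=3):
    """Call generate() until it returns valid JSON.

    Assumption: one initial attempt plus at most max_retries retries,
    so the loop can never run unbounded.
    """
    last_error = None
    for _attempt in range(max_retries + 1):
        try:
            return json.loads(generate())
        except (json.JSONDecodeError, TimeoutError) as exc:
            # Invalid JSON or a timeout: remember the error and try again.
            last_error = exc
    raise RuntimeError(f"gave up after {max_retries} retries") from last_error


# A fake LLM call that returns malformed JSON twice before succeeding.
responses = iter(["not json", '{"entities": [', '{"entities": ["Pinecone"]}'])
result = call_with_retries(lambda: next(responses), max_retries=3)
```

Raising after the final attempt, rather than looping, is what keeps a persistently malformed response from stalling a long-document run.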
Bring Your Own Model (BYOM) Support
- Custom Hugging Face Models: Enabled full support for custom models in `NERExtractor`, `RelationExtractor`, and `TripletExtractor`.
- Custom Tokenizers: Added support for models with non-standard tokenization requirements.
- Runtime Overrides: `extract(model=...)` now correctly overrides configuration defaults.
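The precedence rule for runtime overrides can be shown with a small stand-in class. `SketchExtractor`, `resolve_model`, and the model names below are hypothetical and exist only for this example; the real extractors may resolve the model differently.

```python
class SketchExtractor:
    """Hypothetical extractor showing 'runtime argument beats config default'."""

    def __init__(self, model="config-default-model"):
        # Value taken from configuration at construction time.
        self.model = model

    def resolve_model(self, model=None):
        # An explicit runtime argument wins; otherwise fall back to the config value.
        return model if model is not None else self.model

    def extract(self, text, model=None):
        chosen = self.resolve_model(model)
        # A real extractor would load `chosen` and run inference here.
        return {"model": chosen, "text": text}


extractor = SketchExtractor()
default_run = extractor.extract("Ada Lovelace worked in London.")
override_run = extractor.extract("Ada Lovelace worked in London.", model="my-org/custom-ner")
```

Checking `model is not None` rather than truthiness keeps the override explicit and leaves the configured default untouched for later calls.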
Enhanced Extraction Capabilities
- NER: Added configurable aggregation strategies (`simple`, `first`, `average`, `max`) and robust IOB/BILOU parsing.
- Relation Extraction: Implemented standard entity marker techniques (`<subj>`, `<obj>`) and structured output parsing.
- Triplet Extraction: Added specialized parsing for Seq2Seq models (e.g., REBEL) to generate structured triplets directly from text.
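For readers unfamiliar with IOB tagging, the sketch below shows one common way to turn token-level tags into entity spans. It is a generic illustration, not the library's parser: `iob_to_spans` is a hypothetical helper, it handles only `B-`, `I-`, and `O` tags (BILOU adds `L-` and `U-` prefixes), and it leniently treats a stray `I-` tag as the start of a new entity.

```python
def iob_to_spans(tokens, tags):
    """Group tokens into (text, label) spans from IOB tags such as B-PER / I-PER / O."""
    spans = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        # A B- tag, or an I- tag whose label differs from the open span, starts a new span.
        starts_new = prefix == "B" or (prefix == "I" and label != current_label)
        if prefix == "O" or starts_new:
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
        if prefix in ("B", "I"):
            current_tokens.append(token)
            current_label = label
    # Flush a span that runs to the end of the sequence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


tokens = ["Ada", "Lovelace", "worked", "in", "London"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
entities = iob_to_spans(tokens, tags)
# entities == [("Ada Lovelace", "PER"), ("London", "LOC")]
```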
Bug Fixes
- LLM Extraction Stability: Fixed infinite retry loops by strictly enforcing `max_retries` limits.
- Model Parameter Precedence: Resolved issues where config defaults overrode runtime arguments.
- Import Handling: Fixed circular import issues in test suites via improved mocking strategies.
Installation
pip install semantica==0.2.5