Deep Extraction, BYOM & Pinecone Support (v0.2.5)

@github-actions released this 27 Jan 16:26

Semantica v0.2.5

🚀 Release Highlights

This release adds native Pinecone vector store support and configurable LLM retry logic. It also extends the Semantic Extraction module with support for custom Hugging Face models (BYOM), improved NER and relation extraction, and completed triplet extraction logic.

🌟 New Features

Pinecone Vector Store Support

  • Implemented native PineconeStore with full CRUD capabilities.
  • Support for serverless and pod-based indexes, namespaces, and metadata filtering.
  • Fully integrated with the unified VectorStore interface and registry.
  • (Closes #219, Resolves #220)
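
To make the namespace and metadata-filtering behaviour concrete, here is a minimal in-memory sketch of a store behind a unified interface. It is not PineconeStore and does not call Pinecone: the `VectorStore` and `InMemoryStore` classes, their method names, and the equality-only filter are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from math import sqrt


class VectorStore(ABC):
    """Hypothetical unified interface; not Semantica's actual base class."""

    @abstractmethod
    def upsert(self, id, vector, metadata=None, namespace=""): ...

    @abstractmethod
    def query(self, vector, top_k=3, namespace="", filter=None): ...

    @abstractmethod
    def delete(self, id, namespace=""): ...


class InMemoryStore(VectorStore):
    """Stand-in backend that keeps one dict of records per namespace."""

    def __init__(self):
        self._data = {}  # namespace -> {id: (vector, metadata)}

    def upsert(self, id, vector, metadata=None, namespace=""):
        # Create and update share one code path.
        records = self._data.setdefault(namespace, {})
        records[id] = (list(vector), dict(metadata or {}))

    def delete(self, id, namespace=""):
        self._data.get(namespace, {}).pop(id, None)

    def query(self, vector, top_k=3, namespace="", filter=None):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        hits = [
            (id_, cosine(vector, vec))
            for id_, (vec, meta) in self._data.get(namespace, {}).items()
            # Equality-only filter: every requested key must match exactly.
            if all(meta.get(k) == v for k, v in (filter or {}).items())
        ]
        return sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]


store = InMemoryStore()
store.upsert("a", [1.0, 0.0], {"lang": "en"}, namespace="docs")
store.upsert("b", [0.0, 1.0], {"lang": "de"}, namespace="docs")
store.upsert("c", [1.0, 0.0], {"lang": "en"}, namespace="other")

# Only records in the "docs" namespace with lang == "en" are candidates.
results = store.query([1.0, 0.0], namespace="docs", filter={"lang": "en"})
print(results)
```

Records in other namespaces never appear in a query, and the filter narrows the candidates before ranking. A real backend would apply both on the server side.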

Configurable LLM Retry Logic

  • Exposed max_retries parameter in NERExtractor, RelationExtractor, and TripletExtractor.
  • Defaults to 3 retries to handle JSON validation failures or API timeouts gracefully.
  • Propagated retry configuration through chunked processing helpers for consistent long-document handling.
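
The retry behaviour described above can be sketched as a bounded loop. This is an illustration, not the library's implementation: `call_with_retries`, `extract_chunks`, and the `call_llm` callable are hypothetical names, and `max_retries` is interpreted here as the total number of attempts.

```python
import json


def call_with_retries(call_llm, prompt, max_retries=3):
    """Call `call_llm(prompt)` until it returns valid JSON or attempts run out.

    `call_llm` is a hypothetical callable that returns a string.
    """
    last_error = None
    for _ in range(max_retries):
        try:
            return json.loads(call_llm(prompt))
        except (json.JSONDecodeError, TimeoutError) as exc:
            last_error = exc  # remember the failure and try again
    # The loop is bounded, so a persistently failing call cannot spin forever.
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error


def extract_chunks(call_llm, chunks, max_retries=3):
    # Pass the same limit to every chunk so long documents behave consistently.
    return [call_with_retries(call_llm, chunk, max_retries) for chunk in chunks]


# A fake LLM that fails JSON validation once, then succeeds.
replies = iter(["not json", '{"entities": ["Ada"]}'])
calls = []


def fake_llm(prompt):
    calls.append(prompt)
    return next(replies)


result = call_with_retries(fake_llm, "Extract entities", max_retries=3)
print(result, len(calls))
```

Raising once the limit is reached, rather than looping until success, is what keeps a persistently malformed response from blocking the pipeline.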

Bring Your Own Model (BYOM) Support

  • Custom Hugging Face Models: Enabled full support for custom models in NERExtractor, RelationExtractor, and TripletExtractor.
  • Custom Tokenizers: Added support for models with non-standard tokenization requirements.
  • Runtime Overrides: extract(model=...) now correctly overrides configuration defaults.
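
The runtime-override rule amounts to merging call-time arguments over configured defaults. The sketch below assumes a hypothetical `Extractor` class and made-up model IDs. It shows only the precedence rule, not how Semantica's extractors load models.

```python
class Extractor:
    """Hypothetical extractor; only the precedence rule is the point here."""

    def __init__(self, model="default-model", **config):
        self.config = {"model": model, **config}

    def resolve(self, **overrides):
        # Drop None so an omitted argument never masks a configured default.
        given = {k: v for k, v in overrides.items() if v is not None}
        return {**self.config, **given}  # later keys win: runtime beats config

    def extract(self, text, model=None):
        settings = self.resolve(model=model)
        return {"text": text, "model": settings["model"]}


ner = Extractor(model="my-org/default-ner")  # hypothetical model IDs
print(ner.extract("Ada Lovelace")["model"])
print(ner.extract("Ada Lovelace", model="my-org/custom-ner")["model"])
```

The first call falls back to the configured model and the second uses the model passed at call time, without changing the stored configuration.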

Enhanced Extraction Capabilities

  • NER: Added configurable aggregation strategies (simple, first, average, max) and robust IOB/BILOU parsing.
  • Relation Extraction: Implemented standard entity marker techniques (<subj>, <obj>) and structured output parsing.
  • Triplet Extraction: Added specialized parsing for Seq2Seq models (e.g., REBEL) to generate structured triplets directly from text.
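
Two of the parsing steps above can be illustrated in a few lines: merging IOB tags into entity spans, and turning linearized Seq2Seq output into triplets. Both functions are hypothetical sketches, not Semantica's parsers. The second assumes the REBEL-style layout `<triplet> head <subj> tail <obj> relation`, which may differ for other models.

```python
def iob_to_spans(tokens, tags):
    """Merge token-level IOB tags (B-X, I-X, O) into (text, label) spans."""
    spans, words, label = [], [], None
    for token, tag in zip(tokens, tags):
        prefix, _, kind = tag.partition("-")
        if prefix == "I" and kind == label:
            words.append(token)  # continue the open entity
            continue
        if words:
            spans.append((" ".join(words), label))  # close the open entity
        # B- starts an entity; a stray I- is treated as a start too.
        words, label = ([token], kind) if prefix in ("B", "I") else ([], None)
    if words:
        spans.append((" ".join(words), label))
    return spans


def parse_triplets(text):
    """Parse '<triplet> head <subj> tail <obj> relation' into tuples."""
    for special in ("<s>", "</s>", "<pad>"):
        text = text.replace(special, "")
    triplets = []
    for chunk in text.split("<triplet>")[1:]:
        # One head may be followed by several '<subj> tail <obj> relation' pairs.
        head, *pairs = chunk.split("<subj>")
        for pair in pairs:
            tail, sep, relation = pair.partition("<obj>")
            if sep:  # skip truncated output with no relation part
                triplets.append((head.strip(), relation.strip(), tail.strip()))
    return triplets


spans = iob_to_spans(
    ["Ada", "Lovelace", "visited", "London"],
    ["B-PER", "I-PER", "O", "B-LOC"],
)
triplets = parse_triplets(
    "<s><triplet> Punta Cana <subj> Dominican Republic <obj> country</s>"
)
print(spans)
print(triplets)
```

Each triplet is returned as (head, relation, tail). Output that ends before the `<obj>` marker is skipped rather than guessed at.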

πŸ› Bug Fixes

  • LLM Extraction Stability: Fixed infinite retry loops by strictly enforcing max_retries limits.
  • Model Parameter Precedence: Resolved issues where config defaults overrode runtime arguments.
  • Import Handling: Fixed circular import issues in test suites via improved mocking strategies.

📦 Installation

pip install semantica==0.2.5