feat: wire NER entity extraction into Lithoglyph importer #32
Merged
After evidence is imported from Lithoglyph, extract named entities from content_text using regex-based NER (titled names, org suffixes, capitalised sequences), resolve them against existing entities via Entities.resolve_ner_output (exact → fuzzy → auto-create), and create :mentions relationship edges in ArangoDB. Entity linking is best-effort: failures are logged but don't block the import.

- Add NERExtractor module with 3 extraction strategies
- Wire NER into Importer.import_single_record post-create step
- Extend Relationship schema with :entity type and :mentions edges
- Update graph traversal helpers to handle entity nodes
- Add 13 unit tests for NER extraction (all passing)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
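The three extraction strategies named in the commit message can be illustrated with a small sketch. This is not the PR's NERExtractor module (whose code is not shown in this conversation); it is a Python illustration in which every regex, the title and suffix lists, the type labels, and the function name `extract_entities` are assumptions. It applies the strategies in order and skips any later match that overlaps a span an earlier strategy already claimed.

```python
import re

# Hypothetical patterns: the actual regexes used by the PR are not shown.
TITLED_NAME = re.compile(
    r"\b(?:Mr|Mrs|Ms|Dr|Prof)\.?\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
)
ORG_SUFFIX = re.compile(
    r"\b((?:[A-Z][A-Za-z]*\s+)+(?:Ltd|Inc|Corp|LLC|GmbH))\b"
)
CAPITALISED_SEQ = re.compile(r"\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b")

# Strategies run in this order; the type labels are illustrative only.
STRATEGIES = [
    (TITLED_NAME, "person"),
    (ORG_SUFFIX, "organisation"),
    (CAPITALISED_SEQ, "unknown"),
]


def extract_entities(text):
    """Return (name, type) pairs, dropping matches that overlap earlier ones."""
    claimed = []   # (start, end) spans already taken by a previous match
    entities = []
    for pattern, entity_type in STRATEGIES:
        for match in pattern.finditer(text):
            start, end = match.span()
            overlaps = any(start < c_end and c_start < end
                           for c_start, c_end in claimed)
            if overlaps:
                continue
            claimed.append((start, end))
            entities.append((match.group(1), entity_type))
    return entities


print(extract_entities(
    "Dr Jane Smith met staff from Acme Holdings Ltd in New York."
))
```

Running the more specific strategies first means the generic capitalised-sequence pattern only picks up names the other two missed, which keeps a titled name from being reported twice.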
Summary
- New NER module (NERExtractor) with 3 strategies: titled names, organisation suffixes, capitalised multi-word sequences
- Entities are extracted from content_text, resolved via Entities.resolve_ner_output/2 (exact match → fuzzy Jaro-Winkler → auto-create), and linked with :mentions relationship edges
- Relationship schema extended with :entity node type and :mentions edge type
- Graph traversal helpers (get_node_relationships, find_path, parse_nodes) updated for entity nodes

Test plan
🤖 Generated with Claude Code
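
The resolution order described in the Summary (exact match → fuzzy Jaro-Winkler → auto-create) can be sketched as follows. This is a Python illustration, not the body of Entities.resolve_ner_output/2, which is not shown here. The function name `resolve_entity`, the in-memory `known` dictionary standing in for the entity store, the 0.9 threshold, the lowercasing before comparison, and the generated keys are all assumptions.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in the range 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def resolve_entity(name, known, threshold=0.9):
    """Resolve name against known (name -> key); returns (key, how)."""
    if name in known:                       # step 1: exact match
        return known[name], "exact"
    best_name, best_score = None, 0.0
    for candidate in known:                 # step 2: best fuzzy match
        score = jaro_winkler(name.lower(), candidate.lower())
        if score > best_score:
            best_name, best_score = candidate, score
    if best_name is not None and best_score >= threshold:
        return known[best_name], "fuzzy"
    new_key = f"entities/{len(known) + 1}"  # step 3: auto-create (made-up key)
    known[name] = new_key
    return new_key, "created"


known = {"Jane Smith": "entities/1"}
print(resolve_entity("Jane Smith", known))
print(resolve_entity("Jane Smyth", known))
print(resolve_entity("Acme Holdings Ltd", known))
```

The threshold is the main tuning knob in a cascade like this: set too low, distinct entities get merged; set too high, spelling variants each create a new entity.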