Phrase Matcher fails on OOV tokens #4473
Labels: feat / matcher (Feature: Token, phrase and dependency matcher), more-info-needed (This issue needs more information)
How to reproduce the behaviour
I created a PhraseMatcher to match titles (e.g. queen, manager, mayor), and it fails when applied to a document containing out-of-vocabulary (OOV) tokens.
The error it throws is:
ERROR:root:error: "[E018] Can't retrieve string for hash '4332798303416328849'."
I got around this by creating a "clean doc" from the original doc to feed through the phrase matcher like so:
(I used the string 'OOV' in place of each OOV token because I needed the token indices to match the original doc.)
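The index-preserving workaround described above can be sketched in plain Python. This is not the reporter's original code (which is not shown here) and it does not use spaCy's `PhraseMatcher` API; the function names `clean_tokens` and `match_phrases`, the toy vocabulary, and the example sentence are all hypothetical. What it illustrates is that a one-for-one placeholder substitution keeps the token count unchanged, so spans found on the cleaned tokens can be applied directly to the original tokens.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def clean_tokens(tokens: Sequence[str], vocab: Set[str], placeholder: str = "OOV") -> List[str]:
    """Replace every token not in `vocab` with `placeholder` (hypothetical helper).

    Exactly one output entry is produced per input token, so index i in the
    cleaned list refers to the same position as index i in the original.
    """
    return [tok if tok in vocab else placeholder for tok in tokens]


def match_phrases(tokens: Sequence[str], phrases: Iterable[Sequence[str]]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans where a phrase occurs exactly (hypothetical helper)."""
    phrase_list = [tuple(p) for p in phrases if p]
    matches = []
    for start in range(len(tokens)):
        for phrase in phrase_list:
            end = start + len(phrase)
            if tuple(tokens[start:end]) == phrase:
                matches.append((start, end))
    return matches


# Toy example: "Zorblax" stands in for an out-of-vocabulary token.
original = ["The", "Zorblax", "mayor", "met", "the", "queen", "."]
vocab = {"The", "the", "mayor", "met", "queen", "."}
titles = [["mayor"], ["queen"]]

cleaned = clean_tokens(original, vocab)
spans = match_phrases(cleaned, titles)

# Because the cleaned list has the same length, spans index the original directly.
print([original[s:e] for s, e in spans])
```

Replacing OOV tokens one-for-one, rather than dropping them, is what keeps the indices aligned; removing tokens would shift every later span.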
I am wondering whether there is a better way around this, or a way for the PhraseMatcher to ignore OOV tokens instead of trying to process them.
Info about spaCy