BertDefaultStringTextTermsMapper: non-masked entity values may keep the separation between words #377

Closed
nicolay-r opened this issue Jul 26, 2022 · 0 comments

@nicolay-r
Owner

In short, the StringWithEmbeddingNetworkTermMapping class already handles this case as follows:

    def map_entity(self, e_ind, entity):
        assert(isinstance(entity, Entity))
        # Value extraction
        str_formatted_entity = super(StringWithEmbeddingNetworkTermMapping, self).map_entity(
            e_ind=e_ind,
            entity=entity)
        # Vector extraction
        emb_word, vector = self.__vectorizers[TermTypes.ENTITY].create_term_embedding(term=str_formatted_entity)
        return emb_word, vector

where the create_term_embedding function makes it possible to replace the separator between words (presumably a space) with -.

The same handling does not work for BERT (BertDefaultStringTextTermsMapper).
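
A minimal sketch of what an equivalent step could look like on the BERT side, assuming the separator in question is whitespace and the replacement is a hyphen. The names join_entity_words and HypotheticalBertEntityMapper are made up for illustration and are not part of the library; the real BertDefaultStringTextTermsMapper may need a different hook.

```python
def join_entity_words(value, separator="-"):
    """Collapse the whitespace-separated words of an entity value into one term."""
    # str.split() without arguments splits on any run of whitespace
    # and drops leading/trailing whitespace.
    return separator.join(value.split())


class HypotheticalBertEntityMapper:
    """Illustrative stand-in only; NOT the actual BertDefaultStringTextTermsMapper."""

    def __init__(self, separator="-"):
        self._separator = separator

    def map_entity(self, e_ind, entity_value):
        # e_ind is accepted only to mirror the map_entity signature quoted above;
        # entity_value is assumed to be the already formatted, non-masked string.
        return join_entity_words(entity_value, self._separator)


if __name__ == "__main__":
    mapper = HypotheticalBertEntityMapper()
    print(mapper.map_entity(0, "New York"))
```

With this in place a multi-word, non-masked entity value would reach the model as a single term, which mirrors what the embedding-based mapper appears to do via create_term_embedding.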

@nicolay-r nicolay-r added the bug Something isn't working label Jul 26, 2022
@nicolay-r nicolay-r self-assigned this Jul 26, 2022
nicolay-r added a commit that referenced this issue Jul 26, 2022