This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

Missing add_special_tokens in biencoder? #122

Open
jojonki opened this issue Aug 3, 2022 · 1 comment

Comments


jojonki commented Aug 3, 2022

Hi! Thank you for your great work!

To my understanding, BLINK uses special tokens to mark the mention position and the entity title for both the bi-encoder and the cross-encoder.

In the cross-encoder, the code actually registers these special tokens with the tokenizer:
https://github.com/facebookresearch/BLINK/blob/main/blink/crossencoder/crossencoder.py#L82-L89

But in the bi-encoder, add_special_tokens is never called, which means the special tokens are just tokenized as [UNK]:
https://github.com/facebookresearch/BLINK/blob/main/blink/biencoder/biencoder.py#L82-L87

Is this intentional? If so, could you elaborate on it?

@abhinavkulkarni

@jojonki: Yes, the special tokens are missing from the biencoder tokenizer.
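
For anyone running into this, below is a minimal sketch (not BLINK's actual code, and using the current Hugging Face transformers API rather than the pytorch_transformers version BLINK pins) of the kind of registration the cross-encoder does and the bi-encoder skips. The marker strings ENT_START_TAG, ENT_END_TAG, and ENT_TITLE_TAG are hypothetical placeholders, not BLINK's exact token values.

```python
# Sketch: register mention/title markers as special tokens so the
# tokenizer keeps them whole instead of mapping them to [UNK].
from transformers import BertModel, BertTokenizer

ENT_START_TAG = "[ENT_START]"  # hypothetical mention-start marker
ENT_END_TAG = "[ENT_END]"      # hypothetical mention-end marker
ENT_TITLE_TAG = "[ENT_TITLE]"  # hypothetical entity-title marker

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Without this call, the markers are out-of-vocabulary and encode as [UNK].
tokenizer.add_special_tokens(
    {"additional_special_tokens": [ENT_START_TAG, ENT_END_TAG, ENT_TITLE_TAG]}
)
# Grow the embedding matrix to cover the newly added token ids.
model.resize_token_embeddings(len(tokenizer))

ids = tokenizer.encode(f"{ENT_START_TAG} Paris {ENT_END_TAG} is in France")
assert tokenizer.unk_token_id not in ids
```

Note that resizing the embeddings changes the model, so embeddings trained without these tokens would not transfer cleanly; the markers get randomly initialized rows until fine-tuned.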
