[MASK] is appended to the end of questions instead of [QUESTION] when finetuning #4

Closed
Liangtaiwan opened this issue Jul 13, 2021 · 2 comments


@Liangtaiwan
Contributor

Hi,

I noticed that in qass_preprocess.py you append [MASK] to the end of each question instead of [QUESTION]:

MASK_TOKEN = "[MASK]"

def insert_mask_to_qa_end(data_point):
    transformed_data_point = data_point.copy()
    qas = []
    for qa in transformed_data_point["qas"]:
        qa = qa.copy()
        qa["question"] += f" {MASK_TOKEN}."
        qas.append(qa)
    transformed_data_point["qas"] = qas
    return transformed_data_point

Is this an intentional difference between the implementation and the paper, or is it a bug?

@oriram
Owner

oriram commented Jul 13, 2021

Hi @Liangtaiwan, this is not a bug.
The mask token is replaced with a question token in the model itself.
This is not the most elegant solution and will be changed when the model is integrated in HuggingFace's transformers library.
See these lines:

if self.replace_mask_with_question_token:
    input_ids = input_ids.clone()
    input_ids[input_ids == self.mask_id] = self.question_token_id
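To make the substitution above concrete, here is a minimal sketch that uses plain Python lists instead of tensors. The token IDs and the helper name `replace_mask_with_question` are made up for illustration; they are not the model's real vocabulary or API.

```python
# Hypothetical token IDs, chosen only for this example.
MASK_ID = 103
QUESTION_TOKEN_ID = 104


def replace_mask_with_question(input_ids, mask_id=MASK_ID,
                               question_token_id=QUESTION_TOKEN_ID):
    """Return a new list in which every mask_id is swapped for question_token_id."""
    # Building a new list mirrors the .clone() call: the caller's ids stay untouched.
    return [question_token_id if tok == mask_id else tok for tok in input_ids]


ids = [101, 2040, 2626, 103, 1012, 102]  # toy sequence containing one mask ID
replaced = replace_mask_with_question(ids)
print(replaced)
```

Only the position holding the mask ID changes; every other token, and the original list, is left as it was.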

@Liangtaiwan
Contributor Author

Hi @oriram,
Thanks :)
I just realized I had overlooked this note:

This will add a [MASK] token to each question in the training data, which will later be replaced by a [QUESTION] token automatically by the QASS layer implementation.
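As a rough illustration of the preprocessing half of that note, the sketch below applies the function quoted earlier in this thread to a made-up SQuAD-style data point. The example record (its `id`, `context`, and question text) is an assumption for demonstration and does not come from the repository's data.

```python
MASK_TOKEN = "[MASK]"


def insert_mask_to_qa_end(data_point):
    # Same logic as the function quoted earlier in this thread.
    transformed_data_point = data_point.copy()
    qas = []
    for qa in transformed_data_point["qas"]:
        qa = qa.copy()  # copy so the caller's question dict is not mutated
        qa["question"] += f" {MASK_TOKEN}."
        qas.append(qa)
    transformed_data_point["qas"] = qas
    return transformed_data_point


# Hypothetical data point, for illustration only.
example = {
    "context": "Hamlet is a tragedy written by William Shakespeare.",
    "qas": [{"id": "q1", "question": "Who wrote Hamlet?"}],
}

out = insert_mask_to_qa_end(example)
print(out["qas"][0]["question"])      # Who wrote Hamlet? [MASK].
print(example["qas"][0]["question"])  # Who wrote Hamlet?
```

The appended [MASK] is only a placeholder at this stage; per the maintainer's reply, the swap to [QUESTION] happens later, inside the model.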
