Vocab is not accessible, it is in gs://t5-data #14

cnut1648 · 2022-11-30T19:53:21Z

Hello, first of all, I want to say nice work!

When I tried to reproduce your results on chemprot, I ran into the following auth issue in this code:

model.finetune(
    mixture_or_task_name="re_all",
    pretrained_model_dir=PRETRAINED_DIR,
    finetune_steps=FINETUNE_STEPS
)

2022-11-30 14:46:16.639835: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".

It turns out that this is caused by the code not being able to find the vocab, which is at 'gs://t5-data/vocabs/cc_all.32000/sentencepiece.model'. Currently only gs://scifive seems to be accessible.
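One way to check whether a `gs://` object can be read without credentials is to request it over HTTPS. The sketch below is only an illustration: it assumes the object is exposed through the public `storage.googleapis.com/<bucket>/<object>` endpoint, and the helper names `gcs_to_https` and `is_publicly_readable` are hypothetical, not part of t5 or the SciFive code.

```python
import urllib.parse
import urllib.request


def gcs_to_https(gs_uri: str) -> str:
    """Map gs://bucket/object to its storage.googleapis.com HTTPS URL."""
    parsed = urllib.parse.urlparse(gs_uri)
    if parsed.scheme != "gs" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not a gs://bucket/object URI: {gs_uri!r}")
    # Percent-encode the object name, keeping "/" as the path separator.
    object_name = urllib.parse.quote(parsed.path.lstrip("/"))
    return f"https://storage.googleapis.com/{parsed.netloc}/{object_name}"


def is_publicly_readable(gs_uri: str, timeout: float = 10.0) -> bool:
    """Send an anonymous HEAD request; True only if the server answers 200."""
    request = urllib.request.Request(gcs_to_https(gs_uri), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors (e.g. 403/404), DNS failures, and timeouts.
        return False
```

Calling `is_publicly_readable("gs://t5-data/vocabs/cc_all.32000/sentencepiece.model")` would then show whether the vocab file can be fetched anonymously from the machine running the fine-tuning job.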

Could you please release the vocab, or share exactly how you obtained the sentencepiece vocab, so that we can reproduce the results? Thank you!


justinphan3110 · 2022-12-04T04:21:40Z

Hey @cnut1648, sorry for the late response. On the Chemprot task, we found out that we had used the same eval approach as SciBERT. Therefore, the Chemprot result reported in the paper is not comparable to other recent works that use different eval approaches.

The new result for SciFive on Chemprot should be ~78.

Let me know if you still want the code to reproduce this result.

cnut1648 · 2022-12-05T19:07:52Z

Hi @justinphan3110 thank you for the info!
It would still be nice to reproduce it. I am working on an experiment in a low-resource medical RE setting, so I will probably reproduce your results and also extend them to low-resource chemprot & ddi.
If it's not too much to ask, may we have access to the relevant training files (e.g. the vocab) to reproduce the chemprot and ddi experiments?
Thank you!

justinphan3110 · 2022-12-12T22:13:56Z

@cnut1648 I have just updated the script to use a chemprot dataset similar to the one used by the BLURB Leaderboard. You can access it here. Let me know if you still have any questions.

cnut1648 · 2022-12-17T08:50:25Z

Hi @justinphan3110 thanks! I can reproduce the results now! Great work!

cnut1648 closed this as completed Dec 17, 2022
