Vocab is not accessible, it is in gs://t5-data #14

Closed
cnut1648 opened this issue Nov 30, 2022 · 4 comments

Comments

cnut1648 commented Nov 30, 2022

Hello, first of all, I want to say nice work!

When I tried to reproduce your results on ChemProt, I ran into the following auth issue in the code:

model.finetune(
    mixture_or_task_name="re_all",
    pretrained_model_dir=PRETRAINED_DIR,
    finetune_steps=FINETUNE_STEPS
)

2022-11-30 14:46:16.639835: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "NOT_FOUND: Could not locate the credentials file.". Retrieving token from GCE failed with "FAILED_PRECONDITION: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".

It turns out this is caused by the code not being able to find the vocab, which is at 'gs://t5-data/vocabs/cc_all.32000/sentencepiece.model'. Currently only gs://scifive is accessible.

Could you please release the vocab, or share how exactly you obtained the sentencepiece vocab, so that we can reproduce the results? Thank you!
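
For anyone hitting the same error, here is a minimal sketch of one way to check the vocab path without GCS credentials. It assumes the object is publicly readable over HTTPS, which this thread does not confirm; the helper name `gcs_uri_to_https` is hypothetical and not part of the SciFive or T5 code. It only rewrites a `gs://bucket/object` URI into the `https://storage.googleapis.com/bucket/object` form, which you could then try to download by hand.

```python
from urllib.parse import quote, urlparse


def gcs_uri_to_https(gs_uri: str) -> str:
    """Rewrite gs://bucket/object as https://storage.googleapis.com/bucket/object.

    Hypothetical helper for illustration only. The resulting URL works
    only if the object is publicly readable (an assumption here).
    """
    parsed = urlparse(gs_uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    object_name = parsed.path.lstrip("/")
    # quote() keeps "/" by default, so the object path structure is preserved.
    return f"https://storage.googleapis.com/{parsed.netloc}/{quote(object_name)}"


# Vocab path quoted in this issue.
VOCAB_URI = "gs://t5-data/vocabs/cc_all.32000/sentencepiece.model"
print(gcs_uri_to_https(VOCAB_URI))
```

If the download succeeds, the file could be kept locally, though whether the finetuning script accepts a local vocab path is something the maintainers would need to confirm.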

Owner

justinphan3110 commented Dec 4, 2022

Hey @cnut1648, sorry for the late response. On the ChemProt task, we found out that we used the same eval approach as SciBERT. Therefore, the ChemProt result reported in the paper is not comparable to other recent works that use different eval approaches.

The new result for SciFive on ChemProt should be ~78.

Let me know if you still want the code to reproduce this result.

Author

cnut1648 commented Dec 5, 2022

Hi @justinphan3110, thank you for the info!
It would still be nice to reproduce it. I am working on an experiment in a low-resource medical RE setting, so I will probably reproduce your results and also extend them to low-resource ChemProt and DDI.
If it's not too much to ask, could we get access to the relevant training files (e.g. the vocab) to reproduce the ChemProt and DDI experiments?
Thank you!

@justinphan3110
Owner

@cnut1648 I have just updated the script to use a ChemProt dataset similar to the one used by the BLURB Leaderboard. You can access it here. Let me know if you still have any questions.

@cnut1648
Author

Hi @justinphan3110, thanks! I can reproduce the results now. Great work!
