Question about v3 pretraining code of DeBERTa #1

Open
stefan-it opened this issue Jan 29, 2023 · 3 comments

@stefan-it

Hi @DaoTranbk and @HyTruongSon,

many thanks for open sourcing the repo for ViDeBERTa!

I'm very interested in the v3 pretraining of a DeBERTa model. In the current version of the pretraining code, I can see that the normal DeBERTa package is called:

CUDA_VISIBLE_DEVICES=1 python -m DeBERTa.apps.run \

However, the publicly available DeBERTa code does not yet include support for Gradient-Disentangled Embedding Sharing (GDES); see e.g. microsoft/DeBERTa#93.

Did you modify the code to add support for GDES? I would be very interested in that implementation.

Many thanks and cheers,

Stefan

@musabgultekin

Any updates on this?

@musabgultekin

Kindly pinging @DaoTranbk and @HyTruongSon.

@DaoTranbk
Contributor

Thank you, @stefan-it, for your interest in the v3 pretraining of DeBERTa.

For this work, we modified the DeBERTa code to add GDES to pretraining, following the DeBERTaV3 paper. If you are interested in that implementation, you can take a look at the latest v3 pretraining code in the original repository: https://github.com/microsoft/DeBERTa.

I hope this is helpful.

Regards,
Cong Dao
