Adjust vocab size calculation of DPO model to be dynamic #141

Open
p-ferreira opened this issue Aug 28, 2023 · 0 comments

Comments

@p-ferreira
Contributor

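The current DPO model returns a hardcoded value of -11 for empty or very short completions. The value is a log-probability: exp(-11) ≈ 1.67e-5, and -11 was chosen as the nearest integer whose exponential is below the uniform probability across all logits (1/50257 ≈ 2e-5). See #138 for reference.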

Relevant excerpt from openvalidators/reward/dpo.py:

...
        # Check if completion is empty or too short
        if completion.strip() == '' or len(completion) <= 5:
            return -11 # exp(-11)=1.67e-5 < 2e-5=1/50257 (typical vocab size)
...

The value 50257 is assumed as a typical vocab size, but it can differ for other models and tokenizers.
Ideally, the returned value would be derived automatically from 1 / model.vocab_size (for example, as the floor of its logarithm) rather than hard-coded.
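
A minimal sketch of what the dynamic calculation could look like, assuming the vocab size is available as a plain integer. The helper name `uniform_logprob_floor` is hypothetical and not part of openvalidators; how the vocab size is obtained (from the model, its config, or the tokenizer) is left to the caller.

```python
import math


def uniform_logprob_floor(vocab_size: int) -> int:
    """Return the largest integer n such that exp(n) <= 1 / vocab_size.

    Hypothetical helper: mirrors the reasoning behind the hardcoded -11,
    but derives the value from the vocab size instead of assuming 50257.
    """
    if vocab_size <= 1:
        raise ValueError("vocab_size must be greater than 1")
    # log(1 / vocab_size) is the log-probability of a uniform distribution
    # over all logits; flooring gives the nearest integer below it.
    return math.floor(math.log(1.0 / vocab_size))


# With the vocab size assumed in the current code, this reproduces
# the hardcoded value.
penalty = uniform_logprob_floor(50257)
print(penalty)
```

With something like this, the early return in the excerpt above could use the computed value in place of the literal -11, so that models with a different vocab size get a matching penalty.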
