chore(mlx-lm): fix the top_p implementation. #602
Btw @mzbac looks like the CI issue is resolved for you, thanks!!
@awni I have added the unit test, so it is ready for review again. Please let me know if there is anything else that needs to be updated.
I'm not sure about introducing a separate file for that. Do you think it's necessary? Is the idea that we will add more samplers?
The main reason for putting it into a separate file is that the current utils.py has become too large, which makes it difficult to maintain. I think we need to start splitting up utils.py, and the sampling utils might be a good starting point. They would include top_p, the repetition penalty, and any future sampling methods that we want to support.
Agreed.
I'm not wild about the name
🙏 thanks for fixing that and for the tests!
top_p sampling is supposed to sort the token probabilities in ascending order and keep the tokens whose cumulative probability is above the threshold. The current implementation only works because of a bug in reversing the sorted probabilities, which accidentally leaves them in the correct order. It's better to fix the implementation to reduce the confusion.
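
For reference, the ascending-order approach described above could be sketched roughly as follows. This is a pure-Python illustration, not the code in this PR: the function names `top_p_filter` and `sample_top_p` are hypothetical, and it assumes "the threshold" means `1 - top_p` applied to the ascending cumulative sum.

```python
import random
from itertools import accumulate


def top_p_filter(probs, top_p):
    """Return a copy of probs with tokens outside the top-p set zeroed out."""
    # Token indices ordered from lowest to highest probability (ascending).
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    sorted_probs = [probs[i] for i in order]
    # Running total over the ascending probabilities.
    cumulative = list(accumulate(sorted_probs))
    # Assumption: the threshold is 1 - top_p, so the low-probability tail
    # whose total mass is at most 1 - top_p gets dropped.
    threshold = 1.0 - top_p
    filtered = [0.0] * len(probs)
    for idx, p, c in zip(order, sorted_probs, cumulative):
        if c > threshold:
            filtered[idx] = p  # map back to the original token index
    return filtered


def sample_top_p(probs, top_p, rng=random):
    """Sample a token index from the top-p filtered distribution."""
    filtered = top_p_filter(probs, top_p)
    # choices() treats the weights as relative, so no renormalization is needed.
    return rng.choices(range(len(probs)), weights=filtered, k=1)[0]


if __name__ == "__main__":
    probs = [0.4, 0.1, 0.3, 0.2]
    print(top_p_filter(probs, 0.5))
    print(sample_top_p(probs, 0.5, random.Random(0)))
```

Because the sort is ascending, no reversal step is needed: the tokens that survive are those at the high-probability end of the sorted list, and the result is mapped back to the original token indices before sampling.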