Update Llama2 cached sin/cos to use max_sequence_length #780
Conversation
Hzfengsy left a comment:
General LGTM. also cc @MasterJH5574
mlc_llm/core.py
cache_path = os.path.join(args.artifact_path, "mod_cache_before_build.pkl")
args.raw_params_path = os.path.join(args.artifact_path, "raw_params")
use_cache = args.use_cache and os.path.isfile(cache_path)
fix style plz
Done!
This one fixes an error introduced in #780, where `max_seq_len` is set to `-1` when it is not specified in the config file.
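The guard described above can be sketched as follows. This is a minimal illustration, not mlc-llm's actual code; the function and config key names are hypothetical:

```python
# Hypothetical sketch: fall back to a default context window instead of
# propagating the -1 sentinel when the config omits max_seq_len.
DEFAULT_MAX_SEQ_LEN = 2048

def resolve_max_seq_len(config: dict) -> int:
    """Return a usable max sequence length from a model config dict."""
    value = config.get("max_sequence_length", -1)
    # A missing or sentinel value (-1) must not be used to size buffers.
    return value if value > 0 else DEFAULT_MAX_SEQ_LEN
```

With this, `resolve_max_seq_len({})` yields the 2048 default rather than `-1`, while an explicit `{"max_sequence_length": 4096}` is passed through unchanged.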
This PR fixes an issue introduced by mlc-ai#780, which broke our intended behavior of making the cos/sin shape independent of the max sequence length, so that regardless of which max sequence length people use, they can always reuse the same set of prebuilt weights and do not need to clone different weight repositories. However, it is true that the need for larger max sequence lengths is growing: prior to mlc-ai#780, when the max sequence length was larger than 2048, the cached cos/sin no longer worked and the model broke. To stay as compatible as possible, this PR changes the behavior to take the maximum of 2048 and the specified max sequence length when building the model lib. With this fix, when the maximum sequence length is smaller than 2048, we are still able to use the prebuilt weights; when it is larger than 2048, we can only use the weights converted during the build.
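The sizing rule described above is a one-liner; a sketch (the function name is illustrative, not mlc-llm's):

```python
def cached_rope_table_len(max_sequence_length: int) -> int:
    """Length to allocate for the cached cos/sin tables.

    Taking the maximum of 2048 and the requested length keeps the table
    shape compatible with prebuilt weights (sized for 2048) whenever the
    requested context is 2048 or shorter.
    """
    return max(2048, max_sequence_length)
```

So a 1024-token config still allocates 2048 rows (prebuilt weights stay usable), while a 4096-token config allocates 4096 rows (and requires weights converted during the build).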
* 'main' of https://github.com/jimscard/mlc-llm:
  - [Doc] Minor update to `Build Android Package from Source` section (mlc-ai#785)
  - added cors to fast api (mlc-ai#757)
  - Update Llama2 cached sin/cos to use max_sequence_length (mlc-ai#780)
  - Update gpu.rst to add sudo apt update before first install (mlc-ai#784)
  - [Doc] Update doc for prebuilt models (mlc-ai#767)
  - Improve code completion experience (mlc-ai#772)
  - Automatically set 'offset' parameter if 'messages' parameter is set (mlc-ai#754)
  - Update tokenizers-cpp to latest and fix rust build error (mlc-ai#762)
  - [Utils] Skip generating benchmark scripts in cases (mlc-ai#759)
  - [Android] Add libOpenCL-pixel for supporintg Pixel phones. (mlc-ai#723)
… and cos functions.
Currently, if you make the max sequence length of a Llama2 model larger than 2048, the model runs into issues actually handling more than 2048 tokens: it emits garbage output or segfaults when more than 2048 tokens are sent in. I believe the root cause is the hardcoded sizing of the cached sin and cos tables, and with these updates I have been able to increase the max sequence length successfully.
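The fix amounts to sizing the precomputed rotary-embedding tables by the configured max sequence length instead of a hardcoded 2048. A minimal, self-contained sketch (names and the pure-Python table layout are illustrative, not mlc-llm's actual TVM code):

```python
import math

def build_rope_cache(max_sequence_length: int, head_dim: int,
                     base: float = 10000.0):
    """Precompute cos/sin tables for rotary position embeddings.

    One row per position up to max_sequence_length; sizing by the
    configured length (rather than a hardcoded 2048) is the change
    this PR describes.
    """
    # Standard RoPE inverse frequencies, one per pair of dimensions.
    inv_freq = [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
    cos = [[math.cos(pos * f) for f in inv_freq]
           for pos in range(max_sequence_length)]
    sin = [[math.sin(pos * f) for f in inv_freq]
           for pos in range(max_sequence_length)]
    return cos, sin
```

With a hardcoded 2048-row table, any position index beyond 2047 reads out of bounds (hence the garbage output or segfault); building the table with `max_sequence_length=4096` gives every position up to 4095 a valid row.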