Add NVFP4 DS #2356
Signed-off-by: yiliu30 <yi4.liu@intel.com>
User description
Signed-off-by: yiliu30 <yi4.liu@intel.com>
PR Type
Enhancement
Description
Added support for NVFP4 quantization scheme
Updated usage instructions and validation checks
Modified environment variable settings for NVFP4
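For readers unfamiliar with the format, the following is a minimal sketch of what NVFP4-style block quantization does numerically. It is not the code in this PR. It assumes the commonly documented layout: 4-bit E2M1 values in blocks of 16 elements that share one scale. The names `E2M1_GRID`, `BLOCK_SIZE`, `fake_quant_block`, and `fake_quant` are hypothetical, and the block scale is kept as a plain Python float rather than the FP8 scale used by the real format.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
# Assumption: NVFP4 shares one scale across 16 consecutive elements.
BLOCK_SIZE = 16


def fake_quant_block(block):
    """Quantize one block to the E2M1 grid and dequantize it again."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Map the largest magnitude in the block onto the largest grid value.
    # Simplification: the real format stores this scale in FP8.
    scale = amax / E2M1_GRID[-1]
    out = []
    for v in block:
        # Nearest grid point; on an exact tie this picks the smaller
        # magnitude, which may differ from hardware rounding.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(v) / scale - g))
        out.append(math.copysign(mag * scale, v))
    return out


def fake_quant(values):
    """Apply block-wise fake quantization to a flat list of floats."""
    out = []
    for start in range(0, len(values), BLOCK_SIZE):
        out.extend(fake_quant_block(values[start:start + BLOCK_SIZE]))
    return out
```

Values that already sit on the scaled grid pass through unchanged, for example `fake_quant([6.0, 3.0, -1.5, 0.0])`. Other values snap to the nearest representable point within their block.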
File Walkthrough
quantize.py: Add NVFP4 configuration
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/quantize.py
- config_dict
- enable_torch_compile to True
- low_gpu_mem_usage parameter
run_evaluation.sh: Update evaluation script for NVFP4
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/run_evaluation.sh
run_generate.sh: Update generation script for NVFP4
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/auto_round/deepseek/run_generate.sh