[zero] Suggests a minor change to confusing variable names in the ZeRO optimizer. #3173
Conversation
Hi @yhna940 Thanks for your contribution, but your naming is a little confusing. For each …
The code coverage for the changed files is 85%.
Thank you for your feedback. I understand the concern about the naming conventions and appreciate your suggestion. However, I would like to propose the alternative terms `working_param` and `master_param`, which come from the mixed-precision training context. I hope this explanation clarifies my reasoning. To summarize my suggestions: `fp16` -> `working`, `fp32` -> `master`.
Hi @1SAA Based on our previous discussions, I have renamed the variables that were causing confusion related to mixed precision. Could you please review them once again? Thank you!
Hi @yhna940 Thanks for your contribution, but there are some conflicts in this PR. Could you please resolve them first? Thanks.
Hello @binmakeswell, the conflicts have been resolved. Thank you!
Hi @yhna940 I am willing to merge your PR once the CI tests pass.
Hi @1SAA
GitHub Actions CI logs
Local test log
Hi @yhna940 It seems like there exists a memory leak in our tests. I think this problem is not caused by your code. I will fix this error soon. |
@yhna940 Can you sync our main branch updates first? We've already fixed the CI test, so after that we can merge this PR. Thanks.
@ver217 I have synced the main branch updates to this PR. Thanks for letting me know that the CI test issue has been resolved. |
The code coverage for the changed files is 86%.
…O optimizer (#183)

## Title
- [zero] Suggests a minor change to confusing variable names in the ZeRO optimizer

## Description
It seems that the variable names related to the mixed-precision parameter group do not comprehensively reflect its characteristics, so I suggest a few changes. These changes are very trivial, but hopefully they will alleviate some of the confusion for beginners like me. Currently, the entire parameter group is named `fp16_param_groups`, and the parts managed by the GPU at the current rank are described as `fp32_flat_param_groups_of_current_rank`. These names perfectly represent the characteristics when the master weight is a half tensor or the dtype specified in the `__init__` method is fp16. In other cases, however, the characteristics no longer correspond to the variable names. I would like to propose the alternative terms `working_param` and `master_param`, which are more closely related to the mixed-precision training context. Using `working_param` and `master_param` would create a clear distinction between the two types of parameters and help avoid confusion. To summarize my suggestions:
- `fp16` -> `working`
- `fp32` -> `master`

## Linked Issues
- N/A

## Reference
- hpcaitech/ColossalAI#3173
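To make the working/master distinction concrete, here is a minimal, generic PyTorch sketch of the pattern (not the ColossalAI ZeRO implementation; the tensor shape and learning rate are arbitrary). The point is that the working copy's dtype is configurable, e.g. bf16, which is exactly why dtype-based names like `fp16_param_groups` can mislead:

```python
import torch

working_dtype = torch.bfloat16  # could equally be torch.float16

# Working params: the low-precision tensors used for forward/backward.
working_params = [torch.randn(4, 4, dtype=working_dtype, requires_grad=True)]

# Master params: fp32 copies that the optimizer actually updates.
master_params = [p.detach().clone().float().requires_grad_(True) for p in working_params]
optimizer = torch.optim.SGD(master_params, lr=0.1)

# One training step in the working precision.
loss = working_params[0].sum()
loss.backward()

# Upcast gradients from working to master, step, then sync the update back.
for wp, mp in zip(working_params, master_params):
    mp.grad = wp.grad.float()
optimizer.step()
optimizer.zero_grad()
with torch.no_grad():
    for wp, mp in zip(working_params, master_params):
        wp.copy_(mp.to(working_dtype))
```

Under this naming, "working" and "master" describe each tensor's role regardless of which dtypes are configured.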
📌 Checklist before creating the PR
🚨 Issue number
N/A
📝 What does this PR do?
It seems that the variable names related to the mixed-precision parameter group do not comprehensively reflect its characteristics, so I suggest a few changes. These changes are very trivial, but hopefully they will alleviate some of the confusion for beginners like me.
Currently, the entire parameter group is named `fp16_param_groups`, and the parts managed by the GPU at the current rank are described as `fp32_flat_param_groups_of_current_rank`. These names perfectly represent the characteristics when the master weight is a half tensor or the dtype specified in the `__init__` method is fp16. In other cases, however, the characteristics no longer correspond to the variable names. So I would like them to be renamed according to the sharding state rather than the data type, following the `fsdp` convention of `pytorch` (with names like `flatten_sharded_optim_state_dict` and `full_optim_state_dict`).

This is a related but more trivial point: it seems that the `param_store` methods don't even need to specify fp16.

Thank you :)
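For concreteness, the rename discussed in this thread amounts to a mapping like the one below. The "before" names are quoted from the PR text; the "after" names use the `working`/`master` terms the discussion settled on, shown here as a hypothetical sketch rather than the actual ColossalAI attribute list:

```python
# Hypothetical sketch of the proposed rename (illustrative names only).
# Before: names tied to a concrete dtype, which mislead when the working
# dtype is bf16 or the master weights are kept in half precision.
# After: names tied to each parameter's role in mixed-precision training.
RENAMES = {
    "fp16_param_groups": "working_param_groups",
    "fp32_flat_param_groups_of_current_rank": "master_flat_param_groups_of_current_rank",
}
```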
💥 Checklist before requesting a review
⭐️ Do you enjoy contributing to Colossal-AI?
Tell us more if you don't enjoy contributing to Colossal-AI.