FP16 with ZeRO and Gradient Accumulation in the Configuration File #1190
-
Hi, is there a specific way to define fp16 with ZeRO and gradient accumulation in the configuration file? Does defining ZeRO in the configuration file automatically handle mixed precision and use fp16 for training, so that you would not even need to define an fp16 dictionary? I receive an error when I define both in the configuration file. For example, this is my current configuration:

```python
from colossalai.amp import AMP_TYPE
from colossalai.zero.shard_utils import TensorShardStrategy

fp16 = dict(
    mode = AMP_TYPE.TORCH,
    init_scale = 2.**16,
    growth_factor = 2.0,
    backoff_factor = 0.5,
    growth_interval = 2000,
    enabled = True
)

zero = dict(
    model_config = dict(
        shard_strategy = TensorShardStrategy(),
        tensor_placement_policy = 'cpu',
        reuse_fp16_shard = False
    )
)

gradient_accumulation = 4
clip_grad_norm = 1.0
```

Also, as a side note, how should we cite ColossalAI? Is there a preferred method?

Thank you,
Enrico
-
Hi Enrico,

ZeRO converts the master weights to fp16 during training and converts them back to fp32 for the optimizer step, so you do not need to set an fp16 configuration when ZeRO is used in your training.
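In that case, a minimal sketch of the configuration, assuming the same config-file layout as in the question with the fp16 dictionary simply dropped, could look like this:

```python
from colossalai.zero.shard_utils import TensorShardStrategy

# ZeRO handles the fp16 casting itself, so no separate fp16 dict is defined.
zero = dict(
    model_config = dict(
        shard_strategy = TensorShardStrategy(),
        tensor_placement_policy = 'cpu',   # keep sharded tensors on CPU
        reuse_fp16_shard = False
    )
)

# Gradient accumulation and clipping can stay as before.
gradient_accumulation = 4
clip_grad_norm = 1.0
```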