I am running examples_deepspeed/generate_text.sh.
Currently, I can run this script successfully on 1 node with 8 GPUs when experts=1.
But when I set experts=8, errors occur.
The complete error output is as follows:
using world size: 8, data-parallel-size: 8, sequence-parallel size: 1, tensor-model-parallel size: 1, pipeline-model-parallel size: 1
> building GPT2BPETokenizer tokenizer ...
> padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
> initializing torch distributed ...
[2024-01-22 08:37:16,393] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-22 08:37:16,399] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-22 08:37:16,399] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-01-22 08:37:16,648] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-22 08:37:16,668] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-22 08:37:16,669] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-22 08:37:16,687] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-22 08:37:16,852] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-22 08:37:16,856] [INFO] [comm.py:637:init_distributed] cdb=None
> initialized tensor model parallel with size 1
> initialized pipeline model parallel with size 1
> setting random seeds to 1234 ...
> initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/home/ai/jrtPain/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/home/ai/jrtPain/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.061 seconds
WARNING: constraints for invoking optimized fused softmax kernel are not met. We default back to unfused kernel invocations.
> compiling and loading fused kernels ...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ai/jrtPain/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ai/jrtPain/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ai/jrtPain/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_softmax_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 2.871 seconds
building GPT model ...
[2024-01-22 08:37:20,393] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
[2024-01-22 08:37:20,430] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
[2024-01-22 08:37:20,463] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
[2024-01-22 08:37:20,488] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
[2024-01-22 08:37:20,518] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
[2024-01-22 08:37:20,547] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
[2024-01-22 08:37:20,577] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
[2024-01-22 08:37:20,620] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
[2024-01-22 08:37:20,654] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
[2024-01-22 08:37:20,686] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
[2024-01-22 08:37:20,719] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
[2024-01-22 08:37:20,749] [INFO] [logging.py:96:log_dist] [Rank 0] Creating MoE layer with num_experts: 8 | num_local_experts: 8 | expert_parallel_size: 1
Emitting ninja build file /home/ai/.cache/torch_extensions/py310_cu121/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.11441206932067871 seconds
Traceback (most recent call last):
File "/home/ai/jrtPain/Megatron-DeepSpeed/tools/generate_samples_gpt.py", line 178, in <module>
main()
File "/home/ai/jrtPain/Megatron-DeepSpeed/tools/generate_samples_gpt.py", line 141, in main
model = ds_inference(model, args)
File "/home/ai/jrtPain/Megatron-DeepSpeed/tools/generate_samples_gpt.py", line 164, in ds_inference
engine = deepspeed.init_inference(model=model,
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/__init__.py", line 342, in init_inference
engine = InferenceEngine(model, config=ds_inference_config)
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 158, in __init__
self._apply_injection_policy(config)
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/inference/engine.py", line 418, in _apply_injection_policy
replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 342, in replace_transformer_layer
replaced_module = replace_module(model=model,
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 586, in replace_module
replaced_module, _ = _replace_module(model, policy, state_dict=sd)
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 646, in _replace_module
_, layer_id = _replace_module(child,
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 646, in _replace_module
_, layer_id = _replace_module(child,
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 646, in _replace_module
_, layer_id = _replace_module(child,
[Previous line repeated 1 more time]
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 622, in _replace_module
replaced_module = policies[child.__class__][0](child,
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 298, in replace_fn
new_module = replace_with_policy(child,
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/module_inject/replace_module.py", line 250, in replace_with_policy
_container.transpose()
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/module_inject/containers/features/megatron.py", line 28, in transpose
super().transpose()
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/module_inject/containers/base.py", line 286, in transpose
self.transpose_mlp()
File "/home/ai/miniconda3/envs/jrt-singlegpu-success/lib/python3.10/site-packages/deepspeed/module_inject/containers/base.py", line 295, in transpose_mlp
self._h4h_w = self.transpose_impl(self._h4h_w.data)
AttributeError: 'list' object has no attribute 'data'
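The traceback suggests that with experts=8 the MoE MLP weights are stored as a list of per-expert tensors, while DeepSpeed's inference container code (`transpose_mlp` calling `self.transpose_impl(self._h4h_w.data)`) assumes a single weight tensor. The sketch below is a minimal, self-contained reproduction of that failure pattern — `transpose_impl`, `dense_h4h_w`, and `moe_h4h_w` are illustrative names, not DeepSpeed's actual internals:

```python
import torch

def transpose_impl(data):
    # Stand-in for the container's transpose: expects a single 2-D tensor.
    return data.t().contiguous()

# Dense MLP case (experts=1): one weight tensor, transpose works.
dense_h4h_w = torch.randn(4, 8)
print(transpose_impl(dense_h4h_w.data).shape)  # torch.Size([8, 4])

# MoE case (experts=8): one weight tensor per expert, kept in a list.
# Calling `.data` on the list itself reproduces the AttributeError above.
moe_h4h_w = [torch.randn(4, 8) for _ in range(8)]
try:
    transpose_impl(moe_h4h_w.data)
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'data'

# A list-aware version would have to transpose each expert weight separately:
transposed = [transpose_impl(w.data) for w in moe_h4h_w]
print(len(transposed), transposed[0].shape)
```

This points at the injection policy in `replace_module.py` not handling MoE containers during `deepspeed.init_inference`, rather than a problem in the generation script itself.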
The versions in my environment are:
deepspeed 0.12.6
torch 2.1.1
transformers 4.25.0