" ValueError: max() arg is an empty sequence " while converting mamba 2 hybrid checkpoint to nemo #10182

Closed
SkanderBS2024 opened this issue Aug 16, 2024 · 4 comments

SkanderBS2024 commented Aug 16, 2024

Describe the bug

As described in the title, the conversion fails with this error after finishing all of the installs and building NeMo and Megatron-LM from source; the model was trained with Megatron-LM.

Steps/Code to reproduce bug

Output from running scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py on the Megatron-LM checkpoint:

[NeMo W 2024-08-16 12:43:58 nemo_logging:349] /workspace/megatron/Megatron-LM/megatron/core/tensor_parallel/layers.py:280: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
      def forward(ctx, input, weight, bias, allreduce_dgrad):

[NeMo W 2024-08-16 12:43:58 nemo_logging:349] /workspace/megatron/Megatron-LM/megatron/core/tensor_parallel/layers.py:290: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
      def backward(ctx, grad_output):

[NeMo W 2024-08-16 12:43:58 nemo_logging:349] /workspace/megatron/Megatron-LM/megatron/core/tensor_parallel/layers.py:380: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
      def forward(

[NeMo W 2024-08-16 12:43:58 nemo_logging:349] /workspace/megatron/Megatron-LM/megatron/core/tensor_parallel/layers.py:419: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
      def backward(ctx, grad_output):

[WARNING  | megatron.core.dist_checkpointing.strategies.zarr]: `zarr` distributed checkpoint backend is deprecated. Please switch to PyTorch Distributed format (`torch_dist`).
[NeMo W 2024-08-16 12:43:59 nemo_logging:349] /workspace/megatron/Megatron-LM/megatron/core/dist_checkpointing/strategies/torch.py:22: DeprecationWarning: `torch.distributed._sharded_tensor` will be deprecated, use `torch.distributed._shard.sharded_tensor` instead
      from torch.distributed._sharded_tensor import ShardedTensor as TorchShardedTensor

[NeMo W 2024-08-16 12:43:59 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/modelopt/torch/quantization/tensor_quant.py:84: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
      scaled_e4m3_abstract = torch.library.impl_abstract("trt::quantize_fp8")(

[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:164: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
      def forward(ctx, xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,

[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py:240: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
      def backward(ctx, dout):

[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:959: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
      def forward(

[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/layer_norm.py:1018: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
      def backward(ctx, dout, *args):

[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:26: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
      def forward(ctx, x, weight, bias, process_group=None, sequence_parallel=True):

[NeMo W 2024-08-16 12:44:02 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/distributed/tensor_parallel.py:62: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
      def backward(ctx, grad_output):

[NeMo W 2024-08-16 12:44:03 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:736: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
      def forward(ctx, zxbcdt, conv1d_weight, conv1d_bias, dt_bias, A, D, chunk_size, initial_states=None, seq_idx=None, dt_limit=(0.0, float("inf")), return_final_states=False, activation="silu",

[NeMo W 2024-08-16 12:44:03 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/triton/ssd_combined.py:814: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
      def backward(ctx, dout, *args):

Traceback (most recent call last):
  File "/workspace/nemo/NeMo/scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py", line 190, in <module>
    convert(args)
  File "/workspace/nemo/NeMo/scripts/checkpoint_converters/convert_mamba2_pyt_to_nemo.py", line 115, in convert
    num_layers = max(layer_numbers) + 1
ValueError: max() arg is an empty sequence

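The failure is in how the converter derives num_layers: it collects layer indices from the loaded state dict, and here that collection comes back empty, so max() raises. A minimal sketch of the failure mode, assuming the script gathers layer numbers by matching checkpoint keys against an expected prefix (the prefix, regex, and variable names below are illustrative, not the script's exact code):

    import re

    # Hypothetical checkpoint whose keys do not use the prefix the converter expects,
    # e.g. one produced by a different Megatron-LM version or model config.
    state_dict = {"model.layers.0.mixer.in_proj.weight": None}

    # Collect layer indices by matching the expected key prefix (illustrative pattern).
    layer_numbers = [
        int(m.group(1))
        for key in state_dict
        if (m := re.match(r"decoder\.layers\.(\d+)\.", key))
    ]

    # If nothing matched, layer_numbers is empty and the next line raises
    # "ValueError: max() arg is an empty sequence", exactly as in the traceback above.
    num_layers = max(layer_numbers) + 1

In other words, the weights can be present but stored under key names the converter does not recognize, which can happen with a checkpoint/converter version mismatch (see the discussion below).
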
Expected behavior

Expected to convert the mamba trained model to a .nemo format for fine-tuning.

Environment overview (please complete the following information)

  • Environment location: Ubuntu, Docker, FluidStack VM with 2 × A100 80 GB GPUs.
  • Method of NeMo install: installed NeMo from source and integrated Megatron-LM from source.
  • If method of install is [Docker], provide docker pull & docker run commands used:

Docker pull command:

 sudo docker pull nvcr.io/nvidia/pytorch:24.07-py3 

Docker run command:

docker run --gpus all -it --rm --ipc=host \
  --shm-size=40g \
  -v /ephemeral/megatron:/workspace/megatron \
  -v /ephemeral/data:/workspace/dataset/data \
  -v /ephemeral/outfix:/workspace/dataset/outfix \
  -v /ephemeral/tok:/workspace/dataset/tok \
  -v /ephemeral/checkpoints:/workspace/checkpoints \
  -v /ephemeral/nemo:/workspace/nemo \
  nvcr.io/nvidia/pytorch:24.07-py3 

Environment details

If an NVIDIA Docker image is used, you don't need to specify these.
Otherwise, please provide:

  • OS version : Ubuntu 22.04.3 LTS
  • PyTorch version : 2.4
  • Python version : 3.10.12

Additional context

NVIDIA PyTorch container: 24.07 (assuming training was done with 24.03)
GPUs: 2 × A100 80 GB

Followed the steps in tutorials/llm/mamba/mamba.rst.

@SkanderBS2024 SkanderBS2024 added the bug label Aug 16, 2024
@JRD971000 (Collaborator) commented

Hi @SkanderBS2024, I see you are not using the NeMo container nvcr.io/nvidia/nemo:24.07; you are mounting NeMo into the PyTorch container instead. I tested the conversion script in nvcr.io/nvidia/nemo:24.07 and it works fine. However, the latest main needs an update, for which I have raised PR #10224. You can either check out that PR or use the 24.07 NeMo container. Thanks for reporting the issue!
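
For anyone diagnosing a similar failure, printing the checkpoint's state-dict keys before conversion shows whether the layer naming matches what the converter looks for. A minimal sketch, assuming a single PyTorch checkpoint file as the converter takes; the path below is a placeholder:

    import torch

    # Placeholder path to the Megatron-LM Mamba2 checkpoint file.
    ckpt = torch.load("/workspace/checkpoints/model.pt", map_location="cpu")

    # Megatron checkpoints often nest the weights under a "model" key; fall back to the top level.
    state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

    # Print a sample of keys to compare against the layer-name pattern the converter matches.
    for key in list(state_dict.keys())[:20]:
        print(key)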

@SkanderBS2024 (Author) commented

Hello @JRD971000, yes, I used the nvcr.io/nvidia/nemo:24.07 container and everything worked fine. Thank you for your response.

github-actions bot commented

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Sep 21, 2024
github-actions bot commented

This issue was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Sep 28, 2024