
Device side assert triggered on AWQ Mistral converted model #2562

Closed
kdcyberdude opened this issue Feb 14, 2024 · 2 comments

Comments

@kdcyberdude

I converted the TheBloke/Starling-LM-7B-alpha-AWQ model using the following command:
python tools/convert_HF.py --model_dir TheBloke/Starling-LM-7B-alpha-AWQ --output ./Starling-LM-7B-alpha-AWQ-onmt/ --format pytorch --nshards 1

I am unable to run inference on the converted model; I get the error below.
Command I am using to run: python translate.py --config ./Starling-LM-7B-alpha-AWQ-onmt/inference.yaml --src ./input_prompt.txt --output ./output.txt
input_prompt.txt content:
GPT-4 User: How do you manage stress?<|end_of_turn|>GPT4 Assistant:

Traceback (most recent call last):
  File "/mnt/sea/c2/OpenNMT-py/translate.py", line 6, in <module>
    main()
  File "/mnt/sea/c2/OpenNMT-py/onmt/bin/translate.py", line 47, in main
    translate(opt)
  File "/mnt/sea/c2/OpenNMT-py/onmt/bin/translate.py", line 22, in translate
    _, _ = engine.infer_file()
  File "/mnt/sea/c2/OpenNMT-py/onmt/inference_engine.py", line 35, in infer_file
    scores, preds = self._translate(infer_iter)
  File "/mnt/sea/c2/OpenNMT-py/onmt/inference_engine.py", line 159, in _translate
    scores, preds = self.translator._translate(
  File "/mnt/sea/c2/OpenNMT-py/onmt/translate/translator.py", line 496, in _translate
    batch_data = self.translate_batch(batch, attn_debug)
  File "/mnt/sea/c2/OpenNMT-py/onmt/translate/translator.py", line 1067, in translate_batch
    return self._translate_batch_with_strategy(batch, decode_strategy)
  File "/mnt/sea/c2/OpenNMT-py/onmt/translate/translator.py", line 1149, in _translate_batch_with_strategy
    decode_strategy.advance(log_probs, attn)
  File "/mnt/sea/c2/OpenNMT-py/onmt/translate/beam_search.py", line 432, in advance
    super(BeamSearchLM, self).advance(log_probs, attn)
  File "/mnt/sea/c2/OpenNMT-py/onmt/translate/beam_search.py", line 379, in advance
    self.is_finished_list = self.topk_ids.eq(self.eos).tolist()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
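This `Indexing.cu` assert almost always means an embedding lookup received a token id greater than or equal to the number of rows in the embedding table, i.e. a mismatch between the tokenizer's vocabulary and the converted checkpoint's. A minimal CPU sketch of the failing condition (the sizes are hypothetical, not read from this checkpoint):

```python
# Sketch: the device-side assert fires when an index passed to an
# embedding lookup is >= the number of rows in the embedding matrix.
import torch

vocab_size = 32000                       # hypothetical checkpoint vocab size
emb = torch.nn.Embedding(vocab_size, 8)  # toy embedding table

token_ids = torch.tensor([1, 5, 32001])  # 32001 is out of range

# On CPU an out-of-range lookup raises IndexError immediately; on CUDA
# the same lookup fails asynchronously with the opaque
# `srcIndex < srcSelectDimSize` assertion seen above.
bad = token_ids[token_ids >= vocab_size]
print(bad.tolist())  # → [32001]
```

A quick first check here would be to compare the sentencepiece model's vocabulary size against the converted checkpoint's embedding shape, and to run with `CUDA_LAUNCH_BLOCKING=1` (or on CPU) to get an exact stack trace.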

One more question: I do not understand the example prompts provided for the Mistral model, specifically the tokens used there, e.g. ⦅newline⦆. I'd appreciate an explanation or a documentation link for this.
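For reference, ⦅newline⦆ appears to be the placeholder OpenNMT-py uses to encode real line breaks inside a prompt, since translate.py reads one example per line of the source file. A sketch of that convention (these helpers are illustrative, not part of the library):

```python
# OpenNMT-py reads one example per line, so a multi-line prompt is
# flattened to a single line by replacing "\n" with the placeholder
# token ⦅newline⦆ before inference, and restoring it afterwards.
def to_single_line(prompt: str) -> str:
    return prompt.replace("\n", "⦅newline⦆")

def from_single_line(line: str) -> str:
    return line.replace("⦅newline⦆", "\n")

prompt = "GPT4 User: Hi\nGPT4 Assistant:"
flat = to_single_line(prompt)
print(flat)                              # single line, no "\n" left
assert from_single_line(flat) == prompt  # round-trips losslessly
```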

@vince62s
Member

Maybe use the forum instead, and give more details, like the YAML content.
https://forum.opennmt.net/latest

@kdcyberdude
Author

My inference.yaml config file content:

transforms: [sentencepiece]

src_subword_model: "Starling-LM-7B-alpha-AWQ-onmt/tokenizer.model"
tgt_subword_model: "Starling-LM-7B-alpha-AWQ-onmt/tokenizer.model"

model: "Starling-LM-7B-alpha-AWQ-onmt/Starling-LM-7B-alpha-AWQ-onmt.pt"

seed: 13
max_length: 256
gpu: 0
batch_type: sents
batch_size: 60
world_size: 1
gpu_ranks: [0]

precision: fp16
beam_size: 1
n_best: 1
profile: false
report_time: true
src: None
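One detail worth noting in this config (a hypothetical sanity check, assuming PyYAML is available): `src: None` in YAML parses as the *string* "None", not as a null value; YAML nulls are written `null` or `~`.

```python
# YAML has no keyword `None`: it parses as a plain string, which may
# not be what an option parser expecting a null/absent value wants.
import yaml

cfg = yaml.safe_load("""
beam_size: 1
src: None
""")

print(type(cfg["src"]))   # <class 'str'> -- "None" is just a string
print(yaml.safe_load("src: null"))  # {'src': None} -- a real null
```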

I have added the topic to the forum as well: https://forum.opennmt.net/t/device-side-assert-triggered-on-awq-mistral-converted-model/5656
