
Device side assert triggered on AWQ Mistral converted model #2562

Closed
kdcyberdude opened this issue Feb 14, 2024 · 2 comments

Comments

@kdcyberdude

I converted the TheBloke/Starling-LM-7B-alpha-AWQ model using the following command:
python tools/convert_HF.py --model_dir TheBloke/Starling-LM-7B-alpha-AWQ --output ./Starling-LM-7B-alpha-AWQ-onmt/ --format pytorch --nshards 1

I am unable to run inference on the converted model; I get the error below.
Command I am using to run: python translate.py --config ./Starling-LM-7B-alpha-AWQ-onmt/inference.yaml --src ./input_prompt.txt --output ./output.txt
input_prompt.txt content:
GPT-4 User: How do you manage stress?<|end_of_turn|>GPT4 Assistant:

Traceback (most recent call last):
  File "/mnt/sea/c2/OpenNMT-py/translate.py", line 6, in <module>
    main()
  File "/mnt/sea/c2/OpenNMT-py/onmt/bin/translate.py", line 47, in main
    translate(opt)
  File "/mnt/sea/c2/OpenNMT-py/onmt/bin/translate.py", line 22, in translate
    _, _ = engine.infer_file()
  File "/mnt/sea/c2/OpenNMT-py/onmt/inference_engine.py", line 35, in infer_file
    scores, preds = self._translate(infer_iter)
  File "/mnt/sea/c2/OpenNMT-py/onmt/inference_engine.py", line 159, in _translate
    scores, preds = self.translator._translate(
  File "/mnt/sea/c2/OpenNMT-py/onmt/translate/translator.py", line 496, in _translate
    batch_data = self.translate_batch(batch, attn_debug)
  File "/mnt/sea/c2/OpenNMT-py/onmt/translate/translator.py", line 1067, in translate_batch
    return self._translate_batch_with_strategy(batch, decode_strategy)
  File "/mnt/sea/c2/OpenNMT-py/onmt/translate/translator.py", line 1149, in _translate_batch_with_strategy
    decode_strategy.advance(log_probs, attn)
  File "/mnt/sea/c2/OpenNMT-py/onmt/translate/beam_search.py", line 432, in advance
    super(BeamSearchLM, self).advance(log_probs, attn)
  File "/mnt/sea/c2/OpenNMT-py/onmt/translate/beam_search.py", line 379, in advance
    self.is_finished_list = self.topk_ids.eq(self.eos).tolist()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
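This `Indexing.cu` assert almost always means an embedding lookup received a token id greater than or equal to the number of rows in the embedding table, i.e. a mismatch between the tokenizer's vocabulary and the converted checkpoint's. A minimal CPU sketch of the failing condition (the sizes are hypothetical, not read from this checkpoint):

```python
# Sketch: the device-side assert fires when an index passed to an
# embedding lookup is >= the number of rows in the embedding matrix.
import torch

vocab_size = 32000                       # hypothetical checkpoint vocab size
emb = torch.nn.Embedding(vocab_size, 8)  # toy embedding table

token_ids = torch.tensor([1, 5, 32001])  # 32001 is out of range

# On CPU an out-of-range lookup raises IndexError immediately; on CUDA
# the same lookup fails asynchronously with the opaque
# `srcIndex < srcSelectDimSize` assertion seen above.
bad = token_ids[token_ids >= vocab_size]
print(bad.tolist())  # → [32001]
```

A quick first check here would be to compare the sentencepiece model's vocabulary size against the converted checkpoint's embedding shape, and to run with `CUDA_LAUNCH_BLOCKING=1` (or on CPU) to get an exact stack trace.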

One more question: I do not understand the example prompts provided for the Mistral model, specifically the tokens used there, e.g. ⦅newline⦆. I'd appreciate an explanation or a documentation link for this.
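For reference, ⦅newline⦆ appears to be the placeholder OpenNMT-py uses to encode real line breaks inside a prompt, since translate.py reads one example per line of the source file. A sketch of that convention (these helpers are illustrative, not part of the library):

```python
# OpenNMT-py reads one example per line, so a multi-line prompt is
# flattened to a single line by replacing "\n" with the placeholder
# token ⦅newline⦆ before inference, and restoring it afterwards.
def to_single_line(prompt: str) -> str:
    return prompt.replace("\n", "⦅newline⦆")

def from_single_line(line: str) -> str:
    return line.replace("⦅newline⦆", "\n")

prompt = "GPT4 User: Hi\nGPT4 Assistant:"
flat = to_single_line(prompt)
print(flat)                              # single line, no "\n" left
assert from_single_line(flat) == prompt  # round-trips losslessly
```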

@vince62s
Member

Maybe use the forum instead, and give more details, like the YAML content.
https://forum.opennmt.net/latest

@kdcyberdude
Author

My inference.yaml config file content:

transforms: [sentencepiece]

src_subword_model: "Starling-LM-7B-alpha-AWQ-onmt/tokenizer.model"
tgt_subword_model: "Starling-LM-7B-alpha-AWQ-onmt/tokenizer.model"

model: "Starling-LM-7B-alpha-AWQ-onmt/Starling-LM-7B-alpha-AWQ-onmt.pt"

seed: 13
max_length: 256
gpu: 0
batch_type: sents
batch_size: 60
world_size: 1
gpu_ranks: [0]

precision: fp16
beam_size: 1
n_best: 1
profile: false
report_time: true
src: None
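One detail worth noting in this config (a hypothetical sanity check, assuming PyYAML is available): `src: None` in YAML parses as the *string* "None", not as a null value; YAML nulls are written `null` or `~`.

```python
# YAML has no keyword `None`: it parses as a plain string, which may
# not be what an option parser expecting a null/absent value wants.
import yaml

cfg = yaml.safe_load("""
beam_size: 1
src: None
""")

print(type(cfg["src"]))   # <class 'str'> -- "None" is just a string
print(yaml.safe_load("src: null"))  # {'src': None} -- a real null
```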

I have added the topic to the forum as well: https://forum.opennmt.net/t/device-side-assert-triggered-on-awq-mistral-converted-model/5656
