I have converted the TheBloke/Starling-LM-7B-alpha-AWQ model using the following command:

python tools/convert_HF.py --model_dir TheBloke/Starling-LM-7B-alpha-AWQ --output ./Starling-LM-7B-alpha-AWQ-onmt/ --format pytorch --nshards 1

However, I am not able to run inference on the converted model; I get the error below.

Command I am running:

python translate.py --config ./Starling-LM-7B-alpha-AWQ-onmt/inference.yaml --src ./input_prompt.txt --output ./output.txt

input_prompt.txt content:

GPT-4 User: How do you manage stress?<|end_of_turn|>GPT4 Assistant:
Traceback (most recent call last):
File "/mnt/sea/c2/OpenNMT-py/translate.py", line 6, in <module>
main()
File "/mnt/sea/c2/OpenNMT-py/onmt/bin/translate.py", line 47, in main
translate(opt)
File "/mnt/sea/c2/OpenNMT-py/onmt/bin/translate.py", line 22, in translate
_, _ = engine.infer_file()
File "/mnt/sea/c2/OpenNMT-py/onmt/inference_engine.py", line 35, in infer_file
scores, preds = self._translate(infer_iter)
File "/mnt/sea/c2/OpenNMT-py/onmt/inference_engine.py", line 159, in _translate
scores, preds = self.translator._translate(
File "/mnt/sea/c2/OpenNMT-py/onmt/translate/translator.py", line 496, in _translate
batch_data = self.translate_batch(batch, attn_debug)
File "/mnt/sea/c2/OpenNMT-py/onmt/translate/translator.py", line 1067, in translate_batch
return self._translate_batch_with_strategy(batch, decode_strategy)
File "/mnt/sea/c2/OpenNMT-py/onmt/translate/translator.py", line 1149, in _translate_batch_with_strategy
decode_strategy.advance(log_probs, attn)
File "/mnt/sea/c2/OpenNMT-py/onmt/translate/beam_search.py", line 432, in advance
super(BeamSearchLM, self).advance(log_probs, attn)
File "/mnt/sea/c2/OpenNMT-py/onmt/translate/beam_search.py", line 379, in advance
self.is_finished_list = self.topk_ids.eq(self.eos).tolist()
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [98,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1292: indexSelectLargeIndex: block: [228,0,0], thread: [99,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
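From what I can tell, the `srcIndex < srcSelectDimSize` assertion usually means some token id fed into an embedding/index lookup is greater than or equal to the table size, i.e. an id outside the model's vocabulary. A minimal sanity check I wrote while debugging (plain Python, not OpenNMT-py code; the vocab size here is just an example value, the real one comes from the converted vocab):

```python
# Hypothetical sanity check: the device-side assert
# `srcIndex < srcSelectDimSize` fires when a token id is outside
# the valid row range [0, vocab_size) of the embedding table.
vocab_size = 32002  # example value; read the real size from the converted vocab

def out_of_range_ids(token_ids, vocab_size):
    """Return the ids that would trigger the device-side assert."""
    return [t for t in token_ids if t < 0 or t >= vocab_size]

print(out_of_range_ids([1, 29871, 32001], vocab_size))  # []
print(out_of_range_ids([1, 32002, 40000], vocab_size))  # [32002, 40000]
```

So my suspicion is a mismatch between the tokenizer/vocab produced by the conversion and the embedding size of the model, possibly related to the added special tokens such as <|end_of_turn|>.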
One more question: I don't fully understand the example prompts provided for the Mistral model, in particular tokens such as ⦅newline⦆. I'd appreciate an explanation or a link to documentation.
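My current understanding (please correct me if this is wrong) is that ⦅newline⦆ is a placeholder for a literal newline, used because the --src file holds one example per line, so multi-line prompts must be flattened. A small sketch of how I would encode and decode it under that assumption (hypothetical helper names, not OpenNMT-py code):

```python
# Assumption: ⦅newline⦆ stands in for "\n" so a multi-line prompt
# fits on a single line of the --src file.
NEWLINE_TOKEN = "⦅newline⦆"

def encode_prompt(prompt: str) -> str:
    """Flatten a multi-line prompt to one src-file line."""
    return prompt.replace("\n", NEWLINE_TOKEN)

def decode_output(text: str) -> str:
    """Restore literal newlines in the model output."""
    return text.replace(NEWLINE_TOKEN, "\n")

line = encode_prompt("GPT4 User: Hi\nGPT4 Assistant:")
print(line)                    # single line, safe for a one-example-per-line src file
print(decode_output(line))     # original two-line prompt
```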