Checking high priority models #145

Closed · 11 tasks done
gyulaz-htec opened this issue Oct 24, 2023 · 12 comments

gyulaz-htec commented Oct 24, 2023

Yolov5

The model passes, but I will leave the repro steps for getting the onnx model here (a compile-check sketch follows the steps):

  1. clone the https://github.com/ultralytics/yolov5 repo
  2. check export.py for additional requirements
  3. generate the model with python3 export.py --weights yolov5s.pt --include onnx
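
The exported model can also be compile-checked from Python; a minimal sketch, assuming the MIGraphX Python bindings are installed and export.py produced yolov5s.onnx in the working directory:

    # Minimal compile check of the exported model (sketch).
    import migraphx

    prog = migraphx.parse_onnx("yolov5s.onnx")
    print(prog.get_parameter_shapes())        # export.py typically produces an "images" input of 1x3x640x640
    prog.compile(migraphx.get_target("gpu"))
    print("compiled OK")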

@gyulaz-htec

Vicuna

vicuna-7b-v1.5 passes. Repro steps:

  1. https://github.com/lm-sys/FastChat points to Hugging Face, so optimum can be used
  2. optimum-cli export onnx --model lmsys/vicuna-7b-v1.5 vicuna-7b-v_1_5
    This fails on our available MI210 machines because we run out of RAM during weight processing, but the onnx file is still generated as decoder_model.onnx, which can be compiled (a Python sketch of the same export is shown below).

The same applies to vicuna-7b-v1.5-16k
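
For reference, the same export can be driven from optimum's Python API; a sketch, assuming optimum is installed with its ONNX exporter extras (it will presumably hit the same RAM limit during weight processing):

    # Python equivalent of the optimum-cli call above (sketch).
    # The task argument is an assumption (plain decoder export without past key/values).
    from optimum.exporters.onnx import main_export

    main_export(
        model_name_or_path="lmsys/vicuna-7b-v1.5",
        output="vicuna-7b-v_1_5",
        task="text-generation",
    )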

gyulaz-htec commented Oct 24, 2023

LLaMA-2 7B

Hugging Face Llama-2-7b-hf and Llama-2-7b-chat-hf compile; however, the export fails similarly to Vicuna.
Optimum command to get the model: optimum-cli export onnx --model meta-llama/Llama-2-7b-hf ./Llama-2-7b-h
We still have to look into https://github.com/ggerganov/llama.cpp

gyulaz-htec commented Oct 24, 2023

Stable Diffusion 2.1

Optimum command to get stable-diffusion-2-1: optimum-cli export onnx --model stabilityai/stable-diffusion-2-1 ./stable-diffusion-2-1
The models successfully compile with the following commands:

migraphx-driver compile sd_2-1/vae_decoder/model.onnx --input-dim @latent_sample 2 4 64 64 --gpu
migraphx-driver compile sd_2-1/vae_encoder/model.onnx --input-dim @sample 2 3 512 512 --gpu
migraphx-driver compile sd_2-1/unet/model.onnx --input-dim @sample 2 4 64 64 @timestep 1 @encoder_hidden_states 2 64 1024 --fp16
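
The same compile can also be driven from Python; a sketch for the UNet command above, assuming the MIGraphX Python bindings (map_input_dims and quantize_fp16 are used here to mirror --input-dim and --fp16):

    # Sketch: compile the exported UNet with the same fixed shapes as the driver call above.
    import migraphx

    prog = migraphx.parse_onnx(
        "sd_2-1/unet/model.onnx",
        map_input_dims={
            "sample": [2, 4, 64, 64],
            "timestep": [1],
            "encoder_hidden_states": [2, 64, 1024],
        },
    )
    migraphx.quantize_fp16(prog)              # mirrors the --fp16 flag
    prog.compile(migraphx.get_target("gpu"))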

gyulaz-htec changed the title from "Check models AMD interested in" to "Check high priority models" Oct 24, 2023
gyulaz-htec changed the title from "Check high priority models" to "Checking high priority models" Oct 24, 2023
music-dino commented Oct 25, 2023

Whisper

Optimum command: optimum-cli export onnx --model openai/whisper-large whisper/
Optimum fails toward the end, but the model gets generated successfully and can be compiled with MIGraphX.

@gyulaz-htec

GPT-J 6B

Optimum command: optimum-cli export onnx --model EleutherAI/gpt-j-6B gpt-j/
The model compiles with the following migraphx command:
migraphx-driver compile optimum_models/gpt-j/decoder_model.onnx --fill1 input_ids attention_mask --input-dim @input_ids 1 64 --input-dim @attention_mask 1 64
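
--fill1 feeds all-ones tensors for the listed inputs; a sketch of the equivalent compile-and-run from Python, assuming the MIGraphX Python bindings (paths and shapes match the command above):

    # Sketch: compile the decoder with fixed {1, 64} shapes and run it with all-ones
    # input_ids / attention_mask, which is what --fill1 does on the driver side.
    import numpy as np
    import migraphx

    prog = migraphx.parse_onnx(
        "optimum_models/gpt-j/decoder_model.onnx",
        map_input_dims={"input_ids": [1, 64], "attention_mask": [1, 64]},
    )
    prog.compile(migraphx.get_target("gpu"))

    params = {
        "input_ids": migraphx.argument(np.ones((1, 64), dtype=np.int64)),
        "attention_mask": migraphx.argument(np.ones((1, 64), dtype=np.int64)),
    }
    outputs = prog.run(params)
    print(np.array(outputs[0]).shape)   # logits; expected (1, 64, 50400) for GPT-J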

@gyulaz-htec

Inception v3

The model compiles with migraphx-driver.
To generate the onnx model from the pytorch hub model, use this python script.
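
The linked script is not reproduced here; a minimal sketch of what such an export script might look like (the weights argument, input name, and 299x299 size are assumptions):

    # Sketch (not the linked script): export the torch hub Inception v3 model to ONNX.
    import torch

    model = torch.hub.load("pytorch/vision", "inception_v3", weights="DEFAULT")
    model.eval()

    dummy = torch.randn(1, 3, 299, 299)   # Inception v3 expects 299x299 inputs
    torch.onnx.export(model, dummy, "inception_v3.onnx",
                      input_names=["input"], output_names=["output"])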

@gyulaz-htec

RetinaNet

The model compiles with migraphx-driver.
To generate the onnx model (with a ResNet50 backbone) from the pytorch hub model, use this python script.
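
As with Inception v3, the linked script is not reproduced here; a sketch of such an export (the 800x800 size and opset 11 are assumptions):

    # Sketch (not the linked script): export torchvision RetinaNet (ResNet50 backbone) to ONNX.
    # Detection models take a list of 3D image tensors.
    import torch
    from torchvision.models.detection import retinanet_resnet50_fpn

    model = retinanet_resnet50_fpn(weights="DEFAULT")
    model.eval()

    dummy = [torch.randn(3, 800, 800)]
    torch.onnx.export(model, (dummy,), "retinanet_resnet50.onnx", opset_version=11)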

gyulaz-htec commented Oct 31, 2023

MLSR

We checked two of the top trending SR models from Hugging Face: A2N and AWSRN-BAM. Both have 2x, 3x and 4x scale versions.
These are only available as PyTorch models; the download and conversion scripts: A2N, AWSRN

Results

  • AWSRN-BAM models - failing with:
    migraphx-driver: /code/AMDMIGraphX/src/targets/gpu/lowering.cpp:76: void migraphx::gpu::miopen_apply::check_shape(shape, instruction_ref): Assertion 'x == i->get_shape()' failed.
    Which comes from here
    • The failing instruction (i) is a reshape with shape (1, 3, 170, 170)
    • x is non-standard, i is standard
    • The model compiles when add_reshape_lazy_op is disabled
    • lowering log
  • A2N scale 2 - passing
  • A2N scale 3 - passing
  • A2N scale 4 - Fails with:
MIOpen(HIP): Error [Do] 'amd_comgr_do_action(kind, handle, in.GetHandle(), out.GetHandle())' AMD_COMGR_ACTION_ASSEMBLE_SOURCE_TO_RELOCATABLE: ERROR (1)
MIOpen(HIP): Error [BuildAsm] comgr status = ERROR (1)
MIOpen(HIP): Warning [BuildAsm] warning: argument unused during compilation: '-nogpulib' [-Wunused-command-line-argument]                             
<instantiation>:3:13: error: Error: Immediate offset is too large for buffer_load instruction                                                         
            .error "Error: Immediate offset is too large for buffer_load instruction"                                                                 
            ^                                                                                                                                         
<instantiation>:25:9: note: while in macro instantiation                                                                                              
        .single_vload line_base, s_off, mbufs_cnt_A, 2, 1                                                                                             
        ^                                                                                                                                             
<instantiation>:17:13: note: while in macro instantiation                                                                                             
            .load_input_line line_base, s_off, mbufs_cnt_A                                                                                            
            ^                                                                                                                                         
<instantiation>:1:1: note: while in macro instantiation                                                                                               
.rept input_lines_per_sgpr                                                                                                                            
^                                                                                                                                                     
<instantiation>:5:5: note: while in macro instantiation
    .load_input_lines_on_same_sgpr input_lines_per_sgpr, mbufs_cnt_A
    ^
/tmp/comgr-acc642/input/conv3x3.s:1605:3: note: while in macro instantiation
  .load_input linesA, mbufs_cnt_A
  ^
<instantiation>:31:1: error: unmatched .ifs or .elses

^
<instantiation>:25:9: note: while in macro instantiation
        .single_vload line_base, s_off, mbufs_cnt_A, 2, 1
        ^
<instantiation>:17:13: note: while in macro instantiation
            .load_input_line line_base, s_off, mbufs_cnt_A
            ^
<instantiation>:1:1: note: while in macro instantiation
.rept input_lines_per_sgpr
^
<instantiation>:5:5: note: while in macro instantiation
    .load_input_lines_on_same_sgpr input_lines_per_sgpr, mbufs_cnt_A
    ^
/tmp/comgr-acc642/input/conv3x3.s:1605:3: note: while in macro instantiation
  .load_input linesA, mbufs_cnt_A
  ^

MIOpen Error: /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/hipoc/hipoc_program.cpp:304: Code object build failed. Source: conv3x3.s
terminate called after throwing an instance of 'migraphx::version_2_9_0::exception'                                                                   
  what():  /code/AMDMIGraphX/src/targets/gpu/include/migraphx/gpu/miopen.hpp:114: find_solution: MIOpen: miopenFindSolutions failed 

The convolution details:

x=[1, 24, 340, 340]
w=[24, 24, 3, 3]
p=[1, 1, 1, 1]
s=[1, 1]
d=[1, 1]
g=1
workspace_size=99878400

This hits the offset limit: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/src/kernels/conv3x3.s#L463-L466

kahmed10 commented Nov 7, 2023

Whisper

    Optimum command: optimum-cli export onnx --model openai/whisper-large whisper/
    Optimum fails toward the end, but the model gets generated successfully and can be compiled with MIGraphX.

I did not see this error on my end, and I got both an encoder model and a decoder model.
To run the encoder model (using whisper-tiny as an example):
./bin/driver perf /onnx/whisper-tiny/encoder_model.onnx --input-dim @input_features 1 80 3000
To run the decoder model:
./bin/driver perf /onnx/whisper-tiny/decoder_model.onnx --fill1 input_ids --input-dim @input_ids 1 256 @encoder_hidden_states 1 256 384
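
For context, the 1 80 3000 input_features shape is what the Whisper feature extractor produces: 80 mel bins over 3000 frames (30 s of 16 kHz audio, padded). A sketch using transformers to generate such an input:

    # Sketch: build an encoder input of shape (1, 80, 3000) for whisper-tiny.
    import numpy as np
    from transformers import WhisperFeatureExtractor

    fe = WhisperFeatureExtractor.from_pretrained("openai/whisper-tiny")
    audio = np.zeros(16000 * 5, dtype=np.float32)   # 5 s of silence as a stand-in
    feats = fe(audio, sampling_rate=16000, return_tensors="np").input_features
    print(feats.shape)                              # (1, 80, 3000)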

attila-dusnoki-htec commented Nov 22, 2023

Stable Diffusion 2.1

These models fail with the latest develop on MI200

With reshape lazy enabled:

  • Text Encoder, UNet, VAE-Decoder
    • src/include/migraphx/op/reshape_lazy.hpp:238: static_compute_shape: reshape_lazy on axis that is not packed.

Without reshape lazy:

  • Text Encoder compiles
  • VAE-Decoder, UNet (ref version compiles)
    • check_shapes.hpp:296: packed_layouts: gpu::convolution: Shapes are not packed with correct layout

attila-dusnoki-htec commented Nov 27, 2023

LLaMA-2 7B

    Huggingface Llama-2-7b-hf and Llama-2-7b-chat-hf are compiling, however the export fails similarly to vicuna
    Optimum command to get the model: optimum-cli export onnx --model meta-llama/Llama-2-7b-hf ./Llama-2-7b-h
    We still have to look into https://github.com/ggerganov/llama.cpp

Ignore this: since it is a decoder, it will generate tokens one by one, so it makes sense to use a {1, 1} shape.
It compiles without any arguments, but with that, input_ids and attn_mask will be {1, 1}.

Changing them to e.g. {1, 4096} (the largest supported size) will fail with:
migraphx-driver compile model_zoo/llama2-7b-hf/decoder_model.onnx --input-dim @input_dims 1 4096 @attention_mask 1 4096

operator: MatMul
/code/AMDMIGraphX/src/include/migraphx/op/dot.hpp:93: compute_shape: DOT: static inner dimensions do not match: {1, 32, 1, 4096} x {1, 32, 1, 128}

The Microsoft version fails with the following:

migraphx-driver read 7B_float32/ONNX/LlamaV2_7B_float32.onnx --input-dim @x 1 2048 4096 @k_cache 1 32 2048 32 128 @v_cache 1 32 2048 32 128 @pos 1 @attn_mask 1 2048 2048

operator: Slice
/code/AMDMIGraphX/src/include/migraphx/check_shapes.hpp:157: only_dims: SLICE: inputs (starts, ends, and input_axes): Only 1d supported

The onnxruntime-converted version fails with:

migraphx-driver compile llama2-7b-hf-ort/rank_0_Llama-2-7b-hf_decoder_merged_model_fp32_opt.onnx --input-dim @input_dims 1 4096 @attention_mask 1 4096

operator: Add
/code/AMDMIGraphX/src/common.cpp:48: operator(): COMPUTE_BROADCASTLEN: shape {1, 1, 1, 4096} and {1, 1, 1, 2} mismatch!
