Checking high priority models #145

Closed · 11 tasks done
gyulaz-htec opened this issue Oct 24, 2023 · 12 comments

gyulaz-htec commented Oct 24, 2023

Yolov5

The model passes, but I will leave the repro steps for getting the onnx model here (a compile-check sketch follows the steps):

  1. clone the https://github.com/ultralytics/yolov5 repo
  2. check export.py for additional requirements
  3. generate the model with python3 export.py --weights yolov5s.pt --include onnx
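
The exported model can also be compile-checked from Python; a minimal sketch, assuming the MIGraphX Python bindings are installed and export.py produced yolov5s.onnx in the working directory:

    # Minimal compile check of the exported model (sketch).
    import migraphx

    prog = migraphx.parse_onnx("yolov5s.onnx")
    print(prog.get_parameter_shapes())        # export.py typically produces an "images" input of 1x3x640x640
    prog.compile(migraphx.get_target("gpu"))
    print("compiled OK")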

@gyulaz-htec

Vicuna

vicuna-7b-v1.5 passes. Repro steps:

  1. https://github.com/lm-sys/FastChat points to Hugging Face, so optimum can be used
  2. optimum-cli export onnx --model lmsys/vicuna-7b-v1.5 vicuna-7b-v_1_5
    This fails on our available MI210 machines because we run out of RAM during weight processing, but the onnx file is still generated as decoder_model.onnx, which can be compiled (a Python sketch of the same export is shown below).

The same applies to vicuna-7b-v1.5-16k
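
For reference, the same export can be driven from optimum's Python API; a sketch, assuming optimum is installed with its ONNX exporter extras (it will presumably hit the same RAM limit during weight processing):

    # Python equivalent of the optimum-cli call above (sketch).
    # The task argument is an assumption (plain decoder export without past key/values).
    from optimum.exporters.onnx import main_export

    main_export(
        model_name_or_path="lmsys/vicuna-7b-v1.5",
        output="vicuna-7b-v_1_5",
        task="text-generation",
    )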

gyulaz-htec commented Oct 24, 2023

LLaMA-2 7B

Hugging Face Llama-2-7b-hf and Llama-2-7b-chat-hf compile; however, the export fails similarly to Vicuna.
Optimum command to get the model: optimum-cli export onnx --model meta-llama/Llama-2-7b-hf ./Llama-2-7b-h
We still have to look into https://github.com/ggerganov/llama.cpp

gyulaz-htec commented Oct 24, 2023

Stable Diffusion 2.1

Optimum command to get stable-diffusion-2-1: optimum-cli export onnx --model stabilityai/stable-diffusion-2-1 ./stable-diffusion-2-1
The models successfully compile with the following commands:

migraphx-driver compile sd_2-1/vae_decoder/model.onnx --input-dim @latent_sample 2 4 64 64 --gpu
migraphx-driver compile sd_2-1/vae_encoder/model.onnx --input-dim @sample 2 3 512 512 --gpu
migraphx-driver compile sd_2-1/unet/model.onnx --input-dim @sample 2 4 64 64 @timestep 1 @encoder_hidden_states 2 64 1024 --fp16
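
The same compile can also be driven from Python; a sketch for the UNet command above, assuming the MIGraphX Python bindings (map_input_dims and quantize_fp16 are used here to mirror --input-dim and --fp16):

    # Sketch: compile the exported UNet with the same fixed shapes as the driver call above.
    import migraphx

    prog = migraphx.parse_onnx(
        "sd_2-1/unet/model.onnx",
        map_input_dims={
            "sample": [2, 4, 64, 64],
            "timestep": [1],
            "encoder_hidden_states": [2, 64, 1024],
        },
    )
    migraphx.quantize_fp16(prog)              # mirrors the --fp16 flag
    prog.compile(migraphx.get_target("gpu"))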

gyulaz-htec changed the title from "Check models AMD interested in" to "Check high priority models" Oct 24, 2023
gyulaz-htec changed the title from "Check high priority models" to "Checking high priority models" Oct 24, 2023
music-dino commented Oct 25, 2023

Whisper

Optimum command: optimum-cli export onnx --model openai/whisper-large whisper/
Optimum fails toward the end, but the model gets generated successfully and can be compiled with MIGraphX.

@gyulaz-htec

GPT-J 6B

Optimum command: optimum-cli export onnx --model EleutherAI/gpt-j-6B gpt-j/
The model compiles with the following migraphx command:
migraphx-driver compile optimum_models/gpt-j/decoder_model.onnx --fill1 input_ids attention_mask --input-dim @input_ids 1 64 --input-dim @attention_mask 1 64
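
--fill1 feeds all-ones tensors for the listed inputs; a sketch of the equivalent compile-and-run from Python, assuming the MIGraphX Python bindings (paths and shapes match the command above):

    # Sketch: compile the decoder with fixed {1, 64} shapes and run it with all-ones
    # input_ids / attention_mask, which is what --fill1 does on the driver side.
    import numpy as np
    import migraphx

    prog = migraphx.parse_onnx(
        "optimum_models/gpt-j/decoder_model.onnx",
        map_input_dims={"input_ids": [1, 64], "attention_mask": [1, 64]},
    )
    prog.compile(migraphx.get_target("gpu"))

    params = {
        "input_ids": migraphx.argument(np.ones((1, 64), dtype=np.int64)),
        "attention_mask": migraphx.argument(np.ones((1, 64), dtype=np.int64)),
    }
    outputs = prog.run(params)
    print(np.array(outputs[0]).shape)   # logits; expected (1, 64, 50400) for GPT-J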

@gyulaz-htec

Inception v3

The model compiles with migraphx-driver.
To generate the onnx model from the pytorch hub model, use this python script.
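
The linked script is not reproduced here; a minimal sketch of what such an export script might look like (the weights argument, input name, and 299x299 size are assumptions):

    # Sketch (not the linked script): export the torch hub Inception v3 model to ONNX.
    import torch

    model = torch.hub.load("pytorch/vision", "inception_v3", weights="DEFAULT")
    model.eval()

    dummy = torch.randn(1, 3, 299, 299)   # Inception v3 expects 299x299 inputs
    torch.onnx.export(model, dummy, "inception_v3.onnx",
                      input_names=["input"], output_names=["output"])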

@gyulaz-htec

RetinaNet

The model compiles with migraphx-driver.
To generate the onnx model (with a ResNet50 backbone) from the pytorch hub model, use this python script.
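
As with Inception v3, the linked script is not reproduced here; a sketch of such an export (the 800x800 size and opset 11 are assumptions):

    # Sketch (not the linked script): export torchvision RetinaNet (ResNet50 backbone) to ONNX.
    # Detection models take a list of 3D image tensors.
    import torch
    from torchvision.models.detection import retinanet_resnet50_fpn

    model = retinanet_resnet50_fpn(weights="DEFAULT")
    model.eval()

    dummy = [torch.randn(3, 800, 800)]
    torch.onnx.export(model, (dummy,), "retinanet_resnet50.onnx", opset_version=11)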

gyulaz-htec commented Oct 31, 2023

MLSR

We checked two of the top trending SR models from Hugging Face: A2N and AWSRN-BAM. Both have 2x, 3x and 4x scale versions.
These are only available as PyTorch models; the download and conversion scripts: A2N, AWSRN

Results

  • AWSRN-BAM models - failing with:
    migraphx-driver: /code/AMDMIGraphX/src/targets/gpu/lowering.cpp:76: void migraphx::gpu::miopen_apply::check_shape(shape, instruction_ref): Assertion 'x == i->get_shape()' failed.
    Which comes from here
    • The failing instruction (i) is a reshape with shape (1, 3, 170, 170)
    • x is non-standard, i is standard
    • The model compiles when add_reshape_lazy_op is disabled
    • lowering log
  • A2N scale 2 - passing
  • A2N scale 3 - passing
  • A2N scale 4 - Fails with:
MIOpen(HIP): Error [Do] 'amd_comgr_do_action(kind, handle, in.GetHandle(), out.GetHandle())' AMD_COMGR_ACTION_ASSEMBLE_SOURCE_TO_RELOCATABLE: ERROR (1)
MIOpen(HIP): Error [BuildAsm] comgr status = ERROR (1)
MIOpen(HIP): Warning [BuildAsm] warning: argument unused during compilation: '-nogpulib' [-Wunused-command-line-argument]                             
<instantiation>:3:13: error: Error: Immediate offset is too large for buffer_load instruction                                                         
            .error "Error: Immediate offset is too large for buffer_load instruction"                                                                 
            ^                                                                                                                                         
<instantiation>:25:9: note: while in macro instantiation                                                                                              
        .single_vload line_base, s_off, mbufs_cnt_A, 2, 1                                                                                             
        ^                                                                                                                                             
<instantiation>:17:13: note: while in macro instantiation                                                                                             
            .load_input_line line_base, s_off, mbufs_cnt_A                                                                                            
            ^                                                                                                                                         
<instantiation>:1:1: note: while in macro instantiation                                                                                               
.rept input_lines_per_sgpr                                                                                                                            
^                                                                                                                                                     
<instantiation>:5:5: note: while in macro instantiation
    .load_input_lines_on_same_sgpr input_lines_per_sgpr, mbufs_cnt_A
    ^
/tmp/comgr-acc642/input/conv3x3.s:1605:3: note: while in macro instantiation
  .load_input linesA, mbufs_cnt_A
  ^
<instantiation>:31:1: error: unmatched .ifs or .elses

^
<instantiation>:25:9: note: while in macro instantiation
        .single_vload line_base, s_off, mbufs_cnt_A, 2, 1
        ^
<instantiation>:17:13: note: while in macro instantiation
            .load_input_line line_base, s_off, mbufs_cnt_A
            ^
<instantiation>:1:1: note: while in macro instantiation
.rept input_lines_per_sgpr
^
<instantiation>:5:5: note: while in macro instantiation
    .load_input_lines_on_same_sgpr input_lines_per_sgpr, mbufs_cnt_A
    ^
/tmp/comgr-acc642/input/conv3x3.s:1605:3: note: while in macro instantiation
  .load_input linesA, mbufs_cnt_A
  ^

MIOpen Error: /long_pathname_so_that_rpms_can_package_the_debug_info/src/extlibs/MLOpen/src/hipoc/hipoc_program.cpp:304: Code object build failed. Source: conv3x3.s
terminate called after throwing an instance of 'migraphx::version_2_9_0::exception'                                                                   
  what():  /code/AMDMIGraphX/src/targets/gpu/include/migraphx/gpu/miopen.hpp:114: find_solution: MIOpen: miopenFindSolutions failed 

The convolution details:

x=[1, 24, 340, 340]
w=[24, 24, 3, 3]
p=[1, 1, 1, 1]
s=[1, 1]
d=[1, 1]
g=1
workspace_size=99878400

This hits the offset limit: https://github.com/ROCmSoftwarePlatform/MIOpen/blob/develop/src/kernels/conv3x3.s#L463-L466

kahmed10 commented Nov 7, 2023

Whisper

    Optimum command: optimum-cli export onnx --model openai/whisper-large whisper/
    Optimum fails toward the end, but the model gets generated successfully and can be compiled with MIGraphX.

I did not see this error on my end, and I got both an encoder model and a decoder model.
To run the encoder model (using whisper-tiny as an example):
./bin/driver perf /onnx/whisper-tiny/encoder_model.onnx --input-dim @input_features 1 80 3000
To run the decoder model:
./bin/driver perf /onnx/whisper-tiny/decoder_model.onnx --fill1 input_ids --input-dim @input_ids 1 256 @encoder_hidden_states 1 256 384
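
For context, the 1 80 3000 input_features shape is what the Whisper feature extractor produces: 80 mel bins over 3000 frames (30 s of 16 kHz audio, padded). A sketch using transformers to generate such an input:

    # Sketch: build an encoder input of shape (1, 80, 3000) for whisper-tiny.
    import numpy as np
    from transformers import WhisperFeatureExtractor

    fe = WhisperFeatureExtractor.from_pretrained("openai/whisper-tiny")
    audio = np.zeros(16000 * 5, dtype=np.float32)   # 5 s of silence as a stand-in
    feats = fe(audio, sampling_rate=16000, return_tensors="np").input_features
    print(feats.shape)                              # (1, 80, 3000)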

attila-dusnoki-htec commented Nov 22, 2023

Stable Diffusion 2.1

These models fail with the latest develop on MI200

With reshape lazy enabled:

  • Text Encoder, UNet, VAE-Decoder
    • src/include/migraphx/op/reshape_lazy.hpp:238: static_compute_shape: reshape_lazy on axis that is not packed.

Without reshape lazy:

  • Text Encoder compiles
  • VAE-Decoder, UNet (ref version compiles)
    • check_shapes.hpp:296: packed_layouts: gpu::convolution: Shapes are not packed with correct layout

attila-dusnoki-htec commented Nov 27, 2023

LLaMA-2 7B

    Huggingface Llama-2-7b-hf and Llama-2-7b-chat-hf are compiling, however the export fails similarly to vicuna
    Optimum command to get the model: optimum-cli export onnx --model meta-llama/Llama-2-7b-hf ./Llama-2-7b-h
    We still have to look into https://github.com/ggerganov/llama.cpp

Ignore this: since it is a decoder, it will generate tokens one by one, so it makes sense to use a {1, 1} shape.
It compiles without any arguments, but with that, input_ids and attn_mask will be {1, 1}.

Changing them to e.g. {1, 4096} (the largest supported size) will fail with:
migraphx-driver compile model_zoo/llama2-7b-hf/decoder_model.onnx --input-dim @input_dims 1 4096 @attention_mask 1 4096

operator: MatMul
/code/AMDMIGraphX/src/include/migraphx/op/dot.hpp:93: compute_shape: DOT: static inner dimensions do not match: {1, 32, 1, 4096} x {1, 32, 1, 128}

The Microsoft version fails with the following:

migraphx-driver read 7B_float32/ONNX/LlamaV2_7B_float32.onnx --input-dim @x 1 2048 4096 @k_cache 1 32 2048 32 128 @v_cache 1 32 2048 32 128 @pos 1 @attn_mask 1 2048 2048

operator: Slice
/code/AMDMIGraphX/src/include/migraphx/check_shapes.hpp:157: only_dims: SLICE: inputs (starts, ends, and input_axes): Only 1d supported

The onnxruntime-converted version fails with:

migraphx-driver compile llama2-7b-hf-ort/rank_0_Llama-2-7b-hf_decoder_merged_model_fp32_opt.onnx --input-dim @input_dims 1 4096 @attention_mask 1 4096

operator: Add
/code/AMDMIGraphX/src/common.cpp:48: operator(): COMPUTE_BROADCASTLEN: shape {1, 1, 1, 4096} and {1, 1, 1, 2} mismatch!
