Failed to load data for backend XnnpackBackend #3848

Open
allzero-kwon opened this issue Jun 5, 2024 · 6 comments
allzero-kwon commented Jun 5, 2024

I'm trying to build llava_encoder using XNNPACK and the Android toolchain. Whenever I attempt to pass inputs to my model, it fails to delegate data to the XNNPACK backend.

I've already checked that the .pte model works with xnn_executor_runner on Linux (command below):

./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=./xnn_llava_encoder.pte

and the input tensor shape was the same as the Android input tensor: (1, 3, 336, 336).

If anyone can help, I would greatly appreciate it.

type=1327 audit(1717601983.365:799): proctitle="com.example.executorchdemo"
File /data/user/0/com.example.executorchdemo/files/xnn_llava_encoder.pte: offset 5320704 + size 4198960 > file_size_ 8912896
Failed to load data for backend XnnpackBackend
Process: com.example.executorchdemo, PID: 886
   java.lang.Exception: Execution of method forward failed with status 0x12
    at org.pytorch.executorch.NativePeer.forward(Native Method)
    at org.pytorch.executorch.Module.forward(Module.java:56)
    at com.example.executorchdemo.MainActivity.run(MainActivity.java:199)
    at java.lang.Thread.run(Thread.java:1012)
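The key line is the bounds check above: offset 5320704 + size 4198960 = 9519664, which runs past the reported file_size_ of 8912896, so the runtime is trying to read XNNPACK delegate data beyond the end of the .pte on the device; the file in the app's files directory appears truncated or stale. A quick sanity check, as a minimal sketch (the path is illustrative), is to compare the host-side size of the exported program against the file_size_ in the log:

import os

# Illustrative host-side path to the exported program; adjust as needed.
pte_path = "xnn_llava_encoder.pte"

# If this differs from file_size_ 8912896 in the device log, the copy on the
# device is not the file that was exported.
print(f"host .pte size: {os.path.getsize(pte_path)} bytes")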

<MainActivity.java>

// Scale the bitmap to the encoder's expected 336x336 input and convert it to a
// float32 tensor with the standard torchvision normalization.
mBitmap = Bitmap.createScaledBitmap(mBitmap, 336, 336, true);
final Tensor inputTensor =
    TensorImageUtils.bitmapToFloat32Tensor(
        mBitmap,
        TensorImageUtils.TORCHVISION_NORM_MEAN_RGB,
        TensorImageUtils.TORCHVISION_NORM_STD_RGB);
Tensor outputTensor = mModule.forward(EValue.from(inputTensor))[0].toTensor();
iseeyuan commented Jun 6, 2024

@allzero-kwon Thanks for the feedback! How did you export xnn_llava_encoder.pte? Could you share the command?

allzero-kwon commented

@iseeyuan

I exported the .pte with XnnpackPartitioner following this guide.

I added the lines below to examples/models/llava_encoder/model.py and ran python3 examples/models/llava_encoder/model.py:


# LlavaModel and _model_name come from the surrounding examples/models/llava_encoder/model.py;
# the imports below are the torch.export / ExecuTorch APIs used in this snippet.
import torch
from torch.export import export
from executorch.exir import EdgeProgramManager, to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

model = LlavaModel()
with torch.no_grad():
    print('# 1. torch.export: Defines the program with the ATen operator set.')
    exported_program = export(model.get_eager_model(), model.get_example_inputs())

    print('# 2. to_edge: Make optimizations for Edge devices')
    edge: EdgeProgramManager = to_edge(exported_program)

    # Delegate supported subgraphs to the XNNPACK backend.
    edge = edge.to_backend(XnnpackPartitioner())
    print(edge.exported_program().graph_module)

    print('# 3. to_executorch: Convert the graph to an ExecuTorch program')
    exec_prog = edge.to_executorch()

    print('# 4. Save the compiled .pte program')
    with open(f"xnn_{_model_name}_encoder.pte", "wb") as file:
        exec_prog.write_to_file(file)

allzero-kwon commented Jun 7, 2024

@iseeyuan I resolved it! I exported the model with the aot_compiler script and got a result.
I guess there are some differences between aot_compiler and my XnnpackPartitioner script; the number of subgraphs produced by xnnpack_partitioner looks different.
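(For reference, the non-quantized export command was along the lines of python -m examples.xnnpack.aot_compiler --model_name="llava_encoder" --delegate, i.e. the same invocation as the quantize attempt further down without --quantize.)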

BTW, I have another problem with inference.
As you know, llava-1.5-7b uses 'openai/clip-vit-large-patch14-336' as the vision_tower backbone model.
Because of its size (I guess), inference took almost one minute with the XNNPACK backend, so I tried to quantize it and failed again. :(

RuntimeError: Model llava_encoder is not a valid name. or not quantizable right now, please contact executorch team if you want to learn why or how to support quantization for the requested modelAvailable models are ['linear', 'add', 'add_mul', 'dl3', 'ic3', 'ic4', 'mv2', 'mv3', 'resnet18', 'resnet50', 'vit', 'w2l', 'edsr', 'mobilebert', 'llama2'].
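In case it's useful, here is a minimal sketch of what a manual PT2E quantization pass before XNNPACK lowering could look like, reusing the export snippet above instead of aot_compiler's built-in model list. I haven't verified this for llava_encoder; the quantizer config, calibration step, and output filename are illustrative.

import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
from torch.export import export
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

model = LlavaModel()  # from examples/models/llava_encoder/model.py, as above
eager = model.get_eager_model().eval()
example_inputs = model.get_example_inputs()

# 1. Capture the pre-autograd graph and annotate it with the XNNPACK quantizer.
captured = capture_pre_autograd_graph(eager, example_inputs)
quantizer = XNNPACKQuantizer()
quantizer.set_global(get_symmetric_quantization_config(is_per_channel=True))
prepared = prepare_pt2e(captured, quantizer)

# 2. Calibrate; a single pass over the example inputs here, real calibration data is better.
prepared(*example_inputs)
quantized = convert_pt2e(prepared)

# 3. Export, lower to XNNPACK, and serialize, same as the float path above.
exec_prog = (
    to_edge(export(quantized, example_inputs))
    .to_backend(XnnpackPartitioner())
    .to_executorch()
)
with open("xnn_llava_encoder_quantized.pte", "wb") as f:  # illustrative filename
    exec_prog.write_to_file(f)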

iseeyuan commented

@allzero-kwon Glad you've resolved it! Yes, export should be used to get the graph.

iseeyuan commented

@allzero-kwon could you share how you did the quantization?

digantdesai added the module: examples (Issues related to demos under examples directory) label on Jun 13, 2024
allzero-kwon commented

I tried it with:

python -m examples.xnnpack.aot_compiler --model_name="llava_encoder" --delegate --quantize
