Android app - Error - Attempted to resize a static tensor to a new shape at dimension 0 #1350
The difference between the Lite Interpreter (PyTorch Mobile) and ExecuTorch is that, in ExecuTorch, we plan memory ahead of time, which helps us reuse and reduce memory at runtime. What is the dynamic part in the original PyTorch model? Can the dynamic part be upper-bounded? Reference doc: https://pytorch.org/executorch/stable/compiler-memory-planning.html |
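For reference, declaring and upper-bounding a dynamic dimension at export time looked roughly like this with the torch.export API current at the time of this thread (later releases replace dynamic_dim with torch.export.Dim); the module and shapes below are placeholders, not the model under discussion:

```python
import torch
from torch.export import export, dynamic_dim

# Placeholder module standing in for the trajectory model.
class TinyEncoder(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2.0

x = torch.randn(1, 12, 4)  # (batch, seq_len, features); seq_len varies

# Mark dim 1 dynamic and give it an upper bound so memory can still be
# planned ahead of time for the worst case.
seq_len = dynamic_dim(x, 1)
exported = export(TinyEncoder(), (x,), constraints=[seq_len >= 2, seq_len <= 64])
```
|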
@cccclai Thanks for the response. Please bear with me, as I am a beginner with PyTorch (my background is Java and Android application development). I did not create the model I am using (https://github.com/sharonrichushaji/trajectory-prediction-transformers/tree/master); I modified it slightly to accommodate my datasets (produced by my Android application). When you ask about the dynamic part of the model, could you please clarify? Dynamic with respect to which variables/parts of the model? The model is an attention-based Transformer network. Depending on what you mean by dynamic, being an encoder/decoder model, the input from one part of the model to another is dynamic. I read the document you referenced in your message. Would it make sense for me to use the approach it describes? Thanks |
What this means is that the inputs you provide will be (a) copied into the memory planned for them during the memory-planning pass, if IO was part of memory planning, OR (b) not copied, since memory planning did not plan for them. By default IO is planned, and hence if you follow this, https://github.com/pytorch/executorch/blob/main/examples/demo-apps/android/jni/jni_layer.cpp#L345, you will see that the output returned from the executor is referenced directly, and there is a comment on the lifetime of the pointer referenced by the output tensor. Now, with respect to dynamic size, there are a couple of things.
|
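To make the IO-planning behavior above concrete, here is a rough sketch of configuring the memory-planning pass when emitting a .pte file, based on the memory-planning doc linked earlier; treat the exact flags and signatures as approximate for any given ExecuTorch version, and the module as a placeholder:

```python
import torch
from torch.export import export
from executorch.exir import to_edge, ExecutorchBackendConfig
from executorch.exir.passes import MemoryPlanningPass

# Placeholder module standing in for the trajectory model.
class TinyEncoder(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2.0

exported = export(TinyEncoder(), (torch.randn(1, 12, 4),))
program = to_edge(exported).to_executorch(
    ExecutorchBackendConfig(
        memory_planning_pass=MemoryPlanningPass(
            memory_planning_algo="greedy",
            # With these True (the default), IO is part of the plan and
            # the runtime copies your inputs into the planned buffers.
            alloc_graph_input=True,
            alloc_graph_output=True,
        )
    )
)
with open("model.pte", "wb") as f:
    f.write(program.buffer)
```
|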
@kimishpatel Thanks!
a) It looks like I may have a problem, as XNNPACK does not support shape dynamism and the input sequences in the model I use are of varying length.
b) Your assumption is correct. I expected to be able to provide input of varying size.
c) Given what @cccclai said in his comment regarding the difference between PyTorch Mobile and ExecuTorch, and also based on your point 1 above, I will try the torch.export.dynamic_dim() API.
d) Why is delegation (e.g. via XNNPACK) necessary in order to lower a model onto an edge device using ExecuTorch? Sorry for bringing it up again. With PyTorch Mobile it was not (unless I am mistaken). Is there an alternative to using XNNPACK?
Thanks |
Delegation is for delegating part of, or the whole of, a model to some powerful backend on the device. Different edge devices may have different backends. XNNPACK (https://github.com/google/XNNPACK) is one of the most powerful backends on CPU. For example, on iOS there are other powerful backends (https://github.com/pytorch/executorch/tree/main/backends/apple) like Core ML and MPS, and Qualcomm chipsets have their own. In PyTorch Mobile, XNNPACK is pretty much a default backend, and it runs after we call optimize_for_mobile. |
@cccclai Thanks. |
@adonnini to provide some context: in PyTorch Mobile, the optimize_for_mobile step rewrites parts of your model to run on XNNPACK under the hood. With ExecuTorch, this process is a bit more involved. Essentially, a model is represented by default using the Edge IR. However, since XNNPACK is a powerful library, we provide a delegate which will consume the Edge IR and convert the model to XNNPACK's representation. The converted graph can then be executed using XNNPACK. Essentially, ExecuTorch provides more control over how your model executes. Regarding your initial issue, would you mind sharing how you produced your model? As mentioned before, XNNPACK doesn't support dynamic shapes. |
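For concreteness, the Edge IR flow described above looks roughly like the sketch below; calling to_backend() is exactly the delegation step, and omitting it is the alternative asked about earlier, leaving the whole model on ExecuTorch's portable operator library. The module and inputs are placeholders:

```python
import torch
from torch.export import export
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import (
    XnnpackPartitioner,
)

# Placeholder module standing in for the trajectory model.
class TinyEncoder(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2.0

edge = to_edge(export(TinyEncoder(), (torch.randn(1, 12, 4),)))

# Delegate the subgraphs XNNPACK understands; anything it cannot handle
# stays in Edge IR and runs on the portable operators instead.
edge = edge.to_backend(XnnpackPartitioner())

# Skipping the to_backend() call above is the "no delegation" path.
program = edge.to_executorch()
with open("model_xnnpack.pte", "wb") as f:
    f.write(program.buffer)
```
|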
@SS-JIA Here is a link to the model I use: https://github.com/sharonrichushaji/trajectory-prediction-transformers |
Hi, after fixing the dynamic_dim error (#1379) with @angelayi's greatly appreciated help, I tried once again to use the model for inference in my Android app. Unfortunately, the result was once again a failure. Please let me know if you need me to do anything and what I should do next. TRACEBACK LOG (it's long; I included all messages related to the failure rather than assuming what is relevant)
|
@SS-JIA in your comment above you state that XNNPACK doesn't support dynamic shapes. My model does use dynamic shapes, and I was able to run it for inference successfully from my Android application using the PyTorch Mobile runtime engine (skipping the optimization step). If I was able to run my model successfully using PyTorch Mobile because I skipped the optimization step, then why is there not a way to skip optimization when using ExecuTorch? This would seem to be a reasonable option to have. As far as I know, models with dynamic shapes are not the exception. How (and when) will it be possible to run models with dynamic shapes on Android devices using the ExecuTorch runtime engine? If the answer to both questions above is negative, then it looks like I will not be able to use ExecuTorch for my models. That would be really too bad. Please let me know if I misunderstood your comment and if I am missing something. Thanks |
@SS-JIA would it help if I sent you the .pte file produced when training my model using ExecuTorch? Also, here is a link to the model I use: https://github.com/sharonrichushaji/trajectory-prediction-transformers. I hope you will have the time to let me know how I should proceed. Thanks |
@mcr229 can you take a look at the dynamic shape support issue in XNNPACK? |
Hi @adonnini, the XNNPACK delegate can currently only support taking in inputs with static shapes. We are actively working on upstreaming dynamic shape support to XNNPACK, and once that is finished, we will be able to leverage it by updating our XNNPACK commit. |
Thanks for the update. As far as you can tell at the moment, is it a matter of weeks before you will update the XNNPACK commit? Just so that I can plan accordingly. Thanks
|
We expect to have this ready within the next two weeks. |
Thanks!
|
@adonnini I believe ExecuTorch does upper-bounded memory planning, and I know that the XNNPACK delegate does as well. I'm not entirely sure how ExecuTorch will do with very large max values with respect to memory planning, but the XNNPACK-delegated portions will use the upper bound to do initial memory planning. XNNPACK will actually be able to go above the maximum value; however, this comes at the cost of some performance, as we reallocate memory for the new, larger amount at that inference. I fear that too large a maximum value may cause XNNPACK to throw out-of-memory errors as it tries to allocate memory for extremely large intermediate tensors. So I would say to use the most realistic maximum tensor size. cc @JacobSzwejbka, @cccclai, @larryliu0820 for the dynamic memory planning |
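In code, this advice amounts to deriving the export-time bound from real data rather than picking a blanket large number; a small sketch extending the export example earlier in the thread, where dataset_sequences is a hypothetical list of per-trajectory tensors:

```python
import torch
from torch.export import export, dynamic_dim

# Hypothetical: one (steps, 4) tensor per recorded trajectory.
dataset_sequences = [torch.randn(n, 4) for n in (9, 23, 41)]

# The longest sequence actually observed, plus a small safety margin.
# A realistic bound keeps planned buffers small; inputs beyond it cost
# XNNPACK a reallocation, and an oversized bound risks failed
# allocations for very large intermediate tensors.
max_len = max(seq.shape[0] for seq in dataset_sequences) + 8

class TinyEncoder(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2.0

src = torch.randn(1, 12, 4)
seq_len = dynamic_dim(src, 1)
exported = export(
    TinyEncoder(), (src,), constraints=[seq_len >= 2, seq_len <= max_len]
)
```
|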
@mcr229 after adding the dynamic shapes code, execution failed, producing the traceback log reported below. For your reference, below you will also find the code that produced the failure. The ExecuTorch-related code is inserted in the training epoch loop; it runs after execution of a training step which, based on the logs, ran successfully. I am pointing this out because I find this line in the traceback log puzzling (not surprising):
Here is a print of their shapes:
Probably, I just don't understand the error statement. Please let me know what I should do next, and if you need any additional information. Thanks TRACEBACK LOG
CODE
|
@adonnini the statement means that a guard was generated during export that checks to ensure that
In the logs search for "guard added" and you should be able to see which line of model source code generated this guard. |
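As an illustration of what generates such guards: during export, any comparison between a runtime size and a constant records one. A hedged sketch (the module is illustrative, not the actual model), together with the logging switch that prints each guard next to the source line that caused it:

```python
import torch
from torch.export import export, dynamic_dim

class Branchy(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Comparing a symbolic size against a constant makes the tracer
        # record a guard (e.g. Ne(5000, s0)) so the trace stays valid.
        if x.size(1) == 5000:
            return x
        return x * 2.0

x = torch.randn(1, 12, 4)
exported = export(Branchy(), (x,), constraints=[dynamic_dim(x, 1) <= 64])

# Running the export with TORCH_LOGS="dynamic" set in the environment,
# e.g. TORCH_LOGS="dynamic" python train.py, prints every "guard added"
# line alongside the model source that generated it.
```
|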
@tarun292 I will do as you ask and let you know what I find.
I don't understand why this check would be done/enabled in the first place since, unless I am mistaken, the condition it checks for does not apply to my model. Thanks |
@adonnini before or after (near this log) there should be another print indicating which source line generated this guard. Are you able to see that? |
I think I may have found the source of this particular problem. Please bear with me. Before working with ExecuTorch, I used TorchScript to run the model for inference from my Android app using PyTorch Mobile. In order to do that, I had to change this line:
After making this change, I was able to use TorchScript to create a lowered model. Please note that model training and validation work equally well with either of the above lines enabled. I think the guard error is caused by the line I had to change, since the error log reports that the error occurred at that line (see above):
To support this conclusion: if I change back to the original line, code execution fails, producing the error log you will find below, which is different from the one I reported in this issue. Sorry about this. Please let me know if you have any questions, need me to do anything else, or what I should do next. Thanks ERROR LOG

guard added
I0528 16:24:03.115558 139740707194688 torch/fx/experimental/symbolic_shapes.py:4035] [0/0] eval Ne(5000, s0) [guard added] at Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/model.py:355 in forward (_dynamo/utils.py:1764 in run_node)
|
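For what it's worth, a plausible reconstruction of the kind of line at model.py:355, assuming it is the usual Transformer positional-encoding slice; the thread does not show the actual code, so this is an assumption:

```python
import torch

class PositionalEncoding(torch.nn.Module):
    # Hypothetical stand-in for the module around model.py:355.
    def __init__(self, d_model: int = 4, max_len: int = 5000):
        super().__init__()
        self.register_buffer("pe", torch.zeros(1, max_len, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Slicing the 5000-long buffer by the runtime length forces the
        # tracer to relate s0 = x.size(1) to the constant 5000, which is
        # where a guard like Ne(5000, s0) can come from.
        return x + self.pe[:, : x.size(1)]
```
|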
Nice, that's a good sign. Now, to get past this, can you try replacing:
with
|
@tarun292 I did as you suggested. TRACEBACK LOG
CODE
EXECUTION LOG PRECEDING TRACEBACK LOG
|
@adonnini I don't see the actual place where this guard would have been generated. Can you share the whole log output? Is it also possible to share a repro for this, or is that very hard? |
@tarun292 Below you will find the entire log produced by the execution. The repo (I think you meant repo, not repro?) for the model is at https://github.com/sharonrichushaji/trajectory-prediction-transformers. I posted the ExecuTorch-related code I use in my comment above under the heading "CODE". Please let me know if you need additional information, or need me to do anything. Thanks EXECUTION LOG
|
@adonnini I guess what I meant was: is there a way we can repro this on our end? It would be faster to debug that way. |
@tarun292 You are right. The reason I gave you the link to the repo of the model I am using was for you to add the ExecuTorch code I use to the train.py module and run the module as instructed in the repo's readme page. I can send you my ExecuTorch-related code again, tell you where I placed it, and let you know about a couple of other changes I made to train.py; or I could simply send you a copy of the train.py I am using, which I renamed train-minimum.py. Would this not work? Another way of reproducing my set-up I can think of is for me to create a repo with the code and give you access to it. Did you have something else (simpler) in mind? Please let me know. |
@adonnini I think sending a copy of train-minimum.py, along with the instructions to run it, should be good enough. |
@tarun292 I set up a public repository with a copy of my set-up. Here is a link to the README.MD Please let me know if you have problems accessing the repository or have any other problems. As I say in the Notes section, please keep in mind that I come from the Java world and am a Python beginner. I hope this helps. Thanks |
@tarun292 I hope I am not bothering you too much. I know you are busy (truly I know). When do you think you will get a chance to take a look at the repository with my set-up and the error? |
@adonnini I definitely appreciate your patience; I haven't forgotten about this issue. I'll take a look at it this weekend for sure. |
@tarun292 Thanks! |
@tarun292 please let me know if you have had any problems in using the code in the repository I set up. Thanks |
@tarun292 Will you have time to take a look at this issue in the next few days? As you may remember, I set up a public repository with a copy of my set-up. Here is a link to the README.MD Please let me know if you have problems accessing the repository or have any other problems. As I say in the Notes section, please keep in mind that I come from the Java world and am a Python beginner. I hope this helps. Thanks |
@adonnini sorry for the delay. Yes, I will definitely take a look at it this week. |
@tarun292 Sorry to bother you again. Do you think you will have time to take a look at this issue in the coming week? I would really appreciate it. |
@adonnini apologies for the delay. The last few weeks have been hectic. I finally got time to clone your repro and give it a try, and I ran into the following issue.
I added a |
@tarun292 Sorry. The datasets folder needs to be replaced. I have a zip archive containing the replacement datasets folder (~10.6 MB) |
When you run the code with the replacement datasets folder, code execution should fail, producing the traceback log file reported below. TRACEBACK LOG FILE
|
@tarun292, I hope I am not being too much of a nuisance. Given how busy you are, when do you think you'll get a chance to try again to run the model (after replacing the datasets folder)? |
Yep, I can confirm that fixes the issue, and I can repro the actual export issue. Will let you know once I have more insights into what the issue is. |
Thanks! I appreciate it |
@tarun292 , Sorry to bother you again. It's been over a month since we last connected. (When) will you be able to take another look at this issue? I am still waiting for its resolution in order to be able to proceed with some of my work. |
@tarun292 Again bugging you. Sorry. Please let me know if you will not be able to work on this issue. Thanks |
My Android application fails with an
Attempted to resize a static tensor to a new shape at dimension 0
error. Please find the full logcat below. The shape of the input datasets in my model is not static; specifically, the number of steps in any one sequence varies.
Here is the code I use to define the input dataset for the model in the Android application:
where 4 is the number of features and tmpData.length is the number of rows in the input dataset (n rows and 4 columns). Here is the code I use to run inference:
When I run inference on my model processed with TorchScript and run using PyTorch Mobile, I produce the input dataset as follows:
and run inference as follows:
This works, producing reasonable results.
I would appreciate any thoughts as to what is causing the problem, and how I might go about fixing it.
Thanks
LOGCAT