Replies: 1 comment
@psiddh Do you have any suggestions?
Hi guys
I am trying to follow along with the Raspberry Pi Pico 2 demo from last week (https://github.com/pytorch/executorch/tree/main/examples/raspberry_pi/pico2), but with the modification that I want to quantize the model to int8.
First, below you can see that I build libquantized_ops_aot_lib.so following the quantization steps linked from the README (https://github.com/pytorch/executorch/blob/main/examples/raspberry_pi/pico2/README.md).
Then I create a new script, export_mlp_mnist_int8.py, that exports a file called balanced_tiny_mlp_mnist_quantized.pte, which I then compile with build_firmware_pico.sh. This is the quantized model compiled into a Pico 2 .uf2 file.
Below you will see that I can build the original demo and execute it on my Pico 2. However, to prove my point, I changed the allocator size to 120KB instead of the 200KB shown in the main.cpp file on GitHub (https://github.com/pytorch/executorch/blob/main/examples/raspberry_pi/pico2/main.cpp#L358).
I run the demo with the unquantized model balanced_tiny_mlp_mnist.pte and everything works, proving that 120KB is enough for the allocator.
Then I try to compile with balanced_tiny_mlp_mnist_quantized.pte, still with the same 120KB allocator size, and the model does not load because of an insufficient-memory error (33 decimal, 0x21 hex):
/// Could not allocate the requested memory.
MemoryAllocationFailed = 0x21,
Then I increase the allocator size to 280KB, recompile main.cpp, and now it works.
Why did the memory allocator utilization increase and more than double? I thought that with quantization both the ROM size and the RAM size would shrink.
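(For reference, one way to see how much activation memory the runtime will ask the allocator for is to dump the planned buffer sizes at export time. A rough sketch; I am assuming the ExecutorchProgramManager returned by to_executorch() exposes the deserialized program as executorch_program, which may differ between ExecuTorch versions:

# Sketch: print the memory-planned activation buffer sizes of the program.
# `exec_prog` is assumed to be the result of edge.to_executorch() in the
# export script; the `executorch_program` attribute name is an assumption.
program = exec_prog.executorch_program
for plan in program.execution_plan:
    # non_const_buffer_sizes are the planned activation pools that the
    # runtime allocates from the allocator pools at load/execute time.
    print(plan.name, plan.non_const_buffer_sizes)
)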
Below are all my steps. Thanks for helping me understand why the memory usage increased, and for pointing out anything I am doing wrong in the quantization steps below.
cd pico/executorch
python3 -m venv .pico3 && source .pico3/bin/activate
./install_executorch.sh
examples/arm/setup.sh --i-agree-to-the-contained-eula
source examples/arm/ethos-u-scratch/setup_path.sh
export PATH="$HOME/pico/executorch/examples/arm/ethos-u-scratch/arm-gnu-toolchain-13.3.rel1-x86_64-arm-none-eabi/bin:$PATH"  # note: ~ does not expand inside double quotes, so use $HOME
TO BUILD the .so library used to produce the quantized model, I run examples/xnnpack/quantization/test_quantize.sh cmake add
This produces cmake-out/kernels/quantized/libquantized_ops_aot_lib.so
Get the .pte file with python examples/raspberry_pi/pico2/export_mlp_mnist.py
Get the .pte file for the quantized model
THE MODEL IS QUANTIZED with this new script I created, export_mlp_mnist_int8.py.
It uses torch.ops.load_library("cmake-out/kernels/quantized/libquantized_ops_aot_lib.so").
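Roughly, the script does standard PT2E post-training quantization and then lowers to ExecuTorch. A minimal sketch of what it does (TinyMLP is a placeholder for the demo's MLP, and the quantizer/capture import paths move between PyTorch/ExecuTorch releases, so treat those as assumptions):

import torch
from torch.export import export
# These import paths vary by release; newer builds use
# executorch.backends.xnnpack.quantizer and torchao instead.
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
from executorch.exir import to_edge

# Register the quantized AOT ops built in the previous step.
torch.ops.load_library("cmake-out/kernels/quantized/libquantized_ops_aot_lib.so")

model = TinyMLP().eval()  # placeholder: the MLP from export_mlp_mnist.py
example_inputs = (torch.randn(1, 1, 28, 28),)  # placeholder input shape

# Capture, calibrate, convert (post-training quantization).
# Newer releases capture with torch.export.export_for_training instead.
captured = export(model, example_inputs).module()
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(captured, quantizer)
prepared(*example_inputs)  # calibration pass; real data in practice
quantized = convert_pt2e(prepared)

# Lower to ExecuTorch. Without delegating to a backend, the
# quantize/dequantize ops stay in the graph as quantized_decomposed ops.
edge = to_edge(export(quantized, example_inputs))
exec_prog = edge.to_executorch()
with open("balanced_tiny_mlp_mnist_quantized.pte", "wb") as f:
    f.write(exec_prog.buffer)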
NOW we have a quantized model, balanced_tiny_mlp_mnist_quantized.pte. Look at the size difference:
-rw-r--r-- 1 106216 Oct 19 10:33 balanced_tiny_mlp_mnist.pte
-rw-r--r-- 1 33104 Oct 21 20:09 balanced_tiny_mlp_mnist_quantized.pte
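(As a sanity check before flashing, the quantized .pte can also be run on the host through the Python runtime bindings. A sketch, assuming the pybindings were built with the quantized kernels linked in; otherwise loading fails with a missing-operator error:

import torch
# Host-side ExecuTorch runtime bindings (built by install_executorch.sh).
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("balanced_tiny_mlp_mnist_quantized.pte")
outputs = module.forward([torch.randn(1, 1, 28, 28)])  # placeholder input
print(outputs)
)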
COMPILE THE DEMO with balanced_tiny_mlp_mnist.pte, changing the memory allocator in main.cpp to 120KB:
static uint8_t method_allocator_pool[120 * 1024]; // it used to be 200KB - plenty for method metadata
static uint8_t activation_pool[120 * 1024]; // it used to be 200KB - plenty for activations
NOW WE START BUILDING THE RASPBERRY PI PICO FIRMWARE from the demo but with 120KB instead of 200KB. Notice it works
NOW FLASH
NOW build the quantized version with the same 120KB allocator size.
In build_firmware_pico.sh we add:
-DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
-DEXECUTORCH_BUILD_KERNELS_QUANTIZED_AOT=ON
Then we add the quantized libs in CMakeLists.txt, wrapped in --whole-archive so the linker keeps the static kernel-registration symbols:
-Wl,--whole-archive
${BAREMETAL_BUILD_DIR}/lib/libportable_ops_lib.a
${BAREMETAL_BUILD_DIR}/lib/libquantized_ops_lib.a
-Wl,--no-whole-archive
${BAREMETAL_BUILD_DIR}/lib/libportable_kernels.a
${BAREMETAL_BUILD_DIR}/lib/libquantized_kernels.a
COMPILE
examples/raspberry_pi/pico2/build_firmware_pico.sh --model=balanced_tiny_mlp_mnist_quantized.pte
[SERIAL/DIRECT] CONNECTED TO PORT COM15 (115200-8N1)
CHANGE the main.cpp allocator size to 280KB and now it WORKS. BUT why did the memory needs go up so much from the unquantized model to the quantized model?
RECOMPILE with the quantized model and now it runs:
static uint8_t method_allocator_pool[120 * 1024];
static uint8_t activation_pool[280 * 1024];