- 
                Notifications
    You must be signed in to change notification settings 
- Fork 706
Arm Backend: Enable Pybindings for tosa_serialization lib #15356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arm Backend: Enable Pybindings for tosa_serialization lib #15356
Conversation
These bindings should drastically increase serialization performance, both speed wise and memory usage wise. This patch also enables serialization of tosa flatbuffers that are over 2GB in size. - Ensure each tosa operator has an attribute. - Replace TosaOp.Op() calls with Op enum - Convert tosa clamp data to numpy int8 to allow c++ byte serialization - Replace "import tosa_serializer.serializer" with "import tosa_serializer" for pybind - Remove serialize to json. - Remove Darwin support workaround in mlsdk dependencies - Update mlsdk to latest tag, but overwrite the tosa_mlir_translator to a version that supports offset buffers - Update tosa-tools to the monorepo - Update slice to clamp end and start dim. This avoids the object not having a number field from being too large or small. Change-Id: Id03582b6b7b9eef296480f9ea3a4744a08b22603 Signed-off-by: Ryan O'Shea <ryan.oshea3@arm.com>
| 🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15356
 Note: Links to docs will display an error until the docs builds have been completed. ❌ 9 New Failures, 2 Cancelled Jobs, 3 Unrelated FailuresAs of commit f350ce4 with merge base 788ef2f ( NEW FAILURES - The following jobs have failed:
 
 CANCELLED JOBS - The following jobs were cancelled. Please retry:
 
 FLAKY - The following job failed but was likely due to flakiness present on trunk:
 
 BROKEN TRUNK - The following jobs failed but was present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures 
 This comment was automatically generated by Dr. CI and updates every 15 minutes. | 
| failures unrelated | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internal review completed so we're happy with this.
| This was a huge change (esp in terms of tosa serializer changes), it is breaking internal CI. Let me try to fwd fix. In the future if we are touching to many files, we should also pull internally and test (at least until we improve OSS testing for buck :(). | 
|  | ||
| pushd tosa-tools | ||
| git checkout 8468d041c50c6d806f3c1c18c66d7ef641e46580 # serialization lib pybindings | ||
| if [[ ! -d "reference_model" ]]; then | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess tosa/tosa-reference-model repo on gitlab.arm.com is dead (but its there) and long live tosa/tosa-tools?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah individual repos probably wont change anymore, but should stay online. tosa-tools is a monorepo and is the future of the tooling as far as I'm aware
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re. the tosa-serializer, IIUC we have both c++ (through pybind) and direct python APIs, i.e. tosa_serializer vs. _tosa_serializer why do we have two? Any differences? Are you going to deprecate the direct python ones over pybind ones for perf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its mostly just a quick of the reference model atm, when you pip install reference model it installs old python api ( import serializer.tosa_serializer ), but if you pip install serializer you get c++ ( import tosa_serializer ). Im not sure what the tosa teams plans are, but i think the pure python api is likely going to be frozen where it is, it already lags behind the c++ anyway. For us in ET we have no plans to use the old python version anymore.
Theres not really any difference that I can see, the graphs should serialize to the same flatbuffer and have the same graph, I dont think either one has more or less features right now, but that could change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I am having difficulty in buckify the C++ variant, and since this is breaking things internally, let me revert this for now, we can try to land this again later this week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it will likely impact quite a few follow on fixes which depend on this change if we have to revert this (and so a revert chain) - is there anything we can do to help get this through?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is, because of the travel etc. this is blocking a bunch of changes internally since we can't land this due to failures. Do you know which PRs are depending on this? Besides the import name change, this PR (on its own) seems quite independent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its mostly the large revert chain upstream and internally that we were hoping to avoid
| fi | ||
|  | ||
| pushd tosa-tools | ||
| git checkout 8468d041c50c6d806f3c1c18c66d7ef641e46580 # serialization lib pybindings | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any impact on BC esp on Vela 4.4.[12]? I see the CI is OK so should be low risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be fine, tosa is supposed to be backwards compatible so unless we use new tosa features ( we dont atm ) then nothing should change for vela. The main changes are:
Pybind11 instead of pure python backend, so better perf when lowering. eg. Llama 3.2 1b in int8 goes from ~1h lowering on my laptop to like 3 minutes, and ram usage goes from over 30gb to ~9.
Support for >2GB flatbuffers, these modify the flatbuffer layout if it is >2GB, but there is a guard around ethos-u backend to try and block those from entering vela until its supported there
| 
 It should just be as simple as a setup.sh run, front-end usage should be identical, its purely the internals of the backend that are api broken, but they shouldnt really be exposed so as long as the library exists there should be no change | 
…rch#15356)" This reverts commit fdfeaa4.
These bindings should drastically increase serialization
performance, both speed wise and memory usage wise.
This patch also enables serialization of tosa flatbuffers
that are over 2GB in size.
Change-Id: Id03582b6b7b9eef296480f9ea3a4744a08b22603
cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai