Open
Description
Currently, the bpte schema stores the program data as bytes inside the flatbuffer. This causes high latency and memory usage when converting LLMs (~1GB) into bpte files, because serialization goes through the flatbuffer JSON conversion.
We can follow the ExecuTorch program schema and move the program into segments, which avoids the flatbuffer conversion of large byte strings.
Program-data separation also helps (without weights, the program is small), but it is not a guaranteed solution for all users.
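To make the segment idea concrete, here is a minimal sketch of one possible layout: a small header, the (now small) flatbuffer, padding, then the raw program bytes appended as a segment. The header format, the function names `pack_with_segment` and `read_program_segment`, and the choice of 32-byte alignment (borrowed from the current `force_align: 32`) are all assumptions for illustration, not the actual ExecuTorch or bpte layout.

```python
import struct

# Assumed alignment; mirrors the force_align: 32 used by the current schema.
SEGMENT_ALIGNMENT = 32
# Hypothetical header: two little-endian uint64 values (segment offset, segment size).
HEADER_FORMAT = "<QQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def _padding_for(size: int, alignment: int) -> int:
    """Bytes needed to round `size` up to a multiple of `alignment`."""
    return (-size) % alignment


def pack_with_segment(small_flatbuffer: bytes, program: bytes) -> bytes:
    """Lay out [header][flatbuffer][padding][program segment].

    The program bytes are appended as-is, so they never pass through
    the flatbuffer JSON conversion.
    """
    prefix_len = HEADER_SIZE + len(small_flatbuffer)
    padding = _padding_for(prefix_len, SEGMENT_ALIGNMENT)
    segment_offset = prefix_len + padding
    header = struct.pack(HEADER_FORMAT, segment_offset, len(program))
    return header + small_flatbuffer + b"\x00" * padding + program


def read_program_segment(blob: bytes) -> memoryview:
    """Return a zero-copy view of the program segment."""
    offset, size = struct.unpack_from(HEADER_FORMAT, blob, 0)
    return memoryview(blob)[offset:offset + size]
```

With this kind of layout, only the test suites and metadata would go through the flatbuffer path, and the program segment stays aligned regardless of how large the flatbuffer portion is.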
Conversion:
executorch/devtools/bundled_program/serialize/__init__.py
Lines 82 to 100 in ca4e363
    def serialize_from_bundled_program_to_flatbuffer(
        bundled_program: BundledProgram,
    ) -> bytes:
        """
        Serialize a BundledProgram into FlatBuffer binary format.

        Args:
            bundled_program (BundledProgram): The `BundledProgram` variable to be serialized.

        Returns:
            The serialized FlatBuffer binary data in bytes.
        """

        bundled_program_in_schema = bundled_program.serialize_to_schema()

        return convert_to_flatbuffer(
            serialize_from_bundled_program_to_json(bundled_program_in_schema)
        )
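A rough way to see why the JSON step hurts for large programs: flatc's JSON format represents a `[ubyte]` vector as an array of numbers, so each byte becomes several characters of text. The sketch below measures that expansion on a small random payload. It assumes the program bytes are encoded as a plain JSON integer array, and `json_expansion_ratio` is a hypothetical helper, not part of the bundled program code.

```python
import json
import os


def json_expansion_ratio(payload: bytes) -> float:
    """Ratio of JSON text length to raw payload length.

    Assumes the payload is encoded as a JSON array of integers,
    one element per byte.
    """
    as_json = json.dumps({"program": list(payload)})
    return len(as_json) / len(payload)


if __name__ == "__main__":
    # 64 KiB stand-in for a program; a real LLM program would be ~1GB.
    ratio = json_expansion_ratio(os.urandom(1 << 16))
    print(f"JSON text is about {ratio:.1f}x the raw size")
```

On top of the text itself, the intermediate Python list of integers also has to be held in memory, which is why the cost grows quickly with program size.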
Schema:
executorch/devtools/bundled_program/schema/bundled_program_schema.fbs
Lines 78 to 95 in ca4e363
    table BundledProgram {
      // Schema version.
      version:uint;

      // Test sets to run against the program.
      // Each BundledMethodTestSuite should be used for the method of program sharing same name.
      method_test_suites: [BundledMethodTestSuite];

      // The binary data of a serialized Executorch program.
      // The following `force_align` may sliently override any larger force_align
      // used in the program. Therefore, to keep the data (including constant
      // tensor, delegate data, etc, see schema.fbs for more info) in the
      // executorch program keeps the same alignment as original no matter how
      // the program schema changes, we need to make the force_align here the max
      // one around all kinds of force_align in the current and future program
      // schema, so we use the 32 as the force_align here.
      program: [ubyte] (force_align: 32);
    }