Closed
Changes from all commits
538 commits
657789e
Qualcomm AI Engine Direct - Apply spin quant R1 and R2 (#5175)
shewu-quic Sep 10, 2024
549f14b
Restore constant segment
lucylq Sep 10, 2024
e826de3
Add Half/BFloat16 tests for op_mul
manuelcandales Sep 10, 2024
43e2f2d
Qualcomm AI Engine Direct - support skip quantization (#5070)
haowhsu-quic Sep 10, 2024
30acae5
Switch over backend tests to export_for_training
tarun292 Sep 10, 2024
db34239
[LLava] Fix stats for C++ runner
digantdesai Sep 10, 2024
02304d7
Update bundled_program to use new namespace
dbort Sep 10, 2024
c76b22f
Qualcomm AI Engine Direct - Fixed the order of the transforms for lla…
shewu-quic Sep 10, 2024
d38ca81
Android refactor cmake build
kirklandsign Sep 10, 2024
a4d67e2
Android: Leverage prefillPrompt and prefillImage on Llava
Riandy Sep 10, 2024
b54206d
Update the minimum C++ version to C++17
dbort Sep 10, 2024
4ce0f9d
Introduce PlatformMemoryAllocator
manuelcandales Sep 10, 2024
2b50c76
Use dynamic bound by default.
shoumikhin Sep 10, 2024
ced40f4
Fix models in benchinfra (#5226)
guangy10 Sep 10, 2024
e245590
App side change
kirklandsign Sep 10, 2024
4cce620
Minor fix: Create root dir when it doesn't exist. (#5075)
freddan80 Sep 10, 2024
ab6d91c
Fix internal executorch_llama_jni
kirklandsign Sep 10, 2024
f07e4d5
Update setup-with-qnn.sh with runner util flag (#5210)
WuhanMonkey Sep 10, 2024
cac2c05
[ET-VK] Integrate axis mapping into optimized matrix multiplication s…
SS-JIA Sep 10, 2024
cba5bee
fbshipit-source-id: f63634ba171da01328849d84552b125b829403e8
facebook-github-bot Sep 11, 2024
ca889fb
Minibench use model_dir instead (#5250)
kirklandsign Sep 11, 2024
e4d72ce
Update setup.sh for LlamaDemo (#5235)
kirklandsign Sep 11, 2024
d423131
Android app UI/flow improvements (#5241)
Riandy Sep 11, 2024
7942d2c
Allow core aten op exception list (#5237)
larryliu0820 Sep 11, 2024
69aed24
link whole quantized_ops_lib (#5253)
kirklandsign Sep 11, 2024
41bc1ce
spinquant in eager mode (#5125)
Sep 11, 2024
d7a7ec6
Updated the workflow to upload models to S3 (#5232)
Sep 11, 2024
7e374d7
Add model execution scripts and runner (#5217)
neuropilot-captain Sep 11, 2024
af80804
Debug event populates event name (#5142)
Olivia-liu Sep 11, 2024
68397af
Optimized op_mm using CPUBlas gemm (#5242)
swolchok Sep 11, 2024
d73a653
Add optimized op_linear (#5243)
swolchok Sep 11, 2024
3171ede
Add scalar tensor tests. (#5260)
shoumikhin Sep 11, 2024
4da3c5d
Add CoreML Quantize (#5228)
Sep 11, 2024
d6b800b
Add helper function to create empty, full, ones and zeros tensors. (#…
shoumikhin Sep 11, 2024
75a56a2
Add helper function to create random tensors. (#5266)
shoumikhin Sep 11, 2024
e462e5a
Bug fix partitioner (#5239)
mcr229 Sep 11, 2024
0af6c12
Use ones() to create tensors. (#5273)
shoumikhin Sep 11, 2024
92d0559
Add missing Pyre mode headers] [batch:21/424] [shard:6/N]
Sep 11, 2024
41b463e
Do not load constant_segment if only the placeholder exists (#5229)
lucylq Sep 11, 2024
d80f78f
Read SpinQuant checkpoints (#5259)
mergennachin Sep 11, 2024
6ac1365
Fix Android LlamaDemo setup.sh (#5274)
kirklandsign Sep 11, 2024
d689722
Make `compare_results()` import path public (#5225)
Olivia-liu Sep 11, 2024
338ef26
Remove explicit dereferencing for TensorPtr converted implicitly to E…
shoumikhin Sep 11, 2024
6328d41
Rename "SDK" -> "Developer Tools" in documentations (OSS files) (#5238)
Olivia-liu Sep 11, 2024
7c76e03
Switch Optimizer to std::map (#5230)
JacobSzwejbka Sep 11, 2024
de30572
Just print Android instrument log when the test passes (#5280)
huydhn Sep 11, 2024
7e3ec96
Remove references to exec_aten::RuntimeContext (#5257)
dbort Sep 11, 2024
f9da675
Tuning LLM from PTE (#5233)
dpalmasan Sep 11, 2024
c5c121b
Move test spec file (#5218)
kirklandsign Sep 12, 2024
a4be79f
Switch Apple benchmark workflow to use the generic ET benchmark iOS a…
huydhn Sep 12, 2024
665fa03
Fix android setup qnn sh (#5275)
kirklandsign Sep 12, 2024
81b5438
Reskin the demo app with new UI assets and colors (#5282)
Sep 12, 2024
12a25c6
Preserve undelegated Linear ops in Llama demo export (#5244)
swolchok Sep 12, 2024
5e7efe6
port bf16 dot product kernel from ATen CPUBlas (#5245)
swolchok Sep 12, 2024
17cf782
Add trailing dims memoization to improve performance of permute_copy_…
swolchok Sep 12, 2024
88508c5
Make ForcedUnroll usage in bf16 BlasKernel actually work for -Oz buil…
swolchok Sep 12, 2024
5022deb
Migrate RuntimeContext users to KernelRuntimeContext (#5270)
dbort Sep 12, 2024
c5c69a9
Qualcomm AI Engine Direct - build with rpath for ease of use (#5268)
haowhsu-quic Sep 12, 2024
08c8c6e
Add test for stateful model and fix output backings issue (#5294)
cymbalrush Sep 12, 2024
3ad2f16
Rename the "executorch/examples/sdk" folder to "executorch/examples/d…
Olivia-liu Sep 12, 2024
623b7b6
Use Apple perf workflow to validate the test spec (#5293)
huydhn Sep 12, 2024
4c61317
Failing DW test on executorch (#4929)
Sep 12, 2024
7ea5a1d
Fix flaky tests. (#5305)
shoumikhin Sep 12, 2024
40e6e52
Expose shape dynamism. (#5306)
shoumikhin Sep 12, 2024
1d37332
Add APIs to make from an existing Tensor and clone by copying and own…
shoumikhin Sep 12, 2024
4053a18
Remove dim order check in unsqueeze op to unblock supernova model rel…
Gasoonjia Sep 12, 2024
8888c0d
Adding per-method tracers to the module utility. Changing set_output_…
meta-emilian Sep 12, 2024
8874de2
Benchmark app update (#5240)
kirklandsign Sep 12, 2024
7998b7f
Fix issue with debug TOSA dumps being overwritten (#5029)
benkli01 Sep 12, 2024
1793c4a
Preserve SDPA for CoreML (#5258)
Sep 12, 2024
272d4d8
int(max_seq_len) (#5269)
digantdesai Sep 12, 2024
bcd156b
Enable Ethos-U85 support in Vela (#5002)
per Sep 12, 2024
38892ac
Clean up non-exec_aten references to tensor types (#5254)
dbort Sep 12, 2024
905df29
Rename exec_aten:: to executorch::aten:: (#5296)
dbort Sep 12, 2024
1d46d72
Android Java introduce Experimental API annotation (#5303)
kirklandsign Sep 12, 2024
b904833
Add fast_hadamard_transform and fast_hadamard_transform_28N kernels (…
swolchok Sep 12, 2024
327a5b6
Quantized fast hadamard transform (#5284)
swolchok Sep 12, 2024
ab75531
add quantized fast_hadamard_transform_28N (#5285)
swolchok Sep 12, 2024
2d4b9ed
Clean commit of FFHT dependency (#5286)
swolchok Sep 12, 2024
ce1f8bd
Remove unneeded FFHT files (#5287)
swolchok Sep 12, 2024
d3fb502
FFHT: just expect benchmark to be installed (#5288)
swolchok Sep 12, 2024
eedc38a
FFHT: ARM NEON port (#5289)
swolchok Sep 12, 2024
4218d45
FFHT enhancements to fast hadamard transform kernels (#5290)
swolchok Sep 12, 2024
4b3f1c5
Set EXECUTORCH_BUILD_QNN=ON for benchmarking app build (#5315)
kirklandsign Sep 12, 2024
c032194
Fix Hard Fault in arm_executor_runner (#5302)
per Sep 12, 2024
523b41e
Update executorch documentation to use export_for_training (#5219)
tarun292 Sep 12, 2024
53c49fb
update doc for phi-3-mini (#5320)
helunwencser Sep 12, 2024
b9ae0ec
Rename cmake option "EXECUTORCH_BUILD_SDK" to "EXECUTORCH_BUILD_DEVTO…
Olivia-liu Sep 12, 2024
3264a7b
Add lint suppressions for fast_hadamard_transform_special_unstrided_c…
swolchok Sep 12, 2024
9256b4a
Fix so name in setup-with-qnn.sh (#5333)
kirklandsign Sep 13, 2024
c20ed5e
Fix env in android perf so aar build (#5330)
kirklandsign Sep 13, 2024
aa1bcc3
Update native library in AndroidManifest (#5331)
kirklandsign Sep 13, 2024
bdd6a8e
Fix API warning for older SDKs (#5337)
DenisVieriu97 Sep 13, 2024
fcbbef4
Update minibench to support qnn (#5325)
kirklandsign Sep 13, 2024
fe53d41
Qualcomm AI Engine Direct - Add the tutorial to deploy llama3 8B Inst…
shewu-quic Sep 13, 2024
ca2ac54
Enable optimized build for portable configuration of test-models (#5317)
swolchok Sep 13, 2024
9301ebb
Add experimental API for preserving ops from decomposition (#5236)
Sep 13, 2024
c080c48
Polish UI (#5319)
Sep 13, 2024
9845019
Let Module tests use Tensor extension and aten mode. (#5298)
shoumikhin Sep 13, 2024
07e15a9
Precompute multiplicative inverse when possible in op_div (#5209)
swolchok Sep 13, 2024
0d0b14a
Clean up LlamaDemo test on AWS (#5340)
huydhn Sep 13, 2024
7dbe15e
Simplify setting output. (#5334)
shoumikhin Sep 13, 2024
37f77ed
Refactor kernel registration tutorials (#5297)
larryliu0820 Sep 13, 2024
a0a249e
Fix tests. (#5349)
shoumikhin Sep 13, 2024
933685b
mark sgd as experimental (#5316)
JacobSzwejbka Sep 13, 2024
bba4040
Compile and deploy QNN models to S24 (#5137)
Sep 13, 2024
1bb5b20
Mark all pybindings.portable_lib names as @experimental (#5329)
dbort Sep 13, 2024
6d1a573
Use parallel_for in bfloat16 gemm_transa_ kernel (#5248)
swolchok Sep 13, 2024
0d1644f
Define generic Android benchmark metric structure (#5332)
huydhn Sep 13, 2024
034e098
Revert D62617066 (#5351)
Sep 13, 2024
71602a0
Allow Partitioner to Force Dynamic Linear Computation (#5338)
mcr229 Sep 13, 2024
31e652d
Integrate axis mapping into naive matrix multiplication shaders (#5277)
SS-JIA Sep 13, 2024
25168b7
Enable tensor aliases with texture storage (#5347)
SS-JIA Sep 13, 2024
62024d8
Use ms for number report (#5362)
kirklandsign Sep 13, 2024
bfce743
Try to fix test_model.sh (#5361)
larryliu0820 Sep 13, 2024
9b5ba1f
Increase timeout threshold for on-device benchmarking (#5324)
Sep 13, 2024
3450ecc
Fix API warning for older SDKs #2 (#5365)
DenisVieriu97 Sep 13, 2024
f507871
No matrix needed in benchmarking apk build (#5339)
kirklandsign Sep 14, 2024
2001b3c
Revert D62621640: Increase timeout threshold for on-device benchmarking
Sep 14, 2024
67be84b
Script to export 🤗 models (#4723)
Sep 14, 2024
69d33a8
minibench README (#5367)
kirklandsign Sep 14, 2024
0aa75e6
Fix EValue construction from a smart pointer. (#5357)
shoumikhin Sep 14, 2024
68b75cd
Refine the tests to compare the result with the error code. (#5358)
shoumikhin Sep 14, 2024
a7618c5
Add API to set inputs independently from execution. (#5356)
shoumikhin Sep 14, 2024
8d5ef1d
Add nongenai models to apple perf (#5370)
Sep 14, 2024
69ae1e1
Use bfdot if compiled with ARM_FEATURE_BF16 (#5249)
swolchok Sep 14, 2024
248ba52
Build optimized kernels with bf16 support and gate usage at runtime (…
swolchok Sep 14, 2024
08ecd73
Llava+llama demo (#5372)
shoumikhin Sep 14, 2024
262dfc0
Move examples/models/... out of the `torch` namespace (#5318)
dbort Sep 14, 2024
74a56e4
Update Android ExecuTorch Llama demo app readme docs (#5364)
cmodi-meta Sep 14, 2024
768f5c9
Simplify setting output. (#5363)
shoumikhin Sep 14, 2024
e0c9312
Revert D62466496: Multisect successfully blamed "D62466496: [ExecuTor…
Sep 14, 2024
58700fa
Update iOS Llama demo app readme docs (#5359)
Riandy Sep 14, 2024
eb0cdf7
Upload exported models and Android apps directly to S3 (#5375)
huydhn Sep 14, 2024
eecf74f
Qualcomm AI Engine Direct - Intermediate Tensor Dump (#5310)
winskuo-quic Sep 15, 2024
08f16d0
Update phi-3-mini lora export code and readme (#5327)
jackzhxng Sep 16, 2024
ef31608
Add sdpa arg comments (#5323)
jackzhxng Sep 16, 2024
0a501eb
Add extra checks for the provided dim order and strides. (#5377)
shoumikhin Sep 16, 2024
c252553
Move type arg to the end to match Aten constructors. (#5379)
shoumikhin Sep 16, 2024
26375cc
Introduce `virtual_transpose()` to `vTensor` for no copy transpositio…
SS-JIA Sep 16, 2024
2b3cc27
Reapply D62466496: Build optimized kernels with bf16 support and gate…
swolchok Sep 16, 2024
a166a25
Export aoti for preprocess (#5354)
lucylq Sep 16, 2024
07c77be
Remove unnecessary member structs from `Allocation` struct to reduce …
SS-JIA Sep 16, 2024
a9ffb3a
Increase timeout threshold for on-device benchmarking (#5371)
Sep 16, 2024
2460e15
New URL for developer tools tutorial (#5384)
Olivia-liu Sep 16, 2024
fa77771
New URL for developer tools overview page (#5385)
Olivia-liu Sep 16, 2024
30ab618
Add a note about cleaning the build system after a sync (#5395)
dbort Sep 16, 2024
f7954f6
Fix batch dimension adjustment in shader indexing utils (#5399)
SS-JIA Sep 16, 2024
8a8e876
Register preprocess in pytorch (#5350)
lucylq Sep 16, 2024
7c661d7
Add XOR Model Example (#5397)
JacobSzwejbka Sep 16, 2024
9c068ab
Create qualcomm_README.md (#5394)
WuhanMonkey Sep 17, 2024
5c3be4a
Migrate examples/apple/... away from the torch:: namespace (#5405)
dbort Sep 17, 2024
c5d1661
Add a README.md for benchmarking infra (#5403)
Sep 17, 2024
c8a7762
Add SpinQuant into README (#5412)
mergennachin Sep 17, 2024
c605bae
pad_max_tiles (#5271)
lucylq Sep 17, 2024
e8a557c
Cast the vector from deduced type to desired type if needed. (#5409)
shoumikhin Sep 17, 2024
9c5994d
Arm Backend: Generate input if not supplied in the non semihosting (#…
zingo Sep 17, 2024
16daeb4
Increase memory allocated for inputs in case semi hosting is used. (#…
freddan80 Sep 17, 2024
3befc8a
Update version.txt post 0.4 branch cut (#5411)
jackzhxng Sep 17, 2024
1645af0
Upgrade the coremltools version. (#5425)
shoumikhin Sep 17, 2024
aaf73d8
vTensor cleanup 1/N - swap order of `vTensor` and `vTensorStorage` in…
SS-JIA Sep 17, 2024
92adb94
vTensor cleanup 2/N - remove `discard_and_reallocate` API (#5422)
SS-JIA Sep 17, 2024
a6f8389
Add android-arm64 execution platform in shim & plumb it to extract_so…
swolchok Sep 17, 2024
19f5ed8
Refactor fast_hadamard_transform_test shared implementation functions…
swolchok Sep 17, 2024
acfe0ba
Custom op for fast hadamard transform kernel (#5291)
swolchok Sep 17, 2024
618466e
Change memory planning API to accept full algorithm as argument as op…
Sep 17, 2024
3e2cfc7
Integrate axis mapping into binary op (#5408)
nathanaelsee Sep 17, 2024
442b45a
Remove buck2 from llama mac test (#5414)
kirklandsign Sep 17, 2024
fe53daf
Check for contiguous dim order in op_fast_hadamard_transform (#5419)
swolchok Sep 17, 2024
0658dce
Print overload name in print_ops_info (#5404)
mcremon-meta Sep 17, 2024
06c0fa3
Unbreak Intel Apple Buck builds (#5431)
swolchok Sep 17, 2024
aebc2e3
Update llava readme docs to reference demo apps (#5427)
Riandy Sep 17, 2024
8f7d9d5
Allow mutating input tensor (#4850)
helunwencser Sep 17, 2024
2e1043b
Update readme to include OSS references (#5433)
lucylq Sep 17, 2024
b7dfd8a
Fix Android x86_64 build (#5434)
kirklandsign Sep 17, 2024
b14dea8
Add convenience load method for forward. (#5446)
shoumikhin Sep 17, 2024
8a0b48e
aten.hardsigmoid.default in unary_ops (#5396)
Abhi-hpp Sep 18, 2024
ab6cbd9
Update executorch pin (#5438)
tugsbayasgalan Sep 18, 2024
3e1a578
Add an error message to help debug issue when accessing secrets in CI…
Sep 18, 2024
41c21a4
Integrate axis mapping in embedding op (#5440)
nathanaelsee Sep 18, 2024
861a7bf
Fix Android ExecuTorchDemo docs (#5439)
kirklandsign Sep 18, 2024
444480b
Buckify Cadence HiFi4 Operators. (#5154)
hsharma35 Sep 18, 2024
1e4c316
Add more data points from benchmarking infra (#5432)
Sep 18, 2024
53c1a5f
Batch-aware torch.ops.llama.sdpa_with_kv_cache (#4822)
meta-emilian Sep 18, 2024
26c736e
Training demo (#5445)
JacobSzwejbka Sep 18, 2024
0648a8a
Fix various using namespace issues in executorch (#5464)
r-barnes Sep 18, 2024
d2a38cc
Update stories cmd to use kv cache (#5460)
lucylq Sep 18, 2024
6ed8873
Add training readme
JacobSzwejbka Sep 18, 2024
1a5c8bf
Make llama and llava more prominent in top-level readmes (#5471)
mergennachin Sep 18, 2024
ebff33c
aten.hardswish fix w coordinate (#5418)
Abhi-hpp Sep 18, 2024
958afe1
vTensor cleanup 3/N - Introduce conversion constructors for `vec` typ…
SS-JIA Sep 18, 2024
f5f54b8
vTensor cleanup 4/N - consolidate texture positions and extents to be…
SS-JIA Sep 18, 2024
61e5d4c
Improve iOS demo app readme (#5453)
Riandy Sep 18, 2024
67752ee
Add llama animated gif to llama readme (#5474)
mergennachin Sep 18, 2024
ad95e46
Update a ReadMe (#5473)
digantdesai Sep 18, 2024
fdc7e45
Add llava animated gif to llava readme
mergennachin Sep 18, 2024
8e28188
Add test_llama bf16 portable config to CI (#5472)
swolchok Sep 18, 2024
5a3eceb
added detailed instructions for xtensa cmake and devserver setup (#5398)
zonglinpeng Sep 18, 2024
0a9bbaa
Add DeviceInfo in iOS benchmark run (#5410)
huydhn Sep 18, 2024
2afcd96
apply output layer pruning (#5426)
navsud Sep 18, 2024
e148c1d
vTensor cleanup 5/N - clean up `indexing_utils.h` and clarify functio…
SS-JIA Sep 18, 2024
b89c52c
Add definition to `etrecord_path` to the devtools tutorial (#5458)
Olivia-liu Sep 18, 2024
8ef6c79
Move examples/qualcomm out from under the torch namespace (#5400)
dbort Sep 18, 2024
73244a9
Add LLM subpages to navi (#5475)
kirklandsign Sep 19, 2024
47f4f07
Forward fix pull.yml (#5489)
kirklandsign Sep 19, 2024
7c6d58a
vTensor cleanup 6/N - Do not use `gpu_memory_layout` as a source of t…
SS-JIA Sep 19, 2024
90d5191
vTensor cleanup 7/N - Blanket replacement of `packed_dim_whcn_idx` wi…
SS-JIA Sep 19, 2024
af098c3
Enable Workspace sharing by default (#5336)
digantdesai Sep 19, 2024
16673f9
Update GH link in docs (#5493)
kirklandsign Sep 19, 2024
28c9a1d
Fix underflow error in `calculate_dim_order()` (#5498)
SS-JIA Sep 19, 2024
a556a2d
Support SpinQuant to run on ET (#5435)
Sep 19, 2024
2df0cc1
Fix missing newline typo on ios docs (#5491)
Riandy Sep 19, 2024
3512148
introduce {load|write}_texel_lpos helper for fetching/writing logical…
nathanaelsee Sep 19, 2024
a9f3f81
Android custom lib (#5501)
kirklandsign Sep 20, 2024
b5741a6
Fix javadoc for LlamaModule.java (#5502)
kirklandsign Sep 20, 2024
01dcebd
Remove `torch::` references from arm_executor_runner (#5506)
dbort Sep 20, 2024
8618607
Fix phi-3-mini build (#5513)
dbort Sep 20, 2024
7de3f81
Fix broken images in docs (#5514)
Riandy Sep 20, 2024
613cfd6
Add CMake instructions to apple-runtime.md (#5533)
dbort Sep 20, 2024
c50f9fe
update copy_offset to new layout specifier gen & axis mapping (#5505)
nathanaelsee Sep 21, 2024
0eee42a
Don't require -march compiler flags to use bfdot (#5444)
swolchok Sep 21, 2024
d5fdbd4
update conv1d to new layout specifier gen, axis mapping, and use non-…
nathanaelsee Sep 21, 2024
3ec4161
Fix optimized kernels build. (#5534)
shoumikhin Sep 21, 2024
45210bb
Fix tensor cloning when data is null. (#5535)
shoumikhin Sep 21, 2024
55d6b0d
Fix Xcode project. (#5539)
shoumikhin Sep 22, 2024
b2517d6
Remove TIP Format and Replaced with Subheader in README (#5517)
cmodi-meta Sep 22, 2024
e12b37e
Arm backend: Track target flash size metrics (#5342)
zingo Sep 23, 2024
0ec003b
Add "px" unit to image sizes in readme (#5540)
cmodi-meta Sep 23, 2024
182f138
Move examples/mediatek out from under the torch namespace (#5478)
dbort Sep 23, 2024
b361f91
Remove `torch::` namespace reference from LLaMMARunner.mm (#5516)
dbort Sep 23, 2024
3b63839
Fix duplicating latest prompt (#5546)
cmodi-meta Sep 23, 2024
abe9c36
Remove stray uses of `torch::executor::` from examples/... (#5512)
dbort Sep 23, 2024
f68a138
Remove `torch::` references from devtools/example_runner (#5495)
dbort Sep 23, 2024
cab6335
Allow using custom SDPA for non-float32 dtypes in llama demo (#5548)
swolchok Sep 23, 2024
b611d59
add CI job for phi-3-mini (#5532)
helunwencser Sep 23, 2024
8be3ce5
Fix image sizes in README.md (#5550)
svekars Sep 23, 2024
2eae7a9
Move QMat2 to buffer storage and scales_and_zeros to Channels Packed …
SS-JIA Sep 23, 2024
0a72cb0
Support bfloat16 in op_index (#5499)
swolchok Sep 23, 2024
badd76e
Support bfloat16 in op_index_put (#5500)
swolchok Sep 23, 2024
28c2ab6
add BFloat16 to aten_bridge (#5519)
swolchok Sep 23, 2024
286799c
Include optimized kernels in pybindings' portable_lib if building the…
swolchok Sep 23, 2024
61cb5b0
Adding support to demo prompt classification with Llama Guard (#5553)
Riandy Sep 23, 2024
ca0e48c
Refactor codegen components to prepare for benchmark generation (#5560)
SS-JIA Sep 24, 2024
5a984cc
Generate benchmarks automatically (#5561)
SS-JIA Sep 24, 2024
f4728f4
Add all relevant testcases for Arm Ethos-U85 (#5346)
zingo Sep 24, 2024
df72b8c
Use TensorMeta to check if inputs and outputs are memory planned (#5565)
JacobSzwejbka Sep 24, 2024
ddbe681
Update base for Update on "[ET-VK] Implement slice as a view"
SS-JIA Sep 24, 2024
17c3689
Update on "[ET-VK] Implement slice as a view"
SS-JIA Sep 24, 2024
0997981
Update base for Update on "[ET-VK] Implement slice as a view"
SS-JIA Sep 24, 2024
16cbc60
Update on "[ET-VK] Implement slice as a view"
SS-JIA Sep 24, 2024
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/pytorch.txt
Original file line number Diff line number Diff line change
@@ -1 +1 @@
c42ac54d9e817bf0a0366eb78e6c8beba4d5eff5
aec9b2ab77389967ef39bb9c10662fd0fe3e185a
1 change: 1 addition & 0 deletions .ci/docker/ci_commit_pins/torchao.txt
@@ -0,0 +1 @@
0916b5b29b092afcbf2b898caae49abe80662bac
4 changes: 4 additions & 0 deletions .ci/docker/common/install_linter.sh
@@ -13,3 +13,7 @@ source "$(dirname "${BASH_SOURCE[0]}")/utils.sh"
# NB: Install all linter dependencies, the caching of lintrunner init could be
# done after Executorch becomes public
pip_install -r requirements-lintrunner.txt

# Install google-java-format
curl -L --retry 3 https://github.com/google/google-java-format/releases/download/v1.23.0/google-java-format_linux-x86-64 > /opt/google-java-format
chmod +x /opt/google-java-format
6 changes: 4 additions & 2 deletions .ci/scripts/build-qnn-sdk.sh
@@ -6,11 +6,12 @@
# LICENSE file in the root directory of this source tree.

set -eux
set -o xtrace

build_qnn_backend() {
echo "Start building qnn backend."
export ANDROID_NDK_ROOT=/opt/ndk
export QNN_SDK_ROOT=/tmp/qnn/2.23.0.240531
export QNN_SDK_ROOT=/tmp/qnn/2.25.0.240728
export EXECUTORCH_ROOT="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")/../.." && pwd)"

bash backends/qualcomm/scripts/build.sh --skip_aarch64 --job_number 2 --release
@@ -26,8 +27,9 @@ set_up_aot() {
-DCMAKE_INSTALL_PREFIX=$PWD \
-DEXECUTORCH_BUILD_QNN=ON \
-DQNN_SDK_ROOT=${QNN_SDK_ROOT} \
-DEXECUTORCH_BUILD_SDK=ON \
-DEXECUTORCH_BUILD_DEVTOOLS=ON \
-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
-DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
-DEXECUTORCH_ENABLE_EVENT_TRACER=ON \
-DPYTHON_EXECUTABLE=python3 \
-DEXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT=OFF
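The `EXECUTORCH_ROOT` export above resolves the repository root relative to the script itself rather than to `$PWD`. A self-contained sketch of that `BASH_SOURCE`/`dirname` idiom (the helper name and demo paths here are invented for illustration):

```shell
# resolve_root: return the directory a given relative hop above a script,
# mirroring the `cd -- "$(dirname -- "${BASH_SOURCE[0]}")/../.." && pwd` idiom.
resolve_root() {
  # $1: path to the script; $2: relative hop to the root (e.g. "../..")
  (cd -- "$(dirname -- "$1")/$2" && pwd)
}

# Demo: a throwaway tree standing in for the repo checkout.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/.ci/scripts"
touch "$demo_root/.ci/scripts/build-qnn-sdk.sh"
resolve_root "$demo_root/.ci/scripts/build-qnn-sdk.sh" ../..   # prints the demo root
```

Because the `cd` runs in a subshell, the caller's working directory is untouched.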
3 changes: 2 additions & 1 deletion .ci/scripts/build_llama_android.sh
@@ -22,8 +22,9 @@ install_executorch_and_backend_lib() {
-DANDROID_PLATFORM=android-23 \
-DCMAKE_INSTALL_PREFIX=cmake-android-out \
-DCMAKE_BUILD_TYPE=Release \
-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
-DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
-DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
-DEXECUTORCH_BUILD_XNNPACK=ON \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
33 changes: 33 additions & 0 deletions .ci/scripts/setup-ios.sh
@@ -0,0 +1,33 @@
#!/bin/bash
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

set -exu

# This script follows the instructions from GitHub to install an Apple certificate
# https://docs.github.com/en/actions/use-cases-and-examples/deploying/installing-an-apple-certificate-on-macos-runners-for-xcode-development

CERTIFICATE_PATH="${RUNNER_TEMP}"/build_certificate.p12
PP_PATH="${RUNNER_TEMP}"/build_pp.mobileprovision
KEYCHAIN_PATH="${RUNNER_TEMP}"/app-signing.keychain-db

# Import certificate and provisioning profile from secrets
echo -n "$BUILD_CERTIFICATE_BASE64" | base64 --decode -o $CERTIFICATE_PATH
echo -n "$BUILD_PROVISION_PROFILE_BASE64" | base64 --decode -o $PP_PATH

# Create a temporary keychain
security create-keychain -p "$KEYCHAIN_PASSWORD" $KEYCHAIN_PATH
security set-keychain-settings -lut 21600 $KEYCHAIN_PATH
security unlock-keychain -p "$KEYCHAIN_PASSWORD" $KEYCHAIN_PATH

# Import certificate to the keychain
security import $CERTIFICATE_PATH -P "" -A -t cert -f pkcs12 -k $KEYCHAIN_PATH
security set-key-partition-list -S apple-tool:,apple: -k "$KEYCHAIN_PASSWORD" $KEYCHAIN_PATH
security list-keychain -d user -s $KEYCHAIN_PATH

# Apply provisioning profile
mkdir -p ~/Library/MobileDevice/Provisioning\ Profiles
cp $PP_PATH ~/Library/MobileDevice/Provisioning\ Profiles
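The two decode lines above rebuild binary files from base64-encoded repository secrets. A minimal round-trip sketch of that step, using a fabricated stand-in value instead of a real certificate:

```shell
# Fabricated secret: in CI this value comes from $BUILD_CERTIFICATE_BASE64.
DEMO_CERT_BASE64=$(printf 'demo-cert-bytes' | base64)
DEMO_CERT_PATH=$(mktemp)

# Same shape as the script's decode step: -n keeps echo from appending
# a newline to the encoded text before decoding.
echo -n "$DEMO_CERT_BASE64" | base64 --decode > "$DEMO_CERT_PATH"
cat "$DEMO_CERT_PATH"   # -> demo-cert-bytes
```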
1 change: 0 additions & 1 deletion .ci/scripts/setup-linux.sh
@@ -20,6 +20,5 @@ fi

# As Linux job is running inside a Docker container, all of its dependencies
# have already been installed
install_flatc_from_source
install_executorch
build_executorch_runner "${BUILD_TOOL}"
2 changes: 0 additions & 2 deletions .ci/scripts/setup-macos.sh
@@ -128,7 +128,5 @@ if [[ -z "${GITHUB_RUNNER:-}" ]]; then
fi

print_cmake_info
install_pytorch_and_domains
install_flatc_from_source
install_executorch
build_executorch_runner "${BUILD_TOOL}"
26 changes: 24 additions & 2 deletions .ci/scripts/setup-qnn-deps.sh
@@ -7,14 +7,18 @@

set -ex

verify_pkg_installed() {
echo $(dpkg-query -W --showformat='${Status}\n' $1|grep "install ok installed")
}

install_qnn() {
echo "Start installing qnn."
QNN_INSTALLATION_DIR=/tmp/qnn
mkdir -p "${QNN_INSTALLATION_DIR}"

curl -Lo /tmp/v2.23.0.24.06.24.zip "https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.23.0.24.06.24.zip"
curl -Lo /tmp/v2.25.0.24.07.28.zip "https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.25.0.240728.zip"
echo "Finishing downloading qnn sdk."
unzip -qo /tmp/v2.23.0.24.06.24.zip -d /tmp
unzip -qo /tmp/v2.25.0.24.07.28.zip -d /tmp
echo "Finishing unzip qnn sdk."


@@ -26,4 +30,22 @@ install_qnn() {
ls -lah "${QNN_INSTALLATION_DIR}"
}

setup_libc++() {
sudo apt-get update
pkgs_to_check=('libc++-dev')
j=0
while [ $j -lt ${#pkgs_to_check[*]} ]; do
install_status=$(verify_pkg_installed ${pkgs_to_check[$j]})
if [ "$install_status" == "" ]; then
sudo apt-get install -y ${pkgs_to_check[$j]}
if [[ $? -ne 0 ]]; then
echo "ERROR: Failed to install required packages for libc++"
exit 1
fi
fi
j=$(( $j +1));
done
}

setup_libc++
install_qnn
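The `verify_pkg_installed` probe plus index-based `while` loop above can be hard to follow. A hypothetical, behavior-equivalent sketch using a `for` loop, with the `dpkg-query` check abstracted behind a pluggable command so the control flow can be exercised on any machine:

```shell
# ensure_pkgs CHECK_CMD PKG... : install each package whose check fails.
# CHECK_CMD stands in for the dpkg-query probe; it should exit 0 when the
# package is already present.
ensure_pkgs() {
  local check_cmd=$1; shift
  for pkg in "$@"; do
    if "$check_cmd" "$pkg"; then
      echo "$pkg already installed"
    else
      echo "installing $pkg"   # the real script runs: sudo apt-get install -y "$pkg"
    fi
  done
}

# Demo with a fake checker that only knows about libc++-dev.
fake_check() { [ "$1" = "libc++-dev" ]; }
ensure_pkgs fake_check libc++-dev libfoo-dev
```

Swapping `fake_check` for a wrapper around `dpkg-query -W --showformat='${Status}\n'` recovers the original behavior.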
11 changes: 7 additions & 4 deletions .ci/scripts/test_llama.sh
@@ -11,7 +11,7 @@ source "$(dirname "${BASH_SOURCE[0]}")/utils.sh"

MODEL_NAME=$1 # stories110M
BUILD_TOOL=$2 # buck2 or cmake
DTYPE=$3 # fp16 or fp32
DTYPE=$3 # fp16, bf16, or fp32
MODE=${4:-"xnnpack+custom"} # portable or xnnpack+custom or xnnpack+custom+qe
UPLOAD_DIR=${5:-}
if [[ $# -lt 4 ]]; then # Assuming 4 mandatory args
@@ -29,7 +29,7 @@ if [[ -z "${BUILD_TOOL:-}" ]]; then
fi

if [[ -z "${DTYPE:-}" ]]; then
echo "Missing dtype, choose fp16 or fp32, exiting..."
echo "Missing dtype, choose fp16, bf16, or fp32, exiting..."
exit 1
fi

@@ -75,7 +75,7 @@ echo "COREML option ${COREML}"
if [[ "${MODE}" =~ .*qnn.* ]]; then
QNN=ON
export EXECUTORCH_ROOT="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")/.." && pwd)"
export QNN_SDK_ROOT=/tmp/qnn/2.23.0.240531
export QNN_SDK_ROOT=/tmp/qnn/2.25.0.240728
export LD_LIBRARY_PATH="${QNN_SDK_ROOT}/lib/x86_64-linux-clang"
export PYTHONPATH=".."
cp schema/program.fbs exir/_serialize/program.fbs
@@ -107,8 +107,9 @@ cmake_install_executorch_libraries() {
retry cmake \
-DCMAKE_INSTALL_PREFIX=cmake-out \
-DCMAKE_BUILD_TYPE=Debug \
-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
-DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
-DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
-DEXECUTORCH_BUILD_KERNELS_CUSTOM="$CUSTOM" \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
@@ -173,6 +174,8 @@ fi
EXPORTED_MODEL_NAME="llama2"
if [[ "${DTYPE}" == "fp16" ]]; then
EXPORTED_MODEL_NAME="${EXPORTED_MODEL_NAME}_h"
elif [[ "${DTYPE}" == "bf16" ]]; then
EXPORTED_MODEL_NAME="${EXPORTED_MODEL_NAME}_bf"
elif [[ "${DTYPE}" == "fp32" ]]; then
:
else
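The new `bf16` branch above extends an if/elif chain that maps the dtype argument to an exported-model-name suffix; the same mapping reads more compactly as a `case` (a sketch — the helper name is invented):

```shell
# suffix_for_dtype: the dtype -> filename-suffix mapping from test_llama.sh.
suffix_for_dtype() {
  case "$1" in
    fp16) echo "_h"  ;;
    bf16) echo "_bf" ;;
    fp32) echo ""    ;;
    *) echo "Unknown dtype: $1" >&2; return 1 ;;
  esac
}

echo "llama2$(suffix_for_dtype bf16)"   # -> llama2_bf
```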
161 changes: 126 additions & 35 deletions .ci/scripts/test_llava.sh
@@ -8,44 +8,99 @@
set -exu
# shellcheck source=/dev/null

BUILD_TYPE=${1:-Debug}
TARGET_OS=${2:-Native}
BUILD_DIR=${3:-cmake-out}

echo "Building with BUILD_TYPE: $BUILD_TYPE, TARGET_OS: $TARGET_OS, BUILD_DIR: $BUILD_DIR"

if [[ -z "${PYTHON_EXECUTABLE:-}" ]]; then
PYTHON_EXECUTABLE=python3
PYTHON_EXECUTABLE=python3
fi

TARGET_OS_lower="$(echo "${TARGET_OS}" | awk '{print tolower($0)}')"
if [[ "${TARGET_OS_lower}" == "android" ]]; then
if [[ -z "${ANDROID_NDK}" ]]; then
echo "Set ANDROID_NDK environment variable to build for Android."
exit 1
fi
fi

# Number of processes for a parallel build
NPROC=8
if hash nproc &> /dev/null; then NPROC=$(nproc); fi

EXECUTORCH_COMMON_CMAKE_ARGS=" \
-DCMAKE_INSTALL_PREFIX=${BUILD_DIR} \
-DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
-DEXECUTORCH_ENABLE_LOGGING=ON \
-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
-DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
-DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
-DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
-DEXECUTORCH_BUILD_XNNPACK=ON \
-DEXECUTORCH_DO_NOT_USE_CXX11_ABI=ON \
-DEXECUTORCH_XNNPACK_SHARED_WORKSPACE=ON"

cmake_install_executorch_libraries() {
cmake \
-DCMAKE_INSTALL_PREFIX=cmake-out \
-DCMAKE_BUILD_TYPE=Debug \
-DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
-DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
-DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
-DEXECUTORCH_BUILD_XNNPACK=ON \
-DEXECUTORCH_DO_NOT_USE_CXX11_ABI=ON \
-DEXECUTORCH_XNNPACK_SHARED_WORKSPACE=ON \
-Bcmake-out .


cmake --build cmake-out -j9 --target install --config Debug
cmake \
${EXECUTORCH_COMMON_CMAKE_ARGS} \
-B${BUILD_DIR} .

cmake --build ${BUILD_DIR} -j${NPROC} --target install --config ${BUILD_TYPE}
}

cmake_install_executorch_libraries_for_android() {
cmake \
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-23 \
${EXECUTORCH_COMMON_CMAKE_ARGS} \
-B${BUILD_DIR} .

cmake --build ${BUILD_DIR} -j${NPROC} --target install --config ${BUILD_TYPE}
}


LLAVA_COMMON_CMAKE_ARGS=" \
-DPYTHON_EXECUTABLE="$PYTHON_EXECUTABLE" \
-DCMAKE_INSTALL_PREFIX=${BUILD_DIR} \
-DCMAKE_BUILD_TYPE=${BUILD_TYPE} \
-DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_XNNPACK=ON"

cmake_build_llava_runner() {
dir=examples/models/llava
python_lib=$($PYTHON_EXECUTABLE -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')

cmake \
-DCMAKE_INSTALL_PREFIX=cmake-out \
-DCMAKE_BUILD_TYPE=Debug \
-DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
-DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
-DEXECUTORCH_BUILD_XNNPACK=ON \
-DCMAKE_PREFIX_PATH="$python_lib" \
-Bcmake-out/${dir} \
cmake \
${LLAVA_COMMON_CMAKE_ARGS} \
-DCMAKE_PREFIX_PATH="$python_lib" \
-B${BUILD_DIR}/${dir} \
${dir}

cmake --build ${BUILD_DIR}/${dir} -j${NPROC} --config ${BUILD_TYPE}
}


cmake --build cmake-out/${dir} -j9 --config Debug
cmake_build_llava_runner_for_android() {
dir=examples/models/llava
python_lib=$($PYTHON_EXECUTABLE -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')

cmake \
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-23 \
${LLAVA_COMMON_CMAKE_ARGS} \
-DCMAKE_PREFIX_PATH="$python_lib" \
-DLLAVA_RUNNER_NO_TORCH_DUMMY_IMAGE=ON \
-B${BUILD_DIR}/${dir} \
${dir}

cmake --build ${BUILD_DIR}/${dir} -j${NPROC} --config ${BUILD_TYPE}
}

# only export the one without custom op for now since it's
@@ -54,6 +109,13 @@ export_llava() {
$PYTHON_EXECUTABLE -m executorch.examples.models.llava.export_llava --pte-name llava.pte --with-artifacts
}

# Download a new image with different size, to test if the model can handle different image sizes
prepare_image_tensor() {
echo "Downloading image"
curl -o basketball.jpg https://upload.wikimedia.org/wikipedia/commons/7/73/Chicago_Bulls_and_New_Jersey_Nets%2C_March_28%2C_1991.jpg
$PYTHON_EXECUTABLE -m executorch.examples.models.llava.image_util --image-path basketball.jpg --output-path image.pt
}

run_and_verify() {
NOW=$(date +"%H:%M:%S")
echo "Starting to run llava runner at ${NOW}"
@@ -69,17 +131,33 @@
echo "tokenizer.bin is missing."
exit 1
fi
RUNTIME_ARGS="--model_path=llava.pte \
--tokenizer_path=tokenizer.bin \
--image_path=image.pt \
--prompt=ASSISTANT: \
--temperature=0 \
--seq_len=650"
cmake-out/examples/models/llava/llava_main ${RUNTIME_ARGS} > result.txt



RUNTIME_ARGS="--model_path=llava.pte \
--tokenizer_path=tokenizer.bin \
--image_path=image.pt \
--prompt=ASSISTANT: \
--temperature=0 \
--seq_len=650"

if [[ "${TARGET_OS_lower}" == "android" ]]; then
echo "Transfer relevant files to the phone via ADB and run llava_main with following args,"
echo "$ llava_main ${RUNTIME_ARGS} "
exit 0;
fi

${BUILD_DIR}/examples/models/llava/llava_main ${RUNTIME_ARGS} > result.txt

# verify result.txt
RESULT=$(cat result.txt)
# set the expected prefix to be the same as prompt because there's a bug in sdpa_with_kv_cache that causes <unk> tokens.
EXPECTED_PREFIX="ASSISTANT:"
if [[ "$(uname)" == "Darwin" ]]; then
EXPECTED_PREFIX="ASSISTANT: image captures a basketball game in progress, with several players on the court. One of the players is dribbling the ball, while the others are in various"
else
# set the expected prefix to be the same as prompt because there's a bug in sdpa_with_kv_cache that causes <unk> tokens.
EXPECTED_PREFIX="ASSISTANT:"
fi
if [[ "${RESULT}" == *"${EXPECTED_PREFIX}"* ]]; then
echo "Expected result prefix: ${EXPECTED_PREFIX}"
echo "Actual result: ${RESULT}"
@@ -93,7 +171,20 @@ fi
fi
}

cmake_install_executorch_libraries
cmake_build_llava_runner
# Step1. Build stuff
if [[ "${TARGET_OS_lower}" == "android" ]]; then
cmake_install_executorch_libraries_for_android
cmake_build_llava_runner_for_android
elif [[ "${TARGET_OS_lower}" == "native" ]]; then
cmake_install_executorch_libraries
cmake_build_llava_runner
else
echo "Invalid TARGET_OS ($2): ${TARGET_OS}"
fi

# Step2. Generate the PTE
export_llava

# Step3. Run
prepare_image_tensor
run_and_verify
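The Android/Native branching introduced above hinges on the `awk`-lowercased `TARGET_OS`. The dispatch shape in isolation (a sketch — the function name is invented, and unlike the script it fails hard on an unknown target):

```shell
# dispatch: lowercase the target the same way test_llava.sh does, then pick
# the matching build path.
dispatch() {
  local target_lower
  target_lower="$(echo "$1" | awk '{print tolower($0)}')"
  case "$target_lower" in
    android) echo "android build" ;;
    native)  echo "native build"  ;;
    *)       echo "Invalid TARGET_OS: $1" >&2; return 1 ;;
  esac
}

dispatch Android   # -> android build
dispatch native    # -> native build
```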